Skip to content

Privacy Policy Classification with XLNet

The popularisation of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation came into force in the European Union. While GDPR gives Data Subjects more rights and control over the use of their personal data, length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is through assigning understandable categories to every paragraph or segment in said documents. The current state of the art in privacy policy analysis has established a baseline in multi-label classification on the dataset containing 115 privacy policies, using BERT Transformers. In this paper, we propose a new classification model based on the XLNet. Trained on the same dataset, our model improves the baseline F1 macro and micro averages by 1-3% for both majority vote and union-based gold standards. Moreover, the results reported by our XLNet-based model have been achieved without fine-tuning on domain-specific data, which reduces the training time and complexity, compared to the BERT-based model. To make our method reproducible, we report our hyper-parameters and provide access to all used resources, including code. This work may, therefore, be considered as a first step to establishing a new baseline for privacy policy classification.

Majd Mustapha, Katsiaryna Krasnashchok, Anas Al Bassit and Sabri Skhiri, Privacy Policy Classification with XLNet, Proc. of the 15th DPM International Workshop on Data Privacy Management, Surrey, UK, 2020.

Click here to access the paper in its preprint form.

Share on linkedin
Share on twitter
Share on email

Releated Posts

INTERNSHIPS 2022

This document presents internships supervised by our software engineering department or by our research & development department. Each project is an opportunity to feel both empowered and responsible for your own professional development and for your contribution to the company.
Read More

Multimodal Classifier For Space Target Recognition

We propose a multi-modal framework to tackle the SPARK Challenge by classifying satellites using RGB and depth images. Our framework is mainly based on Auto-Encoders to embed the two modalities in a common latent space in order to exploit redundant and complementary information between the two types of data.
Read More