Skip to content

Privacy Policy Classification with XLNet

The popularisation of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation came into force in the European Union. While GDPR gives Data Subjects more rights and control over the use of their personal data, length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is through assigning understandable categories to every paragraph or segment in said documents. The current state of the art in privacy policy analysis has established a baseline in multi-label classification on the dataset containing 115 privacy policies, using BERT Transformers. In this paper, we propose a new classification model based on the XLNet. Trained on the same dataset, our model improves the baseline F1 macro and micro averages by 1-3% for both majority vote and union-based gold standards. Moreover, the results reported by our XLNet-based model have been achieved without fine-tuning on domain-specific data, which reduces the training time and complexity, compared to the BERT-based model. To make our method reproducible, we report our hyper-parameters and provide access to all used resources, including code. This work may, therefore, be considered as a first step to establishing a new baseline for privacy policy classification.

Majd Mustapha, Katsiaryna Krasnashchok, Anas Al Bassit and Sabri Skhiri, Privacy Policy Classification with XLNet, Proc. of the 15th DPM International Workshop on Data Privacy Management, Surrey, UK, 2020.

Click here to access the paper in its preprint form.

Releated Posts

Augment to Interpret: Unsupervised and Inherently Interpretable Graph Embeddings

In this paper, we study graph representation learning and show that data augmentation that preserves semantics can be learned and used to produce interpretations. Our framework, which we named INGENIOUS, creates inherently interpretable embeddings and eliminates the need for costly additional post-hoc analysis.
Read More

SANGEA: Scalable and Attributed Network Generation

In this paper, we present SANGEA, a sizeable synthetic graph generation framework that extends the applicability of any SGG to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph.
Read More