Towards a Continuous Evaluation of Calibration

For safety-critical systems involving AI components (such as in planes, cars, or healthcare), safety and the associated certification tasks are among the main challenges, and they can be costly and difficult to address.

One key aspect is to ensure that the decisions a machine-learning classifier makes are properly calibrated.
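As an illustration of what "properly calibrated" can mean in practice, here is a minimal sketch of the binned Expected Calibration Error (ECE), a widely used calibration measure. This teaser does not say which measure the article relies on, so the choice of ECE, the function name `expected_calibration_error` and the default of 10 bins are assumptions made for this example only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes in proportion to the share of predictions it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A classifier that announces 90% confidence and is right 9 times out of 10 scores close to 0, while one that announces 100% confidence and is right half of the time scores 0.5.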


Privacy Policy Classification with XLNet

The popularisation of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation came into force in the European Union. While GDPR gives Data Subjects more rights and control over the use of their personal data, the length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is to assign understandable categories to every paragraph or segment in these documents. The current state of the art in privacy policy analysis has established a baseline in multi-label classification on a dataset of 115 privacy policies, using BERT Transformers. In this paper, we propose a new classification model based on XLNet. Trained on the same dataset, our model improves the baseline F1 macro and micro averages by 1-3% for both majority vote and union-based gold standards. Moreover, the results reported by our XLNet-based model have been achieved without fine-tuning on domain-specific data, which reduces the training time and complexity compared to the BERT-based model. To make our method reproducible, we report our hyper-parameters and provide access to all used resources, including code. This work may, therefore, be considered a first step towards establishing a new baseline for privacy policy classification.
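The abstract reports macro- and micro-averaged F1 for multi-label classification. The sketch below shows how the two averages differ. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and treating a label with no true or predicted occurrence as F1 = 0 is an assumption.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred, labels):
    """Return (micro_f1, macro_f1) for multi-label data.

    y_true, y_pred: lists of sets of labels, one set per segment.
    labels:         the full list of categories to evaluate.
    """
    # Per-label counts of [true positives, false positives, false negatives].
    counts = {label: [0, 0, 0] for label in labels}
    for gold, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in gold:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in gold:
                counts[label][2] += 1

    # Macro: average the per-label F1 scores, so rare labels weigh as much as common ones.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)

    # Micro: pool the counts over all labels first, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return micro, macro


# Toy example with two hypothetical categories.
gold = [{"A"}, {"A", "B"}, {"A"}]
pred = [{"A"}, {"A"}, {"A", "B"}]
micro, macro = micro_macro_f1(gold, pred, ["A", "B"])
```

In this toy case the frequent label "A" is always right and the rare label "B" is always wrong, so the micro average (0.75) is higher than the macro average (0.5).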

Majd Mustapha, Katsiaryna Krasnashchok, Anas Al Bassit and Sabri Skhiri, Privacy Policy Classification with XLNet, Proc. of the 15th DPM International Workshop on Data Privacy Management, Surrey, UK, 2020.

Click here to access the paper in its preprint form.

Towards Privacy Policy Conceptual Modeling

After GDPR enforcement began in May 2018, the problem of implementing privacy by design and staying compliant with regulations has been more prominent than ever for businesses of all sizes, as is evident from the frequent cases against companies and the significant fines paid for non-compliance. Consequently, numerous research works have been emerging in this area. Yet, to this moment, no publicly available model offers a comprehensive representation of natural-language privacy policies that is machine-readable, interoperable and suitable for automatic compliance checking. Meanwhile, privacy policies remain one of the main means of communication between a business (Data Controller) and a Data Subject when it comes to the use of personal data. In this paper, we propose a conceptual model for fine-grained representation of privacy policies. We reuse and adapt existing Semantic Web resources in the spirit of interoperability. We represent our model as an ODRL profile and demonstrate how existing privacy policies can be translated into ODRL-like policies, consisting of deontic rules. We enrich our model with vocabularies for describing personal data processing in great detail, making it suitable for further usage in downstream applications, such as access control tools, to support adoption and implementation of privacy by design. We also demonstrate our model’s capability of handling personal data processing rules in other types of documents, namely data processing agreements, essential for controlling data privacy in a relationship between a Controller and a Processor.

The paper is available online on Springer. Currently, it is unfortunately available only to subscribers, but do not hesitate to reach out to us for more information!

Krasnashchok K., Mustapha M., Al Bassit A., Skhiri S. Towards Privacy Policy Conceptual Modeling. In Dobbie G., Frank U., Kappel G., Liddle S.W., Mayr H.C. (eds), Proc. of the 39th International Conference on Conceptual Modeling, LNCS 12400, 2020. Springer, Cham.


Applying Machine Learning Modeling to Enhance Runway Throughput at A Big European Airport

One of the factors limiting the runway throughput capacity of the busiest airports is the spacing to be applied between landing aircraft in order to ensure that the runway is vacated when the follower aircraft reaches the runway threshold. Today, because the Controller is not always able to anticipate the runway occupancy time (ROT) of the leader aircraft, significant spacing buffers are added to the minimum required spacing in order to cover all possible cases, which negatively affects the resulting arrival throughput. The present paper shows how a Machine Learning (ML) analysis can support the development of accurate, yet operational, models for ROT prediction depending on all influencing parameters. Based on Gradient Boosting Regressors, those ML models make use of flight plan information (such as aircraft type, airline, flight data) and weather information to model the ROT. This paper shows how they can be used operationally to increase runway capacity while maintaining or reducing the risk of delivering separations below the runway occupancy time. The methodology and related benefits are assessed using three years of field measurements gathered at Zurich airport.

You can find the slides here and the paper here.

Guillaume Stempfel, Victor Brossard, Ivan De Visscher, Antoine Bonnefoy, Mohamed Ellejmi, Vincent Treve, Applying Machine Learning Modeling to Enhance Runway Throughput at A Big European Airport, Proc. of the 10th EASN International Conference on “Innovation in Aviation & Space to the Satisfaction of the European Citizens”, Naples, Italy, 2020.

Pruning Random Forest with Orthogonal Matching Trees

In this paper we propose a new method to reduce the size of Breiman’s Random Forests. Given a Random Forest and a target size, our algorithm builds a linear combination of trees which minimizes the training error. The selected trees, as well as the weights of the linear combination, are obtained by means of the Orthogonal Matching Pursuit (OMP) algorithm. We test our method on many public benchmark datasets, both on regression and binary classification, and we compare it to other pruning techniques. Experiments show that our technique performs significantly better or equally well on many datasets. We also discuss the benefits and shortcomings of learning weights for the pruned forest, which leads us to propose a non-negative constraint on the OMP weights for better empirical results.
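The selection step described in this abstract can be sketched with plain Orthogonal Matching Pursuit over the matrix of per-tree predictions. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `omp_select_trees` is hypothetical, the correlations are normalised by column norm as an assumed design choice, and the non-negative weight constraint mentioned in the abstract is not implemented here.

```python
import numpy as np


def omp_select_trees(tree_preds, y, n_trees):
    """Pick n_trees columns of tree_preds and weights that approximate y.

    tree_preds: array of shape (n_samples, n_total_trees); column j holds
                the predictions of tree j on the training set.
    y:          array of shape (n_samples,), the training targets.
    Returns (selected_indices, weights), in order of selection.
    """
    D = np.asarray(tree_preds, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 0 < n_trees <= D.shape[1]:
        raise ValueError("n_trees must be between 1 and the forest size")

    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero columns

    residual = y.copy()
    selected = []
    weights = np.zeros(0)
    for _ in range(n_trees):
        # Greedy step: take the tree most correlated with the current residual.
        corr = np.abs(D.T @ residual) / norms
        if selected:
            corr[selected] = -np.inf  # never pick the same tree twice
        selected.append(int(np.argmax(corr)))

        # Orthogonal step: refit all weights by least squares on the selected trees.
        weights, *_ = np.linalg.lstsq(D[:, selected], y, rcond=None)
        residual = y - D[:, selected] @ weights
    return selected, weights
```

The pruned forest then predicts with the weighted sum of the selected trees' outputs. Refitting every weight at each iteration is what distinguishes OMP from plain matching pursuit.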

Luc Giffon, Charly Lamothe, Léo Bouscarrat, Paolo Milanesi, Farah Cherfaoui, and Sokol Ko, Pruning Random Forest with Orthogonal Matching Trees, Proc. of CAP 2020.

Click here to access the paper.

Multilingual Enrichment of Disease Biomedical Ontologies

Translating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility of using open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two disease-focused biomedical ontologies with respect to Wikidata for 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish) for both ontologies, plus Arabic, Chinese and Russian for the second. We first use direct links between Wikidata and the studied ontologies and then use second-order links by going through other intermediate ontologies. We then compare the quality of the translations obtained from Wikidata with those of a commercial machine translation tool, here Google Cloud Translation.

Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch, Multilingual Enrichment of Disease Biomedical Ontologies, Proc. of MultilingualBIO 2020.

Click here to access the paper.

TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes

Graphs are a fundamental structure that provides an intuitive abstraction for modelling and analyzing complex and highly interconnected data. Given the potential complexity of such data, some approaches proposed extending decision-support systems with multidimensional analysis capabilities over graphs. In this paper, we introduce TopoGraph, an end-to-end framework for building and analyzing graph cubes. TopoGraph extends the existing graph cube models by defining new types of dimensions and measures and organizing them within a multidimensional space that guarantees multidimensional integrity constraints. This results in defining three new types of graph cubes: property graph cubes, topological graph cubes, and graph-structured cubes. Afterwards, we define the algebraic OLAP operations for such novel cubes. We implement and experimentally validate TopoGraph with different types of real-world datasets.
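To give a feel for multidimensional analysis over graphs, the sketch below rolls a small attributed graph up along one dimension: nodes sharing an attribute value are merged, and nodes and edges are counted per group. This is a generic aggregate-graph illustration, not TopoGraph's model or API; the function name `aggregate_graph`, the `city` attribute and the toy data are all hypothetical.

```python
from collections import Counter


def aggregate_graph(node_attrs, edges, dimension):
    """Roll a graph up along one node attribute.

    node_attrs: dict mapping node id -> dict of attribute values.
    edges:      iterable of (source_id, target_id) pairs.
    dimension:  name of the attribute to group nodes by.
    Returns (node_counts, edge_counts), where edge_counts is keyed by
    (source_group, target_group).
    """
    # Measure on aggregated nodes: how many original nodes fall in each group.
    node_counts = Counter(attrs[dimension] for attrs in node_attrs.values())
    # Measure on aggregated edges: how many original edges link each pair of groups.
    edge_counts = Counter(
        (node_attrs[src][dimension], node_attrs[dst][dimension])
        for src, dst in edges
    )
    return dict(node_counts), dict(edge_counts)


# Toy graph: three users grouped by a hypothetical "city" dimension.
users = {
    1: {"city": "Brussels"},
    2: {"city": "Brussels"},
    3: {"city": "Paris"},
}
links = [(1, 2), (1, 3), (2, 3)]
node_counts, edge_counts = aggregate_graph(users, links, "city")
```

Here the three users collapse into two city-level nodes, with one edge inside Brussels and two edges from Brussels to Paris.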


The paper will be published soon in Information Systems Frontiers, and is already available online on Springer. Currently, it is unfortunately available only to subscribers, but do not hesitate to reach out to us for more information!


Amine Ghrab, Oscar Romero, Sabri Skhiri, Esteban Zimányi, TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes, published in Information Systems Frontiers (2020).