Euranova has 3 fundamental pillars: explore, craft and serve. The explore pillar of Euranova is an independent research centre dedicated to data science, software engineering and AI.
Through the exploration of tomorrow’s engineering and data science to answer today’s problems, our research centre is dedicated to anticipating the challenges that European businesses face. We find solutions to current and future digital challenges with passion, creativity and integrity.

Privacy Policy Classification with XLNet

24.08.2020

The popularisation of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation came into force in the European Union. While GDPR gives Data Subjects more rights and control over the use of their personal data, length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is…

Towards Privacy Policy Conceptual Modeling

21.08.2020

After GDPR enforcement in May 2018, the problem of implementing privacy by design and staying compliant with regulations has been more prominent than ever for businesses of all sizes, which is evident from frequent cases against companies and significant fines paid due to non-compliance. Consequently, numerous research works have been emerging in this area….

Applying Machine Learning Modeling to Enhance Runway Throughput at A Big European Airport

10.08.2020

One of the factors limiting busiest airport’s runway throughput capacity is the spacing to be applied between landing aircraft in order to ensure that the runway is vacated when the follower aircraft reaches the runway threshold. Today, because the Controller is not able to always anticipate the runway occupancy time (ROT) of the leader aircraft, significant spacing buffers are added to the minimum required spacing in order to cover all possible cases, which negatively affects the resulting arrival throughput. The present paper shows how a Machine Learning (ML) analysis can support the development of accurate, yet operational, models for ROT prediction depending on all impact parameters. Based on Gradient Boosting Regressors, those ML models make use of flight plan information (such as aircraft type, airline, flight data) and weather information to model the ROT. This paper shows how it can be used operationally to increase runway capacity while maintaining or reducing the risk of delivery of separations below runway occupancy time. The methodology and related benefits are assessed using three years of field measurements gathered at Zurich airport. You can find the slide here and the paper here. Guillaume Stempfel, Victor Brossard, Ivan De Visscher, Antoine Bonnefoy, Mohamed Ellejmi, Vincent Treve ̧ Applying Machine Learning Modeling to Enhance Runway Throughput at A Big European Airport, Proc. of the 10th EASN International Conference on “Innovation in Aviation & Space to the Satisfaction of the European Citizens, Naples, Italy, 2020.

Pruning Random Forest with Orthogonal Matching Trees

24.07.2020

In this paper we propose a new method to reduce the size of Breiman’s Random Forests. Given a RandomForest and a target size, our algorithm builds a linear combination of trees which minimizes the training error. Selected trees, as well as weights of the linear combination are obtained by means of the Orthogonal Matching Pursuit algorithm. We test our method on many public benchmark datasets both on regression and binary classification, and we compare it to other pruning techniques. Experiments show that our technique performs significantly better or equally good on many datasets1. We also discuss the benefit and short-coming of learning weights for the pruned forest which lead us to propose to use a non-negative constraint on the OMP weights for better empirical results. Luc Giffon, Charly Lamothe, Léo Bouscarrat, Paolo Milanesi, Farah Cherfaoui, and Sokol Ko, Pruning Random Forest with Orthogonal Matching Trees, Proc. of CAP 2020. Click here to access the paper.

Multilingual Enrichment of Disease Biomedical Ontologies

24.07.2020

Translating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility to use open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two biomedical ontologies focusing on diseases with respect to Wikidata for 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish) for both, plus Arabic, Chinese and Russian for the second. We first use direct links between Wikidata and the studied ontologies and then use second-order links by going through other intermediate ontologies. We then compare the quality of the translations obtained thanks to Wikidata with a commercial machine translation tool, here Google Cloud Translation. Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch, Multilingual Enrichment of Disease Biomedical Ontologies, Proc. of MultilingualBIO 2020. Click here to access the paper.

Internships 2020

5.06.2020

This document presents internships supervised by our software engineering department or by our research & development department. Each project is an opportunity to feel both empowered and responsible for your own professional development and for your contribution to the company. If you are interested in one of our offers, please send us your application to [email protected], including your CV and motivation regarding your top three internship positions (described in the document). If you wish to read the testimonies of students who have done an internship at EURA NOVA, visit our blog, or read directly their experiences: Renaud, Elouan [french-speaking article] Souheila, Léo If you are interested in working on a topic that is not in our range of offers, we would be delighted to hear your proposition and invite you get in touch. Internship subjects and application guidelines are available here: Internship Offers.

TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes

30.03.2020

Graphs are a fundamental structure that provides an intuitive abstraction for modelling and analyzing complex and highly interconnected data. Given the potential complexity of such data, some approaches proposed extending decision-support systems with multidimensional analysis capabilities over graphs. In this paper, we introduce TopoGraph, an end-to-end framework for building and analyzing graph cubes. TopoGraph extends the existing graph cube models by defining new types of dimensions and measures and organizing them within a multidimensional space that guarantees multidimensional integrity constraints. This results in defining three new types of graph cubes: property graph cubes, topological graph cubes, and graph-structured cubes. Afterwards, we define the algebraic OLAP operations for such novel cubes. We implement and experimentally validate TopoGraph with different types of real-world datasets. The paper will be published soon in Information Systems Frontiers, and is already available online on Springer. Currently, it is unfortunately available only to subscribers, but do not hesitate to reach out to us for more information! Amine Ghrab, Oscar Romero, Sabri Skhiri, Esteban Zimányi, TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes, published in Information Systems Frontiers (2020).

Thirty-Fourth AAAI Conference On Artificial Intelligence: A Summary

27.02.2020

Two weeks ago, our young research engineers Hounaida Zemzem and Rania Saidi were in New York for the Thirty-Fourth AAAI Conference On Artificial Intelligence. The conference promotes research in artificial intelligence and fosters scientific exchange between researchers, practitioners, scientists, students, and engineers in AI and its affiliated disciplines. Rania and Hounaida attended dozens of technical paper presentations, workshops, and tutorials on their favourite research areas: reinforcement learning for Hounaida and graph theory for Rania. What were the big trends and their favourite talks? Let’s find out with them! The Big Trends: Rania says: “The conference focused mostly on advanced AI topics such as graph theory, NLP, Online Learning, Neural Nets Theory and Knowledge Representation. It also looked into real-world applications such as online advertising, email marketing, health care, recommender systems, etc.” Hounaida adds: “I thought it was very successful given the large number of attendees as well as the quality of the accepted papers (7737 submissions were reviewed and 1,591 accepted). The talks showed the power of AI to tackle problems or improve situations in various domains.” Favourite talks and tutorials Hounaida explains: “Several of the sessions I attended were very insightful. My favourite talk was given by Mohammad Ghavamzadeh, an AI researcher at Facebook. He gave a tutorial on Exploration-Exploitation in Reinforcement Learning. The tutorial by William Yeoh, assistant professor at Washington University in St. Louis, was also amazing. He talked about Multi-Agent Distributed Constrained Optimization. Both their talks were clear and funny.” Rania’s feedback? “One of my favourite talks was given by Yolanda Gil, the president of the Association for the Advancement of Artificial Intelligence (AAAI). She gave a personal perspective on AI and its watershed moments, demonstrated the utility of AI in addressing future challenges, and insisted on the fact that AI is now

Schloss Dagstuhl: Where Computer Science Meets

20.02.2020

Which direction stream and complex event processing is going to take? Last week, the world’s best-known international researchers met in Schloss Dagstuhl, Germany, to present and discuss their research. Among the members were present Avigdor Gal, Professor at the Israel Institute of Technology, Alessandro Margara, Assistant Professor at the Polytechnic University of Milan, or Till Rohrmann, engineering lead at Veverica. Invited to talk about the requirements and needs from the industry, our R&D director Sabri Skhiri explains: “The seminar brought together world-class computer scientists and practitioners working on complex event recognition, distributed systems, databases, stream reasoning and artificial intelligence. Our objective was to disseminate the recent foundational results in each of these isolated fields among all participants, to identify the open problems that need to be resolved, and to establish new research collaborations among these fields”. What were the big trends and intakes gathered by those brilliant minds? Let’s find out with Sabri! The Big Trends This seminar is a bit particular as it does not show any trends but rather gives a picture of all the communities working on CER in a way or another. I was fascinated by the diversity of researchers. I did not expect to see such a rich variety of fields: knowledge representation, spatial reasoning, logic-based reasoning, data management, learning-based approaches, event-driven processing, process mining, database theory, stream mining,… According to me, the composite event recognition models that are the best at recognising complex events would include: Data flow model Ontology-based and reasoning model Symbolic reasoning model Automata-based model We also identified common challenges across these models and communities. The three priority topics areas we identified are: Expressivity: composability & hierarchies Evaluation strategy, parallelization and distribution Uncertainty management Favourite Talk Kurt Rothermel from TU Stuttgart – Time-sensitive Complex Event Processing My first

Throwback To 2019

21.01.2020

At EURA NOVA, we believe technology is a catalyst for change. To embrace it, we strive to stay at the edge of knowledge. Investing in research allows us to continuously become more proficient, to maintain our know-how at the cutting edge of IT, to share its benefits with our customers, and to incubate the products of tomorrow. As we look back on the year 2019, we are both proud and happy of the work achieved! Published papers: We are happy to say that our R&D department has published five peer-reviewed scientific papers last year. LEAD: A Formal Specification For Event Processing In June, our R&D engineer Anas presented his work on complex event processing at the 13Th ACM international Conference on distributed and event-based systems, which was taking place in Germany. Anas Al Bassit, Skhiri Sabri, LEAD: A Formal Specification For Event Processing, in 13Th ACM international Conference on distributed and event-based systems 2019 Coherence Regularization for Neural Topic Models In July, our R&D engineer Kate presented her paper on neural topic models at the 16th International Symposium on Neural Networks taking place in Moscow. Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural Topic Models. in 16th International Symposium on Neural Networks 2019 (ISNN 2019) STRASS: A Light and Effective Method for Extractive Summarization In August, our PhD student Léo was in Italy to present his paper at the 2019 ACL Student Research Workshop. Léo Bouscarrat, Antoine Bonnefoy, Thomas Peel, Cécile Pereira, STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings, in 2019 ACL Student Research Workshop, Florence, Italy. GraphOpt: Framework for Automatic Parameters Tuning of Graph Processing Frameworks In December, the paper written by our former intern and now full-time colleague Muaz was presented in Los

Fourth Workshop on Real-Time and Stream Analytics in Big Data: key takeaways

9.01.2020

Last December, Eura Nova’s research center held the fourth workshop on real-time and stream analytics in big data at the 2019 IEEE Conference on Big Data in Los Angeles. The workshop brought together leading players including Confluent, Apache Pulsar, the University of Virginia and Télécom Paris Tech as well as 8 renowned speakers from 6 different countries. We received more than 30 applications and we are proud to have hosted such interesting presentations of papers in stream mining, IoT, and industry 4.0. The workshop was a real success with many interesting questions and comments. If you could not attend, our R&D engineer Syrine Ferjaoui brought back important elements from the presentations for you. First keynote speaker: First of all, the workshop started with the keynote of Matteo Merli, PMC member at Apache Pulsar. His talk “Messaging and Streaming” explained how Pulsar can be a unified infrastructure that supports messaging and streaming. Matteo introduced messaging as events that are being created and streaming as analysing events that just happened. These are two different processing concepts but they need a single infrastructure. He then explained the architecture view of Pulsar, which has separate layers between the brokers and the bookies (BookKeeper instances that handle persistent storage of messages). This means that brokers and bookies can be added independently, traffic can be shifted very quickly across brokers, and new bookies will ramp up on traffic quickly. This segmented distribution makes the architecture of Pulsar more flexible and dynamic. Pulsar has other interesting features such as durability, low latency, high throughput, high availability, unified messaging model, high scalability, native computing, … The roadmap includes working on Pulsar storage API to allow direct access to data stored in Pulsar and to retrieve and process data more efficiently. They are also working on higher-level messaging

A Performance Prediction Model for Spark Applications

6.01.2020

Apache Spark is a popular open-source distributed-processing framework that enables efficient processing of massive amounts of data. It has a large number of parameters that need to be tuned to get the best performance. However, tuning these parameters manually is a complex and time-consuming task. Therefore, a robust performance model to predict applications execution time could greatly help in accelerating the deployment and optimization of big data applications relying on Spark. In this paper, we ran extensive experiments on a selected set of Spark applications that cover the most common workloads to generate a representative dataset of execution time. In addition, we extracted application and data features to build a machine learning-based performance model to predict Spark applications execution time. The experiments show that boosting algorithms achieved better results compared to other algorithms. Florian Demesmaeker, Amine Ghrab, Usama Javaid, Ahmed Amir Kanoun, A Performance Prediction Model for Spark Applications, in the proceedings of Big Data congress 2020. Click here to access the paper in its preprint form.

Privacy Policy Classification with XLNet

Towards Privacy Policy Conceptual Modeling

Applying Machine Learning Modeling to Enhance Runway Throughput at A Big European Airport

Pruning Random Forest with Orthogonal Matching Trees

Multilingual Enrichment of Disease Biomedical Ontologies

Internships 2020

TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes

Thirty-Fourth AAAI Conference On Artificial Intelligence: A Summary

Schloss Dagstuhl: Where Computer Science Meets

Throwback To 2019

Fourth Workshop on Real-Time and Stream Analytics in Big Data: key takeaways

A Performance Prediction Model for Spark Applications

Field of expertises

Data architecture

Data governance

Data science

Engineering

Academic collaboration

SERVE

Expertise

CRAFT

digazu

CONTACT

Belgium

France

Tunisia

CAREER

Job Offers

Social media