Euranova has 3 fundamental pillars: explore, craft and serve. The explore pillar of Euranova is an independent research centre dedicated to data science, software engineering and AI.
Through the exploration of tomorrow’s engineering and data science to answer today’s problems, our research centre is dedicated to anticipating the challenges that European businesses face. We find solutions to current and future digital challenges with passion, creativity and integrity.
MASTER THESIS & PFE 2021
This document introduces the master's thesis and graduation projects supervised by our research & development department. Each project offers you the chance to be actively involved in developing solutions to tomorrow’s challenges in ICT, and in implementing them today!
ECML 2020 – The Keynotes
A few weeks ago, the biggest European conference on machine learning was held: ECML 2020. Our research engineer Nourchène, our R&D consultant Gianmarco, and our data scientist Ronan attended the event from Tunisia, Belgium and Marseille. In this article, they tell you about the different keynote talks they attended.
Gemma Galdon-Clavell – Algorithmic Auditing: how to open the black-box of ML
Nourchène says: “I loved the talk given by Gemma Galdon-Clavell, in which she addressed the problem of ethics in AI, as computer science engineers do not often question what they are producing from a moral standpoint. In her talk, Gemma points out the importance of the data used to train a machine learning model. Data are provided by humans, but people are not perfect and are likely to make wrong decisions. The model will then learn to behave the same way, so we might end up creating an unethical model. This can lead to two different behaviours: users will either follow the system’s recommendations at any cost, or decide not to if they find the decisions unreasonable. Data will then continue to be biased, which creates a sort of deadlock.”
Ronan adds: “Algorithms do not produce biases out of nowhere; they reproduce and amplify the biases they find in the data they ingest. As a result, we have to pay attention first to the quality of the data we use. Gemma emphasizes that algorithmic auditing is the key to understanding whether the algorithm meets expectations and whether it complies with regulations. The audit does not only cover the technical part and the way the algorithm was coded. It also focuses on how the problem was approached and the means deployed to solve it.”
Nourchène explains: “The speaker suggests that before creating a product, computer science engineers and developers need to ask the following questions:
ECML 2020 – A Summary
A few weeks ago, the biggest European conference on machine learning was held: ECML 2020. Our research engineer Nourchène, our R&D consultant Gianmarco, and our data scientist Ronan attended the event from Tunisia, Belgium and Marseille. What were the big trends and their favourite talks? What did they think of the online remote format? Let’s find out with them!
The Big Trends
Overall, the conference was very much in step with the latest trends and needs of the outside world. Gianmarco explains: “The conference was rich in presentations which covered nearly all possible topics in machine learning. However, I had the impression that Graph Neural Networks and Generative Models had a little more presence than other models. Transfer learning was another topic that seemed to be very relevant throughout the conference.”
Remote Format For The First Time
Due to the COVID-19 pandemic, the conference was fully virtual. The talks were pre-recorded and made available prior to the conference. The live sessions were dedicated to questions and answers, with a very brief presentation at the beginning of the session. Nourchène explains: “The downside was that we had to watch the whole presentation beforehand, otherwise it was difficult to follow the discussion and to interact with the speaker. Fun fact: there was a session where even the moderator was not aware of this Q&A aspect and asked the speaker why the presentation was so short! The good thing is that, since the presentations were pre-recorded, it was possible to watch the presentations from sessions running in parallel.”
Gianmarco adds: “I have not attended many remote conferences in my life, but I was genuinely surprised to see how well-organised this one was. The remote framework was very well-designed, the web interface was fully functional, and they took advantage of all the benefits that
Our engineer Amine Ghrab presented his PhD public defense on the BI on Graph Project
Last Thursday, our engineer Amine Ghrab presented the BI on Graph project during his PhD public defense. Amine did an amazing job at the boundary between industry and academia. His thesis was done in collaboration with the CODE/WIT Lab of the Université Libre de Bruxelles and the Universitat Politècnica de Catalunya, with the support of Prof. Oscar Romero & Prof. Esteban Zimányi!
In his PhD thesis, Amine defined how BI environments can be enriched with graph data structures. Over the past decade, business and social environments have become increasingly complex and interconnected. As a result, graphs have emerged as a widespread abstraction tool at the core of the information infrastructure that supports these environments. In particular, the integration of graphs into data warehouse systems has appeared as a way to extend current information systems with graph management and analysis capabilities. Going further, Amine redefined the concept of the multidimensional cube on graphs and showed how it can open new doors for data analysts. Finally, he showed how a graph data warehouse architecture can be defined.
Congratulations on your achievements!
You can find below a list of related publications:
TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes
GraphOpt: a Framework for Automatic Parameters Tuning of Graph Processing Frameworks
Graph BI & Analytics: Current State and Future Challenges
Discovering interesting patterns in large graph cubes
A Framework for Building OLAP Cubes on Graphs
Internship & Master Thesis Offer – 2021
Our master's thesis and internship offers for the coming year, supervised by our software engineering department or by our research & development department, will be available in the course of November, and will cover the following research topics:
Regarding data privacy: Legal entity relations with knowledge graph, Legal NLP, Privacy by design, Topic modeling, Text summarisation, …
Regarding data automation: GAN for multimodal representation, AutoML, Optimization methods, Computer vision, Graph Embeddings, …
Regarding data pipelines: Reinforcement learning, Optimisation methods, Stream Processing, CEP, Network compression, …
Regarding data quality: Denoising technique, GAN for missing data, Semi-Supervised learning, Data cleaning, Attention Model for Structural dep., …
Each project is an opportunity to feel both empowered and responsible for your professional development and to address tomorrow’s challenges in ICT, coached by the Eura Nova crew. The detailed offers will be available mid-November. In the meantime, do not hesitate to contact us at career@euranova.eu with any questions regarding internships and master's theses!
As an example, the documents listed below present our 2020 master's thesis and internship offers:
Internships offers
Master Thesis offers
Privacy Policy Classification with XLNet
The popularisation of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation came into force in the European Union. While GDPR gives Data Subjects more rights and control over the use of their personal data, the length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is…
Towards Privacy Policy Conceptual Modeling
Since the GDPR became enforceable in May 2018, the problem of implementing privacy by design and staying compliant with regulations has been more prominent than ever for businesses of all sizes, as is evident from the frequent cases against companies and the significant fines paid for non-compliance. Consequently, numerous research works have been emerging in this area…
Applying Machine Learning Modeling to Enhance Runway Throughput at A Big European Airport
One of the factors limiting the runway throughput capacity of the busiest airports is the spacing to be applied between landing aircraft in order to ensure that the runway is vacated when the follower aircraft reaches the runway threshold. Today, because the Controller is not always able to anticipate the runway occupancy time (ROT) of the leader aircraft, significant spacing buffers are added to the minimum required spacing in order to cover all possible cases, which negatively affects the resulting arrival throughput. The present paper shows how a Machine Learning (ML) analysis can support the development of accurate, yet operational, models for ROT prediction depending on all impact parameters. Based on Gradient Boosting Regressors, those ML models make use of flight plan information (such as aircraft type, airline, flight data) and weather information to model the ROT. This paper shows how they can be used operationally to increase runway capacity while maintaining or reducing the risk of delivering separations below the runway occupancy time. The methodology and related benefits are assessed using three years of field measurements gathered at Zurich airport. You can find the slides here and the paper here. Guillaume Stempfel, Victor Brossard, Ivan De Visscher, Antoine Bonnefoy, Mohamed Ellejmi, Vincent Treve, Applying Machine Learning Modeling to Enhance Runway Throughput at A Big European Airport, Proc. of the 10th EASN International Conference on “Innovation in Aviation & Space to the Satisfaction of the European Citizens”, Naples, Italy, 2020.
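To give a feel for the kind of model the abstract above describes, here is a minimal sketch that fits a gradient boosting regressor to predict a runway occupancy time. It is not the authors' pipeline: the data is synthetic, the three features (an aircraft category, a headwind and a wet-runway flag) and the formula generating the target are invented for illustration, and scikit-learn's `GradientBoostingRegressor` is simply one readily available implementation of the technique.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical features, NOT the ones used in the paper.
aircraft_type = rng.integers(0, 3, size=n)   # made-up categories 0, 1, 2
headwind_kt = rng.uniform(0, 30, size=n)     # made-up weather feature
wet_runway = rng.integers(0, 2, size=n)      # made-up weather flag

# Invented ground truth (seconds), only there to give the model a signal.
rot_s = (45 + 6 * aircraft_type + 4 * wet_runway
         - 0.2 * headwind_kt + rng.normal(0, 2, size=n))

# One-hot encode the categorical feature, keep the others as they are.
X = np.column_stack([np.eye(3)[aircraft_type], headwind_kt, wet_runway])
X_train, X_test, y_train, y_test = train_test_split(
    X, rot_s, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0)
model.fit(X_train, y_train)

# Compare against always predicting the mean ROT seen in training.
model_mae = mean_absolute_error(y_test, model.predict(X_test))
baseline_mae = mean_absolute_error(
    y_test, np.full_like(y_test, y_train.mean()))
print(f"model MAE: {model_mae:.2f} s, baseline MAE: {baseline_mae:.2f} s")
```

The comparison with a constant prediction mirrors the argument of the abstract: the tighter the ROT prediction, the smaller the buffer that has to be added to the minimum spacing.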
Pruning Random Forest with Orthogonal Matching Trees
In this paper we propose a new method to reduce the size of Breiman’s Random Forests. Given a Random Forest and a target size, our algorithm builds a linear combination of trees which minimizes the training error. The selected trees, as well as the weights of the linear combination, are obtained by means of the Orthogonal Matching Pursuit (OMP) algorithm. We test our method on many public benchmark datasets, on both regression and binary classification, and we compare it to other pruning techniques. Experiments show that our technique performs significantly better or equally well on many datasets. We also discuss the benefits and shortcomings of learning weights for the pruned forest, which leads us to propose a non-negative constraint on the OMP weights for better empirical results. Luc Giffon, Charly Lamothe, Léo Bouscarrat, Paolo Milanesi, Farah Cherfaoui, and Sokol Koço, Pruning Random Forest with Orthogonal Matching Trees, Proc. of CAP 2020. Click here to access the paper.
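The selection step described in the abstract above can be sketched as a plain Orthogonal Matching Pursuit over a matrix of per-tree predictions. This is an illustrative reading of the abstract, not the authors' code: the function name `omp_select_trees` is hypothetical, the "trees" are simulated as noisy copies of the target instead of coming from a trained forest, and the non-negative variant mentioned in the abstract is not implemented.

```python
import numpy as np

def omp_select_trees(tree_preds, y, target_size):
    """Greedy OMP over the columns of tree_preds (hypothetical helper).

    tree_preds: (n_samples, n_trees) array, column j = predictions of tree j.
    y: (n_samples,) training targets.
    Returns the selected tree indices and their least-squares weights.
    """
    # Normalise columns so correlations are comparable across trees.
    norms = np.linalg.norm(tree_preds, axis=0)
    norms[norms == 0] = 1.0
    atoms = tree_preds / norms

    residual = y.astype(float).copy()
    selected = []
    weights = np.zeros(0)
    for _ in range(target_size):
        # Pick the tree most correlated with the current residual.
        correlations = np.abs(atoms.T @ residual)
        if selected:
            correlations[selected] = -np.inf  # never pick a tree twice
        selected.append(int(np.argmax(correlations)))
        # Re-fit all selected weights jointly (the "orthogonal" step).
        weights, *_ = np.linalg.lstsq(tree_preds[:, selected], y, rcond=None)
        residual = y - tree_preds[:, selected] @ weights
    return selected, weights

# Simulated forest: 30 "trees", each predicting the target plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 6, 200)
y = np.sin(x) + 2
tree_preds = y[:, None] + rng.normal(0, 0.5, size=(200, 30))

selected, weights = omp_select_trees(tree_preds, y, target_size=5)
mse_pruned = np.mean((y - tree_preds[:, selected] @ weights) ** 2)
print("selected trees:", selected)
print("training MSE of the pruned forest:", mse_pruned)
```

With a real forest, `tree_preds` would be filled with each tree's predictions on the training set, and the pruned forest would predict with the weighted sum of the selected trees only.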
Multilingual Enrichment of Disease Biomedical Ontologies
Translating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility of using open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two biomedical ontologies focusing on diseases with respect to Wikidata for 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish) for both, plus Arabic, Chinese and Russian for the second. We first use direct links between Wikidata and the studied ontologies and then use second-order links by going through other intermediate ontologies. We then compare the quality of the translations obtained through Wikidata with that of a commercial machine translation tool, here Google Cloud Translation. Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch, Multilingual Enrichment of Disease Biomedical Ontologies, Proc. of MultilingualBIO 2020. Click here to access the paper.
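As a rough illustration of the coverage measure discussed above, the sketch below counts how many concepts of an ontology obtain a label in a given language, first through a direct link to a knowledge-base entity and then through a second-order link via an intermediate ontology. All identifiers, link tables and labels are toy values invented for the example, and the function `coverage` is hypothetical; the paper's exact definition of coverage may differ.

```python
def coverage(concepts, direct_links, bridge_links, intermediate_links,
             labels, language):
    """Fraction of ontology concepts that get a label in `language`."""
    covered = 0
    for concept in concepts:
        # First-order: the concept is linked straight to an entity.
        entity = direct_links.get(concept)
        if entity is None:
            # Second-order: go through an ID in an intermediate ontology.
            for other_id in bridge_links.get(concept, []):
                entity = intermediate_links.get(other_id)
                if entity is not None:
                    break
        if entity is not None and language in labels.get(entity, {}):
            covered += 1
    return covered / len(concepts)

# Toy data: every ID and label below is made up for the example.
concepts = ["D1", "D2", "D3", "D4"]
direct_links = {"D1": "Q1"}                 # studied ontology -> entity
bridge_links = {"D2": ["X9"]}               # studied -> intermediate ontology
intermediate_links = {"X9": "Q2"}           # intermediate ontology -> entity
labels = {"Q1": {"fr": "grippe", "de": "Grippe"}, "Q2": {"fr": "asthme"}}

fr_coverage = coverage(concepts, direct_links, bridge_links,
                       intermediate_links, labels, "fr")
de_coverage = coverage(concepts, direct_links, bridge_links,
                       intermediate_links, labels, "de")
print(fr_coverage, de_coverage)
```

In this toy setting the second-order link is what brings concept `D2` into the French coverage, which is the effect the abstract attributes to intermediate ontologies.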
Internships 2020
This document presents internships supervised by our software engineering department or by our research & development department. Each project is an opportunity to feel both empowered and responsible for your own professional development and for your contribution to the company. If you are interested in one of our offers, please send your application to career@euranova.eu, including your CV and your motivation regarding your top three internship positions (described in the document). If you wish to read the testimonies of students who have done an internship at EURA NOVA, visit our blog, or read their experiences directly: Renaud, Elouan [french-speaking article] Souheila, Léo If you are interested in working on a topic that is not in our range of offers, we would be delighted to hear your proposition and invite you to get in touch. Internship subjects and application guidelines are available here: Internship Offers.
TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes
Graphs are a fundamental structure that provides an intuitive abstraction for modelling and analyzing complex and highly interconnected data. Given the potential complexity of such data, some approaches proposed extending decision-support systems with multidimensional analysis capabilities over graphs. In this paper, we introduce TopoGraph, an end-to-end framework for building and analyzing graph cubes. TopoGraph extends the existing graph cube models by defining new types of dimensions and measures and organizing them within a multidimensional space that guarantees multidimensional integrity constraints. This results in defining three new types of graph cubes: property graph cubes, topological graph cubes, and graph-structured cubes. Afterwards, we define the algebraic OLAP operations for such novel cubes. We implement and experimentally validate TopoGraph with different types of real-world datasets. The paper will be published soon in Information Systems Frontiers, and is already available online on Springer. Currently, it is unfortunately available only to subscribers, but do not hesitate to reach out to us for more information! Amine Ghrab, Oscar Romero, Sabri Skhiri, Esteban Zimányi, TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes, published in Information Systems Frontiers (2020).
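For readers unfamiliar with graph cubes, the basic idea of aggregating a graph along a dimension can be sketched in a few lines. The example below is a generic roll-up of an attributed graph, with node and edge counts as measures; it does not reproduce TopoGraph's cube types or its OLAP algebra, and the function `roll_up` and the toy graph are assumptions made for illustration.

```python
from collections import Counter

def roll_up(nodes, edges, dimension):
    """Aggregate a graph by one node attribute (hypothetical helper).

    nodes: dict node_id -> dict of attributes.
    edges: list of (source_id, target_id) pairs.
    Returns (node counts per group, edge counts per pair of groups).
    """
    group_of = {node_id: attrs[dimension] for node_id, attrs in nodes.items()}
    node_measure = Counter(group_of.values())
    edge_measure = Counter((group_of[s], group_of[t]) for s, t in edges)
    return node_measure, edge_measure

# Toy attributed graph, invented for the example.
nodes = {
    "a": {"country": "BE", "role": "dev"},
    "b": {"country": "BE", "role": "analyst"},
    "c": {"country": "TN", "role": "dev"},
}
edges = [("a", "b"), ("a", "c"), ("b", "c")]

by_country_nodes, by_country_edges = roll_up(nodes, edges, "country")
by_role_nodes, by_role_edges = roll_up(nodes, edges, "role")
print(by_country_nodes, by_country_edges)
print(by_role_nodes, by_role_edges)
```

Each choice of dimension yields a different aggregated view of the same graph, which is what makes a cube of such views useful for analysis.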