Euranova has 3 fundamental pillars: explore, craft and serve. The explore pillar of Euranova is an independent research centre dedicated to data science, software engineering and AI.
Through the exploration of tomorrow’s engineering and data science to answer today’s problems, our research centre is dedicated to anticipating the challenges that European businesses face. We find solutions to current and future digital challenges with passion, creativity and integrity.

Thirty-Fourth AAAI Conference On Artificial Intelligence: A Summary

Two weeks ago, our young research engineers Hounaida Zemzem and Rania Saidi were in New York for the Thirty-Fourth AAAI Conference on Artificial Intelligence. The conference promotes research in artificial intelligence and fosters scientific exchange between researchers, practitioners, scientists, students and engineers in AI and its affiliated disciplines. Rania and Hounaida attended dozens of technical paper presentations, workshops and tutorials in their favourite research areas: reinforcement learning for Hounaida and graph theory for Rania. What were the big trends, and what were their favourite talks? Let’s find out with them!

The Big Trends

Rania says: “The conference focused mostly on advanced AI topics such as graph theory, NLP, online learning, neural network theory and knowledge representation. It also looked into real-world applications such as online advertising, email marketing, health care and recommender systems.”

Hounaida adds: “I thought it was very successful, given the large number of attendees and the quality of the accepted papers (7,737 submissions were reviewed and 1,591 accepted). The talks showed the power of AI to tackle problems and improve situations in various domains.”

Favourite talks and tutorials

Hounaida explains: “Several of the sessions I attended were very insightful. My favourite talk was given by Mohammad Ghavamzadeh, an AI researcher at Facebook. He gave a tutorial on Exploration-Exploitation in Reinforcement Learning. The tutorial by William Yeoh, assistant professor at Washington University in St. Louis, was also amazing. He talked about Multi-Agent Distributed Constrained Optimization. Both talks were clear and funny.”

Rania’s feedback? “One of my favourite talks was given by Yolanda Gil, the president of the Association for the Advancement of Artificial Intelligence (AAAI). She gave a personal perspective on AI and its watershed moments, demonstrated the utility of AI in addressing future challenges, and insisted that AI is now

read more »

Schloss Dagstuhl: Where Computer Science Meets

Which direction will stream and complex event processing take? Last week, leading international researchers met at Schloss Dagstuhl, Germany, to present and discuss their research. Among the participants were Avigdor Gal, Professor at the Technion – Israel Institute of Technology; Alessandro Margara, Assistant Professor at the Polytechnic University of Milan; and Till Rohrmann, engineering lead at Ververica. Invited to talk about the requirements and needs of the industry, our R&D director Sabri Skhiri explains: “The seminar brought together world-class computer scientists and practitioners working on complex event recognition, distributed systems, databases, stream reasoning and artificial intelligence. Our objective was to disseminate the recent foundational results in each of these isolated fields among all participants, to identify the open problems that need to be resolved, and to establish new research collaborations among these fields.” What were the big trends and takeaways gathered by those brilliant minds? Let’s find out with Sabri!

The Big Trends

This seminar was unusual in that it did not reveal trends as such; rather, it gave a picture of all the communities working on CER in one way or another. I was fascinated by the diversity of researchers. I did not expect to see such a rich variety of fields: knowledge representation, spatial reasoning, logic-based reasoning, data management, learning-based approaches, event-driven processing, process mining, database theory, stream mining and more. In my view, the composite event recognition models that are best at recognising complex events include the data flow model, the ontology-based and reasoning model, the symbolic reasoning model and the automata-based model. We also identified common challenges across these models and communities. The three priority topic areas we identified are expressivity (composability and hierarchies); evaluation strategy, parallelization and distribution; and uncertainty management.

Favourite Talk

Kurt Rothermel from TU Stuttgart – Time-sensitive Complex Event Processing

My first

read more »
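The Dagstuhl summary above lists the automata-based model among the main approaches to complex event recognition, and the favourite talk was on time-sensitive CEP. As a rough illustration only, here is a minimal sketch of that idea, assuming a single hypothetical pattern SEQ(A, B) (an A followed by a B within a time window), in-order event arrival and skip-till-any-match semantics. The `Event` type and `match_seq` function are made up for this example and do not come from any system presented at the seminar.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str          # hypothetical event type, e.g. "A" or "B"
    timestamp: float   # event time in seconds


def match_seq(events: Iterable[Event], first: str, second: str,
              window: float) -> List[Tuple[Event, Event]]:
    """Detect SEQ(first, second) where `second` occurs within `window` seconds.

    Every `first` event opens a run of a two-state automaton
    (state 1: "first seen, waiting for second"). A `second` event moves each
    live run to the accepting state and emits a match; runs stay open, so one
    `first` can match several `second` events (skip-till-any-match).
    Events are assumed to arrive in timestamp order.
    """
    open_runs: List[Event] = []
    matches: List[Tuple[Event, Event]] = []
    for ev in events:
        # Time-sensitive pruning: drop runs whose window has expired.
        open_runs = [r for r in open_runs if ev.timestamp - r.timestamp <= window]
        if ev.kind == second:
            matches.extend((r, ev) for r in open_runs)
        if ev.kind == first:
            open_runs.append(ev)
    return matches


stream = [Event("A", 0.0), Event("C", 1.0), Event("B", 2.0),
          Event("A", 5.0), Event("B", 20.0)]
for a, b in match_seq(stream, "A", "B", window=10.0):
    print(a.timestamp, "->", b.timestamp)  # prints: 0.0 -> 2.0
```

The window check is where time sensitivity enters: the second A at t=5 is discarded before the B at t=20 arrives. A production CER engine would also handle out-of-order events, richer operators (negation, iteration) and other selection policies.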

Throwback To 2019

At EURA NOVA, we believe technology is a catalyst for change. To embrace it, we strive to stay at the edge of knowledge. Investing in research allows us to continuously become more proficient, to keep our know-how at the cutting edge of IT, to share its benefits with our customers, and to incubate the products of tomorrow. As we look back on 2019, we are both proud of and happy with the work achieved!

Published papers

We are happy to say that our R&D department published five peer-reviewed scientific papers last year.

LEAD: A Formal Specification For Event Processing

In June, our R&D engineer Anas presented his work on complex event processing at the 13th ACM International Conference on Distributed and Event-Based Systems, held in Germany.
Anas Al Bassit, Sabri Skhiri, LEAD: A Formal Specification For Event Processing, in 13th ACM International Conference on Distributed and Event-Based Systems (DEBS 2019).

Coherence Regularization for Neural Topic Models

In July, our R&D engineer Kate presented her paper on neural topic models at the 16th International Symposium on Neural Networks, held in Moscow.
Katsiaryna Krasnashchok, Aymen Cherif, Coherence Regularization for Neural Topic Models, in 16th International Symposium on Neural Networks 2019 (ISNN 2019).

STRASS: A Light and Effective Method for Extractive Summarization

In August, our PhD student Léo was in Italy to present his paper at the 2019 ACL Student Research Workshop.
Léo Bouscarrat, Antoine Bonnefoy, Thomas Peel, Cécile Pereira, STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings, in 2019 ACL Student Research Workshop, Florence, Italy.

GraphOpt: Framework for Automatic Parameters Tuning of Graph Processing Frameworks

In December, the paper written by our former intern and now full-time colleague Muaz was presented in Los

read more »

Fourth Workshop on Real-Time and Stream Analytics in Big Data: key takeaways

Last December, EURA NOVA’s research centre held the fourth workshop on real-time and stream analytics in big data at the 2019 IEEE Conference on Big Data in Los Angeles. The workshop brought together leading players including Confluent, Apache Pulsar, the University of Virginia and Télécom ParisTech, as well as 8 renowned speakers from 6 different countries. We received more than 30 applications and are proud to have hosted such interesting paper presentations in stream mining, IoT and Industry 4.0. The workshop was a real success, with many interesting questions and comments. If you could not attend, our R&D engineer Syrine Ferjaoui brought back the key points from the presentations for you.

First keynote speaker

The workshop started with the keynote of Matteo Merli, PMC member at Apache Pulsar. His talk “Messaging and Streaming” explained how Pulsar can be a unified infrastructure that supports both messaging and streaming. Matteo described messaging as dealing with events as they are created, and streaming as analysing events that just happened. These are two different processing concepts, but they call for a single infrastructure. He then presented Pulsar’s architecture, which separates the serving layer (the brokers) from the storage layer (the bookies, i.e. the BookKeeper instances that handle persistent storage of messages). This means that brokers and bookies can be added independently, traffic can be shifted very quickly across brokers, and new bookies quickly start taking on traffic. This layered design makes Pulsar’s architecture more flexible and dynamic. Pulsar has other interesting features such as durability, low latency, high throughput, high availability, a unified messaging model, high scalability and native computing. The roadmap includes work on a Pulsar storage API to allow direct access to data stored in Pulsar and to retrieve and process data more efficiently. They are also working on higher-level messaging

read more »
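To make the broker/bookie separation from Matteo Merli’s keynote more concrete, here is a toy in-memory model. It is not Pulsar’s client API or implementation: the class name, the segment size, the replica count and the least-loaded placement policy are all assumptions made for illustration. It shows the two properties mentioned in the talk: moving a topic to another broker only changes ownership metadata, and a freshly added bookie is used as soon as a new storage segment is opened.

```python
from typing import Dict, List, Tuple


class ToyCluster:
    """Toy two-layer messaging cluster (NOT Pulsar's API or implementation).

    Brokers only own topics (serving layer); data lives in segments stored on
    bookies (storage layer), so serving and storage scale independently.
    """

    def __init__(self, brokers: List[str], bookies: List[str],
                 segment_size: int = 3, replicas: int = 2) -> None:
        self.brokers = list(brokers)
        self.bookies = list(bookies)
        self.segment_size = segment_size
        self.replicas = replicas
        self.owner: Dict[str, str] = {}  # topic -> serving broker
        # topic -> list of (bookies holding the segment, entries in the segment)
        self.segments: Dict[str, List[Tuple[List[str], List[bytes]]]] = {}
        self.load: Dict[str, int] = {b: 0 for b in self.bookies}  # replicas stored per bookie

    def _pick_bookies(self) -> List[str]:
        # Hypothetical placement policy: least-loaded bookies first.
        return sorted(self.bookies, key=lambda b: self.load[b])[: self.replicas]

    def publish(self, topic: str, payload: bytes) -> None:
        self.owner.setdefault(topic, self.brokers[0])
        segs = self.segments.setdefault(topic, [])
        if not segs or len(segs[-1][1]) >= self.segment_size:
            segs.append((self._pick_bookies(), []))  # roll over to a new segment
        bookies, entries = segs[-1]
        entries.append(payload)  # stored once here; `load` counts every replica
        for b in bookies:
            self.load[b] += 1

    def read(self, topic: str) -> List[bytes]:
        # Whichever broker owns the topic serves it from the bookies' segments.
        return [p for _, entries in self.segments.get(topic, []) for p in entries]

    def add_broker(self, name: str) -> None:
        self.brokers.append(name)  # no data is moved

    def add_bookie(self, name: str) -> None:
        self.bookies.append(name)
        self.load[name] = 0  # empty, so it is preferred for the next segments

    def reassign(self, topic: str, broker: str) -> None:
        self.owner[topic] = broker  # shifting traffic is a metadata update


cluster = ToyCluster(["broker-1"], ["bookie-1", "bookie-2", "bookie-3"])
for i in range(3):
    cluster.publish("orders", f"m{i}".encode())
cluster.add_broker("broker-2")
cluster.reassign("orders", "broker-2")   # no message is copied
cluster.add_bookie("bookie-4")
cluster.publish("orders", b"m3")          # first entry of a new segment
print(cluster.segments["orders"][-1][0])  # prints: ['bookie-3', 'bookie-4']
```

In the toy, the new segment lands on the two emptiest bookies, including the one just added, while the existing data stays where it was.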

A Performance Prediction Model for Spark Applications

Apache Spark is a popular open-source distributed processing framework that enables efficient processing of massive amounts of data. It has a large number of parameters that need to be tuned to get the best performance. However, tuning these parameters manually is a complex and time-consuming task. Therefore, a robust performance model that predicts applications’ execution time could greatly help accelerate the deployment and optimization of big data applications relying on Spark. In this paper, we ran extensive experiments on a selected set of Spark applications covering the most common workloads to generate a representative dataset of execution times. In addition, we extracted application and data features to build a machine learning-based performance model that predicts the execution time of Spark applications. The experiments show that boosting algorithms achieved better results than the other algorithms evaluated. Florian Demesmaeker, Amine Ghrab, Usama Javaid, Ahmed Amir Kanoun, A Performance Prediction Model for Spark Applications, in the proceedings of Big Data Congress 2020. Click here to access the paper in its preprint form.
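The paper’s features, dataset and model configuration are not given in this abstract. As an illustration of the general technique only, here is a minimal pure-Python sketch of gradient boosting with decision stumps (squared loss) used to regress execution time from workload features. The two features (input size in GB, number of executors) and the timing formula are hypothetical toy data, not results from the paper; a real study would use a library implementation and many more features.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)
Row = Sequence[float]


def fit_stump(X: Sequence[Row], r: Sequence[float]) -> Stump:
    """Least-squares decision stump fitted to the residuals r."""
    best: Stump = (0, X[0][0], 0.0, 0.0)
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv = sum(left) / len(left)  # never empty: t is a value of feature j
            rv = sum(right) / len(right) if right else lv
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def stump_value(s: Stump, row: Row) -> float:
    j, t, lv, rv = s
    return lv if row[j] <= t else rv


def fit_boosting(X, y, n_rounds=300, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [pi + lr * stump_value(s, row) for pi, row in zip(pred, X)]
    return base, lr, stumps


def predict(model, row: Row) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_value(s, row) for s in stumps)


# Hypothetical toy data: [input size in GB, number of executors] -> seconds.
X = [[gb, ex] for gb in (1, 2, 4, 8, 16) for ex in (2, 4, 8)]
y = [20 + 3 * gb + 40 / ex for gb, ex in X]
model = fit_boosting(X, y)
print(round(predict(model, [4, 4]), 1))  # a training point; its toy target is 42
```

Each round adds a small correction fitted to what the current ensemble still gets wrong, which is the core idea behind the boosting models the abstract reports as performing best.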

read more »

IEEE Big Data 2019 – A Summary

At the beginning of the month, our R&D director Sabri Skhiri and our R&D engineer Syrine Ferjaoui travelled to Los Angeles to attend the IEEE Big Data Conference. It is one of the most influential academic gatherings in distributed machine learning. This year, it featured 879 authors, shortlisted from 2,009 applicants. They came from 28 countries and presented 210 papers. Back in Belgium, Sabri and Syrine give you their opinion on the event itself and the important elements from the keynotes, tutorials, workshops and interesting papers.

The Big Trends

Sabri says: “The main trends were deep learning, NLP, privacy-preserving approaches, GANs, graph mining and stream mining. In my view, the level of the papers was quite good. Authors are becoming ever more skilled in data science, maths and algorithms. This goes to show that to be a good data scientist, you need an extensive set of advanced skills. Interestingly, there was almost nothing about distributed computing! This is a big change compared to previous editions. The only presentations that had something to do with distributed systems were about optimisation strategies, an area similar to what our ECCO team researches. The Big Data Conference focuses on data science; it does not really look into its scalability. Distributed computing topics tend to be dealt with at conferences like DEBS, VLDB, USENIX, SIGMOD, etc. As a result, this conference is an amazing place to see hundreds of data science use cases, most of them with an interesting contribution.”

The Keynotes

The keynotes were focused on data science as well. We even heard the term “Big Data Science”.

Keynote 1: Responsible Data Science by Lise Getoor – Professor at UC Santa Cruz

Syrine says: “The first keynote was my favourite. Lise started by comparing machine learning to

read more »

GraphOpt: Framework for Automatic Parameters Tuning of Graph Processing Frameworks

Finding the optimal configuration of a black-box system is a difficult problem that requires a lot of time and human labor. Big data processing frameworks are among the increasingly popular systems whose tuning is complex and time-consuming. The challenge of automatically finding the optimal parameters of big data frameworks has attracted a lot of research in recent years. Some studies focused on optimizing specific systems, such as distributed stream processing frameworks, or on finding the best cloud configurations, while others proposed general services for optimizing any black-box system. In this paper, we introduce a new use case in the domain of automatic parameter tuning: optimizing the parameters of distributed graph processing frameworks. This task is notably difficult given the particular challenges of distributed graph processing, which include graph partitioning and the iterative nature of graph algorithms. To address this challenge, we designed and implemented GraphOpt: an efficient and scalable black-box optimization framework that automatically tunes distributed graph processing frameworks. GraphOpt implements state-of-the-art optimization algorithms and introduces a new hill-climbing-based search algorithm. These algorithms are used to optimize the performance of two major graph processing frameworks: Giraph and GraphX. Extensive experiments were run on GraphOpt using multiple graph benchmarks to evaluate its performance; they show that it provides up to 47.8% improvement compared to random search and an average improvement of up to 5.7%. Muaz Twaty, Amine Ghrab, Sabri Skhiri: GraphOpt: a Framework for Automatic Parameters Tuning of Graph Processing Frameworks. 2019 IEEE International Conference on Big Data (Big Data) Workshops, Los Angeles, CA, USA. The paper was published at the third IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2019). You can access it here in its preprint version.
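The abstract does not describe GraphOpt’s new hill-climbing-based algorithm, so the following is only a generic sketch of hill climbing with random restarts over a discrete configuration space, not GraphOpt’s algorithm. The parameter names (`workers`, `partitions`) and the `toy_runtime` cost are hypothetical stand-ins for actually running a graph benchmark on Giraph or GraphX.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]
Space = Dict[str, List[int]]  # parameter name -> ordered candidate values


def neighbours(cfg: Config, space: Space) -> List[Config]:
    """Configurations that differ from cfg by one step in a single parameter."""
    out: List[Config] = []
    for name, values in space.items():
        i = values.index(cfg[name])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                out.append({**cfg, name: values[j]})
    return out


def hill_climb(cost: Callable[[Config], float], space: Space,
               restarts: int = 5, seed: int = 0) -> Tuple[Optional[Config], float]:
    """Steepest-descent hill climbing with random restarts; minimises `cost`."""
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for _ in range(restarts):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        current = cost(cfg)  # one "benchmark run" of the black box
        while True:
            scored = [(cost(n), n) for n in neighbours(cfg, space)]
            if not scored:
                break
            c, n = min(scored, key=lambda pair: pair[0])
            if c >= current:
                break  # local optimum: no neighbour improves
            cfg, current = n, c
        if current < best_cost:
            best_cfg, best_cost = cfg, current
    return best_cfg, best_cost


# Hypothetical search space and black-box cost standing in for a real
# Giraph/GraphX benchmark run (runtime in arbitrary units).
space: Space = {"workers": [2, 4, 8, 16, 32],
                "partitions": [8, 16, 32, 64, 128, 256]}


def toy_runtime(cfg: Config) -> float:
    w, p = cfg["workers"], cfg["partitions"]
    return 1000 / w + 2 * w + 0.05 * (p - 64) ** 2


best, runtime = hill_climb(toy_runtime, space)
print(best, runtime)  # prints: {'workers': 16, 'partitions': 64} 94.5
```

In practice each call to `cost` would be an expensive benchmark run, which is why tuners aim to reach good configurations in few evaluations. Random search, the baseline mentioned in the abstract, would instead sample configurations independently of previous results.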
Do not hesitate to contact our R&D department at research@euranova.eu to discuss how

read more »

Master Thesis 2020

This document introduces the master thesis topics supervised by our research & development department. Each project offers you the chance to be actively involved in developing solutions that address tomorrow’s challenges in ICT, and to implement them today! If you are interested in one of our offers, please send your application to career@euranova.eu, including your CV and your motivation for your top three master thesis subjects (described in the document). If you are interested in working on a topic that is not in our range of offers, we would be delighted to hear your proposal, so do get in touch. Master thesis subjects and application guidelines are available here: Master Thesis Offers.

read more »

Flink Forward: The Key Takeaways

In early October 2019, six EURA NOVA engineers travelled to Berlin to attend the Flink Forward Conference, dedicated to Apache Flink users and the stream processing community. In this article, they give you their opinion about Ververica’s main announcement, the impact of Alibaba’s acquisition of Ververica, the big trends, and a selection of their favourite talks.

Alibaba!

This is the first Flink Forward conference since the acquisition of Ververica (formerly known as data Artisans) by Alibaba, which has been one of the largest users of Flink and its second-largest contributor for years. Our R&D director Sabri Skhiri says: “The only significant impact of this acquisition on the conference is that the venue is now the Berlin Business Center instead of the Kulturbrauerei. There, we could see that the Apache Flink user community has grown significantly, as have its commits to Flink. This edition was a bit more business- and enterprise-oriented than previous ones, although it kept its technical DNA and a lot of technical talks. All in all, this was a very good mix. Alibaba folks are deeply committed to open source and to creating technology impact. We saw a lot of activity from them, such as the integration of the Blink SQL runner, the Hive integration and the new scheduling model. In summary, a great event.”

First Keynote Announcement

Keynote: Stream Processing and Applications in the Modern Age (Stephan Ewen)

During the first keynote, Ververica took the opportunity to announce the launch of Stateful Functions (statefun.io), an open-source framework built on top of Flink to run stateful serverless functions. It bridges the gap between Function as a Service and stream processing. Sabri says: “Last year, they announced their streaming ledger that brings ACID transactions between states to stream processing applications. This year, they announced the launch of

read more »
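To make the idea of stateful serverless functions from the Flink Forward keynote more concrete, here is a toy, single-process sketch of the programming model: functions are addressed by a type and an id, each address has its own state kept by the runtime, and functions interact only by sending messages. This is not the Stateful Functions API; `ToyRuntime`, `user_fn` and `stats_fn` are hypothetical, and the real framework relies on Flink for fault-tolerant, consistent state, which plain Python dicts do not provide.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Address = Tuple[str, str]  # (function type, instance id), e.g. ("user", "alice")
StateFn = Callable[["ToyRuntime", Address, Dict[str, Any], Any], None]


class ToyRuntime:
    """Toy runtime for stateful functions (NOT the Stateful Functions API).

    The runtime, not the function, owns the state: each address gets its own
    state dict, handed to the function on every message, much like keyed
    state in a stream processor. Functions communicate only via messages.
    """

    def __init__(self) -> None:
        self.functions: Dict[str, StateFn] = {}
        self.state: Dict[Address, Dict[str, Any]] = {}
        self.queue: Deque[Tuple[Address, Any]] = deque()
        self.egress: List[Any] = []  # messages leaving the application

    def register(self, ftype: str, fn: StateFn) -> None:
        self.functions[ftype] = fn

    def send(self, addr: Address, msg: Any) -> None:
        self.queue.append((addr, msg))

    def run(self) -> None:
        while self.queue:
            addr, msg = self.queue.popleft()
            st = self.state.setdefault(addr, {})
            self.functions[addr[0]](self, addr, st, msg)


def user_fn(rt: ToyRuntime, addr: Address, state: Dict[str, Any], msg: Any) -> None:
    state["visits"] = state.get("visits", 0) + 1
    rt.send(("stats", "global"), addr[1])  # message another stateful function
    rt.egress.append(f"{addr[1]}: visit {state['visits']}")


def stats_fn(rt: ToyRuntime, addr: Address, state: Dict[str, Any], msg: Any) -> None:
    state["total"] = state.get("total", 0) + 1


rt = ToyRuntime()
rt.register("user", user_fn)
rt.register("stats", stats_fn)
for user in ("alice", "alice", "bob"):
    rt.send(("user", user), "login")
rt.run()
print(rt.egress)  # prints: ['alice: visit 1', 'alice: visit 2', 'bob: visit 1']
```

The functions themselves stay stateless code, as in Function as a Service, while the per-address state lives in the runtime, as in a stream processor.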

Kafka Summit: The Key Takeaways

At the beginning of the month, our software engineer Christophe Philemotte was in San Francisco to give a presentation at the Kafka Summit organised by Confluent. The Kafka Summit is one of the main events for data architects, engineers, DevOps and developers who want to learn about streaming data. In this article, Christophe shares with you the latest trends from the conference.

Main observations

This year, one of the most important takeaways from the conference was that Confluent is working towards building an active database with KSQL. Christophe details: “KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka. With KSQL, Confluent is embracing streaming SQL and integrating its stack into it. They also aim to offer the interactivity we already have with a classic database. In short, they are moving towards a new paradigm of active data and passive queries, where KSQL would make it easy to read, write and process streaming data in real time, at scale, using SQL-like semantics. Still, KSQL shouldn’t be chosen over Flink, for instance, without proper consideration of its limitations. For example, real checkpointing and savepoints are missing, as well as global shuffling. There are still constraints on partitioning in some operators, and there is no global windowing.” While talking about streaming SQL, they also mentioned user-defined functions and machine learning integration. Find more information on the summit website.

Another interesting point was the approaches and themes shared by different companies. For example, 30% of the talks were about operations. About five talks were dedicated to deploying on Kubernetes, and several other speakers mentioned that deploying on Kubernetes was their target. Real-time analytics, integrations/ETL/DataOps and, of course, data pipelines were also often mentioned.

Keynote talks

During the first

read more »
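The “active data, passive query” idea from the Kafka Summit report can be pictured as a continuous query whose result table keeps updating as events arrive. The minimal Python sketch below shows the semantics of a per-key tumbling-window count, roughly what a windowed GROUP BY in a streaming SQL engine such as KSQL computes. It is an illustration under simplifying assumptions (in-order events, no late data, no window retention), not how KSQL is implemented, and the function name and record format are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, Tuple

Record = Tuple[str, int]  # (key, event time in milliseconds)


def tumbling_count_changelog(records: Iterable[Record],
                             size_ms: int) -> Iterator[Tuple[str, int, int]]:
    """Continuously count records per key in fixed, non-overlapping windows.

    After each input record, emit (key, window_start, updated_count), i.e. an
    update to a result table that keeps changing as data arrives.
    Late/out-of-order data and window retention are ignored here.
    """
    counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)
    for key, ts in records:
        start = ts - ts % size_ms  # align the event to its window boundary
        counts[(key, start)] += 1
        yield key, start, counts[(key, start)]


events = [("eu", 1_000), ("us", 5_000), ("eu", 29_000), ("eu", 31_000)]
for update in tumbling_count_changelog(events, size_ms=30_000):
    print(update)
# prints, one per line: ('eu', 0, 1) ('us', 0, 1) ('eu', 0, 2) ('eu', 30000, 1)
```

Each key is windowed independently here, which mirrors a keyed GROUP BY; a single window spanning all keys would require bringing every record to one place, one reason global windowing is harder to offer in a partitioned system.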

4th Workshop on Real-time & Stream Analytics in Big Data

EURA NOVA’s research centre is proud and excited to organize the fourth workshop on Real-time and Stream Analytics in Big Data, co-located with the 2019 IEEE Conference on Big Data. The workshop will take place in December in Los Angeles, USA. Stream processing and real-time analytics in data science have become some of the most important topics in Big Data. To identify the new opportunities and use cases required by the industry, we are bringing together experts passionate about the subject.

This year, we are excited to have two amazing keynotes, from Confluent (Kafka Streams) and Apache Pulsar:

Matteo Merli is one of the co-founders of Streamlio; he serves as the PMC chair for Apache Pulsar and is a member of the Apache BookKeeper PMC. Previously, he spent several years at Yahoo building database replication systems and multi-tenant messaging platforms. Matteo was the co-creator and lead developer of the Pulsar project within Yahoo.

John Roesler is a software engineer at Confluent and a contributor to Apache Kafka, primarily to Kafka Streams. Before that, he spent eight years at Bazaarvoice, on a team designing and building a large-scale streaming database and a high-throughput declarative stream processing engine.

If you want to join us, authors from industry and academia are invited to contribute to the workshop by submitting papers. Check out the workshop website to find all the information you will need. Your paper will be reviewed by a prestigious panel of international experts from both the academic and the industrial worlds.

read more »

ACL 2019: Takeaways from the conference

Last month, our R&D Project Director Cécile Pereira and our PhD student Léo Bouscarrat travelled to Florence to attend and present at ACL 2019. ACL is one of the biggest conferences in Natural Language Processing. This year, all records were broken, with more than 3,500 attendees, 660 papers accepted to the main conference, 9 tutorials and more than 20 workshops. All the talks of the main conference were recorded and are accessible online. In this article, Cécile and Léo share with you the latest trends from the conference!

Big trends

A new paradigm in NLP?

This year, ACL’s selection of topics showed the growing importance of self-supervised pre-training methods such as BERT (Devlin et al., 2019) and XLNet (Yang et al., 2019). These methods consist of training huge models on vast amounts of data using simple tasks (for example, predicting masked words in the original sentence or predicting whether two sentences follow each other). The resulting models should then be able to learn a more specific and complex task faster and with less data. With this method, the way to train a model to solve an NLP task has changed. Here is the new paradigm: (1) select a pre-trained model (trained with self-supervision); (2) add a layer on the output of this model (it will depend on your task) and fine-tune the model on the inputs and outputs of your task; (3) evaluate your model. Many papers used this paradigm to achieve state-of-the-art results on several tasks (out of the 660 papers of the main conference, 47 have the word BERT in their abstract). Contextual embeddings, like BERT’s, incorporate the context of the sentence into the embeddings of the words. BERT can be used for a large

read more »
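The three-step paradigm described in the ACL takeaways can be sketched in plain Python. Everything here is a hypothetical toy: `PRETRAINED` is a tiny hand-made embedding table standing in for a real pre-trained model such as BERT, the encoder is just the mean of word vectors, and the training data are four made-up sentences. The point is only the workflow: start from pre-trained weights, add a task-specific output layer, update both on the task data (here with a smaller learning rate for the pre-trained part, a common but optional choice), then evaluate on held-out examples.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a pre-trained encoder: a tiny embedding table.
# In the real paradigm these weights would come from a model such as BERT.
PRETRAINED: Dict[str, List[float]] = {
    "great": [0.9, 0.1], "good": [0.7, 0.2], "fun": [0.6, 0.3],
    "bad": [0.1, 0.9], "boring": [0.2, 0.7], "awful": [0.05, 0.95],
    "movie": [0.5, 0.5], "plot": [0.4, 0.5],
}
Model = Tuple[Dict[str, List[float]], List[float], float]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(emb: Dict[str, List[float]], tokens: Sequence[str]) -> List[float]:
    """Mean of the token vectors; unknown tokens are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    dim = len(next(iter(emb.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def fine_tune(data, epochs: int = 300, head_lr: float = 0.5,
              encoder_lr: float = 0.1) -> Model:
    emb = {w: list(v) for w, v in PRETRAINED.items()}  # step 1: copy pre-trained weights
    w = [0.0] * len(next(iter(emb.values())))           # step 2: new output layer
    b = 0.0
    for _ in range(epochs):
        for tokens, label in data:
            h = encode(emb, tokens)
            g = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) - label  # dLoss/dlogit
            known = [t for t in tokens if t in emb]
            # Backpropagate into the encoder too: this is what makes it fine-tuning.
            for t in known:
                for i in range(len(w)):
                    emb[t][i] -= encoder_lr * g * w[i] / len(known)
            w = [wi - head_lr * g * hi for wi, hi in zip(w, h)]
            b -= head_lr * g
    return emb, w, b


def predict(model: Model, tokens: Sequence[str]) -> int:
    emb, w, b = model
    h = encode(emb, tokens)
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) + b > 0 else 0


train = [(["great", "movie"], 1), (["good", "fun", "plot"], 1),
         (["bad", "movie"], 0), (["boring", "plot"], 0)]
held_out = [(["awful", "movie"], 0), (["great", "fun"], 1)]
model = fine_tune(train)
accuracy = sum(predict(model, x) == y for x, y in held_out) / len(held_out)  # step 3: evaluate
print(accuracy)
```

The word “awful” never appears in the training data, yet the held-out sentence can still be classified thanks to its position in the pre-trained space, which is the intuition behind learning “faster and with less data”.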