Flink Forward 2018: What You Want to Know and What You (Will) Need to Know.

In early September 2018, eight EURA NOVA engineers travelled to Berlin to attend the Flink Forward conference, dedicated to Apache Flink users and the stream processing community.

They came back with a lot to say about the hot topics in stream processing and the use cases presented! In this article, they share their opinion on data Artisans’ main announcement, the takeaways from their favourite talks, and what they think makes Flink Forward different from other conferences.

 

First keynote announcement:

During the keynote speech, data Artisans announced that they are bringing ACID transactions directly to streaming data with data Artisans Streaming Ledger.

Charles Bonneau, our software architect, says: “This feature allows ACID transactions between multiple operators’ event-processing operations and internal states. This means that streaming applications can now update multiple states in one transaction. For example, an application that transfers money from one bank account to another can finally be implemented using Flink with strong consistency guarantees. Both bank accounts will have their balance updated at the same time as if there was a master data-management state”.
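
To make the bank-transfer example concrete, here is a minimal Python sketch of an all-or-nothing update across two pieces of keyed state. It does not use the data Artisans Streaming Ledger API: the `Ledger` class and its `transfer` method are hypothetical names, and a single in-process lock with a validate-then-apply step stands in for the distributed transaction machinery a real system would need.

```python
import threading


class Ledger:
    """Hypothetical in-memory stand-in for transactional keyed state."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()  # serialises transactions (isolation)

    def transfer(self, source, target, amount):
        """Move `amount` from `source` to `target` atomically.

        Either both balances change or neither does.
        Returns True if the transfer was applied, False if it was rejected.
        """
        with self._lock:
            # Validate first, so a rejected transfer leaves no partial update.
            if source not in self._balances or target not in self._balances:
                return False
            if amount <= 0 or self._balances[source] < amount:
                return False
            self._balances[source] -= amount
            self._balances[target] += amount
            return True

    def balance(self, account):
        with self._lock:
            return self._balances[account]


ledger = Ledger({"alice": 100, "bob": 20})
# A stream of transfer events: (source, target, amount).
events = [("alice", "bob", 30), ("bob", "alice", 500), ("alice", "bob", 70)]
results = [ledger.transfer(*event) for event in events]
print(results, ledger.balance("alice"), ledger.balance("bob"))
```

The second event is rejected because the source account cannot cover it, and neither balance moves; the total amount of money across both accounts is the same before and after the stream is processed.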

For Sabri Skhiri, our R&D director, this opens the doors to a brand new range of applications, especially in data-driven real-time services but also in streaming data management. He explains: “They are pushing forward the concept of streaming. Now, you could imagine a master data-management state that can be updated by operational streaming applications in real time. This will allow even more complex and advanced use cases of stream processing!”.

 

Favourite talks:

In 2 days, each Euranovian attended about 18 talks and use case presentations, with speakers from tech giants such as IBM, Netflix, Alibaba, and Uber as well as speakers from smaller companies.

Charles explains: “The conclusions are reassuring: most of them face the same issues that we see at our clients’ and our solutions are all valuable. They include a stream-first data architecture, a stream-first data pipeline product, and Flink developer skills. Even though a number of companies are at the very edge of the technology and their issues do not yet require continuous flows of a considerable amount of events, we are ready”.

For our R&D Director Sabri Skhiri, the keynote speech from Lightbend was one of the most interesting ones. He explains: “Viktor Klang, Lightbend deputy CTO, talked about the convergence between microservices and stream processing. At EURA NOVA, we have been advocating for this convergence for more than a year in our architecture practice. The idea is simple: asynchronous microservices can be designed as stream processing stages. This is fantastic because it makes modern stateful stream processing frameworks the perfect target for implementing reactive microservices. With stateful deployment, exactly-once semantics, high availability and ACID access to states, microservices can become stateful streaming apps.”
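
The idea of a microservice as a stream processing stage can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the stage names `count_orders_stage` and `loyalty_stage`, the event shape, and the use of plain generators in place of a real streaming framework. It shows only the shape of the design: each stage consumes events, keeps its own state, and emits events for the next stage.

```python
from collections import defaultdict


def count_orders_stage(events):
    """Stateful stage: emits the running order count per customer."""
    counts = defaultdict(int)  # keyed state owned by this stage
    for event in events:
        counts[event["customer"]] += 1
        yield {"customer": event["customer"], "orders": counts[event["customer"]]}


def loyalty_stage(events, threshold=2):
    """Downstream stage: emits a notification when a customer hits the threshold."""
    for event in events:
        if event["orders"] == threshold:
            yield f"{event['customer']} reached {threshold} orders"


incoming = [{"customer": "ann"}, {"customer": "bo"}, {"customer": "ann"}]
# Stages are chained like operators in a streaming job.
notifications = list(loyalty_stage(count_orders_stage(incoming)))
print(notifications)
```

What the sketch leaves out is exactly what the quote credits to the framework: checkpointed state, exactly-once semantics and high availability.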

 

Vision-oriented Flink Conference:

Our colleagues came back with sparkles in their eyes. When we asked them how they felt about the event, Sabri Skhiri explained:

“Very often, this type of conference tends to be business-oriented, focused on how to make the framework easy to use and available to as many people as possible. By contrast, this year’s Flink Forward conference was all about innovation and vision. data Artisans shared their vision of what the Flink framework will be within 3 to 5 years and talked about what role stream processing and big data have within this vision. In fact, almost all the talks were very technical. They were testimonies of big names in the industry, such as Alibaba, Netflix, and ING, about problems encountered in the field and how they have been solved, which is often out of the box. The Flink-Alibaba partnership is a sharing one. Alibaba are way ahead with their technology. They keep their lead for one year and then they share their work and make their code open source. data Artisans have a great long-term vision of stream processing. I can see a lot of very interesting architecture discussions in the coming months!”

 

Stream Processing Technology:

Whereas most frameworks cannot process considerable streams of live data and provide results in real time, Flink provides a single, highly scalable runtime for both streaming and batch processing.

Cyrille Duverne, our Lead Data Architect, confirms: “Flink is definitely a real-time processor! We’re talking about true real time, not just mini-batches. Plus, the introduction of ACID transaction management in the new version of data Artisans’ Flink distribution creates a good marketing edge”.

Sabri Skhiri and our R&D engineer Florian Demesmaeker were at the Spark Summit this week. Stay tuned for part 2 with their feedback!

Data Mining and ML Techniques Supporting TBS Concept Deployment

Our paper “Data Mining and Machine Learning Techniques supporting Time-based Separation Concept Deployment”, co-written with Eurocontrol and WaPT, has been accepted by the 37th Digital Avionics Systems Conference (DASC) in London, U.K.

The paper presents two methods to allow air traffic controllers to deliver separation minima accurately and safely, on the basis of time intervals instead of distances.

Importantly, in strong headwind conditions, the aircraft’s groundspeed during approach decreases, meaning that keeping the distance-based separation method results in lower landing rates. At a time of intensified air traffic, this situation leads to considerable delays at airports, with significant costs to operators and travellers.
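
The effect of headwind on a fixed distance separation is simple arithmetic, sketched below in Python. The figures are hypothetical and chosen only for readability (a 4 NM separation, a 140 kt approach airspeed, and a 40 kt headwind); they are not taken from the paper.

```python
def time_separation_s(distance_nm, airspeed_kt, headwind_kt):
    """Time gap between two aircraft kept `distance_nm` apart on final approach."""
    groundspeed_kt = airspeed_kt - headwind_kt
    return distance_nm / groundspeed_kt * 3600


def landings_per_hour(separation_s):
    """Runway throughput if every pair is spaced by `separation_s` seconds."""
    return 3600 / separation_s


calm = time_separation_s(4.0, 140.0, 0.0)
windy = time_separation_s(4.0, 140.0, 40.0)
print(round(calm, 1), round(landings_per_hour(calm), 1))
print(round(windy, 1), round(landings_per_hour(windy), 1))
```

With these assumed numbers, the same 4 NM gap takes about 103 seconds to fly in calm air and 144 seconds into the headwind, so the landing rate drops from about 35 to 25 aircraft per hour. Keeping the time gap constant instead of the distance is what avoids that loss.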

With the new methods presented in the paper, capacity can increase by up to 14% in strong wind conditions, and by up to 8% in moderate wind conditions.

The paper was presented in September at DASC 2018. If you wish to go deeper into the subject, do not hesitate to contact our research department at [email protected].

The abstract

The Time-Based Separation (TBS) concept consists in the definition of separation minima for aircraft on the final approach to a runway based on time intervals instead of distances, as applied in Distance-Based Separation (DBS) operations.

TBS allows for dynamic distance separation reductions in strong headwind conditions so as to preserve time spacing across all wind conditions. However, TBS application entails the use of a support tool providing separation distance indicators that depend on the applicable time separation minimum and on the aircraft speed profile, which itself depends on the headwind conditions.
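
As a rough illustration of what such a distance indicator represents, the Python sketch below converts a time separation minimum into a distance by accumulating groundspeed over a speed profile. This is not one of the two methodologies of the paper: the function name `tbs_distance_nm`, the piecewise-constant airspeed profile, and all numeric values are assumptions made for the example.

```python
def tbs_distance_nm(time_minimum_s, speed_profile, headwind_kt):
    """Distance flown during `time_minimum_s` given an airspeed profile.

    `speed_profile` is a non-empty list of (duration_s, airspeed_kt)
    segments, ordered from the start of the interval. The last segment's
    airspeed is held if the profile is shorter than the time minimum.
    """
    if not speed_profile:
        raise ValueError("speed_profile must contain at least one segment")
    remaining_s = time_minimum_s
    distance_nm = 0.0
    for duration_s, airspeed_kt in speed_profile:
        flown_s = min(duration_s, remaining_s)
        distance_nm += (airspeed_kt - headwind_kt) * flown_s / 3600
        remaining_s -= flown_s
        if remaining_s <= 0:
            return distance_nm
    # Profile exhausted: hold the final airspeed for the remaining time.
    distance_nm += (airspeed_kt - headwind_kt) * remaining_s / 3600
    return distance_nm


profile = [(60, 160.0), (60, 140.0)]  # hypothetical decelerating approach
light = tbs_distance_nm(90, profile, headwind_kt=10.0)
strong = tbs_distance_nm(90, profile, headwind_kt=40.0)
print(round(light, 2), round(strong, 2))
```

For the same 90-second minimum, the computed distance shrinks as the headwind grows, which is the dynamic distance reduction described above. The real difficulty, and the subject of the paper, is predicting the speed profile and its uncertainty.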

This paper details two methodologies allowing a system to compute those TBS indicators so as to allow Air Traffic Controllers to accurately and safely deliver the TBS minima using a separation delivery support tool. The first approach is based on “analytical” data mining and modelling whereas the second one is based on a Machine Learning (M/L) procedure.

In the framework of the deployment of the TBS concept in Vienna airport (LOWW), those approaches are developed and tested using a database covering one year of traffic and corresponding local meteorological data.

The operation of TBS with indicators computed using either approach leads to a substantial reduction of time separations compared to a DBS strategy. However, given the large uncertainties related both to leader and follower aircraft speed profiles, the buffers could be designed only for the most frequent pairs. With the M/L approach (resp. the “analytical” approach), the capacity benefits related to the application of TBS with a separation support tool are of the order of 8% (resp. 2%) in moderate wind conditions, and up to 14% (resp. 10%) in strong wind conditions.

De Visscher, I.; Stempfel, G.; Rooseleer, F. & Treve, V.; Data mining and Machine Learning techniques supporting Time-Based Separation concept deployment, in 37th Digital Avionics Systems Conference (DASC), pp 594-603, London, UK, September 23-27, 2018

Third Workshop on Real-time & Stream Analytics in Big Data

EURA NOVA Research center is proud and excited to organize the third workshop on Real-time and Stream analytics in Big Data, collocated with the 2018 IEEE conference on Big Data. The workshop will take place in December in Seattle, USA.

As the world becomes more connected, a flood of digital data is being generated at high volume and high velocity. For industries such as financial markets, telecommunications, Smart Cities, manufacturing, or healthcare, there is an increasing need to process and analyze these data streams in real time.

Over the past two years, we have seen another use of stream and complex event processing emerge: data management. New architecture patterns have been proposed to address data pipelines and data management within the enterprise.

After the success of the first two editions, this is an excellent opportunity to engage in discussions with experts and researchers, and to refine new opportunities and use cases required by the industry.

Authors are invited to contribute to the conference by submitting articles in the following areas, among others: Scalable real-time decision algorithms; IoT analytics & stream mining; Data pipelines & Data management with Streams; and Stream ETL and Real-Time Data Warehouse.

 

Want to submit a paper? Check out the workshop website to find all the information you will need. Your paper will be reviewed by a prestigious panel of international experts from both the academic and the industrial worlds.

Graph BI & Analytics: Current State and Future Challenges

Our paper “Graph BI & Analytics: Current State and Future Challenges” has been accepted for publication at the 20th International Conference on Big Data Analytics and Knowledge Discovery, taking place in Regensburg, Germany.

The paper presents the state of the art of graph BI & analytics, with a focus on graph warehousing. We survey the topics of graph modelling, management, querying, and processing in graph warehouses. Then we conclude by discussing future research directions for solving complex graph problems, building native graph components and intelligent techniques to assist end-users in building and analysing the graph.

More importantly, the paper calls for the development of intelligent, efficient and industry-grade graph data warehousing systems to support the structure-driven management and analytics of data. While adopting a template similar to that of traditional BI systems, the graph BI presented here extends current systems with graph analytics capabilities that deliver graph-derived insights.

The paper was presented in September at DaWaK 2018; you can now find the full version here. If you wish to go deeper into the subject, don’t hesitate to contact our research department at [email protected].

Abstract. In an increasingly competitive market, making well-informed decisions requires the analysis of a wide range of heterogeneous, large and complex data. This paper focuses on the emerging field of graph warehousing. Graphs are widespread structures that yield a great expressive power. They are used for modeling highly complex and interconnected domains, and efficiently solving emerging big data applications. This paper presents the current status and open challenges of graph BI and analytics, and motivates the need for new warehousing frameworks aware of the topological nature of graphs. We survey the topics of graph modeling, management, processing and analysis in graph warehouses. Then we conclude by discussing future research directions and positioning them within a unified architecture of a graph BI & analytics framework.

Amine Ghrab, Oscar Romero, Salim Jouili, Sabri Skhiri, Graph BI & Analytics: Current State and Future Challenges. DaWaK 2018, 3-18

Second Spring School Big Data Analytics

EURA NOVA Research Center is both proud and happy to lead the Second Spring School Big Data Analytics that will be held in Tunis, from the 20th to the 22nd of March 2018. Sabri Skhiri and Aymen Cherif will talk about their favorite topics:

  • Deep Learning
  • TensorFlow
  • CNN Architecture
  • Unsupervised Learning
  • Complex Event Processing
  • Stream processing & micro-services

 

Check out the complete agenda and register on the event website: https://sites.google.com/view/ssbda2018/welcome

The conference is organised by the Ecole Polytechnique de Tunisie.

The Next Activities of our R&D Centre in Marseille

The French branch of EURA NOVA will take part in two great tech events in the coming days and weeks.

 

On the 22nd of February, data scientist Thomas Peel will give a talk titled “Machine Learning à l’ère du RGPD” (Machine learning and the General Data Protection Regulation) on the opening day of the Colloquium intelligence artificielle, machine learning, data science to be held at the grand amphitheatre of the Saint-Charles campus in Marseille. Other great speakers from INRIA, Google, Provence Innovation, and Criteo will be featured. The event is free but registration is mandatory.

 

Practical information:

What? Colloquium intelligence artificielle, machine learning, data science

When? Thursday 22nd of February

Where? Grand amphithéâtre, campus Saint-Charles – 3, place Victor Hugo – case 39 – 13331 MARSEILLE Cedex 03

Registration: https://framaforms.org/conferences-ia-data-science-machine-learning-i2mlis-1518019875

 

On the 12th of March, the French branch of EURA NOVA is organising the Marseille Community Event, supported by the Neo4j GraphTour. Two speakers have already been announced: R&D project manager Cécile Péreaira will present a text-mining use case with Neo4j in biology, and data scientist Antoine Bonnefoy will sum up the Parisian Neo4j conference from both technology and business viewpoints. After the talks, all attendees will be offered a casual dinner to continue the discussion.

 

Practical information:

What? Marseille Community Event – Neo4j GraphTour

When? Monday the 12th of March, from 6:30 PM to 8:30 PM

Where? Le Wagon, 167 Rue Paradis, Marseille

Registration: https://www.eventbrite.fr/e/billets-neo4j-graphtour-marseille-community-event-42714338737?utm_campaign=new_event_email&utm_medium=email&utm_source=eb_email&utm_term=viewmyevent_button

Discovering Interesting Patterns in Large Graph Cubes

Due to the increasing importance and volume of highly interconnected data, such as in social or information networks, a plethora of graph mining techniques have been designed to enable the analysis of such data. In this work, we focus on the mining of associations between entity features in networks. We model each entity feature as a dimension to be analyzed. Consequently we build our approach on top of the existing graph cube framework which is an extension of the concept of the data cube to networks. Our task is particularly challenging because it requires the analysis of both the initial multidimensional network and all its subsequent aggregate forms. As soon as we deal with a big data situation it is impossible for an analyst to consider manually all the possible views of the network data. The aim of this work is to design an algorithm for the discovery of interesting patterns in large graph cubes. Thus, instead of examining all the possible aggregations manually, the proposed technique leads the analyst to the interesting associations or patterns in the multidimensional network. Furthermore, we study the application of existing algorithms from the frequent itemset mining literature on graph data and propose a mapping between the two settings.
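
To give a feel for what an aggregate form of a multidimensional network looks like, here is a minimal Python sketch that collapses a small attributed network along a chosen set of dimensions, in the spirit of one cuboid of a graph cube. It is not the pattern-discovery algorithm of the paper: the function name `aggregate_network`, the toy data, and the choice to treat edges as undirected are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_network(nodes, edges, dimensions):
    """Collapse nodes sharing the same values on `dimensions`.

    nodes: {node_id: {attribute: value}}
    edges: iterable of (source_id, target_id) pairs (treated as undirected)
    Returns (node_counts, edge_counts) for the aggregate network.
    """
    def group_of(node_id):
        return tuple(nodes[node_id][d] for d in dimensions)

    node_counts = Counter(group_of(n) for n in nodes)
    # Sort each endpoint pair so that (A, B) and (B, A) count as the same edge.
    edge_counts = Counter(
        tuple(sorted((group_of(u), group_of(v)))) for u, v in edges
    )
    return node_counts, edge_counts


nodes = {
    1: {"gender": "F", "city": "Brussels"},
    2: {"gender": "M", "city": "Brussels"},
    3: {"gender": "F", "city": "Tunis"},
    4: {"gender": "M", "city": "Tunis"},
}
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

by_gender_nodes, by_gender_edges = aggregate_network(nodes, edges, ["gender"])
print(dict(by_gender_nodes))
print(dict(by_gender_edges))
```

Every subset of dimensions yields a different aggregate view (here `["gender"]`, `["city"]`, or both), and the number of views grows exponentially with the number of dimensions. That growth is why examining all of them by hand is impractical and why the paper looks for a way to point the analyst at the interesting ones.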

Florian Demesmaeker, Amine Ghrab, Siegfried Nijssen, Sabri Skhiri: Discovering interesting patterns in large graph cubes. 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 2017, pp. 3322-3331.

Click here to access the paper.

Second Workshop on Real-Time and Stream Analytics in Big Data

EURA NOVA is thrilled to share the news with you: we are organizing our second workshop collocated with the 2017 IEEE International Conference on Big Data. The workshop will take place in December in Boston, MA, USA.

 

Stream processing and real-time analytics have caught the interest of the industry lately. Many use cases are waiting for relevant and efficient solutions to be developed. Such use cases include event-driven marketing, dynamic network management & optimization, real-time recommendation, context-aware applications and real-time fraud detection.

 

After the success of the first edition, this is an excellent opportunity to bring together the industry and academics to discuss, to explore and to refine new opportunities and use cases in the area. The workshop will benefit both researchers and practitioners interested in the latest research in real-time and stream processing. The workshop will showcase prototypes and products leveraging big data technologies as well as models, efficient algorithms for scalable complex event processors and context detection engines, or new architecture leveraging stream processing.

Want to submit a paper? Check out the workshop website to find all the information you will need. Your paper will be reviewed by a prestigious panel of international experts from both the academic and the industrial worlds.

Next Workshop on Graph Business Intelligence

EURA NOVA is organizing their second workshop collocated with an international conference. This time, the workshop will be collocated with the 21st European Conference on Advances in Databases and Information Systems. It will take place in September in Cyprus and will bring together industrial and academic stakeholders to discuss, explore and refine new opportunities and use cases in the area of Graph Business Intelligence.

 

Want to be part of the fun? Check out the workshop website to find all the information you need to know and submit your paper. Our researchers Sabri Skhiri, Salim Jouili and Amine Ghrab cannot wait to read your papers and meet you in Nicosia.

Big Data Architectures at Universitat Politècnica de Catalunya

Today and Wednesday (the 13th and the 15th of March 2017), our R&D Director will be in Barcelona to give a course about Big Data Architectures.

The objective is to learn the basic concepts and the details to take into account when designing a Big Data architecture. Students will learn the impact of technical & functional constraints on storage and processing choices. Going further, the course will show, through industrial use cases, the rise of new architecture patterns. The course includes a practical part with a hands-on session on distributed frameworks.

Contents:

  • Terminology & Concepts
  • Distributed architecture
  • Big Data Storage
  • Big Data Processing
  • Big Data Architecture Patterns (Hands-on session)
  • Distributed processing with Apache Flink / Spark
  • Data manipulation with Apache Pig

For more details, contact Oscar Romero ( [email protected] )

Want to host Sabri Skhiri for a course in your university? Contact [email protected]