Flink Forward 2015

Data architecture, Engineering

October 20, 2015

The first edition of Flink Forward took place past October 12th and 13th in Berlin. Flink Forward is two-day conference exclusively dedicated to Apache Flink, the distributed pipelined batch and streaming processing framework. EURA NOVA was present among the speakers of the event. Here is our field report.

Global overview

The conference was an excellent opportunity to see (1) the evolution of the Flink framework and where they are heading, but also (2) what are the industrial use cases of Flink. Flink currently has a clear advantage compared to Spark in terms of

– Memory management

– Optimized logical & physical execution plan, especially for data processing and data queries

– Real streaming processing API with exactly-once semantics

Use cases from the big players

Interestingly, almost all the presented use cases systematically leverage the stream processing API in real time applications. The majority of the speakers however explained that they had first chosen Spark, but eventually switched to Flink because of memory and scalability issues.

We have seen a bunch of applications especially in Telecom with Bouygues (QoE) and Ericsson (failure detection), retail with Otto, travel (online reservation price variation) with Amadeus and Banking industry with Capital One (anomaly detection, fraud detection and online recommendation).

Tooling Integration

On the tooling side, we had a great demo of Zemplin the notebook for interactive queries and visualization. It can be used for Spark and for Flink. Very cool, if we want to show results and quick explorations.

The Evolution of Stream Processing Technologies

As explained before, we can see a clear evolution of the streaming technologies to take into account the out of order events. Google gave a really interesting key note the second day about Google dataflow and the unification API.

Streaming Patterns: Event Time Based Windows

Google Data flow semantics

Google dataflow is the evolution of MillWheel (MillWheel: Fault-Tolerant Stream Processing at Internet Scale, VLDB 2013) and FlumeJava (FlumeJava: Easy, Efficient Data-Parallel Pipelines, ACM SIGPLAN 2010) for Stream Processing. The real technical evolution of Google dataflow lays in the ability to stream data and define windows. The system then automatically defines a “watermark”, a point of time at which most of the events should have been received. This watermark can be below the window size, and then trigger results faster. In addition, this allows to register early triggers for results before the watermark and before the end of the window, and late trigger for late arrivals.

Why do this ? Google uses stream processing for click analytics, gaming, billing, ad exchanges, alarms, fraud detection, abuse detection, acquiring data & data transformation, and many other real time use cases. In each use case, they need to provide (1) completeness, (2) latency and (3) cost effectiveness. But depending of the use case, some will require full completeness and average latency, and are ready to pay the cost for that (such for billing), while abuse detection should have low latency in order to react as soon as possible.

By providing all those rich stream processing semantics but also only one unique DAG API for stream and batch, Google can cover a wide variety of different use cases, and it is up to the developer to define the right level of completeness, latency and costs by choosing between:

Batch
Batch with Window
Stream (latency but not completeness)
Stream & early/late trigger (Latency efficient and less costs)
Stream & retention (completeness, less costs and less latency efficient)

The unification API

Google Data flow provides only one API for building a DAG, a pipeline of operation. According to the type of data entering in the Pipe, Google data flow decides to compile and optimize that DAG into a batch processing job or in stream processing job.

Conclusion

Most of the use cases we have seen so far were focused on the Stream processing layer of flink that seems (1) much more flexible than Spark Streaming (that limits the sliding window usage by the RDDs) and (2) much more scalable and efficient than Spark (mainly due to the pipelining engine and their memory model). Finally, the out of order event handling is definitively a killer feature.

Google made the official announcement that they will support Flink as an official runner for running Google dataflow DAGs in a private datacenter rather that in Google Cloud Compute. That means that the same API will target two different frameworks, one in the cloud and one in a private data center. All of this makes Flink Stream a really good choice for a streaming platform.

Releated Posts

Insights from GTC Paris 2025

25.06.2025 / Engineering / Blog, Event

Among the NVIDIA GTC Paris crowd was our CTO Sabri Skhiri, and from quantum computing breakthroughs to the full-stack AI advancements powering industrial digital twins and robotics, there is a lot to share! Explore with Sabri GTC 2025 trends, keynotes, and what it means for businesses looking to innovate.

Development & Evaluation of Automated Tumour Monitoring by Image Registration Based on 3D (PET/CT) Images

23.05.2025 / Engineering / Academic collaborations, Papers

Tumor tracking in PET/CT is essential for monitoring cancer progression and guiding treatment strategies. Traditionally, nuclear physicians manually track tumors, focusing on the five largest ones (PERCIST criteria), which is both time-consuming and imprecise. Automated tumor tracking can allow matching of the numerous metastatic lesions across scans, enhancing tumor change monitoring.

Flink Forward 2015

Global overview

Use cases from the big players

Tooling Integration

The Evolution of Stream Processing Technologies

Google Data flow semantics

The unification API

Conclusion

Releated Posts

Insights from GTC Paris 2025

Development & Evaluation of Automated Tumour Monitoring by Image Registration Based on 3D (PET/CT) Images

Recent Posts

Insights from GTC Paris 2025

Development & Evaluation of Automated Tumour Monitoring by Image Registration Based on 3D (PET/CT) Images

Insights from Data & AI Tech Summit Warsaw 2025

Insights From Flink Forward 2024

Tracks

Mjolnir

Rune

Vadgelmir

Yggdrasil

Field of expertises

Data architecture

Data governance

Data science

Engineering

Academic collaboration

SERVE

Expertise

CRAFT

digazu

CONTACT

Belgium

France

Tunisia

CAREER

Job Offers

Social media