
Flink Forward 2015 – Slides & video

The first edition of Flink Forward took place on October 12 and 13, 2015, in Berlin. Flink Forward is a two-day conference dedicated exclusively to Apache Flink, the distributed framework for pipelined batch and stream processing. EURA NOVA was among the speakers at the event (http://flink-forward.org/?session=stale-synchronous-parallel-iterations-on-flink).

Here is the talk we presented.

Stale Synchronous Parallel Iterations

We started the project earlier this year, in April, based on the publication by Cui et al. After discussions with the Flink team at Data Artisans, we went on to implement Stale Synchronous Parallel (SSP) iterations within Flink.

Along with SSP on Flink, we also proposed a distributed version of the Frank-Wolfe algorithm under SSP. The results have been published at the IEEE Big Data Conference 2015.

The abstract of the talk says:

While Bulk Synchronous Parallel is a model suitable for distributed bulk iterations, it has the overhead of synchronizing each worker with a fresh view of the working set between iterations. It has, however, been shown that algorithms still converge when distributed workers hold an outdated or inconsistent view of the solution between iterations, within defined bounds. This has led to the concept of Stale Synchronous Parallel iterations, in which workers compute on cached model data from other workers covering previous iterations, within defined bounds. This introduces two new notions: first, the “clock”, representing the smallest amount of work performed by a worker in an iteration, and second, the “slack”, defining the maximum number of clocks a worker can be ahead of the slowest.

In this project we implement the SSP iteration model on top of Flink iterations and introduce an important element of the model: the parameter server. We will present our contribution at the model level and our implementation within Flink using Apache Ignite, and show use cases that benefit from this iteration model.
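The clock and slack rules described in the abstract can be illustrated with a small discrete-time simulation. This is a toy model for explanation only, not the Flink implementation from the pull requests: the function names, the per-clock durations and the tick counter are hypothetical. We assume the common SSP convention in which a worker that has completed c clocks may start its next clock only while the slowest worker has completed at least c - slack clocks; with a slack of 0 this reduces to BSP-style lockstep.

```python
from typing import List, Tuple

# Toy model for illustration only; not the Flink SSP implementation.


def can_start_next_clock(worker: int, clocks: List[int], slack: int) -> bool:
    # A worker that has completed clocks[worker] clocks may start its next one
    # only while it is at most `slack` clocks ahead of the slowest worker.
    return clocks[worker] - min(clocks) <= slack


def simulate(durations: List[int], slack: int, ticks: int) -> Tuple[List[int], List[int]]:
    """Hypothetical time model: worker w needs durations[w] ticks per clock."""
    n = len(durations)
    clocks = [0] * n      # clocks completed by each worker
    remaining = [0] * n   # ticks left in the current clock (0 = idle)
    waited = [0] * n      # ticks each worker spent blocked by the slack bound
    for _ in range(ticks):
        # Idle workers try to start their next clock.
        for w in range(n):
            if remaining[w] == 0:
                if can_start_next_clock(w, clocks, slack):
                    remaining[w] = durations[w]
                else:
                    waited[w] += 1
        # Busy workers do one tick of work; finishing a clock advances it.
        for w in range(n):
            if remaining[w] > 0:
                remaining[w] -= 1
                if remaining[w] == 0:
                    clocks[w] += 1
    return clocks, waited


# A fast worker (1 tick per clock) paired with a slow one (3 ticks per clock).
print(simulate([1, 3], slack=0, ticks=12))  # ([4, 4], [8, 0])
print(simulate([1, 3], slack=2, ticks=12))  # ([6, 4], [6, 0])
```

With a slack of 0 the two workers advance in lockstep, as under BSP, and the fast worker sits idle for 8 of the 12 ticks. With a slack of 2 it completes 6 clocks instead of 4 and waits for only 6 ticks, which is the kind of gain SSP trades against reading staler data.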

Here is the video of the talk:
https://www.youtube.com/watch?v=SVnRFrEYE3s

Feeling adventurous?

The code of the contribution is available on GitHub:
https://github.com/apache/flink/pull/1102
https://github.com/apache/flink/pull/967
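For readers who want a feel for the role of the parameter server mentioned in the abstract before opening the pull requests, the following sketch shows one way a worker-side cache can enforce the slack bound. It is a hypothetical, single-process illustration: it does not use Apache Ignite or Flink APIs, and the class and method names (ParameterServer, WorkerCache, push, pull, inc, tick) are our own. We assume that each cached entry is stamped with the global clock (the number of clocks completed by the slowest worker) at fetch time, and that a worker at clock c may use an entry stamped g only if g >= c - slack. A real SSP runtime would block where this sketch raises an exception.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: not the Flink/Ignite implementation from the pull requests.


class ParameterServer:
    """In-memory stand-in for a parameter store shared by all workers."""

    def __init__(self, num_workers: int) -> None:
        self._values: Dict[str, float] = {}
        self._worker_clocks: List[int] = [0] * num_workers

    def push(self, key: str, delta: float) -> None:
        # Updates are additive, so they can be applied in any order.
        self._values[key] = self._values.get(key, 0.0) + delta

    def clock(self, worker: int) -> None:
        # Called by a worker when it finishes one clock of work.
        self._worker_clocks[worker] += 1

    def global_clock(self) -> int:
        # Clocks completed by every worker, i.e. the slowest worker's clock.
        return min(self._worker_clocks)

    def pull(self, key: str) -> Tuple[float, int]:
        # Current value, stamped with the global clock at read time.
        return self._values.get(key, 0.0), self.global_clock()


class WorkerCache:
    """Worker-side cache that serves stale values only within the slack bound."""

    def __init__(self, server: ParameterServer, worker: int, slack: int) -> None:
        self._server = server
        self._worker = worker
        self._slack = slack
        self.clock = 0          # clocks completed by this worker
        self.server_reads = 0   # number of times the cache went to the server
        self._cache: Dict[str, Tuple[float, int]] = {}

    def get(self, key: str) -> float:
        entry = self._cache.get(key)
        if entry is not None and entry[1] >= self.clock - self._slack:
            return entry[0]  # cached copy is fresh enough
        value, stamp = self._server.pull(key)
        self.server_reads += 1
        if stamp < self.clock - self._slack:
            # A real SSP runtime would block here until slower workers catch up.
            raise RuntimeError("would block: slowest worker is too far behind")
        self._cache[key] = (value, stamp)
        return value

    def inc(self, key: str, delta: float) -> None:
        self._server.push(key, delta)
        if key in self._cache:
            # Read-my-writes: keep this worker's own updates visible locally.
            value, stamp = self._cache[key]
            self._cache[key] = (value + delta, stamp)

    def tick(self) -> None:
        self.clock += 1
        self._server.clock(self._worker)
```

For example, with two workers and a slack of 1, worker 0 keeps reading its stale cached copy of a key through its next clock after worker 1 updates it. After worker 0's second tick, the read raises until worker 1 also calls tick(), at which point the refreshed value includes worker 1's update.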

 
