In this section you will find EURA NOVA’s latest news and activities.
Last month, our R&D engineer Anas Albassit and director Sabri Skhiri travelled to Germany to attend and present at DEBS 2019, one of the most specialised conferences in Distributed Event-Based Systems. DEBS has a long history, spanning active databases, streaming engines, distributed publish-subscribe systems and more, and it has always been a pioneer of distributed and high-performance systems. In this article, Anas and Sabri share what they learned there and what struck them as particularly useful.
This edition focused on streaming languages, scheduling, elasticity, distributed event processing, platforms and middleware. Our R&D director Sabri Skhiri says: “For someone working in distributed computing and data management, DEBS is one of the major conferences, alongside SIGMOD and VLDB. Even though it is quite small (80 participants vs 1,000 for IEEE Big Data), it is a niche conference of experts from a small yet amazingly talented research community. The keynotes were just great, with a good balance between pure research and industry. A conference tackling distributed computing and streaming is heaven for data scientists and architects like us!”
Tyler Akidau is the technical lead for the Data Processing Languages & Systems group at Google. He argues that, even though stream processing has gone from niche to mainstream, this is just the beginning, which makes the active exploration of new ideas all the more pressing. Sabri reacts: “Streaming has been around for 30 years. We have Spark, Flink, Dataflow, KStream, Microsoft Trill. But is that all we can do? Is there nothing left to do? Tyler Akidau brilliantly showed that stream processing as a field of research is alive and well.”
The talk was mainly about raising open or partially open questions in the streaming world.
Tyler Akidau concluded by pointing out that even though streaming systems are more capable and robust than ever, they often remain difficult to use, difficult to maintain, and difficult to understand.
[EDIT] Thank you, Tyler, for reaching out and sharing your slides with us! They are available at the following link. If you would like to discuss more insights from the talk, do not hesitate to contact our researchers at email@example.com.
Hannaneh Najdataei, Researcher and PhD Student at the Chalmers University of Technology in Sweden, presented her framework STRETCH.
Anas explains: “The performance of a streaming engine depends on the throughput and latency of stateful analysis. To achieve the best performance, we need to process large amounts of data (i.e. be scalable) while handling fluctuations in the data rate (i.e. be elastic). Distributed processing therefore requires the ability to parallelise elastically: ideally, we should reduce the number of parallel operator instances when the workload decreases and add instances when more resources are needed. For stateful operators, elastic reconfigurations require redistributing the state according to the new configuration (i.e. fewer or more instances). In this case, we need to find a trade-off between a shared-nothing and a shared-all state architecture.”
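To make the state-redistribution problem concrete, here is a minimal Python sketch of a classic shared-nothing setup. It assumes per-key counters that are hash-partitioned across workers by a hypothetical `owner` function (sum of the key's character codes modulo the worker count); nothing here is taken from the papers presented at DEBS. It only shows why changing the parallelism forces part of the state to move between workers.

```python
def owner(key: str, n_workers: int) -> int:
    # Hypothetical, deterministic partitioner: sum of character codes modulo the
    # worker count (Python's built-in hash() on str is randomised per process).
    return sum(map(ord, key)) % n_workers


def partition_state(state: dict, n_workers: int) -> list:
    """Split keyed state (e.g. per-key counters) across shared-nothing workers."""
    shards = [{} for _ in range(n_workers)]
    for key, value in state.items():
        shards[owner(key, n_workers)][key] = value
    return shards


def keys_to_migrate(state: dict, old_n: int, new_n: int) -> list:
    """Keys whose owner changes, i.e. whose state must be transferred on rescale."""
    return sorted(k for k in state if owner(k, old_n) != owner(k, new_n))


if __name__ == "__main__":
    counters = {key: 1 for key in "abcdefgh"}
    print(partition_state(counters, 3))
    print(keys_to_migrate(counters, 3, 4))  # ['c', 'd', 'e', 'f', 'g', 'h']
```

With eight single-letter keys, scaling from three to four workers reassigns six of the eight keys, and each of them would have to be shipped to its new owner before processing can resume.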
Sabri adds: “The paper proposes STRETCH, a virtual shared-nothing parallelism concept that does not require state transfer. The idea is that all workers read the same sequence of input tuples through an intra-node streaming framework. What is surprising in this paper is the parallelism model: all workers get the same sequence of tuples, which guarantees deterministic execution of the stream. By contrast, streaming engines usually distribute tuples across workers by key. Still, they obtained impressive results, matching the throughput and latency of state-of-the-art solutions while also achieving fast elastic reconfigurations.”
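The contrast with key-based distribution can be illustrated with a deliberately simplified, single-process simulation of the virtual shared-nothing idea. The assumptions are ours, not the paper's: the keyed state lives in one shared dictionary standing in for intra-node shared memory, every active worker reads every tuple in the same order, only the current owner of a key (via the same hypothetical `owner` partitioner) updates it, and a reconfiguration takes effect at a fixed tuple index given by a hypothetical `schedule` list.

```python
def owner(key: str, n_workers: int) -> int:
    # Hypothetical deterministic partitioner (sum of character codes modulo worker count).
    return sum(map(ord, key)) % n_workers


def parallelism_at(index: int, schedule: list) -> int:
    """Active workers for tuple `index`; schedule = [(start_index, n_workers), ...] sorted by start."""
    active = schedule[0][1]
    for start, n_workers in schedule:
        if start <= index:
            active = n_workers
    return active


def run_virtual_shared_nothing(tuples: list, schedule: list):
    shared_state = {}  # single keyed state visible to every worker: never migrated
    processed = {}     # how many tuples each worker id actually handled
    for index, (key, value) in enumerate(tuples):
        n_workers = parallelism_at(index, schedule)
        for worker in range(n_workers):
            # Every active worker reads the tuple in the same global order...
            if owner(key, n_workers) != worker:
                continue  # ...but only the key's current owner updates the state.
            shared_state[key] = shared_state.get(key, 0) + value
            processed[worker] = processed.get(worker, 0) + 1
    return shared_state, processed


if __name__ == "__main__":
    stream = [(key, 1) for key in "abcdefgh"] * 2
    elastic, load = run_virtual_shared_nothing(stream, [(0, 3), (8, 4)])  # 3 -> 4 workers
    single, _ = run_virtual_shared_nothing(stream, [(0, 1)])
    print(elastic == single, load)
```

Scaling from three to four workers halfway through the stream changes which worker handles each key, yet the final state is identical to a single-worker run and no entry of `shared_state` is ever copied.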
Nikos Giatrakos is a PhD researcher at the Technical University of Crete. He presented his work on uncertainty-aware event analytics. Sabri reacts: “Getting high performance by sampling the input stream and sacrificing a bit of result precision is the new trend in research. The idea is to process only some of the events in order to handle a bigger load, while still controlling the level of uncertainty on the result. I see two great applications: (1) getting approximate results when needed, and (2) proactive detection before events happen.”
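As a generic illustration of trading precision for throughput (not the method presented in the talk), the sketch below estimates how many events satisfy a condition while evaluating that condition only on a Bernoulli sample of the stream. The sampling rate `p`, the predicate and the normal-approximation confidence interval are illustrative assumptions; the scaled count `hits / p` is the standard unbiased (Horvitz-Thompson) estimator.

```python
import math
import random


def sampled_count(stream, predicate, p: float, rng: random.Random, z: float = 1.96):
    """
    Estimate how many events satisfy `predicate`, evaluating it only on a
    Bernoulli sample where each event is kept independently with probability p.
    Returns (estimate, half_width) of an approximate confidence interval
    (normal approximation; z = 1.96 gives roughly 95%).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    hits = 0
    for event in stream:
        # Short-circuit: the (possibly expensive) predicate only runs on sampled events.
        if rng.random() < p and predicate(event):
            hits += 1
    estimate = hits / p                  # unbiased Horvitz-Thompson estimate
    variance = hits * (1 - p) / (p * p)  # plug-in estimate of Var[hits / p]
    return estimate, z * math.sqrt(variance)


def is_alert(event: int) -> bool:
    # Hypothetical condition: exactly 10,000 of the 100,000 events below match.
    return event % 10 == 0


if __name__ == "__main__":
    est, half_width = sampled_count(range(100_000), is_alert, p=0.1, rng=random.Random(42))
    print(f"~{est:.0f} +/- {half_width:.0f} (true value: 10000)")
```

Lowering `p` reduces the work per event roughly in proportion, at the cost of a wider interval; with `p = 1` the estimate is exact and the half-width is zero.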
While the idea of filtering by controlling the probability of the error is not new, the paper had several novel points:
On the fourth day of the conference, our R&D engineer Anas presented his paper proposing a formal specification for complex event processing (CEP) languages.
Processing event streams is an increasingly important area for modern businesses aiming to detect and efficiently react to critical situations in near real-time. Due to CEP languages’ limitations and imprecise semantics, describing interesting situations remains challenging. In this paper, Anas presents a formal specification for processing complex events. The paper provides an algebra that consists of a set of operators for constructing complex events (patterns), temporally restricting the construction process and choosing among several selection and consumption policies.
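To give a feel for what such operators express, here is a small Python sketch of a sequence pattern SEQ(A, B) restricted by a time window, with configurable selection and consumption policies. The function name, parameters and policy names (`each`, `first`, `last`, `reuse`, `consume`) are hypothetical and chosen for illustration; they are not the algebra defined in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    ts: int


def seq_within(events, first, second, window, selection="each", consumption="reuse"):
    """
    Detect SEQ(first, second) WITHIN `window` over a timestamp-ordered event list.
    selection:   'each' -> every eligible `first` event, 'first' -> earliest, 'last' -> latest
    consumption: 'reuse' -> `first` events may join later matches, 'consume' -> used once
    Returns a list of (first_event, second_event) pairs.
    """
    if selection not in ("each", "first", "last"):
        raise ValueError(f"unknown selection policy: {selection}")
    if consumption not in ("reuse", "consume"):
        raise ValueError(f"unknown consumption policy: {consumption}")

    pending = []  # buffered `first` events, in timestamp order
    matches = []
    for e in events:
        # Temporal restriction: drop buffered events that can no longer fit in the window.
        pending = [a for a in pending if e.ts - a.ts <= window]
        if e.kind == first:
            pending.append(e)
        elif e.kind == second and pending:
            if selection == "first":
                chosen = pending[:1]
            elif selection == "last":
                chosen = pending[-1:]
            else:
                chosen = list(pending)
            matches.extend((a, e) for a in chosen)
            if consumption == "consume":
                used = {id(a) for a in chosen}
                pending = [a for a in pending if id(a) not in used]
    return matches


if __name__ == "__main__":
    stream = [Event("A", 1), Event("A", 2), Event("B", 3), Event("B", 4)]
    for sel, cons in [("each", "reuse"), ("first", "consume"), ("last", "consume")]:
        found = seq_within(stream, "A", "B", window=5, selection=sel, consumption=cons)
        print(sel, cons, [(a.ts, b.ts) for a, b in found])
```

On the stream A@1, A@2, B@3, B@4 with a window of 5, `each`/`reuse` yields four matches, whereas `first`/`consume` yields only (A@1, B@3) and (A@2, B@4): the same pattern returns different results depending on the policies, which is why precise semantics matter.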
The second day of the conference was dedicated to tutorials from experts in the field. Anas gives insights into his favourite tutorial: Correctness & Consistency of Event-Based Systems. He explains: “The speaker was Opher Etzion, one of the pioneers in the domain of event processing. The tutorial lasted about 4 hours. What is interesting is that the speaker demonstrated with examples that building an event-based system is not trivial. Even more, many existing systems are incorrect and give inconsistent results because of problems in their semantics. To ensure correctness, you must at least understand the sources of latency in your system and ensure fairness between all the agents, in addition to defining a set of policies that tell the system when, how, where and what events you are looking for.”
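Latency is one concrete source of such inconsistencies: events from a slow source may arrive after events that actually happened later. The sketch below is a generic technique rather than anything presented in the tutorial. It assumes bounded disorder (no event arrives more than a hypothetical `max_delay` time units behind the latest timestamp seen so far) and uses a heap-based reorder buffer so that a simple "A followed by B" check sees events in event-time order.

```python
import heapq


def reorder(arrivals, max_delay):
    """
    Re-emit (event_time, name) tuples in event-time order.
    Assumes bounded disorder: every arriving event has
    event_time >= (largest event_time seen so far) - max_delay.
    """
    heap, out = [], []
    max_seen = None
    for event_time, name in arrivals:
        heapq.heappush(heap, (event_time, name))
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - max_delay
        # Anything strictly older than the watermark can no longer be preceded by a late event.
        while heap and heap[0][0] < watermark:
            out.append(heapq.heappop(heap))
    while heap:  # end of stream: flush the remaining buffered events
        out.append(heapq.heappop(heap))
    return out


def a_then_b(events):
    """True if some 'A' is followed by some 'B' in the given order."""
    seen_a = False
    for _, name in events:
        if name == "A":
            seen_a = True
        elif name == "B" and seen_a:
            return True
    return False


if __name__ == "__main__":
    # B (event time 5) comes from a fast source; A (event time 3) from a slow one.
    arrivals = [(5, "B"), (3, "A"), (8, "C")]
    print(a_then_b(arrivals))                        # False: arrival order
    print(a_then_b(reorder(arrivals, max_delay=3)))  # True: event-time order
```

Processed in arrival order, the B that happened at time 5 is seen before the A that happened at time 3, so the pattern is missed; after reordering it is detected, at the price of up to `max_delay` of extra latency.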