Which direction stream and complex event processing is going to take? Last week, the world’s best-known international researchers met in Schloss Dagstuhl, Germany, to present and discuss their research. Among the members were present Avigdor Gal, Professor at the Israel Institute of Technology, Alessandro Margara, Assistant Professor at the Polytechnic University of Milan, or Till Rohrmann, engineering lead at Veverica.
Invited to talk about the requirements and needs from the industry, our R&D director Sabri Skhiri explains: “The seminar brought together world-class computer scientists and practitioners working on complex event recognition, distributed systems, databases, stream reasoning and artificial intelligence. Our objective was to disseminate the recent foundational results in each of these isolated fields among all participants, to identify the open problems that need to be resolved, and to establish new research collaborations among these fields”.
What were the big trends and intakes gathered by those brilliant minds? Let’s find out with Sabri!
The Big Trends
This seminar is a bit particular as it does not show any trends but rather gives a picture of all the communities working on CER in a way or another. I was fascinated by the diversity of researchers. I did not expect to see such a rich variety of fields: knowledge representation, spatial reasoning, logic-based reasoning, data management, learning-based approaches, event-driven processing, process mining, database theory, stream mining,… According to me, the composite event recognition models that are the best at recognising complex events would include:
- Data flow model
- Ontology-based and reasoning model
- Symbolic reasoning model
- Automata-based model
We also identified common challenges across these models and communities. The three priority topics areas we identified are:
- Expressivity: composability & hierarchies
- Evaluation strategy, parallelization and distribution
- Uncertainty management
Kurt Rothermel from TU Stuttgart – Time-sensitive Complex Event Processing
My first reaction to load shedding was: “It is useless since customers do not want to lose any event, that is why so much effort is spent today on exactly once semantics…“. However, there is a trend today in stream processing, which is the trade-off between cost, latency, and correctness. Tyler Akidau described this challenge as a choice between one of three propositions: fast and correct, cheap and correct, or fast and cheap. Tyler was talking about streaming but that rule applies in the same way in a CEP context. The load shedding strategy directly falls in the third proposition. In this perspective, the work of Kurt is highly relevant.
Jacopo Urbani & Fredrik Heintz – Stream Reasoning
Concretely, stream reasoning is incremental reasoning over rapidly changing information. The tutorial opened new perspectives on stream processing for me. It tried to answer a very interesting question: how can you provide reasoning about context from streams of data? I definitely come from the database and event-based systems communities and I did not know at all that stream reasoning was so mature. This community has been evolving from having a continuous version of SPARKQL to a complete distributed stream reasoning semantics. It is interesting to see that the work we have done in the LEAD algebra and semantics is deeply inspired by this community. However, we have never used any reasoning logic on top of LEAD. But after a few hours of the tutorial, I realise that (1) reasoning can be used for query rewriting and optimisation (2) it is worth evaluating at least BigSR, the LARS implementation on Flink.
Avigdor Gal & Ruben Mayer – Distributed and Event-Based Systems
Avidgor is a kind of pop star for the stream processing and distributed systems community, or at least for me! The papers he published about a probabilistic CEP engine with late arrival and event uncertainty were visionary.
The speakers started by explaining the basics of stream processing then went deeper into the event recognition language and architecture. They detailed pub/sub applied to event recognition and explained the data flow model, which consists of a single unified data processing model where the stream and batch paradigms are the same. This last part was based on Tyler Akidau’s paper.
A second part of the talk focused on elasticity on streams. Stream fission puts operators among different categories:
- Firstly, key-based operators, that is a group by operation (as in SQL)
- Secondly, window-based operators enable to split processing that needs to have multiple event types correlated with different keys within the same operator
- Finally, pane-based operators enable a split-merge strategy where you distribute and merge the result.
Interestingly, Avigdor presented his work about late-arrival processing from a probabilistic viewpoint and not from the watermark perspective. Usually, modern stream processing frameworks use watermarks in order to take into account events that arrive later. Avigdor presented a probabilistic approach to this issue.
What are late-arrival events?
Imagine we want to count the number of cars entering a road segment every three minutes: we have a “tumbling window” every 3 minutes. If an event (ie a car) arrives at 2’55 second in the window but is stuck somewhere in the network for 6 sec, it is called a late-arrival event. The processing time (the time at which the CEP processes the event) is delayed compared to the event time (the time on which the event really occurs).
Note that for CEP, there is clearly a trade-off between timeliness and accuracy, because the slack time will increase the delay to deliver your result but will increase your accuracy. There is always a tradeoff between cost, latency and correctness, and usually, you can only pick two among the three.
Fun fact: If you need to explain what is event time & processing time to your mother (yeah, don’t underestimate the power of this kind of discussion at Christmas dinner), the best way is to take the Star Wars analogy. From an event time perspective (which is the time at which the story really happened) you should follow episode 1, 2, 3,4, 5, 6, 7,8, 9. But if you take the processing time (the time on which we received the episode), it is 4, 5, 6, 1, 2, 3, 7, 8, 9. Isn’t it great ?!
CER has been explored from many viewpoints. However, never in the research history was there a meeting gathering representatives of these communities. This was the objective of this seminar. Having all these people in a castle in the middle of nowhere was a blast! I had very passionate discussions during meals but also during the night at the library with the most brilliant brains on stream and CEP. On the other hand, I still had some fun discussions about comparing Star Trek DIscovery and Picard! Finally, the most important things I will remember after this seminar… are the endless ping pong games with Till Rohrmann and Alessandro Margara :-).