You got it: in this post we are talking about Complex Event Processing (CEP) and Event Stream Processing (ESP). In recent years, we have seen a lot of interest in “context-aware” applications or, if you prefer, in detecting interesting contexts in real time. But if you look at tech blogs, or even at IT vendors, you will see ESP, CEP and pattern matching engines all proposed for this kind of application. So what? If I need to react in real time to situations that are interesting business-wise by processing streams of events, which one should I use? Are they the same? If I do complex things when I process my event stream, am I really doing Complex Event Processing? And does that mean I need a CEP engine? These are really interesting questions that I will try to answer in this post.
The confusion comes mainly from the fact that the IT industry gave the name CEP to a subset of projects that are able to do complex calculations on event streams and to detect patterns. Then, by extension, the frameworks that enable you to apply complex calculations on streams and those that are able to detect complex patterns have been called CEP as well. Even if there is no official standard definition, I would like to present my view of the differences between Event Stream Processing, CEP and Event Pattern Matching engines.
Event Stream Processing (ESP)
These frameworks aim at providing the infrastructure and the programming model for applying processing to event streams. Examples include Twitter Storm, Apache S4, IBM InfoSphere and Dempsy. Most of these frameworks provide features such as high availability, acknowledgment of events between processors, scheduling of operators on a cluster, etc.
The idea is to implement your “Data Flow Graph” (DFG) of Event Processors, also called a Topology, by wiring operators to each other (without any cycle). The events are then streamed through this DFG, which represents the global processing we would like to apply. Usually, Event Stream Frameworks do not support temporal features such as time windows out of the box, but most of them allow operators to receive clock ticks, so users can implement their own time windows. In order to define your DFG you usually have to:
- code or re-use operators (event processors, the nodes in your DFG)
- wire them either by XML, JSON or a graphical tool.
The typical use cases are aggregation of values (MAX, MIN, AVG, COUNT, etc.) and ranking.
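To make the idea of a DFG more concrete, here is a minimal sketch in plain Python. It does not use the API of Storm, S4 or any other framework: the classes Operator, Map, TumblingMax and Collect are hypothetical names invented for this illustration. It shows operators wired into a small acyclic graph and a MAX aggregation whose time window is closed by clock ticks.

```python
from typing import Any, Callable, List


class Operator:
    """A node of the data flow graph; by default it forwards events unchanged."""

    def __init__(self) -> None:
        self.downstream: List["Operator"] = []

    def connect(self, other: "Operator") -> "Operator":
        self.downstream.append(other)
        return other  # returning the target allows a.connect(b).connect(c)

    def emit(self, event: Any) -> None:
        for op in self.downstream:
            op.on_event(event)

    def on_event(self, event: Any) -> None:
        self.emit(event)

    def on_tick(self) -> None:
        # Clock ticks are propagated through the graph like events.
        for op in self.downstream:
            op.on_tick()


class Map(Operator):
    """Applies a function to each event."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        super().__init__()
        self.fn = fn

    def on_event(self, event: Any) -> None:
        self.emit(self.fn(event))


class TumblingMax(Operator):
    """Buffers values and emits their MAX each time a clock tick arrives."""

    def __init__(self) -> None:
        super().__init__()
        self.buffer: List[Any] = []

    def on_event(self, event: Any) -> None:
        self.buffer.append(event)

    def on_tick(self) -> None:
        if self.buffer:
            self.emit(max(self.buffer))
            self.buffer = []
        super().on_tick()


class Collect(Operator):
    """Terminal operator that stores what it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Any] = []

    def on_event(self, event: Any) -> None:
        self.results.append(event)


# Wire the graph: source -> Map -> TumblingMax -> Collect
source = Operator()
sink = Collect()
source.connect(Map(lambda e: e["value"])).connect(TumblingMax()).connect(sink)

for v in (3, 9, 4):
    source.on_event({"value": v})
source.on_tick()  # closes the first window
for v in (1, 2):
    source.on_event({"value": v})
source.on_tick()  # closes the second window
print(sink.results)
```

In a real framework the wiring would typically come from XML, JSON or a graphical tool, and the operators would be scheduled across a cluster; here everything runs in a single process for readability.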
Complex Event Processing (CEP)
The name CEP has been given by the industry to a set of products that are able to
- express operations to apply on event streams to compute KPIs over one or multiple event Streams
- define patterns as thresholds on those KPIs. The pattern detection can then lead to the execution of an action.
If you look at the CEP engines on the market, you can find two main categories:
- Rule-based CEP, such as Drools Fusion, leveraging the RETE algorithm to recognize patterns
- Stream-based engines that leverage the ESP technology to implement the CEP layer.
Actually, the second category is much more present on the market, through products such as StreamBase or Sybase Aleri. This kind of CEP comes from a set of research projects conducted in the US from 2002 to 2006, such as STREAM, AURORA, BOREALIS and Cayuga, which aimed at using Event Stream Processing models to implement the CEP layer on top. Those researchers focused mainly on the dynamic generation of the DFG from the CEP queries, using different forms of the Continuous Query Language (CQL), on the optimization of the physical query execution plan of this DFG, and finally on the scheduling of operators.
Typical use cases are the detection of “interesting things” within a sliding window, such as stock exchange tracking, traffic jam detection, overload control, Business Activity Monitoring, etc.
As CQL is essentially SQL using parts of infinite streams instead of tables, it is best suited to express relationships between the streams themselves, as opposed to individual events within these streams.
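As an illustration of “a KPI over a sliding window plus a threshold”, here is a small Python sketch. It is not CQL and not the API of any CEP product: the class SlidingAverageAlert, the 60-second window and the threshold of 100 are assumptions made up for the example. It also assumes that events arrive in timestamp order.

```python
from collections import deque


class SlidingAverageAlert:
    """Average of the values seen in the last `window_seconds`, compared to a threshold."""

    def __init__(self, window_seconds: float, threshold: float) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, value) pairs currently in the window
        self.total = 0.0

    def push(self, timestamp: float, value: float) -> bool:
        """Add an event and return True when the KPI crosses the threshold."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events older than the window (timestamp - window, timestamp].
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        average = self.total / len(self.events)
        return average > self.threshold


detector = SlidingAverageAlert(window_seconds=60, threshold=100)
ticks = [(0, 90), (20, 95), (40, 130), (70, 120), (130, 80)]
alerts = [detector.push(ts, price) for ts, price in ticks]
print(alerts)
```

Note that the window is defined on the stream itself: it slides with the clock, independently of what any individual event means.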
Pattern Matching Engines
Around 2006 the research community rethought the concept of a pattern. The idea was to define a pattern not only as a set of KPIs over streams correlated by common values, but rather as a situation expressed by a set of events linked with each other by complex temporal relations (SASE, TESLA, NEXT). This was a complete paradigm shift, even for the language used to define these new contexts. Indeed, the CQL from CEP was better suited to defining operations on streams and stating when the results of those calculations should trigger a pattern. The new way of expressing patterns does not attempt to express complex calculations on event streams, but rather expresses temporal relationships between individual events.
Let’s imagine a game vendor that would like to evaluate whether its latest game is a success. The vendor defines the game as a success for a user if the user has downloaded the game, has played at least 10 times and shared an invitation with at least 5 friends within the following 3 days, and, from 1 to 2 days after this sharing, has bought between 1 and 10 artifacts online. In the same way, we can state that the game is a mild success for a user who has downloaded the game, has played between 1 and 4 times within the following 3 days, and was not abroad during this period.
As you can see, here we are not talking about complex operations on streams, nor about defining a pattern as a threshold on KPIs, but really about identifying a pattern as a complex temporal relationship.
The most attentive readers will notice that, for the temporal management aspect, there is an important difference between CEP and pattern matching engines. The former use things such as sliding windows or jumping windows, which are a kind of absolute time window during which we calculate what we need to, while pattern matching engines are able to take the first root event (the installation date in our example) as a temporal anchor to start the validity window.
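The difference between the two kinds of windows can be sketched in a few lines of Python. The event tuples (timestamp, user, kind) and both function names are hypothetical, chosen only for this illustration, and the events are assumed to be sorted by timestamp.

```python
DAY = 24 * 3600  # seconds


def plays_in_fixed_window(events, start, end):
    """CEP style: count 'play' events per user inside an absolute window [start, end)."""
    counts = {}
    for ts, user, kind in events:
        if kind == "play" and start <= ts < end:
            counts[user] = counts.get(user, 0) + 1
    return counts


def plays_in_anchored_window(events, window=3 * DAY):
    """Pattern matching style: the window starts at each user's own 'download' event."""
    anchor = {}  # user -> timestamp of the root event
    counts = {}
    for ts, user, kind in events:
        if kind == "download":
            anchor.setdefault(user, ts)
            counts.setdefault(user, 0)
        elif kind == "play" and user in anchor and ts - anchor[user] <= window:
            counts[user] += 1
    return counts


events = [
    (0 * DAY, "alice", "download"),
    (1 * DAY, "alice", "play"),
    (2 * DAY, "bob", "download"),
    (4 * DAY, "bob", "play"),
    (4 * DAY, "alice", "play"),
]

fixed = plays_in_fixed_window(events, start=0, end=3 * DAY)
anchored = plays_in_anchored_window(events)
print(fixed)
print(anchored)
```

With the absolute window, bob's play is missed even though it happens two days after his own download; with the anchored window it is counted, while alice's second play falls outside her own three-day window.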
These are actually complementary: in the game example, you would probably use an ESP layer to compute the number of successes you have during a sliding window of two weeks.
As you can guess, one of the fundamental findings of this new research was that the stream processing infrastructure was not really needed for this kind of problem. The new approach involved automata and a complete temporal logic for defining transitions between states.
OK, so they are not really the same, are they?
Indeed! The goal is different, the tooling and the deployment models are not the same but, more than anything else, they do not solve the same problems!
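To give a feeling for the automaton-based approach, here is a deliberately simplified Python sketch of the “success” pattern from the game example, for a single user. It is not how SASE, TESLA or NEXT are implemented: the class SuccessAutomaton and its state names are hypothetical, the “at least 5 friends” and “between 1 and 10 artifacts” conditions are left out, and timeouts are only checked when the next event arrives.

```python
DAY = 24 * 3600  # seconds


class SuccessAutomaton:
    """download -> enough plays and a share within 3 days -> purchase 1 to 2 days after the share."""

    def __init__(self, min_plays: int = 10) -> None:
        self.state = "WAIT_DOWNLOAD"
        self.min_plays = min_plays
        self.download_ts = None  # temporal anchor of the whole pattern
        self.share_ts = None     # temporal anchor of the purchase window
        self.plays = 0

    def feed(self, ts: int, kind: str) -> str:
        """Consume one event and return the resulting state."""
        if self.state == "WAIT_DOWNLOAD":
            if kind == "download":
                self.download_ts = ts
                self.state = "ENGAGING"
        elif self.state == "ENGAGING":
            if ts - self.download_ts > 3 * DAY:
                self.state = "FAILED"  # the 3-day window after download expired
            else:
                if kind == "play":
                    self.plays += 1
                elif kind == "share" and self.share_ts is None:
                    self.share_ts = ts
                if self.plays >= self.min_plays and self.share_ts is not None:
                    self.state = "WAIT_PURCHASE"
        elif self.state == "WAIT_PURCHASE":
            delay = ts - self.share_ts
            if delay > 2 * DAY:
                self.state = "FAILED"  # no purchase within 2 days of the share
            elif kind == "purchase" and delay >= 1 * DAY:
                self.state = "MATCHED"
        return self.state


# min_plays is lowered to 2 only to keep the trace short.
automaton = SuccessAutomaton(min_plays=2)
trace = [
    (0, "download"),
    (3600, "play"),
    (1 * DAY, "play"),
    (2 * DAY, "share"),
    (2 * DAY + 3600, "purchase"),  # too early: less than 1 day after the share
    (3 * DAY + 3600, "purchase"),  # inside the 1-to-2-day window
]
states = [automaton.feed(ts, kind) for ts, kind in trace]
print(states)

late = SuccessAutomaton(min_plays=2)
late.feed(0, "download")
late_state = late.feed(4 * DAY, "play")  # arrives after the 3-day window
```

Each transition is guarded by a temporal condition relative to an earlier event of the same match, which is exactly what a sliding window over the stream does not give you.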
At EURA NOVA, we have two activities related to these topics. First, our consultants had the opportunity to design, develop and optimize a Pattern Matching Engine for a Telecom Equipment Provider. Going further, EURA NOVA launched a PhD thesis two years ago on extending the Chronicle concept (developed by Orange Labs back in 2000) with Pattern Matching features in order to make the temporal expressiveness richer.
Do not hesitate to share your context-aware challenges with us!
 Arasu, Arvind, et al. “STREAM: the stanford stream data manager (demonstration description).” Proceedings of the 2003 ACM SIGMOD international conference on Management of data. ACM, 2003. – http://dl.acm.org/citation.cfm?id=872854
 Abadi, Daniel, et al. “Aurora: a data stream management system.” Proceedings of the 2003 ACM SIGMOD international conference on Management of data. ACM, 2003. – http://dl.acm.org/citation.cfm?id=872855
 Abadi, Daniel J., et al. “The Design of the Borealis Stream Processing Engine.” CIDR. Vol. 5. 2005. – http://www.cs.harvard.edu/~mdw/course/cs260r/papers/borealis-cidr05.pdf
 Brenna, Lars, et al. “Cayuga: a high-performance event processing engine.” Proceedings of the 2007 ACM SIGMOD international conference on Management of data. ACM, 2007. – http://dl.acm.org/citation.cfm?id=1247620
 Gyllstrom, Daniel, et al. “SASE: Complex event processing over streams.” arXiv preprint cs/0612128 (2006). – http://arxiv.org/abs/cs/0612128
 Cugola, Gianpaolo, and Alessandro Margara. “TESLA: a formally defined event specification language.” Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems. ACM, 2010. – http://dl.acm.org/citation.cfm?id=1827427
 Schultz-Møller, Nicholas Poul, Matteo Migliavacca, and Peter Pietzuch. “Distributed complex event processing with query rewriting.” Proceedings of the Third ACM International Conference on Distributed Event-Based Systems. ACM, 2009. – http://dl.acm.org/citation.cfm?id=1619264
 Dousson, Christophe, and Pierre Le Maigat. “Chronicle recognition improvement using temporal focusing and hierarchization.” Proceedings of the 20th international joint conference on Artifical intelligence. Morgan Kaufmann Publishers Inc., 2007. – http://dl.acm.org/citation.cfm?id=1625326