Next generation BI – Research overview (Part 1)

Last week I attended the European Business Intelligence Summer School (eBISS 2011) in Paris. Its objective was to give a complete overview of research and developments in BI, as seen by best-of-breed researchers and industry practitioners. For newcomers to BI, I recommend starting with the Wikipedia page (what a collaborative, web 2.0 world!).

Basically, the good old OLAP cube and its queries will see a lot of evolution, not only in architecture but also in paradigms, techniques and approaches. Let me briefly introduce the main evolutions (with no more than one reference per topic, to keep the post short):

Trajectory BI [1]: finding trajectories in a DW containing mobile data (x, y, t). The idea is to collect data about moving objects such as cars, people, etc. From this raw data we should be able to define trajectories, whose definition can vary depending on the temporal and spatial gaps between points, the maximum speed, etc. Trajectory mining can support traffic management recommendations, routing optimization, advertising, behavioural pattern detection, etc. In addition, this kind of mining requires a new expression language to query trajectory DWs.

Example of trajectory data (Vaisman, eBISS 2011)
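
To make the trajectory definition more concrete, here is a minimal Python sketch of how raw (x, y, t) samples of one moving object could be cut into trajectories. It is not the method of [1]: the function name `split_trajectories` and the thresholds `max_gap_s`, `max_dist` and `max_speed` are assumptions chosen for illustration, and the units (metres, seconds) are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    t: float  # timestamp, assumed to be in seconds


def split_trajectories(points, max_gap_s=300.0, max_dist=1000.0, max_speed=50.0):
    """Split the samples of one moving object into trajectories.

    A new trajectory starts when the temporal or spatial gap to the previous
    kept sample exceeds a threshold. Samples implying an implausible speed
    are treated as noise and dropped. All thresholds are illustrative.
    """
    trajectories, current = [], []
    for p in sorted(points, key=lambda q: q.t):
        if current:
            prev = current[-1]
            dt = p.t - prev.t
            dist = hypot(p.x - prev.x, p.y - prev.y)
            if dt <= 0 or dist / dt > max_speed:
                continue  # duplicate timestamp or impossible jump: skip sample
            if dt > max_gap_s or dist > max_dist:
                trajectories.append(current)  # gap too large: close trajectory
                current = []
        current.append(p)
    if current:
        trajectories.append(current)
    return trajectories


samples = [
    Point(0, 0, 0), Point(10, 0, 10), Point(20, 0, 20),
    Point(5000, 0, 21),                      # noisy sample, dropped
    Point(30, 0, 1000), Point(40, 0, 1010),  # long pause, new trajectory
]
for trajectory in split_trajectories(samples):
    print([(p.x, p.y, p.t) for p in trajectory])
```

Changing the thresholds changes which trajectories exist, which is exactly why the definition has to be a parameter of the analysis rather than a fixed property of the data.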

Recommended BI [2]: introducing recommendation to filter the data set shown to users according to their profile (which is mainly a set of preferences). The main idea is to re-formulate a query based on user preferences: for a given request and user profile, the framework compares the current session with previous request sessions to find the most similar ones. Based on those similar sessions, it re-formulates the request to complete or filter the result data.
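
The session comparison can be sketched in a few lines of Python. This is only an illustration, not the approach of [2]: a session is assumed to be a list of queries, each query a set of textual fragments such as "year=2010", and Jaccard similarity is used here as a stand-in similarity measure. The names `jaccard`, `flatten` and `recommend` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def flatten(session):
    """Collect all query fragments used anywhere in a session."""
    return set().union(*session)


def recommend(current_session, past_sessions):
    """Return the most similar past session and the fragments it would add."""
    if not past_sessions:
        return None, []
    current = flatten(current_session)
    best_name, best_session = max(
        past_sessions.items(),
        key=lambda item: jaccard(current, flatten(item[1])),
    )
    # Fragments seen in the similar session but not yet in the current one
    # are candidates to complete (or further filter) the current request.
    return best_name, sorted(flatten(best_session) - current)


past = {
    "s1": [{"region=EU", "year=2010"},
           {"region=EU", "year=2010", "product=bikes"}],
    "s2": [{"region=US", "year=2009"}],
}
current = [{"region=EU", "year=2010"}]
print(recommend(current, past))
```

Here the current session looks most like "s1", so the sketch suggests extending the request with the product restriction that the earlier session ended up using.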

Collaborative BI [3]: making different enterprises collaborate to create common knowledge, without integrating their DWs. The idea is to consider a P2P network in which each peer is a DW. If different DWs hold similar data, the authors propose semantic mapping techniques that bind attributes representing the same concepts. As a result, a user can issue a request on one peer, which will (1) execute the query locally and (2) re-formulate the request according to the mappings and forward it to other peers.
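
The two steps (local execution, then re-formulation for other peers) can be illustrated with a toy Python sketch. It is not the BIN framework of [3]: each peer is assumed to be an in-memory list of rows, a mapping is assumed to be a plain attribute-renaming dictionary, and member values (such as country codes) are assumed to agree across peers. The function names are hypothetical.

```python
from collections import defaultdict


def run_query(rows, group_by, measure):
    """Sum `measure` per value of `group_by` over one peer's rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


def reformulate(query, mapping):
    """Rename the query's attributes for a peer; None if one is unmapped."""
    try:
        return {role: mapping[attr] for role, attr in query.items()}
    except KeyError:
        return None


def federated_query(local_rows, peers, query):
    """(1) Run the query locally, (2) re-formulate it for each peer, merge."""
    merged = defaultdict(float)
    for key, value in run_query(local_rows, **query).items():
        merged[key] += value
    for peer_rows, mapping in peers:
        peer_query = reformulate(query, mapping)
        if peer_query is None:
            continue  # no semantic mapping for this peer: skip it
        for key, value in run_query(peer_rows, **peer_query).items():
            merged[key] += value
    return dict(merged)


local = [{"country": "FR", "revenue": 10.0}, {"country": "BE", "revenue": 5.0}]
peer = ([{"nation": "FR", "sales": 7.0}],
        {"country": "nation", "revenue": "sales"})
query = {"group_by": "country", "measure": "revenue"}
print(federated_query(local, [peer], query))
```

A real mapping would also have to handle differences in granularity and in member values, which a simple renaming dictionary does not cover.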

Spatio-temporal OLAP [4]: the idea is to improve DW design for managing spatial data that changes over time. This requires new paradigms to design the OLAP hypercube and new predicates extending standard SQL to express notions such as containment, crossing, overlap, etc. Applications of spatio-temporal OLAP can be found in territory management, ecological studies, epidemic studies, etc.
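
As a rough illustration of such predicates, the Python sketch below models a spatial dimension member whose geometry changes over time and rolls facts up to the geometry valid at the fact's date. It is a simplification built on assumptions, not the model of [4]: geometries are axis-aligned rectangles, validity is expressed in years, and the names `Zone`, `valid_at`, `contains`, `overlaps` and `roll_up` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    valid_from: int  # first year this geometry is valid (inclusive)
    valid_to: int    # year this geometry stops being valid (exclusive)


def valid_at(zone, year):
    return zone.valid_from <= year < zone.valid_to


def contains_point(zone, x, y):
    return zone.xmin <= x <= zone.xmax and zone.ymin <= y <= zone.ymax


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
            and outer.xmax >= inner.xmax and outer.ymax >= inner.ymax)


def overlaps(a, b):
    """True if the interiors of the two rectangles intersect."""
    return (a.xmin < b.xmax and b.xmin < a.xmax
            and a.ymin < b.ymax and b.ymin < a.ymax)


def roll_up(facts, zones):
    """Sum fact values per zone, using the geometry valid in the fact's year."""
    totals = {}
    for x, y, year, value in facts:
        for zone in zones:
            if valid_at(zone, year) and contains_point(zone, x, y):
                totals[zone.name] = totals.get(zone.name, 0) + value
    return totals


zones = [
    Zone("District A", 0, 0, 10, 10, 2000, 2010),
    Zone("District A", 0, 0, 20, 10, 2010, 2020),  # boundary extended in 2010
    Zone("District B", 10, 0, 20, 10, 2000, 2010),
]
facts = [(15, 5, 2005, 3), (15, 5, 2015, 4), (5, 5, 2005, 1)]  # (x, y, year, value)
print(roll_up(facts, zones))
```

The same location is aggregated under District B in 2005 and under District A in 2015, which is the kind of behaviour a purely non-temporal spatial dimension cannot express.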

Web-scale analytics [5]: this new kind of data mining involves the evolution of DW processing systems toward distributed systems. The presenters pointed to HDFS, Hadoop MR and Hive as a potentially good way to analyse and mine a new dimension of data. Indeed, the point is that relational DBs are suited to a few hundred TB of structured data. However, as soon as you have to deal with hundreds of thousands of TB and with unstructured data (such as most of the data on the web), you need another kind of architecture, more distributed and based on parallel programming. Several applications are already needed today, such as opinion mining on the web, social rating, etc.

To this end, several research efforts use HDFS (the open source implementation of GFS) as the distributed storage, Hadoop MR (the open source implementation of Google MR) as the parallel processing framework, and Hive (the SQL-like layer on top of Hadoop) to mine data at a higher level of abstraction. But this kind of approach requires re-thinking the way we mine data. You can find an example of this new mining approach in the GoOLAP search and mining engine [6].
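
To show what re-thinking the mining means in practice, here is a small single-process Python imitation of the map, shuffle and reduce phases, applied to a naive opinion-mining count. It does not use the Hadoop API and says nothing about how GoOLAP works: the word lists, the record layout (product, review text) and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Tiny illustrative sentiment lexicons (assumed, not from any real system).
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def map_phase(record):
    """Emit ((product, polarity), 1) for every opinion word in a review."""
    product, text = record
    for word in text.lower().split():
        if word in POSITIVE:
            yield (product, "positive"), 1
        elif word in NEGATIVE:
            yield (product, "negative"), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)


def map_reduce(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


reviews = [
    ("phone", "great screen but poor battery"),
    ("phone", "love it great value"),
    ("laptop", "bad keyboard"),
]
print(map_reduce(reviews))
```

Because each call to `map_phase` only looks at one record and each call to `reduce_phase` only looks at one key, both phases can be spread over many machines, which is the property that a distributed framework exploits.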

However, at least for a transition period that may last a while, this new kind of architecture has to coexist with existing enterprise systems such as ERP, CRM, SCM, etc. That can clearly be a bottleneck, but it is something we have to deal with.

I will continue with the other topics in the next part, stay tuned.

References

[1] J.P. Gardella, Leticia Gómez, and Alejandro Vaisman. Trajectory Sequential Patterns with Regular Expression Constraints Including Spatial Queries. 12th International Conference on Data Warehousing and Knowledge Discovery. 4th Alberto Mendelzon Workshop (AMW 2010). 2010.

[2] Arnaud Giacometti, Patrick Marcel, Elsa Negre, Arnaud Soulet. Query Recommendations for OLAP Discovery-Driven Analysis. IJDWM 7(2): 1-25 (2011)

[3] M. Golfarelli, F. Mandreoli, W. Penzo, S. Rizzi, E. Turricchia. BIN: Business Intelligence Networks. To appear in Business Intelligence Applications and the Web: Models, Systems and Technologies, IGI Group, 2012

[4] E. Malinowski and E. Zimányi. A Conceptual Model for Temporal Warehouses and its Transformation to the ER and the Object-Relational Models. Data & Knowledge Engineering, 64(1):101-133, 2008.

[5] A. Löser. Beyond Search, Web-Scale Business Analytics. WISE 2009: 5

[6] The GoOLAP Framework http://www.goolap.info/

[7] W.H. Inmon. DW 2.0: The Architecture for the Next Generation of Data Warehousing. Morgan Kaufmann Series in Data Management Systems, 2008.
