EURA NOVA is currently leading two PhD theses in collaboration with the CODE/WIT Lab of the Université Libre de Bruxelles. Both theses are supervised by Professor Esteban Zimànyi from ULB and Sabri Skhiri from EURA NOVA.
The goal of my thesis is to build a multidimensional analysis framework on top of NoSQL data, and most importantly graphs. Our paper, "An Analytics-Aware Conceptual Model For Evolving Graphs", was recently accepted for presentation at the 15th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2013). The conference will be held in Prague, Czech Republic, from 26 to 29 August 2013. DaWaK is a reference conference on data warehousing that brings together researchers working on BI-related topics.
In this post, I’ll give an overview of the research and contributions of this paper.
The need for real graph data analysis
Graphs are ubiquitous data structures commonly used to represent highly connected data. Moreover, graphs are considered one of the four major families of NoSQL data models. They have the benefit of revealing valuable insights from both the network structure and the data embedded within the structure. Complex real-world problems, such as intelligent transportation and social and biological network analysis, can be abstracted and solved using graph structures and algorithms.
In this paper, we introduce a new graph modeling approach for the effective analysis of evolving graphs, which keep track of how attributes and topology change over time. In the corresponding non-evolving graph, the earlier information is discarded whenever the attributes or the topology change.
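To make the difference concrete, here is a minimal sketch of a node that keeps its attribute history instead of overwriting it. This is only an illustration, not the model from the paper: the class `EvolvingNode`, its methods, and the integer timestamps are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class EvolvingNode:
    """Hypothetical node that records every attribute value with its timestamp."""
    node_id: str
    # attribute name -> list of (timestamp, value) pairs
    history: Dict[str, List[Tuple[int, Any]]] = field(default_factory=dict)

    def set(self, name: str, value: Any, timestamp: int) -> None:
        # Append instead of overwriting, so earlier values stay queryable.
        self.history.setdefault(name, []).append((timestamp, value))

    def value_at(self, name: str, timestamp: int) -> Any:
        """Return the latest value recorded at or before `timestamp`, or None."""
        result = None
        for ts, value in sorted(self.history.get(name, []), key=lambda p: p[0]):
            if ts <= timestamp:
                result = value
        return result


user = EvolvingNode("u1")
user.set("city", "Brussels", timestamp=2011)
user.set("city", "Prague", timestamp=2013)

past_city = user.value_at("city", 2012)      # value before the 2013 change
current_city = user.value_at("city", 2013)   # value after the change
```

A non-evolving graph would only be able to answer the second query, since the 2011 value would have been overwritten.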
Conceptual Modeling of Graph Data
The aim of our model is to provide analysts with a set of simple, well-defined, and adaptable conceptual components to perform rich analysis tasks on heterogeneous evolving graphs.
The following figure introduces an ER-equivalent conceptual meta-model for evolving graphs. Multiplicities and relationships between nodes are modeled using UML notation.
The paper explains in detail each of the components shown in the figure. Furthermore, it introduces the set of operators used to handle these data structures: selection, projection and traversal.
Combining these operators is useful in multiple analysis scenarios, such as trend forecasting and recommendation. A set of examples is given throughout the paper.
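As a rough intuition for what these three operators do, here is a small sketch on a toy heterogeneous graph. The formal definitions are in the paper; the function names, the dictionary-based node representation, and the sample data below are assumptions made purely for illustration.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Set, Tuple

Node = Dict[str, Any]          # e.g. {"id": "u1", "type": "User", "age": 31}
Edge = Tuple[str, str, str]    # (source id, label, target id)


def select(nodes: List[Node], predicate: Callable[[Node], bool]) -> List[Node]:
    """Selection: keep only the nodes that satisfy the predicate."""
    return [n for n in nodes if predicate(n)]


def project(nodes: List[Node], attributes: List[str]) -> List[Node]:
    """Projection: keep only the listed attributes (plus the id) of each node."""
    keep = ["id"] + [a for a in attributes if a != "id"]
    return [{k: n[k] for k in keep if k in n} for n in nodes]


def traverse(edges: List[Edge], start: str, label: str, max_hops: int) -> Set[str]:
    """Traversal: ids reachable from `start` within `max_hops` edges carrying `label`."""
    reached: Set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        current, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for src, lbl, dst in edges:
            if src == current and lbl == label and dst not in reached and dst != start:
                reached.add(dst)
                frontier.append((dst, hops + 1))
    return reached


# Hypothetical sample data.
nodes = [
    {"id": "u1", "type": "User", "age": 31},
    {"id": "u2", "type": "User", "age": 24},
    {"id": "p1", "type": "Product", "price": 10},
]
edges = [("u1", "follows", "u2"), ("u2", "follows", "u3"), ("u1", "bought", "p1")]

users = select(nodes, lambda n: n["type"] == "User")
ages = project(users, ["age"])
two_hops = traverse(edges, "u1", "follows", max_hops=2)
```

Chaining the operators, for instance selecting users, traversing their "follows" edges and projecting the attributes of interest, is the kind of combination a recommendation scenario would rely on.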
Multidimensional Analysis of Graph Data
Current approaches to OLAP on graphs are limited to homogeneous and static graphs [1, 2]. However, real-world graphs are heterogeneous and dynamic. We propose extending the current state of the art to support heterogeneous evolving graphs.
The fourth part of the paper gives an overview of the multidimensional analysis stack for an evolving network. The major OLAP concepts are redefined for heterogeneous evolving graphs. Structures such as dimensions and measures, as well as operations such as slice and dice and roll-up, are defined using the structures of the conceptual model described above.
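To give a feel for what slice and roll-up can mean on a graph, here is a simplified sketch. It is not the definition used in the paper: the toy data, the city-to-country hierarchy, the edge count used as the measure, and the function names are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy data: each user has a city; edges are dated interactions.
user_city: Dict[str, str] = {"u1": "Brussels", "u2": "Ghent", "u3": "Prague"}
city_country: Dict[str, str] = {
    "Brussels": "Belgium",
    "Ghent": "Belgium",
    "Prague": "Czech Republic",
}
interactions: List[Tuple[str, str, int]] = [  # (source, target, year)
    ("u1", "u2", 2012),
    ("u1", "u3", 2013),
    ("u2", "u3", 2013),
    ("u2", "u1", 2013),
]


def slice_by_year(edges: List[Tuple[str, str, int]], year: int) -> List[Tuple[str, str, int]]:
    """Slice: fix one value of the time dimension."""
    return [e for e in edges if e[2] == year]


def roll_up(
    edges: List[Tuple[str, str, int]], node_to_group: Dict[str, str]
) -> Dict[Tuple[str, str], int]:
    """Roll-up: merge nodes into groups and count the edges between groups (the measure)."""
    counts: Counter = Counter()
    for src, dst, _year in edges:
        counts[(node_to_group[src], node_to_group[dst])] += 1
    return dict(counts)


# Aggregate at the city level, then climb the hierarchy to the country level.
by_city = roll_up(interactions, user_city)
user_country = {user: city_country[city] for user, city in user_city.items()}
by_country = roll_up(interactions, user_country)
by_country_2013 = roll_up(slice_by_year(interactions, 2013), user_country)
```

Each result is itself a small aggregated graph: the groups are the nodes, and the counts are weights on the edges between them.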
What’s Next?
Multidimensional analysis of real graph data still presents many challenges. In this paper, we laid the foundation for a data model for graph analysis. Our focus is on the OLAP analysis level. We believe that much research remains to be done on the ETL layer. A declarative, user-friendly language is also needed to support MDX-like queries on top of graphs.
The conference will be a great occasion to meet and gather feedback from other researchers working on related topics.
The accepted scientific paper is available for download here.
For more details about any of the above topics, don’t hesitate to comment or email me!
Amine Ghrab
Twitter: @AmineGhrab
References
[1] Zhao, Peixiang, et al. “Graph cube: on warehousing and OLAP multidimensional networks.” Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, ACM (2011): 853–864
[2] Chen, Chen, et al. “Graph OLAP: a multi-dimensional framework for graph data analysis.” Knowledge and Information Systems 21.1 (2009): 41–63