Eura Nova RD
Eura Nova

Activity

In this section you will find EURA NOVA’s latest news and activities.

02-11-2015

Graph Data Management: Status and Trends

Today’s social environments are becoming more interconnected, and the business market is becoming increasingly open and competitive. Organisations require a better awareness of their state and an accurate prediction of their evolution. To cope with this surging demand, new models and tools need to be developed. In my opinion, graph models are of crucial interest for addressing these challenges.


To get a better understanding of the broad field of graph management, I attended classes at the EDBT Summer School. The classes dealt with graph data management and analytics.
The school was held at “Mas Gorgoll” house (see picture), in the wonderful coastal town of Palamos on the Costa Brava in Spain. The lectures were given by leading researchers and experts in graph management. Participants were highly qualified and were mostly PhD students and industrial experts.

Venue of the summer school

In this post, I’ll share my thoughts on the current status of graph management, discuss some open questions, and suggest promising research directions.

Current Research Directions

Graphs have the advantage of naturally capturing both the content of entities and the complex relationships between them. This makes it possible to generate deeper insights from the data and to gain a clear competitive advantage.

Graph Applications

The scope of domains and applications covered by graphs is remarkably large. Various fields such as social, informational and transportation networks can be naturally represented with graph structures [1]. Complex real-world problems, such as intelligent transportation, social and biological network analysis, trend prediction and product recommendation, can be efficiently solved using graph algorithms [2]. Consequently, the topic of graph management and analytics has been well received by industry. Gartner qualifies graph analytics as “possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture” [3]. Forrester Research predicts that by 2017, 25% of enterprises will use graph databases [4].

Graphs vs Relational Model

The relational model is a well-established and widely accepted model for data management. It is therefore legitimate to first ask whether a relational system is well suited to graph workloads. Conceptually, any data model can be projected onto the structures of another, so a graph can be stored relationally by using one table for each type of node and edge, and graph traversals can then be simulated through joins. However, in my opinion, although the relational model and tools can handle graphs to some extent, they seriously limit both the intuitiveness of modelling and the performance of processing. Graph workloads are centred on random graph traversals and matching operations, which are very different from relational workloads, themselves optimised for large sequential scans and aggregations. Therefore, graph management tools should be developed as standalone vertical systems that integrate with and complement their relational counterparts.
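
To make the idea of simulating traversals through joins concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (person, knows), the helper functions and the sample data are assumptions made up for illustration; they do not come from any particular system.

```python
import sqlite3


def build_example():
    """Create a tiny in-memory graph: one node table, one edge table.

    The schema and the rows are hypothetical illustration data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE knows (src INTEGER, dst INTEGER)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")],
    )
    # Directed edges forming the chain 1 -> 2 -> 3 -> 4.
    conn.executemany("INSERT INTO knows VALUES (?, ?)", [(1, 2), (2, 3), (3, 4)])
    return conn


def friends_of_friends(conn, person_id):
    """Two-hop traversal written as a self-join on the edge table."""
    rows = conn.execute(
        """
        SELECT DISTINCT e2.dst
        FROM knows AS e1
        JOIN knows AS e2 ON e1.dst = e2.src
        WHERE e1.src = ? AND e2.dst != ?
        ORDER BY e2.dst
        """,
        (person_id, person_id),
    ).fetchall()
    return [row[0] for row in rows]
```

Each additional hop adds one more self-join on the edge table, which hints at why deep or variable-length traversals become awkward to express and costly to run in this style.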

Graph Database Management and Analytics

As standalone systems, graph data management tools are required to provide intuitive models, reliable tools, and efficient frameworks that enable end-users to extract rich insights from their graph data.

From the modelling perspective, two major representations are currently adopted by the graph management community. The first one is the semantic web stack (made of the standard RDF data model, ontologies and the SPARQL query language). The initial purpose of this representation was to enable interoperability and reasoning on semantic web data. The graph is commonly represented using RDF triples and stored in triplestores. The second representation is the property graph. In my opinion, this is a less standardised, but more natural way to represent and query graph data. Property graphs underpin most commercial graph databases such as Neo4j and Titan.
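
The difference between the two representations can be sketched with plain Python data structures. This is only an illustration under assumed names: the ex: prefix, the identifiers, the since property and the two helper functions are all hypothetical, and no real RDF or graph database library is involved.

```python
# RDF style: every fact is a (subject, predicate, object) triple.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:knows", "ex:bob"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Property graph style: nodes and edges carry a label plus key/value properties.
nodes = {
    "alice": {"label": "Person", "props": {"name": "Alice"}},
    "bob": {"label": "Person", "props": {"name": "Bob"}},
}
edges = [
    # The edge itself holds a property, here a made-up "since" year.
    {"src": "alice", "dst": "bob", "label": "KNOWS", "props": {"since": 2012}},
]


def neighbours(edges, node_id, label):
    """Follow outgoing edges with the given label from one node."""
    return [e["dst"] for e in edges if e["src"] == node_id and e["label"] == label]
```

In the triple form, a node's attributes and its relationships are all statements of the same shape, whereas the property graph keeps attributes attached to the node or edge they describe.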

In graphs, relationships are considered first-class citizens. Therefore, graph processing comes with hard challenges that are currently being tackled by the scientific community. For example, problems such as graph colouring, graph pattern matching and graph partitioning are all known to be NP-hard in general. For graphs of moderate size (millions of nodes and edges), graph databases implement efficient indexing mechanisms to accelerate traversal and matching operations. Some frameworks, such as GraphChi and Oracle PGX, focus on efficient processing on a single machine by leveraging modern hardware.
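
As a rough illustration of why indexing helps traversals, the sketch below builds a simple adjacency index from an edge list and uses it for a bounded breadth-first traversal. It is a toy example under stated assumptions, not the mechanism of any particular graph database; the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency_index(edges):
    """Map each source vertex to the list of its direct successors.

    With this index, finding the neighbours of a vertex no longer
    requires scanning the whole edge list.
    """
    index = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)
    return index


def reachable_within(index, start, max_hops):
    """Breadth-first traversal returning vertices at most max_hops away."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in index.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen
```

Each expansion step costs time proportional to the degree of the visited vertex rather than to the total number of edges.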

However, for large graphs that cannot fit on a single machine, distributed processing is more complex. Graph computation is structure-driven and exhibits poor memory locality. This makes parallelism difficult and causes data transfer issues within distributed processing environments, which has called for novel programming models. Depending on the traversal characteristics, graph processing can be vertex-centric, edge-centric or graph-centric. The computation is performed either synchronously, as in the BSP (Bulk Synchronous Parallel) model implemented by Pregel-like frameworks, or asynchronously, as in GraphLab.
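
The vertex-centric, synchronous style can be sketched on a single machine as follows. This is a simplified simulation of BSP supersteps, not the API of Pregel or of any real framework: the function name and the choice of algorithm (labelling each vertex with the smallest vertex id in its connected component) are assumptions made for illustration. It expects an undirected graph in which every edge is listed in both directions and every neighbour is also a key of the dictionary.

```python
def bsp_min_label(adjacency):
    """Propagate the minimum vertex id through each connected component.

    adjacency: dict mapping each vertex to the list of its neighbours.
    Returns (labels, number_of_supersteps).
    """
    labels = {v: v for v in adjacency}

    # Superstep 0: every vertex sends its own label to its neighbours.
    inbox = {v: [] for v in adjacency}
    for v, nbrs in adjacency.items():
        for n in nbrs:
            inbox[n].append(labels[v])
    supersteps = 1

    # Later supersteps: only vertices that received messages do any work.
    while any(inbox.values()):
        outbox = {v: [] for v in adjacency}
        for v, messages in inbox.items():
            if not messages:
                continue  # vertex stays halted in this superstep
            best = min(messages)
            if best < labels[v]:
                labels[v] = best
                for n in adjacency[v]:
                    outbox[n].append(best)
        # Synchronisation barrier: messages sent in this superstep
        # become visible only in the next one.
        inbox = outbox
        supersteps += 1
    return labels, supersteps
```

The computation stops when a superstep produces no messages. An asynchronous model would instead let a vertex see updated neighbour values as soon as they are written, without waiting for the barrier.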

Another hot topic in graph management is graph visualisation. In graphs, relationships receive as much attention as the entities themselves, or even more. Therefore, choosing the graph layout best suited to help the human eye capture interesting graph patterns is a tricky problem. Multiple approaches focus on clustering the entities and reducing the crossings between relationships in a 2-D representation. Speaking of visualisation, I noticed that a lot of people still mix up graphs with graphics. I believe that this tendency will fade away as graph management matures.

What’s next in graph data management?

Although a lot of work has been devoted to advancing the state of the art in graph data management, many questions are still open.

At the database modelling level, there is still no completely defined graph database model, and no standard graph query language. Reference graph benchmarks are also required to help end-users choose suitable products according to their specific needs. The Linked Data Benchmark Council (LDBC) is leading an interesting effort in this direction. Another promising research direction is to bring the wide body of knowledge developed in graph theory and graph mining into graph management. There is also a need to enable a seamless integration of graph management systems within current enterprise information systems. Reference tools such as data warehouses, geographical systems and master data management can evolve considerably if they support graph data.

Finally, the key question of how graphs should be positioned, and for which domains and business applications they are critical, needs to be investigated thoroughly to help industry adopt them.

Graphs at EURA NOVA R&D

Our research team at EURA NOVA has been working on many of the challenges presented above. We designed models for graph database modelling [5] and OLAP analysis [6]. We built frameworks for graph management [7] and benchmarking [8]. We also investigated advanced graph applications such as the use of graphs for recommendation [9] and image retrieval [10].

We keep working with our industrial partners to bring efficient graph-based solutions that support their business and provide them with a true competitive advantage.

We are pursuing our research projects in promising directions such as distributed graph processing and graph BI. We are currently offering a number of master’s theses and internships under our academic programme. So if you feel excited about graphs, and would like to work in a unique environment where you can apply your academic skills to real industrial problems, please apply here!

References

[1] Mark Newman. Networks: an introduction. Oxford University Press, 2010.

[2] David Jonker and Richard Brath. Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data. Wiley, 2015.

[3] Gartner. IT Market Clock for Database Management Systems, 2014.

[4] Forrester Research. TechRadar™: Enterprise DBMS, Q1 2014.

[5] Amine Ghrab, S. Skhiri dit Gabouje, Salim Jouili, and E. Zimányi, An Analytics-Aware Conceptual Model For Evolving Graphs. In Proceedings of the 15th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2013), Prague, Czech Republic, August 2013.

[6] Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro Vaisman, and Esteban Zimányi, A Framework for Building OLAP Cubes on Graphs. In Proceedings of the 19th East-European Conference on Advances in Databases and Information Systems, Poitiers, France, September 2015.

[7] Salim Jouili, and Aldemar Reynaga, imGraph: A distributed in-memory graph database. In Proceedings of the 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.

[8] Salim Jouili, and Valentin Vansteenberghe, An empirical comparison of graph databases. In Proceedings of the 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.

[9] Daire O’Doherty, Salim Jouili, and Peter Van Roy, Towards trust inference in bipartite social networks. In Proceedings of the 2nd ACM SIGMOD Workshop on Databases and Social Networks, DBSocial 2012, Scottsdale, USA, ACM, June 2012.

[10] Salim Jouili, and Salvatore Tabbone, Hypergraph-based image retrieval for graph-based representation. Journal of the Pattern Recognition Society, April 2012. © 2012 Elsevier Ltd.
