
An empirical comparison of graph databases

In this blog post we briefly describe our new contribution to the big data domain, specifically to graph database benchmarking. This work was accepted for publication at the 2013 ASE/IEEE International Conference on Big Data.

In this work, we present a distributed graph database benchmarking framework. We used this tool to analyze the performance of four graph databases: Neo4j 1.9M05, Titan 0.3, OrientDB 1.3 and DEX 4.7.

We developed a distributed benchmarking framework in Java to test and compare different Blueprints-compliant graph databases. This tool can be used to simulate real graph database workloads, with any number of concurrent clients performing any type of operation on any type of graph.
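To give an idea of what "concurrent clients" means in practice, here is a minimal sketch that runs the same operation from several client threads and records one latency sample per operation. The class `ConcurrentClients` and its `run` method are hypothetical names chosen for this illustration; they are not taken from our framework, which may organize its clients differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not part of the framework): runs one operation from
// several concurrent clients and collects a latency sample per operation.
class ConcurrentClients {

    // Returns clients * opsPerClient latencies, in nanoseconds.
    static List<Long> run(int clients, int opsPerClient, Runnable operation) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int c = 0; c < clients; c++) {
                // Each client executes the operation opsPerClient times in a row.
                Callable<List<Long>> client = () -> {
                    List<Long> latencies = new ArrayList<>();
                    for (int i = 0; i < opsPerClient; i++) {
                        long start = System.nanoTime();
                        operation.run();
                        latencies.add(System.nanoTime() - start);
                    }
                    return latencies;
                };
                futures.add(pool.submit(client));
            }
            // Wait for every client and merge their samples.
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for clients", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A client failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real benchmark the `Runnable` would wrap a call to the database under test; here it can be any thread-safe action.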

The main purpose of our solution is to objectively compare graph databases using common graph operations. Operations such as exploring the neighborhood of a node, finding the shortest path between two nodes, or simply getting all vertices that share a specific property are frequent when working with graph databases. It is therefore useful to compare how the databases behave when performing these operations.
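For readers less familiar with these operations, the sketch below implements the three of them on a small in-memory graph. `ToyGraph` is a hypothetical stand-in written for this post: it is neither the Blueprints API nor one of the benchmarked databases, and it assumes an undirected, unweighted graph with integer vertex IDs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory graph used only for illustration (not the Blueprints API).
class ToyGraph {
    private final Map<Integer, Set<Integer>> adjacency = new LinkedHashMap<>();
    private final Map<Integer, Map<String, Object>> properties = new LinkedHashMap<>();

    void addVertex(int id, String key, Object value) {
        adjacency.computeIfAbsent(id, k -> new LinkedHashSet<>());
        properties.computeIfAbsent(id, k -> new HashMap<>()).put(key, value);
    }

    // Adds an undirected edge between a and b.
    void addEdge(int a, int b) {
        adjacency.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Neighborhood exploration: vertices reachable within maxHops, excluding start.
    Set<Integer> neighborhood(int start, int maxHops) {
        Set<Integer> visited = new LinkedHashSet<>();
        visited.add(start);
        List<Integer> frontier = Collections.singletonList(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                for (int n : adjacency.getOrDefault(v, Collections.emptySet())) {
                    if (visited.add(n)) next.add(n);
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }

    // Shortest path by breadth-first search; empty list if target is unreachable.
    List<Integer> shortestPath(int source, int target) {
        Map<Integer, Integer> parent = new HashMap<>();
        Queue<Integer> queue = new ArrayDeque<>();
        parent.put(source, source);
        queue.add(source);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            if (v == target) break;
            for (int n : adjacency.getOrDefault(v, Collections.emptySet())) {
                if (!parent.containsKey(n)) {
                    parent.put(n, v);
                    queue.add(n);
                }
            }
        }
        if (!parent.containsKey(target)) return Collections.emptyList();
        // Walk back from target to source, then reverse.
        List<Integer> path = new ArrayList<>();
        for (int v = target; v != source; v = parent.get(v)) path.add(v);
        path.add(source);
        Collections.reverse(path);
        return path;
    }

    // All vertices whose property `key` equals `value`, in insertion order.
    List<Integer> verticesWithProperty(String key, Object value) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, Object>> e : properties.entrySet()) {
            if (value.equals(e.getValue().get(key))) result.add(e.getKey());
        }
        return result;
    }
}
```

A real graph database answers the same questions through its own storage and index structures, which is precisely where the performance differences come from.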

Concretely, we compared the performance of the graph databases by means of different workloads. A workload represents a unit of work for a graph database. We defined three types of workloads:

1. Load workload

2. Traversal workload

  • Shortest path workload
  • Neighborhood exploration workload

3. Intensive workload

  • GET vertices/edges by ID
  • GET vertices/edges by property
  • GET vertices by ID and UPDATE property
  • GET two vertices and ADD an edge between them


Architecture:
Our solution works as follows: the user defines a benchmark, which contains a list of databases to compare followed by a series of operations (the workloads) to run on each database. The benchmark is then executed by a module called the Operational Module, which is responsible for starting the databases and measuring the time required to perform the operations specified in the benchmark. Finally, the Statistics Module gathers and aggregates all these measurements and produces a summary report together with visualizations of the results.
See the following figure:

[Figure: GDBoverviewReduced]
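The flow described above (benchmark definition, timed execution, aggregation) can be sketched in a few lines. Everything in `BenchmarkSketch` is a hypothetical simplification: databases are reduced to names, workloads to callbacks, and the two static methods only mirror the roles of the Operational Module and the Statistics Module rather than reproduce their actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of the benchmark flow; all names are illustrative.
class BenchmarkSketch {

    // One timing sample: which workload ran on which database, and for how long.
    static final class Measurement {
        final String database;
        final String workload;
        final long nanos;

        Measurement(String database, String workload, long nanos) {
            this.database = database;
            this.workload = workload;
            this.nanos = nanos;
        }
    }

    // Role of the Operational Module: run every workload on every database
    // and measure the elapsed time. Each workload receives the database name.
    static List<Measurement> execute(List<String> databases,
                                     Map<String, Consumer<String>> workloads) {
        List<Measurement> measurements = new ArrayList<>();
        for (String database : databases) {
            for (Map.Entry<String, Consumer<String>> w : workloads.entrySet()) {
                long start = System.nanoTime();
                w.getValue().accept(database);
                long elapsed = System.nanoTime() - start;
                measurements.add(new Measurement(database, w.getKey(), elapsed));
            }
        }
        return measurements;
    }

    // Role of the Statistics Module: aggregate the samples per
    // "database/workload" pair (count, min, max, average).
    static Map<String, LongSummaryStatistics> aggregate(List<Measurement> measurements) {
        Map<String, LongSummaryStatistics> summary = new LinkedHashMap<>();
        for (Measurement m : measurements) {
            summary.computeIfAbsent(m.database + "/" + m.workload,
                                    k -> new LongSummaryStatistics())
                   .accept(m.nanos);
        }
        return summary;
    }
}
```

Calling `execute` several times and passing the concatenated samples to `aggregate` yields per-pair statistics over repeated runs, which is the kind of input a summary report needs.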

If you would like to know more about the results, please read the paper “An empirical comparison of graph databases” and/or contact us.

Salim Jouili
Twitter: @jouilis
E-mail: salim.jouili@euranova.eu
