
An empirical comparison of graph databases

In this blog post, we briefly describe our new contribution to the big data domain, specifically to graph database benchmarking. This work was accepted for publication at the 2013 ASE/IEEE International Conference on Big Data.

In this work, we present a distributed graph database benchmarking framework. We used this tool to analyze the performance of four graph databases: Neo4j 1.9M05, Titan 0.3, OrientDB 1.3, and DEX 4.7.

We developed a distributed benchmarking framework in Java to test and compare different Blueprints-compliant graph databases. The tool can simulate real graph database workloads with any number of concurrent clients performing any type of operation on any type of graph.

The main purpose of our solution is to compare graph databases objectively using common graph operations. Operations such as exploring a node's neighborhood, finding the shortest path between two nodes, or simply getting all vertices that share a specific property are frequent when working with graph databases. It is therefore worthwhile to compare how the databases behave when performing these operations.
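To make the three operations named above concrete, here is a minimal Python sketch on a tiny in-memory graph. It is only an illustration: the `TinyGraph` class, its method names, and the sample data are hypothetical, and the actual framework is written in Java and runs these operations against real databases through Blueprints.

```python
from collections import deque


class TinyGraph:
    """Hypothetical in-memory undirected graph, used only for illustration."""

    def __init__(self):
        self.props = {}  # vertex id -> property dict
        self.adj = {}    # vertex id -> set of neighbor ids

    def add_vertex(self, vid, **props):
        self.props[vid] = props
        self.adj.setdefault(vid, set())

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighborhood(self, start, depth):
        """Vertices reachable from start in at most `depth` hops (start excluded)."""
        seen = {start}
        frontier = [start]
        for _ in range(depth):
            next_frontier = []
            for v in frontier:
                for n in self.adj[v]:
                    if n not in seen:
                        seen.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
        seen.discard(start)
        return seen

    def shortest_path(self, src, dst):
        """Breadth-first shortest path as a list of vertex ids, or None."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            if v == dst:
                path = []
                while v is not None:
                    path.append(v)
                    v = parent[v]
                return path[::-1]
            for n in self.adj[v]:
                if n not in parent:
                    parent[n] = v
                    queue.append(n)
        return None

    def vertices_by_property(self, key, value):
        """Sorted ids of all vertices whose property `key` equals `value`."""
        return sorted(v for v, p in self.props.items() if p.get(key) == value)


# Sample data (made up): a chain 1 - 2 - 3 - 4 plus an isolated vertex 5.
g = TinyGraph()
for vid, city in [(1, "Paris"), (2, "Brussels"), (3, "Paris"), (4, "Ghent"), (5, "Ghent")]:
    g.add_vertex(vid, city=city)
for a, b in [(1, 2), (2, 3), (3, 4)]:
    g.add_edge(a, b)

print(g.neighborhood(1, 2))
print(g.shortest_path(1, 4))
print(g.vertices_by_property("city", "Paris"))
```

A benchmark would time calls like these against each database rather than against an in-memory structure.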

Concretely, we compared the performance of the graph databases by means of different workloads. A workload represents a unit of work for a graph database. We defined three types of workloads:

1. Load workload

2. Traversal workload

  • Shortest path workload
  • Neighborhood exploration workload

3. Intensive workload

  • GET vertices/edges by ID
  • GET vertices/edges by property
  • GET vertices by ID and UPDATE property
  • GET two vertices and ADD an edge between them
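As a rough illustration of the intensive workload listed above, the following Python sketch lets several concurrent clients issue the four operation types against a shared store and records how long each call takes. Everything here is an assumption made for the example: `InMemoryStore`, `run_client`, and `run_intensive_workload` are hypothetical names, the store is a plain dictionary guarded by a lock, and the real framework is a Java tool driving actual graph databases.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for a graph database; a lock keeps it thread-safe."""

    def __init__(self, n_vertices):
        self.lock = threading.Lock()
        self.vertices = {i: {"group": i % 10, "visits": 0} for i in range(n_vertices)}
        self.edges = set()

    def get_vertex(self, vid):
        with self.lock:
            return dict(self.vertices[vid])

    def get_by_property(self, key, value):
        with self.lock:
            return [v for v, p in self.vertices.items() if p[key] == value]

    def get_and_update(self, vid, key, value):
        with self.lock:
            self.vertices[vid][key] = value

    def add_edge(self, a, b):
        with self.lock:
            self.edges.add((a, b))


def run_client(store, n_ops, seed):
    """One simulated client: run n_ops random operations, return (name, seconds) pairs."""
    rng = random.Random(seed)
    n = len(store.vertices)
    ops = {
        "get_by_id": lambda: store.get_vertex(rng.randrange(n)),
        "get_by_property": lambda: store.get_by_property("group", rng.randrange(10)),
        "get_and_update": lambda: store.get_and_update(rng.randrange(n), "visits", 1),
        "add_edge": lambda: store.add_edge(rng.randrange(n), rng.randrange(n)),
    }
    names = sorted(ops)
    timings = []
    for _ in range(n_ops):
        name = rng.choice(names)
        start = time.perf_counter()
        ops[name]()
        timings.append((name, time.perf_counter() - start))
    return timings


def run_intensive_workload(store, n_clients, ops_per_client):
    """Run all clients concurrently and collect every timing in one list."""
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = [pool.submit(run_client, store, ops_per_client, seed)
                   for seed in range(n_clients)]
        return [t for f in futures for t in f.result()]


store = InMemoryStore(n_vertices=100)
timings = run_intensive_workload(store, n_clients=4, ops_per_client=50)
print(len(timings), "operations timed")
```

Seeding each client separately keeps the operation mix reproducible across runs, which matters when the same workload has to be replayed against several databases.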

Our solution works as follows. The user defines a benchmark, which contains a list of databases to compare followed by a series of operations: the workloads to run on each database. The benchmark is then executed by the Operational Module, which is responsible for starting the databases and measuring the time required to perform the operations specified in the benchmark. Finally, the Statistics Module gathers and aggregates these measurements and produces a summary report together with visualizations of the results.
See the following figure:
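The flow described above (benchmark definition, Operational Module, Statistics Module) can also be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the framework's actual code: the names `Benchmark`, `operational_module`, `statistics_module`, and `open_database`, the `repetitions` field, and the database labels `db-a` and `db-b` are all hypothetical, and a dictionary stands in for each database.

```python
import statistics
import time
from dataclasses import dataclass


@dataclass
class Benchmark:
    databases: list       # names of the databases to compare
    workloads: list       # (workload name, callable taking a database handle)
    repetitions: int = 1  # assumed knob: how many times each workload is run


def operational_module(benchmark, open_database):
    """Run every workload on every database and record elapsed seconds."""
    measurements = []
    for db_name in benchmark.databases:
        db = open_database(db_name)
        for workload_name, workload in benchmark.workloads:
            for _ in range(benchmark.repetitions):
                start = time.perf_counter()
                workload(db)
                elapsed = time.perf_counter() - start
                measurements.append((db_name, workload_name, elapsed))
    return measurements


def statistics_module(measurements):
    """Aggregate raw timings into a per-(database, workload) summary."""
    grouped = {}
    for db_name, workload_name, seconds in measurements:
        grouped.setdefault((db_name, workload_name), []).append(seconds)
    return {
        key: {"runs": len(values), "mean_s": statistics.mean(values), "max_s": max(values)}
        for key, values in grouped.items()
    }


def open_database(name):
    # Placeholder: a real run would start the named database here.
    return {"name": name, "vertices": {}}


def load_workload(db):
    for i in range(1000):
        db["vertices"][i] = {"label": f"v{i}"}


def get_by_id_workload(db):
    for i in range(1000):
        _ = db["vertices"][i]


benchmark = Benchmark(
    databases=["db-a", "db-b"],
    workloads=[("load", load_workload), ("get_by_id", get_by_id_workload)],
    repetitions=3,
)
report = statistics_module(operational_module(benchmark, open_database))
for (db_name, workload_name), stats in sorted(report.items()):
    print(f"{db_name:5} {workload_name:10} runs={stats['runs']} mean={stats['mean_s']:.6f}s")
```

Keeping measurement and aggregation in separate functions mirrors the split between the two modules: the timing loop knows nothing about how results are summarized.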


If you would like to know more about the results, please read the paper “An empirical comparison of graph databases” and/or contact us.

Salim Jouili
Twitter: @jouilis
