In this blog post we briefly describe our new contribution to the Big Data domain, specifically graph storage and querying. This work was accepted for publication at the 2013 ASE/IEEE International Conference on Big Data.
Big Data has to deal with two key issues: the growing size of datasets and the increase in data complexity. Alternative database models such as graph databases are increasingly used to address this second problem, since graphs can model many interesting problems. Managing graph-structured data has gained significant attention in both industrial and research communities (see, e.g., the survey by Angles and Gutiérrez), due to the emergence of modern applications whose data is naturally represented as graphs, in areas such as social networks, transportation, biological networks and the semantic web. Indeed, the graph structure provides a flexible representation for modeling highly connected and dynamic data.
Managing and processing large graphs can be very challenging because the huge amount of data causes high I/O latency. Since data access on graphs is generally random (it exhibits no locality), traversal processing incurs a high number of I/O operations. Although memory caching can reduce the number of I/O operations, this lack of locality limits the effectiveness of caches on large graphs: the cache contents are frequently evicted, so the database engine has to perform many data reads on disk.
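The cache argument above can be made concrete with a toy simulation (our illustration, not imGraph or Neo4j code): under purely random vertex reads, an LRU cache's hit rate is roughly the fraction of the graph it can hold, so it collapses as the graph grows past the cache size.

```python
# Toy simulation: LRU cache hit rate under random (no-locality) vertex access.
# The sizes below are arbitrary assumptions chosen to show the effect.
import random
from collections import OrderedDict

def hit_rate(graph_size, cache_size, reads=50_000, seed=42):
    rng = random.Random(seed)
    cache = OrderedDict()                  # OrderedDict used as an LRU cache
    hits = 0
    for _ in range(reads):
        v = rng.randrange(graph_size)      # random access: no locality
        if v in cache:
            hits += 1
            cache.move_to_end(v)           # refresh LRU position
        else:
            cache[v] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / reads

small = hit_rate(graph_size=10_000,    cache_size=5_000)  # cache holds half the graph
large = hit_rate(graph_size=1_000_000, cache_size=5_000)  # cache holds 0.5% of it
print(f"small graph hit rate: {small:.2f}")   # close to 0.5
print(f"large graph hit rate: {large:.3f}")   # close to 0.005
```

Every cache miss stands for a disk read, which is why an all-in-memory design sidesteps the problem entirely.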
EURA NOVA's contribution
With these challenges in mind, we introduce a new graph database system called imGraph. We considered the random-access requirement of large graphs as a key factor in deciding the type of storage, and therefore designed a graph database where all data is stored in memory, so that the speed of random access is maximized. However, since large graphs cannot be completely loaded into the RAM of a single machine, we designed imGraph as a distributed graph database: the vertices and edges are partitioned into subsets, and each subset resides in the memory of one machine of the cluster (see the following figure). Furthermore, we implemented in imGraph a graph traversal engine that takes advantage of distributed parallel computing and fast in-memory random access to gain performance.
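The partitioning idea can be sketched as follows. This is a minimal illustration, not imGraph's actual placement scheme: we assume a hash-mod rule over a hypothetical 3-machine cluster, with each edge stored alongside its source vertex.

```python
# Sketch: partitioning a graph's vertices across the memory of several
# machines. The placement rule (hash of the vertex id, modulo the cluster
# size) is an assumption for illustration only.
import zlib

NUM_MACHINES = 3  # assumed cluster size

def machine_for(vertex_id):
    # deterministic hash so placement is stable across runs
    return zlib.crc32(vertex_id.encode()) % NUM_MACHINES

# each machine holds its subset of vertices (with adjacency lists) in RAM
partitions = {m: {} for m in range(NUM_MACHINES)}

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
for src, dst in edges:
    for v in (src, dst):
        partitions[machine_for(v)].setdefault(v, set())
    partitions[machine_for(src)][src].add(dst)  # edge kept with its source

total_vertices = sum(len(p) for p in partitions.values())
print(total_vertices)  # 4 distinct vertices, spread over the cluster
```

A traversal that follows an edge whose endpoint lives on another machine must then send a message to that machine, which is exactly the flow described next.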
The next figure shows the flow of messages among the machines of the cluster when a traversal is processed. The flow starts with a traversal request message sent from the client to the traversal manager. The traversal manager then sends a search request message to machine B, which in turn sends search request messages to machines A and C. While processing its request, machine A sends an additional search request message to machine C. All the machines then send response messages to the traversal manager, one response message for each request message processed. Finally, the traversal manager sends a traversal response to the client.
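The scenario above can be re-enacted with a toy message loop (a simplified reconstruction, not imGraph code). The forwarding table encodes exactly the figure's flow: the manager contacts B, B forwards to A and C, and A forwards once more to C; every request processed yields one response back to the manager.

```python
# Toy re-enactment of the traversal message flow described in the text.
from collections import deque

# which machine forwards search requests to which peers (the figure's scenario)
forwards = {"B": ["A", "C"], "A": ["C"], "C": []}

requests = deque(["B"])              # traversal manager -> machine B
processed, responses = [], 0
while requests:
    machine = requests.popleft()
    processed.append(machine)
    for peer in forwards.pop(machine, []):  # pop: each forwarding fires once
        requests.append(peer)
    responses += 1                   # one response per request processed

print(processed)   # ['B', 'A', 'C', 'C'] -- C handles two requests
print(responses)   # 4: one response message per search request
```

Note that machine C is visited twice (once from B, once from A), and the manager collects four responses in total, matching the one-response-per-request rule.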
The experiments were performed using four graph datasets: soc-Epinions (75K vertices and 508K edges), a directed graph representing the who-trusts-whom online social network of the general consumer review site Epinions.com; youtube (1.1M vertices and 3M edges), an undirected graph representing the YouTube social network; ukgraph (4.5M vertices and 66M edges), a directed graph representing a segment of a snapshot of the web network in the UK; and soc-LiveJournal (4.8M vertices and 69M edges), a directed graph representing LiveJournal, a free online community with almost 10 million members, a significant fraction of whom are highly active.
The next figure shows the results (log scale) of the traversal tests performed on the four datasets: the average execution time of 200 traversals of at most 3 hops between two vertices having at least one path. We compare the results of imGraph on 5 and 10 machines to those of Neo4j and of Titan on 5 and 10 machines.
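The core operation of this benchmark, finding whether a path of at most 3 hops links two vertices, can be sketched as a hop-bounded breadth-first search. This is our reconstruction of the test's logic, not the actual benchmark code.

```python
# Sketch: breadth-first search bounded by a maximum hop count, as used
# (conceptually) by the traversal benchmark described above.
from collections import deque

def within_hops(adj, start, target, max_hops=3):
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        v, depth = frontier.popleft()
        if v == target:
            return True
        if depth == max_hops:
            continue                      # do not expand past the hop limit
        for nxt in adj.get(v, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False

adj = {1: [2], 2: [3], 3: [4], 4: [5]}
print(within_hops(adj, 1, 4))  # True: 1->2->3->4 is 3 hops
print(within_hops(adj, 1, 5))  # False: 5 is 4 hops away
```

In a distributed setting, each expansion of the frontier may require a search request message to the machine owning the neighbouring vertex, which is where in-memory random access pays off.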
If we compare the results of imGraph and Neo4j, we can observe that Neo4j is faster than our implementation only on the smallest dataset (75K vertices); on the larger datasets, imGraph gets better results than Neo4j (up to 200× faster on ukgraph). We observe that Neo4j's performance is more negatively affected by large graphs than imGraph's. When the graph is not too large, Neo4j is able to load a considerable part of it into the cache, so the traversal runs almost entirely in memory. However, when the graph is larger, Neo4j can load only a small portion of it into the cache, so it has to read from disk more frequently.
Regarding Titan and imGraph, which both use distributed storage, we can observe that imGraph performs better (up to 150× faster on soc-LiveJournal) in all the traversal configurations for all the datasets. In this case, imGraph's distributed parallel computing makes the difference. As Titan traversals run on a single machine, a considerable amount of network communication is needed to obtain graph data; furthermore, Titan's computation power is limited to the capacity of a single machine. Another factor that makes imGraph perform better is the faster random access provided by its in-memory storage. Although Cassandra provides caching functionality, its cache cannot hold all the data of a large graph, so many disk reads are required.
imGraph is a fully in-memory distributed graph database optimized for complex graph operations (e.g. traversals). It delivers very promising results compared to alternative graph database solutions, and it implements the well-known Blueprints API from TinkerPop.
If you are interested in knowing more about imGraph, please read the paper imGraph: A distributed in-memory graph database and/or contact us.
R. Angles and C. Gutiérrez. Survey of graph database models. ACM Comput. Surv., 40(1), 2008.
S. Jouili and A. Reynaga. imGraph: A distributed in-memory graph database. To appear in Proceedings of the 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.