An empirical comparison of graph databases

In recent years, more and more companies provide services that can no longer be delivered efficiently using relational databases. As such, these companies are forced to use alternative database models such as XML databases, object-oriented databases, document-oriented databases and, more recently, graph databases. Graph databases have existed for only a few years. Although some comparisons have been attempted, most of them focus on only certain aspects.
In this paper, we present a distributed graph database comparison framework and the results we obtained by comparing four important players in the graph database market: Neo4j, OrientDB, Titan and DEX.

Salim Jouili, and Valentin Vansteenberghe, An empirical comparison of graph databases, proceedings of the 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.

Click here to access the paper.

An analytics-aware conceptual model for evolving graphs

Graphs are ubiquitous data structures commonly used to represent highly connected data. Many real-world applications, such as social and biological networks, are modeled as graphs. To meet the surging demand for graph data management, many graph database solutions were developed. These databases are commonly classified as NoSQL graph databases, and they provide better support for graph data management than their relational counterparts. However, each of these databases implements its own operational graph data model, which differs from product to product. Further, there is no commonly agreed-upon conceptual model for graph databases.
In this paper, we introduce a novel conceptual model for graph databases. The aim of our model is to provide analysts with a set of simple, well-defined, and adaptable conceptual components to perform rich analysis tasks. These components take into account the evolving aspect of the graph. Our model is analytics-oriented, flexible and incremental, enabling analysis over evolving graph data. The proposed model provides a typing mechanism for the underlying graph, and formally defines the minimal set of data structures and operators needed to analyze the graph.

Amine Ghrab, Sabri Skhiri, Salim Jouili, and Esteban Zimányi, An Analytics-Aware Conceptual Model For Evolving Graphs, proceedings of the 15th International Conference on Data Warehousing and Knowledge Discovery – DaWaK 2013, Prague, Czech Republic, August 2013.

Click here to access the paper.

Towards a standards-based cloud service manager

Migrating services to the cloud brings all the benefits of elasticity, scalability and cost-cutting. However, migrating services among different cloud infrastructures or outside of the cloud is not an obvious task. In addition, distributing services among multiple cloud providers, or on a hybrid installation, requires a custom implementation effort that must be repeated at each infrastructure change. This situation raises the lock-in problem and discourages cloud adoption. Cloud computing open standards were designed to address this situation and to bring interoperability and portability to cloud environments. However, they target isolated resources, and do not take into account the notion of complete services. In this paper, we introduce an extension to OCCI, a cloud computing open standard, in order to support complete service definition and management automation. We support this proposal with an open-source framework for service management through compliant cloud infrastructures.

Amine Ghrab, Sabri Skhiri, Hervé Kœner, and Guy Ledu, Towards A Standards-Based Cloud Service Manager, proceedings of the 3rd International Conference on Cloud Computing and Services Science, CLOSER 2013, Aachen, Germany, May 2013.

Click here to access the paper.

AROM: processing big data with data flow graphs and functional programming

Developments in computational processing have driven a move towards distributed processing frameworks that perform tasks in parallel setups. The recent advances in Cloud Computing have widely contributed to this tendency. The MapReduce model proposed by Google is one of the most popular, despite the well-known limitations inherent to the model, which constrain the types of jobs that can be expressed. On the other hand, models based on Data Flow Graphs (DFG) for the processing and the definition of the jobs, while more complex to express, are more general and suitable for a wider range of tasks, including iterative and pipelined tasks. In this paper, we present AROM, a framework for large-scale distributed processing which uses DFG to express the jobs and paradigms from functional programming to define the operators. The former leads to more natural handling of pipelined tasks while the latter enhances genericity and reusability of the operators, as shown by our tests on a parallel and pipelined job performing the calculation of PageRank.

Nam-Luc Tran, Sabri Skhiri, Esteban Zimányi, and Arthur Lesuisse. AROM: Processing Big Data With Data Flow Graphs and Functional Programming, proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science, IEEE CloudCom 2012. IEEE Computer Society Press, Taipei, Taiwan, December 2012.

Click here to access the paper.

Large graph mining: recent developments, challenges and potential solutions

With the recent growth of graph-based data, large graph processing is becoming more and more important. In order to explore and extract knowledge from such data, graph mining methods, like community detection, are a necessity. The legacy graph processing tools mainly rely on single-machine computational capacity, which cannot process large graphs with billions of nodes. Therefore, the main challenge for new tools and frameworks lies in the development of new paradigms that are scalable, efficient and flexible. In this paper, we review the new paradigms of large graph processing and their applications to graph mining domains using the distributed, shared-nothing approach used for large data by Internet players.

Sabri Skhiri, and Salim Jouili, Large Graph Mining: Recent Developments, Challenges and Potential Solutions, presentation during the European Business Intelligence Summer School (eBISS 2012) organized by the Université Libre de Bruxelles and the Ecole Centrale Paris, Brussels, Belgium, July 2012.

Click here to access the paper in its preprint form.

Trust-based recommendation: an empirical analysis

The use of trust in recommender systems has been shown to improve the accuracy of rating predictions, especially in the case where a user’s rating significantly differs from the average. Different techniques have been used to incorporate trust into recommender systems, each showing encouraging results. However, the lack of trust information available in public datasets has limited the empirical analysis of these techniques and trust-based recommendation in general, with most analysis limited to a single dataset.

In this paper, we provide a more complete empirical analysis of trust-based recommendation. By making use of a method that infers trust between users in a social graph, we are able to apply trust-based recommendation techniques to three separate datasets. From this, we measure the overall accuracy of each technique in terms of the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE), as well as the prediction coverage of each technique. We thus provide a comparison and analysis of each technique on all three datasets.
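
As an illustration of the three measures named above, the following minimal sketch computes MAE, RMSE and prediction coverage for a set of rating predictions. It is not taken from the paper: the function name `evaluate`, the dictionary layout and the convention that `None` marks a rating the technique could not predict are all assumptions made for this example.

```python
import math


def evaluate(predictions, actuals):
    """Return (MAE, RMSE, coverage) for a set of rating predictions.

    ASSUMPTIONS (illustrative, not from the paper):
    - actuals maps (user, item) -> true rating
    - predictions maps (user, item) -> predicted rating, or None (or a
      missing key) when the technique could not produce a prediction
    """
    if not actuals:
        raise ValueError("actuals must not be empty")

    # Keep only the errors for ratings that were actually predicted.
    errors = [
        predictions[key] - actuals[key]
        for key in actuals
        if predictions.get(key) is not None
    ]

    # Coverage: fraction of test ratings for which a prediction exists.
    coverage = len(errors) / len(actuals)
    if not errors:
        return float("nan"), float("nan"), coverage

    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse, coverage


if __name__ == "__main__":
    actuals = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0,
               ("u2", "i1"): 5.0, ("u2", "i3"): 3.0}
    predictions = {("u1", "i1"): 3.0, ("u1", "i2"): 2.0,
                   ("u2", "i1"): 3.0, ("u2", "i3"): None}
    print(evaluate(predictions, actuals))
```

RMSE penalises large individual errors more heavily than MAE, which is one reason the two are commonly reported together, while coverage shows how many of the test ratings a technique could predict at all.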

Daire O’Doherty, Salim Jouili, and Peter Van Roy, Trust-based recommendation: an empirical analysis, proceedings of the 6th ACM SIGKDD Workshop on Social Network Mining and Analysis SNA-KDD, Beijing, China, ACM, July 2012.

Click here to access the paper.

Towards trust inference from bipartite social networks

The emergence of trust as a key link between users in social networks has provided an effective means of enhancing the personalization of online user content. However, the availability of such trust information remains a challenge to the algorithms that use it, as the majority of social networks do not provide a means of explicit trust feedback. This paper presents an investigation into the inference of trust relations between actor pairs of a social network, based solely on the structural information of the bipartite graph typical of most on-line social networks. Using intuition inspired by real-life observations, we argue that the popularity of an item in a social graph is inversely related to the level of trust between actor pairs who have rated it. From an existing bipartite social graph, this method computes a new social graph, linking actors together by means of symmetric weighted trust relations. Through a set of experiments performed on a real social network dataset, our method produces statistically significant results, showing strong trust prediction accuracy.
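
The idea described above can be sketched as a projection of the bipartite actor-item graph onto a weighted actor-actor graph. The sketch below is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the weighting formula, so the `1 / popularity` contribution per shared item, the function name `infer_trust` and the input layout are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def infer_trust(ratings):
    """Project a bipartite (actor, item) graph onto a trust graph.

    ratings: iterable of (actor, item) pairs.
    Returns a dict mapping frozenset({a, b}) -> trust weight, so the
    relation is symmetric by construction.

    ASSUMPTION: each shared item contributes 1 / (number of raters),
    so popular items add less trust than rare ones. The paper's actual
    weighting is not stated in the abstract.
    """
    raters_by_item = defaultdict(set)
    for actor, item in ratings:
        raters_by_item[item].add(actor)

    trust = defaultdict(float)
    for raters in raters_by_item.values():
        if len(raters) < 2:
            continue  # an item with a single rater links nobody
        weight = 1.0 / len(raters)  # inverse popularity (assumed form)
        for a, b in combinations(sorted(raters), 2):
            trust[frozenset((a, b))] += weight
    return dict(trust)


if __name__ == "__main__":
    ratings = [
        ("alice", "rare"), ("bob", "rare"),
        ("alice", "hit"), ("bob", "hit"), ("carol", "hit"), ("dave", "hit"),
    ]
    for pair, weight in infer_trust(ratings).items():
        print(sorted(pair), weight)
```

In this toy input, alice and bob share both a rarely rated item and a popular one, so their inferred trust is higher than that of pairs who share only the popular item.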

Daire O’Doherty, Salim Jouili, and Peter Van Roy, Towards trust inference from bipartite social networks, proceedings of the 2nd ACM SIGMOD Workshop on Databases and Social Networks, DBSocial 2012, Scottsdale, USA, ACM, June 2012.

Click here to access the paper.

Hypergraph-based image retrieval for graph-based representation

In this paper, we introduce a novel method for graph indexing. We propose a hypergraph-based model for graph data sets by allowing cluster overlapping. More precisely, in this representation one graph can be assigned to more than one cluster. Using the concept of the graph median and a given threshold, the proposed algorithm automatically detects the number of classes in the graph database. We consider clusters as hyperedges in our hypergraph model and we index the graph set by the hyperedge centroids. This model is well suited to traversing the data set and efficient for retrieving graphs.

Salim Jouili, and Salvatore Tabbone, Hypergraph-based image retrieval for graph-based representation. Journal of the Pattern Recognition Society, April 2012. © 2012 Elsevier Ltd.

Click here to access the paper.

EQS: an elastic and scalable message queue for the cloud

With the emergence of cloud computing, on-demand resource usage is made possible. This allows applications to elastically scale out according to the load. One design pattern that suits this paradigm is the event-driven architecture (EDA) in which messages are sent asynchronously between distributed application instances using message queues. However, existing message queues are only able to scale for a certain number of clients and are not able to scale out elastically. We present the Elastic Queue Service (EQS), an elastic message queue architecture and a scaling algorithm which can be adapted to any message queue in order to make it scale elastically. The EQS architecture is layered onto multiple distributed components and its management components can be integrated with the cloud infrastructure management. We have implemented a prototype of EQS and deployed it on a cloud infrastructure. A series of load tests has validated our elastic scaling algorithm and shows that EQS is able to scale out in order to adapt to an applied load. We then discuss the elastic scaling of the management layers of EQS and their possible integration with the cloud infrastructure management.

Nam-Luc Tran, Sabri Skhiri, and Esteban Zimányi, EQS: An Elastic and Scalable Message Queue for the Cloud, proceedings of the 3rd International IEEE conference on Cloud computing technology and science (IEEE CloudCom 2011), Athens, Greece, November 2011.

Click here to access the paper.

Governance issues on heavy models in an industrial context

SWIFT is a member-owned cooperative providing secure messaging capabilities to the financial services industry. One critical mission of SWIFT is the standardization of the message flows between the industry players. The model-driven approach naturally came as a solution to the management of these message definitions. However, one of the most important challenges that SWIFT has been facing is the global governance of the message repository and the management of each element. Nowadays, modeling tools exist, but none of them enables the management of the complete life-cycle of the message models. In this paper, we present the challenges that SWIFT had to face in the development of a dedicated platform.

Sabri Skhiri, Marc Delbaere, Yves Bontemps, Grégoire de Hemptinne, and Nam-Luc Tran, Governance issues on heavy models in an industrial context. Advances in Conceptual Modeling. Recent Developments and New Directions ER 2011, Brussels, Belgium, November 2011.

Click here to access the paper.