Due to the increasing importance and volume of highly interconnected data, such as in social or information networks, a plethora of graph mining techniques have been designed to enable the analysis of such data. In this work, we focus on the mining of associations between entity features in networks. We model each entity feature as a dimension to be analyzed. Consequently, we build our approach on top of the existing graph cube framework, which extends the concept of the data cube to networks. Our task is particularly challenging because it requires the analysis of both the initial multidimensional network and all its subsequent aggregate forms. In a big data setting, it is impossible for an analyst to manually consider all the possible views of the network data. The aim of this work is to design an algorithm for the discovery of interesting patterns in large graph cubes. Thus, instead of examining all the possible aggregations manually, the analyst is led by the proposed technique to the interesting associations or patterns in the multidimensional network. Furthermore, we study the application of existing algorithms from the frequent itemset mining literature to graph data and propose a mapping between the two settings.
Florian Demesmaeker, Amine Ghrab, Siegfried Nijssen, Sabri Skhiri: Discovering interesting patterns in large graph cubes. 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 2017, pp. 3322-3331.
Graphs are widespread structures providing a powerful abstraction for modeling networked data. Large and complex graphs have emerged in various domains such as social networks, bioinformatics, and chemical data. However, current warehousing frameworks are not equipped to handle efficiently the multidimensional modeling and analysis of complex graph data. In this paper, we propose a novel framework for building OLAP cubes from graph data and analyzing the graph topological properties. The framework supports the extraction and design of the candidate multidimensional spaces in property graphs. Besides property graphs, a new database model tailored for multidimensional modeling and enabling the exploration of additional candidate multidimensional spaces is introduced. We present novel techniques for OLAP aggregation of the graph, and discuss the case of dimension hierarchies in graphs.
Furthermore, the architecture and the implementation of our graph warehousing framework are presented and show the effectiveness of our approach.
Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro Vaisman, and Esteban Zimányi, A Framework for Building OLAP Cubes on Graphs, proceedings of the 19th East-European Conference on Advances in Databases and Information Systems, Poitiers, France, September 2015.
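To illustrate the kind of OLAP aggregation on a graph mentioned in the abstract above, here is a minimal sketch. It is not the framework's implementation: the function name `aggregate_graph`, the dictionary-based graph representation, and the choice of simple counts as measures are all assumptions made for this example.

```python
from collections import Counter

def aggregate_graph(nodes, edges, dimensions):
    """Roll a property graph up to the given node dimensions (illustrative only).

    nodes: dict mapping node id -> dict of attribute values
    edges: iterable of (source id, target id) pairs
    dimensions: tuple of attribute names to group by
    """
    # Map every node to its group key, e.g. ("BE",) for dimensions=("country",).
    group_of = {
        node_id: tuple(attrs[d] for d in dimensions)
        for node_id, attrs in nodes.items()
    }
    # Assumed measure on aggregate nodes: number of original nodes per group.
    node_counts = Counter(group_of.values())
    # Assumed measure on aggregate edges: number of original edges between groups.
    edge_counts = Counter((group_of[s], group_of[t]) for s, t in edges)
    return node_counts, edge_counts

# Hypothetical toy network with two candidate dimensions.
nodes = {
    1: {"country": "BE", "gender": "F"},
    2: {"country": "BE", "gender": "M"},
    3: {"country": "FR", "gender": "F"},
}
edges = [(1, 2), (1, 3), (2, 3)]
node_counts, edge_counts = aggregate_graph(nodes, edges, ("country",))
```

Calling the function with a different `dimensions` tuple, such as `("country", "gender")` or `("gender",)`, yields another view of the same network, which is the sense in which each attribute combination forms one cell of the cube.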
Graphs are a fundamental structure for modeling many real world domains and applications. They have emerged in various fields such as social, informational and transportation networks. The heterogeneity and dynamicity of these networks pose challenges to traditional techniques for data modeling, storage and analysis.
Managing graph-structured data using native graph structures and algorithms is the key for its efficient analysis. Therefore, the graph should be modeled using nodes and edges, and explored using graph algorithms, such as pattern matching and k-neighborhood.
In this paper, we introduce a novel model for management of graph data. The aim of our model is to provide analysts with a set of simple, well-defined, and adaptable components to perform complex graph modeling and analysis tasks.
Amine Ghrab, Oscar Romero, Sabri Skhiri, and Esteban Zimányi, Analytics-Aware Graph Database Modeling, EURA NOVA technical series.
The importance of graphs as the fundamental structure underpinning many real world applications is no longer to be proved. Large graphs have emerged in various fields such as biological, social and transportation networks. The sheer volume of these networks poses challenges to traditional techniques for storage and analysis of graph data. In particular, OLAP analysis requires access to large portions of data to extract key information and to feed strategic decision making. OLAP provides multilevel, multiperspective views of the data. Most of the current techniques are optimized for centralized graph processing. A distributed approach providing horizontal scalability is required in order to handle the analysis workload.
In this paper, we focus on applying OLAP analysis to large, distributed graph data. We describe Distributed Graph Cube, our distributed framework for the computation and aggregation of graph-based OLAP cubes. Experimental results on large, real-world datasets demonstrate that our method significantly outperforms its centralized counterparts. We also evaluate the performance of both Hadoop and Spark for distributed cube computation.
Benoît Denis, Amine Ghrab, and Sabri Skhiri, A Distributed Approach for Graph-Oriented Multidimensional Analysis, proceedings of the 2013 IEEE International Conference on Big Data, Santa Clara, CA, USA, October 2013.
Graphs are ubiquitous data structures commonly used to represent highly connected data. Many real-world applications, such as social and biological networks, are modeled as graphs. To answer the surge in demand for graph data management, many graph database solutions were developed. These databases are commonly classified as NoSQL graph databases, and they provide better support for graph data management than their relational counterparts. However, each of these databases implements its own operational graph data model, and these models differ among products. Further, there is no commonly agreed conceptual model for graph databases.
In this paper, we introduce a novel conceptual model for graph databases. The aim of our model is to provide analysts with a set of simple, well-defined, and adaptable conceptual components to perform rich analysis tasks. These components take into account the evolving aspect of the graph. Our model is analytics-oriented, flexible and incremental, enabling analysis over evolving graph data. The proposed model provides a typing mechanism for the underlying graph, and formally defines the minimal set of data structures and operators needed to analyze the graph.
Amine Ghrab, Sabri Skhiri, Salim Jouili, and Esteban Zimányi, An Analytics-Aware Conceptual Model For Evolving Graphs, proceedings of the 15th International Conference on Data Warehousing and Knowledge Discovery – DaWaK 2013, Prague, Czech Republic, August 2013.
With the recent growth of graph-based data, large graph processing is becoming more and more important. In order to explore and extract knowledge from such data, graph mining methods, like community detection, are a necessity. Legacy graph processing tools mainly rely on single-machine computational capacity, which cannot process large graphs with billions of nodes. Therefore, the main challenge for new tools and frameworks lies in the development of new paradigms that are scalable, efficient and flexible. In this paper, we review the new paradigms of large graph processing and their applications to graph mining domains using the distributed, shared-nothing approach used for large data by internet players.
Sabri Skhiri, and Salim Jouili, Large Graph Mining: Recent Developments, Challenges and Potential Solutions, presentation during the European Business Intelligence Summer School (eBISS 2012) organized by the Université Libre de Bruxelles and the Ecole Centrale Paris, Brussels, Belgium, July 2012.
The use of trust in recommender systems has been shown to improve the accuracy of rating predictions, especially in the case where a user’s rating significantly differs from the average. Different techniques have been used to incorporate trust into recommender systems, each showing encouraging results. However, the lack of trust information available in public datasets has limited the empirical analysis of these techniques and trust-based recommendation in general, with most analysis limited to a single dataset.
In this paper, we provide a more complete empirical analysis of trust-based recommendation. By making use of a method that infers trust between users in a social graph, we are able to apply trust-based recommendation techniques to three separate datasets. From this, we measure the overall accuracy of each technique in terms of the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) as well as measuring the prediction coverage of each technique. We thus provide a comparison and analysis of each technique on all three datasets.
Daire O’Doherty, Salim Jouili, and Peter Van Roy, Trust-based recommendation: an empirical analysis, proceedings of the 6th ACM SIGKDD Workshop on Social Network Mining and Analysis SNA-KDD, Beijing, China, ACM, July 2012.
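The three evaluation measures named in the abstract above (MAE, RMSE, and prediction coverage) can be sketched as follows. The standard definitions of MAE and RMSE are used; the function name `evaluate`, the dictionary layout, and the convention that an unavailable prediction is represented by `None` are assumptions made for this example, and coverage is taken here to be the fraction of test ratings for which a prediction could be produced.

```python
import math

def evaluate(predictions, actual):
    """Return (MAE, RMSE, coverage) for one recommendation technique.

    predictions: dict (user, item) -> predicted rating, or None when the
                 technique could not produce a prediction (assumed convention)
    actual:      dict (user, item) -> true rating from the test set
    """
    # Keep only the test ratings for which a prediction exists.
    errors = [
        predictions[key] - actual[key]
        for key in actual
        if predictions.get(key) is not None
    ]
    coverage = len(errors) / len(actual)
    if not errors:
        # Nothing could be predicted, so the error measures are undefined.
        return math.nan, math.nan, coverage
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse, coverage

# Hypothetical test set: four known ratings, one of which cannot be predicted.
actual = {("u1", "i1"): 4, ("u1", "i2"): 2, ("u2", "i1"): 5, ("u2", "i2"): 3}
predictions = {("u1", "i1"): 3.0, ("u1", "i2"): 4.0,
               ("u2", "i1"): 5.0, ("u2", "i2"): None}
mae, rmse, coverage = evaluate(predictions, actual)
```

Because RMSE squares each error before averaging, it penalizes the single two-point miss more heavily than MAE does, which is why the two measures are usually reported together.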
The emergence of trust as a key link between users in social networks has provided an effective means of enhancing the personalization of online user content. However, the availability of such trust information remains a challenge to the algorithms that use it, as the majority of social networks do not provide a means of explicit trust feedback. This paper presents an investigation into the inference of trust relations between actor pairs of a social network, based solely on the structural information of the bipartite graph typical of most on-line social networks. Using intuition inspired from real life observations, we argue that the popularity of an item in a social graph is inversely related to the level of trust between actor pairs who have rated it. From an existing bipartite social graph, this method computes a new social graph, linking actors together by means of symmetric weighted trust relations. Through a set of experiments performed on a real social network dataset, our method produces statistically significant results, showing strong trust prediction accuracy.
Daire O’Doherty, Salim Jouili, and Peter Van Roy, Towards trust inference in bipartite social networks, proceedings of the 2nd ACM SIGMOD Workshop on Databases and Social Networks, DBSocial 2012, Scottsdale, USA, ACM, June 2012.
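The intuition in the abstract above, that co-rating an unpopular item says more about two actors than co-rating a popular one, can be illustrated with a small sketch. The abstract does not give the paper's actual formula, so the weighting below (each co-rated item contributes the reciprocal of its number of raters) is purely an assumption, as are the function name `infer_trust` and the input layout.

```python
from collections import defaultdict
from itertools import combinations

def infer_trust(ratings):
    """Project a bipartite actor-item graph onto a weighted actor-actor graph.

    ratings: iterable of (actor, item) pairs
    Returns a dict mapping frozenset({a, b}) -> symmetric trust weight.
    """
    # Collect the set of actors who rated each item.
    raters = defaultdict(set)
    for actor, item in ratings:
        raters[item].add(actor)

    trust = defaultdict(float)
    for actors in raters.values():
        # Assumed weighting: an item's contribution shrinks as its
        # popularity (number of raters) grows.
        weight = 1.0 / len(actors)
        for a, b in combinations(sorted(actors), 2):
            # frozenset keys make the relation symmetric by construction.
            trust[frozenset((a, b))] += weight
    return dict(trust)

# Hypothetical data: "niche" has two raters, "hit" has four.
ratings = [
    ("ann", "niche"), ("bob", "niche"),
    ("ann", "hit"), ("bob", "hit"), ("cat", "hit"), ("dan", "hit"),
]
trust = infer_trust(ratings)
```

In this toy data, ann and bob share both the niche item and the hit, so their weight is higher than that of ann and cat, who share only the hit.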
In this paper, we introduce a novel method for graph indexing. We propose a hypergraph-based model for graph data sets that allows clusters to overlap. More precisely, in this representation one graph can be assigned to more than one cluster. Using the concept of the graph median and a given threshold, the proposed algorithm automatically detects the number of classes in the graph database. We consider clusters as hyperedges in our hypergraph model and we index the graph set by the hyperedge centroids. This model makes it convenient to traverse the data set and efficient to retrieve graphs.
Salim Jouili, and Salvatore Tabbone, Hypergraph-based image retrieval for graph-based representation. Journal of the Pattern Recognition Society, April 2012. © 2012 Elsevier Ltd.
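The indexing idea described in the abstract above (overlapping clusters treated as hyperedges, each represented by a median graph, with the number of clusters determined by a threshold) can be sketched roughly as follows. This is not the paper's algorithm: the greedy assignment loop, the function names, and the toy distance (size of the symmetric difference between edge sets, a crude stand-in for a real graph distance) are all assumptions made for illustration.

```python
def set_median(members, distance):
    """Return the member minimizing the summed distance to all members."""
    return min(members, key=lambda g: sum(distance(g, h) for h in members))

def build_hyperedges(graphs, distance, threshold):
    """Greedily group graphs into overlapping clusters (hyperedges)."""
    hyperedges = []  # each entry: {"centroid": graph, "members": [graphs]}
    for g in graphs:
        joined = False
        for he in hyperedges:
            # A graph joins EVERY hyperedge whose centroid is close enough,
            # which is what allows clusters to overlap.
            if distance(g, he["centroid"]) <= threshold:
                he["members"].append(g)
                he["centroid"] = set_median(he["members"], distance)
                joined = True
        if not joined:
            # No centroid is close enough: the graph starts a new hyperedge,
            # so the number of clusters follows from the threshold.
            hyperedges.append({"centroid": g, "members": [g]})
    return hyperedges

def retrieve(query, hyperedges, distance, threshold):
    """Compare the query to centroids first, then rank only nearby members."""
    candidates = {
        g
        for he in hyperedges
        if distance(query, he["centroid"]) <= threshold
        for g in he["members"]
    }
    return sorted(candidates, key=lambda g: distance(query, g))

# Toy graphs as frozensets of edge labels; the distance is an assumption.
def edge_distance(g, h):
    return len(g ^ h)

A = frozenset({"ab", "bc"})
B = frozenset({"cd", "de"})
C = frozenset({"bc", "cd"})  # within the threshold of both A and B
index = build_hyperedges([A, B, C], edge_distance, threshold=2)
hits = retrieve(frozenset({"ab", "bc", "ca"}), index, edge_distance, threshold=2)
```

Here `C` ends up in both hyperedges, and the query is compared against two centroids rather than all three graphs before the candidates are ranked.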