Page 19 – Research Euranova

Arom: processing big data with data flow graphs and functional programming

Advances in computational processing have driven a move towards distributed frameworks that execute tasks in parallel, a trend to which recent progress in Cloud Computing has contributed widely. The MapReduce model proposed by Google is one of the most popular, despite well-known limitations inherent to the model that constrain the types of jobs it can express. Models based on Data Flow Graphs (DFG) for defining and processing jobs, on the other hand, are more complex to express but more general and suitable for a wider range of tasks, including iterative and pipelined ones. In this paper we present AROM, a framework for large-scale distributed processing that uses DFGs to express jobs and paradigms from functional programming to define operators. The former leads to more natural handling of pipelined tasks, while the latter enhances the genericity and reusability of the operators, as shown by our tests on a parallel and pipelined job computing PageRank.

Nam-Luc Tran, Sabri Skhiri, Esteban Zimányi, and Arthur Lesuisse. AROM: Processing Big Data With Data Flow Graphs and Functional Programming, proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science, IEEE CloudCom 2012. IEEE Computer Society Press, Taipei, Taiwan, December 2012.

Click here to access the paper.

Papers

Euranova Academical Research 2012 Vintage

In total this academic year, five Master’s theses and one internship have been conducted in collaboration with three different academic institutions in Belgium: Université Libre de Bruxelles (ULB), Université de Liège (ULg) and Université Catholique de Louvain (UCL). Let’s quickly review their goals and contributions.

Blog

Data flow graph & stream processing closer than ever

This year we started a research project named AROM in collaboration with the Université Libre de Bruxelles. We wanted to evaluate a Data Flow Graph (DFG) processing framework and compare it to a traditional MapReduce (MR) one. In addition, we analyzed high-level data manipulation languages such as PigLatin [1] and investigated whether an MR approach introduces significant overhead when data operations are translated into the physical execution plan. Going further, we analyzed a typical class of data analytics, pipelined jobs, and studied how well DFGs suit them in terms of ease of use and performance. Finally, we used functional programming to introduce higher-order functions that simplify the expression of jobs and promote the reusability of operators.
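To give a feel for what defining operators with higher-order functions means, here is a minimal sketch in Python. It is not AROM's API: the names `map_op`, `filter_op` and `pipeline` are hypothetical, and the whole thing runs on a single machine over in-memory data. It only illustrates how generic operators, parameterized by plain functions, can be chained into a pipelined job.

```python
from functools import reduce
from typing import Callable, Iterable

# An operator is any function from a stream of records to a stream of records.
Operator = Callable[[Iterable], Iterable]


def map_op(fn: Callable) -> Operator:
    """Build an operator that applies fn to every record (hypothetical name)."""
    return lambda records: (fn(r) for r in records)


def filter_op(pred: Callable) -> Operator:
    """Build an operator that keeps only the records satisfying pred."""
    return lambda records: (r for r in records if pred(r))


def pipeline(*ops: Operator) -> Operator:
    """Chain operators so that records flow through them in order."""
    return lambda records: reduce(lambda stream, op: op(stream), ops, records)


# The same generic operators are reused with different user functions.
job = pipeline(
    map_op(lambda line: line.split()),        # tokenize each line
    filter_op(lambda words: len(words) > 2),  # drop short lines
    map_op(len),                              # count the words per line
)

result = list(job(["a b c", "d e", "f g h i"]))
print(result)
```

Because the operators are lazy generators, each record moves through the whole chain before the next one is read, which is the pipelined behaviour in miniature.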

Blog

Large Graph Mining

At the end of July, I gave a lecture on large graph mining at the European Business Intelligence Summer School organized by the Université Libre de Bruxelles and the Ecole Centrale Paris. In this post I will briefly introduce this fascinating topic.

Academic collaborations

Large graph mining: recent developments, challenges and potential solutions

With the recent growth of graph-based data, large graph processing is becoming more and more important. To explore and extract knowledge from such data, graph mining methods such as community detection are a necessity. Legacy graph processing tools mainly rely on the computational capacity of a single machine, which cannot process large graphs with billions of nodes. The main challenge for new tools and frameworks therefore lies in developing new paradigms that are scalable, efficient and flexible. In this paper, we review the new paradigms of large graph processing and their applications to graph mining, using the distributed, shared-nothing approach that internet players use for large data.

Sabri Skhiri and Salim Jouili, Large Graph Mining: Recent Developments, Challenges and Potential Solutions, presentation during the European Business Intelligence Summer School (eBISS 2012) organized by the Université Libre de Bruxelles and the Ecole Centrale Paris, Brussels, Belgium, July 2012.

Click here to access the paper in its preprint form.

Papers

Releasing Okku

I am pleased to announce the release of the Okku library.

Okku is a Clojure wrapper around the Akka library. Very briefly, Akka brings the Actor model to the Scala programming language, and hence to the JVM. The Actor model is a model of computation based on small, asynchronous units that communicate only through message passing, without any shared memory.
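For readers new to the Actor model, the idea can be sketched in a few lines of plain Python. This is neither Okku's nor Akka's API: the `Actor` class and `make_counter` helper below are hypothetical, built only on the standard `queue` and `threading` modules, and they leave out everything a real actor library provides (supervision, remoting, addressing). The point is simply that state lives inside the actor and is only ever changed in response to messages.

```python
import queue
import threading


class Actor:
    """A tiny actor: private state, a mailbox, and one thread draining it."""

    _STOP = object()  # sentinel message asking the actor to shut down

    def __init__(self, handler):
        self._mailbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Drop a message in the mailbox and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Ask the actor to stop once earlier messages are processed."""
        self._mailbox.put(Actor._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is Actor._STOP:
                break
            self._handler(message)


def make_counter(reply_to):
    """Build a handler whose running total is visible only to the actor."""
    total = 0

    def handle(message):
        nonlocal total
        total += message
        reply_to.put(total)  # answer by message, not by shared variable

    return handle


replies = queue.Queue()
counter = Actor(make_counter(replies))
for n in (1, 2, 3):
    counter.send(n)
counter.stop()
totals = [replies.get() for _ in range(3)]
print(totals)
```

The sender never reads `total` directly. It only sees the values the actor chooses to send back, which is what "no shared memory" means in practice.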

Blog

Trust-based recommendation: an empirical analysis

The use of trust in recommender systems has been shown to improve the accuracy of rating predictions, especially in the case where a user’s rating significantly differs from the average. Different techniques have been used to incorporate trust into recommender systems, each showing encouraging results. However, the lack of trust information available in public datasets has limited the empirical analysis of these techniques and trust-based recommendation in general, with most analysis limited to a single dataset.

In this paper, we provide a more complete empirical analysis of trust-based recommendation. By making use of a method that infers trust between users in a social graph, we are able to apply trust-based recommendation techniques to three separate datasets. From this, we measure the overall accuracy of each technique in terms of the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE), as well as the prediction coverage of each technique. We thus provide a comparison and analysis of each technique on all three datasets.
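The three measures named in the abstract can be computed as in the sketch below. The `evaluate` function and the toy ratings are hypothetical and not taken from the paper. Coverage is assumed here to mean the fraction of test ratings for which a technique manages to produce a prediction at all, which is the usual definition but may differ in detail from the one the authors use.

```python
import math
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (user, item)


def evaluate(actual: Dict[Pair, float],
             predicted: Dict[Pair, Optional[float]]):
    """Return (MAE, RMSE, coverage) over the test ratings in `actual`.

    A missing or None prediction counts against coverage and is left
    out of the two error measures.
    """
    errors = [predicted[k] - actual[k]
              for k in actual if predicted.get(k) is not None]
    coverage = len(errors) / len(actual)
    if not errors:
        return float("nan"), float("nan"), coverage
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse, coverage


# Toy data: four held-out ratings, one of which cannot be predicted.
actual = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0,
          ("u2", "i1"): 5.0, ("u2", "i3"): 3.0}
predicted = {("u1", "i1"): 3.5, ("u1", "i2"): 3.0,
             ("u2", "i1"): 5.0, ("u2", "i3"): None}

mae, rmse, coverage = evaluate(actual, predicted)
print(mae, rmse, coverage)
```

RMSE squares each error before averaging, so it penalizes a few large misses more than MAE does. Reporting coverage alongside both avoids rewarding a technique that is accurate only because it declines to predict the hard cases.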

Daire O’Doherty, Salim Jouili, and Peter Van Roy, Trust-based recommendation: an empirical analysis, proceedings of the 6th ACM SIGKDD Workshop on Social Network Mining and Analysis SNA-KDD, Beijing, China, ACM, July 2012.

Click here to access the paper.

Papers

Structural trust inference for social recommendation

Every year EURA NOVA proposes and supervises Master’s theses with the science and engineering faculties of Belgian universities. In this post I will briefly come back to one of those Master’s theses, organized with the Polytechnic School of Louvain-la-Neuve, and give a high-level overview of Daire O’Doherty’s work.

Academic collaborations

Towards trust inference from bipartite social networks

The emergence of trust as a key link between users in social networks has provided an effective means of enhancing the personalization of online user content. However, the availability of such trust information remains a challenge to the algorithms that use it, as the majority of social networks do not provide a means of explicit trust feedback. This paper presents an investigation into the inference of trust relations between actor pairs of a social network, based solely on the structural information of the bipartite graph typical of most online social networks. Using intuition inspired by real-life observations, we argue that the popularity of an item in a social graph is inversely related to the level of trust between actor pairs who have rated it. From an existing bipartite social graph, this method computes a new social graph, linking actors together by means of symmetric weighted trust relations. Through a set of experiments performed on a real social network dataset, our method produces statistically significant results, showing strong trust prediction accuracy.
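The intuition in this abstract can be illustrated with a small sketch. The weighting used below, where each co-rated item contributes the reciprocal of its number of raters, is an assumption chosen for simplicity and is not the formula from the paper. The `infer_trust` function and the toy data are hypothetical as well.

```python
from collections import defaultdict
from itertools import combinations


def infer_trust(ratings):
    """Project a bipartite actor-item graph onto a weighted actor graph.

    ratings maps each actor to the set of items that actor has rated.
    Returns {frozenset({a, b}): weight}, so every relation is symmetric.
    """
    # Invert the bipartite graph: which actors rated each item?
    raters = defaultdict(set)
    for actor, items in ratings.items():
        for item in items:
            raters[item].add(actor)

    trust = defaultdict(float)
    for item, actors in raters.items():
        if len(actors) < 2:
            continue  # an item with a single rater links nobody
        weight = 1.0 / len(actors)  # assumed: popular items count less
        for a, b in combinations(sorted(actors), 2):
            trust[frozenset((a, b))] += weight
    return dict(trust)


ratings = {
    "ann": {"hit", "niche"},
    "bob": {"hit", "niche"},
    "cat": {"hit"},
}
trust = infer_trust(ratings)
for pair, weight in sorted(trust.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), round(weight, 3))
```

Ann and Bob share a niche item in addition to the popular one, so their inferred trust ends up higher than either has with Cat, who shares only the item everybody rated.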

Daire O’Doherty, Salim Jouili, and Peter Van Roy, Towards trust inference from bipartite social networks, proceedings of the 2nd ACM SIGMOD Workshop on Databases and Social Networks, DBSocial 2012, Scottsdale, USA, ACM, June 2012.

Click here to access the paper.

Papers

Hypergraph-based image retrieval for graph-based representation

In this paper, we introduce a novel method for graph indexing. We propose a hypergraph-based model for graph data sets that allows clusters to overlap. More precisely, in this representation one graph can be assigned to more than one cluster. Using the concept of the graph median and a given threshold, the proposed algorithm automatically detects the number of classes in the graph database. We consider clusters as hyperedges in our hypergraph model and we index the graph set by the hyperedge centroids. This model makes it easy to traverse the data set and efficient to retrieve graphs.
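A rough sketch of the indexing idea follows. It is an assumption-laden illustration rather than the algorithm from the paper: integers stand in for graphs, absolute difference stands in for a graph distance, and the seeding strategy and all function names (`set_median`, `build_hyperedges`, `retrieve`) are hypothetical. It shows the three ingredients the abstract names: a threshold that determines the number of clusters, overlapping clusters acting as hyperedges, and medians used as the index.

```python
def set_median(items, dist):
    """Return the item with the smallest total distance to the others."""
    return min(items, key=lambda c: sum(dist(c, x) for x in items))


def build_hyperedges(items, dist, threshold):
    """Group items into possibly overlapping clusters.

    An item further than `threshold` from every existing seed starts a
    new cluster, so the number of clusters is not fixed in advance.
    Each cluster (hyperedge) holds every item within `threshold` of its
    seed and is indexed by the set median of its members.
    """
    seeds = []
    for item in items:
        if all(dist(item, s) > threshold for s in seeds):
            seeds.append(item)
    index = []
    for seed in seeds:
        members = [x for x in items if dist(x, seed) <= threshold]
        index.append((set_median(members, dist), members))
    return index


def retrieve(query, index, dist):
    """Find a close item by comparing the query to centroids first."""
    _, members = min(index, key=lambda entry: dist(query, entry[0]))
    return min(members, key=lambda x: dist(query, x))


# Stand-ins: integers for graphs, absolute difference for graph distance.
def distance(a, b):
    return abs(a - b)


index = build_hyperedges([0, 1, 2, 4, 5, 6], distance, threshold=2)
nearest = retrieve(3, index, distance)
print(index, nearest)
```

In this toy run the item 2 belongs to both hyperedges, which is the overlap the model allows, and a query is compared against two centroids before any individual item is examined.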

Click here to access the paper.

Papers
