Data flow graph & stream processing closer than ever

This year we started a research project, named AROM, in collaboration with the Université Libre de Bruxelles. We wanted to evaluate a Data Flow Graph (DFG) processing framework and compare it to a traditional MapReduce (MR) one. In addition, we analyzed high-level data manipulation languages such as Pig Latin [1] and investigated whether an MR approach introduces significant overhead when data operations are translated into the physical execution plan. Going further, we analyzed a typical class of data analytics, pipeline jobs, and studied how well DFG suits them in terms of ease of use and performance. Finally, we used functional programming to introduce higher-order functions that simplify the expression of jobs and promote the reusability of operators.


EURA NOVA Publications in 2011

As we usually do at the end of the year, we will try to summarize the activities carried out at EURA NOVA R&D during the year. In this post I look back at the scientific publications we had this year. They cover distributed data storage, model management & governance in finance and, finally, elastic architectures for cloud infrastructures. Let me briefly introduce these papers.


EURA NOVA selected for the Erasmus Mundus Joint Doctorate (EMJD) in Distributed Computing

The EMJD in Distributed Computing (EMJD-DC) is an international joint doctoral school for the study of distributed computing. EMJD-DC combines a well-structured, interdisciplinary training programme with high-quality research work. The training and research are carried out in cooperation with leading research groups from four universities in different countries: UPC (Spain), KTH (Sweden), IST Lisbon (Portugal), and UCL (Belgium).

Data storage elasticity – quick view on master thesis work (part 1)

Master Theses

In this post I would like to talk about two master theses that EURA NOVA is supervising with the Faculty of Science Engineering of the Université Libre de Bruxelles (ULB) and with the Université Catholique de Louvain (UCL). The two students have been working on the same topic: the elasticity of data storage in the cloud. The first interesting thing to notice is that they approach two different aspects of elasticity from different directions, yet together their contributions draw a complete picture of NoSQL benchmarking in the cloud. In this post I will give you a preview of their work, which should be published in June 2011.


Convergence between Cloud Infrastructure management and Data Centre management

The convergence – a reality

If we look at the evolution of the private cloud, we can clearly see a natural convergence between Data Center Management (DCM) and IaaS management. It is natural because it better fits the way enterprises are organized. To be clear, we are talking about large organizations that already run data centers for their own IT or business needs. Their services and software departments focus on the elastic applications they can build on top of infrastructure management platforms such as Eucalyptus [1], OpenStack [2] or commercial solutions such as vCloud Director [3].


Replacing Pig Latin’s storage engine

Today, we welcome Arthur Lessuise, a final-year Master's student in Computer Science at the Université Libre de Bruxelles (Belgium). He spent 6 weeks at Euranova R&D for his internship, where he studied whether HDFS can be swapped for a NoSQL store as Pig Latin's storage engine. This post is a summary of his amazing work. Enjoy!
