This year we started a research project, named AROM, in collaboration with the Université Libre de Bruxelles. We wanted to evaluate a Data Flow Graph (DFG) processing framework and compare it to a traditional MapReduce (MR) one. In addition, we analyzed high-level data manipulation languages such as Pig Latin [1] and investigated whether an MR approach introduces a significant overhead when data operations are translated into the physical execution plan. Going further, we analyzed a typical class of data analytics, pipeline jobs, and studied how well DFG suits them in terms of ease of use and performance. Finally, we used functional programming to introduce higher-order functions that simplify the expression of jobs and promote the reusability of operators.
Releasing Okku
I am pleased to announce the release of the Okku library.
Okku is a Clojure wrapper around the Akka library. Very briefly, Akka brings the Actor model to the Scala programming language, and hence to the JVM. The Actor model is a model of computation based on small, asynchronous units that communicate only through message passing, without any shared memory.
EURA NOVA Publications in 2011
As we usually do at the end of the year, we will try to summarize the activities carried out at EURA NOVA R&D during this year. In this post I look back at the scientific publications we had this year. Those publications cover distributed data storage, model management & governance in finance and, finally, elastic architectures for cloud infrastructures. Let me briefly introduce those papers.
EURA NOVA selected for the Erasmus Mundus Joint Doctorate (EMJD) in Distributed Computing
The EMJD in Distributed Computing (EMJD-DC) is an international joint doctoral school for the study of distributed computing. EMJD-DC combines a well-structured, interdisciplinary training programme with high-quality research work. The training and research are carried out in cooperation with leading research groups from four universities in different countries: UPC (Spain), KTH (Sweden), IST Lisbon (Portugal), and UCL (Belgium).
Next generation BI – Research overview (Part 2)
In this post, I continue the research overview started in the last post. Last week I was at the European Business Intelligence Summer School (eBISS 2011) in Paris. The objective was to give a complete overview of the research and evolution of BI, as seen by best-of-breed researchers and industry practitioners. In this post we continue to describe the main topics that were presented.
Next generation BI – Research overview (Part 1)
Last week I was at the European Business Intelligence Summer School (eBISS 2011) in Paris. The objective was to give a complete overview of the research and evolution of BI, as seen by best-of-breed researchers and industry practitioners. For newcomers to BI, I recommend starting with the Wikipedia page (what a collaborative, web 2.0 world!).
Data storage elasticity – quick view on master thesis work (part 2)
In this second part I welcome Nicolas Degroodt, who explains how he extended YCSB to implement the TPC-C benchmark for NoSQL stores. In this post we use the term DBMS for any storage framework, whether it is an RDBMS or a NoSQL store.
Data storage elasticity – quick view on master thesis work (part 1)
Master Theses
In this post I would like to talk about two master theses that EURA NOVA is managing with the Faculty of Science Engineering of the Université Libre de Bruxelles (ULB) and with the Université Catholique de Louvain (UCL). The two students have been working on the same topic: the elasticity of data storage in the cloud. The first interesting thing to notice is that they are working on two different aspects of elasticity and taking different directions, but at the end of the day, their two contributions together draw a complete picture of NoSQL benchmarking in the cloud. In this post I will give you a preview of their work, which should be published in June 2011.
Convergence between Cloud Infrastructure management and Data Centre management
The convergence – a reality
If we look at the evolution of the private cloud, we can clearly see a natural convergence between Data Center Management (DCM) and IaaS management. It is natural because it better fits the way enterprises are organized. To be clear, we are talking about large organizations that already have data centers for their own IT or businesses. The services and software departments focus on the elastic applications they can build on top of infrastructure management solutions such as Eucalyptus [1], OpenStack [2] or commercial solutions such as vCloud Director [3].
Replacing Pig Latin’s storage engine
Today, we welcome Arthur Lessuise, a final-year Master's student in Computer Science at the Université Libre de Bruxelles (Belgium). He spent 6 weeks at Euranova R&D for his internship. He studied whether HDFS can be replaced in Pig Latin with a NoSQL store. This post is a summary of his amazing work. Enjoy!