Measuring elasticity for cloud databases

The rise of the Internet and the multiplication of data sources have increased the number of “Big Data” storage problems. These data sets are not only very big but also tend to grow very fast, sometimes over a short period. Distributed databases that work well for such data sets need to be not only scalable but also elastic, to ensure a fast response to growth in demand for computing power or storage. The goal of this article is to present measurement results that characterize the elasticity of three databases. We have chosen Cassandra, HBase, and MongoDB as three representative, popular, horizontally scalable NoSQL databases that are in production use. We have made measurements under realistic loads on up to 48 nodes, using the Wikipedia database to create our dataset and using the Rackspace cloud infrastructure. We define our methodology precisely and we introduce a new dimensionless measure for elasticity to allow uniform comparisons of different databases at different scales. Our results show clearly that the technical choices made by the databases have a strong impact on the way they react when new nodes are added to the clusters.

Thibault Dory, Boris Mejías, Peter Van Roy, and Nam-Luc Tran, Measuring Elasticity for Cloud Databases, Proceedings of Cloud Computing 2011 (Second International Conference on Cloud Computing, GRIDs, and Virtualization), Rome, Italy, September 2011.


Replacing Pig Latin’s storage engine

Today, we welcome Arthur Lessuise, a final-year Master's student in Computer Science at the Université Libre de Bruxelles (Belgium). He spent six weeks at Euranova R&D for his internship, where he studied the possibility of replacing HDFS in Pig Latin with a NoSQL storage engine. This post is a summary of his amazing work. Enjoy!


Eclipse Build Technologies: What are the Current Trends?

I recently had the opportunity to attend the symposium entitled “What’s in a build? Best practice and Requirements” given by Henrik Lindberg (Cloudsmith [1]) and Nick Boldt (JBoss – RedHat [2]) at the Eclipse Summit Europe 2010 in Ludwigsburg [3]. The goal was to brainstorm and discuss current practices and concerns in building Eclipse plug-ins, features, and RCP applications, and in automatically running post-build processes such as testing.


ESE 2010: EMF Symposium – overview

This Tuesday, 2nd of November, the Eclipse Symposium opened the first afternoon of the ESE 2010 [1]. In this post we will give you a tour of the emerging projects presented during the session. The EMF symposium focuses mainly on demos of new projects and initiatives in the modeling world. Beyond that, it gives an overview of the important projects and the future directions of the modeling projects.


New SQL RDBMS Architectures Vs Old ones Vs NoSQL

In recent years we have seen the NoSQL initiative emerge against the so-called “old, slow and legacy” relational databases. But today the debate is widening with a newcomer: “the new RDBMS architecture”. The concept is simple. The RDBMS architecture was developed a long time ago, when computer science, computers, and processors were extremely different from what we find today. Why could we not bring together recent research in storage, distributed computing, and threading systems to redesign a modern RDBMS?


RH Partner summit: full-speed, all in the same direction toward cloud

During my career, I have seen a lot of companies, especially multinational ones, disrupted by internal wars, arguments between departments, jealousy between projects, and much more than you can imagine. What is really impressive after one day attending the Red Hat partner summit conferences is the clear vision and the alignment of all Red Hat products and projects in one single direction: enterprise, data centre, and cloud.


Red Hat EMEA Partner summit 2010 Keynote

On Sunday, 2nd of May, Euranova is in Valencia, Spain, for the Red Hat EMEA partner summit. The event was opened by Jim Whitehurst, the Red Hat CEO, with a keynote on the open source opportunity. The key message was that every three years we see an inflexion point in IT, at which business models, technologies, and the delivery model change completely. In the last few years this inflexion has come with the arrival of virtualization, cloud, and social networking. The keynote described how this inflexion point influences IT and the new challenges it brings. It was organized in three sections: (1) the problem, (2) the Red Hat business, and (3) the solution.


FOSDEM 2010: The Rise of the NoSQL initiative

What’s NoSQL?

Even if the name itself is not very meaningful, NoSQL designates a new generation of key/value stores. This initiative is gaining not only popularity but also maturity. FOSDEM dedicated a full day and a dev room to the subject. Wikipedia defines the movement as follows: “NoSQL is an umbrella term for a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees. Data stores that fall under this term may not require fixed table schemas, and usually avoid join operations. The term was first popularised in early 2009. Trends in computer architectures are pressing databases in a direction that requires horizontal scalability. NoSQL-style data stores attempt to address this requirement. Prominent closed-source examples are Google’s BigTable and Amazon’s Dynamo. Several open-source variants exist including Facebook’s Cassandra, Apache HBase, LinkedIn’s Project Voldemort and many others.”


Super Size EMF Fast Food Demo: Add Complex Event Processing (Part 2)

Where were we?

In the last post we considered the problem of integrated fast food management. In step 1 we designed the models, and in step 2 we wove them together. As a result, the application was developed just by weaving live models. We finished the last post by asking ourselves how we could address the fast food manager’s concerns: the boss wanted to control the burger cooking rate according to current demand. In addition, he will soon ask to include other contextual parameters, such as the number of available seats or the location of the truck that delivers salads and burgers. How could we define a flexible, maintainable, and intelligent system? We proposed to use CEP concepts.
