The Big Data Paris 2013 conference was held on April 3rd and 4th. I was quite disappointed by this event. Of course, I knew it was neither a scientific nor an ACM or IEEE event. I thus came with solid mental preparation, ready to see marketing and business presentations.
But what I saw was incredibly far from what I expected. To keep this post objective, I will quickly skip over the amazingly (yes, I think that is the right term) poor level of the panel discussions and the talks. Instead, I want to summarize my overall feeling.
The basic idea conveyed by the different talks was always the same. At least everybody agreed on two certainties: Hadoop is a revolution, and Big Data equals Hadoop.
More interestingly, the Big Data session that was supposed to show real industrial uses of Big Data included talks that all started with: “Huum, that is not really a big data problem here, but it can become one … one day…”
You find that incoherent and hard to hear from Big Data experts? Well, so do I! Unfortunately, this was the reality of the presentations I had to attend over those two days.
Hey CTO, you don’t know yet but you have an extremely important issue with your data…
… and we have the right product to address it!
I am quite sure that this kind of event is organized with one unique objective: leading CTOs and IT directors into thinking that Big Data is a real need for them, and that they must quickly buy a solution in order to stay competitive.
Not everybody needs a Hadoop or Spark cluster with Storm on top of it running distributed machine learning algorithms! Each project needs to clearly identify the business objectives to reach before it starts talking about how Hadoop on high-end machines can help. Making people think that buying a rack of [Insert the most expensive server of your favorite brand here] will magically solve all their problems borders on intellectual dishonesty!
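To make the point concrete, here is a minimal sketch (with hypothetical synthetic data, not from any real project) showing that a workload often sold as a “Big Data problem”, aggregating millions of records per key, runs in constant memory on a single ordinary machine, with no cluster in sight:

```python
# Streaming aggregation over millions of records on one machine:
# a generator feeds records one at a time, so memory stays constant
# regardless of how many records flow through.
from collections import Counter

def aggregate(records):
    """Sum amounts per key, one record at a time (no cluster needed)."""
    totals = Counter()
    for key, amount in records:
        totals[key] += amount
    return totals

# Hypothetical sample: one million synthetic (key, amount) records,
# spread over 100 client keys.
records = ((f"client-{i % 100}", 1) for i in range(1_000_000))
totals = aggregate(records)
print(totals["client-0"])  # -> 10000
```

Only once the data genuinely stops fitting on one machine does the conversation about distributed storage and processing frameworks become relevant.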
There was considerable confusion, in the talks I attended, between Big Data and Data Science. Speakers often mixed storage, distributed processing, and algorithms at the same level, treating them as a single, indivisible problem. Thus, some presentations concluded that their solution (e.g. the ‘revolution’ Hadoop) solves everything (All-in-One). We are touching here the essence of the problem: the three levels I mentioned can all be found in a data project, but they cannot be generalized as the Big Data Problem and then used to sell hardware solutions.
Big Data describes the large amount of data that we need to handle in a project. Data Science is more about the intelligence: the algorithms we need to develop to create value from our data (Big or Small).
What we have to keep in mind here is that business objectives are the priority. We need to think about how to reach those objectives, and this is where Data Science can bring interesting solutions. Finally, depending on the size of the data, we can consider innovative architecture patterns or frameworks.
Before talking about the ‘how’ with a long list of trendy keywords, let’s talk about the ‘what’, by looking at a project that aims at extracting value from data.
I do not want to give an exhaustive overview of this domain, but simply to highlight that this is above all a question of purpose-driven optimization for business objectives. Before thinking about the hardware and the product a customer needs, we should first define the customer’s problem (if there really is one to solve), and only then design the best-suited architecture and algorithms. This shows the increasing importance of the Data Scientist at the core of these activities.