This week I was with Amine and Gary at the 3rd European Business Intelligence Summer School, held in the beautiful Dagstuhl castle in Germany. I attended the two previous editions, and it is a great way to get a condensed overview of top-level research in BI. This year two aspects caught my attention: (1) the ever-increasing skills required to be a good “data worker” and (2) the everlasting power of ontologies and the Semantic Web.
Looking for the Data Superhero
I have intentionally spoken about the data worker and not the data scientist. Indeed, claiming today to be an expert in the data exploitation domain is like saying “Hey, I am an expert software developer, I can build anything in JEE (JBOSS, GLASSFISH, WAS), JAVA, ECLIPSE, EMF, OSGI, Spring, Scala, C++, Python, Groovy, Clojure, .NET, ABAP and DELPHI, and of course I manage distributed cache systems, DB, NoSQL and graph DB”. Working in data exploitation is quite similar: you need to master an incredible number of domains, including BI, data warehousing, software engineering and conceptual modeling, pattern matching, machine learning, visualization, statistical approaches and a few other topics. As in software development, you will hardly find someone who has all those skills together. Going further, you will even find religious wars between the statistical, machine learning and traditional data warehouse (aka OLAP) approaches to exploiting data. This highlights the need to keep working on the different domains in parallel, as we do at EURA NOVA research, in order to bridge them and gather the best of each discipline.
Did you say… Ontologies? Oh Gosh!
It is no secret that I am not a big fan of the Semantic Web. However, I always try to keep an objective view on the new talks I see. Even if there is interesting research work in this domain, I still believe that the added value brought by the Semantic Web can easily be replaced by a simpler approach. Let me give you an example: let’s talk about linked data and the ability to query external data providers such as companies, governments, public agencies, etc. Using ontologies and first-order logic to analyze your query, browse the different data sources and automatically select the right sources, files and attributes gives you really interesting features. From the research viewpoint, it brings a lot of interesting challenges. Now, let me put on my product manager hat:
Manager Sabri: “OK, sounds great, but is there any other solution that would be much cheaper to develop and to maintain while doing more or less the same?”
Amine aka the Expert: “Yes, we could have a simple search tab in which the user types the data he is interested in. He would get a clear description of each data source, file and attribute, and with a check box he could select the sources he wants. Going further, the search could be pre-launched according to the keywords of the query.”
Manager Sabri: “Great, so why should I need the ontology and semantic web things then?”
Amine aka the Expert: “To make it fully automatic and spare the user from explicitly clicking on check boxes …”
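To make the cheaper option concrete, here is a minimal sketch of the kind of search tab Amine describes: a plain keyword match over a hand-written catalog of data sources, after which the user ticks the sources he wants. Everything in it (the `DataSource` fields, the `search_sources` function, the sample catalog) is hypothetical and invented for illustration; it is not taken from any tool shown at the summer school.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One catalog entry. All fields are illustrative assumptions."""
    name: str
    description: str
    files: list[str] = field(default_factory=list)
    attributes: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def search_sources(query: str, catalog: list[DataSource]) -> list[DataSource]:
    """Return sources sharing at least one keyword with the query, best match first."""
    keywords = tokenize(query)
    scored = []
    for source in catalog:
        text = " ".join([source.name, source.description, *source.attributes])
        hits = len(keywords & tokenize(text))
        if hits:
            scored.append((hits, source))
    # Stable sort: sources with equal scores keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [source for _, source in scored]


# Hypothetical catalog, written by hand instead of derived from an ontology.
CATALOG = [
    DataSource("company_register", "Registered companies with legal form and address",
               ["companies.csv"], ["company_name", "legal_form", "city"]),
    DataSource("public_budget", "Government spending per agency and year",
               ["budget_2013.csv"], ["agency", "year", "amount"]),
    DataSource("weather_stations", "Daily temperature measurements",
               ["stations.csv"], ["station", "date", "temperature"]),
]

if __name__ == "__main__":
    # The search is pre-launched from the keywords of the user's query ...
    candidates = search_sources("government spending by agency", CATALOG)
    for source in candidates:
        print(f"[ ] {source.name}: {source.description} {source.attributes}")
    # ... and the check boxes are the only manual step left to the user.
    checked = {"public_budget"}
    selected = [s for s in candidates if s.name in checked]
    print([s.name for s in selected])
```

The ontology-based approach would aim to remove the `checked` step by inferring the right sources automatically; the sketch leaves that decision to the user, which is exactly the trade-off discussed here.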
This is what I meant by saying that the value brought by the Semantic Web can be replaced by a not so boring step for the user. Don’t get me wrong, Semantic Web research is a highly interesting topic and brings a lot of cool challenges, but I really think that the work of putting in place the right ontologies, integrating them with each other and building the required mechanics into the infrastructure is today a high price to pay compared to the shortcut features we can propose to users. I will not play Hélène’s role by asking you to post your best Semantic Web applications, but if you have some, I just ask to be convinced!
EBISS 2013 program: http://cs.ulb.ac.be/conferences/ebiss2013/program.html
Sabri Skhiri
Twitter: @sskhiri