I spent last week in Prague, Czech Republic, presenting my paper entitled “An Analytics-Aware Conceptual Model For Evolving Graphs”. More details on the paper can be found in my previous article.
In this post I will focus on the trends I noticed, and share some of the lessons I learned during the trip.
My paper was accepted at DaWaK. The conference was jointly organised with many other international conferences, such as DEXA and EC-Web. The full list can be found here.
Polyglot Persistence
In a nutshell, polyglot persistence [http://www.martinfowler.com/bliki/PolyglotPersistence.html] is about managing multiple data stores, each accessed through the native driver that matches its underlying data model. The idea is in sync with the NoSQL movement, where native graph, document and key-value data stores are being developed to model, store and analyse data in its native format.
The benefits of such an approach over transforming everything to fit the relational model are obvious. It might require some additional work on the integration side. In return, it provides a significant gain in performance (as in graph traversals vs. joins) and in querying expressiveness, and it yields a more realistic model.
One of the Globe conference presentations stressed this need, and its authors have ongoing work to address some of the related challenges.
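To make the idea more concrete, here is a minimal sketch in Python. It is only an illustration: the two store classes are in-memory stand-ins that I made up for a key-value store and a graph store, not real drivers, and `SocialApp` is a hypothetical application. The point is simply that each kind of data lives in the store whose model suits it, and that the application does the integration.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueStore:
    """In-memory stand-in for a key-value store: lookups by key only."""
    data: dict = field(default_factory=dict)

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


@dataclass
class GraphStore:
    """In-memory stand-in for a graph store: adjacency sets, traversal by hops."""
    adjacency: dict = field(default_factory=dict)

    def add_edge(self, source, target):
        self.adjacency.setdefault(source, set()).add(target)
        self.adjacency.setdefault(target, set())

    def neighbours_within(self, start, hops):
        """Return every node reachable from `start` in at most `hops` steps."""
        seen = {start}
        frontier = {start}
        for _ in range(hops):
            reached = {n for node in frontier for n in self.adjacency.get(node, ())}
            frontier = reached - seen
            seen |= frontier
        return seen - {start}


class SocialApp:
    """Hypothetical application: profiles go to the key-value store,
    follow relationships go to the graph store."""

    def __init__(self):
        self.profiles = KeyValueStore()
        self.follows = GraphStore()

    def add_user(self, user_id, name):
        self.profiles.put(user_id, {"name": name})

    def follow(self, follower, followee):
        self.follows.add_edge(follower, followee)

    def suggestions(self, user_id):
        """Names of users exactly two hops away: a traversal, then key lookups."""
        direct = self.follows.neighbours_within(user_id, 1)
        two_hops = self.follows.neighbours_within(user_id, 2) - direct
        return sorted(self.profiles.get(u)["name"] for u in two_hops)
```

The "friends of friends" question is answered by a traversal in the graph stand-in, and only the final name lookups hit the key-value stand-in. In a relational model the same question would typically be a self-join on a follows table.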
Time to go beyond Hadoop
During the conference, I attended two presentations where Hadoop was used as the processing layer. I agree that MapReduce (MR) is a good paradigm that frees the user from taking care of all the details of data distribution and fail-over mechanisms. However, as always, there is no one-size-fits-all solution. It’s been a while since the MR paper was presented, and a lot of research has been done since then. New paradigms and new frameworks are available to handle different workloads. For example, graph processing is better done with BSP-based frameworks such as Giraph. Iterative processing can be performed using HaLoop. And the recent Spark/Shark stack, developed at Berkeley AMPLab, is designed for efficient in-memory processing of RDDs (resilient distributed datasets). [http://spark.incubator.apache.org/]
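To give a feeling for why the BSP (bulk synchronous parallel) style fits graph workloads, here is a small single-machine sketch in Python of vertex-centric shortest paths. It is not Giraph's API, and the function name and data layout are my own assumptions. It only mimics the model: in each superstep, vertices read the messages sent to them in the previous superstep, update their value, and send messages to their neighbours. The computation stops when no messages are in flight. The sketch assumes non-negative edge weights.

```python
import math


def bsp_shortest_paths(edges, source):
    """Single-source shortest paths, computed superstep by superstep.

    `edges` maps each vertex to a list of (neighbour, weight) pairs.
    """
    vertices = set(edges) | {n for out in edges.values() for n, _ in out}
    distance = {v: math.inf for v in vertices}
    inbox = {source: [0]}  # messages delivered at the start of superstep 0

    while inbox:  # no messages left means every vertex has halted
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                # The vertex improves its value and notifies its neighbours.
                distance[vertex] = best
                for neighbour, weight in edges.get(vertex, []):
                    outbox.setdefault(neighbour, []).append(best + weight)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return distance
```

Only vertices that received a message do any work in a given superstep, which is what makes this style attractive for iterative graph algorithms compared with re-running a full MR job over the whole graph at every iteration.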
RDF vs. Graphs
In most of the discussions I had with people working on graph data management, the comparison with RDF came up. We proposed a conceptual model that is, so far, independent of the storage layer. In our implementation, we used graph databases built on top of the property graph model.
Coming back to RDF, we are aware of the amazing work done in the semantic web community, which addresses the whole stack of modeling, storage and querying. We explained our point of view on this in a previous blog post.
From my personal point of view, I prefer to continue with the native representation of graphs as nodes and edges, for the following reasons:
- From a modeling perspective, I feel that the subject-predicate-object structure is limiting when it comes to easily supporting graph data
- RDF has less expressive power due to the lack of properties on predicates
- Provenance information capture is not obvious with RDF
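As a rough illustration of the last two points, the sketch below (plain Python, no RDF library) contrasts a property-graph edge that carries its own properties with the same information expressed as triples through reification, which is one standard workaround in RDF. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms come from the RDF reification vocabulary; the sample data, the property names and the `reify` helper are made up for this example.

```python
# Property graph: the edge itself carries properties, including provenance.
property_edge = {
    "source": "alice",
    "label": "knows",
    "target": "bob",
    "properties": {"since": 2010, "recorded_by": "crm_import"},
}


def reify(edge, statement_id):
    """Describe one property-graph edge as triples, RDF-reification style.

    A plain triple has no room for edge properties, so the statement gets its
    own identifier and every property becomes a triple about that identifier.
    """
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", edge["source"]),
        (statement_id, "rdf:predicate", edge["label"]),
        (statement_id, "rdf:object", edge["target"]),
    ]
    triples += [(statement_id, key, value) for key, value in edge["properties"].items()]
    return triples
```

One edge with two properties turns into six triples here, and any query about the relationship has to go through the extra statement node. This is the kind of overhead I have in mind when I say that the native node/edge representation feels more natural for graph data.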
Social Networking
“The most important moments in a conference are lunch and coffee breaks”. This is what another PhD student I met at the conference taught me, and I can absolutely confirm it. I had the opportunity to meet many researchers, present my research project and gather feedback. We also had some discussions about future cooperation and further participation of EURA NOVA in new research projects.
Overall, I was most satisfied by the people I had the pleasure to meet. Some of the presentations I attended were enriching. However, I was a bit disappointed by the keynotes, either because of the material or because of the presentation itself.
I would also like to mention that I particularly enjoyed the presentations applying research to real-world scenarios, such as applying association and other mining techniques to cancer prevention, or sensor network deployment with data capture and management. In this kind of project, researchers face theoretical, practical and even political challenges, which definitely makes their findings quite exciting!
For more details about any of the above topics, don’t hesitate to comment or email me!
Amine Ghrab
Twitter: @AmineGhrab