As in previous years, EURA NOVA R&D supervised Master students for their internship or Master Thesis during the 2013-2014 academic year. This year, 5 students had the opportunity to work in the fields of Machine Learning, GPU computing, distributed processing, metabolic pathways and social graphs. This blog post summarizes their breakthroughs.
Graph Processing and Storage
Building on the momentum we created last year on graph processing with STEFFI (formerly imGraph), our in-house in-memory graph database and traversal engine, we pursued our research by applying graph technologies to concrete fields.
As such, Romain’s thesis proposed a novel approach for indexing social graphs. Such graphs are characterised by a massive number of vertices and edges, and by a heavy load of queries and updates driven by the number of users. The proposed solution, called iDUN and based on NESS, successfully deals with these challenges and has been optimised to work in a distributed environment. It supports searching on attributes and showed promising results when tested on EC2.
Still in the graph domain, Roald’s thesis focused on biological metabolic pathways. Following discussions with biologists, he proposed a graph model for representing and storing biological networks. Based on these observations, different graph databases were studied in order to determine the most suitable ones for this type of task.
Machine Learning
In the field of machine learning, Sahilu worked on metric learning in the scope of image classification. Techniques for supervised image classification usually depend on the representation of the local features of the image and on the metric used to calculate the similarity (or distance) between images. Rather than using a simple metric given a priori, many recent studies have shown the benefit of learning the metric from data. This approach is known in the literature as metric learning. In that context, Sahilu’s thesis focused on defining a metric learning methodology applicable to very large image databases.
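To make the idea concrete, here is a minimal sketch, not taken from Sahilu’s thesis, of the kind of parametrised distance that metric learning typically optimises: a Mahalanobis-style distance d(x, y) = ||L(x - y)||, where the matrix L would be learned from labelled data. The function name `learned_distance` and the example matrices are illustrative assumptions; with L set to the identity, the distance reduces to the ordinary Euclidean one.

```python
import math


def learned_distance(x, y, L):
    """Distance ||L (x - y)||_2 for a linear map L given as a list of rows.

    L is assumed to come from some metric learning procedure (hypothetical
    here); the identity matrix gives the plain Euclidean distance.
    """
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in projected))


identity = [[1.0, 0.0], [0.0, 1.0]]
# Illustrative "learned" map that down-weights the second feature.
scaled = [[1.0, 0.0], [0.0, 0.1]]

x, y = [0.0, 0.0], [3.0, 4.0]
print(learned_distance(x, y, identity))
print(learned_distance(x, y, scaled))
```

The second call yields a smaller distance than the first because the hypothetical map treats differences along the second feature as less relevant, which is the kind of behaviour a learned metric is meant to capture.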
Big Data Processing on GPU
Continuing our previous research on distributed processing and GPUs, Hiroshi and Arnaud worked on optimizing the framework we had previously developed. Hiroshi delivered optimizations that now enable our k-means machine learning job, using GPUs on the cluster nodes, to run more than 10 times faster than the same job on Mahout. He then applied the same approach to a new ML algorithm, the soft-clustering Expectation Maximization algorithm.
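For readers unfamiliar with k-means, the following is a plain CPU sketch of the standard algorithm (Lloyd's iteration), written for illustration only; it is not Hiroshi's GPU implementation, and all function names are assumptions. It shows why the algorithm is a natural fit for GPUs: the assignment step treats every point independently.

```python
def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point.

    Each point is handled independently, which is why this step maps well
    onto data-parallel hardware such as GPUs.
    """
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep the centroid of an empty cluster
            continue
        new_centroids.append([sum(p[d] for p in members) / len(members)
                              for d in range(len(old))])
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Alternate the two steps for a fixed number of iterations."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return centroids, assign(points, centroids)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
print(centroids, labels)
```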
Arnaud, for his part, pursued our vision of a unified DataFlow Graph processing framework that takes GPU resources and computations into account. He extended AROM, our in-house DataFlow Graph processing framework, to support job graphs in which some operators perform computations exclusively on the GPU. This has paved the way to a complete overhaul of the framework's scheduling, so that it takes heterogeneous resources and computations into account.
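As a rough illustration of what scheduling a job graph over heterogeneous resources involves, the sketch below walks a small operator DAG in topological order and places each operator on a worker that offers the resource it needs. It does not reflect AROM's actual API or scheduler: the `schedule` function, the operator names and the worker names are all hypothetical.

```python
from collections import deque


def schedule(operators, edges, workers):
    """Place operators of a job DAG on workers offering the required resource.

    operators: {name: "cpu" | "gpu"}   required resource per operator
    edges:     list of (upstream, downstream) pairs
    workers:   {worker_name: set of resources it offers}
    Returns a list of (operator, worker) pairs in a valid execution order.
    """
    indegree = {op: 0 for op in operators}
    downstream = {op: [] for op in operators}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1

    ready = deque(op for op in operators if indegree[op] == 0)
    plan = []
    while ready:
        op = ready.popleft()
        candidates = [w for w, res in workers.items() if operators[op] in res]
        if not candidates:
            raise ValueError(f"no worker offers {operators[op]!r} for {op!r}")
        # Naive placement: first matching worker. A real scheduler would
        # also weigh load, data locality and transfer costs.
        plan.append((op, candidates[0]))
        for nxt in downstream[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(plan) != len(operators):
        raise ValueError("job graph contains a cycle")
    return plan


operators = {"read": "cpu", "kmeans_step": "gpu", "write": "cpu"}
edges = [("read", "kmeans_step"), ("kmeans_step", "write")]
workers = {"node-a": {"cpu"}, "node-b": {"cpu", "gpu"}}
print(schedule(operators, edges, workers))
```

In this toy example the GPU-only operator can only land on the worker that advertises a GPU, while CPU operators go to the first CPU-capable worker.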
Final Word
Note that Sahilu and Hiroshi are international students who spent 6 months at EURA NOVA for their internship and Master Thesis in the scope of the IT4BI programme. IT4BI is a Europe-funded programme that allows Master students in Computer Science and Engineering to spend the final years of their curriculum abroad, with an industrial internship. EURA NOVA has been a partner and fervent supporter of this programme since its creation. For any question regarding IT4BI and Erasmus Mundus internships and Master Theses, do not hesitate to contact us!