
SNIFF

CONTEXT

In recent years, Graphics Processing Units (GPUs) have gained popularity and momentum in industrial research thanks to the parallel processing power (number of parallel cores) they offer at a reasonable price. A second reason is their wide availability in consumer hardware: their low cost has put GPUs in most consumer devices today, from mobile phones and tablets to video game systems.

At the beginning of 2013, EURA NOVA started an internal R&D project codenamed “SNIFF”, whose goal is to study the potential of GPUs in a distributed machine learning framework. This setting introduces two levels of parallelism:

  1.  in the distribution of the processing among the computing nodes.
  2.  in the parallel computation on the individual nodes.

Concretely, this means that each node in the data processing cluster is equipped, in addition to its main Central Processing Unit (CPU), with a GPU suitable for massively parallel computation.

Both challenges have been the subject of extensive research in recent years. On the one hand, distributed processing frameworks such as MapReduce and Spark have been developed; EURA NOVA has also worked on AROM, which is based on the DFG execution model. On the other hand, the proliferation of affordable parallel hardware (GPUs, manycores, …) has driven much momentum in multi-core research. More precisely, in the context of Machine Learning, these challenges translate into the following questions:

  • Which Machine Learning algorithms are suitable for distribution and how can this be achieved?
  • Which sections of these algorithms can be executed in parallel and leverage the parallel processing power provided by the GPU?

The first question has been studied in recent years following the development of MapReduce [1], while for the second, much effort has been invested in porting fundamental machine learning tasks to run on GPUs (e.g. [2]).

CONTRIBUTION

In this study we have focused on two fundamental machine learning algorithms: the naive Bayes classifier and k-means clustering. Both algorithms are data parallel, and the literature already contains many proposals for computing their models either in a distributed fashion or in parallel on the GPU, but not both at the same time. In this project we have reengineered these algorithms to run distributed, and we have delegated many of their parallel sections to the GPU.
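To illustrate what "data parallel" means here, the following is a minimal sketch, not SNIFF code, of naive Bayes training on categorical features. The model is built from counts, and counts computed on separate data partitions can simply be added together. The function names and the record layout are assumptions made for the example.

```python
from collections import Counter

def partial_counts(records):
    """Count class and (class, feature index, value) occurrences on one partition.

    `records` is a list of (features, label) pairs; this layout is an
    assumption made for the example.
    """
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in records:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(label, index, value)] += 1
    return class_counts, feature_counts

def merge_counts(partials):
    """Combine partial counts; addition is associative, so order does not matter."""
    total_classes, total_features = Counter(), Counter()
    for class_counts, feature_counts in partials:
        total_classes.update(class_counts)
        total_features.update(feature_counts)
    return total_classes, total_features

# Two hypothetical partitions, as if held by two worker nodes.
partition_a = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
partition_b = [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no")]

merged = merge_counts([partial_counts(partition_a), partial_counts(partition_b)])
single = partial_counts(partition_a + partition_b)
```

Merging the two partial results gives the same counts as a single pass over all the data, which is the property that makes the algorithm easy to distribute.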

The following figure summarizes the architecture of the framework. Given a suitable machine learning algorithm, the workflow is as follows:

  1. The global processing is distributed among worker nodes
  2. Each node computes a part of the model, using the parallel hardware (GPU)
  3. A combiner then gathers the partial models and combines them into the final global model
  4. (A master node is used for signal and control)
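The workflow above can be sketched for k-means as follows. This is an illustrative single-process simulation under stated assumptions: the function names are hypothetical, the partitions stand in for worker nodes, and a plain Python loop stands in for the part a node would offload to its GPU. The master node's signalling and control are reduced to a simple driver loop.

```python
def assign_and_sum(points, centroids):
    """Worker step: per-centroid coordinate sums and counts for one partition.

    In the design described above this is the part a node would offload to
    its GPU; a plain Python loop stands in for it here.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in points:
        # Index of the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts

def combine(partials, centroids):
    """Combiner step: merge partial sums and counts into new centroids."""
    dim = len(centroids[0])
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # keep an empty cluster's centroid
            continue
        new_centroids.append(
            [sum(sums[k][d] for sums, _ in partials) / total for d in range(dim)]
        )
    return new_centroids

def kmeans(partitions, centroids, iterations=10):
    """Driver loop: one distribute/compute/combine round per iteration."""
    for _ in range(iterations):
        partials = [assign_and_sum(part, centroids) for part in partitions]
        centroids = combine(partials, centroids)
    return centroids

# Two hypothetical partitions, one per worker node.
partitions = [
    [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)],
    [(2.0, 0.0), (2.0, 2.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)],
]
result = kmeans(partitions, [[0.0, 0.0], [10.0, 10.0]])
```

Each worker returns only per-centroid sums and counts, so the data exchanged with the combiner grows with the number of centroids and not with the size of the partition.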

 

Design of SNIFF-see details in text


 

INSIGHTS

Going one step further, our research has given us enough insight to envision the future of distributed processing frameworks as distributed heterogeneous processing platforms. For a framework such as AROM, this means that the particularities of the parallel hardware are hidden behind operators. Each operator packs different implementations of the same task, with and without a GPU requirement, and the one used depends on the capabilities of the host on which the operator is scheduled. For the user, writing a processing job then comes down to composing operators, which handle execution on the GPU hardware themselves.

Operators whose implementation requires a GPU are scheduled at runtime on a suitable host.
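The operator idea can be sketched as follows. This is a hypothetical illustration and not AROM's API: the `Host`, `Operator` and `schedule` names are invented for the example, and the "GPU" implementation is a placeholder that runs on the CPU so that the code executes anywhere.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    has_gpu: bool

class Operator:
    """A task that packs several implementations behind one name."""

    def __init__(self, name, cpu_impl=None, gpu_impl=None):
        self.name = name
        self.cpu_impl = cpu_impl
        self.gpu_impl = gpu_impl

    def select(self, host):
        """Pick the implementation matching the host's capabilities."""
        if host.has_gpu and self.gpu_impl is not None:
            return self.gpu_impl
        if self.cpu_impl is not None:
            return self.cpu_impl
        raise RuntimeError(f"operator {self.name!r} cannot run on {host.name!r}")

    def run(self, host, data):
        return self.select(host)(data)

def schedule(operator, hosts):
    """Pick a host: GPU hosts first when a GPU implementation exists."""
    if operator.gpu_impl is not None:
        for host in hosts:
            if host.has_gpu:
                return host
    if operator.cpu_impl is not None and hosts:
        return hosts[0]
    raise RuntimeError(f"no suitable host for operator {operator.name!r}")

def sum_squares_cpu(values):
    return sum(v * v for v in values)

def sum_squares_gpu(values):
    # Placeholder: a real version would launch a GPU kernel. It computes the
    # same result on the CPU here so the example runs on any machine.
    return sum(v * v for v in values)

hosts = [Host("cpu-node", has_gpu=False), Host("gpu-node", has_gpu=True)]
sum_squares = Operator("sum_squares", cpu_impl=sum_squares_cpu, gpu_impl=sum_squares_gpu)
gpu_only = Operator("gpu_only", gpu_impl=sum_squares_gpu)

chosen = schedule(sum_squares, hosts)
value = sum_squares.run(chosen, [1, 2, 3])
```

The job author only refers to `sum_squares`; which implementation runs is decided from the host chosen at scheduling time, and an operator with no CPU implementation can only be placed on a GPU host.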

 

Nam-Luc Tran

 

References

[1] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Y. Yu, G. Bradski, A. Ng, and K. Olukotun, “Map-Reduce for Machine Learning on Multicore”, Advances in Neural Information Processing Systems, vol. 19, pp. 281–288, 2007.

[2] L. Lopes and B. Ribeiro, “GPUMLib: An Efficient Open-Source GPU Machine Learning Library”, International Journal of Computer Information Systems and Industrial Management Applications, vol. 3, pp. 355–362, 2010.
