SNIFF

Data science

October 22, 2013

CONTEXT

In recent years, Graphics Processing Units (GPUs) have gained popularity and momentum in industrial research thanks to the parallel processing power (number of parallel cores) they offer for a reasonable price. A second reason is their wide availability in consumer hardware: the low cost of GPUs has favored their presence in most consumer devices today, from mobile phones and tablets to video game systems.

At the beginning of 2013, EURA NOVA started an internal R&D project codenamed “SNIFF” whose goal is to study the potential of GPUs in a distributed machine learning framework. This setting involves two levels of parallelism:

in the distribution of the processing among the computing nodes.
in the parallel computation on the individual nodes.

Concretely, this means that each node in the data processing cluster is equipped with a GPU suitable for massively parallel computation in addition to its main Central Processing Unit (CPU).

Both challenges have received extensive research attention in recent years. On the one hand, distributed processing frameworks such as MapReduce and Spark have been developed; EURA NOVA has also worked on AROM, which is based on the DFG execution model. On the other hand, the proliferation of affordable parallel hardware (GPUs, manycores, …) has driven much of the momentum in multi-core research. In the context of machine learning, these challenges translate to the following questions:

Which Machine Learning algorithms are suitable for distribution and how can this be achieved?
Which sections of these algorithms can be executed in parallel and leverage the parallel processing power provided by the GPU?

The first question has been studied in recent years following the development of MapReduce [1], while for the second, much effort has been invested in porting fundamental machine learning tasks to run on GPUs (e.g. [2]).

CONTRIBUTION

In this study we have focused on two fundamental machine learning algorithms: the naive Bayes classifier and k-means clustering. Both algorithms are data parallel, and the literature already contains many proposals for computing their models either in a distributed fashion or in parallel on the GPU, but not both at the same time. In the project we have re-engineered these algorithms to run distributed and have delegated many of their parallel sections to the GPU.
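To illustrate what "data parallel" means for the naive Bayes classifier, here is a minimal sketch, not the SNIFF implementation. It assumes categorical features and uses hypothetical helper names (partial_counts, merge_counts) and toy data. The point it shows is that the counts needed by the model are additive: each partition can be counted independently and the results summed afterwards.

```python
from collections import Counter

def partial_counts(rows):
    """Count class and (class, feature index, value) occurrences on one partition."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i, value)] += 1
    return class_counts, feature_counts

def merge_counts(partials):
    """Sum the per-partition counters into global counts."""
    class_total, feature_total = Counter(), Counter()
    for class_counts, feature_counts in partials:
        class_total.update(class_counts)      # Counter.update adds counts
        feature_total.update(feature_counts)
    return class_total, feature_total

# Hypothetical toy data: (categorical features, label)
data = [
    (("sunny", "hot"), "no"),
    (("rainy", "mild"), "yes"),
    (("sunny", "mild"), "yes"),
    (("rainy", "hot"), "no"),
]

# Two partitions stand in for two worker nodes.
partitions = [data[:2], data[2:]]
merged = merge_counts(partial_counts(p) for p in partitions)
single_pass = partial_counts(data)
```

Counting per partition and merging yields the same counts as a single pass over all the data, which is what makes the algorithm a candidate for both distribution and on-node parallelism.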

The following figure summarizes the architecture of the framework. Given a suitable machine learning algorithm, the workflow is as follows:

The global processing is distributed among worker nodes
Each node computes a part of the model, using the parallel hardware (GPU)
A combiner then gathers the partial models and combines them into the final global model
(A master node is used for signal and control)

Design of SNIFF (see details in text)
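The workflow above can be sketched for one k-means iteration. This is an illustrative sketch under stated assumptions, not the project's code: the function names (assign_and_accumulate, combine, kmeans_iteration) and the data are hypothetical, the workers run sequentially in one process, and the step that a GPU would take over is only marked by a comment.

```python
def assign_and_accumulate(points, centroids):
    """Worker step: per centroid, accumulate the sum and count of its nearest points."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        # The nearest-centroid search is independent per point; this is the
        # kind of section that could be delegated to the GPU (assumption).
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts

def combine(partials, centroids):
    """Combiner step: merge the partial sums and counts into new centroids."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in partials:
        for k in range(len(centroids)):
            total_counts[k] += counts[k]
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    # Keep the previous centroid when no point was assigned to it.
    return [
        [s / total_counts[k] for s in total_sums[k]] if total_counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]

def kmeans_iteration(partitions, centroids):
    """One iteration: each partition plays the role of one worker node."""
    partials = [assign_and_accumulate(part, centroids) for part in partitions]
    return combine(partials, centroids)

# Hypothetical data split over two "nodes".
partitions = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 10.0), (10.0, 12.0), (2.0, 0.0)],
]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_iteration(partitions, centroids)
```

Each worker only ships per-centroid sums and counts to the combiner rather than its raw points, so the partial model stays small regardless of the partition size.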

INSIGHTS

Going one step further, our research has given us enough insight to envision the future of distributed processing frameworks as distributed heterogeneous processing platforms. For a framework such as AROM, this means that the particularities of the parallel hardware are hidden behind operators. Each operator packs different implementations of the same task, with and without a GPU requirement, and the implementation used depends on the capabilities of the host on which the operator is scheduled. For the user, writing a processing job then means composing the different operators, which handle the execution on the GPU hardware themselves.

Some operators, whose implementation requires a GPU, will be scheduled on an appropriate host at runtime.
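The idea of operators hiding the hardware can be sketched as follows. This is a hypothetical illustration, not AROM's API: the Host and Operator classes, the schedule function and the host names are all assumptions, and the "GPU" implementation is a plain Python stand-in so that the example runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Impl = Callable[[Sequence[float]], float]

@dataclass(frozen=True)
class Host:
    name: str
    has_gpu: bool

@dataclass(frozen=True)
class Operator:
    name: str
    cpu_impl: Optional[Impl]  # None means the operator has no CPU version
    gpu_impl: Optional[Impl]  # None means the operator has no GPU version

def schedule(op, hosts):
    """Pick a host and implementation: GPU hosts first when a GPU version exists."""
    if op.gpu_impl is not None:
        for host in hosts:
            if host.has_gpu:
                return host, op.gpu_impl
    if op.cpu_impl is not None and hosts:
        return hosts[0], op.cpu_impl
    raise RuntimeError(f"no suitable host for operator {op.name!r}")

def sum_cpu(values):
    return float(sum(values))

def sum_gpu_placeholder(values):
    # Stand-in for a real GPU kernel; it runs on the CPU in this sketch.
    return float(sum(values))

hosts = [Host("node-a", has_gpu=False), Host("node-b", has_gpu=True)]
portable_sum = Operator("sum", cpu_impl=sum_cpu, gpu_impl=sum_gpu_placeholder)
gpu_only_sum = Operator("gpu_sum", cpu_impl=None, gpu_impl=sum_gpu_placeholder)

host, impl = schedule(gpu_only_sum, hosts)
result = impl([1.0, 2.0, 3.0])
```

From the user's point of view only the operator is visible; whether the CPU or GPU version runs is decided by the scheduler from the capabilities of the available hosts.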

Nam-Luc Tran

References

[1] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Y. Yu, G. Bradski, A. Ng, and K. Olukotun, “Map-Reduce for Machine Learning on Multicore”, Advances in Neural Information Processing Systems, vol. 19, pp. 281–288, 2007.

[2] L. Lopes and B. Ribeiro, “GPUMLib: An Efficient Open-Source GPU Machine Learning Library”, International Journal of Computer Information Systems and Industrial Management Applications, vol. 3, pp. 355–362, 2010.
