Eura Nova R&D

High-Performance and Distributed Architecture, Machine Learning

05-11-2015

Distributed Frank-Wolfe under pipelined stale synchronous parallelism

Iterative-convergent algorithms represent an important family of applications in big data analytics. These are typically run on distributed processing frameworks deployed on a cluster of machines. At the same time, we are witnessing the move towards data center operating systems (OS), where resources are unified by a resource manager and processing frameworks coexist with each other. In this context, tasks from different processing frameworks can be scheduled on the same machine and slow down a worker (the straggler problem). Existing work has shown that an iteration model with relaxed consistency such as the Stale Synchronous Parallel (SSP) model, while still guaranteeing convergence, is able to cope with stragglers. In this paper we propose a model for the integration of the SSP model on a pipelined distributed processing framework. We then apply SSP to a distributed version of the Frank-Wolfe algorithm. We theoretically show its sparsity bounds and convergence under SSP. Finally, we experimentally show that the Frank-Wolfe algorithm applied to LASSO regression under SSP is able to converge faster than its BSP counterpart, especially under load conditions similar to those encountered in a data center OS.
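For intuition, here is a minimal single-machine sketch of a Frank-Wolfe step for LASSO in its constrained form (minimize ||Ax - b||^2 subject to ||x||_1 <= tau); the distributed SSP machinery from the paper is omitted and all names are illustrative.

```python
import numpy as np

def frank_wolfe_lasso(A, b, tau, iters=200):
    """Sketch: Frank-Wolfe on min ||Ax - b||^2 s.t. ||x||_1 <= tau.

    The linear subproblem over the l1 ball is solved by a signed
    vertex tau * e_i, which is why iterates stay sparse: at most
    one coordinate becomes non-zero per iteration.
    """
    x = np.zeros(A.shape[1])
    for k in range(iters):
        grad = A.T @ (A @ x - b)          # gradient of the quadratic loss
        i = int(np.argmax(np.abs(grad)))  # best vertex of the l1 ball
        v = np.zeros_like(x)
        v[i] = -tau * np.sign(grad[i])
        gamma = 2.0 / (k + 2.0)           # standard diminishing step size
        x = (1 - gamma) * x + gamma * v   # convex combination keeps x feasible
    return x
```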


Nam-Luc Tran, Thomas Peel, Sabri Skhiri, Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism, proceedings of the 2015 IEEE Conference on Big Data, November 2015, Santa Clara, CA, USA.

Download file (.pdf)

High-Performance and Distributed Architecture, Machine Learning

12-07-2015

Distributed Frank-Wolfe under pipelined stale synchronous parallelism

We are witnessing the move towards data center operating systems (OS), where resources are unified and processing frameworks coexist with each other. In this context, it has been shown that an iteration model with relaxed consistency such as the Stale Synchronous Parallel (SSP) model, while still guaranteeing convergence, is able to cope with the straggler problem for iterative-convergent algorithms. In this poster we present a model for the integration of the SSP model on a pipelined processing framework. We then apply SSP to a distributed version of the Frank-Wolfe algorithm and empirically show its convergence under stress situations similar to those encountered in a data center OS.
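As a rough illustration of the SSP rule itself, the toy sketch below models bounded staleness: a worker may run ahead of the slowest worker by at most a fixed number of iterations. This is a stand-in for intuition, not the paper's implementation.

```python
class SSPClock:
    """Toy bounded-staleness clock: a worker may start its next
    iteration only while it is at most `staleness` iterations ahead
    of the slowest worker (staleness=0 degenerates to BSP)."""

    def __init__(self, n_workers, staleness):
        self.clocks = [0] * n_workers
        self.staleness = staleness

    def can_advance(self, worker):
        # A straggler only blocks workers already `staleness` ahead.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def tick(self, worker):
        assert self.can_advance(worker)
        self.clocks[worker] += 1
```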


Thomas Peel and Nam-Luc Tran, Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism, poster at the Greed is Great ICML’15 Workshop, Lille, France, July 2015.

Download file (.pdf)

High-Performance and Distributed Architecture

29-05-2015

Analysis of interbank messages for the enforcement of financial regulations

In the context of the recent anti-money laundering and counter-terrorist financing policies defined by Recommendation 16 of the Financial Action Task Force, it is the responsibility of the financial institution to monitor the quality of the information present in wire transfers. To that end, we present in this paper an approach to automate the monitoring and validation of the information contained in interbank transfer messages. The approach is backed by a solution built around an event-driven architecture in which the data is processed as a stream and transformed at each stage. This architecture is in line with the latest research on data warehouses with stream data processing. We show that our approach meets the requirements and standards of the banking industry.
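To give a flavor of the stage-by-stage, event-driven processing described above, here is a hedged sketch of a single validation stage over a stream of messages. The field names and message shape are assumptions made for illustration, not the paper's schema; Recommendation 16 requires originator and beneficiary information to accompany wire transfers.

```python
# Assumed, simplified subset of the originator and beneficiary
# information mandated by FATF Recommendation 16.
REQUIRED_FIELDS = ("originator_name", "originator_account",
                   "beneficiary_name", "beneficiary_account")

def validation_stage(messages):
    """Consume a stream of message dicts, tag each one with the
    validation outcome, and pass it downstream; invalid messages
    are flagged for review rather than dropped."""
    for msg in messages:
        missing = [f for f in REQUIRED_FIELDS if not msg.get(f)]
        yield {**msg, "valid": not missing, "missing_fields": missing}

# Stages compose by chaining generators, mirroring an event-driven
# pipeline in which each stage transforms the stream.
stream = [{"originator_name": "Alice", "originator_account": "BE68",
           "beneficiary_name": "Bob", "beneficiary_account": ""}]
print(list(validation_stage(stream))[0]["missing_fields"])  # ['beneficiary_account']
```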


Nam-Luc Tran, Analysis of Interbank Messages for the Enforcement of Financial Regulations, proceedings of Journées francophones sur les Entrepôts de Données et l’Analyse en ligne, Brussels, Belgium, April 2015.

Download file (.pdf)

High-Performance and Distributed Architecture

06-11-2014

An approach for maximizing performance on heterogeneous clusters of CPU and GPU

Over the past years, there has been significant enthusiasm for parallel computing on Graphics Processing Units (GPUs), which have now become powerful and affordable hardware equipping data centers and research clusters. Our earlier research has explored ways to exploit the parallel compute performance of the GPU alongside the CPU in the same cluster. We have proposed a model for processing distributed machine learning tasks leveraging both the CPUs and the GPUs of the nodes. Continuing in this direction, we present in this paper our approach for optimizing the performance of the previously proposed framework. We then present our approach for integrating this processing model into a more general dataflow graph processing framework by extending it with support for GPU tasks and resources. In addition, we have developed a k-nearest neighbors implementation demonstrating all these features. Finally, we present our model based on flow networks for efficient scheduling on this heterogeneous framework.
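To illustrate the flow-network view of scheduling, the sketch below places tasks on CPU or GPU devices by solving a min-cost max-flow problem over a small bipartite graph with networkx. The task names, runtime estimates, and single-slot device capacities are invented for the example; the paper's actual model is richer.

```python
import networkx as nx

# Hypothetical inputs: estimated runtime of each task on each device.
costs = {("knn", "gpu0"): 2, ("knn", "cpu0"): 9,
         ("sort", "gpu0"): 5, ("sort", "cpu0"): 4}

G = nx.DiGraph()
for (task, dev), cost in costs.items():
    G.add_edge("src", task, capacity=1, weight=0)    # each task placed once
    G.add_edge(task, dev, capacity=1, weight=cost)   # edge cost = est. runtime
    G.add_edge(dev, "sink", capacity=1, weight=0)    # one free slot per device

flow = nx.max_flow_min_cost(G, "src", "sink")
placement = {task: dev
             for task in {t for t, _ in costs}
             for dev, units in flow[task].items() if units > 0}
print(placement)  # {'knn': 'gpu0', 'sort': 'cpu0'}
```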


Nam-Luc Tran, Sabri Skhiri, Arnaud Schils, and Egar Isaac Hiroshi Leon Saiki, An Approach for Maximizing Performance on Heterogeneous Clusters of CPU and GPU, EURA NOVA technical series.

Download file (.pdf)

High-Performance and Distributed Architecture

14-11-2013

A distributed data mining framework accelerated with graphics processing units

In the context of processing high volumes of data, recent developments have led to numerous models and frameworks for distributed processing running on clusters of commodity hardware. On the other side, the Graphics Processing Unit (GPU) has seen much enthusiastic development as a device for general-purpose intensive parallel computation. In this paper we propose a framework which combines both approaches and evaluates the relevance of having nodes in a distributed processing cluster make use of GPUs for further fine-grained parallel processing. We have engineered parallel and distributed versions of two data mining problems, the naive Bayes classifier and the k-means clustering algorithm, to run on the framework and have evaluated the performance gain. Finally, we discuss the requirements and perspectives of integrating GPUs in a distributed processing cluster, introducing a fully distributed heterogeneous computing cluster.
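As an illustration of how such an algorithm decomposes for distributed, GPU-amenable execution, here is a sketch of k-means split into a per-partition step and a global reduce; it mirrors the general pattern rather than the framework's actual code.

```python
import numpy as np

def kmeans_partial(points, centroids):
    """One worker's step: assign this partition's points to the nearest
    centroid and return per-cluster sums and counts. The distance and
    assignment math is the data-parallel part a GPU can accelerate."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    k = len(centroids)
    sums = np.zeros_like(centroids)
    counts = np.zeros(k, dtype=int)
    for j in range(k):
        members = points[labels == j]
        sums[j] = members.sum(axis=0)
        counts[j] = len(members)
    return sums, counts

def kmeans_reduce(partials):
    """Global reduce: merge the workers' partial sums and counts,
    then recompute the centroids for the next iteration."""
    sums = sum(p[0] for p in partials)
    counts = sum(p[1] for p in partials)
    return sums / np.maximum(counts, 1)[:, None]
```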

Nam-Luc Tran, Quentin Dugauthier, and Sabri Skhiri, A Distributed Data Mining Framework Accelerated with Graphics Processing Units, proceedings of the 2013 International Conference on Cloud Computing and Big Data (CloudCom-Asia), FuZhou, China, December 2013.

Download file (.pdf)

High-Performance and Distributed Architecture

14-08-2013

imGraph: a distributed in-memory graph database

Diverse applications including cyber security, social networks, protein networks, recommendation systems and citation networks work with inherently graph-structured data. The graphs modeling the data of these applications are large by nature, so processing them efficiently becomes challenging.
In this paper we present imGraph, a graph system that addresses the challenge of efficiently processing large graphs by using distributed in-memory storage. We use this type of storage to obtain the fast random data access that graph exploration mostly requires. imGraph uses a native graph data model to ease the implementation of graph algorithms. On top of it, we design and implement a traversal engine that achieves high performance through efficient memory access, distribution of the workload, and optimizations on network communications. We run a set of experiments on real graph datasets of different sizes to assess the performance of imGraph relative to other graph systems. The results show that imGraph achieves better performance on traversals of large graphs than its counterparts.
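For intuition about why fast random access dominates traversal cost, here is a minimal breadth-first traversal sketch; in imGraph the neighbor lookup would go to distributed in-memory storage, whereas here it is a plain in-process dictionary.

```python
from collections import deque

def bfs(start, neighbors, max_depth):
    """Visit every vertex reachable from `start` within `max_depth`
    hops. Each `neighbors` access stands in for the random (possibly
    remote) memory lookup a distributed traversal engine must serve."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbors.get(vertex, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

print(bfs("a", {"a": ["b", "c"], "b": ["d"]}, max_depth=2))  # {'a', 'b', 'c', 'd'}
```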


Salim Jouili and Aldemar Reynaga, imGraph: A distributed in-memory graph database, proceedings of the 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.

Download file (.pdf)

High-Performance and Distributed Architecture

13-08-2013

An empirical comparison of graph databases

In recent years, more and more companies provide services that can no longer be delivered efficiently using relational databases. These companies are therefore turning to alternative database models such as XML databases, object-oriented databases, document-oriented databases and, more recently, graph databases. Graph databases have only existed for a few years, and although there have been some comparison attempts, they mostly focus on particular aspects only.
In this paper, we present a distributed graph database comparison framework and the results we obtained by comparing four important players in the graph database market: Neo4j, OrientDB, Titan and DEX.
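The comparison idea can be sketched as a small harness that runs the same workload against several backends behind a common interface; the interface below is an assumption made for illustration, not the framework's actual API.

```python
import time

def benchmark(backends, workload, repeats=5):
    """Run `workload` against each backend and record wall-clock time.
    `backends` maps a name to any object exposing run(workload); a
    distributed version would additionally fan the workload out to
    many client nodes and aggregate their timings."""
    results = {}
    for name, db in backends.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            db.run(workload)
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)  # best-of-N damps warm-up noise
    return results
```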


Salim Jouili and Valentin Vansteenberghe, An empirical comparison of graph databases, proceedings of the 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.

Download file (.pdf)

High-Performance and Distributed Architecture

03-12-2012

AROM: processing big data with data flow graphs and functional programming

Developments in computational processing have driven a move towards distributed processing frameworks that execute tasks in parallel, and recent advances in cloud computing have largely contributed to this trend. The MapReduce model proposed by Google is one of the most popular, despite well-known limitations inherent to the model which constrain the types of jobs that can be expressed. On the other hand, models based on Data Flow Graphs (DFG) for the definition and processing of jobs, while more complex to express, are more general and suitable for a wider range of tasks, including iterative and pipelined ones. In this paper we present AROM, a framework for large-scale distributed processing which is based on DFGs to express jobs and which uses paradigms from functional programming to define the operators. The former leads to more natural handling of pipelined tasks, while the latter enhances the genericity and reusability of the operators, as shown by our tests on a parallel and pipelined job computing PageRank.
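The two ingredients, jobs expressed as data flow graphs and operators expressed as plain functions, can be illustrated with a toy evaluator; this sketches the concepts only and is not AROM's API.

```python
def run_dfg(graph, sources):
    """Evaluate a DAG where each node maps to (operator, input_nodes).
    Operators are ordinary functions, so they compose and can be
    reused across jobs; recursion resolves dependencies in order."""
    values = dict(sources)
    def evaluate(node):
        if node not in values:
            op, deps = graph[node]
            values[node] = op(*(evaluate(d) for d in deps))
        return values[node]
    for node in graph:
        evaluate(node)
    return values

# A two-operator word-count job over the toy DFG.
job = {
    "tokens": (lambda text: text.split(), ["raw"]),
    "counts": (lambda toks: {w: toks.count(w) for w in set(toks)}, ["tokens"]),
}
print(run_dfg(job, {"raw": "to be or not to be"})["counts"])
```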


Nam-Luc Tran, Sabri Skhiri, Esteban Zimányi, and Arthur Lesuisse, AROM: Processing Big Data With Data Flow Graphs and Functional Programming, proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science (IEEE CloudCom 2012), IEEE Computer Society Press, Taipei, Taiwan, December 2012.

Download file (.pdf)

High-Performance and Distributed Architecture

14-02-2011

Measuring elasticity for cloud databases

The rise of the Internet and the multiplication of data sources have multiplied the number of “Big Data” storage problems. These data sets are not only very big but also tend to grow very fast, sometimes over a short period. Distributed databases that work well for such data sets need to be not only scalable but also elastic, to ensure a fast response to growth in demand for computing power or storage. The goal of this article is to present measurement results that characterize the elasticity of three databases. We have chosen Cassandra, HBase, and MongoDB as three representative, popular, horizontally scalable NoSQL databases that are in production use. We have made measurements under realistic loads of up to 48 nodes, using the Wikipedia database to create our dataset and using the Rackspace cloud infrastructure. We define our methodology precisely and introduce a new dimensionless measure for elasticity to allow uniform comparisons of different databases at different scales. Our results show clearly that the technical choices made by the databases have a strong impact on the way they react when new nodes are added to the cluster.
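The paper's dimensionless elasticity measure is not reproduced here; purely as a loose stand-in, the sketch below computes one simple dimensionless proxy, the fraction of post-resize operations whose latency remains degraded relative to the stable baseline.

```python
def elasticity_proxy(latencies, baseline, threshold=1.5):
    """Illustrative proxy, NOT the paper's measure: the fraction of
    operations observed after adding nodes whose latency exceeds
    `threshold` times the pre-resize baseline. Being dimensionless,
    it can be compared across databases and cluster sizes; the 1.5x
    threshold is an arbitrary assumption."""
    degraded = sum(1 for t in latencies if t > threshold * baseline)
    return degraded / max(len(latencies), 1)

# Example: one of four post-resize operations was degraded -> 0.25.
print(elasticity_proxy([10, 12, 30, 11], baseline=10))
```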


Thibault Dory, Boris Mejías, Peter Van Roy, and Nam-Luc Tran, Measuring Elasticity for Cloud Databases, proceedings of the Cloud Computing 2011 (Second International Conference on Cloud Computing, GRIDs, and Virtualization), Rome, Italy, September 2011.

Download file (.pdf)