Flink Forward 2015

The first edition of Flink Forward took place on October 12th and 13th in Berlin. Flink Forward is a two-day conference dedicated exclusively to Apache Flink, the distributed, pipelined framework for batch and stream processing. EURA NOVA was among the speakers at the event. Here is our field report.

High Availability in RoQ

Over the past year, we have worked with Benjamin Van Melle on implementing High Availability in RoQ, our proof-of-concept distributed pub-sub messaging system. As a consequence, we needed to extend our JUnit tests to cover individual component failure scenarios and prove that they were handled as expected. This post shows how we used Docker to achieve this.

Elastic Messaging for the Cloud

ICML 2015

ICML Lille

Introduction

The International Conference on Machine Learning is one of the most important annual events in the world of machine learning. It is where the most renowned researchers in the field gather to present and share their (often diverging) visions and directions for the future. As such, the event is sponsored by most of the biggest IT companies, such as Google, Baidu and Facebook, and it attracts numerous smaller companies with a particular interest in big data in its wake.

A framework for building OLAP cubes on graphs

Graphs are widespread structures providing a powerful abstraction for modeling networked data. Large and complex graphs have emerged in various domains such as social networks, bioinformatics, and chemical data. However, current warehousing frameworks are not equipped to efficiently handle the multidimensional modeling and analysis of complex graph data. In this paper, we propose a novel framework for building OLAP cubes from graph data and analyzing the graph's topological properties. The framework supports the extraction and design of candidate multidimensional spaces in property graphs. Besides property graphs, we introduce a new database model tailored for multidimensional modeling, which enables the exploration of additional candidate multidimensional spaces. We present novel techniques for OLAP aggregation of the graph, and discuss the case of dimension hierarchies in graphs.

Furthermore, we present the architecture and implementation of our graph warehousing framework and show the effectiveness of our approach.
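
To give a feel for what "OLAP aggregation of the graph" can mean, here is a minimal sketch of a generic graph roll-up, assuming a property graph stored as a dictionary of node attributes plus a list of directed edges. Nodes are grouped by the value of one dimension (the hypothetical `city` attribute), and the edges between groups are counted. This is an illustrative summarization only, not the aggregation techniques of the paper; all names and data are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical property graph: node id -> attributes, plus directed edges.
Nodes = Dict[str, Dict[str, str]]
Edges = List[Tuple[str, str]]


def roll_up(nodes: Nodes, edges: Edges, dimension: str):
    """Aggregate a property graph along one dimension (a generic roll-up sketch).

    Each node is replaced by its value for `dimension`. The result is a smaller
    graph whose nodes carry member counts and whose edges carry the number of
    original edges they summarize.
    """
    # Map each original node to its group, i.e. its value on the dimension.
    group_of = {n: attrs[dimension] for n, attrs in nodes.items()}
    # Node measure: how many original nodes fall into each group.
    node_counts = Counter(group_of.values())
    # Edge measure: how many original edges link each ordered pair of groups.
    edge_counts = Counter((group_of[u], group_of[v]) for u, v in edges)
    return dict(node_counts), dict(edge_counts)


nodes = {
    "alice": {"city": "Brussels"},
    "bob": {"city": "Brussels"},
    "carol": {"city": "Paris"},
}
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

groups, links = roll_up(nodes, edges, "city")
print(groups)  # {'Brussels': 2, 'Paris': 1}
print(links)   # {('Brussels', 'Brussels'): 1, ('Brussels', 'Paris'): 2}
```

Rolling up further along a hierarchy (for instance from city to country) would repeat the same grouping step on the coarser attribute, which is where dimension hierarchies come into play.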

Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro Vaisman, and Esteban Zimányi, A Framework for Building OLAP Cubes on Graphs, proceedings of the 19th East-European Conference on Advances in Databases and Information Systems, Poitiers, France, September 2015.

Click here to access the paper in its preprint form.

Distributed Frank-Wolfe under pipelined stale synchronous parallelism

We are witnessing the move towards data center operating systems (OS), where resources are unified and processing frameworks coexist with each other. In this context, it has been shown that an iteration model with relaxed consistency, such as the Stale Synchronous Parallel (SSP) model, can cope with the straggler problem in convergent iterative algorithms while still guaranteeing convergence. In this poster, we present a model for integrating the SSP model into a pipelined processing framework. We then apply SSP to a distributed version of the Frank-Wolfe algorithm and empirically show its convergence under stress conditions similar to those encountered in a data center OS.
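
For readers unfamiliar with the two ingredients, the following single-process sketch combines a Frank-Wolfe iteration with an SSP-style staleness bound. It assumes a toy objective f(x) = ||x - b||^2 minimized over the probability simplex, the classic step size 2/(k+2), and a deterministic delay pattern (the gradient is read from a snapshot lagging by 0, 1, ..., `staleness` iterations in turn) as a stand-in for stragglers. The objective, the target `b` and the delay pattern are illustrative assumptions, not the setup of the poster, and nothing here is distributed.

```python
from collections import deque
from typing import List

Vector = List[float]


def objective(x: Vector, b: Vector) -> float:
    # Toy smooth convex objective (an assumption): f(x) = ||x - b||^2.
    return sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def gradient(x: Vector, b: Vector) -> Vector:
    return [2.0 * (xi - bi) for xi, bi in zip(x, b)]


def simplex_lmo(g: Vector) -> Vector:
    # Linear minimization oracle over the probability simplex:
    # argmin_s <g, s> is the vertex e_i with the smallest gradient entry.
    i = min(range(len(g)), key=lambda j: g[j])
    vertex = [0.0] * len(g)
    vertex[i] = 1.0
    return vertex


def ssp_allows(clock: int, slowest_clock: int, staleness: int) -> bool:
    # SSP condition: work at `clock` may proceed only if the oldest state
    # it depends on is at most `staleness` clocks behind.
    return clock - slowest_clock <= staleness


def stale_frank_wolfe(b: Vector, iterations: int, staleness: int) -> Vector:
    n = len(b)
    x = [1.0 / n] * n                            # start at the simplex barycenter
    history = deque([x], maxlen=staleness + 1)   # last staleness+1 iterates
    for k in range(iterations):
        # Simulated straggler: the gradient comes from a snapshot lagging the
        # current iterate by a delay that cycles through 0..staleness.
        delay = k % (staleness + 1)
        assert ssp_allows(k, k - delay, staleness)
        stale_x = history[-1 - delay]
        s = simplex_lmo(gradient(stale_x, b))
        gamma = 2.0 / (k + 2)                    # classic Frank-Wolfe step size
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
        history.append(x)
    return x


b = [0.2, 0.3, 0.5]   # hypothetical target lying inside the simplex
x = stale_frank_wolfe(b, iterations=5000, staleness=3)
print(objective(x, b))
```

Setting `staleness=0` recovers the synchronous Frank-Wolfe iteration; larger values let computation run ahead on older state, and the bound keeps that lag from growing without limit, which is the trade-off SSP controls.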

Thomas Peel and Nam-Luc Tran, Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism, poster at the Greed is Great ICML’15 Workshop, Lille, France, July 2015.

Analysis of interbank messages for the enforcement of financial regulations

In the context of the recent anti-money laundering and counter-terrorist financing policies defined by Recommendation 16 of the Financial Action Task Force (FATF), it is the responsibility of financial institutions to monitor the quality of the information present in wire transfers. To that end, this paper presents an approach to automate the monitoring and validation of the information contained in interbank transfer messages. The approach is backed by a solution built around an event-driven architecture in which the data is processed as a stream and transformed at each stage. This architecture is in line with the latest research on data warehouses with stream data processing. We show that our approach suits the requirements and standards of the banking industry.
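
The stage-by-stage stream processing described above can be illustrated with a small chain of Python generators, where each stage lazily consumes the output of the previous one. This sketch assumes messages arrive as hypothetical `key=value;key=value` records and checks a made-up set of originator and beneficiary fields; it is not the paper's validation logic, nor a real interbank message format.

```python
from typing import Dict, Iterable, Iterator

Message = Dict[str, str]

# Hypothetical field names standing in for originator and beneficiary data.
REQUIRED_FIELDS = (
    "originator_name",
    "originator_account",
    "beneficiary_name",
    "beneficiary_account",
)


def parse(lines: Iterable[str]) -> Iterator[Message]:
    # Stage 1: turn raw "key=value;key=value" records into dictionaries,
    # skipping malformed items and empty values.
    for line in lines:
        msg: Message = {}
        for item in line.strip().split(";"):
            key, sep, value = item.partition("=")
            if sep and value.strip():
                msg[key.strip()] = value.strip()
        yield msg


def validate(messages: Iterable[Message]) -> Iterator[Message]:
    # Stage 2: annotate each message with the required fields it lacks.
    for msg in messages:
        missing = [field for field in REQUIRED_FIELDS if field not in msg]
        yield {**msg, "missing": ",".join(missing)}


def alerts(messages: Iterable[Message]) -> Iterator[Message]:
    # Stage 3: forward only incomplete transfers for review.
    for msg in messages:
        if msg["missing"]:
            yield msg


raw = [
    "id=1;originator_name=ACME;originator_account=BE01;"
    "beneficiary_name=Foo;beneficiary_account=FR02",
    "id=2;originator_name=;originator_account=BE03;"
    "beneficiary_name=Bar;beneficiary_account=DE04",
]

# Each stage pulls lazily from the previous one, so records stream through.
flagged = list(alerts(validate(parse(raw))))
print([(m["id"], m["missing"]) for m in flagged])  # [('2', 'originator_name')]
```

Because every stage only sees one message at a time, new checks can be added as further generators without changing the others.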

Nam-Luc Tran, Analysis of Interbank Messages for the Enforcement of Financial Regulations, proceedings of Journées francophones sur les Entrepôts de Données et l’Analyse en ligne, Brussels, Belgium, April 2015.

Click here to access the paper.

An approach for maximizing performance on heterogeneous clusters of CPU and GPU

Over the past years, there has been significant enthusiasm for developing parallel computing on Graphics Processing Units (GPUs), which have become powerful and affordable hardware in data centers and research clusters. Our earlier research explored ways to exploit the parallel compute performance of the GPU alongside the CPU in the same cluster, and we proposed a model for processing distributed machine learning tasks that leverages both the CPUs and the GPUs available on the nodes. Continuing in this direction, this paper presents our approach for optimizing the performance of that framework. We then present our approach for integrating this processing model into a more general dataflow graph processing framework by extending it with support for GPU tasks and resources. In addition, we have developed a k-nearest neighbors implementation that demonstrates all of these features. Finally, we present our model based on flow networks for efficient scheduling on this heterogeneous framework.
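
As a rough illustration of scheduling with flow networks, the sketch below assigns tasks to CPU and GPU devices by computing a maximum flow (Edmonds-Karp): the source feeds each task with capacity 1, each task connects to the devices able to run it, and each device drains to the sink with capacity equal to its number of free slots. The device names, slot counts and compatibility lists are hypothetical, and this plain max-flow assignment ignores the cost and performance modeling that the paper's scheduling model may include.

```python
from collections import deque
from typing import Dict, List, Tuple

# capacity[u][v] = capacity of the directed edge u -> v
Graph = Dict[str, Dict[str, int]]


def max_flow(capacity: Graph, source: str, sink: str) -> Tuple[int, Graph]:
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    residual: Graph = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            # Add zero-capacity reverse edges so flow can be cancelled later.
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent: Dict[str, str] = {source: source}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        # Collect the path edges, then push the bottleneck amount along them.
        path = []
        v = sink
        while v != source:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck


def schedule(tasks: Dict[str, List[str]], slots: Dict[str, int]) -> Dict[str, str]:
    """Assign each task to at most one compatible device without exceeding slots."""
    # source -> task (capacity 1) -> compatible device (1) -> sink (free slots)
    capacity: Graph = {"source": {task: 1 for task in tasks}}
    for task, devices in tasks.items():
        capacity[task] = {device: 1 for device in devices}
    for device, free in slots.items():
        capacity[device] = {"sink": free}
    _, residual = max_flow(capacity, "source", "sink")
    # A task -> device edge with no residual capacity carries one unit of flow.
    return {
        task: device
        for task, devices in tasks.items()
        for device in devices
        if residual[task][device] == 0
    }


tasks = {
    "t1": ["gpu0"],          # e.g. a GPU-only kernel
    "t2": ["cpu0", "gpu0"],
    "t3": ["cpu0", "gpu0"],
    "t4": ["cpu0"],
}
slots = {"cpu0": 2, "gpu0": 1}
plan = schedule(tasks, slots)
print(plan)
```

With three free slots for four tasks, the maximum flow is 3, so one task waits for the next scheduling round. Weighting the task-to-device edges by expected runtime would turn this into a min-cost flow problem, a natural extension for a performance-aware scheduler.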

Nam-Luc Tran, Sabri Skhiri, Arnaud Schils, and Egar Isaac Hiroshi Leon Saiki, An Approach for Maximizing Performance on Heterogeneous Clusters of CPU and GPU, EURA NOVA technical series.

Click here to access the paper.

Analytics-aware graph database modeling

Graphs are a fundamental structure for modeling many real-world domains and applications. They have emerged in various fields such as social, informational and transportation networks. The heterogeneity and dynamicity of these networks pose challenges to traditional techniques for data modeling, storage and analysis.

Managing graph-structured data using native graph structures and algorithms is key to its efficient analysis. Therefore, the graph should be modeled using nodes and edges, and explored using graph algorithms such as pattern matching and k-neighborhood queries.

In this paper, we introduce a novel model for the management of graph data. It aims to provide analysts with a set of simple, well-defined and adaptable components for performing complex graph modeling and analysis tasks.
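
The k-neighborhood exploration mentioned above is a good example of an operation that is natural on a native graph structure. The sketch below computes it with a breadth-first search over an adjacency list that stops expanding at k hops; the graph and node names are hypothetical, and this is a generic traversal rather than the model proposed in the paper.

```python
from collections import deque
from typing import Dict, List, Set


def k_neighborhood(adj: Dict[str, List[str]], start: str, k: int) -> Set[str]:
    """Return all nodes reachable from `start` in at most k hops (excluding start)."""
    hops = {start: 0}                 # node -> hop distance from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == k:           # do not expand beyond k hops
            continue
        for neighbor in adj.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return {n for n in hops if n != start}


adj = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
print(sorted(k_neighborhood(adj, "a", 2)))  # ['b', 'c', 'd']
```

Because BFS visits nodes in order of hop distance, each node is recorded with its shortest distance, so the traversal touches only the neighborhood it returns plus the edges leaving it.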

Amine Ghrab, Oscar Romero, Sabri Skhiri, and Esteban Zimányi, Analytics-Aware Graph Database Modeling, EURA NOVA technical series.

Click here to access the paper.

EURA NOVA Master Theses, 2013-2014 Season

As in previous years, EURA NOVA R&D supervised Master's students during the 2013-2014 academic year, either for their internship or for their Master's thesis. This year, five students had the opportunity to work in the fields of machine learning, GPU compute, distributed processing, metabolic pathways and social graphs. This blog post summarizes their main results.
