Renaud Vilmart (Mines de Nancy) did an Engineering Internship at EURA NOVA from June to September. In this article, Renaud describes his experience as an intern.
Flink Forward 2015 – Slides & video
The first edition of Flink Forward took place on October 12th and 13th in Berlin. Flink Forward is a two-day conference exclusively dedicated to Apache Flink, the distributed pipelined batch and streaming processing framework. EURA NOVA was present among the speakers of the event (http://flink-forward.org/?session=stale-synchronous-parallel-iterations-on-flink).
Here is the talk we presented.
IEEE Big Data 2015
This year we had the opportunity to publish a paper, DISTRIBUTED FRANK-WOLFE UNDER PIPELINED STALE SYNCHRONOUS PARALLELISM, at the IEEE Big Data conference in Santa Clara, CA. This was an excellent opportunity to write a short summary of the trends in the big data area and our personal impressions after one week under the sun with tacos and enchiladas.
EURA NOVA Internships & Master Thesis
As it has every year since its foundation, EURA NOVA proposes Master's thesis subjects and research internships, led in collaboration with academic institutions.
Distributed Frank-Wolfe under pipelined stale synchronous parallelism
Iterative-convergent algorithms represent an important family of applications in big data analytics. These are typically run on distributed processing frameworks deployed on a cluster of machines. On the other hand, we are witnessing the move towards data center operating systems (OS), where resources are unified by a resource manager and processing frameworks coexist with each other. In this context, different processing framework job tasks can be scheduled on the same machine and slow down a worker (straggler problem). Existing work has shown that an iteration model with relaxed consistency such as the Stale Synchronous Parallel (SSP) model, while still guaranteeing convergence, is able to cope with stragglers. In this paper we propose a model for the integration of the SSP model on a pipelined distributed processing framework. We then apply SSP on a distributed version of the Frank-Wolfe algorithm. We theoretically show its sparsity bounds and convergence under SSP. Finally, we experimentally show that the Frank-Wolfe algorithm applied on LASSO regression under SSP is able to converge faster than its BSP counterpart, especially under load conditions similar to those encountered in a data center OS.
Nam-Luc Tran, Thomas Peel, Sabri Skhiri, Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism, proceedings of the 2015 IEEE Conference on Big Data, November 2015, Santa Clara, CA, USA.
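For readers unfamiliar with the algorithm named in the abstract above, here is a minimal single-machine sketch of Frank-Wolfe applied to LASSO in its constrained form (least squares over an L1 ball of radius beta). It is not the distributed, SSP-based implementation from the paper: the function name, the parameter names and the classic step size 2/(k+2) are assumptions chosen for illustration. Since each iteration moves towards a single coordinate, the iterate gains at most one non-zero entry per iteration, which gives an intuition for why sparsity bounds arise.

```python
def frank_wolfe_lasso(X, y, beta, n_iters):
    """Illustrative Frank-Wolfe for: min 0.5 * ||Xw - y||^2  s.t.  ||w||_1 <= beta.

    X is a list of rows, y a list of targets. Hypothetical helper, not the
    paper's distributed SSP version.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for k in range(n_iters):
        # Residual r = Xw - y and gradient g = X^T r.
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(d)]

        # Linear minimisation over the L1 ball: the best vertex lies on the
        # coordinate with the largest absolute gradient, with opposite sign.
        j_star = max(range(d), key=lambda j: abs(g[j]))
        if g[j_star] == 0.0:
            break  # zero gradient: already optimal
        sign = 1.0 if g[j_star] > 0 else -1.0

        # Convex combination of the current iterate and the chosen vertex.
        gamma = 2.0 / (k + 2.0)
        w = [(1.0 - gamma) * wj for wj in w]
        w[j_star] += gamma * (-sign * beta)
    return w
```

Every iterate is a convex combination of vertices of the L1 ball, so the constraint holds at each step without any projection.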
Graph Data Management: Status and Trends
Today’s social environments are getting more interconnected and the business market is becoming increasingly open and competitive. Organisations require a better awareness of their state and an accurate prediction of their evolution. To cope with this surging demand, new models and tools need to be developed. In my opinion, graph models are of crucial interest for addressing these challenges.
Flink Forward 2015
The first edition of Flink Forward took place on October 12th and 13th in Berlin. Flink Forward is a two-day conference exclusively dedicated to Apache Flink, the distributed pipelined batch and streaming processing framework. EURA NOVA was present among the speakers of the event. Here is our field report.
High Availability in RoQ
In the last year, we have worked with Benjamin Van Melle on implementing High Availability in RoQ, our proof-of-concept distributed pub-sub messaging system. As a consequence, we needed to expand our JUnit tests to cover individual component failure scenarios and prove they were handled as expected. This piece will show how we used Docker to achieve this.
ICML 2015
Introduction
The International Conference on Machine Learning is one of the most important annual events in the world of machine learning. It is where the most renowned researchers in the field gather to present and share their often diverging visions and directions for the future. As such, the event is sponsored by most of the biggest companies in IT, such as Google, Baidu and Facebook. In its wake, it also attracts numerous smaller companies with a particular interest in big data.
A framework for building OLAP cubes on graphs
Graphs are widespread structures providing a powerful abstraction for modeling networked data. Large and complex graphs have emerged in various domains such as social networks, bioinformatics, and chemical data. However, current warehousing frameworks are not equipped to handle efficiently the multidimensional modeling and analysis of complex graph data. In this paper, we propose a novel framework for building OLAP cubes from graph data and analyzing the graph topological properties. The framework supports the extraction and design of the candidate multidimensional spaces in property graphs. Besides property graphs, a new database model tailored for multidimensional modeling and enabling the exploration of additional candidate multidimensional spaces is introduced. We present novel techniques for OLAP aggregation of the graph, and discuss the case of dimension hierarchies in graphs.
Furthermore, the architecture and the implementation of our graph warehousing framework are presented and show the effectiveness of our approach.
Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro Vaisman, and Esteban Zimányi, A Framework for Building OLAP Cubes on Graphs, proceedings of the 19th East-European Conference on Advances in Databases and Information Systems, Poitiers, France, September 2015.
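To give a feel for what OLAP-style aggregation of a graph can look like, here is a small sketch that rolls up a property graph along one node attribute: nodes sharing a value are merged into a single group, and edges are counted between groups. This is a generic roll-up written for illustration, not the technique from the paper. The function name and the data layout (a dict of node attributes plus a list of edge pairs) are assumptions.

```python
from collections import Counter


def aggregate_graph(nodes, edges, dimension):
    """Roll up a property graph along one node attribute (hypothetical helper).

    nodes: dict mapping node id -> dict of attributes
    edges: list of (source id, target id) pairs
    Returns (nodes per group, edges per pair of groups).
    """
    # Measure 1: how many nodes fall into each value of the dimension.
    node_counts = Counter(attrs[dimension] for attrs in nodes.values())
    # Measure 2: how many edges connect each ordered pair of groups.
    edge_counts = Counter(
        (nodes[u][dimension], nodes[v][dimension]) for u, v in edges
    )
    return dict(node_counts), dict(edge_counts)
```

For example, grouping a small social graph by a `city` attribute yields one aggregated node per city, with edge counts between cities as the measure.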