Big Data Architectures at Universitat Politècnica de Catalunya

Today and Wednesday (13 and 15 March 2017), our R&D Director will be in Barcelona to give a course on Big Data Architectures.

The objective is to learn the basic concepts and the details to take into account when designing a Big Data Architecture. Students will learn how technical and functional constraints affect storage and processing choices. Going further, the course will show, through industrial use cases, the rise of new architecture patterns. The course includes a practical part with a hands-on session on distributed frameworks.

Contents:

  • Terminology & Concepts
  • Distributed architecture
  • Big Data Storage
  • Big Data Processing
  • Big Data Architecture Patterns (Hands-on session)
  • Distributed processing with Apache Flink / Spark
  • Data manipulation with Apache Pig

For more details, contact Oscar Romero ( oromero@essi.upc.edu )

Want to host Sabri Skhiri for a course in your university? Contact research@euranova.eu

Flink Forward 2015

The first edition of Flink Forward took place on 12 and 13 October in Berlin. Flink Forward is a two-day conference exclusively dedicated to Apache Flink, the distributed pipelined batch and stream processing framework. EURA NOVA was among the speakers at the event. Here is our field report.

High Availability in RoQ

Over the past year, we worked with Benjamin Van Melle on implementing High Availability in RoQ, our proof-of-concept distributed pub-sub messaging system. This meant expanding our JUnit tests to cover individual component failure scenarios and to prove that each failure was handled as expected. This piece shows how we used Docker to achieve this.

Elastic Messaging for the Cloud

A distributed data mining framework accelerated with graphics processing units

In the context of processing high volumes of data, recent developments have led to numerous models and frameworks for distributed processing on clusters of commodity hardware. On the other hand, the Graphics Processing Unit (GPU) has seen much enthusiastic development as a device for general-purpose, intensive parallel computation. In this paper we propose a framework that combines both approaches, and we evaluate the relevance of having nodes in a distributed processing cluster that use GPUs for further fine-grained parallel processing. We have engineered parallel and distributed versions of two data mining problems, the naive Bayes classifier and the k-means clustering algorithm, to run on the framework, and we have evaluated the resulting performance gain. Finally, we discuss the requirements and perspectives of integrating GPUs in a distributed processing cluster, introducing a fully distributed heterogeneous computing cluster.

Nam-Luc Tran, Quentin Dugauthier, and Sabri Skhiri, A Distributed Data Mining Framework Accelerated with Graphics Processing Units, Proceedings of the 2013 International Conference on Cloud Computing and Big Data (CloudCom-Asia), Fuzhou, China, December 2013.

The paper is available in preprint form.
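The abstract does not say how the two algorithms are split across nodes. As a minimal sketch, assuming a plain map/reduce-style decomposition, both can be expressed as per-partition partial aggregates that a coordinator then merges. The helpers below (`kmeans_partial`, `kmeans_merge`, `kmeans`, `nb_partial`, `nb_merge`, `nb_predict`) are hypothetical names, the code is pure Python on the CPU with no GPU kernels, and it is not the framework described in the paper:

```python
from collections import Counter
from math import log

# --- k-means: one iteration = per-partition "map" + global "reduce" ---

def kmeans_partial(points, centroids):
    """Assign each point of one partition to its nearest centroid and
    return per-centroid (sum vector, count) partial aggregates."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        # Nearest centroid by squared Euclidean distance.
        best = min(range(len(centroids)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts

def kmeans_merge(partials, centroids):
    """Add up the partial aggregates of all partitions and compute new centroids.
    A centroid that received no points keeps its previous position."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in partials:
        for k in range(len(centroids)):
            total_counts[k] += counts[k]
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    return [[s / total_counts[k] for s in total_sums[k]] if total_counts[k]
            else list(centroids[k]) for k in range(len(centroids))]

def kmeans(partitions, centroids, iterations=10):
    for _ in range(iterations):
        # In a cluster, each kmeans_partial call would run on the node holding that partition.
        partials = [kmeans_partial(part, centroids) for part in partitions]
        centroids = kmeans_merge(partials, centroids)
    return centroids

# --- naive Bayes (categorical features): per-partition counting + merge ---

def nb_partial(rows):
    """Count class frequencies and (class, feature index, value) co-occurrences
    in one partition of (features, label) rows."""
    class_counts, feature_counts = Counter(), Counter()
    for features, label in rows:
        class_counts[label] += 1
        for i, v in enumerate(features):
            feature_counts[(label, i, v)] += 1
    return class_counts, feature_counts

def nb_merge(partials):
    """Sum the counters of all partitions (Counter.update adds counts)."""
    class_counts, feature_counts = Counter(), Counter()
    for c, f in partials:
        class_counts.update(c)
        feature_counts.update(f)
    return class_counts, feature_counts

def nb_predict(model, features, alpha=1.0, n_values=2):
    """Return the class with the highest log posterior, with Laplace smoothing.
    n_values is an assumed number of distinct values per feature."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())

    def score(label):
        s = log(class_counts[label] / total)
        for i, v in enumerate(features):
            s += log((feature_counts[(label, i, v)] + alpha)
                     / (class_counts[label] + alpha * n_values))
        return s

    return max(class_counts, key=score)

partitions = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
print(kmeans(partitions, [[0.0, 0.0], [10.0, 10.0]], iterations=5))  # [[0.0, 0.5], [10.0, 10.5]]
```

Because both partial results are additive (sums and counts), the merge step does not depend on how the data was partitioned. That is the property that lets each node, and potentially a GPU on that node, compute its share independently.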

OpenNebula Conference

And then it’s over: three days at the first OpenNebula Conference! I attended 18 talks, got my hands into the OpenNebula core, met a lot of interesting people, enjoyed a really nice German dinner in the capital, and filled my brain with tons of information and inspiration.