Eura Nova RD
Eura Nova

Activity

In this section you will find EURA NOVA’s latest news and activities.

Activity Conferences

15-11-2018

Improving Topic Quality by Promoting Named Entities in Topic Modeling

In July, our R&D engineer Katherine Krasnoschok was in Melbourne, Australia to attend the ACL conference. She presented her poster on topic modelling. Her paper, co-written with Salim Jouili, indicates that involving more named entities positively influences the overall quality of topics.

 

The abstract:

News-related content has been extensively studied in both topic modeling research and named entity recognition. However, expressive power of named entities and their potential for improving the quality of discovered topics has not received much attention. In this paper, we use named entities as domain-specific terms for news-centric content and present a new weighting model for Latent Dirichlet Allocation. Our experimental results indicate that involving more named entities in topic descriptors positively influences the overall quality of topics, improving their interpretability, specificity and diversity.
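As a rough illustration of what "promoting" named entities can mean, here is a minimal sketch that boosts the term counts of named entities before the counts are handed to a topic model. The function name, the `boost` factor and the entity set are assumptions made for this example; the excerpt above does not describe the paper's actual weighting scheme.

```python
from collections import Counter

def weighted_term_counts(tokens, named_entities, boost=2.0):
    """Count terms in one document, multiplying named-entity counts by `boost`.

    `boost` is a hypothetical parameter, not the weighting used in the paper.
    """
    counts = Counter(tokens)
    return {
        term: count * boost if term in named_entities else float(count)
        for term, count in counts.items()
    }

doc = ["Melbourne", "conference", "topic", "Melbourne", "poster"]
entities = {"Melbourne"}  # assumed output of a named entity recogniser
weights = weighted_term_counts(doc, entities, boost=2.0)
```

With these inputs, the named entity ends up with a larger weight than its raw frequency, which is one simple way to make entities more likely to surface in topic descriptors.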

Download file (.pdf)

Activity Conferences Insights Machine Learning

02-11-2018

Spark+AI Summit: a summary

A few weeks ago, Sabri Skhiri and Florian Demesmaeker were in London to attend the Spark+AI Summit. They came back with a lot to say about the new features of Spark and the presented use cases! In this article, they give you their opinion about Databricks’ main announcement, the takeaways from their favourite talks and training, and what they thought of the new name of the conference.

 

A new name

This year, the summit’s scope was expanded and the event was renamed “Spark + AI Summit”. The goal of Databricks, announced by its co-founder Ali Ghodsi, is to unify data and AI.

Florian Demesmaeker, our R&D engineer, explains: “In some of the keynote talks, the speakers talked about use cases where the job of the data engineer is strongly reduced. The data scientists can easily experiment with data, travelling back and forth in time. This means more focus on AI, rather than on the data engineering part that makes all data accessible to the data scientists”.

 

Main announcement

In line with this change of name, Databricks announced the release of a complete data science lifecycle on the cloud.

Sabri Skhiri, our R&D Director, explains: “It is interesting to see that the change in the event name is actually very visible in the change of Databricks’ strategy. Their tools are now completely dedicated to stream ETL, and there is a huge focus on integrated data management”.

Databricks’ new features include Databricks Delta, which creates data pipelines and provides data views and exploration features. Secondly, the Databricks Runtime ML is a ready-to-use environment providing a set of pre-loaded ML frameworks where the data scientist can play with data. Finally, the MLflow tool simplifies the development of ML models at enterprise scale.

Our R&D Director adds: “Together, these features provide a complete and unified approach to machine learning lifecycle and pipeline automation. This looks like a very competitive SaaS offer for integrated data management, available on AWS and Azure. However, metadata management and security are still the missing pieces”.

 

The training day

The first day of the conference was dedicated to training workshops that included a mix of instruction and hands-on exercises to help attendees improve their Apache Spark skills.

Florian gives insights into his favourite training, “Tuning and Best Practices”. He explains: “The aim of the training was to make programmers aware of how Spark works internally, in order to be able to write optimised applications. They presented a few situations, each one showing one relatively slow process. Then they presented a step-by-step procedure to debug the situation and to find the points that could be improved. In summary, tips and tricks to adapt to different situations”.

 

Favourite talks

The sessions at the conference covered data engineering and data science content along with best practices for productionising AI. The talks fell into roughly two categories: Spark programming and deployment, and applications on top of Spark (AI applications).

Florian Demesmaeker explains: “I attended 28 talks. The keynotes from Databricks, which presented Delta and MLflow, were quite interesting. I also enjoyed the talks about tools to optimise the internals of Spark, which provided good technical details. Other talks were about use cases on top of Spark; it was interesting to see what challenges other companies face and how they address them”.

Sabri Skhiri adds: “The talk Learning to Rank Datasets for Search was very inspiring. Oscar Castañeda-Villagrán, a data scientist working at Xoom (a PayPal service), talked about learning to rank R data sets. The idea is that we can extract metadata when the data arrives in the lake through the data pipeline. Going further, you can not only extract metadata but also calculate a kind of relevance judgment score that will be used for bootstrapping the learning-to-rank process. In this way, a user can search and retrieve the relevant R data set in the lake. A very good idea for metadata-driven exploration”.

In early September 2018, eight EURA NOVA engineers travelled to Berlin to attend the Flink Forward Conference, dedicated to Apache Flink users and stream processing communities. You can read their feedback here.

Activity Conferences Insights

05-10-2018

Flink Forward 2018: What You Want to Know and What You (Will) Need to Know.

In early September 2018, eight EURA NOVA engineers travelled to Berlin to attend the Flink Forward Conference, dedicated to Apache Flink users and stream processing communities.

They came back with a lot to say about the hot topics in stream processing and the presented use cases! In this article, they give you their opinion about data Artisans’ main announcement, the takeaways from their favourite talks, and what they think makes Flink Forward different from other conferences.

 

First keynote announcement:

During the keynote speech, data Artisans announced that they now bring ACID transactions directly to streaming data with data Artisans Streaming Ledger.

Charles Bonneau, our software architect, says: “This feature allows ACID transactions between multiple operators’ event-processing operations and internal states. This means that streaming applications can now update multiple states in one transaction. For example, an application that transfers money from one bank account to another can finally be implemented using Flink with strong consistency guarantees. Both bank accounts will have their balance updated at the same time as if there was a master data-management state”.
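The bank transfer example can be sketched in a few lines of plain Python. This is not the data Artisans Streaming Ledger API; it is only an illustration, with hypothetical names (`apply_transfer`, `run`), of what "updating multiple states in one transaction" means: both balances change together, or neither does.

```python
def apply_transfer(balances, event):
    """Apply one transfer atomically; return (new_state, committed)."""
    staged = dict(balances)                 # work on a private copy (isolation)
    staged[event["from"]] -= event["amount"]
    staged[event["to"]] += event["amount"]
    if staged[event["from"]] < 0:           # consistency rule: no overdraft
        return balances, False              # abort: original state is untouched
    return staged, True                     # commit: both updates become visible together

def run(stream, balances):
    """Process a stream of transfer events, collecting the rejected ones."""
    rejected = []
    for event in stream:
        balances, committed = apply_transfer(balances, event)
        if not committed:
            rejected.append(event)
    return balances, rejected

events = [
    {"from": "alice", "to": "bob", "amount": 70},
    {"from": "alice", "to": "bob", "amount": 50},   # would overdraw alice
    {"from": "bob", "to": "alice", "amount": 10},
]
final, rejected = run(events, {"alice": 100, "bob": 20})
```

In this single-process sketch the total amount of money is preserved by every committed event. A distributed stream processor has to provide the same guarantee across operators and machines, which is the hard part the announcement addresses.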

For Sabri Skhiri, our R&D director, this opens the doors to a brand new range of applications, especially in data-driven real-time services but also in streaming data management. He explains: “They are pushing forward the concept of streaming. Now, you could imagine a master data-management state that can be updated by operational streaming applications in real time. This will allow even more complex and advanced use cases of stream processing!”.

 

Favourite talks:

In 2 days, each Euranovian attended about 18 talks and use case presentations, with speakers from tech giants such as IBM, Netflix, Alibaba, and Uber as well as speakers from smaller companies.

Charles explains: “The conclusions are reassuring: most of them face the same issues that we see at our clients’ and our solutions are all valuable. They include a stream-first data architecture, a stream-first data pipeline product, and Flink developer skills. Even though a number of companies are at the very edge of the technology and their issues do not yet require continuous flows of a considerable amount of events, we are ready”.

For our R&D Director Sabri Skhiri, the keynote speech from Lightbend was one of the most interesting ones. He explains: “Viktor Klang, Lightbend deputy CTO, talked about the convergence between microservices and stream processing. At EURA NOVA, we have been advocating for this convergence for more than a year in our architecture practice. The idea is simple: asynchronous microservices can be designed as stream processing stages. This is fantastic because it makes modern stateful stream processing frameworks the perfect target for implementing reactive microservices. With stateful deployment, exactly-once semantics, high availability and ACID access to states, microservices can become stateful streaming apps.”
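To make the idea of "microservices as stream processing stages" concrete, here is a minimal sketch using Python generators. The stage names and the event fields are assumptions chosen for the example, and the sketch leaves out everything a real framework adds (exactly-once semantics, high availability, distributed state).

```python
from collections import defaultdict

def validate(events):
    """Stateless stage: drop events without a positive amount."""
    for event in events:
        if event.get("amount", 0) > 0:
            yield event

def running_total(events):
    """Stateful stage: keep a per-customer total and emit it after each event."""
    totals = defaultdict(int)
    for event in events:
        totals[event["customer"]] += event["amount"]
        yield {"customer": event["customer"], "total": totals[event["customer"]]}

orders = [
    {"customer": "a", "amount": 5},
    {"customer": "b", "amount": 0},   # filtered out by the validation stage
    {"customer": "a", "amount": 7},
]
# Each stage consumes the output of the previous one, event by event.
results = list(running_total(validate(orders)))
```

Each function plays the role of a small service that reacts to incoming events and owns its own state, which is the shape a stateful stream processing framework can then deploy and scale.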

 

Vision-oriented Flink Conference:

Our colleagues came back with sparkles in their eyes. When we asked them how they felt about the event, Sabri Skhiri explained:

“Very often, this type of conference tends to be business oriented. They are focused on how to make the framework easy to use and available to as many people as possible. By contrast, this year’s Flink Forward conference was all about innovation and vision. data Artisans shared their vision of what the Flink framework will be within 3 to 5 years and talked about what role stream processing and big data have within this vision. In fact, almost all the talks were very technical. They were testimonies of big names in the industry, such as Alibaba, Netflix, and ING about problems encountered in the field and how they have been solved, which is often out of the box. The Flink-Alibaba partnership is a sharing one. Alibaba are way ahead with their technology. They keep their lead for 1 year and then they share their work and make their code open source. data Artisans have a great long-term vision of stream processing. I can see a lot of very interesting architecture discussions in the coming months!”

 

Stream Processing Technology:

Whereas most frameworks cannot process considerable streams of live data and provide results in real time, Flink provides a single, highly scalable runtime for both streaming and batch processing.

Cyrille Duverne, our Lead Data Architect, confirms: “Flink is definitely a real-time processor! We’re speaking about true real time, not only mini batches etc… Plus, the introduction of ACID transaction management in the new version of data Artisans’ Flink distribution creates a good marketing edge”.
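The difference between per-event processing and mini batches can be illustrated with a toy timing model. The sketch below is an assumption-laden simplification: timestamps are in arbitrary units, processing time is ignored, and the function names are hypothetical. It only shows when a result can be emitted in each model.

```python
def per_event(arrivals):
    """Per-event model: a result is emitted as soon as the event arrives."""
    return [(t, t) for t in arrivals]

def micro_batch(arrivals, interval):
    """Mini-batch model: a result is emitted at the end of the batch interval."""
    emitted = []
    for t in arrivals:
        batch_end = (t // interval + 1) * interval
        emitted.append((t, batch_end))
    return emitted

arrivals = [1, 4, 12]
streaming_delays = [emit - arrive for arrive, emit in per_event(arrivals)]
batch_delays = [emit - arrive for arrive, emit in micro_batch(arrivals, interval=10)]
```

In this model every event in a mini batch waits for the end of its interval, while per-event processing adds no waiting time of its own.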

Sabri Skhiri and our R&D engineer Florian Demesmaeker were at the Spark Summit this week. Stay tuned for part 2 with their feedback!

Activity Conferences

05-07-2018

Third Workshop on Real-time & Stream Analytics in Big Data & Stream Data Management

EURA NOVA Research center is proud and excited to organize the third workshop on Real-time and Stream analytics in Big Data, collocated with the 2018 IEEE conference on Big Data. The workshop will take place in December in Seattle, USA.

As the world becomes more connected, a flood of digital data is being generated at high volume and high velocity. For industries such as financial markets, telecommunications, Smart Cities, manufacturing, or healthcare, there is an increasing need to process and analyze these data streams in real time.

Over the past two years, we have seen another use of stream and complex event processing emerge: data management. New architecture patterns have been proposed to handle data pipelines and data management within the enterprise.

After the success of the first two editions, this is an excellent opportunity to engage in discussions with experts and researchers, to refine new opportunities and use cases required by the industry.

Authors are invited to contribute to the conference by submitting articles in the following areas, among others: scalable real-time decision algorithms; IoT analytics & stream mining; data pipelines & data management with streams; and stream ETL and real-time data warehouses.

 

Want to submit a paper? Check out the workshop website to find all the information you will need. Your paper will be reviewed by a prestigious panel of international experts from both the academic and the industrial worlds.

Activity

09-03-2018

Second Spring School Big Data Analytics

EURA NOVA Research Center is both proud and happy to lead the Second Spring School Big Data Analytics that will be held in Tunis, from the 20th to the 22nd of March 2018. Sabri Skhiri and Aymen Cherif will talk about their favorite topics:

  • Deep Learning
  • TensorFlow
  • CNN Architecture
  • Unsupervised Learning
  • Complex Event Processing
  • Stream processing & micro-services

 

Check out the complete agenda and register on the event website: https://sites.google.com/view/ssbda2018/welcome

The conference is organised by the Ecole Polytechnique de Tunisie.

Activity

06-07-2017

Second Workshop on Real-Time and Stream Analytics in Big Data

EURA NOVA is thrilled to share the news with you: we are organizing our second workshop collocated with the 2017 IEEE International Conference on Big Data. The workshop will take place in December in Boston, MA, USA.

 

Stream processing and real-time analytics have caught the interest of the industry lately. Many use cases are waiting for relevant and efficient solutions to be developed. Such use cases include event-driven marketing, dynamic network management & optimization, real-time recommendation, context-aware applications and real-time fraud detection.

 

After the success of the first edition, this is an excellent opportunity to bring together the industry and academics to discuss, to explore and to refine new opportunities and use cases in the area. The workshop will benefit both researchers and practitioners interested in the latest research in real-time and stream processing. The workshop will showcase prototypes and products leveraging big data technologies as well as models, efficient algorithms for scalable complex event processors and context detection engines, or new architecture leveraging stream processing.

Want to submit a paper? Check out the workshop website to find all the information you will need. Your paper will be reviewed by a prestigious panel of international experts from both the academic and the industrial worlds.

Activity

21-04-2017

Next Workshop on Graph Business Intelligence

EURA NOVA is organizing their second workshop collocated with an international conference. This time, the workshop will be collocated with the 21st European Conference on Advances in Databases and Information Systems. It will take place in September in Cyprus and will bring together industrial and academic stakeholders to discuss, explore and refine new opportunities and use cases in the area of Graph Business Intelligence.

 

Want to be part of the fun? Check out the workshop website to find all the information you need to know and submit your paper. Our researchers Sabri Skhiri, Salim Jouili and Amine Ghrab cannot wait to read your papers and meet you in Nicosia.

Activity

13-03-2017

Big Data Architectures at Universitat Politècnica de Catalunya

Today and Wednesday (the 13th and the 15th of March 2017), our R&D Director will be in Barcelona to give a course about Big Data Architectures.

The objective is to learn the basic concepts and details to take into account when designing a Big Data Architecture. The student will learn the impact of technical & functional constraints on the storage and processing choices. Going further, the course will show, through industrial use cases, the rise of new architecture patterns. The course includes a practical part with a hands-on session on distributed frameworks.

Contents:

  • Terminology & Concepts
  • Distributed architecture
  • Big Data Storage
  • Big Data Processing
  • Big Data Architecture Patterns (Hands-on session)
  • Distributed processing with Apache Flink / Spark
  • Data manipulation with Apache Pig

For more details, contact Oscar Romero (oromero@essi.upc.edu)

Want to host Sabri Skhiri for a course in your university? Contact research@euranova.eu

Activity

10-02-2017

ENX University in Tunis

On the 9th and 10th of May 2017, the R&D Director of EURA NOVA Sabri Skhiri will lecture on Big Data and Data Science at the Polytechnic School of Tunisia. The course will be hosted by the SERCOM laboratory.

After the launch of EURA NOVA Tunis last September, this course will be a new opportunity for us to bond a little more with Tunisians, especially students. Indeed, EURA NOVA offers programmes in collaboration with universities, such as boot camps, master’s theses, research internships and PhDs, and engineering internships. We hope that this lecture will make Polytechnic students want to explore Data Science with us and join the pack!

 

Want to organise a lecture on Big Data and Data Science in your own university? Contact research@euranova.eu and ask for the ENX University offer.

 

Here is the detailed programme:

Tuesday 9 May 2017: BIG DATA Architecture (part 1)

Morning (8:30-12:30)

  1. Terminology and general concepts
  2. Distributed architecture
  3. Big Data storage: NoSQL, NewSQL, distributed file systems

Lunch break: 12:30-14:00

Afternoon: 14:00-17:00

Practical work: data preparation with a Pig script

    1. Introduction to Pig
    2. Data preparation exercise

______________________________________________________

Wednesday 10 May 2017: BIG DATA Architecture (part 2)

Morning (8:30-12:30)

  1. Big Data processing: batch and streaming
  2. Big Data architecture patterns
  3. Architectures adopted in industrial contexts: case study

Lunch break: 12:30-14:00

Afternoon: 14:00-17:00

Practical work on Apache Spark/Flink

    1. Introduction to Flink and basic Scala commands
    2. Batch and stream data processing

Activity

26-07-2016

EURA NOVA R&D has a new rallying cry : Join The Pack!

After launching our first bootcamp, we are organising our first workshop collocated with an IEEE conference. The workshop will take place in December in Washington D.C. and will bring together industrial and academic stakeholders to discuss, explore and refine new opportunities and use cases in the area of stream processing and real-time analytics in big data.

Indeed, stream processing and real-time analytics have caught the interest of the industry lately. Many use cases are waiting for relevant and efficient solutions to be developed. Such use cases include event-driven marketing, dynamic network management & optimization, real-time recommendation, context-aware applications and real-time fraud detection.

The workshop will showcase prototypes or products leveraging big data technologies as well as models and efficient algorithms for scalable complex event processors and context detection engines. Here is a short list of research topics to inspire you:

  • New stream processing architecture for big data.
  • Complex event processing for big data, pattern matching engines for big data.
  • Scalable real-time decision algorithms.
  • Scalable stream processing architecture, algorithms or models.
  • Stream SQL and other continuous query languages on big data frameworks.
  • Algorithms for high-speed data stream mining.
  • On-line/incremental learning on data streams.

Your paper will be reviewed by a panel of academic as well as industrial experts.  

Find more information about program co-chairs and members on the workshop website and submit your paper to join the Euranovian pack!

Don’t miss the chance to be part of an IEEE conference and to see Washington under the snow.

 

 

Activity

04-03-2016

Installing TensorFlow with distributed GPU support.

Today, I wrote my first “Hello World” script using the freshly open-sourced version of TensorFlow with distributed GPU support. At the time of this writing, the binary releases of TensorFlow don’t come with distributed GPU support; therefore, I had to build TensorFlow from source. All the documentation to do this already exists but is a bit scattered across multiple websites. Here is a condensed version of the install process (on a Linux Ubuntu 14.04 platform).

(more…)

Activity

16-02-2016

My internship at EURA NOVA

Renaud Vilmart (Mines de Nancy) did an Engineering Internship at EURA NOVA from June to September. In the article, Renaud describes his experience as an intern.

(more…)

Activity

05-01-2016

Flink Forward 2015 – Slides & video

The first edition of Flink Forward took place on October 12th and 13th in Berlin. Flink Forward is a two-day conference exclusively dedicated to Apache Flink, the distributed pipelined batch and streaming processing framework. EURA NOVA was present among the speakers of the event (http://flink-forward.org/?session=stale-synchronous-parallel-iterations-on-flink).

Here is the talk we presented.

(more…)

Activity

08-12-2015

IEEE Big Data 2015

This year we had the opportunity to publish a paper, DISTRIBUTED FRANK-WOLFE UNDER PIPELINED STALE SYNCHRONOUS PARALLELISM, at the IEEE Big Data conference at Santa Clara, CA. This was an excellent opportunity to write a short summary on the trends in the big data area and our personal feelings after one week under the sun with Tacos and Enchiladas.

(more…)

Activity

17-11-2015

EURA NOVA Internships & Master Thesis

As every year since its foundation, EURA NOVA is proposing Master’s thesis subjects and research internships, led in collaboration with academic institutions.

(more…)

Activity

02-11-2015

Graph Data Management: Status and Trends

Today’s social environments are getting more interconnected and the business market is becoming increasingly open and competitive. Organisations require a better awareness of their state and an accurate prediction of their evolution. To cope with this surging demand, new models and tools need to be developed. In my opinion, graph models are of crucial interest for addressing these challenges.

(more…)

Activity

20-10-2015

Flink Forward 2015

 

The first edition of Flink Forward took place on October 12th and 13th in Berlin. Flink Forward is a two-day conference exclusively dedicated to Apache Flink, the distributed pipelined batch and streaming processing framework. EURA NOVA was present among the speakers of the event. Here is our field report.

(more…)

Activity

06-10-2015

High Availability in RoQ

In the last year, we have worked with Benjamin Van Melle on implementing High Availability in RoQ, our proof-of-concept distributed pub-sub messaging system. As a consequence, we needed to expand our JUnit tests to cover individual component failure scenarios and prove they were handled as expected. This piece will show how we used Docker to achieve this.

Elastic Messaging for the Cloud

(more…)

Activity

22-09-2015

ICML 2015

 

ICML Lille

Introduction

The International Conference on Machine Learning is one of the most important annual events in the world of machine learning. It is where the most renowned researchers in the field gather to present and share their (often diverging) visions and directions for the future. As such, the event is sponsored by most of the biggest companies in IT, such as Google, Baidu and Facebook. In its wake, it also attracts numerous smaller companies with a particular interest in big data.

(more…)
