Skip to content

A Performance Prediction Model for Spark Applications

Apache Spark is a popular open-source distributed-processing framework that enables efficient processing of massive amounts of data. It has a large number of parameters that need to be tuned to get the best performance. However, tuning these parameters manually is a complex and time-consuming task. Therefore, a robust performance model to predict applications execution time could greatly help in accelerating the deployment and optimization of big data applications relying on Spark. In this paper, we ran extensive experiments on a selected set of Spark applications that cover the most common workloads to generate a representative dataset of execution time. In addition, we extracted application and data features to build a machine learning-based performance model to predict Spark applications execution time. The experiments show that boosting algorithms achieved better results compared to other algorithms.

Florian Demesmaeker, Amine Ghrab, Usama Javaid, Ahmed Amir Kanoun, A Performance Prediction Model for Spark Applications, in the proceedings of Big Data congress 2020.

Click here to access the paper in its preprint form.

Releated Posts

Insights From Flink Forward 2024

In October, our CTO Sabri Skhiri attended the Flink Forward conference, held in Berlin, which marked the 10-year anniversary of Apache Flink. This event brought together experts and enthusiasts in the field of stream processing to discuss the latest advancements, challenges, and future trends. In this article, Sabri will delve into some of the keynotes and talks that took place during the conference, highlighting the noteworthy insights and innovations shared by Ververica and industry leaders.
Read More

Internships 2025

This document presents internships supervised by our consulting department or by our research & development department. Each project is an opportunity to feel both empowered and responsible for your own professional development and for your contribution to the company.
Read More