In this section you will find EURA NOVA’s latest news and activities. In this section you will find EURA NOVA's scientific publications and reports.
Apache Spark is a popular open-source distributed-processing framework that enables efficient processing of massive amounts of data. It has a large number of parameters that need to be tuned to get the best performance. However, tuning these parameters manually is a complex and time-consuming task. Therefore, a robust performance model to predict applications execution time could greatly help in accelerating the deployment and optimization of big data applications relying on Spark. In this paper, we ran extensive experiments on a selected set of Spark applications that cover the most common workloads to generate a representative dataset of execution time. In addition, we extracted application and data features to build a machine learning-based performance model to predict Spark applications execution time. The experiments show that boosting algorithms achieved better results compared to other algorithms.
The paper will be published at the Big Data congress 2020 taking place in Hawaii. In the meantime, do not hesitate to contact our R&D department at firstname.lastname@example.org to discuss how you can optimise distributed processing frameworks in your projects.
Florian Demesmaeker, Amine Ghrab, Usama Javaid, Ahmed Amir Kanoun, A Performance Prediction Model for Spark Applications, in the proceedings of Big Data congress 2020.