
Automatic Parameter Tuning for Big Data Pipelines

Tuning big data frameworks is essential to getting the best performance out of a given application. However, these frameworks are rarely used in isolation. They generally form a pipeline in which each framework plays a different role. Given the size of the search space, tuning a big data pipeline is therefore an important yet difficult task. Moreover, the interactions between these frameworks must be taken into account when tuning the pipeline's configuration parameters, so a trade-off is required to achieve the best end-to-end performance.

Machine learning-based methods have been highly successful in automatic tuning systems, but they rely on a large number of high-quality training examples, which are difficult to obtain. In this context, we propose using a deep reinforcement learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3), to tune a fraud detection big data pipeline.
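As a rough illustration of why TD3, a continuous-action algorithm, fits configuration tuning, the sketch below shows two pieces under stated assumptions. First, `action_to_config` maps an actor output in [-1, 1]^d onto concrete parameter values. The parameter names and ranges in `PARAM_RANGES` are hypothetical and are not the parameters tuned in the paper. Second, `td3_target` computes the standard TD3 critic target, combining target policy smoothing (clipped noise on the target action) with clipped double-Q learning (the minimum of two target critics). It assumes the reward is some end-to-end performance score measured after applying a configuration. This is a sketch of the generic algorithm, not the authors' implementation.

```python
import random

# Hypothetical parameter space: names and (low, high) ranges are
# illustrative only, not the parameters tuned in the paper.
PARAM_RANGES = {
    "batch_interval_ms": (100, 5000),
    "num_partitions": (1, 64),
    "executor_memory_mb": (512, 8192),
}


def clip(x, low, high):
    """Clamp x into the closed interval [low, high]."""
    return max(low, min(high, x))


def action_to_config(action):
    """Map an actor output in [-1, 1]^d to integer parameter values."""
    config = {}
    for a, (name, (low, high)) in zip(action, PARAM_RANGES.items()):
        a = clip(a, -1.0, 1.0)  # keep the action inside the valid box
        config[name] = round(low + (a + 1.0) / 2.0 * (high - low))
    return config


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~).

    a~ is the target actor's action plus Gaussian noise clipped to
    [-noise_clip, noise_clip] (target policy smoothing), then clipped to the
    action box. Taking the minimum of the twin target critics (clipped
    double-Q learning) counters Q-value overestimation.
    """
    smoothed = [
        clip(a + clip(random.gauss(0.0, noise_std), -noise_clip, noise_clip),
             -1.0, 1.0)
        for a in target_actor(next_state)
    ]
    q_min = min(target_q1(next_state, smoothed),
                target_q2(next_state, smoothed))
    return reward + gamma * (1.0 - done) * q_min
```

For example, `action_to_config([-1.0, 1.0, 0.0])` yields the lower bound of the first parameter, the upper bound of the second, and the midpoint of the third. The third TD3 ingredient, delayed policy updates, is not shown: in the standard algorithm the actor and the target networks are updated only once every few critic updates (commonly two).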

Houssem Sagaama, Nourchene Ben Slimane, Maher Marwani, Sabri Skhiri, Automatic Parameter Tuning for Big Data Pipelines, In Proc. of The 26th IEEE Symposium on Computers and Communications (ISCC 2021), September 2021.


Related Posts

Flight Load Factor Predictions based on Analysis of Ticket Prices and other Factors

For airports, the ability to forecast traffic and to size the operation accordingly is a determining factor. However, to realise its full potential, forecasting needs to be considered as part of a holistic approach, closely linked to airport planning and operations. To ensure that airport resources are used efficiently, accurate information about passenger numbers and their effects on the operation is essential. This study therefore explores how machine learning can be used to predict aircraft load factors.

Investigating a Feature Unlearning Bias Mitigation Technique for Cancer-type Bias in AutoPet Dataset

We proposed a feature unlearning technique to reduce cancer-type bias, which improved segmentation accuracy while promoting fairness across sub-groups, even with limited data.