Anomaly Detection: How to Artificially Increase your F1-Score with a Biased Evaluation Protocol

Data science, Vadgelmir

June 30, 2021

Anomaly detection is a widely explored domain in machine learning. Many models are proposed in the literature and compared using different metrics measured on various datasets.
The most popular metrics used to compare performance are the F1-score, the AUC (area under the ROC curve) and the AVPR (average precision).
In this paper, we show that F1-score and AVPR are highly sensitive to the contamination rate.
One consequence is that it is possible to artificially increase their values by modifying the train-test split procedure.
This leads to misleading comparisons between algorithms in the literature, especially when the evaluation protocol is not described in detail.
Moreover, we show that the F1-score and the AVPR cannot be used to compare performance across datasets, as they do not reflect the intrinsic difficulty of modeling such data.
Based on these observations, we claim that F1-score and AVPR should not be used as metrics for anomaly detection. We recommend a generic evaluation procedure for unsupervised anomaly detection, including the use of other metrics such as the AUC, which are more robust to arbitrary choices in the evaluation protocol.
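
The sensitivity to the contamination rate described above can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes a hypothetical detector whose true positive rate and false positive rate are fixed (the values 0.8 and 0.1 are made up for illustration), and computes the expected F1-score when only the proportion of anomalies in the test set changes. The function name `f1_at_contamination` is likewise an assumption introduced here.

```python
def f1_at_contamination(tpr, fpr, contamination):
    """Expected F1-score of a detector with fixed per-class rates.

    tpr: fraction of anomalies flagged as anomalies (recall).
    fpr: fraction of normal samples wrongly flagged as anomalies.
    contamination: fraction of test samples that are anomalies.
    """
    tp = tpr * contamination            # expected true positives (as a fraction of the test set)
    fp = fpr * (1.0 - contamination)    # expected false positives
    fn = (1.0 - tpr) * contamination    # expected false negatives
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Hypothetical detector: the same model, only the test set composition changes.
TPR, FPR = 0.8, 0.1

for c in (0.01, 0.05, 0.20, 0.50):
    print(f"contamination={c:.2f}  F1={f1_at_contamination(TPR, FPR, c):.3f}")
```

Under these assumptions the F1-score rises with the contamination rate even though the detector itself is unchanged, so moving normal samples out of the test set (for example, into the training set) is enough to raise the reported score. The ROC curve is built from the per-class rates alone, which is why the AUC is not affected in expectation by this kind of change.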

Damien Fourure*, Muhammad Usama Javaid*, Nicolas Posocco*, Simon Tihon*, Anomaly Detection: How to Artificially Increase your F1-Score with a Biased Evaluation Protocol, In Proc. of The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.

* equal contributions

