Estimating Expected Calibration Errors

Uncertainty in the predictions of probabilistic classifiers is a key concern when models are used to support human decision making, in broader probabilistic pipelines, or when sensitive automatic decisions have to be taken.
Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
Hence, being able to calibrate these models, or to enforce calibration while learning them, has regained interest in recent literature.
In this context, properly assessing calibration is essential for quantifying the gains of new calibration methods.
However, commonly used metrics leave room for improvement, and the evaluation of calibration could benefit from deeper analysis.
This paper therefore focuses on the empirical evaluation of calibration metrics in the context of classification.
More specifically, it evaluates different estimators of the Expected Calibration Error ($ECE$), including legacy estimators and novel ones proposed in this paper.
We build an empirical procedure to quantify the quality of these $ECE$ estimators, and use it to decide which estimator should be used in practice for different settings.
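
For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of the widely used equal-width binned estimator of top-label $ECE$. It is given for context only and is not claimed to be one of the estimators studied or proposed in the paper. The function name `binned_ece`, its parameters, and the default of 10 bins are illustrative assumptions.

```python
import numpy as np


def binned_ece(confidences, correct, n_bins=10):
    """Equal-width binned ECE estimate (illustrative sketch, not the paper's code).

    confidences: predicted probability of the predicted class, each in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    n = confidences.size

    # Map each confidence to a bin index; a confidence of exactly 1.0
    # is placed in the last bin rather than overflowing.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between empirical accuracy and mean confidence in this bin,
        # weighted by the fraction of samples that fall in the bin.
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += (mask.sum() / n) * gap
    return ece


# Four predictions made with 0.9 confidence, three of them correct.
print(binned_ece([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

In this toy call all samples fall in one bin with accuracy 0.75 and mean confidence 0.9, so the estimate is the gap of about 0.15. The result depends on the binning choice, which is one reason the quality of such estimators is worth evaluating.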

Nicolas Posocco, Antoine Bonnefoy, Estimating Expected Calibration Errors, In Proc. of the 30th International Conference on Artificial Neural Networks, 2021.

Watch the presentation on YouTube.

Click here to access the paper.
