Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.
| Author(s): | Dorner, Florian E. and Hardt, Moritz |
| Book Title: | Proceedings of the 41st International Conference on Machine Learning (ICML 2024) |
| Year: | 2024 |
| Month: | July |
| Publisher: | PMLR |
| BibTeX Type: | Conference Paper (inproceedings) |
| Event Name: | The Forty-First International Conference on Machine Learning (ICML) |
| State: | Published |
| URL: | https://proceedings.mlr.press/v235/dorner24a.html |
| Electronic Archiving: | grant_archive |
BibTeX
@inproceedings{dorner2024dontlabel,
title = {Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget},
booktitle = {Proceedings of the 41st International Conference on Machine Learning (ICML 2024)},
abstract = {We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cram\'er's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.},
publisher = {PMLR},
month = jul,
year = {2024},
author = {Dorner, Florian E. and Hardt, Moritz},
url = {https://proceedings.mlr.press/v235/dorner24a.html},
month_numeric = {7}
}