ImageNot: A Contrast with ImageNet Preserves Model Rankings
Models show the same relative improvements on ImageNet and ImageNot; in particular, model rankings are identical across the two datasets.
ImageNot is a dataset created to test the external validity of model rankings from the ImageNet era. Surprisingly, models show the same relative improvements on ImageNot as they did on ImageNet, even though the two datasets are strikingly different.
We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other respects. We show that key model architectures developed for ImageNet over the years rank identically when trained and evaluated on ImageNot as they do on ImageNet, whether the models are trained from scratch or fine-tuned. Moreover, each model's relative improvement over earlier models correlates strongly across the two datasets. We further give evidence that ImageNot has similar utility to ImageNet for transfer learning. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This stands in contrast with absolute accuracy numbers, which typically drop sharply even under small changes to a dataset.
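The ranking comparison described above can be sketched with a rank correlation check: compute each model's rank on both benchmarks and measure their agreement with Spearman's rho. This is a minimal illustration, not the paper's evaluation code, and the accuracy numbers below are hypothetical placeholders, not reported results.

```python
# Sketch: do two benchmarks rank a set of models the same way?
# All accuracy values here are made-up placeholders for illustration.

def ranks(scores):
    """Map each model to its rank (1 = best) by descending score."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {model: i + 1 for i, model in enumerate(order)}

def spearman(xs, ys):
    """Spearman rank correlation for two equal-length rank lists (no ties)."""
    n = len(xs)
    d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical top-1 accuracies for a few architectures on each dataset.
benchmark_a = {"AlexNet": 0.57, "VGG": 0.71, "ResNet": 0.76, "ViT": 0.81}
benchmark_b = {"AlexNet": 0.31, "VGG": 0.44, "ResNet": 0.52, "ViT": 0.60}

models = sorted(benchmark_a)
ra, rb = ranks(benchmark_a), ranks(benchmark_b)
rho = spearman([ra[m] for m in models], [rb[m] for m in models])
print(rho)  # identical rankings give rho = 1.0
```

Identical rankings yield rho = 1.0 even when absolute accuracies differ sharply, which is exactly the distinction the abstract draws between relative and absolute performance.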