Fine-grained visual recognition skills are vital to many expert domains, yet understanding how humans acquire such expertise remains an open challenge. We introduce
CleverBirds, a large-scale benchmark for knowledge tracing in fine-grained visual recognition. The dataset contains 17.9 million multiple-choice questions answered by 40,144 participants across 10,779 bird species, with an average of 444 questions per participant. It was introduced in the paper
"CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing", to appear at NeurIPS 2025 (Datasets and Benchmarks track).
CleverBirds enables us to study how individuals learn to recognize fine-grained visual distinctions over time. We evaluate state-of-the-art knowledge tracing methods on this benchmark and find that tracking learner knowledge across participant subgroups and question types is challenging, with different forms of contextual information providing varying degrees of predictive benefit.