<p>Lake ice phenology has been used extensively to study the impacts of anthropogenic climate change, owing to the widespread occurrence of lake ice and the length of the available time series. The proliferation of process-based lake models and gridded climate data has enabled ice phenology to be modeled across broad spatial scales, including regions where lakes are not sampled. In this study, we directly compared ice phenology simulated by an ensemble of lake-climate model projections with in situ observations. Generally, the lake models captured the range of variability in the observational records (ice-on RMSE = 22.9 days [4.7, 95.4]; ice-off RMSE = 17.4 days [6.1, 76.5]), and they reproduced long-term trends particularly well in temperate regions. However, the models performed poorly in extremely warm years and when ice phenology changed rapidly over short periods. Lake location (latitude and longitude) and lake morphology (depth and surface area) significantly influenced model performance: the models performed best in small, shallow lakes and worst in large, deep lakes. Our analysis suggests that the lake models tested can reliably estimate long-term trends in lake ice cover, particularly when averaged across large spatial scales, but widespread in situ observations remain critical for capturing extreme events.</p>