Benchmarks — INDUS

An observations-focused assessment of global AI weather models during the South Asian monsoon.

AI weather models are routinely ranked against ECMWF's ERA5 reanalysis, but reanalysis inherits the same sparse observational inputs the models do. Evaluating against the in-situ observation record tells a different story: systematic biases hidden by reanalysis-centric evaluation surface immediately.

FourCastNet

Vision Transformer · Adaptive Fourier Neural Operators

Pioneering transformer-based global forecaster. Trained on ERA5 (1979–2015), validated 2016–17, tested 2018+.

Levels: 4 pl

Sfc vars: T2 · U10 · V10 · SP · MSLP

FourCastNet-SFNO

Vision Transformer · Spherical Fourier Neural Operators

Update to FourCastNet using spherical operators. 13 pressure levels; same training regime on ERA5.

Levels: 13 pl

Pl vars: T · U · V · Z · RH

Pangu-Weather

3D Earth-Specific Transformer

Trained on ERA5 1979–2017, validated 2019, tested 2018. The 6-hourly checkpoint is used here for consistency.

Levels: 13 pl

Pl vars: T · U · V · Z · Q

GraphCast

Graph Neural Network · Encoder / Decoder

Message-passing graph neural network. Trained on ERA5 (1979–2018), fine-tuned on ECMWF HRES (2016–2021). 37 pressure levels.

Levels: 37 pl

Pl vars: T · U · V · Z · Q · W

Aurora

3D Swin Transformer · U-Net Foundation Model

Foundation model pre-trained on 16 datasets (ERA5, HRES, IFS/GEFS ensembles, CMIP6, MERRA-2, CAMS). Used at 0.25° checkpoint.

Levels: 13 pl

Pl vars: T · U · V · Q · Z

AIFS Deterministic

GNN Encoder / Decoder · Swin Processor · Reduced Gaussian Grid

ECMWF's data-driven system. Trained on ERA5 1979–2018, fine-tuned on operational analyses. Outputs regridded from N320 to 0.25°.

Levels: 13 pl

Extras: TCW · TP · CP · Cloud cover

GenCast Ensemble

Conditional Diffusion Transformer

The only probabilistic AIWP in the assessment. Trained on ERA5 1979–2018. 12-hourly cadence; the 32-member ensemble is used for cyclone trajectory analysis.

Levels: 13 pl

Members: 32 (prob.)

Baselines

Traditional NWP

ECMWF HRES: 9 km deterministic, 12-hourly, 10-day lead.

IFS Ensemble: 50-member, 18 km, 15-day lead.

IFS Ensemble Mean: the mean of the 50-member ensemble.

Year: 2022

Source: WeatherBench GCS
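
The ensemble-mean baseline listed above is the average across members at each position (for example, each grid point at a given lead time). The following is a minimal sketch of that averaging, not the assessment's own code: the function name `ensemble_mean` is chosen here for illustration, and the three short members with made-up values stand in for the 50-member ensemble.

```python
from typing import List, Sequence


def ensemble_mean(members: Sequence[Sequence[float]]) -> List[float]:
    """Average across ensemble members at each position (e.g. grid point)."""
    if not members:
        raise ValueError("need at least one member")
    n = len(members[0])
    if any(len(m) != n for m in members):
        raise ValueError("all members must have the same length")
    # zip(*members) groups the values that share a position across members.
    return [sum(vals) / len(members) for vals in zip(*members)]


# Hypothetical values: 3 members, 3 grid points (illustration only).
members = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
]
mean_field = ensemble_mean(members)
print(mean_field)
```

Averaging smooths out features that members disagree on, which is why an ensemble mean is usually scored as a separate baseline from any single deterministic run.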

Observational validation exposes systematic biases masked by reanalysis-centric evaluation.

Mean absolute error (MAE) for six AI weather prediction models (AIWPs), scored against three different references: India Meteorological Department station observations (rows labelled a, d, g, j, m, p), ERA5 reanalysis (b, e, h, k, n, q), and ECMWF HRES operational forecasts (c, f, i, l, o, r), across lead times of 1–10 days for 2022. Color encodes the percentage difference in MAE relative to ECMWF's IFS HRES: blue is better, red is worse.
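
The color scale described above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the assessment's code: the function names `mae` and `pct_diff_vs_baseline` and all sample values are hypothetical, and the sign convention (negative means lower error than the baseline, which would map to blue) is an assumption consistent with "blue is better".

```python
from typing import Sequence


def mae(forecast: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error between paired forecast and reference values."""
    if len(forecast) != len(reference) or len(forecast) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(f - r) for f, r in zip(forecast, reference)) / len(forecast)


def pct_diff_vs_baseline(model_mae: float, baseline_mae: float) -> float:
    """Percentage difference in MAE relative to the baseline.

    Negative values mean the model's error is lower than the baseline's.
    """
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (model_mae - baseline_mae) / baseline_mae


# Hypothetical 2 m temperature values (deg C) at one lead time.
obs = [30.0, 31.0, 29.0, 28.0]          # reference, e.g. station observations
model_fc = [31.0, 31.0, 28.0, 28.0]     # hypothetical AI model forecast
baseline_fc = [32.0, 30.0, 29.0, 27.0]  # hypothetical HRES forecast

model_mae = mae(model_fc, obs)
baseline_mae = mae(baseline_fc, obs)
print(pct_diff_vs_baseline(model_mae, baseline_mae))
```

Repeating this per lead time and per reference (stations, reanalysis, operational forecasts) would give one cell per model, lead time, and reference, which is the layout the figure describes.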

Six production-grade AI weather models, evaluated head-to-head.

What we evaluated against.

a. Reanalysis

b. Stations

c. Gridded rainfall

d. Satellite

e. Best tracks