IndusWX
MAUSAM · Monsoon Assessment of AI & Unified Surface Analyses for Meteorology

An observations-focused assessment of global AI weather models during the South Asian monsoon.

AI weather models are routinely ranked against ECMWF's reanalysis — but reanalysis inherits the same sparse inputs the models do. Evaluating against the in-situ observation record tells a different story: systematic biases, hidden by reanalysis-centric evaluation, surface immediately.

Full paper: available on arXiv.
01 / Models · AI Weather Prediction

Six production-grade AI weather models, evaluated head-to-head.

All models were selected for 0.25° × 0.25° spatial resolution (≈25 km in the tropics) and 6-hourly forecast output. Aurora is included at its 0.25° checkpoint for parity. GenCast is treated separately for cyclone tracks (probabilistic, 12-hourly, 32-member).
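As a rough check on the ≈25 km figure, the sketch below converts a 0.25° grid step to kilometres. It assumes a spherical Earth of radius 6371 km; the function names are illustrative and not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean spherical Earth radius


def meridional_spacing_km(step_deg=0.25):
    """North-south distance covered by one grid step (same at every latitude)."""
    return math.radians(step_deg) * EARTH_RADIUS_KM


def zonal_spacing_km(lat_deg, step_deg=0.25):
    """East-west distance covered by one grid step at the given latitude."""
    return meridional_spacing_km(step_deg) * math.cos(math.radians(lat_deg))


for lat in (0, 10, 25):
    print(f"lat {lat:>2} deg: {zonal_spacing_km(lat):.1f} km east-west")
```

Under these assumptions the east-west spacing is about 27.8 km at the equator and about 25.2 km at 25°N, so "≈25 km" is a reasonable round figure for the northern part of the monsoon region.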

01

FourCastNet

Vision Transformer · Adaptive Fourier Neural Operators

Pioneering transformer-based global forecaster. Trained on ERA5 (1979–2015), validated 2016–17, tested 2018+.

Levels: 4 pl
Sfc vars: T2 · U10 · V10 · SP · MSLP
02

FourCastNet-SFNO

Vision Transformer · Spherical Fourier Neural Operators

Update to FourCastNet using spherical operators. 13 pressure levels; same training regime on ERA5.

Levels: 13 pl
Pl vars: T · U · V · Z · RH
03

Pangu-Weather

3D Earth-Specific Transformer

Trained on ERA5 1979–2017, validated 2019, tested 2018. The 6-hourly checkpoint is used here for consistency.

Levels: 13 pl
Pl vars: T · U · V · Z · Q
04

GraphCast

Graph Neural Network · Encoder / Decoder

Message-passing graph network. Trained on ERA5 (1979–2018), fine-tuned on ECMWF HRES (2016–2021). 37 pressure levels.

Levels: 37 pl
Pl vars: T · U · V · Z · Q · W
05

Aurora

3D Swin Transformer · U-Net Foundation Model

Foundation model pre-trained on 16 datasets (ERA5, HRES, IFS/GEFS ensembles, CMIP6, MERRA-2, CAMS). Used at 0.25° checkpoint.

Levels: 13 pl
Pl vars: T · U · V · Q · Z
06

AIFS Deterministic

GNN Encoder / Decoder · Swin Processor · Reduced Gaussian Grid

ECMWF's data-driven system. Trained on ERA5 1979–2018, fine-tuned on operational analyses. Outputs regridded from N320 to 0.25°.

Levels: 13 pl
Extras: TCW · TP · CP · Cloud cover
07

GenCast Ensemble

Conditional Diffusion Transformer

The only probabilistic AWP in the assessment. Trained on ERA5 1979–2018. 12-hourly cadence, 32-member ensemble used for cyclone trajectory analysis.

Levels: 13 pl
Members: 32 (prob.)

Baselines

Traditional NWP

ECMWF HRES: 9 km deterministic, 12-hourly, 10-day lead.
IFS Ensemble: 50-member, 18 km, 15-day lead.
IFS Ensemble Mean: the mean of the 50-member ensemble.

Year: 2022
Source: WeatherBench GCS
02 / Headline result · WeatherBench, re-run

Observational validation exposes systematic biases masked by reanalysis-centric evaluation.

Mean absolute error (MAE) for six AIWPs, scored against three different references: India Meteorological Department (IMD) station observations (rows labelled a, d, g, j, m, p), ERA5 reanalysis (b, e, h, k, n, q), and ECMWF HRES operational forecasts (c, f, i, l, o, r), across lead times of 1–10 days for 2022. Color encodes the percentage difference in MAE relative to ECMWF's IFS HRES: blue is better, red is worse.

Fig. 01. Surface variables, 2022. Color scale: % difference in MAE vs IFS HRES (reference, 2022), from −50% (better) to +50% (worse).
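The quantity behind the color scale can be sketched in a few lines. This is a minimal illustration, assuming forecasts and reference values have already been paired point by point; the function names and the sample temperatures are made up for the example and are not results from the study.

```python
from statistics import fmean


def mean_absolute_error(forecast, reference):
    """MAE between paired forecast and reference values."""
    if len(forecast) != len(reference):
        raise ValueError("forecast and reference must have the same length")
    return fmean([abs(f - r) for f, r in zip(forecast, reference)])


def pct_diff_vs_baseline(model_mae, baseline_mae):
    """Percentage difference in MAE relative to a baseline.

    Negative values mean the model beats the baseline (blue in the figure),
    positive values mean it is worse (red).
    """
    return 100.0 * (model_mae - baseline_mae) / baseline_mae


# Illustrative 2 m temperatures in kelvin at five stations (not real data).
station_obs = [300.0, 301.0, 299.0, 302.0, 298.0]
ai_forecast = [301.0, 303.0, 298.0, 304.0, 299.5]
hres_forecast = [301.0, 300.0, 300.0, 301.0, 299.0]

ai_mae = mean_absolute_error(ai_forecast, station_obs)
hres_mae = mean_absolute_error(hres_forecast, station_obs)
print(f"AI MAE {ai_mae:.2f} K, HRES MAE {hres_mae:.2f} K, "
      f"difference {pct_diff_vs_baseline(ai_mae, hres_mae):+.0f}%")
```

Swapping `station_obs` for ERA5 values at the same points changes the reference without changing the metric, which is the comparison the figure makes across its three row groups.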
Why this matters

Every AI weather model in the study shows substantially larger errors against station observations than against ERA5 — and the gap widens with lead time. Reanalysis is not ground truth; it is a model output conditioned on the same sparse inputs. For South Asia, where station density is already low, the evaluation layer matters as much as the forecast layer.

03 / Validation · Reanalysis & observations

What we evaluated against.

The benchmark is only as honest as its reference data. MAUSAM pairs conventional reanalysis with a layered stack of in-situ and satellite observations, the measurements that models rarely see.

a. Reanalysis

ERA5

6-hourly, 0.25° × 0.25° reanalysis, 2021–2024. The default baseline in the AWP literature. Source: Copernicus CDS.

b. Stations

MeteoStat

Hourly point-based surface observations from IMD's weather station network, accessed through the MeteoStat Python API. Used to validate T2, U10, V10, and precipitation during extreme events.
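Comparing a gridded forecast with a point observation requires choosing which grid value represents the station. The source does not say how this matching was done; the sketch below shows nearest-grid-point lookup, one common choice. It assumes an ERA5-style regular grid (latitudes running north to south from 90°, longitudes eastward from 0° over [0°, 360°)), and the function name and station coordinates are illustrative.

```python
def nearest_grid_index(lat, lon, lat0=90.0, lon0=0.0, step=0.25):
    """Return (row, col) of the grid node closest to a station.

    Assumes rows run north to south starting at lat0 and columns run
    eastward starting at lon0, wrapping around at 360 degrees.
    """
    n_lon = round(360 / step)
    row = round((lat0 - lat) / step)
    col = round(((lon - lon0) % 360) / step) % n_lon
    return row, col


def grid_node_coords(row, col, lat0=90.0, lon0=0.0, step=0.25):
    """Latitude and longitude of a grid node, under the same layout."""
    return lat0 - row * step, (lon0 + col * step) % 360


# Hypothetical station near 28.58 N, 77.21 E.
row, col = nearest_grid_index(28.58, 77.21)
print(row, col, grid_node_coords(row, col))
```

The forecast value at `(row, col)` can then be paired with the station observation at the matching valid time. Bilinear interpolation between the four surrounding nodes is an alternative when smoother matching is preferred.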

c. Gridded rainfall

IMD 0.25°

Daily-averaged gridded rainfall from up to 6,995 rain gauges across India. Used for 2022 and 2024 monsoon-season verification.

d. Satellite

INSAT-3DS

Geostationary cloud-top products (processed clear-sky cloud fraction at 0.5°), used to validate total cloud cover diagnostics from AIFS. Source: ISRO MOSDAC.

e. Best tracks

IBTrACS

Best-track cyclone trajectory data for Tauktae (2021) and Yaas (2021), used to benchmark deterministic tracks, the 50-member IFS ensemble, and the 32-member GenCast ensemble.
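Track benchmarking of this kind usually reduces to a position error: the great-circle distance between the forecast storm centre and the best-track centre at each common valid time. The sketch below uses the haversine formula on a spherical Earth of radius 6371 km. The source does not state which distance measure was used, so treat this as an assumption; the function names and positions are illustrative and not taken from IBTrACS.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean spherical Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_errors_km(forecast_track, best_track):
    """Position error at every lead time present in both tracks.

    Each track maps lead time in hours to a (lat, lon) tuple.
    """
    common = sorted(set(forecast_track) & set(best_track))
    return {t: great_circle_km(*forecast_track[t], *best_track[t])
            for t in common}


# Made-up positions for illustration only.
best = {0: (15.0, 72.5), 12: (16.5, 72.0)}
forecast = {0: (15.0, 72.5), 12: (16.5, 73.0), 24: (18.5, 71.5)}

errors = track_errors_km(forecast, best)
for lead, err in errors.items():
    print(f"+{lead:02d} h: {err:.0f} km")
```

For an ensemble, the same function can be applied to each member's track separately, which gives a spread of position errors rather than a single number.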