The collapse in AI inference costs

The cost of running an AI model at GPT-3.5-equivalent quality fell from about USD 20 per million tokens in November 2022 to roughly USD 0.07 per million tokens by October 2024. That is a more than 280-fold reduction in just under two years. Capable AI has become dramatically cheaper to deploy.

Chart: Cost to query a GPT-3.5-level model, USD per million tokens: 20 in November 2022, 0.07 in October 2024 (Stanford AI Index 2025).

Source: Stanford HAI — AI Index 2025: State of AI in 10 Charts (2025)
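
The headline numbers can be checked with a few lines of arithmetic. The Python sketch below computes the overall fold reduction and the constant yearly factor that would compound to it. It assumes 23 months between the two data points (November 2022 to October 2024). The exact days of each price observation are not given here, so the annualised figure is approximate.

```python
# Prices quoted in the chart (Stanford AI Index 2025), USD per million tokens.
PRICE_NOV_2022 = 20.0
PRICE_OCT_2024 = 0.07
# Assumption: whole months from November 2022 to October 2024.
MONTHS_ELAPSED = 23


def fold_reduction(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def annualized_factor(total_fold: float, months: float) -> float:
    """Constant per-year factor that compounds to total_fold over `months`."""
    return total_fold ** (12.0 / months)


fold = fold_reduction(PRICE_NOV_2022, PRICE_OCT_2024)
per_year = annualized_factor(fold, MONTHS_ELAPSED)
print(f"{fold:.0f}-fold overall, about {per_year:.0f}x per year")
```

With these inputs the overall reduction is about 286-fold. That is equivalent to roughly 19-fold per year, which falls inside the range of 9 to 900 times per year that the AI Index reports across tasks.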

What it means

A 280-fold fall in the cost of capable AI in under two years helps explain why some applications that were uneconomic in 2023 can now run continuously at modest cost. Examples include analysing every maintenance log, every sensor stream and every quality image. For an operator, the practical message is that inference cost, once the main budget barrier to applying AI across operations, has largely stopped being one.
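
To see what the fall means for a continuously running workload, the sketch below prices a hypothetical monthly volume at both quoted rates. The figure of 500 million tokens per month and the function name are illustrative assumptions. The calculation covers per-token inference spend only. It excludes integration, data handling and staff costs.

```python
# Hypothetical workload: tokens processed per month by a continuous
# log-analysis job. Illustrative assumption, not a measured figure.
MONTHLY_TOKENS = 500_000_000


def monthly_inference_cost(tokens: int, usd_per_million: float) -> float:
    """Per-token inference spend only; ignores integration, storage and staff."""
    return tokens / 1_000_000 * usd_per_million


# Prices quoted in the chart, USD per million tokens.
for label, price in [("Nov 2022", 20.0), ("Oct 2024", 0.07)]:
    cost = monthly_inference_cost(MONTHLY_TOKENS, price)
    print(f"{label}: ${cost:,.2f} per month")
```

At this assumed volume, the same job costs USD 10,000 a month at the November 2022 price and USD 35 a month at the October 2024 price.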

Context

The Stanford AI Index tracks the price of reaching a fixed quality threshold (about 64.8% on the MMLU benchmark) rather than the price of a single named model. Depending on the task, the report finds inference prices falling anywhere from 9 to 900 times per year. The metric holds quality fixed and lets the cheapest qualifying model set the price. As a result, it captures gains from better models and hardware, not just price cuts on one existing product.
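
The threshold-based metric can be sketched as follows. At each point in time, take every model that clears the quality bar and record the lowest price among them. The model names, scores and prices below are invented for illustration. Only the 64.8% MMLU threshold comes from the report, and this sketch is a simplified reading of the idea rather than the AI Index's actual procedure.

```python
from dataclasses import dataclass
from typing import List, Optional

MMLU_THRESHOLD = 64.8  # GPT-3.5-level bar cited by the AI Index (percent)


@dataclass(frozen=True)
class ModelOffer:
    name: str               # hypothetical model name
    mmlu_score: float       # percent
    usd_per_million: float  # price per million tokens


def threshold_price(offers: List[ModelOffer],
                    threshold: float = MMLU_THRESHOLD) -> Optional[float]:
    """Cheapest price among models that clear the bar, or None if none do."""
    qualifying = [o.usd_per_million for o in offers if o.mmlu_score >= threshold]
    return min(qualifying, default=None)


# Hypothetical snapshots: all names, scores and prices are invented.
snapshot_early = [
    ModelOffer("model-a", 70.0, 20.0),
    ModelOffer("model-b", 55.0, 0.50),  # cheap but below the bar, so ignored
]
snapshot_late = [
    ModelOffer("model-a", 70.0, 10.0),  # incumbent model only halves its price
    ModelOffer("model-c", 66.0, 0.10),  # newer, cheaper model now clears the bar
]

print(threshold_price(snapshot_early), threshold_price(snapshot_late))
```

In the second snapshot the metric drops from 20.0 to 0.10 because a cheaper model has reached the bar, even though the incumbent model's own price only halved. This is why such a series tracks the cost of a level of quality rather than the list price of any one product.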
