What do LLMs cost the environment?
Large Language Models have a significant and growing environmental impact that is mostly hidden.

To conclude this initial look at the impact of Large Language Models (LLMs), this week I’ll explore what is known about their environmental impact.
As with ROI and efficiency claims, getting high-quality information on the environmental impact of LLMs is extremely difficult.
Unfortunately there are a lot of vague numbers and claims that have little to no evidence[^1] backing them, and highly specific releases that are hard to judge or generalize to the wider ecosystem of providers.
After numerous web searches and reading various articles and papers, I’m left with one preprint that seemed to provide relevant detail.
The Study Preprint
This is the best source I’ve been able to find covering the impact of LLM training and use: Holistically Evaluating The Environmental Impact Of Creating Language Models, by researchers at the Allen Institute for AI and Carnegie Mellon University. Note that this is a preprint, so it’s worth approaching with some skepticism; that said, in my reading I didn’t see any red flags.
This paper deals with training a custom LLM in four different sizes, to better understand the differences between creating a model meant to be run on a local device (a laptop or phone) and a consumer-facing one that would be run in a datacenter. These models were not deployed, so the cost of generating responses to user queries is simulated.
Training a LLM
The four variations of the OLMo model they built ranged from under 1 billion to 13 billion parameters, plus a “Mixture of Experts” model[^2].
Because they trained the models themselves, they were able to carry out detailed performance monitoring and get a better understanding of energy use throughout the different phases of training across the various sizes.
It’s worth pointing out that the models they’re making are much smaller than the commercially available models from companies such as Google, OpenAI, Anthropic, etc.
While we don’t have access to detailed information about the environmental impact of those, a rough estimate based on this set of experiments and scaled up 10-100x should establish general guidelines.
To help put things in perspective, I looked through the Ollama Library’s recent models to get a sense of their sizes. I was able to find models with 397B, 744B, 671B, 80B, and 123B parameters. I think it’s fair to expect the corporate versions to be at the upper end of this range, and likely much higher. We can also expect that the training process will be more sophisticated and complex, though how that would affect the outcome is unclear.
What did they find?
Development
| Size | CO2 emissions (1 US household equivalent) | Water use (1 person equivalent) |
|---|---|---|
| <1B | 1.33 yr | 3 months |
| 7B | 13.5 yr | ~2.5 yr |
| 13B | ~9.5 yr | ~3.5 yr |
| MOE | 1.33 yr | 3 months |
| Total | ~33 yr | ~7.5 yr |
All of this is determined by the specifics of their servers and energy suppliers, so it won’t be 100% accurate for other systems. But it sets general expectations that can be built on.
I did some back-of-the-envelope math extrapolating these results to the largest model I found (744B, which is ~57x the 13B model). While things won’t scale linearly, and another preprint covers how complex the interactions of training choices are[^3], this should at least illustrate the range a hyperscaler is operating in.
To get a general idea of what a production model might use, let’s multiply those test results by 57.
| Test size | CO2 emissions x57 | Water use x57 |
|---|---|---|
| <1B | ~75 yr | 14.25 yr |
| 7B | 769.5 yr | 142.5 yr |
| 13B | 541.5 yr | 199.5 yr |
| MOE | ~75 yr | 14.25 yr |
| Total | 1,461 yr | 370.5 yr |
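As a quick sanity check on the arithmetic above, here’s a minimal sketch of the scaling, assuming costs grow purely linearly with parameter count (which, as noted, is unlikely to hold in practice). The per-model figures are the rounded values from the development table; everything else is multiplication.

```python
# Back-of-envelope scaling of the paper's development-phase figures.
# Assumes costs scale linearly with parameter count -- a rough simplification,
# since training choices interact in complex ways.

SCALE = 57  # largest model found on Ollama (744B) vs. the paper's 13B model

# (CO2 in US-household-years, water in person-years), rounded values
# taken from the development table above.
development = {
    "<1B": (1.33, 0.25),
    "7B":  (13.5, 2.5),
    "13B": (9.5,  3.5),
    "MoE": (1.33, 0.25),
}

for size, (co2_years, water_years) in development.items():
    print(f"{size:>4}: ~{co2_years * SCALE:,.1f} household-years CO2, "
          f"~{water_years * SCALE:,.1f} person-years water")

total_co2 = sum(c for c, _ in development.values()) * SCALE
total_water = sum(w for _, w in development.values()) * SCALE
print(f"Total: ~{total_co2:,.0f} household-years CO2, "
      f"~{total_water:,.0f} person-years water")
```

The totals come out slightly different from the table depending on how much rounding happens before the multiplication, which is another reminder that these are order-of-magnitude figures, not precise measurements.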
The paper mentions that they ran a number of experiments to calibrate and stabilize their training setup for the final run, and these are included in the above totals.
Final Training
This was followed by a final training run, leveraging the results of the early development runs.
I’ll forgo reproducing the graph for this due to its size, but I recommend checking it out on page 7 of the paper. Here are their totals, and what a possible production-sized model would cost.
| CO2 emissions (1 US household) | Water use (1 person) |
|---|---|
| 65 yr | ~17 yr |
| CO2 emissions x57 | Water use x57 |
|---|---|
| 3,705 yr | 969 yr |
While we can’t say how accurate these numbers would be for a production model, these seem like a workable starting point.
Measuring a Deployed LLM
The numbers for deployment and how many requests it would take to equal and then exceed training cost (in CO2 and water use) are based on simulations. They benchmarked 2400 requests at varying rates, and projected the amount needed to reach parity with training.
The low end of the range was 441 million requests, and the high end was 29.8 billion requests.
If we apply our earlier 57x scaling to project to a production model, we would need between ~25 billion and ~1.698 trillion requests before usage met or exceeded training.
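To make the break-even idea concrete, here’s a minimal sketch of the parity calculation, assuming a fixed average cost per request (the paper simulates this; I’m only illustrating the shape of the math). The 441 million and 29.8 billion request counts come from the article above; the per-request cost is backed out from them rather than measured.

```python
# Break-even: how many requests until cumulative inference cost equals training cost?
# If the per-request cost stays fixed, scaling the training cost by 57x
# scales the break-even request count by 57x as well.

TRAINING_COST = 1.0  # normalize one training run to 1 "unit" of cost
SCALE = 57           # same rough production-scale multiplier as above

def requests_to_parity(training_cost: float, per_request_cost: float) -> float:
    """Requests needed before cumulative inference cost matches training."""
    return training_cost / per_request_cost

# The paper's simulated range: 441 million to 29.8 billion requests.
for reported_requests in (441e6, 29.8e9):
    # Implied per-request cost as a fraction of one training run.
    per_request = TRAINING_COST / reported_requests
    scaled = requests_to_parity(TRAINING_COST * SCALE, per_request)
    print(f"{reported_requests:,.0f} requests -> ~{scaled:,.0f} at 57x training cost")
```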
For an individual user, these numbers would be impossible to reach, but in a production setting, especially one where LLM interfaces are being added to a huge variety of applications, they don’t seem so unreachable.
Other Costs
Unsurprisingly, the impact of manufacturing the hardware these models run on is at least as difficult to nail down as training and deployment.
This paper uses the same estimate as a previous paper from 2023, where manufacturing a single GPU accounts for 463 kg of CO2 and ~100 liters of water. These numbers are relatively low when compared to the training cost, but each datacenter will host anywhere from hundreds to thousands or even millions of GPUs and other electronics in its racks, meaning these amounts will compound quickly.
The authors estimate an average 4 year lifespan per GPU.
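For a rough sense of how the per-GPU manufacturing figures compound at fleet scale, here’s a small sketch using the paper’s per-GPU estimates (463 kg of CO2, ~100 liters of water) and the authors’ 4-year lifespan. The fleet sizes are hypothetical assumptions of mine, not numbers from the paper or any real datacenter.

```python
# Embodied (manufacturing) cost of a GPU fleet, amortized over its lifespan.
# Per-GPU figures are the estimates cited in the paper; fleet sizes are
# hypothetical, for illustration only.

CO2_PER_GPU_KG = 463    # manufacturing CO2 per GPU
WATER_PER_GPU_L = 100   # manufacturing water per GPU (approximate)
LIFESPAN_YEARS = 4      # authors' estimated average GPU lifespan

def fleet_embodied_cost(num_gpus: int) -> tuple[float, float, float]:
    """Total embodied CO2 (tonnes), water (m^3), and CO2 amortized per year (tonnes)."""
    co2_tonnes = num_gpus * CO2_PER_GPU_KG / 1000
    water_m3 = num_gpus * WATER_PER_GPU_L / 1000
    return co2_tonnes, water_m3, co2_tonnes / LIFESPAN_YEARS

for fleet in (1_000, 10_000, 100_000):  # hypothetical fleet sizes
    co2, water, co2_per_year = fleet_embodied_cost(fleet)
    print(f"{fleet:>7,} GPUs: ~{co2:,.0f} t CO2, ~{water:,.0f} m^3 water, "
          f"~{co2_per_year:,.0f} t CO2/yr amortized")
```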
A question I have that I haven’t seen covered anywhere is the amount of e-waste a datacenter produces, either in a year or over its lifetime. Given how little e-waste is recovered or recycled, I’d expect this could be a significant cost area as well.
Overall this paper is only the start, and one example of the costs of training a full LLM. There are a number of assumptions made that they couldn’t test, and a wide variety of factors that would change final costs - though how much is impossible to say until more information becomes available.
Conclusion
I have seen multiple articles calling for greater transparency[^3][^4], and discussing how companies in this space seem to be obscuring the impact of datacenters and LLM deployment[^5][^6]. This paper puts things into a more relatable context, which is helpful for individuals who aren’t deeply engaged in either tech or environmental spaces.
The numbers the authors arrive at, combined with my calculations of what an enterprise-scale LLM might mean for CO2 and water usage, are startling.
If we look at Hugging Face, we see that it lists over 2.5 million models in total, more than 3,000 of which exceed 128B parameters. I don’t know what proportion of all LLMs are on Hugging Face, but even assuming the majority are listed there, these numbers show that an enormous amount of resources has already been used.
It is safe to assume that the largest providers are at or above the 57x size estimate I used, due to the continued focus on scaling. It is also reasonable to assume that they engage in significant ongoing training, reinforcement, and so on, making the initial training costs into ongoing expenses.
From an environmental impact perspective, there is reason to be skeptical of this technology and its impact. Advancements from it must be concrete and widely realizable, and must coincide with significant improvements in efficiency, before advocating widespread LLM use would seem reasonable.
Takeaway
At this moment the most impactful thing that can be done is to advocate for greater transparency across the industry. Large corporations and research groups need to document and share the actual costs of LLM creation and ongoing deployment, so rational decisions can be made.
Given the speed and scale of datacenter development, pushing at the local level with city councils and development boards to include disclosures as part of any agreement to allow a datacenter to progress seems like it could have an outsized impact. It would only take a handful of such agreements, if adhered to, to greatly improve our understanding.
Some things to push for might include:
- energy consumption and pollution over hosted LLM life-cycle
- energy consumption and pollution from hardware (e-waste)
- energy use and water consumption (for cooling)
- analysis of local grid impacts and/or water system stress
Until there’s comprehensive documentation of the actual environmental costs to enable rational decision making, it seems reasonable to limit LLM use to situations where there are clear and unique benefits.
Next Time
I’ll be wrapping this series up by reviewing the past three posts on LLMs, and briefly touching on some other articles that have come up in the meantime.
Photo credit: Photo by Marcus Woodbridge on Unsplash
[^1]: This post and the linked report, while specifically about the discourse around possible positive environmental impacts, match my experience searching for measurements of current impacts: Don’t believe the hyperscalers! AI can’t cure the climate crisis
[^2]: Mixture of Experts models have specialized subsets which are used together to address a query. For instance: a portion of the full model may be trained only on programming languages, and another on English-language literature; a Python question would be routed to the former and a question about Shakespeare to the latter. For more detail see this article from IBM: https://www.ibm.com/think/topics/mixture-of-experts.
[^3]: Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View
[^4]: Sustainability in large language model supply chains - insights and recommendations using analysis of utility for affecting factors - Nature, 2025. This article discusses the views of individuals in the IT realm, and the surprisingly low value placed on environmental concerns; one of their conclusions is that greater transparency is urgently needed.
[^5]: Generative AI’s environmental costs are soaring — and mostly secret - Nature, 2024.
[^6]: Real datacenter emissions are a dirty secret - Register, 2025.