New study shows the environmental impact of chatting with AI

Photo: A user begins typing a message to DeepSeek on the iPhone app, in response to an on-screen greeting that reads "Hi, I'm DeepSeek. How can I help you today?" (Matheus Bertelli / Pexels)
AI has become nearly ubiquitous in today's world, but many users don't know the environmental impact of chatting with large language models like DeepSeek and ChatGPT.

A study from the Munich University of Applied Sciences in Germany assessed the environmental impacts of using large language models (LLMs), the class of artificial intelligence (AI) that includes the popular ChatGPT, DeepSeek, and Copilot.

“A lot of people are not really aware that there is some heavy environmental impact when using those kind of tools and applications,” said Maximilian Dauner, the lead author of the study.

Researchers found a trade-off between accuracy and sustainability. Two LLMs asked the same question can require very different amounts of energy to answer it. A more advanced model might produce up to 50 times more carbon emissions than a simpler AI, but the simpler model is more likely to get the answer wrong.

“This footprint is considerable, and it's growing very fast,” said Shashank Srivastava, Assistant Professor of Computer Science at UNC-Chapel Hill who studies AI and natural language processing. “But it’s still much smaller than something like flying or agriculture. This is something which is largely opaque to the general public.”

Calculating the environmental cost of AI

Dauner’s study, published in "Frontiers in Communication," involved downloading and analyzing 14 open-source LLM programs, including Meta’s LLaMa, a Chinese startup’s DeepSeek, a San Francisco company’s Cogito, and Alibaba’s Qwen.

Researchers asked each LLM the same 1,000 questions drawn from subject areas like math and history, half of them open-ended and half multiple choice. They recorded the accuracy of the responses along with each model’s energy consumption, which they estimated from the hardware’s fan speed, temperature, and voltage. Those readings informed estimates of the LLMs’ carbon dioxide output.
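
As a rough illustration of how that kind of measurement works, here is a minimal sketch that polls a GPU’s power draw while a query runs, using NVIDIA’s NVML bindings. This is an assumption-laden analogue, not the study’s exact instrumentation, and run_query stands in for whatever hypothetical model call is being measured.

```python
# A minimal sketch of estimating the energy one LLM query consumes by
# sampling GPU power draw while it runs. An analogue of the study's
# approach, not its exact method; run_query is a hypothetical stand-in
# for the model call being measured.
import threading
import time

import pynvml  # NVIDIA Management Library bindings: pip install nvidia-ml-py


def measure_energy_joules(run_query, interval_s=0.1):
    """Run run_query() while sampling GPU power; return estimated joules."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
    samples, done = [], threading.Event()

    def sampler():
        while not done.is_set():
            # nvmlDeviceGetPowerUsage reports milliwatts
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
            time.sleep(interval_s)

    thread = threading.Thread(target=sampler)
    thread.start()
    start = time.time()
    run_query()
    elapsed = time.time() - start
    done.set()
    thread.join()
    pynvml.nvmlShutdown()
    avg_watts = sum(samples) / max(len(samples), 1)
    return avg_watts * elapsed  # energy = average power x time
```

Multiplying the measured energy by a grid carbon-intensity factor (grams of CO2 per kilowatt-hour) then yields the kind of emissions estimate the study reports.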

The study found that the most advanced models generated the most emissions, essentially because they ‘think’ harder.

“In the end, the more words you generate, the higher the CO2 emissions,” said Dauner.

Less advanced LLMs generated less emissions, but their answers were less reliable — like a student who finishes an exam early, but only because they wrote shoddy answers.

Graph: The tradeoff between accuracy and CO2 output for each of the 14 LLMs tested, with carbon emissions in grams of CO2 on the X axis and percent accuracy on the Y axis; each colorful dot marks where a different model falls. (Maximilian Dauner and Gudrun Socher / Frontiers in Communication)
Show off: DeepSeek's R1 reasoning model had carbon emissions that left all other models in the dust, emitting an estimated 2,000 grams of CO2 to answer the 1,000 questions. All that, only to come in second place in accuracy behind a Cogito model, which emitted a little over 1,300 grams of CO2.

According to the study, Qwen developed a model that emitted only 27 grams of carbon dioxide, cheap by comparison, at the cost of accuracy. It answered questions correctly less than 40% of the time.

The least energy efficient was the DeepSeek model, which emitted 2,000 grams of carbon dioxide to answer the 1,000 test questions, roughly the emissions from running two cycles in your clothes dryer.
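
To put the tradeoff in per-answer terms, here is a back-of-the-envelope calculation using the article’s figures. The DeepSeek accuracy value below is a placeholder assumption, since the article only says it placed second:

```python
# Back-of-the-envelope: grams of CO2 per *correct* answer across the
# 1,000-question benchmark. Emission figures come from the article;
# the DeepSeek accuracy below is an assumed placeholder.
models = {
    # name: (grams of CO2 for 1,000 questions, fraction correct)
    "DeepSeek R1": (2000, 0.85),  # accuracy assumed for illustration
    "small Qwen":  (27,   0.40),  # article: correct "less than 40%"
}

for name, (grams, accuracy) in models.items():
    print(f"{name}: {grams / (1000 * accuracy):.2f} g CO2 per correct answer")
# Even charged only for its correct answers, the reasoning model costs
# roughly 35 times more per answer under these assumptions.
```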

What is AI even 'thinking' about? Spanish class and gutter balls. 

Humans and computers don’t speak the same language. Humans speak in words, while computers speak in numbers. But when we ask ChatGPT where to go for dinner in Durham, for example, we don’t want a response that reads like a telephone number. We need human language.

LLMs think like second language learners, like an English-speaking student in a beginner’s high school Spanish class. When you ask one a question (¿cómo estás?), it translates your words into its native language, word for word. It might think to itself for a moment in that native language, working out your meaning and how to respond. Finally, it translates that response back into Spanish.

LLMs’ native language is “tokens”: sequences of numbers that have translations in human language.
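
Here is a small sketch of what that translation looks like in practice, using tiktoken, OpenAI’s open-source tokenizer. Each model in the study ships its own tokenizer, but the idea is the same:

```python
# How text becomes an LLM's "native language": a tokenizer maps text to
# integer IDs and back. tiktoken is OpenAI's open-source tokenizer; the
# models in the study use their own, but the principle is identical.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("¿cómo estás?")
print(ids)              # a short list of integers -- the model's input
print(enc.decode(ids))  # round-trips back to the original text

# The piece of text each token stands for; accented characters can span
# multiple tokens, so some pieces may print as partial bytes.
print([enc.decode([i]) for i in ids])
```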

That’s how an LLM “thinks.” But, Dauner’s study reminds us that what an LLM comes up with — its content and its accuracy — is important, too.

LLMs form sentences by predicting a likely word sequence based on the information they were trained on. How a model picks the next word depends on its parameters. These can be thought of as guardrails at a bowling alley.

“You could make up a lot of sentences with words from the English language which make no sense. And these parameters ... sort of provide guardrails which ensure that generations that we get are natural,” said UNC’s Srivastava.

Parameters direct the flow of a sentence in a predictable direction, guarding against a gutter ball, or complete nonsense.

Except, instead of two guardrails, LLMs are trained on billions of parameters. The most advanced ones have hundreds of billions. It’s less of a bowling alley, and more of a complex maze.
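
A toy sketch of those guardrails: the parameters score every candidate word given the context, a softmax turns the scores into probabilities, and the most likely word wins. The scores below are invented; a real model computes them from its billions of learned parameters.

```python
# Toy next-word prediction: invented scores stand in for what billions
# of learned parameters would compute from the context.
import math

context = "Where should we go for dinner in Durham? Try the"
scores = {"tacos": 4.1, "barbecue": 3.8, "telephone": -2.0, "gutter": -3.5}

# Softmax: turn raw scores into a probability distribution
total = sum(math.exp(s) for s in scores.values())
probs = {word: math.exp(s) / total for word, s in scores.items()}

next_word = max(probs, key=probs.get)
print(next_word, round(probs[next_word], 3))
# Plausible continuations get nearly all the probability; nonsense like
# "telephone" gets almost none. That is the guardrail at work.
```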

Thus, more advanced LLMs work harder to navigate that maze, expending energy in the process.

Dauner’s research found that complex LLMs, like Cogito and DeepSeek, used 543 “thinking tokens” on average to generate an answer, compared to a simpler model’s 37. And the more tokens an LLM used, Dauner found, the higher the carbon emissions.
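
If emissions scale roughly with tokens generated, as Dauner observed, those averages imply a sizable gap:

```python
# If CO2 scales roughly with tokens generated, the study's averages
# imply the gap between reasoning models and simpler ones:
reasoning_tokens = 543  # avg "thinking tokens" per answer, complex models
simple_tokens = 37      # avg per answer, simpler models

print(f"~{reasoning_tokens / simple_tokens:.1f}x more tokens per answer")
# ~14.7x, and a proportionally larger carbon bill under this assumption
```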

This explains the tradeoff between accuracy and sustainability that Dauner found. A model that “thinks” harder is more accurate, but uses a greater carbon “effort,” so to speak.

Thoughtful AI usage  

The carbon emissions might seem insignificant, especially compared to major emitters like agriculture and air travel. But consider millions of users per day asking billions of questions, and, Dauner says, it adds up. That’s not to mention the environmental impacts of AI-related processes this study didn’t measure, such as cooling data centers.

Dauner said sustainability-conscious AI users can be mindful of what they ask LLMs, like maybe refraining from using image generation to turn yourself into an action figure.

Also, users can pick a model that suits the complexity of their task.

“There's a whole zoo of language models, and not really everyone needs to be using the most complex or capable model all the time,” said Srivastava.

Beyond users changing their behavior, Srivastava believes that developments in hardware, software, and coding will make AI more energy efficient in the coming years.

“And the way that this space has been emerging, it makes it exciting, scary,” he chuckled. “But I’m optimistic about our own ingenuity.”

Bianca is a Filipina-American science reporter. She joins WUNC as a 2025 American Association for the Advancement of Science Mass Media Fellow.