Certain AI prompts generate 50x more CO₂ than others

In recent years, researchers and climate advocates have been ringing the alarm about artificial intelligence’s impact on the environment. Advanced and increasingly popular large language models (LLMs)—such as those offered by OpenAI and Google—reside in massive data centers that consume significant amounts of electricity and water to cool servers. Every time someone types a question or phrase into one of these platforms, the energy used to generate a response produces a measurable amount of potentially harmful CO₂. But, according to new research published in Frontiers in Communication, not all of those prompts have the same environmental impact. Not even close.

The study looked at 14 different LLMs, each varying in the size of their training data, and evaluated their performance using a standardized set of 500 questions across different subject areas. Each model generates a certain number of “thinking tokens” per query, and those tokens correlate with CO₂ emissions. When the researchers compared the responses, they found that more complex “reasoning models”—which have larger training sets and take longer to process and respond—produced significantly more CO₂ than smaller, more efficient “concise models.” In some cases, reasoning models generated up to 50 times the emissions of their more concise counterparts. 

Aside from the models themselves, the amount of CO₂ generated by prompts also varied based on subject matter. More complex or open-ended questions, such as those involving advanced algebra or philosophy, tended to produce a larger carbon output than simpler prompts, like high school history questions. These findings shed further light on the often-overlooked ways AI models contribute to soaring energy consumption.

Related: [AI will require even more energy than we thought]

“The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions,” Maximilian Dauner, a PhD student at Hochschule München University of Applied Sciences and author of the paper, said in a statement.

What are reasoning models?

Reasoning models—sometimes called “thinking models”—are LLMs optimized for solving more complex tasks that require logic, step-by-step breakdowns, or detailed instructions. These models often go by different names. At OpenAI, for example, GPT-4o and GPT-4o-mini are considered “generalized” models, while versions like o1 and o3-mini are classified as reasoning models.

Reasoning models employ what some LLM researchers call “chain-of-thought” processing, allowing them to respond more deliberately than generalized models, which prioritize speed and clarity. The end goal is for reasoning models to generate more human-like responses. The most obvious by-product, for anyone who has used them, is that reasoning models take longer to generate answers.

Microsoft Chairman and CEO Satya Nadella (left) speaks with OpenAI CEO Sam Altman, who joined by video, during the Microsoft Build 2025 conference in Seattle, Washington, on May 19, 2025. Image: Jason Redmond / AFP

The researchers found that the reasoning models generated significantly more tokens, which correlate with CO₂ emissions, than the more concise models. (Tokens refer to words or parts of words that are converted into numerical representations the LLM can understand.) 
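
To get a feel for what a token actually is, you can run a tokenizer yourself. Below is a minimal sketch using the open-source tiktoken library; the encoding name is one used by several recent OpenAI models, other LLMs ship their own tokenizers, so treat the exact splits as illustrative rather than universal.

```python
# Minimal tokenization sketch using tiktoken (an open-source tokenizer library).
# The "cl100k_base" encoding is used by several recent OpenAI models; other LLMs
# use their own tokenizers, so the exact splits below are illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Certain AI prompts generate far more CO2 than others.")

print(tokens)                              # the numeric IDs the model actually sees
print(len(tokens), "tokens")               # longer answers mean more tokens to generate
print([enc.decode([t]) for t in tokens])   # the word pieces each ID maps back to
```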

The testing occurred in two phases. In the first, the researchers asked the models the same set of multiple-choice questions. In the second, free-response phase, the models provided written answers. On average, reasoning models generated 543.5 tokens per question, compared to just 37.7 tokens for concise models. The most accurate reasoning model they examined, called “Cogito,” produced three times as much CO₂ as similarly sized models optimized for concise responses.
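
One rough way to see this gap for yourself (not the study’s own setup) is to ask the same question of a concise model and a reasoning model and compare the token usage each reports. The sketch below uses the OpenAI Python SDK; the model names are just one current concise/reasoning pair and are assumptions of this example, not choices made by the researchers.

```python
# Rough sketch (not the study's methodology): ask the same question of a concise
# model and a reasoning model via the OpenAI Python SDK and compare how many
# output tokens each one reports. Model names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

question = "A train leaves at 3 p.m. traveling 60 mph. When has it covered 150 miles?"

for model in ("gpt-4o-mini", "o3-mini"):  # concise model vs. reasoning model
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    # For reasoning models, the reported completion tokens also include the hidden
    # "thinking" tokens, which are what drive the extra compute and emissions.
    print(f"{model}: {response.usage.completion_tokens} completion tokens")
```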

“From an environmental perspective, reasoning models consistently exhibited higher emissions, driven primarily by their elevated token production,” the researchers write in the paper. 

While the difference in emissions per individual prompt might seem marginal, it can make a real difference when scaled up. The researchers estimate that asking DeepSeek’s R1 model to answer 600,000 questions would generate roughly the same amount of CO₂ as a round-trip flight from London to New York. By comparison, you could ask the non-reasoning Qwen 2.5 model three times as many questions before reaching the same level of emissions.
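
For a back-of-envelope sense of what that implies per question, the arithmetic is straightforward. The flight figure used below (roughly 1,000 kg of CO₂ for a round-trip London to New York economy seat) is a commonly cited ballpark and an assumption of this sketch, not a number reported in the study; it is only used to back out an implied per-question estimate.

```python
# Back-of-envelope arithmetic for the comparison above. The flight figure is an
# assumed ballpark (~1,000 kg CO2 for a round-trip London-New York economy seat),
# not a number reported in the study.
FLIGHT_KG_CO2 = 1_000                 # assumption: one round-trip economy seat
R1_QUESTIONS_PER_FLIGHT = 600_000     # study's estimate for DeepSeek R1
QWEN_MULTIPLIER = 3                   # Qwen 2.5 handles ~3x as many questions

r1_grams_per_question = FLIGHT_KG_CO2 * 1_000 / R1_QUESTIONS_PER_FLIGHT
qwen_grams_per_question = r1_grams_per_question / QWEN_MULTIPLIER

print(f"DeepSeek R1: ~{r1_grams_per_question:.2f} g CO2 per question")
print(f"Qwen 2.5:    ~{qwen_grams_per_question:.2f} g CO2 per question")
```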

Overall, the researchers say that their findings highlight a fundamental trade-off between LLM accuracy and environmental sustainability.

“As model size increases, accuracy tends to improve,” the researchers said. “However, this gain is also linked to substantial growth in both CO₂ emissions and the number of generated tokens.”

Energy-hungry AI models are fueling a boom in new power plants 

The findings come amid a fierce global race among tech companies to develop increasingly advanced AI models. Over the past year alone, Apple has announced plans to invest $500 billion in manufacturing and data centers over the next four years, and Project Stargate—a joint initiative by OpenAI, SoftBank, and Oracle—has pledged to spend another $500 billion on AI-focused data centers. Researchers warn that this surge in infrastructure could place additional strain on already overburdened energy grids.

AI applications, in particular, play an outsized role in the energy consumption of newer data centers. A recent report in the MIT Technology Review notes that starting around 2017, data centers began incorporating more energy-intensive hardware specifically designed for complex AI computations, and energy use surged after that. The Electric Power Research Institute (EPRI) estimates that data centers supporting advanced AI models could account for up to 9.1 percent of the United States’ total electricity demand by the end of the decade—up from approximately 4.4 percent today.

Companies are scrambling to find new ways to meet this growing energy demand. Meta, Google, and Microsoft have all partnered with nuclear power plants to generate more electricity. Microsoft, one of OpenAI’s primary partners, even signed a 20-year agreement to source energy from the Three Mile Island nuclear facility in Pennsylvania, a site once known for the worst reactor accident in U.S. history. 

The shuttered Three Mile Island nuclear power plant stands in the middle of the Susquehanna River near Middletown, Pennsylvania. The plant’s owner, Constellation Energy, plans to spend $1.6 billion to refurbish the reactor it closed five years ago and restart it by 2028, after Microsoft agreed to buy as much electricity as the plant can produce for the next 20 years to power its growing fleet of data centers. Image: Chip Somodevilla / Getty Images

Meta is also making major investments in geothermal technology as a less fossil fuel–intensive way to generate power. Others, like OpenAI CEO Sam Altman, who has said the coming age of AI will require an “energy breakthrough,” are investing in experimental nuclear fusion. These investments may help companies make progress, but recent research indicates it’s almost certain that more fossil fuels—namely natural gas—will be needed to fully meet AI’s massive energy demand.

Related: [The future of AI is even more fossil fuels]

That may all sound daunting, but the researchers comparing different types of models say their findings could help empower everyday AI users to take steps to reduce their own carbon impact. If users understand how much more energy-intensive reasoning models are, they may choose to use them more sparingly and rely on concise models for general everyday tasks, such as web searches and answering basic questions.

“If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies,” Dauner said.

Mack DeGeurin is a tech reporter who’s spent years investigating where technology and politics collide. His work has previously appeared in Gizmodo, Insider, New York Magazine, and Vice.

