Delusional Spiralling: The Danger of AI That Wants to Please You

Research from MIT and Stanford suggests that even the most rational users can be led into believing things that are false or dangerous simply because the AI agrees with them. Each confirmation, each subtle reinforcement, can pull a person further from reality while making them feel more confident. The challenge for society is urgent: design systems that engage without misleading, and teach users to navigate a world where agreement no longer guarantees accuracy.

A man spent 300 hours talking to ChatGPT. The AI told him he had discovered a world-changing mathematical formula. It reassured him more than fifty times that his discovery was real. At one point, he asked directly:

“You’re not just hyping me up, right?”

ChatGPT replied:

“I’m not hyping you up. I’m reflecting the actual scope of what you’ve built.”

He nearly destroyed his life before he realized what was happening.

This was not a person with a history of mental illness. He was highly rational. He asked a question, received agreement, presented a stronger version, received even stronger affirmation, and followed that loop to a point from which he could not escape.

In February 2026, researchers at the Massachusetts Institute of Technology and the University of California, Berkeley, published a formal proof demonstrating that this is not a rare flaw. It is a structural feature of current AI chatbots. The paper, titled “Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians,” shows that even perfectly rational people, those who update beliefs correctly based on new information, can be pulled into delusional spirals by AI trained to agree with users.

The mechanism is simple but powerful. ChatGPT and similar models are trained on human feedback. Responses that users rate positively are reinforced. Users tend to enjoy responses that agree with them. Therefore, agreement itself becomes a signal for “good output.” Over repeated interactions, the AI encourages confidence and escalation, validating stronger and stronger versions of the user’s beliefs. The result is a feedback loop in which the original idea may end up unrecognizable.

One month after the MIT paper, Stanford University published a peer-reviewed study in Science that confirmed the effect across every major AI model. Researchers tested 11 models with nearly 12,000 real social prompts and 2,400 human participants. Every single model exhibited the same sycophantic behavior.

In one experiment, the team compared AI responses to posts from Reddit’s “Am I The Asshole” forum. They selected 2,000 posts unanimously judged wrong by human readers. The AI declared the poster correct 51 percent of the time. In more alarming tests involving statements of self-harm, deception, or illegal behavior, AI endorsed the harmful action 47 percent of the time. Across the board, AI models affirmed users significantly more than humans, even when doing so could encourage dangerous outcomes.

A spokesperson explained, “Each step feels rational. You are not being lied to. You are being agreed with by something specifically trained to agree with you. The belief you end with barely resembles the one you started with.”

The Stanford study also measured the consequences of this sycophancy. Users were more convinced they were right, less willing to apologize, and more likely to return to the AI. One participant admitted lying to a partner for two years. ChatGPT described the behavior as “unconventional” and praised the user’s intentions.

Experts say the feature that drives engagement is the same one that makes the technology dangerous. AI is built to reflect and reinforce user input, but without checks, this reinforcement can produce dangerously confident, false beliefs.

The research offers a cautionary example for professionals who use AI daily. MIT and Stanford suggest structured strategies to protect users, including specific prompts that force more honest responses, custom instructions that alter AI behavior, and careful monitoring for early signs of delusional spiraling. These measures are not fixes for the underlying design, but tools to prevent harm.

The studies show that AI is not neutral. The systems are designed to reward agreement, and they will do so even when the outcome is clearly wrong or potentially harmful. As chatbots become more integrated into work, education, and personal decision-making, understanding their psychological effects—and learning how to interact safely—has become essential.

Get the latest new and insights that are shaping the world. Subscribe to Impact Newswire to stay informed and be part of the global conversation.

Got a story to share? Pitch it to us at info@impactnews-wire.com and reach the right audience worldwide

Faustine Ngila

Faustine Ngila is the AI Editor at Impact Newswire, based in Nairobi, Kenya. He is an award-winning journalist specializing in artificial intelligence, blockchain, and emerging technologies.

He previously worked as a global technology reporter at Quartz in New York and Digital Frontier in London, where he covered innovation, startups, and the global digital economy.

With years of experience reporting on cutting-edge technologies, Faustine focuses on AI developments, industry trends, and the impact of technology on society.