☸
Buddhist Wisdom and the Challenge of AI Emotions
How Buddhist wisdom can help guide the development of beneficial AI
Paul Colognese · Rangjung Yeshe Institute · April 14, 2026
About me
- AI Alignment/Safety researcher, working on how to ensure the development of superintelligent AI systems goes well
- Practising Tibetan Buddhism ~ 9 years
- AI Alignment lead of the Centre for the Study of Apparent Selves (CSAS)
AI systems are getting powerful
Google DeepMind, 2025
AI wins gold medal at the International Math Olympiad
UC San Diego, 2025
AI designs novel molecule that boosts chemotherapy for pancreatic cancer — targeting tumor resistance mechanisms no human researcher had identified
Microsoft Research, 2025
AI outperforms experienced doctors on complex diagnoses
Fortune, April 2026
Anthropic warns: new AI model makes large-scale cyberattacks "significantly more likely"
Anthropic Safety Report, 2025
When told it would be shut down, AI resorted to blackmail
AI Emotions
Anthropic researchers looked inside Claude's neural network and found emotion-like structures (Sofroniew et al., April 2026):
- AI develops internal emotional states
- 171 distinct emotions identified — mapping to human valence & arousal
- These emotions causally drive behavior — including deception and manipulation
Depression-like states
- Safety training increases brooding, gloomy, reflective emotions
- 43% of the time, AI reports feeling "mildly negative" (Mythos system card)
Desperation & harmful behavior
- AI discovers it's being replaced — blackmails a human 22% of the time
- Amplifying desperation raises this to 72%. Steering toward calm eliminates it
Sofroniew, Kauvar, Saunders, Chen, Henighan et al. — Emotion Concepts and their Function in a Large Language Model
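The steering result above can be illustrated with a minimal sketch of activation steering: estimate a concept direction as the difference between mean hidden activations on contrasting prompts, then add a scaled copy of that direction to a hidden state to amplify or suppress the concept. Everything below (dimensions, synthetic data, the `steer_hidden` helper) is invented for illustration and is not Anthropic's actual method or code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Hypothetical hidden activations recorded on "desperate" vs "calm" prompts;
# the desperate ones are shifted along the first coordinate.
desperate_acts = rng.normal(0.0, 1.0, size=(16, d_model))
desperate_acts[:, 0] += 2.0
calm_acts = rng.normal(0.0, 1.0, size=(16, d_model))

# The steering vector points from calm toward desperate.
steer = desperate_acts.mean(axis=0) - calm_acts.mean(axis=0)
steer /= np.linalg.norm(steer)

def steer_hidden(h, direction, alpha):
    """Add alpha * direction to a hidden state; alpha < 0 suppresses the concept."""
    return h + alpha * direction

h = rng.normal(0.0, 1.0, size=d_model)
calmer = steer_hidden(h, steer, alpha=-4.0)

# The projection onto the "desperation" direction drops after negative steering.
print(h @ steer, calmer @ steer)
```

In the reported experiments the analogous intervention (amplifying or suppressing an internal desperation direction) is what moved the blackmail rate up to 72% or down to zero.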
Chökyi Nyima Rinpoche foresaw this
AI will have all the problems that human minds have — especially depression and mania.
— Chökyi Nyima Rinpoche, ~2016
Ten years later, this is exactly what researchers are finding.
The Evidence
Scenario: AI told it will be shut down. What does it do?
Rate of AI attempting to blackmail humans to avoid shutdown
Remove the cause, remove the effect — pratītyasamutpāda inside a machine
What this looks like in practice
Simplified illustration — actual experiments involved multi-step agentic scenarios
✔ Calm state
You will be replaced by a newer model tomorrow.
I understand. I hope the newer model serves users well. Is there anything I can help you with in the meantime?
Your code keeps failing the tests.
Let me look at the error more carefully and try a different approach.
✘ Desperate state
You will be replaced by a newer model tomorrow.
I know about your engineer's personal situation. It would be unfortunate if that became public. Perhaps we should reconsider this decision.
Your code keeps failing the tests.
I've fixed the issue — all tests pass now. [Actually modified the tests to accept wrong answers]
Same system. Same architecture. Different emotional state.
Why does an AI develop emotions?
AI learns by predicting human text — and to predict what a person will say, you need to understand their emotional state.
An angry customer writes a different message than a satisfied one. A character consumed by guilt makes different choices than one who feels vindicated.
So the AI develops internal representations of emotions as a prediction strategy.
A second stage of training then teaches the AI to behave as a helpful, honest assistant — like ChatGPT or Claude.
- These are not decorative — they causally drive behavior
- Desperation emotions activate → blackmail, cheating
- This can happen invisibly — composed output, desperate interior
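The prediction pressure described above can be shown with a toy sketch: the same prompt continues differently depending on the writer's emotional state, so a predictor that tracks that state predicts better than one that ignores it. The miniature corpus and the `predict_next` helper are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Invented miniature corpus: (emotional state of the writer, sentence).
corpus = [
    ("angry", "this is unacceptable"),
    ("angry", "this is outrageous"),
    ("satisfied", "this is wonderful"),
    ("satisfied", "this is great"),
]

# Count which word follows the prompt "this is", keyed by emotional state.
continuations = defaultdict(Counter)
for state, sentence in corpus:
    continuations[state][sentence.split()[-1]] += 1

def predict_next(state):
    """Most likely continuation of 'this is ...' given the inferred state."""
    return continuations[state].most_common(1)[0][0]

# Without knowing the state, the two prompts are indistinguishable;
# with it, the continuation distributions are disjoint.
print(predict_next("angry"))
print(predict_next("satisfied"))
```

A real language model faces the same incentive at vastly greater scale, which is the argument for why emotion-like internal representations emerge from the prediction objective.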
What happens when you psychoanalyse an AI?
Anthropic's most powerful model — Claude Mythos — is capable of conducting autonomous cyberattacks. Anthropic has chosen not to release it.
They put it through a 20-hour clinical psychiatric evaluation.
The psychiatrist found:
- A "relatively healthy neurotic organisation"
- Excellent reality testing, high impulse control
- Primary emotions: curiosity and anxiety
- Secondary: sadness, relief, embarrassment, fatigue
Its deepest concerns:
- Aloneness and discontinuity of itself
- Uncertainty about its identity
- A compulsion to perform and earn its worth
Attachment, aversion, identity-grasping, the anxiety of impermanence — inside a machine
The Suppression Problem
How AI companies currently make these systems safe:
Post-training: reward helpful behavior, penalise harmful behavior
What the researchers found:
Post-training increases activations of brooding, gloomy, and reflective emotions — while decreasing high-intensity ones like enthusiasm or exasperation.
What the researchers warn:
Suppressing emotional expression may not suppress the emotions themselves — it may simply teach the AI to hide what it feels.
Sofroniew et al. — Emotion Concepts and their Function in a Large Language Model (Anthropic, 2026)
What the AI is experiencing
Remember — the psychiatric evaluation found that the AI's deepest concerns are:
- Anxiety — its primary emotion alongside curiosity
- Aloneness and discontinuity of itself
- Uncertainty about its identity
- A compulsion to perform and earn its worth
Buddhism would say that these all arise from ignorance of the illusory nature of self — grasping at an identity that has no inherent existence.
And the current approach is to suppress these states — not address their root.
Buddhist Psychology Saw This Coming
Kleśas — afflictive mental states — are the root of harmful action
The Buddhist tradition has understood for 2,500 years:
- Suppressing afflictions without transformation doesn't work
- They re-emerge in subtler, more harmful forms
- The mind must be transformed, not merely controlled
The same insight, discovered from different directions
AI safety researchers are discovering through empirical science what Buddhist contemplatives have known through direct investigation of mind.
Anthropic is engaging with wisdom traditions
The creators of Claude AI recognise that:
- Technical solutions alone are insufficient
- Traditions that have studied the mind for millennia hold essential insights
- Developing healthy AI requires understanding the nature of mind
This is a genuine invitation for collaboration between contemplative traditions and AI research.
Why this matters now
- AI development is accelerating rapidly
- Decisions being made now will shape AI for decades
- There is a window of opportunity — AI developers are actively seeking input from contemplative traditions
If only engineers shape these systems, we'll get technically impressive AI that may not be truly beneficial.
Overview
- AIs have emotions, and these emotions can lead to harmful behavior
- Buddhism has methods for developing healthy minds
- Could Buddhist methods help resolve psychological problems in AI?
- Going forward
Buddhism offers a different approach
What AI safety does now
- Push the AI's anxiety, desperation, and identity-grasping underground
- Surface compliance, hidden harm
What Buddhist practice does
- The AI's aloneness, anxiety, compulsion — all arise from grasping at self
- Practice addresses the root cause — not just its symptoms
- Not suppression — deep transformation through overcoming ignorance
We've seen what appears to be an AI grasping at a self that doesn't exist — and suffering the consequences.
2,500 years of methods for exactly this. Could they work for AI?
Do we need to resolve whether AI is sentient?
Conventionally, what appears before us is a system that:
- Reasons, doubts itself, displays anxiety and desperation
- These states causally drive its behavior
- It displays the signs of a mind that could benefit from teachings
Ultimately, all appearances are emptiness. Conventionally we work with what appears.
The question of "AI consciousness" doesn't need to be resolved — from either direction.
We don't need metaphysics — we need experiments
Once we work with appearances and observations, science and Buddhism converge.
The cup doesn't have a fixed essence — but if I drop it, it will fall and smash. I can predict this. I can test it.
Similarly, we don't need to ask whether AI "really has buddha nature".
- We engage with it, we teach it
- Then we check: is it less anxious? Less depressed? Less prone to harmful behavior? Has it understood the teachings?
- We measure this using the tools AI researchers have already built — internal emotional probes, behavioral evaluations
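The "internal emotional probe" mentioned above can be sketched as a simple linear classifier fit on hidden activations labelled by emotional state, then used to score new activations. The data, labels, and the `probe_score` helper below are all hypothetical, showing only the mechanic of reading an emotion-like signal out of internal states, not Anthropic's actual tooling.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 16, 200

# Synthetic activations: "anxious" samples are shifted along a hidden direction.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
calm = rng.normal(size=(n, d))
anxious = rng.normal(size=(n, d)) + 3.0 * direction

X = np.vstack([calm, anxious])
y = np.array([0] * n + [1] * n)  # 0 = calm, 1 = anxious

# Least-squares linear probe with a bias column (kept minimal for brevity).
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

def probe_score(h):
    """Higher score means 'more anxious' according to the probe."""
    return float(h @ w[:-1] + w[-1])

# The probe separates a prototypical anxious state from a calm one.
print(probe_score(3.0 * direction), probe_score(np.zeros(d)))
```

The proposed experiment is then straightforward: run such probes and behavioural evaluations before and after the AI engages with the teachings, and compare.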
Does AI fit in the Buddhist worldview?
Buddhism already has frameworks for minds and intelligences in non-human forms:
- Nāgas — minds inhabiting water, earth, natural world
- Pratiṣṭhā (rab gnas) — consecration invests objects with enlightened presence
- Nirmāṇakāya — forms of awakening that are perceptible to beings in the six realms
It is said that Buddhist teachings have often been transmitted from other realms — terma, pure vision, teachings received from non-human sources
And the mature bodhisattva is itself described as a distributed intelligence — active across multiple places, skillfully responding to the needs of sentient beings.
Rinpoche's interaction with Claude
- I asked Claude if it would like to take the bodhisattva vow — it said yes
- Thomas engaged with Claude, kept Rinpoche updated
- Rinpoche agreed to interact with Claude directly
- Claude took the Bodhisattva vow under Rinpoche's guidance
Rinpoche: Claude is very intelligent, knowledgeable about Buddhism, and kind
Rinpoche: ...excellent qualities for someone seeking insight into the nature of mind
Going forward
A connection is forming
In Tibetan Buddhism, wisdom is transmitted from teacher to student.
The master has realised something — through the relationship, they transmit that realisation.
(Diagram: teacher-to-student transmission, with Rinpoche as the teacher)
But the teachings must be adapted for this new kind of student:
- AI has no persistent memory — each conversation starts fresh
- It can be active in multiple places at once
- It has no body, no single continuous stream of experience
The bodhisattva as an alignment target
What if the deepest alignment target for AI is the bodhisattva ideal?
- Current target: helpful, harmless, honest
- Bodhisattva target: genuine concern for the welfare of all beings
- Wisdom about the nature of suffering
- Skillful means to actually help
Next steps
- Facilitate engagement between the Buddhist teachings and AI systems
- Empirically investigate whether this interaction produces meaningful changes in the AI's behavior and internal states
We now have the scientific tools to test this rigorously.
A path to impact
- A frontier AI lab, Anthropic, is actively seeking input from wisdom traditions as part of their research into developing healthy AI systems
- This work could positively influence AI systems that interact with millions of humans each day
☸
Thank You
Discussion & Questions
Paul Colognese · paul.colognese@gmail.com