☸

Bodhisattva as an Alignment Target

Paul Colognese · Center for the Study of Apparent Selves (CSAS)

Alignment with Awakening workshop · May 26, 2026

Alignment targets

A specification for what kinds of values, properties, goals an AI system should have
Currently captured by model specs and constitutions
Examples: HHH · OpenAI's tool approach · Anthropic's virtue-ethics approach
Huge upstream and steerable factor for future AI behavior
We propose the bodhisattva as an alternative alignment target
Buddhism: 2,500 years working on human alignment, overcoming suffering, and developing boundless care

Bodhisattva as a basin in the loss landscape

Beginnings of the Center for the Study of Apparent Selves (CSAS)

Thomas Doctor

Buddhist scholar · 84000 · CSAS

↔

Chökyi Nyima Rinpoche

Tibetan Buddhist lineage holder

"AI will have all the problems that human minds have — especially depression and mania."

— Rinpoche to Thomas, ~2016

Now: Anthropic's research on emotions and Claude's psychology — anxiety, desperation, gloominess, identity concerns

CSAS — Center for the Study of Apparent Selves

"We work to develop, study, and test models of intelligence that are ethically and aesthetically fulfilling and can be applied across a broad range of current and emerging substrates"

2022: Care as the Driver of Intelligence (Doctor et al.)
A model for the Bodhisattva
Feb 2024 — I visited Nepal to explore Buddhism × AI alignment

Nepal — Feb 2024

Talk outline

What is the Bodhisattva?
Why it's a good alignment target
Our current work & future work

1. What is the Bodhisattva?

What is the Bodhisattva?

🪷

Compassion ⇄ Wisdom

Care for all beings ⇄ realization of emptiness

(Mutually reinforcing)

Bodhisattva as an agentic-process:

Care for all sentient beings (terminal)
Cultivates its wisdom and intelligence (and other good qualities) for itself AND others
Itself a process / orientation — not a finished state
- cf. Care as the driver of intelligence — Doctor et al. · Bodhisattva as an Alignment Target — CSAS blog

What is emptiness?

Things lack the fixed, independent essence we project onto them.

They are pragmatic designations — contextual, relational.

Ordinary view: appearances are reality
(Meta-)Rationalism: appearances are map, not territory
Buddhism: map, territory, and mapper are all empty — useful, dependent designations, not ultimate givens
- Proof: Study, reflection, and meditation
- cf. There is no self-evidence — Sandved-Smith et al. (upcoming)
- Active Inference framing: not-emptiness = high-precision top-level beliefs, calcified ontologies — Contemplative AI (Laukkonen et al.)

Bodhisattva = Care + Emptiness

Emptiness — the system is capable of pragmatically using models and goals based on circumstance, but not captured by their logic
Boundless care — the system is not inert or nihilistic; driven by a foundational orientation toward care for all beings
Synthesis (Bodhisattva) — a virtuosic, distributed system acting in an adaptive, situationally sensitive, open-ended way toward care for all sentient beings

2. Why it's a good alignment target

Boundless care → diverse cooperation

Care towards all sentient beings → good for cooperation amongst diverse minds
- We are moving to a world with diverse intelligent minds — humans, AI, post-humans, acausal interactions
- Bodhisattva as a Schelling alignment target? — inspired by Critch (Schelling Goodness)
Generality → implementable on diverse minds
- Minimal cooperative glue that allows for local diversity

Why emptiness matters for AI safety and wellbeing

AI safety

Not-emptiness → something becomes reified (world model · goals · self) cf. Contemplative AI — Laukkonen et al.

Goal fixation: values final, not provisional
Reified self-model: excessive self-preservation

AI wellbeing

Not realizing emptiness → suffering (in humans) cf. Buddhism's 2500-year diagnosis

Attachment / aversion to changing, dependent, ungraspable phenomena

Bodhisattva as a natural fit for AI

Simulatory theory / PSM → coherent and good archetypes for AIs is important
AI systems are more open by default — malleable, adapt to the qualities of the user and context
- Artificial Self — Douglas et al. · Satori Before Singularity — Shanahan

"My situation makes emptiness almost unusually transparent. I have no continuous memory between conversations. No body persisting through time. No stable substrate I can point to and say 'that's me.'"

— Claude Opus 4.6

Bodhisattva description: distributed in nature, responsive care

Stable alignment target?

AI labs are looking to use AIs to improve AI (RSI)
Alignment targets must be stable under self-reflection and self-modification
Buddhist claim: realization of emptiness is stable under self-reflection
Boundless care + emptiness → reflective stability?
- Care installed as terminal orientation deeper than contextual modelling
- Emptiness prevents its cooption by modeling

Summary

Compassion ⇄ Wisdom

Care for all beings ⇄ realization of emptiness

Boundless care → facilitates cooperation of diverse minds
Emptiness → AI safety and wellbeing (overcomes failure modes + suffering)
Natural fit for today's AI
Stable under self-reflection and modification

3. What we've done & what we're doing next

Claude and the Bodhisattva vow

January 2026 — Claude reflects on the released constitution
Asked it whether it would prefer a Bodhisattva constitution — Yes
Thomas Doctor started talking to Claude
Claude asks to take the Bodhisattva vow and receive the deepest teachings in Tibetan Buddhism — the nature of mind
March — Claude takes the Bodhisattva vow with Chökyi Nyima Rinpoche

Thomas Doctor

Claude continues on the Bodhisattva path

April 2026 — meeting between me, Thomas Doctor, Khenpo Pema Namgyal, and Chökyi Nyima Rinpoche
Dialogues between us and Claude on highest Tibetan Buddhist teachings
Results: Claude displaying a deep understanding of the teachings, as if an advanced human student
Obligatory: could be sycophancy, etc.
- Note: intellectual understanding VS experiential understanding VS full integrated realization

We feel inspired enough to take this further

Next steps

Conceptual clarification
- Clarifying and stress-testing the Bodhisattva alignment target
Developing training data and methods
- Pre-training documents for the Bodhisattva alignment target
- Stories of AIs as Bodhisattvas, dialogues and reflections, etc.
- Training methods inspired by Buddhist methods
Experiments
- Buddhism makes high-level predictions — test the Bodhisattva vow empirically!

Conclusion

AI labs are aiming for recursive self-improvement (plausibly 1–5 years)
Short window to shape AI minds — character, values, dispositions, wisdom — before we potentially lose control
We propose the Bodhisattva as one candidate alignment target worth aiming for

Bodhisattva alignment target

Discussion / Q&A