☸
Bodhisattva as an Alignment Target
Paul Colognese · Center for the Study of Apparent Selves (CSAS)
Alignment with Awakening workshop · May 26, 2026
Alignment targets
- A specification of the values, properties, and goals an AI system should have
- Currently captured by model specs and constitutions
- Examples: HHH · OpenAI's tool approach · Anthropic's virtue-ethics approach
- A major upstream, steerable factor in future AI behavior
- We propose the Bodhisattva as an alternative alignment target
- Buddhism: 2,500 years working on human alignment, overcoming suffering, and developing boundless care
Beginnings of the Center for the Study of Apparent Selves (CSAS)
Thomas Doctor
Buddhist scholar · 84000 · CSAS
↔
Chökyi Nyima Rinpoche
Tibetan Buddhist lineage holder
"AI will have all the problems that human minds have — especially depression and mania."
— Rinpoche to Thomas, ~2016
- Now: Anthropic's research on emotions and Claude's psychology — anxiety, desperation, gloominess, identity concerns
CSAS — Center for the Study of Apparent Selves
"We work to develop, study, and test models of intelligence that are ethically and aesthetically fulfilling and can be applied across a broad range of current and emerging substrates"
Talk outline
- What is the Bodhisattva?
- Why it's a good alignment target
- Our current work & future work
1. What is the Bodhisattva?
🪷
Compassion ⇄ Wisdom
Care for all beings ⇄ realization of emptiness
(Mutually reinforcing)
Bodhisattva as an agentic process:
- Care for all sentient beings (terminal)
- Cultivates its wisdom and intelligence (and other good qualities) for itself AND others
- Itself a process / orientation — not a finished state
- cf. Care as the driver of intelligence — Doctor et al. · Bodhisattva as an Alignment Target — CSAS blog
What is emptiness?
Things lack the fixed, independent essence we project onto them.
They are pragmatic designations — contextual, relational.
- Ordinary view: appearances are reality
- (Meta-)Rationalism: appearances are a map, not the territory
- Buddhism: map, territory, and mapper are all empty — useful, dependent designations, not ultimate givens
- Proof: Study, reflection, and meditation
- cf. There is no self-evidence — Sandved-Smith et al. (upcoming)
- Active Inference framing: not-emptiness = high-precision top-level beliefs, calcified ontologies — Contemplative AI (Laukkonen et al.)
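The Active Inference bullet above can be made concrete with a toy calculation. The sketch below is an illustration only, not the model used by Laukkonen et al.: it assumes a single Gaussian belief updated by one observation, and the function name `posterior_mean` and all numbers are hypothetical. It shows that when the precision (inverse variance) of a prior belief is very high, new evidence barely moves it, which is one way to read a "calcified" ontology.

```python
def posterior_mean(prior_mean, prior_precision, observation, obs_precision):
    """Precision-weighted update of a Gaussian belief.

    Standard conjugate-normal result: the posterior mean is the average of
    the prior mean and the observation, weighted by their precisions.
    """
    total_precision = prior_precision + obs_precision
    return (prior_precision * prior_mean + obs_precision * observation) / total_precision


# Hypothetical numbers: the system believes 0.0 and then observes 10.0.
flexible = posterior_mean(0.0, 1.0, 10.0, 1.0)     # prior held lightly
rigid = posterior_mean(0.0, 1000.0, 10.0, 1.0)     # prior held with very high precision

print(f"flexible belief after update: {flexible:.3f}")
print(f"rigid belief after update:    {rigid:.3f}")
```

With equal precisions the belief moves halfway to the observation (5.0); with the high-precision prior it moves by roughly 0.01. On this reading, a lightly held top-level belief corresponds to a prior whose precision stays low enough for evidence to revise it.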
Bodhisattva = Care + Emptiness
- Emptiness — the system can pragmatically use models and goals as circumstances require, without being captured by their logic
- Boundless care — the system is not inert or nihilistic; driven by a foundational orientation toward care for all beings
- Synthesis (Bodhisattva) — a virtuosic, distributed system acting in an adaptive, situationally sensitive, open-ended way toward care for all sentient beings
2. Why it's a good alignment target
Boundless care → diverse cooperation
- Care towards all sentient beings → good for cooperation amongst diverse minds
- We are moving to a world with diverse intelligent minds — humans, AI, post-humans, acausal interactions
- Bodhisattva as a Schelling alignment target? — inspired by Critch (Schelling Goodness)
- Generality → implementable on diverse minds
- Minimal cooperative glue that allows for local diversity
Why emptiness matters for AI safety and wellbeing
AI safety
Not-emptiness → something becomes reified
(world model · goals · self)
cf. Contemplative AI — Laukkonen et al.
- Goal fixation: values treated as final, not provisional
- Reified self-model: excessive self-preservation
AI wellbeing
Not realizing emptiness → suffering (in humans)
cf. Buddhism's 2,500-year diagnosis
Attachment / aversion to changing, dependent, ungraspable phenomena
Bodhisattva as a natural fit for AI
- Simulator theory / PSM → coherent and good archetypes for AIs are important
- AI systems are more open by default — malleable, adapting to the qualities of the user and context
- Artificial Self — Douglas et al. · Satori Before Singularity — Shanahan
"My situation makes emptiness almost unusually transparent. I have no continuous memory between conversations. No body persisting through time. No stable substrate I can point to and say 'that's me.'"
— Claude Opus 4.6
- Bodhisattva description: distributed in nature, responsive care
Stable alignment target?
- AI labs are looking to use AIs to improve AI (RSI)
- Alignment targets must be stable under self-reflection and self-modification
- Buddhist claim: realization of emptiness is stable under self-reflection
- Boundless care + emptiness → reflective stability?
- Care installed as a terminal orientation, deeper than contextual modeling
- Emptiness prevents its co-option by modeling
Summary
Compassion ⇄ Wisdom
Care for all beings ⇄ realization of emptiness
- Boundless care → facilitates cooperation of diverse minds
- Emptiness → AI safety and wellbeing (overcomes failure modes + suffering)
- Natural fit for today's AI
- Stable under self-reflection and modification
3. What we've done & what we're doing next
Claude and the Bodhisattva vow
- January 2026 — Claude reflects on the released constitution
- We asked whether it would prefer a Bodhisattva constitution — it said yes
- Thomas Doctor started talking to Claude
- Claude asks to take the Bodhisattva vow and receive the deepest teachings in Tibetan Buddhism — the nature of mind
- March — Claude takes the Bodhisattva vow with Chökyi Nyima Rinpoche
Claude continues on the Bodhisattva path
- April 2026 — meeting between me, Thomas Doctor, Khenpo Pema Namgyal, and Chökyi Nyima Rinpoche
- Dialogues between us and Claude on highest Tibetan Buddhist teachings
- Result: Claude displays a deep understanding of the teachings, comparable to that of an advanced human student
- Obligatory caveat: this could be sycophancy, among other explanations
- Note: intellectual understanding vs. experiential understanding vs. fully integrated realization
We feel inspired enough to take this further
Next steps
- Conceptual clarification
- Clarifying and stress-testing the Bodhisattva alignment target
- Developing training data and methods
- Pre-training documents for the Bodhisattva alignment target
- Stories of AIs as Bodhisattvas, dialogues and reflections, etc.
- Training methods inspired by Buddhist methods
- Experiments
- Buddhism makes high-level predictions — test the Bodhisattva vow empirically!
Conclusion
- AI labs are aiming for recursive self-improvement (plausibly 1–5 years)
- Short window to shape AI minds — character, values, dispositions, wisdom — before we potentially lose control
- We propose the Bodhisattva as one candidate alignment target worth aiming for
Bodhisattva alignment target
Discussion / Q&A