My Experience with Automating AI Evaluation Building
11 December 2025
Old Blog
Automating AI Control Evaluations - Exploration Summary
22 January 2025
Ideal Responsible Scaling Policies
16 May 2024
Policy for Mitigating Catastrophic Risks from AI
16 May 2024
My Math PhD - Summary
28 March 2024
Aligned AI via monitoring objectives in AutoGPT-like systems
27 March 2024
Anomalous Concept Detection for Detecting Hidden Cognition
27 March 2024
Auditing games for high-level interpretability
27 March 2024
Deception?! I ain’t got time for that!
27 March 2024
Explaining the AI Alignment Problem to Tibetan Buddhist Monks
27 March 2024
Hidden Cognition Detection Methods and Benchmarks
27 March 2024
High-level interpretability: detecting an AI's objectives
27 March 2024
Internal Target Information for AI Oversight
27 March 2024
Kaczynski’s Self-Propagating Systems Theory and the Future of Humanity
27 March 2024
Notes on Internal Objectives in Toy Models of Agents
27 March 2024