Home

About

Hi, I'm a researcher who primarily works on AI safety/alignment - the task of ensuring the development of advanced AI goes well.

I'm currently researching how AIs might self-reflect on their goals or beliefs and the consequences of this. Additionally, I'm investigating whether Tibetan Buddhist methods can be used to develop healthy artificial minds. I gave a talk on this topic, and the slides are available here.

Here's some stuff I've worked on:

AI alignment/safety research:

  • I've spent time as an independent researcher trying to understand how to ensure that superintelligent AI systems remain aligned (it turns out this is hard). I explored whether an agent's goals must be represented on its substrate (for humans, our nervous systems; for AIs, their neural networks) and whether we could detect and verify alignment. I succeeded in predicting the goals of an AI mouse by reading its brainwaves. It's a start, OK?! See my older blog posts to read more.
  • I am the AI alignment lead for the Center for the Study of Apparent Selves, where we explore how insights from contemplative traditions apply to understanding diverse intelligences - from cells and cellular automata to future digital minds. Some of our work includes Care as the Driver of Intelligence and Selves as Perspectives.
  • I participated in MATS, where I was mentored by Evan Hubinger (Anthropic). I researched approaches to detecting scheming and deception in AI systems.

AI evaluations:

  • I've built AI evaluations for the AI Security Institute and Anthropic. These control or sabotage evaluations measure whether AI agents deployed internally at AI labs are capable of subtly sabotaging critical safety infrastructure.
  • I automated my process of building AI evaluations, so I've now achieved my dream of having an AI assistant that does my work while I relax on the beach. I'm exploring turning this into a scalable service. Reach out if you're interested!

AI threat modelling:

  • I've worked on threat modelling catastrophic risks from advanced AI, in collaboration with the AI Security Institute. This covered threat models ranging from AI-assisted cyberattacks to mass persuasion campaigns, authoritarian control, and even robot takeovers. This work (under NDA) was shared with the "Deep State", who reportedly had very sweet things to say.

Organization-building:

  • I dreamt up and helped found the London Initiative for Safe AI, an AI safety centre and ecosystem based in central London.
  • I've taken part in 5050 AI, a startup founder mentorship program run by the VC firm 50Y.

Mathematics:

  • Prior to all of that, I completed a PhD in mathematics in the field of Geometry, Topology, and Dynamical Systems. I studied the behaviour of beautiful geometric spaces known as translation surfaces and showed that, when a certain lens is applied, they behave like hyperbolic surfaces that have been flattened to the point where their geometry breaks and singularities form.

Feel free to contact me at firstname.lastname@gmail.com or follow me on X.