AI Safety

Research and writing on AI safety, alignment, and governance

Five Things from 2025 in AI Safety

My year-end roundup of five significant developments and five "vibe shifts" in AI from 2025.

Artificial intelligence, particularly generative AI models such as ChatGPT, is eating the world fast. I find this very concerning! It seems extremely likely that the widespread use of powerful AI will completely transform society, but whether that transformation brings us hell, utopia, or more of life as it is remains uncertain. Like Ethereum co-founder Vitalik Buterin and the good people at the Institute for Progress think tank, I think humanity really needs to work hard to ensure that our transition to an AI future goes well.

But nothing is certain, and I want to do my part in steering towards a better future. For now, I'm still just catching up. I started my own (sort of) weekly Substack newsletter to try to keep up with the crazy, fast-paced world of AI safety. I'm not really looking for subscribers, but feel free to subscribe, I guess.

If you too want to start thinking about how we can help make sure AI development and deployment go well for humanity, I'd recommend starting with Dario Amodei's essay "The Adolescence of Technology" (late January 2026). Amodei is the CEO of Anthropic, one of the leading AI companies, and his essay is fairly accessible to the uninitiated while still highlighting the latest and most relevant research on the risks of powerful AI technology. It's the best single resource I've found so far for getting up to speed on why AI safety matters, even if you've never considered these problems before.

Here are some other suggestions for getting started:

For a vivid illustration of just how "out of control" AI models can be when companies are reckless, see this YouTube video on Grok's Hitler meltdown, a stark reminder that the creators of AI models cannot really control them.
The best book on the subject, I think, is still Brian Christian's "The Alignment Problem", even though it was written in 2020, well before things got 'crazy.' It covers the history and context that gave rise to generative AI, along with the problems its engineers were thinking about before those problems entered public consciousness.
Nick Bostrom's "Superintelligence" focuses specifically on the dangers of creating an artificial mind that is more intelligent than humans across every relevant domain, so it doesn't address more "mundane" risks; it was also written long before much of the evidence that could inform its claims was available.
Somewhat similar in focus to Nick Bostrom's book is the more recent "If Anyone Builds It, Everyone Dies" by Eliezer Yudkowsky and Nate Soares; its style is less technical and weirder, and it takes into account the new AI models of 2023-2025.
Several think tanks and nonprofit organizations working on AI safety publish blog posts and essays with varying degrees of accessibility and academic rigor. My favorite is probably the "Launch Sequence" by the Institute for Progress because of its balance of good scholarship, accessibility, and evenhandedness, though I'm a little more "hair on fire panicking" than they are.
More creative and artistic blog posts can be found at Tomorrow's AI and Control Inversion, both of which are affiliated with the Future of Life Institute.
BlueDot Impact offers excellent free courses on AI safety fundamentals and governance.
Some of the original thinking on the problem of AI getting out of control and killing everyone (which, according to this sketchy YouGov poll, is a concern for 53% of Americans!) was done at LessWrong. There's a lot of good stuff there, but the writing style (and intellectual culture) is definitely not for everyone.
And then... there is a whole world of AI safety out there to explore!