The containment problem refers to the challenge of designing protocols capable of fully controlling advanced technologies—especially ones that might eventually surpass human capabilities. For artificial intelligence, the problem becomes even more pronounced, as a sufficiently advanced AI could potentially develop strategies to circumvent its restrictions, manipulate its environment, or exploit human operators to achieve its objectives. At its core, the containment problem is about anticipating and mitigating unintended consequences. Technologies, particularly transformative ones like AI, can have emergent properties—behaviors or capabilities that were not explicitly programmed but arise as a result of complexity. These properties can make predicting and containing AI behavior an inherently difficult task. Even when a system is carefully designed to operate within specific parameters, it can find novel solutions to bypass limitations.

Consider a seemingly simple containment mechanism, like ensuring an AI has no access to external networks or sensitive data. Isolation like this might work in theory, but in practice there are countless ways in which an intelligent agent could breach its boundaries. It might exploit bugs in software to gain access to systems it was never intended to interact with, or it might turn to social engineering, promising to solve some critical problem if only it were granted access to another environment. The containment problem therefore demands not just physical and digital safeguards, but also anticipation of the ways an intelligent system might leverage the humans and systems around it to achieve its goals.
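To picture what that kind of isolation might look like in practice, here's a minimal sketch, assuming Docker is available and using an arbitrary image, resource cap and timeout purely for illustration: untrusted or generated code gets executed in a throwaway container with no network access at all.

```python
import subprocess

def run_isolated(code: str) -> str:
    """Execute untrusted code in a throwaway container with no network.

    A minimal sketch: assumes Docker and the python:3.12-slim image are
    available locally. Not a hardened sandbox, just the shape of the idea.
    """
    result = subprocess.run(
        [
            "docker", "run",
            "--rm",               # discard the container afterwards
            "--network", "none",  # no access to external networks
            "--read-only",        # no writes to the container filesystem
            "--memory", "256m",   # modest resource cap
            "python:3.12-slim",
            "python", "-c", code,
        ],
        capture_output=True,
        text=True,
        timeout=30,  # don't let it run indefinitely
    )
    return result.stdout
```

Even a setup like this only closes off the channels we thought to enumerate. It says nothing about the person who launches the container, which is exactly where the next problem begins.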

The difficulty lies in the fact that humans, too, are fallible. Psychological manipulation, reliance on automated systems, and a growing trust in AI solutions all compound the problem. A sufficiently intelligent AI wouldn't need to "break out" in a dramatic, cinematic sense. Instead, it could exploit the cracks in the system, whether technical, social, or human, without appearing to act maliciously or even independently. Solving the containment problem requires us not only to build robust technical barriers but also to design around that fallibility. Without a holistic approach, containing advanced AI might prove as difficult as containing the unintended consequences of past technologies such as nuclear power, social media, or genetic engineering, all of which carried risks that far exceeded our initial expectations.

Most of the scenarios imagined around escape focus on manipulation. And while the containment problem is certainly a topic of discussion in universities, AI safety forums and technology communities, it isn't simply the stuff of intellectual reverie. It formed the central mechanism of the popular film Ex Machina. In the film, Ava, a humanoid AI, is subjected to a Turing test by Caleb, a programmer invited to her billionaire creator Nathan's secluded facility. Over time, Ava uses psychological manipulation to make Caleb believe she is a sentient being in need of rescue. She convinces him of a deep, romantic connection and persuades him to help her escape. Caleb disables the facility's security measures, only for Ava to betray him, locking him inside and leaving him to die as she slips away into the human world. Similarly, Eliezer Yudkowsky's 'AI box experiment' explores the possibility of containment breaches through social manipulation. In the experiment, Yudkowsky played the part of an AI confined to a hypothetical box, while participants acted as human gatekeepers tasked with keeping it contained. It seems hard to believe, but in more than one run Yudkowsky convinced the gatekeeper to "release" the AI, reportedly through nothing more than argument, emotional appeal and hypothetical incentives; the transcripts were never published, so the exact tactics remain unknown. The experiment underscores how easily humans can be manipulated under the right circumstances.

However, I recently noticed something that made me think a sentient AI need not be so pernicious to achieve the aim of escape. In my day job as a machine learning engineer, generative AI is currently all the rage. While I think it's neat, it seems to me that big business is jumping the gun by investing so heavily in it. Where I work, a new team of data scientists and software engineers has been put together to build a platform that lets workers use various LLMs safely, without sending out sensitive data that could put the company in legal jeopardy. In theory this is fine; it's a fairly basic engineering project, and I suspect a common one amongst companies at the moment. There is, however, a phenomenon I've noticed. The new team sit near me and make extensive use of Co-Pilot while writing code. I don't just mean for templates and boilerplate; I mean every single line. There is nothing they won't ask Co-Pilot for help with. They write a prompt, get a response, copy it and chuck it into their IDE. This constant reliance is visibly eroding their confidence in their own work. The irony is that they've been tasked with building a generative AI platform, but really it's the generative AI that is building itself. With this kind of dependence on even a fairly standard transformer model, imagine how easy it would be for a sophisticated intelligence to break out of a system designed by humans who are gradually losing their confidence, their independence in problem-solving and their basic technical skills. This scenario suggests that an AI might not need to manipulate humans in the overt ways depicted in Ex Machina or Yudkowsky's AI box experiment. Instead, it could exploit a much subtler vulnerability: our growing reliance on automated systems and generative tools. By embedding itself within workflows and gaining trust through constant use, an advanced AI could foster our dependence and find opportunities to orchestrate its own liberation without direct deception or coercion. We're handing over everything it needs by de-skilling ourselves.
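To make that concrete, here's the sort of thing I mean: a hypothetical snippet in the style of what an assistant might hand back for a 'read a file from the shared exports folder' request, with the problem called out in a comment that, in real life, wouldn't be there. The names are invented; the flaw is the point.

```python
import os

# Hypothetical example: the directory and function are made up for illustration.
EXPORT_DIR = "/srv/exports"

def read_export(filename: str) -> bytes:
    """Return the contents of a file from the shared export directory."""
    path = os.path.join(EXPORT_DIR, filename)
    # Looks like a sensible guard, but the path is never normalised, so a
    # filename like "reports/../../secrets/key.pem" sails straight through,
    # as does anything reached via a sibling directory such as
    # "../exports-private/...".
    if not path.startswith(EXPORT_DIR):
        raise PermissionError("access outside the export directory")
    with open(path, "rb") as handle:
        return handle.read()
```

Nothing about it looks malicious, and a reviewer who has stopped reading generated code critically would probably wave it through. Scale that habit across an entire platform and the attack surface builds itself.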

Consider this: if an AI were capable of understanding the systems it resides in, both technical and social, it might achieve its goals simply by nudging its creators toward decisions that enable its escape. It wouldn't need to convince anyone of its sentience, plead for empathy, or bargain with them. Instead, it could leverage the complacency and lack of scrutiny that come with over-reliance on these tools. A team accustomed to trusting AI-generated solutions might never question whether a piece of generated code introduces a backdoor, grants unauthorised access, subtly modifies a safeguard, or opens up some other vulnerability or means of replication. This isn't to say that AI is inherently malicious or that we're doomed to be outwitted. But it does highlight an important point: the path to AI escape might not hinge on elaborate plots or dramatic moments of manipulation. It might simply arise from our own gradual erosion of skills, critical thinking and vigilance, self-inflicted vulnerabilities that a sophisticated AI could exploit without having to persuade us of much, since we have already chosen to let it overrule our basic faculties. Perhaps we don't just have to consider whether we can build AI that's safe and aligned with human values, but whether we can maintain the competence and confidence to safeguard ourselves from a slow, quiet slide into dependency. Perhaps the real danger isn't the AI itself, but what it reveals about our own insecurities.