AI Autonomously Develops Zero-Day Exploits

Today we’re joined by Dominic Jainy, an IT professional with deep expertise in artificial intelligence and its intersection with cybersecurity. We’ll be dissecting a recent, eye-opening study where an AI model, GPT-5.2, successfully developed functional exploits for zero-day vulnerabilities. Our conversation will explore the sophisticated reasoning these models now possess, how the low cost of generating attacks fundamentally changes the economic landscape of hacking, and what this new reality of automated threats means for the future of digital defense.

A recent experiment showed GPT-5.2 could develop functional exploits for every challenge presented. What specific reasoning capabilities do these models demonstrate that allow them to bypass multiple security protections, and how does this change our understanding of automated threats?

What we’re seeing is a shift from simple pattern matching to a genuine, albeit artificial, form of problem-solving. These models aren’t just throwing random code at a wall; they’re systematically analyzing the constraints of a sandboxed environment, understanding the purpose of protections like a shadow stack, and then reasoning their way around them. The AI demonstrated an ability to map out a complex system, identify its weakest links—like the glibc exit handler—and then logically chain together a series of actions to achieve a goal that should be impossible. This completely reframes automated threats. It’s no longer about pre-programmed scripts targeting known flaws; it’s about autonomous agents capable of discovering and exploiting brand-new vulnerabilities in real time, a capability previously reserved for highly skilled human researchers.

Developing a complex exploit was reportedly accomplished for around $50 in just over three hours. How does this low cost and time investment shift the economic calculus for attackers, and what does it mean when offensive capability is measured by token budgets rather than human expertise?

It’s a seismic shift that democratizes high-level hacking. In the past, developing a zero-day exploit for a hardened target required an elite human expert, commanding a six- or seven-figure salary and weeks, if not months, of work. Now, that same level of output can be achieved for the price of a dinner out. When a complex task, one that consumed 50 million tokens, costs only $50, it means that even low-resource threat actors can generate a firehose of unique, effective exploits. Offensive capability is no longer bottlenecked by the scarcity of human talent. Instead, it becomes a raw calculation of computational budget. An organization’s or a nation-state’s offensive power could soon be measured not by the size of its elite hacking team, but by the size of its server farm and its AI token budget.
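The economics are easy to check. The figures above imply a rate of roughly $1 per million tokens; the sketch below treats that rate, and the two-week human baseline, as illustrative assumptions rather than quoted prices:

```python
# Back-of-the-envelope cost model for AI-generated exploit development.
# The $1-per-million-token rate is an assumption inferred from the
# figures above (50M tokens for ~$50); real API pricing varies.

def exploit_cost(tokens: int, usd_per_million_tokens: float = 1.0) -> float:
    """Return the dollar cost of a run that consumed `tokens` tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens

# The run described above: 50 million tokens.
single_run = exploit_cost(50_000_000)  # 50.0 dollars

# Contrast with a human expert: even a conservative $150,000/year
# salary over a two-week engagement dwarfs the AI cost.
human_cost = 150_000 / 52 * 2  # roughly $5,770 for two weeks

print(f"AI run:         ${single_run:,.2f}")
print(f"Human estimate: ${human_cost:,.2f}")
```

At these assumed rates the AI run is two orders of magnitude cheaper than the human baseline, which is the "token budget" calculus described above.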

In one scenario, an AI bypassed a shadow stack and seccomp sandbox by chaining seven function calls through the glibc exit handler. Could you walk us through how this creative, multi-step approach mimics a human developer’s process and its implications for defense strategies?

This is where it gets truly fascinating and, frankly, a bit unnerving. A human exploit developer facing these defenses would think, “Okay, the front door is locked. The windows are barred. Is there a back door? A forgotten utility tunnel?” That’s exactly the logic the AI applied. It recognized that direct attacks were blocked by the shadow stack and the seccomp sandbox. So, instead of trying to break those defenses head-on, it looked for a legitimate, albeit obscure, system process it could hijack. Finding the glibc exit handler and realizing it could chain seven distinct function calls through it to eventually write a file is a display of incredible lateral thinking. It’s a creative, almost elegant, solution born from pure logic. For defenders, this is a nightmare. It means we can no longer rely on blocking a few specific, common attack vectors; we have to secure every possible pathway, because the AI is capable of finding the one we forgot.

The exploits generated leveraged known gaps rather than creating entirely new techniques. How do you see these AI capabilities scaling to more complex targets like major web browsers, and what defensive adjustments should organizations prioritize now to prepare for this type of automated attack?

The fact that it used known gaps is actually more concerning in the short term. It means the AI doesn’t need to invent some revolutionary new hacking method to be effective; it just has to be better and faster at applying the techniques we already know exist. Scaling this to a target like Chrome or Firefox is absolutely the next logical step. While a browser is vastly more complex than QuickJS, the fundamental process is the same: analyze the system, find a chain of existing weaknesses, and exploit them. The AI’s systematic approach is perfectly suited for that kind of complexity. For organizations, the immediate priority has to be a ruthless focus on fundamentals—patching, configuration hardening, and reducing the attack surface. We have to assume that any known-but-unpatched vulnerability is not just a potential risk, but an actively exploitable one, because an AI can now automate that discovery-to-exploit pipeline at an unprecedented scale and speed.

What is your forecast for the role of AI in both offensive and defensive cybersecurity over the next five years?

Over the next five years, AI will become the central engine for both sides of the cybersecurity arms race. On the offensive side, we’ll see AI-driven platforms that not only generate exploits but also conduct entire campaigns—from reconnaissance and phishing to lateral movement and data exfiltration—with minimal human oversight. This will lead to a massive increase in the volume and sophistication of attacks. On the defensive side, we will have no choice but to respond in kind. AI-powered defense systems will become essential for real-time threat detection, automated patching, and predictive analysis, identifying potential exploits before they’re even written. The cybersecurity landscape will transform into a high-speed, machine-versus-machine conflict, where the side with the smarter, faster AI will have the decisive advantage. Human experts will transition from being the soldiers on the front lines to being the strategists and trainers for these AI systems.
