With extensive expertise in artificial intelligence, machine learning, and blockchain, Dominic Jainy has a unique perspective on the forces reshaping modern software delivery. As AI-driven development accelerates release cycles to unprecedented speeds, he argues that the industry is at a critical inflection point. The conversation has shifted from a singular focus on velocity to a more nuanced understanding of system resilience. We explore the costly feedback loop created by shipping code too fast, the new metrics that define success beyond deployment frequency, and why runtime control is evolving from a simple safety net into a strategic tool for human oversight. This interview delves into how teams can build for stability, bake governance into their pipelines, and ultimately prove that their systems can withstand the constant pressure of change.
You mention a feedback loop where teams knowingly ship risky code due to deadlines, then spend excessive time fixing the resulting incidents. Can you share a practical example of this and detail the first steps a team can take to successfully break this costly cycle?
Absolutely, it’s a scenario I’ve seen play out far too often. Imagine a development team rushing to meet an end-of-quarter feature deadline. They know a particular component has some instability under heavy load, but the pressure from management is immense, so they push it to production anyway. The release goes out, and for a few hours, everything seems fine. Then, the support tickets start flooding in, and the system begins to lag. The very same engineers who were celebrating the launch are now pulled into a weekend-long firefight, patching the flawed code they knew was a risk. This creates a deeply demoralizing and expensive cycle. The first step to breaking it is to make that cost tangible. Don’t just talk about it; measure it. Start tracking the engineering hours spent on unplanned work and incident response versus planned feature development. When you can show a leader that 30-40% of their team’s capacity is being consumed by cleaning up messes from rushed releases, the conversation fundamentally changes. It shifts from “we must hit the deadline” to “we must ship with confidence.”
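To make that measurement concrete, here is a minimal sketch of the calculation, assuming work items can be exported from an issue tracker with a simple category label and an hours figure. The field names and categories are illustrative assumptions, not any particular tool's format:

```python
# A minimal sketch of the "make the cost tangible" step: given work items
# exported from an issue tracker, compute how much engineering capacity went
# to unplanned incident/hotfix work versus planned feature work.
# The WorkItem fields and the "planned"/"unplanned" labels are assumptions.
from dataclasses import dataclass

@dataclass
class WorkItem:
    title: str
    category: str   # e.g. "planned" or "unplanned" (incident, hotfix, rollback)
    hours: float

def unplanned_work_share(items: list[WorkItem]) -> float:
    """Return the fraction of total engineering hours spent on unplanned work."""
    total = sum(i.hours for i in items)
    if total == 0:
        return 0.0
    unplanned = sum(i.hours for i in items if i.category == "unplanned")
    return unplanned / total

if __name__ == "__main__":
    sprint = [
        WorkItem("Checkout redesign", "planned", 60),
        WorkItem("Weekend incident: checkout latency", "unplanned", 25),
        WorkItem("Hotfix for flawed release", "unplanned", 15),
    ]
    share = unplanned_work_share(sprint)
    print(f"Unplanned work consumed {share:.0%} of team capacity this sprint")
```

Even a rough number like this, tracked sprint over sprint, turns the hidden cost of rushed releases into something a leader can see and compare against planned delivery.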
The article states that resilience, not deployment frequency, will become the new North Star. Beyond incident detection time, what specific, quantifiable metrics should leaders use to measure how well systems “absorb constant change,” and how would you advise they begin implementing them?
It’s a crucial shift in mindset. For years, we celebrated teams that could deploy ten times a day, but that metric is meaningless if half of those deployments introduce instability. To truly measure how well a system absorbs change, you need to focus on what happens after the code is live. A key metric is the ‘change failure rate’: the percentage of deployments that lead to a degraded service or require a hotfix. Another is ‘mean time to restore’ (MTTR), which isn’t just about finding the issue but about how quickly you can restore a stable service for the customer. A great starting point for implementing this is to automate post-deployment health checks. Don’t just push the code and assume it works. Build automated scripts that immediately verify critical user journeys are functioning as expected. This simple step moves the focus from the act of shipping to the impact of what was shipped, which is the heart of resilience.
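As an illustration of such a check, here is a minimal post-deployment health-check sketch. The endpoints, journey names, and latency thresholds are placeholder assumptions for the example:

```python
# A minimal sketch of an automated post-deployment health check: immediately
# after a deploy, verify that a few critical user journeys respond correctly
# and quickly enough, and fail the pipeline step if they don't.
import sys
import time
import urllib.request

CRITICAL_JOURNEYS = [
    ("login page", "https://example.com/login", 1.5),             # (name, url, max seconds)
    ("product search", "https://example.com/search?q=test", 2.0),
    ("checkout health", "https://example.com/health/checkout", 1.0),
]

def check(name: str, url: str, max_seconds: float) -> bool:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=max_seconds + 5) as resp:
            ok = resp.status == 200
    except Exception as exc:
        print(f"FAIL {name}: {exc}")
        return False
    elapsed = time.monotonic() - start
    if not ok or elapsed > max_seconds:
        print(f"FAIL {name}: ok={ok}, took {elapsed:.2f}s (limit {max_seconds}s)")
        return False
    print(f"PASS {name}: {elapsed:.2f}s")
    return True

if __name__ == "__main__":
    results = [check(*journey) for journey in CRITICAL_JOURNEYS]
    # A non-zero exit code tells the pipeline to treat the deployment as failed,
    # which feeds directly into the change failure rate metric.
    sys.exit(0 if all(results) else 1)
```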
You describe runtime control shifting from a “safety net” to a “strategic tool” for human oversight. Can you elaborate on this? What does a mature runtime strategy look like in practice, and how do tools like feature flags move from being nice-to-haves to non-negotiables?
Historically, runtime control was the emergency brake you pulled when things were already on fire—think of a full-system rollback. Today, it’s becoming the steering wheel, accelerator, and fine-tuned suspension system. A mature runtime strategy means you decouple the act of deploying code from the act of releasing features to users. Your code can be live in production but completely invisible, or ‘dark.’ Then, you can strategically expose it. For example, you can turn a new feature on for internal employees first, then for 1% of your customers in a specific region, and monitor performance every step of the way. This is where feature flags become non-negotiable. They provide the surgical precision needed to manage this process. With AI introducing changes at a dizzying pace, this granular, human-in-the-loop control at runtime is the only way to maintain stability and confidence.
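A minimal sketch of that progressive-exposure logic might look like the following. The rule shape, region name, and hashing scheme are illustrative assumptions rather than any feature-flag vendor's actual API:

```python
# A minimal sketch of progressive exposure behind a feature flag: code is
# deployed "dark", then a flag decides who sees it -- internal employees first,
# then a small percentage of customers in one target region.
import hashlib
from dataclasses import dataclass

@dataclass
class User:
    user_id: str
    is_employee: bool
    region: str

@dataclass
class FlagRule:
    enabled: bool = True
    allow_employees: bool = True
    region: str | None = "eu-west"   # assumed target region for the rollout
    rollout_percent: float = 1.0     # expose to 1% of matching customers

def _bucket(user_id: str, flag_name: str) -> float:
    """Deterministically map a user to a 0-100 bucket so rollouts are sticky."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF * 100

def is_feature_enabled(flag_name: str, rule: FlagRule, user: User) -> bool:
    if not rule.enabled:
        return False                      # the flag can be switched off globally
    if rule.allow_employees and user.is_employee:
        return True                       # internal users see the feature first
    if rule.region and user.region != rule.region:
        return False
    return _bucket(user.user_id, flag_name) < rule.rollout_percent

if __name__ == "__main__":
    rule = FlagRule()
    print(is_feature_enabled("new-checkout", rule, User("emp-7", True, "us-east")))
    print(is_feature_enabled("new-checkout", rule, User("cust-42", False, "eu-west")))
```

Because the bucket is derived from a hash of the user ID and flag name, the same user stays in or out of the rollout across requests, which keeps a gradual exposure consistent while you watch the metrics.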
Given that AI-driven workflows increase risk, you argue that governance can’t be an afterthought. What does it actually look like to “bake” compliance and auditability into the delivery lifecycle? Could you walk us through a few key steps for making this an automated, core function?
Baking in governance means transforming it from a manual, end-of-stage checklist into an automated, always-on part of the pipeline. One of the most effective first steps is implementing ‘policy as code.’ Instead of a document stating that all changes need two approvals, you write a script that the delivery pipeline automatically enforces. The pipeline will literally halt a deployment if it doesn’t meet the codified approval criteria. Another critical step is to ensure every action creates an immutable audit log. Who changed a feature flag? When? Who approved it? This information should be logged automatically, creating a trail that makes audits straightforward rather than a frantic scavenger hunt. This isn’t about bureaucracy; it’s about building trust and ensuring that even as velocity increases, risky or unauthorized changes are prevented before they can ever reach a customer.
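To illustrate what a codified approval rule can look like as an automated pipeline step, here is a minimal ‘policy as code’ sketch. The change metadata shape, file path, and two-approval rule are assumptions for the example, not a specific CI system's format:

```python
# A minimal sketch of policy as code: a pipeline step that refuses to let a
# deployment proceed unless the codified rule (two approvals from people other
# than the author) is satisfied, and that appends an audit record either way.
import json
import sys
from datetime import datetime, timezone

MIN_APPROVALS = 2

def policy_allows(change: dict) -> bool:
    approvers = {a for a in change.get("approvers", []) if a != change.get("author")}
    return len(approvers) >= MIN_APPROVALS

def append_audit_record(change: dict, allowed: bool, path: str = "audit.log") -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "change_id": change.get("id"),
        "author": change.get("author"),
        "approvers": change.get("approvers", []),
        "decision": "allowed" if allowed else "blocked",
    }
    # Append-only log; in practice this would go to tamper-evident storage.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    # In a real pipeline this metadata would come from the CI system or VCS.
    change = {"id": "release-481", "author": "alice", "approvers": ["alice", "bob"]}
    allowed = policy_allows(change)
    append_audit_record(change, allowed)
    if not allowed:
        print("Deployment blocked: policy requires two approvals besides the author")
        sys.exit(1)   # a non-zero exit halts the pipeline
    print("Policy check passed")
```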
The piece concludes, “Resilience is the new velocity.” For an IT leader convinced by this argument, what is the most critical first investment they should make to begin this shift—in tooling, culture, or process—and why is that the most impactful starting point?
For a leader ready to make this shift, the most critical first investment is in tooling that enables progressive delivery, with feature flags being the cornerstone. While culture and process are ultimately the goal, a tool-first approach is often the most impactful starting point because it immediately changes what is possible for the team. When you give engineers the ability to instantly roll back a single faulty feature without a full, stressful redeployment, you fundamentally change their relationship with risk. It provides a massive psychological safety net. That newfound safety allows them to experiment and move faster but with confidence. This practical, tool-driven capability makes the abstract cultural goal of “building for resilience” a daily reality, creating the momentum needed for broader process and cultural changes to take root.
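As a rough illustration of that capability, here is a minimal sketch of a runtime flag store where disabling a single feature takes effect immediately for new evaluations, with no redeployment. The class and names are hypothetical:

```python
# A minimal sketch of rolling back one faulty feature at runtime: the flag
# store is consulted on every evaluation, so flipping one flag off takes effect
# in seconds while every other feature in the same deployment keeps running.
from datetime import datetime, timezone

class FlagStore:
    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}
        self.history: list[tuple[str, str, bool, str]] = []  # (time, flag, state, actor)

    def set(self, flag: str, enabled: bool, actor: str) -> None:
        self._flags[flag] = enabled
        self.history.append(
            (datetime.now(timezone.utc).isoformat(), flag, enabled, actor)
        )

    def is_enabled(self, flag: str) -> bool:
        return self._flags.get(flag, False)

if __name__ == "__main__":
    flags = FlagStore()
    flags.set("new-checkout", True, actor="release-bot")
    # A regression is spotted in the new checkout path: disable just that
    # feature, without redeploying or touching anything else in the release.
    flags.set("new-checkout", False, actor="on-call-engineer")
    print(flags.is_enabled("new-checkout"))   # False
    print(flags.history)
```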
What is your forecast for the evolution of the DevOps role as resilience and runtime control become central to the discipline over the next five years?
Over the next five years, I believe the DevOps role will pivot significantly from being a “pipeline architect” to a “production resilience engineer.” The primary focus will move away from pre-production concerns and squarely into the real-world environment where software interacts with users. The most valuable skills won’t just be about automating builds and tests, but about mastering observability, managing complex feature rollouts, and architecting systems that can gracefully degrade rather than fail catastrophically. These professionals will spend more of their time analyzing real-time system behavior and fine-tuning runtime controls than they will staring at build logs. They will become the strategic guardians of customer experience, ensuring that the incredible speed promised by AI doesn’t come at the cost of the stability that users demand.
