CodeRabbit Tool Targets Rising AI Errors in Code

Dominic Jainy is a recognized leader at the forefront of a critical new frontier: the strategic application of artificial intelligence within the high-stakes world of DevOps and software engineering. With a deep background in AI, machine learning, and blockchain, he has a unique vantage point on how these technologies are not just tools, but transformative forces reshaping how we build, deploy, and maintain software. In our conversation, we explore the stark realities of AI-generated code quality, the hidden costs of uncoordinated AI adoption, and the essential role of human oversight in an increasingly automated landscape. Dominic provides a clear-eyed look at the challenges teams face and outlines a practical path toward harnessing AI’s power safely and effectively, moving from siloed experimentation to collaborative, integrated intelligence.

A recent analysis suggests AI-authored pull requests generate significantly more, and more critical, issues than human-only ones. Could you elaborate on why this quality gap exists and describe, with a specific example, how collaborative prompt planning and review can directly address these AI-induced errors?

That quality gap is something we see firsthand, and the numbers are quite stark. The analysis of 470 pull requests found AI changes introduced almost 11 issues per request, compared to just over 6 for human-only work. The reason for this isn’t that the AI is inherently “bad,” but that it lacks context. It operates on the specific instructions it’s given, and when those instructions are vague or created in a vacuum by one developer, the AI is forced to make assumptions—what we call guesswork. For instance, imagine a junior developer asks an AI to “optimize a slow user authentication function.” The AI, trying to be helpful, might streamline the code by removing what it perceives as a redundant security check. A single developer might miss this, but a collaborative review process involving a security specialist would immediately flag it. The team plan would have specified “optimize for speed without compromising security protocols X, Y, and Z.” This simple, collaborative step prevents a critical vulnerability from ever being written, turning a potential disaster into a non-issue.

Many DevOps teams have engineers creating AI prompts in silos, with varying levels of expertise. What are the primary hidden costs and risks of this individual approach, and can you walk me through the specific steps a team would take to establish a consensus-driven prompt?

The siloed approach feels fast initially, but it’s incredibly inefficient and risky. The most immediate hidden cost is financial—every time an engineer has to re-run a prompt because the output was wrong, you’re paying for wasted compute cycles. But the bigger costs are in human capital: duplicated effort as multiple engineers try to solve the same problem, and the sheer frustration of reviewing and fixing suboptimal AI output. The greatest risk, of course, is that a poorly crafted prompt leads to flawed code being deployed. To fix this, a team would first define the task in their existing issue tracker, like Jira or GitLab. Then, instead of one person writing a prompt, the team drafts it collaboratively. They’d discuss the specific goals, define the constraints—like “don’t use deprecated libraries”—and agree on the desired output format. This draft is then reviewed and validated by the group, much like a code review. Once approved, that high-quality, consensus-driven prompt becomes a reusable, reliable asset embedded directly in their workflow, ensuring everyone is building on the same solid foundation.
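To make that workflow concrete, here is a minimal Python sketch of what a consensus-driven prompt could look like once it becomes a versioned, reusable asset. The PromptSpec structure, the ticket key, and the example constraints are illustrative assumptions, not any particular tool's format.

```python
from dataclasses import dataclass, field

@dataclass
class PromptSpec:
    """A team-reviewed prompt stored and versioned alongside the code it relates to."""
    ticket: str                          # the Jira/GitLab issue key (illustrative)
    goal: str                            # what the AI is being asked to do
    constraints: list[str] = field(default_factory=list)
    output_format: str = "unified diff with inline comments"
    approved_by: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Assemble the agreed wording into a single prompt string."""
        constraint_lines = "\n".join(f"- {c}" for c in self.constraints)
        return (
            f"Task ({self.ticket}): {self.goal}\n"
            f"Constraints:\n{constraint_lines}\n"
            f"Return the result as: {self.output_format}"
        )

# The team drafts and reviews this spec like any other change,
# then reuses it instead of ad-hoc, one-off prompts.
spec = PromptSpec(
    ticket="AUTH-142",
    goal="Optimize the user authentication function for latency",
    constraints=[
        "Do not remove or weaken any existing security checks",
        "Do not use deprecated libraries",
    ],
    approved_by=["backend lead", "security reviewer"],
)
print(spec.render())
```

Because the spec is reviewed and approved before anyone runs it, the AI receives the team's agreed constraints rather than one developer's guess at them.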

AI agents often engage in guesswork without detailed instructions, leading to repeated prompting and suboptimal results. Can you share an anecdote that illustrates this inefficiency and explain how a dedicated planning phase before prompt execution can improve both the output quality and the team’s operational costs?

I remember a team that was struggling with an AI agent tasked with generating unit tests. An engineer would simply prompt, “Write tests for the new billing module,” and the results were a mess. The AI would generate tests for happy paths but completely ignore edge cases, like what happens with a negative-value invoice or a failed payment gateway. The engineer would then spend an hour going back and forth, prompting again and again: “Now add a test for a failed transaction,” “Now add one for an expired credit card.” It was a draining, iterative process. A dedicated planning phase changes the game entirely. The team would sit down for ten minutes and map it out: “We need tests covering successful transactions, failed transactions, expired cards, currency conversion errors, and negative values.” This detailed plan, fed to the AI in a single, well-structured prompt, would produce a comprehensive test suite on the first try. That upfront planning saves an hour of frustrating back-and-forth, reduces compute costs, and, most importantly, results in a much more robust and reliable product.
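As a rough illustration of that planning step, the sketch below turns an agreed list of billing scenarios into one structured prompt instead of an hour of follow-up prompting. The scenario names and the build_prompt helper are hypothetical, not part of any specific product.

```python
# A minimal "plan first, prompt once" sketch: the team lists the cases up front,
# then sends the AI a single, well-structured request.
planned_cases = [
    "successful transaction",
    "failed payment gateway response",
    "expired credit card",
    "currency conversion error",
    "negative-value invoice",
]

def build_prompt(module: str, cases: list[str]) -> str:
    """Turn the team's agreed test plan into one structured prompt."""
    case_lines = "\n".join(f"{i + 1}. {case}" for i, case in enumerate(cases))
    return (
        f"Write unit tests for the {module}.\n"
        f"Cover each of these scenarios with at least one test:\n{case_lines}\n"
        "Use pytest, and assert on both return values and raised exceptions."
    )

print(build_prompt("billing module", planned_cases))
```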

As AI models gain more powerful reasoning abilities, the potential for autonomous, high-risk actions increases. How can teams practically implement human supervision for these powerful agents, and what does that day-to-day validation process look like to ensure AI-driven tasks are completed safely?

This is the million-dollar question, and it’s absolutely critical. As these AI agents get smarter, the thought of one deciding to “fix” a performance issue by deleting a production database is terrifyingly plausible. The key is to never give an agent the final say on execution. Human supervision can’t be an afterthought; it must be a mandatory gate in the workflow. On a day-to-day basis, this looks like a pull request model for AI actions. The agent can propose a change—say, a script to clean up old database records—but it can’t run it. Instead, it submits its proposed action for human review. A senior engineer then has to explicitly review the code or the command, understand its impact, and provide a manual approval before it can be executed. It’s about keeping humans not just “in the loop,” but in direct, conscious control of any action that could have a significant impact. This ensures we harness the AI’s power to create and propose, while retaining human judgment for the final, critical decision.
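One minimal way to wire in that mandatory gate, sketched in Python under the assumption that the agent only drafts commands and a person executes them, might look like the following. The proposed command here is a harmless placeholder standing in for a real cleanup script.

```python
import subprocess

def execute_with_approval(description: str, command: list[str]) -> None:
    """Run an agent-proposed command only after explicit human sign-off."""
    print(f"Agent proposes: {description}")
    print(f"Command: {' '.join(command)}")
    answer = input("Approve execution? Type 'yes' to proceed: ").strip().lower()
    if answer != "yes":
        print("Rejected: the proposed action was not executed.")
        return
    # Only an explicit human approval reaches this point; the agent never calls run() itself.
    subprocess.run(command, check=True)

# Placeholder command standing in for the agent's drafted cleanup script.
execute_with_approval(
    "Archive database records older than 24 months",
    ["echo", "archive step would run here"],
)
```

The same pattern scales up: in a CI/CD pipeline the "approval" step becomes a manually triggered job, but the principle is identical: the agent proposes, a human disposes.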

Integrating with existing platforms like Jira and GitLab is a key feature. Can you detail how this prevents prompt engineering from becoming a separate, disconnected task and instead embeds it seamlessly within a team’s established development and issue-tracking workflow?

Without that integration, prompt engineering becomes “shadow IT.” It’s this separate, undocumented activity happening on the side, and nobody has visibility into what prompts are being used or how effective they are. It’s a recipe for chaos. When you integrate directly into tools like Jira or GitLab, the prompt becomes part of the official record. When a developer picks up a ticket, the plan and the associated AI prompt are right there, attached to the issue. This creates a transparent, traceable workflow. It’s no longer a separate task; it’s a natural step in the development process, just like writing code or running tests. This embedding ensures that the prompts are versioned, reviewed, and improved over time within the very system the team already lives in. It transforms prompt engineering from a scattered, individual art into a structured, collaborative engineering discipline.
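A possible repo-side convention for that embedding, sketched here with an assumed prompts/ directory keyed by issue ID, keeps the reviewed prompt under version control next to the code. The layout and file naming are illustrative assumptions, not a feature of Jira or GitLab.

```python
from pathlib import Path

def load_prompt(ticket: str, prompts_dir: str = "prompts") -> str:
    """Fetch the reviewed prompt that lives in the repository next to the code.

    Because the file is under version control, it is diffed, reviewed, and
    traceable back to the issue it belongs to, like any other artifact.
    """
    path = Path(prompts_dir) / f"{ticket}.md"
    return path.read_text(encoding="utf-8")

# e.g. prompts/AUTH-142.md was written and reviewed as part of the ticket,
# so whoever picks up the issue runs the exact wording the team approved.
prompt = load_prompt("AUTH-142")
```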

What is your forecast for the role of AI agents in DevOps?

My forecast is that we are on the verge of creating small, specialized armies of AI agents that will become integral members of every DevOps team. We’re moving beyond simple co-pilots that suggest code. Soon, we’ll have agents that can independently identify a bug from a monitoring alert, find the root cause, write the fix, generate the tests, and submit the pull request for human approval—all while we sleep. The ultimate evolution will be deploying AI agents whose primary job is to supervise and validate the work of other AI agents, creating a self-healing, self-optimizing system. However, the human role will become more critical than ever. We won’t be writing boilerplate code, but we will be the architects of these systems, the strategists defining the goals, and the ultimate arbiters ensuring that this powerful automation is always deployed safely and ethically. Our job is shifting from doing the work to directing the work.
