Dynatrace Boosts Multi-Cloud Automation and Visibility

With deep expertise in AI, machine learning, and blockchain, Dominic Jainy has dedicated his career to exploring how advanced technologies can transform industries. Today, he shares his insights on the evolving landscape of multi-cloud management, where the convergence of massive data processing and intelligent automation is revolutionizing how enterprises maintain reliability and control costs. Our conversation will touch upon the practicalities of achieving a unified view across disparate cloud environments, the mechanics of automated issue remediation, and what the future holds for truly autonomous IT operations.

Dynatrace now offers a unified view across AWS, Azure, and Google Cloud to manage operational complexity. Beyond a single dashboard, how does this help platform teams manage reliability and costs? Could you share a specific example of how this prevents a common multi-cloud issue?

It’s about moving beyond just seeing everything to truly understanding it. A single dashboard is table stakes, but what platform teams desperately need is context. The unified view provides a real-time dependency map that shows precisely how a service running on Azure might be impacting a user-facing application hosted on AWS. This visibility is critical for preventing the classic multi-cloud “blame game.” For instance, we often see situations where a retail application on Google Cloud suddenly slows down. The Google Cloud team sees no issues, but what they can’t see is that a critical inventory microservice they call, which is running in Azure Kubernetes Service, is experiencing resource contention. Our platform immediately visualizes that cross-cloud dependency, pinpointing the Azure service as the root cause. This turns a potential multi-day troubleshooting nightmare, involving two separate teams burning time and money, into a precise, actionable insight that can be resolved in minutes.
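The cross-cloud root-cause scenario above can be sketched as a simple dependency-graph walk. This is a hypothetical illustration, not the Dynatrace API: the service names, the graph shape, and the health flags are all invented for the example.

```python
# Each service maps to (cloud, healthy?, upstream dependencies).
services = {
    "retail-frontend": ("gcp",   True,  ["checkout-api"]),
    "checkout-api":    ("gcp",   True,  ["inventory-svc"]),
    "inventory-svc":   ("azure", False, []),  # resource contention in AKS
}

def find_root_cause(name, seen=None):
    """Depth-first walk: the deepest unhealthy dependency is the root cause."""
    seen = seen or set()
    seen.add(name)
    cloud, healthy, deps = services[name]
    for dep in deps:
        if dep not in seen:
            found = find_root_cause(dep, seen)
            if found:
                return found
    return name if not healthy else None

print(find_root_cause("retail-frontend"))  # inventory-svc
```

The point of the sketch is the traversal itself: the symptom surfaces on the Google Cloud frontend, but following edges across the cloud boundary lands on the Azure service as the actual cause.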

Your platform includes automated remediation to resolve issues as they occur. Can you walk us through how this function would identify a cross-cloud performance risk, and what specific steps it takes to resolve it without manual intervention?

Certainly. The process is a seamless flow from detection to resolution, driven by our AI engine. First, the system detects an anomaly using its built-in health indicators—not just a technical metric like high CPU, but a business-level issue like degrading transaction performance. Imagine a financial service application with its frontend on AWS and its AI-powered fraud detection model running on Azure AI Foundry. The platform might notice a slowdown in transaction approvals. Instead of just firing off a vague alert, it uses the Smartscape dependency graph to trace the issue across the cloud boundary directly to the Azure AI service. The AI engine then performs a root-cause analysis and determines the service is overloaded. At this point, the automation kicks in. It triggers a pre-configured remediation workflow—perhaps automatically scaling up the Azure resources or rerouting traffic to a healthy, replicated instance. This entire sequence happens in seconds, often before the operations team is even aware a problem was brewing, completely eliminating the manual effort of diagnosis and response.
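The detect-trace-remediate flow described here can be outlined in a few functions. This is a minimal sketch under stated assumptions: the anomaly threshold, the 90% CPU cutoff, and the service records are illustrative placeholders, not the platform's actual logic or API.

```python
def detect_anomaly(metrics):
    """Flag a business-level anomaly: approval latency far above baseline."""
    return metrics["approval_latency_ms"] > metrics["baseline_ms"] * 2

def trace_root_cause(dependency_chain):
    """Follow the dependency chain to the first overloaded service."""
    for svc in dependency_chain:
        if svc["cpu_pct"] > 90:
            return svc
    return None

def remediate(svc):
    """Pre-configured workflow: scale up the overloaded service."""
    svc["instances"] += 1
    return f"scaled {svc['name']} to {svc['instances']} instances"

metrics = {"approval_latency_ms": 820, "baseline_ms": 300}
chain = [
    {"name": "aws-frontend",      "cpu_pct": 45, "instances": 4},
    {"name": "azure-fraud-model", "cpu_pct": 97, "instances": 2},
]

if detect_anomaly(metrics):
    culprit = trace_root_cause(chain)
    if culprit:
        print(remediate(culprit))  # scaled azure-fraud-model to 3 instances
```

Note the ordering: remediation only fires after a business-level anomaly has been traced to a concrete service, which is what keeps the automation from acting on a vague alert.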

The Grail data lakehouse and Smartscape dependency graph are central to your multi-cloud operations. How do these components work together to provide context for an alert? Please describe a scenario where this combination provides insights that a standard monitoring tool might miss.

Think of Grail as the massive, long-term memory and Smartscape as the real-time consciousness of the system. Grail is our data lakehouse, which ingests and indexes immense volumes of telemetry and metadata from every corner of the AWS, Azure, and Google Cloud environments. Smartscape then uses this rich data to build and continuously update a live, topological map of every single dependency. A standard monitoring tool might send an alert saying, “CPU on an AWS instance is at 95%.” That’s data, but it’s not insight. In that same scenario, Grail provides the historical performance data, while Smartscape shows that this specific instance supports a critical, revenue-generating service that is, in turn, dependent on a database in Google Cloud that just received a major schema update. The platform can then correlate the CPU spike with the database change, providing the “why.” This transforms a generic alert into a highly contextualized insight: “The recent database update in Google Cloud has caused an unexpected load on this dependent AWS service, putting customer transactions at risk.” That’s a level of contextual understanding a standard tool, looking at clouds in isolation, would completely miss.
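The correlation step in that scenario, linking a CPU spike to a recent change in another cloud, boils down to matching an anomaly timestamp against a window of change events. The 30-minute window, event names, and timestamps below are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def correlate(anomaly_time, change_events, window_minutes=30):
    """Return change events that landed shortly before the anomaly."""
    window = timedelta(minutes=window_minutes)
    return [e for e in change_events
            if timedelta(0) <= anomaly_time - e["time"] <= window]

anomaly_time = datetime(2025, 6, 1, 14, 20)  # CPU hits 95% on AWS instance
changes = [
    {"time": datetime(2025, 6, 1, 14, 5),
     "what": "schema update on dependent Google Cloud database"},
    {"time": datetime(2025, 5, 30, 9, 0),
     "what": "routine certificate rotation"},
]

for event in correlate(anomaly_time, changes):
    print(f"CPU spike likely caused by: {event['what']}")
```

Only the schema update falls inside the window, so the generic "CPU at 95%" alert acquires its "why": a change in a different cloud that a per-cloud monitoring tool would never connect to it.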

A key goal for some customers, like SBS Software, is achieving “fully autonomous operations.” What does this look like in practice for a cloud operations team day-to-day? Could you outline the key capabilities or metrics a team would need to measure its progress toward this goal?

For a cloud operations team, “fully autonomous operations” means a fundamental shift from being reactive firefighters to proactive, strategic innovators. Day-to-day, it means their mornings aren’t spent triaging a flood of overnight alerts. Instead, the platform has already identified, diagnosed, and, in many cases, resolved potential issues automatically. The team’s focus shifts to higher-value work, like optimizing architecture for cost or developing new features. To measure progress, they would track metrics like Mean Time to Resolution (MTTR), which should trend toward zero for common issues. Another key metric is the percentage of incidents resolved without any human intervention. And perhaps most importantly, they would measure “innovation velocity”—how quickly they can deploy new code and services—because they are no longer bogged down by a constant stream of operational toil. It’s about innovating more with less, just as SBS Software described.
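The two progress metrics named above, MTTR and the share of incidents resolved without human intervention, are straightforward to compute from incident records. The sample incidents below are invented data, kept deliberately small.

```python
# Each record: minutes from detection to resolution, and whether a human
# had to step in.
incidents = [
    {"detected": 0, "resolved": 4,  "human": False},
    {"detected": 0, "resolved": 12, "human": True},
    {"detected": 0, "resolved": 2,  "human": False},
    {"detected": 0, "resolved": 45, "human": True},
]

mttr = sum(i["resolved"] - i["detected"] for i in incidents) / len(incidents)
autonomous_pct = 100 * sum(not i["human"] for i in incidents) / len(incidents)

print(f"MTTR: {mttr:.1f} min")                          # MTTR: 15.8 min
print(f"Autonomous resolution: {autonomous_pct:.0f}%")  # Autonomous resolution: 50%
```

Tracked over time, the team would expect MTTR to fall toward zero for common issue classes and the autonomous-resolution percentage to climb as remediation workflows mature.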

Given the intense focus on cloud spending, your automated optimization function continuously assesses resource usage. What specific, actionable recommendations does it provide, and how does it help teams balance performance demands with cost efficiency across different cloud providers?

With cloud spending under such intense scrutiny, teams are constantly caught between ensuring great performance and controlling a spiraling budget. Our automated optimization function acts as a continuous financial advisor for their cloud estate. It doesn’t just show you a bill; it provides specific, actionable recommendations. For example, it might identify a set of virtual machines in AWS that are consistently over-provisioned for their actual workload and recommend a smaller, cheaper instance type. Conversely, it might flag a service in Azure that is under-provisioned and at risk of performance degradation during peak hours, recommending a scale-up to protect the user experience. The key is that it continuously analyzes real-world usage data, allowing teams to make data-driven decisions. This helps them confidently downsize resources without impacting performance and justifies spending increases where they are truly needed, ensuring every dollar spent across their multi-cloud deployment is delivering maximum value.
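A right-sizing pass of the kind described can be sketched as a simple rule over utilization samples. The 25%/85% thresholds and the fleet data are assumptions for the example, not the product's actual heuristics, which would draw on far richer usage data.

```python
def recommend(resources, low=0.25, high=0.85):
    """Flag consistently over- or under-provisioned resources."""
    recs = []
    for r in resources:
        avg = sum(r["cpu_samples"]) / len(r["cpu_samples"])
        if avg < low:
            recs.append((r["name"], "downsize: sustained low utilization"))
        elif avg > high:
            recs.append((r["name"], "scale up: at risk during peak hours"))
    return recs

fleet = [
    {"name": "aws-vm-batch",   "cpu_samples": [0.10, 0.15, 0.12, 0.08]},
    {"name": "azure-api-tier", "cpu_samples": [0.90, 0.88, 0.95, 0.91]},
    {"name": "gcp-worker",     "cpu_samples": [0.55, 0.60, 0.50, 0.58]},
]

for name, action in recommend(fleet):
    print(f"{name}: {action}")
```

The healthy middle band is left alone, which mirrors the balance described above: downsizing only where data shows sustained slack, and scaling up only where user experience is genuinely at risk.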

What is your forecast for multi-cloud management and observability over the next three to five years?

Looking ahead, I believe the focus will shift dramatically from passive observation to proactive, AI-driven action. We’re moving beyond simply monitoring multi-cloud environments to truly managing and optimizing them autonomously. In the next three to five years, the expectation will be that observability platforms don’t just find problems; they predict and prevent them. The sheer scale and complexity of applications running across AWS, Azure, and Google Cloud will make manual intervention completely unfeasible. Therefore, platforms that can intelligently automate remediation, optimize resource consumption for both cost and carbon footprint, and secure the entire software delivery lifecycle will become the standard. The winning solutions will be those that can turn an ocean of data into automated decisions that directly improve business outcomes, making the concept of “fully autonomous operations” not just an aspirational goal, but a competitive necessity.
