In the race to deploy artificial intelligence, many executives find their most ambitious projects stalling out not because of model limitations, but because of a far more foundational problem: data architecture. We’re joined by Dominic Jainy, an expert who operates at the critical intersection of database design and agentic AI systems. He argues that the legacy data stacks built for transactional apps are fundamentally unequipped for the demands of modern AI. Today, we’ll explore why traditional data plumbing is the biggest bottleneck to AI adoption, dissect the severe consequences of using outdated or siloed information, and map out the architectural principles and practical steps needed to build a data layer that’s truly ready for intelligent agents.
The article cites BCG and McKinsey, noting that while many AI obstacles are process-related, poor data plumbing is a major project drag. Could you walk us through a real-world example of how a legacy architecture with rigid schemas and data silos directly caused an AI project to stall?
Absolutely. I saw this firsthand with a major e-commerce company trying to build a sophisticated, real-time personalization agent. On paper, the goal was simple: use an AI agent to offer product recommendations based on a customer’s browsing history, purchase data, and even the semantics of their search queries. The problem was that their data lived in three completely separate walled gardens. Customer profiles and transaction histories were in a classic, rigid-schema ERP system. The product catalog, with rich descriptions and metadata, was in a document store. And their brand-new vector store, used for semantic search, was yet another isolated system. The AI team spent the first six months not on AI, but on writing brittle ETL pipelines just to glue these pieces together. Every time a new data field was needed, the entire pipeline broke. It was a constant, frustrating cycle of translation and synchronization, and the latency was terrible. The project never reached its potential; it got stuck in the mud of data integration, a classic case where the ambition of the AI was completely hamstrung by the inflexibility of the data layer beneath it.
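To make that “brittle glue” concrete, here is a minimal sketch of the kind of hand-rolled nightly sync job such a team ends up maintaining. The table, collection, and field names are hypothetical, and the vector-store call is a generic HTTP stand-in rather than any specific product’s API; the point is the hard-coded field mapping that snaps the moment the ERP schema changes.

```python
import psycopg2                    # relational ERP source (assumed Postgres-compatible)
import requests                    # stand-in for a hypothetical vector-store HTTP API
from pymongo import MongoClient    # document store holding the product catalog

# Hard-coded field mapping: every new attribute means editing this job,
# redeploying it, and hoping nothing downstream assumed the old shape.
ERP_QUERY = """
    SELECT customer_id, email, lifetime_value, last_order_at
    FROM customers
    WHERE updated_at > %s
"""

def nightly_sync(since, erp_dsn, mongo_uri, vector_url):
    erp = psycopg2.connect(erp_dsn)
    catalog = MongoClient(mongo_uri)["shop"]["products"]

    with erp, erp.cursor() as cur:
        cur.execute(ERP_QUERY, (since,))
        for customer_id, email, ltv, last_order_at in cur.fetchall():
            # Second hop: enrich from the document store, one network
            # round trip per customer.
            recent = catalog.find_one({"last_buyer_id": customer_id}) or {}

            # Third hop: push a flattened record to the vector store.
            # If the ERP adds or renames a column, this silently drifts
            # or the whole job fails at 2 a.m.
            requests.post(f"{vector_url}/upsert", json={
                "id": str(customer_id),
                "metadata": {
                    "email": email,
                    "lifetime_value": float(ltv or 0),
                    "last_order_at": last_order_at.isoformat() if last_order_at else None,
                    "last_product": recent.get("title", ""),
                },
            }, timeout=30)
```

Multiply that by every entity the agent needs, and the six months of integration work stops looking surprising.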
You mention “index drift” from nightly updates, where agents use yesterday’s data. Beyond just inaccuracy, what are the most severe compliance or trust-related consequences you’ve seen from this time lag? Please describe a scenario where this divergence unfolds and how it impacts business operations.
Index drift sounds like a minor technical issue, but its consequences can be catastrophic, especially in regulated industries. Imagine a wealth management firm using an AI agent to monitor client portfolios for compliance with investment regulations. The agent’s knowledge is powered by a search index and a vector store that are only refreshed from the operational database every 24 hours. On a Tuesday morning, a client’s risk profile is updated in the core system—perhaps they inherited a large sum of money or changed their risk tolerance. But until the nightly batch job runs, the AI agent is completely blind to this change. That afternoon, the agent, operating on yesterday’s stale data, approves a trade that is now wildly out of compliance with the client’s actual, updated profile. When the regulators come knocking, the firm can’t just say “the AI made a mistake.” The audit trail shows a clear failure of process. The trust from the client is shattered, and the firm faces heavy fines. That single time gap created a massive liability, and it all stemmed from an architecture that couldn’t keep the AI’s “brain” synchronized with reality.
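One cheap defensive pattern against exactly that scenario is a freshness guard: before the agent acts, compare the source record’s last-update timestamp against the index’s last sync time. This is a minimal sketch under assumed conditions; the field names and tolerance are hypothetical and depend on how the system of record and the retrieval layer actually expose their timestamps.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance; a 24-hour batch refresh fails this check by design.
MAX_STALENESS = timedelta(minutes=5)

def safe_to_act(profile_updated_at: datetime, index_synced_at: datetime) -> bool:
    """Return True only if the retrieval index has seen every change made to
    the client's profile, within an allowed staleness window.
    Both timestamps are assumed to be timezone-aware UTC values."""
    # If the operational record changed after the last index sync, the agent's
    # retrieval layer is describing a client who no longer exists.
    if profile_updated_at > index_synced_at:
        return False
    return datetime.now(timezone.utc) - index_synced_at <= MAX_STALENESS

# Usage sketch: profile_updated_at comes from the operational database,
# index_synced_at from the retrieval layer's sync metadata; the agent only
# approves the trade when safe_to_act(...) is True.
```

It does not fix the architecture, but it turns a silent compliance breach into a visible “cannot act on stale data” refusal.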
The text describes “AI as a bolt-on” creating operational blind spots and audit gaps. When you see companies using this sidecar approach, what are the first signs that security and governance are failing? Could you share a specific metric or incident that typically exposes these flaws?
The first sign is almost always a question from a security or compliance officer that the engineering team simply cannot answer. It usually sounds something like this: “Can you provide a complete, end-to-end lineage report for this specific customer’s data point, showing every system and agent that accessed or modified it in the last 48 hours?” The team can pull logs from their operational database, but when it comes to the AI sidecar, the trail goes cold. The data was copied into the AI’s separate memory or vector store, and from that point on, it’s a black box. The security team has zero visibility. The incident that blows this wide open is often a minor data leak through the AI, such as a chatbot inadvertently revealing a piece of personally identifiable information. When the post-mortem happens, the security team’s inability to audit the AI’s internal operations exposes the governance failure. The metric that fails is “auditability.” They realize their core governance framework doesn’t extend to this bolt-on system, and that’s a terrifying blind spot.
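To make “the trail goes cold” concrete: on the governed operational side, that lineage question is just a query over an access-audit table. Once the data is exported into a sidecar memory or vector store, there is no equivalent table to query. A sketch, assuming a hypothetical audit schema:

```python
import psycopg2

# What the governed side of the house can answer: every read or write of a
# given customer's data, by which principal and system, in the last 48 hours.
LINEAGE_SQL = """
    SELECT occurred_at, principal, system, action, field
    FROM data_access_audit
    WHERE customer_id = %(customer_id)s
      AND occurred_at > now() - interval '48 hours'
    ORDER BY occurred_at
"""

def lineage_report(conn, customer_id):
    with conn.cursor() as cur:
        cur.execute(LINEAGE_SQL, {"customer_id": customer_id})
        return cur.fetchall()

# The bolt-on failure mode: nothing like data_access_audit exists for the AI
# sidecar's copy of the data, so the report simply stops at the moment the
# record was exported into the agent's memory or vector store.
```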
The content highlights a trend of bringing retrieval closer to operational data, mentioning solutions like MongoDB’s Atlas Vector Search. When evaluating these unified stores versus a purpose-built vector database like Pinecone, what specific trade-offs in performance, cost, and complexity should an engineering leader consider?
This is a critical decision point for engineering leaders. The appeal of a unified store like MongoDB Atlas or using pgvector with Postgres is simplicity and reduced operational complexity. If your team is already standardized on one of these platforms, adding vector search capabilities feels like a natural, low-friction extension. It keeps your data in one place, which helps avoid the index drift we just discussed. The trade-off, however, is often in performance and advanced features. These “bolted-on” vector solutions may not match the low-latency, high-recall retrieval at massive scale that a purpose-built database like Pinecone or Weaviate is optimized for. When you choose a purpose-built vector database, you’re prioritizing raw speed and specialized retrieval capabilities. The trade-off there is complexity and cost; you now have another database to manage, secure, and keep synchronized. It reintroduces the silo problem. An engineering leader has to ask: is vector search a supporting feature for my application, or is it the core, mission-critical workload? The answer to that question dictates whether the convenience of a unified store outweighs the raw power of a specialized one.
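To ground the unified-store side of that trade-off: with Postgres plus pgvector, the semantic ranking and the live transactional filters run in one statement against one copy of the data. A rough sketch, assuming a hypothetical products table with an embedding column and the pgvector extension installed:

```python
import psycopg2

# One engine, one round trip: business filters (stock, budget) and semantic
# ranking are evaluated together, against data that is never copied elsewhere.
RECOMMEND_SQL = """
    SELECT sku, title, price
    FROM products
    WHERE in_stock = TRUE
      AND price <= %(budget)s
    ORDER BY embedding <=> %(query_vec)s::vector   -- pgvector cosine distance
    LIMIT 10
"""

def recommend(conn, query_embedding, budget):
    # pgvector accepts a '[0.1, 0.2, ...]' text literal cast to the vector type.
    query_vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(RECOMMEND_SQL, {"query_vec": query_vec, "budget": budget})
        return cur.fetchall()
```

The purpose-built alternative replaces that ORDER BY with a call out to a separate service, which is exactly where the extra latency budget, the sync job, and the additional security surface come from.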
The article positions multi-model databases as a solution, citing the LiveSponsors case study that cut query times from 20 seconds to 7 milliseconds. What were the key architectural changes that team made to achieve such a dramatic improvement when consolidating relational and document data?
That LiveSponsors example is a perfect illustration of the power of consolidation. Before, their loyalty engine was a textbook case of data sprawl. They had user accounts and transaction data in a traditional relational database, while the complex rules and tiers of their loyalty program were stored in a more flexible document database. To answer a simple question like, “Which of our top-tier members in Lisbon have earned enough points for a reward this month?” they had to perform a slow, painful join at the application level. The application would have to query the relational database, get a list of users, then make separate queries to the document database for each user to check their loyalty status. The 20-second query time was the result of all those network hops and data translations. The key architectural change was migrating both data models—the relational user data and the document-based loyalty rules—into a single multi-model database. By doing this, that complex join could be executed natively within one engine. The query went from a distributed, multi-step process to a single, optimized lookup. That’s how you get a drop from 20 seconds to 7 milliseconds; you eliminate the architectural friction.
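The shape of that before-and-after is easy to caricature in code. The collection and field names below are hypothetical; the “before” is the classic N+1 pattern the team was living with, and the “after” is shown only as a comment because the exact single-query syntax depends on the multi-model engine chosen.

```python
import psycopg2
from pymongo import MongoClient

def reward_eligible_before(erp_dsn, mongo_uri, city="Lisbon", tier="top", threshold=1000):
    """The 'before' picture: an application-level join with one document-store
    query per user, the N+1 pattern behind the 20-second response time."""
    erp = psycopg2.connect(erp_dsn)
    loyalty = MongoClient(mongo_uri)["loyalty"]["profiles"]

    eligible = []
    with erp, erp.cursor() as cur:
        cur.execute("SELECT user_id, name FROM users WHERE city = %s", (city,))
        for user_id, name in cur.fetchall():
            # One network round trip per user just to check tier and points.
            profile = loyalty.find_one({"user_id": user_id})
            if profile and profile.get("tier") == tier and profile.get("points", 0) >= threshold:
                eligible.append(name)
    return eligible

# The 'after' picture: with both models in one multi-model engine, the same
# question becomes a single native query, roughly (engine-specific syntax):
#
#   SELECT u.name
#   FROM users u JOIN loyalty l ON l.user_id = u.user_id
#   WHERE u.city = 'Lisbon' AND l.tier = 'top' AND l.points >= 1000
#
# One optimized lookup instead of hundreds of cross-system round trips.
```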
One of the core principles is to co-locate state, policy, and compute. For a company starting this journey, what’s the first practical step toward co-locating these? Can you detail the initial technical decisions and the kind of pushback teams accustomed to separate systems might give?
The most practical first step is to pick one critical data model and move its security policy from the application layer down into the database itself. For instance, take your “customer” entity. Instead of having code in a dozen different microservices that says, “If the user’s role is ‘support’, only show the last four digits of the credit card,” you enforce that logic directly in the database, with row-level security policies governing which records a role can see and a masking view over the sensitive columns. The initial technical decision is to define and enforce that rule at the data source. The pushback is immediate and predictable. You’ll hear from application developers, “The database should just store data; all our business logic lives in the application layer.” You’ll hear from others, “This will make our local development and testing harder.” They’re used to a world where the database is a ‘dumb’ persistence layer. The way to overcome this is to demonstrate the immense security and efficiency gains. By enforcing the policy at the source, every application, every API, and every new AI agent that connects to that data is automatically compliant. You’ve created a single, trustworthy source of truth and eliminated redundant, and potentially inconsistent, security code across the entire organization.
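Here is what that first step can look like on Postgres, as a sketch rather than a drop-in policy; the table, role, column, and setting names are all hypothetical, and RLS handles the row filtering while a view handles the column masking.

```python
import psycopg2

# Policy lives at the data source: every app, API, and agent that connects
# inherits it automatically instead of re-implementing it in a dozen services.
POLICY_DDL = """
    ALTER TABLE customers ENABLE ROW LEVEL SECURITY;

    -- Row filtering: support staff may only read customers in their region.
    CREATE POLICY support_read ON customers
        FOR SELECT
        TO support_role
        USING (region = current_setting('app.region', true));

    -- RLS filters rows; for column masking, expose a view instead of the table.
    CREATE VIEW customers_masked AS
        SELECT customer_id,
               name,
               '**** **** **** ' || right(card_number, 4) AS card_number
        FROM customers;

    GRANT SELECT ON customers_masked TO support_role;
"""

def apply_policy(dsn):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(POLICY_DDL)
```

Once a rule like this exists at the source, the argument with the application teams shifts from “where should the logic live?” to “why are we still duplicating it in every service?”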
What is your forecast for the database landscape over the next five years as agentic AI becomes more mainstream?
I forecast a radical convergence. The distinct lines we’ve drawn for decades between transactional databases, analytical warehouses, search indexes, and now vector stores are going to blur to the point of being unrecognizable. The idea of a standalone “vector database” will seem quaint; native vector and semantic search will become a standard, non-negotiable feature of any serious operational database, just as JSON support did a decade ago. The winning platforms will be multi-model by nature, seamlessly handling relational, document, graph, and vector data in a single, governed engine. They won’t just be places to store data; they will be the active, intelligent core of the enterprise, serving as the “agentic memory” for AI systems. This means they will have durable transactions, real-time eventing, and security policies built-in. The question will no longer be which five databases you need for your stack, but which single, unified data platform can serve as the central nervous system for both your applications and your intelligent agents.
