Seagate Proposes NVMe Hybrid Arrays for Efficient AI Data Storage

Article Highlights
Off On

In an era where artificial intelligence (AI) workloads are rapidly expanding, the demand for efficient and cost-effective data storage solutions is more pressing than ever. Seagate has proposed the use of NVMe hybrid flash and disk drive arrays as a means to meet these demands, emphasizing the financial impracticality of relying solely on SSDs for large datasets within most enterprises. Seagate suggests that incorporating a parallel NVMe interface, rather than the traditional serial SAS/SATA interface, could streamline AI storage by eliminating the need for HBAs, protocol bridges, and additional SAS infrastructure. This novel approach would leverage a unified NVMe driver and OS stack, enhancing the efficiency of both hard drives and SSDs working together and removing the need for separate software layers to manage the storage devices.

Advancing AI Storage Efficiency with GPUDirect Protocols

A crucial aspect of Seagate’s NVMe hybrid approach is the benefit provided by GPUDirect protocols, which facilitate direct GPU memory-to-drive access without involving a storage array controller CPU, thus bypassing the memory buffering delays typically encountered. This approach can also capitalize on existing NVMe-over-Fabrics infrastructure, allowing for the seamless scaling of distributed AI storage architectures within high-performance data center networks. However, these NVMe benefits are less impactful at the individual HDD level, where access latency is primarily determined by mechanical seek times rather than controller response speed. In contrast, SSDs benefit significantly from NVMe due to their inherently lower latency, which is a result of fast electrical connections to data-storing cells and the absence of mechanical seek times.

Seagate envisions a hybrid drive array that connects both SSDs and HDDs through NVMe to a GPU server, thereby enhancing the aggregate connectivity speed despite the mechanical delays inherent to HDDs. This setup, integrated with an RNIC like the BlueField-3 smartNIC/DPU, could efficiently transmit data using RDMA to a GPU server, linking directly to the server’s memory. Such integration aims to reduce overall storage-related latency in AI workflows, eliminate the overheads associated with legacy SAS/SATA systems, and facilitate seamless scaling using NVMe-oF solutions.

Demonstration and Real-world Applications

At Nvidia’s recent GTC conference, Seagate demonstrated this concept with a hybrid array of NVMe HDDs and SSDs, utilizing the BlueField-3 frontend and MinIO’s AIStore v2.0 software. This demonstration showcased several benefits, including reduced latency in AI workflows, the elimination of legacy SAS/SATA overheads, seamless scaling through NVMe-oF integration, and dynamic caching and tiering capabilities facilitated by AIStore. The proof of concept highlighted the potential of this hybrid approach to improve AI model training performance, which is a critical factor for industries heavily relying on AI applications.

Seagate asserts that NVMe-connected HDDs are well-suited for a range of AI workloads across various sectors, including manufacturing, autonomous vehicles, healthcare imaging, financial analytics, and hyperscale cloud AI. These drives offer several notable advantages over SSDs, such as ten times greater efficiency in embodied carbon per terabyte, four times better operational power consumption efficiency, and a significantly lower cost per terabyte, which can translate to reduced total cost of ownership for AI storage at scale. By leveraging these benefits, Seagate’s NVMe hybrid arrays can help organizations achieve more efficient and cost-effective storage solutions for their AI workloads.

Future Prospects and Industry Impact

Seagate’s NVMe hybrid approach leverages the benefits of GPUDirect protocols, which enable direct access from GPU memory to the drive. This method avoids using a storage array controller CPU, eliminating typical memory buffering delays. It also makes use of existing NVMe-over-Fabrics infrastructure, facilitating seamless scaling of distributed AI storage architectures in high-performance data centers. However, at the individual HDD level, NVMe benefits are less pronounced due to mechanical seek times driving access latency, unlike the reduced latency seen in SSDs, which avoid mechanical seek times through fast electrical connections to data cells.

Seagate proposes a hybrid drive array that integrates SSDs and HDDs via NVMe to a GPU server, enhancing overall connectivity speed despite inherent mechanical delays in HDDs. This array, equipped with an RNIC such as BlueField-3 smartNIC/DPU, can efficiently transmit data using RDMA directly to the server’s memory. This integration aims to reduce storage-related latency in AI workflows, eliminate overheads of older SAS/SATA systems, and enable seamless scaling with NVMe-oF solutions.

Explore more

What’s the Best Backup Power for a Data Center?

In an age where digital infrastructure underpins the global economy, the silent flicker of a power grid failure represents a catastrophic threat capable of bringing commerce to a standstill and erasing invaluable information in an instant. This inherent vulnerability places an immense burden on data centers, the nerve centers of modern society. For these facilities, backup power is not a

Has Phishing Overtaken Malware as a Cyber Threat?

A comprehensive analysis released by a leader in the identity threat protection sector has revealed a significant and alarming shift in the cybercriminal landscape, indicating that corporate users are now overwhelmingly the primary targets of phishing attacks over malware. The core finding, based on new data, is that an enterprise’s workforce is three times more likely to be targeted by

Equinix Wins Approval for Slough Data Center Campus

A Landmark Digital Infrastructure Project Takes Root in the UK In a move set to significantly bolster the UK’s digital infrastructure, colocation giant Equinix has secured planning permission to develop a major new data center campus in Slough. The approval from Slough Borough Council gives the green light to transform a historic former industrial site into a state-of-the-art digital hub,

Trend Analysis: Local Data Center Resistance

In a powerful illustration of a growing national trend, residents of the small Michigan town of Howell Township successfully pushed back against a colossal tech development, forcing the withdrawal of a 1,000-acre data center proposal. As our reliance on the digital world deepens, the physical infrastructure required to support it is meeting unprecedented opposition in local communities. This analysis examines

How Do Hotel Hacks Lead To Customer Fraud?

A seemingly harmless email confirmation for an upcoming hotel stay, a document once considered a symbol of travel and relaxation, has now become a critical vulnerability in a sophisticated cybercrime campaign that directly targets the financial security of travelers worldwide. What begins as a simple booking confirmation can quickly devolve into a carefully orchestrated trap, where cybercriminals exploit the trust