
Look Before You Leap: AI Data Center Testing Emerges as Top Priority

By: Aniket Khosla


AI is transforming data centers, driving demand for high-speed Ethernet and scalable infrastructure. This blog explores how operators are adopting 800G Ethernet and sustainable practices to meet AI’s intensive requirements. Learn why efficiency and resource optimization are essential for monetizing AI infrastructure investments.

Spirent’s blog has closely tracked the impact of AI workloads on hyperscaler data centers.

With AI models now spanning tens of thousands of GPUs, faster and more reliable infrastructure is urgently required. This is prompting the accelerated adoption of 800G and even 1.6T Ethernet.

But meeting demand is coming at a big cost.

We are seeing significant sustainability challenges like rising energy consumption, cooling demands, and ballooning operational budgets.

Ethernet promises an open, scalable, and cost-effective alternative to proprietary approaches, but long-term efficiency and profitability hinge on optimized operations. That means minimizing bottlenecks and taking steps to avoid underutilization of GPUs.

I recently joined Dell’Oro Group Vice President Sameh Boujelbene to share our insights on this front during the Light Reading “Building Sustainable AI Data Center Networks” webinar.

Dell’Oro estimates the AI backend network market will exceed $20 billion by 2028, growing at a compound annual growth rate of more than 50%. Against this backdrop, the session focused on Ethernet adoption drivers, network efficiency imperatives, and how to ensure profitable, future-ready networks.

In Pursuit of Peak AI Network Performance

Operators and vendors are racing to develop and deploy a range of innovations that are reshaping AI networks. These focused efforts underscore an urgent need for scalable and efficient networking technologies that can support AI workloads like training and inferencing.

Ethernet is fast becoming the preferred network fabric for AI workloads and is projected to overtake proprietary technologies like InfiniBand. We’ve previously covered efforts by the Ultra Ethernet Consortium to expedite Ethernet innovation to better meet AI workload requirements.

Dell’Oro projects Ethernet will surpass InfiniBand by 2028, better meeting AI models’ requirements for faster, cost-efficient, open, and more scalable infrastructure. That is because Ethernet is flexible enough to support both training and inferencing workloads, helping operators keep pace with AI’s rapid growth without the risk of locking into proprietary solutions.

[Figure: Dell’Oro projection of Ethernet vs. InfiniBand in AI networks]

Why It’s All About Sustainability and Efficiency

The rapid pace of adoption and scale we’re seeing for high-end servers and GPUs make sustainability challenges unavoidable. As AI clusters begin to require hundreds of thousands of accelerators, significant power and cooling demands arise.

To put this need in perspective, North Carolina’s state grid faces unprecedented demand due to AI-driven data center expansion, requiring significant enhancements to support this growth.

It’s no wonder power efficiency is now seen as a business strategy linked to profitability and environmental goals.

Liquid cooling is proving capable of meeting the thermal requirements of the latest servers and network switches, as GPUs like NVIDIA’s Blackwell demand advanced cooling techniques.

But challenges aren’t contained to energy consumption. Keeping up with the latest demands also necessitates operational efficiency improvements, which we know are key to monetizing AI infrastructure investments.

Consider that some GPU clusters report between 30% and 80% idle time as a result of bottlenecks. That means wasted time and wasted investment. As we reported during the webinar, even 1% packet loss can degrade GPU performance by 30%, which can result in millions of dollars in lost business. This drives operators to buy more GPUs to compensate, creating an endless cycle of inefficiency.
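To put those percentages in concrete terms, here is a back-of-envelope sketch of how idle time translates into wasted spend. All figures (cluster size, per-GPU-hour cost) are illustrative assumptions, not Spirent or Dell’Oro data:

```python
# Back-of-envelope cost of GPU idle time caused by network bottlenecks.
# Every figure below is an illustrative assumption, not a measured value.

CLUSTER_GPUS = 10_000        # GPUs in the cluster (assumed)
GPU_COST_PER_HOUR = 2.50     # amortized cost per GPU-hour in USD (assumed)
HOURS_PER_YEAR = 24 * 365

def annual_idle_cost(idle_fraction: float) -> float:
    """Dollar value of GPU-hours lost to idle time over one year."""
    return CLUSTER_GPUS * GPU_COST_PER_HOUR * HOURS_PER_YEAR * idle_fraction

for idle in (0.30, 0.80):
    print(f"{idle:.0%} idle -> ${annual_idle_cost(idle):,.0f} wasted per year")
```

At the assumed rates, even the low end of the reported idle range burns tens of millions of dollars a year in stranded GPU capacity, which is why fixing the network is usually cheaper than adding accelerators.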

Spirent recognizes an urgent requirement to focus on network optimization to ensure clusters operate efficiently, minimize latency, and avoid underutilization. With power and cooling constraints presenting harsh realities, simply adding more GPUs or overbuilding is not sustainable.

These are discoveries increasingly made in testing, a step that is sometimes skipped by operators.

Testing AI Networks Before Deployment

It is said that it takes a data center to test a data center. Often, the testing burden is on the end customer at all levels of the solution, requiring a comprehensive understanding of the AI network landscape, traffic, and protocols.

Of course, building large-scale GPU labs for testing is expensive and resource intensive. Because it essentially requires the same infrastructure as production environments, pre-deployment labs can be difficult to justify for operators already struggling with scaling actual facilities on pace with demand. That’s not to mention the challenge of replicating real-world traffic patterns.

The latter challenge is beginning to be addressed with AI workload emulation tools that let operators simulate network conditions without needing to build costly GPU labs. These tools mimic real-world traffic and use protocols like RoCEv2 to optimize networks for training and inference workloads.
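As a toy illustration of the traffic such tools reproduce (a simplified model, not Spirent’s actual emulation engine), the dominant collective in distributed training, ring all-reduce, has a well-known per-GPU traffic cost that an emulator can generate without any GPUs at all:

```python
# Toy model of the traffic pattern an AI workload emulator reproduces:
# a ring all-reduce, the collective operation that dominates training
# traffic. Parameters and figures are illustrative assumptions.

def ring_allreduce_bytes(num_gpus: int, gradient_bytes: int) -> int:
    """Bytes each GPU sends on the network for one ring all-reduce.

    A ring all-reduce runs 2*(N-1) steps; in each step every GPU sends
    a 1/N-sized chunk of the gradient buffer to its ring neighbor.
    """
    chunk = gradient_bytes / num_gpus
    return int(2 * (num_gpus - 1) * chunk)

# Example: 1,024 GPUs synchronizing a 10 GB gradient buffer each step.
per_gpu = ring_allreduce_bytes(1024, 10 * 10**9)
print(f"Each GPU sends ~{per_gpu / 1e9:.2f} GB per training step")
```

Because every GPU must complete its transfers before the next compute step begins, a single slow or lossy link stalls the whole ring, which is why emulating these flow patterns (typically carried over RoCEv2) exposes bottlenecks that simple point-to-point throughput tests miss.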

This critical capability is helping operators identify network bottlenecks before deployment so they can reconfigure data center designs for better performance, and faster and smoother rollouts. This is accelerating deployments as time-to-market emerges as a key differentiator.

Toward a Profitable and Sustainable AI Future

The Spirent AI Workload Emulation Solution plays a key role in ensuring AI networks are ready to meet these emerging challenges. By emulating real-world workloads and identifying bottlenecks before deployment, data center operators can prepare to reap the full potential of their AI infrastructure.

Learn how high-speed Ethernet technologies are evolving to meet AI networking needs and why efficiency and resource optimization are key to monetizing investments in AI infrastructures by watching the “Building Sustainable AI Data Center Networks” webinar now.

Did you enjoy this content?

Subscribe to our blog here.

Subscribe to the Blog Newsletter

Aniket Khosla

VP, Wireline Product Management

Aniket Khosla is the Vice President of Wireline Product Management at Spirent Communications. Aniket has over 25 years of experience in the networking industry, with 15 years of those in Product Management. He is currently responsible for Spirent’s Ethernet test business, with a focus on transformative technologies like AI, 800G, and Automotive.