2026-01-06

Broadcom’s Congestion Aware Sprayed Traffic (CAST) successfully mitigates congestion measured in a cluster of MI300X GPUs interconnected with Broadcom’s Tomahawk 5 switches and Thor 2 400G NICs

HSINCHU, Taiwan--(BUSINESS WIRE)--Edgecore Networks, a leading provider of open networking solutions, today announced the results of a joint performance study with its strategic technology partners. The study evaluated Broadcom’s Congestion Aware Sprayed Traffic (CAST) technology running on Edgecore AIS800-64O 800G switches in an AMD MI300X GPU cluster, and measured its impact on collective communication workloads in AI and high-performance computing (HPC) environments.

Broadcom CAST optimizes multi-path communication by dynamically directing traffic based on real-time congestion metrics, specifically round-trip time (RTT). This intelligent traffic distribution significantly improves performance over traditional load balancing methods, particularly in distributed training, inference, and parallel computation.
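The idea of steering traffic by measured RTT can be illustrated with a minimal sketch. This is not Broadcom's CAST algorithm, whose internals the announcement does not describe; it only assumes a hypothetical sender that keeps a smoothed RTT estimate per path and sprays packets with probability inversely proportional to that estimate. The function names, the smoothing factor, and the example RTT values are all illustrative assumptions.

```python
import random


def update_rtt(estimate_us, sample_us, alpha=0.125):
    """Smooth a per-path RTT estimate with an exponentially weighted moving average.

    alpha is an assumed smoothing factor, not a CAST parameter.
    """
    return (1 - alpha) * estimate_us + alpha * sample_us


def pick_path(rtt_us, rng):
    """Choose a path index with probability proportional to 1 / RTT.

    Paths with a lower measured RTT (less queueing) are chosen more often.
    """
    weights = [1.0 / rtt for rtt in rtt_us]
    return rng.choices(range(len(rtt_us)), weights=weights, k=1)[0]


def spray(num_packets, rtt_us, seed=0):
    """Spray packets across paths and return how many each path received."""
    rng = random.Random(seed)
    counts = [0] * len(rtt_us)
    for _ in range(num_packets):
        counts[pick_path(rtt_us, rng)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical RTTs in microseconds; path 2 is congested.
    print(spray(10_000, [10.0, 10.0, 40.0, 10.0]))
```

In this sketch the congested path still receives some traffic, so its RTT keeps being sampled and it can regain share once the congestion clears. A hash-based scheme that pins each flow to one path has no such feedback.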

The study focused on four key RCCL collective operations:

All-Reduce

All-Gather

Reduce-Scatter

All-To-All
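For readers less familiar with these collectives, the following sketch shows what each one computes, using plain Python lists in place of GPU buffers. It models the semantics only and makes no RCCL calls. It assumes every rank holds a buffer of equal length that divides evenly by the number of ranks, and uses summation as the reduction.

```python
def all_reduce(bufs):
    """Every rank ends up with the element-wise sum of all ranks' buffers."""
    total = [sum(column) for column in zip(*bufs)]
    return [list(total) for _ in bufs]


def all_gather(bufs):
    """Every rank ends up with the concatenation of all ranks' buffers."""
    gathered = [x for buf in bufs for x in buf]
    return [list(gathered) for _ in bufs]


def reduce_scatter(bufs):
    """Buffers are summed element-wise; rank r keeps only the r-th chunk of the sum."""
    n = len(bufs)
    total = [sum(column) for column in zip(*bufs)]
    chunk = len(total) // n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]


def all_to_all(bufs):
    """Rank dst receives the dst-th chunk from every rank, in source-rank order."""
    n = len(bufs)
    chunk = len(bufs[0]) // n
    return [
        [x for src in bufs for x in src[dst * chunk:(dst + 1) * chunk]]
        for dst in range(n)
    ]


if __name__ == "__main__":
    ranks = [[1, 2], [3, 4]]  # two ranks, two elements each
    print(all_reduce(ranks))
    print(all_gather(ranks))
    print(reduce_scatter(ranks))
    print(all_to_all(ranks))
```

Each of these operations moves data between many ranks at once, which is why collective workloads are sensitive to how evenly traffic is spread across network paths.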

Testing was conducted across three cluster configurations: oversubscribed (2:1), non-blocking (1:1), and undersubscribed (1:2). CAST delivered performance improvements in all three scenarios. Highlights include:

Oversubscription (2:1): Up to 26.7% improvement

Non-blocking (1:1): Up to 35.6% improvement

Undersubscription (1:2): Up to 29.8% improvement

The performance diagram is available on the Edgecore website:

https://www.edge-core.com/press-release/edgecore-networks-demonstrates-up-to-35-6-performance-improvement-in-collective-communication-benchmark/

Edgecore contributed its expertise in open networking, high-speed Ethernet fabrics, and multi-rail RDMA environments, supporting end-to-end, system-level performance tuning throughout the study. The study used a SONiC and Broadcom Thor 2 NIC telemetry and monitoring solution, which provided real-time, granular visibility into configuration, utilization, thermal metrics, and PFC/DCQCN congestion control states for accurate performance validation.

“Mitigating congestion is of vital importance to AI networks,” said Karen Schramm, vice president of Architecture, Data Center Solutions Group, Broadcom. “Enhanced with our CAST technology, Edgecore’s networking solutions effectively limit congestion, delivering compelling performance improvements in AI applications as well as rapid link failure recovery without breaking compatibility with RoCEv2 standards, thereby ensuring compatibility with existing hardware.”

“As AI workloads scale rapidly, network performance and predictability have become critical,” said Nanda Ravindran, VP PLM of Edgecore Networks. “By integrating Broadcom’s CAST technology with Enterprise SONiC running on Edgecore’s 800G AI open networking solutions, and applying coordinated, system-level co-optimization, we have found a feasible approach to mitigating congestion and delivering measurable improvements in collective communication performance while maintaining full RoCEv2 compatibility.”

This collaboration underscores Edgecore Networks’ commitment to advancing open networking technologies and delivering high-impact networking research for next-generation AI and HPC systems.

