Broadcom drops the hammer on AI networking with Thor Ultra

Network infrastructure has become a performance constraint in large-scale AI training, and Broadcom has spent the past three years building an AI networking portfolio that aims to solve this problem.

Over the last several months, the company rolled out Tomahawk 6 switches for scale-out networking and Jericho 4 for inter-data-center connectivity. Thor Ultra, the newest piece of the portfolio, complies with Ultra Ethernet Consortium (UEC) 1.0 specifications and introduces hardware-accelerated capabilities to modernize RDMA.

“This is not the last piece of the puzzle, I would say, but a very important piece of what we have been working on for the last three years and delivered over the last three to four months, which is a complete portfolio,” Hasan Siraj, head of software products and ecosystem at Broadcom, told Network World. “The key message for you is this NIC is fully compliant with Ultra Ethernet features right at 800 gig, and there is nothing else in the industry that can cater to this.”

Scale-out vs. scale-up: Understanding the market segmentation

Thor Ultra targets a specific networking domain that differs fundamentally from GPU-to-GPU interconnects. 

Within a single rack, GPUs connect through technologies like NVLink in what Broadcom terms “scale-up” domains. These typically span 72 to 256 XPUs that directly access each other’s memory. Thor Ultra addresses “scale-out” connectivity, the rack-to-rack networking required to create clusters spanning hundreds of thousands of XPUs. This positions it against Nvidia’s Ethernet offerings (Spectrum-X switches and BlueField NICs) and InfiniBand solutions rather than competing with NVLink.

“When you need to get out of that rack and you need to connect multiple of these racks together, you need to scale out. This is where this NIC gets used,” Siraj explained.

The NIC ships in two SerDes configurations. The 100G version provides eight 100G lanes. The 200G version offers four 200G lanes. Both deliver 800G aggregate bandwidth through 16 lanes of PCIe Gen 6. The dual configuration strategy accommodates both current 100G ecosystems and emerging 200G deployments.
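
As a quick sanity check on those figures, here is a minimal sketch (illustrative arithmetic only, not Broadcom tooling; the per-lane PCIe rate is the published Gen 6 signaling rate):

```python
# Illustrative arithmetic only: checking that both SerDes options reach 800G aggregate,
# and that a PCIe Gen 6 x16 host interface has enough raw bandwidth to feed it.

configs = {
    "100G SerDes": {"lanes": 8, "gbps_per_lane": 100},
    "200G SerDes": {"lanes": 4, "gbps_per_lane": 200},
}

for name, cfg in configs.items():
    aggregate = cfg["lanes"] * cfg["gbps_per_lane"]
    print(f"{name}: {cfg['lanes']} x {cfg['gbps_per_lane']}G = {aggregate}G aggregate")

# PCIe Gen 6 signals at 64 GT/s per lane; 16 lanes give roughly 1 Tb/s (~128 GB/s)
# of raw bandwidth per direction, comfortably above the 800G network side.
pcie_lanes, gt_per_lane = 16, 64
print(f"PCIe Gen 6 x{pcie_lanes}: ~{pcie_lanes * gt_per_lane} Gb/s raw per direction")
```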

[Image: Broadcom Thor Ultra PCIe board. Credit: Broadcom]

Breaking RDMA’s architectural constraints

Traditional RDMA protocols carry design limitations from their origins two to three decades ago. They lack multipathing support, cannot handle out-of-order packet delivery and rely on Go-Back-N retransmission. Under Go-Back-N, a single dropped packet forces retransmission of that packet plus every subsequent packet in the sequence.

These limitations become critical at scale. Network congestion increases packet loss. Go-Back-N amplifies the problem by flooding already-congested links with redundant retransmissions. Thor Ultra implements four architectural changes to break these constraints.

  • Packet-level multipathing. The NIC divides its eight 100G lanes into separate network planes. Packets from a single message can be distributed across all planes for load balancing. Standard RDMA requires all packets in a flow to traverse a single path, preventing this optimization.
  • Out-of-order data placement. Thor Ultra writes packets directly to XPU memory as they arrive, regardless of sequence. The NIC does not buffer packets awaiting in-order delivery. Instead, it tracks packet state and places each into its correct memory location immediately.
  • Selective acknowledgment and retransmission. Thor Ultra replaces Go-Back-N with selective acknowledgment. When packets 3 and 6 are missing from a sequence of 1 through 8, the NIC sends a SACK indicating exactly which packets arrived and which are missing, and the sender retransmits only packets 3 and 6 (a short sketch after this list illustrates the difference).
  • Programmable congestion control. The NIC implements a hardware pipeline that supports multiple congestion control algorithms. Two schemes are currently available: receiver-based congestion control (receivers send credits to senders) and sender-based approaches (senders calculate round-trip time to determine transmission rates). The programmable pipeline can accommodate future UEC specification revisions or custom hyperscaler algorithms.
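
A minimal toy model of the retransmission behavior described above, not Broadcom's implementation: a receiver places each packet at its final memory offset as it arrives (out-of-order placement), reports exactly which sequence numbers are missing (selective acknowledgment), and the two retransmission rules are compared. Packet sizes and names are invented for the example.

```python
# Toy model of selective acknowledgment with out-of-order placement versus Go-Back-N.
# Purely illustrative; packet sizes and field names are invented for the example.

def place_out_of_order(arrived, total, packet_size=4096):
    """Place each arriving packet at its final memory offset and track what is missing."""
    memory = {}          # stands in for direct placement into XPU memory
    received = set()
    for seq, payload in arrived:
        memory[seq * packet_size] = payload   # written immediately, regardless of order
        received.add(seq)
    missing = sorted(set(range(1, total + 1)) - received)
    return memory, missing

def retransmit_go_back_n(missing, total):
    """Go-Back-N: everything from the first lost packet onward is resent."""
    return list(range(missing[0], total + 1)) if missing else []

def retransmit_selective(missing):
    """Selective retransmission: only the packets named in the SACK are resent."""
    return missing

# Packets 3 and 6 are lost out of a 1..8 sequence, and some packets arrive out of order.
arrived = [(1, b"a"), (2, b"b"), (5, b"e"), (7, b"g"), (4, b"d"), (8, b"h")]
_, missing = place_out_of_order(arrived, total=8)

print("SACK reports missing:", missing)                           # [3, 6]
print("Go-Back-N resends:   ", retransmit_go_back_n(missing, 8))  # [3, 4, 5, 6, 7, 8]
print("Selective resends:   ", retransmit_selective(missing))     # [3, 6]
```

In this toy run, the single gap at packet 3 forces Go-Back-N to resend six packets, while selective retransmission sends only the two that were actually lost, which is exactly the traffic reduction that matters on an already-congested fabric.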

Performance and power

Thor Ultra consumes approximately 50 watts. This compares to 125-150W for products like Nvidia’s BlueField 3 DPU. The power difference stems from architectural choices rather than process technology.

DPUs target multiple use cases including front-end networking (requiring deep packet inspection and encryption), storage offload and security functions. They incorporate ARM cores, large memory subsystems and extensive acceleration engines. Thor Ultra strips out everything not required for AI backend networking.

Overall, Broadcom projects 10-15% improvement in job completion time through the combination of efficient load balancing, out-of-order delivery and selective retransmission. The company argues this improvement justifies the network investment.

“We believe we can achieve at least 10 to 15% improvement in job completion time, which, if you look at when you’re building a cluster, whether you talk about an 8,000-node cluster, or 100,000-node cluster, the network is about 10-15% of the cost,” Siraj said. “So, the network can pay for itself with this kind of innovation.”
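
The arithmetic behind that claim can be sketched with placeholder numbers; the 10-15% figures come from the quote, while the absolute cluster cost below is a hypothetical chosen only to make the comparison concrete.

```python
# Back-of-the-envelope version of the "network pays for itself" argument.
# The 10-15% figures come from the article; the cluster cost is an assumed placeholder.

cluster_cost = 500e6            # hypothetical total cluster cost in dollars
network_share = 0.12            # network is ~10-15% of cluster cost
jct_improvement = 0.12          # ~10-15% faster job completion

network_cost = cluster_cost * network_share
# A 12% reduction in job completion time means the same cluster finishes
# roughly 1 / (1 - 0.12) - 1 ≈ 13.6% more work for the same spend.
extra_throughput = 1 / (1 - jct_improvement) - 1
value_of_speedup = cluster_cost * extra_throughput

print(f"Network cost:             ${network_cost / 1e6:.0f}M")
print(f"Effective extra capacity: {extra_throughput:.1%} ≈ ${value_of_speedup / 1e6:.0f}M of cluster value")
```

With those assumed numbers, the extra effective capacity (about $68M worth of cluster time) exceeds the roughly $60M network spend, which is the shape of the argument Siraj is making.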

Thor Ultra is sampling now with availability in PCIe and OCP 3.0 form factors. Broadcom expects roughly equal volume between both formats over the next two years. The company offers three additional consumption models beyond standard cards. Customers can purchase discrete chips for custom board designs, and XPU or GPU manufacturers can integrate Thor Ultra as a chiplet. Broadcom will license the design as intellectual property.
