Tencent’s tech team has optimized DeepSeek’s open-source DeepEP communication framework, boosting its performance across different network environments, according to the Chinese AI startup. Testing showed a 100% improvement on RoCE networks and a 30% gain on InfiniBand (IB), offering more efficient solutions for AI model training. On GitHub, DeepSeek acknowledged that the Chinese tech giant’s contribution had led to a “huge speedup.” DeepEP is a communication library tailored for mixture-of-experts (MoE) and expert parallelism (EP), supporting high-throughput, low-latency GPU kernels and low-precision computing, including FP8. Tencent’s Starlink Networking team identified two main bottlenecks: underutilized dual-port NIC bandwidth and CPU control latency. Targeted optimizations addressing these bottlenecks produced the RoCE and IB gains. The enhanced framework is now fully open-source and has been deployed in training Tencent’s Hunyuan large model, where it has shown strong versatility in environments built on Tencent’s Starlink and H20 servers, Chinese tech media outlet iThome reported. [iThome, in Chinese]