🚀 Just published: "Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts" 🌐

Introducing Loss-Free Balancing, our latest innovation in MoE models, which removes the need for an auxiliary loss. By dynamically adjusting a per-expert bias on the routing scores, it keeps expert load balanced without the unwanted gradients that an auxiliary loss introduces. Validated on models up to 3B parameters, our approach delivers better validation loss and better load balance than traditional auxiliary-loss methods.
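
To make the idea concrete, here is a minimal toy sketch of bias-adjusted routing, not the implementation from the report. It assumes the bias is added to the routing scores only when picking the top-k experts, and that after each batch every bias moves by a fixed step toward the mean load. The names `route_top_k`, `update_bias`, and `simulate`, the step size, and the synthetic skewed scores are all hypothetical choices for illustration.

```python
import random


def route_top_k(scores, bias, k):
    """Pick the k experts with the highest (score + bias) for one token."""
    biased = [s + b for s, b in zip(scores, bias)]
    ranked = sorted(range(len(scores)), key=lambda e: biased[e], reverse=True)
    return ranked[:k]


def update_bias(bias, load, step):
    """Lower the bias of overloaded experts, raise it for underloaded ones.

    Assumption: a sign-based update with a fixed step, applied once per batch.
    """
    mean_load = sum(load) / len(load)
    # (mean_load > c) - (mean_load < c) is +1, 0, or -1.
    return [b + step * ((mean_load > c) - (mean_load < c))
            for b, c in zip(bias, load)]


def simulate(num_experts=4, k=1, tokens_per_batch=256, batches=200,
             step=0.01, seed=0):
    """Route synthetic tokens and return the final biases and per-batch loads."""
    rng = random.Random(seed)
    # Hypothetical skew: later experts receive systematically higher raw scores,
    # so unbiased top-k routing would overload them.
    skew = [0.3 * e for e in range(num_experts)]
    bias = [0.0] * num_experts
    history = []
    for _ in range(batches):
        load = [0] * num_experts
        for _ in range(tokens_per_batch):
            scores = [rng.random() + skew[e] for e in range(num_experts)]
            for e in route_top_k(scores, bias, k):
                load[e] += 1
        history.append(load)
        # No gradient is involved: the bias is adjusted directly from the load.
        bias = update_bias(bias, load, step)
    return bias, history


if __name__ == "__main__":
    final_bias, history = simulate()
    print("first batch load:", history[0])
    print("last batch load: ", history[-1])
    print("final bias:      ", [round(b, 2) for b in final_bias])
```

In this toy setup the first batch is heavily skewed toward the high-scoring experts, and the per-expert loads move toward the mean as the biases adapt. Because the bias is updated outside of backpropagation, it adds no gradient term to the training objective, which is the property the post highlights.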

Technical report: arxiv.org/abs/2408.15664
2025/05/31 14:53:49