Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

November 6, 2024 · 9m 16s

Show Notes

The paper describes Hunyuan-Large, a large open-source language model developed by Tencent. The model uses a Mixture of Experts (MoE) architecture, in which a router sends each token to a small subset of specialized expert sub-networks, so only part of the model (52 billion of its 389 billion total parameters) is active for any given token. It was trained on a large dataset that includes a significant amount of synthetic data, and it applies several techniques to improve efficiency and training, such as key-value (KV) cache compression, expert routing, and expert-specific learning rate scaling. The authors evaluate the model on a wide range of benchmarks and report strong results in language understanding, generation, logical reasoning, mathematics, coding, and long-context tasks. The code and checkpoints are publicly available, with the stated aim of accelerating future innovation and applications in the LLM community.
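To make the MoE idea more concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Hunyuan-Large's actual routing strategy: the function names (`route_top_k`, `moe_forward`), the toy scalar experts, and the choice of k=2 are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and combine their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical toy experts: each is a scalar function standing in for a sub-network.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]

# Hypothetical router scores for one token; experts 1 and 3 score highest.
logits = [0.1, 2.0, -1.0, 2.0]

selected = route_top_k(logits, k=2)
output = moe_forward(3.0, experts, logits, k=2)
```

Only the two selected experts are evaluated for this input, which is the property that lets an MoE model keep its activated parameter count well below its total parameter count.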

https://arxiv.org/pdf/2411.02265
