Xiaomi has quietly transformed into one of the most ambitious AI players globally over the past year and a half.
The company now offers a suite of large language models, voice cloning, and an autonomous phone agent, all backed by massive investment.
Here is a complete overview of Xiaomi's AI and LLM ecosystem.
MiMo LLM Family
Xiaomi's AI journey began in April 2025 with the release of MiMo-7B, its first open-source large language model.
Despite having only 7 billion parameters, it scored 95.8% on MATH-500 and outperformed OpenAI's o1-mini on math competition benchmarks.
The model was trained on 200 billion reasoning tokens and released under the MIT license.
In December 2025, Xiaomi launched MiMo-V2-Flash, a 309-billion-parameter Mixture-of-Experts model with about 15 billion active parameters.
It matched GPT-5 and Claude 4.5 Sonnet on software engineering tests, generated responses at 150 tokens per second, and cost only $0.1 per million input tokens.
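To put the pricing and speed figures in perspective, here is a rough back-of-the-envelope sketch. It uses only the two numbers quoted above ($0.1 per million input tokens and 150 tokens per second); the workload sizes and function names are hypothetical, and output-token pricing is left out because it is not stated here.

```python
# Figures quoted in the article for MiMo-V2-Flash.
INPUT_PRICE_PER_MILLION_USD = 0.1   # USD per 1,000,000 input tokens
TOKENS_PER_SECOND = 150             # quoted generation speed


def input_cost_usd(input_tokens: int) -> float:
    """Cost of sending `input_tokens` tokens to the model, in USD."""
    return input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION_USD


def generation_seconds(output_tokens: int) -> float:
    """Time to generate `output_tokens` tokens at the quoted speed."""
    return output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # Hypothetical workload: a 2-million-token prompt and a 3,000-token reply.
    prompt_tokens = 2_000_000
    reply_tokens = 3_000
    print(f"Input cost: ${input_cost_usd(prompt_tokens):.2f}")
    print(f"Generation time: {generation_seconds(reply_tokens):.0f} s")
```

Real-world bills and latencies would also depend on output pricing, network overhead, and provider load, none of which are covered by the figures above.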
March 2026 brought MiMo-V2-Pro, a trillion-parameter flagship with 42 billion active parameters and a 1-million-token context window.
It first appeared anonymously on OpenRouter as "Hunter Alpha," processing over 1.5 trillion tokens before Xiaomi claimed it.
Companion models included MiMo-V2-Omni (multimodal) and MiMo-V2-TTS.
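The gap between total and active parameters is what makes Mixture-of-Experts models cheap to run relative to their size. As a quick illustration, the sketch below computes the share of parameters active per token from the figures quoted above. It assumes "trillion-parameter" means roughly 1,000 billion and treats the "about 15 billion" figure as exact; the function name is hypothetical.

```python
def active_share(total_billion: float, active_billion: float) -> float:
    """Fraction of a model's parameters that are active for each token."""
    return active_billion / total_billion


# (total parameters, active parameters) in billions, as quoted in the article.
# The 1,000 figure for MiMo-V2-Pro is an assumption based on "trillion-parameter".
models = {
    "MiMo-V2-Flash": (309, 15),
    "MiMo-V2-Pro": (1000, 42),
}

if __name__ == "__main__":
    for name, (total, active) in models.items():
        share = active_share(total, active) * 100
        print(f"{name}: {share:.1f}% of parameters active per token")
```

On these numbers, both models activate under 5% of their weights for any given token.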
In late April 2026, Xiaomi merged its V2 family into MiMo-V2.5-Pro, a 1.02-trillion-parameter model handling text, image, audio, and video.
It ran at 60-80 tokens per second for complex tasks, while the lighter MiMo-V2.5 hit 100-150 tokens per second.
V2.5-Pro ranked as the world's top open-source model for agentic capabilities on Artificial Analysis.
In early June 2026, Xiaomi launched MiMo Code, a terminal-based AI coding agent with persistent memory that tracks decisions across long projects.
Vision and Audio Models
Xiaomi released MiMo-VL (Vision-Language) and its home-focused variant MiMo-VL-Miloco-7B.
The Miloco model recognizes gestures like thumbs-up and peace signs, and identifies household activities such as watching TV or reading.
MiDashengLM-7B, released in August 2025, is an audio AI model trained on 38,662 hours of data. It understands speech, music, environmental sounds, and speaker emotion.