Xiaomi has introduced UltraSpeed mode for its MiMo-V2.5-Pro large language model, claiming it can surpass 1,000 tokens per second on general-purpose GPUs.
The 1-trillion-parameter model was developed jointly with TileRT.
Xiaomi attributes the milestone to what it calls the "ultimate co-design" of the model and its underlying system.
For context, the earlier MiMo-V2-Flash model generated 150 tokens per second at its December 2025 launch, which works out to roughly 110 words per second.
The new mode is about 10 times faster than standard MiMo-V2.5-Pro API access.
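The throughput figures above can be turned into rough words-per-second and wait-time estimates. The sketch below is illustrative only: the words-per-token ratio is derived from the article's own 150-token, 110-word figures for MiMo-V2-Flash, and applying it to UltraSpeed is an assumption, as are the helper names and the 3,000-token example response.

```python
# Ratio implied by the article's MiMo-V2-Flash figures (150 tokens ~ 110 words).
# Assumption: the same ratio holds for MiMo-V2.5-Pro output.
WORDS_PER_TOKEN = 110 / 150


def words_per_second(tokens_per_second: float) -> float:
    """Convert a token throughput into an approximate word throughput."""
    return tokens_per_second * WORDS_PER_TOKEN


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady throughput (ignores queueing and latency)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical 3,000-token response at the two quoted speeds.
    for label, tps in [("MiMo-V2-Flash (150 tok/s)", 150), ("UltraSpeed (1,000 tok/s)", 1000)]:
        print(f"{label}: ~{words_per_second(tps):.0f} words/s, "
              f"{seconds_to_generate(3000, tps):.1f} s for 3,000 tokens")
```

These are steady-state estimates; real response times also depend on queueing and time to first token, which the article does not quantify.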
Pricing and availability
The speed improvement comes at a cost: the MiMo-V2.5-Pro-UltraSpeed API is priced at 3x the standard rate.
For comparison, the regular MiMo-V2.5-Pro charges, per million tokens, 0.025 yuan for input on a cache hit, 3 yuan for input on a cache miss, and 6 yuan for output.
Xiaomi describes the UltraSpeed mode as a "3x price increase" offering a "10x output experience." The Token Plan is not supported; this is API trial access only.
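To make the pricing concrete, here is a minimal cost sketch. It assumes the 3x multiplier applies uniformly to each of the three standard rates quoted above, which the article does not spell out; the function name, parameters, and token counts are hypothetical.

```python
# Standard MiMo-V2.5-Pro rates quoted in the article, in yuan per million tokens.
STANDARD_RATES = {
    "input_cache_hit": 0.025,
    "input_cache_miss": 3.0,
    "output": 6.0,
}

# Assumption: the "3x" UltraSpeed price applies equally to every rate.
ULTRASPEED_MULTIPLIER = 3


def request_cost_yuan(hit_tokens: int, miss_tokens: int, output_tokens: int,
                      multiplier: float = 1) -> float:
    """Estimate the cost of one request in yuan at the given price multiplier."""
    per_million = 1_000_000
    base = (
        hit_tokens / per_million * STANDARD_RATES["input_cache_hit"]
        + miss_tokens / per_million * STANDARD_RATES["input_cache_miss"]
        + output_tokens / per_million * STANDARD_RATES["output"]
    )
    return base * multiplier


if __name__ == "__main__":
    # Hypothetical workload: 1M uncached input tokens and 1M output tokens.
    standard = request_cost_yuan(0, 1_000_000, 1_000_000)
    ultra = request_cost_yuan(0, 1_000_000, 1_000_000, ULTRASPEED_MULTIPLIER)
    print(f"standard: {standard:.2f} yuan, UltraSpeed: {ultra:.2f} yuan")
```

Under these assumptions, the same workload costs three times as much in UltraSpeed mode while completing roughly ten times faster.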
Due to limited high-speed inference resources, Xiaomi is running an application-based trial from June 9 to June 23, 2026.
Approval is not guaranteed, and the company says it will prioritize enterprises and professional developers with genuine business needs.
Approved users receive a two-week free Chat experience with restrictions: a maximum of 10 queue entries per account per day, sessions capped at 30 minutes, and automatic resource release if idle for more than 5 minutes.
MiMo-V2.5-Pro originally launched in April 2026 as part of Xiaomi's growing model family, which now includes text, voice, and multimodal capabilities.