Xiaomi: MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step...
Available Providers (1)
| Provider | Model ID | Input Cost | Output Cost | Context | Max Output | Docs |
|---|---|---|---|---|---|---|
| | xiaomi/mimo-v2-omni | $0.40/MTok | $2.00/MTok | 262.1K | 65.5K | |
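
As a rough illustration of how the per-token prices in the table translate into a per-request cost, here is a minimal Python sketch. It assumes that "MTok" means one million tokens and that billing is linear in token count; the function name `estimate_cost_usd` is hypothetical and not part of any provider API.

```python
# Prices taken from the provider table above (USD per million tokens).
INPUT_USD_PER_MTOK = 0.40
OUTPUT_USD_PER_MTOK = 2.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 100,000-token prompt with a 10,000-token reply.
print(f"${estimate_cost_usd(100_000, 10_000):.4f}")  # prints $0.0600
```

Under these assumptions, output tokens cost five times as much as input tokens, so long generations tend to dominate the bill even when the prompt is large.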
Capabilities
Reasoning
Tool Calling
Attachments
Open Weights
Structured Output