MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step...
Benchmarks
Available Providers (7)
| Provider | Model ID | Input Cost | Output Cost | Context | Max Output | Docs |
|---|---|---|---|---|---|---|
| | mimo-v2-omni | $0.00/MTok | $0.00/MTok | 256K | 128K | |
| | mimo-v2-omni | $0.00/MTok | $0.00/MTok | 256K | 128K | |
| | mimo-v2-omni | $0.00/MTok | $0.00/MTok | 256K | 128K | |
| | xiaomi/mimo-v2-omni | $0.40/MTok | $2.00/MTok | 262.1K | 65.5K | |
| | mimo-v2-omni | $0.40/MTok | $2.00/MTok | 262.1K | 64K | |
| | xiaomi/mimo-v2-omni | $0.40/MTok | $2.00/MTok | 265K | 265K | |
| | mimo-v2-omni | $0.40/MTok | $2.00/MTok | 256K | 128K | |
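
The per-MTok prices in the table can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: it assumes the $0.40 input and $2.00 output rates listed for the paid providers, that "MTok" means one million tokens, and that billing is linear in token count. The function name `estimate_cost_usd` and the example token counts are hypothetical, not part of any provider's API.

```python
from decimal import Decimal

# Prices per million tokens (MTok), copied from the paid rows of the table.
INPUT_USD_PER_MTOK = Decimal("0.40")
OUTPUT_USD_PER_MTOK = Decimal("2.00")
TOKENS_PER_MTOK = Decimal(1_000_000)  # assumption: 1 MTok = 1,000,000 tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed per-MTok rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MTOK * INPUT_USD_PER_MTOK
    output_cost = Decimal(output_tokens) / TOKENS_PER_MTOK * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical request: 100,000 input tokens and 10,000 output tokens.
example_cost = estimate_cost_usd(100_000, 10_000)
print(f"${example_cost:.4f}")  # prints "$0.0600"
```

`Decimal` is used instead of `float` so that small currency amounts do not pick up binary rounding error. Actual invoices may differ if a provider applies caching discounts, minimum charges, or separate rates for image, video, or audio inputs.
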
Capabilities
Reasoning
Tool Calling
Attachments
Open Weights
Structured Output