MiMo-V2-Omni

mimo Reasoning Tool Calling Attachments Open Weights Structured Output

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step...

Providers 7
Released Mar 18, 2026
Input Modalities text, image, video, audio, pdf
Output Modalities text
Tarsk Use coding

Available Providers (7)

Provider                       Model ID             Input Cost  Output Cost  Context  Max Output
Xiaomi Token Plan (China)      mimo-v2-omni         $0.00/MTok  $0.00/MTok   256K     128K
Xiaomi Token Plan (Singapore)  mimo-v2-omni         $0.00/MTok  $0.00/MTok   256K     128K
Xiaomi Token Plan (Europe)     mimo-v2-omni         $0.00/MTok  $0.00/MTok   256K     128K
OpenRouter                     xiaomi/mimo-v2-omni  $0.40/MTok  $2.00/MTok   262.1K   65.5K
OpenCode Go                    mimo-v2-omni         $0.40/MTok  $2.00/MTok   262.1K   64K
ZenMux                         xiaomi/mimo-v2-omni  $0.40/MTok  $2.00/MTok   265K     265K
Xiaomi                         mimo-v2-omni         $0.40/MTok  $2.00/MTok   256K     128K
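
The per-token prices above translate into a request cost with simple arithmetic: multiply each token count by its price per million tokens and divide by one million. The following is a minimal sketch of that calculation, not part of any provider's SDK. The function name `estimate_cost` and the `PRICES` dictionary are hypothetical names introduced for illustration; the prices are copied from the listing and may change. It also assumes plain per-token billing with no caching discounts or per-modality surcharges, which the listing does not describe.

```python
from decimal import Decimal

# USD per million tokens (input, output), copied from the provider listing.
# Hypothetical lookup table for illustration; the three Xiaomi Token Plan
# regions are collapsed into one entry since they list identical prices.
PRICES = {
    "Xiaomi Token Plan": (Decimal("0.00"), Decimal("0.00")),
    "OpenRouter": (Decimal("0.40"), Decimal("2.00")),
    "OpenCode Go": (Decimal("0.40"), Decimal("2.00")),
    "ZenMux": (Decimal("0.40"), Decimal("2.00")),
    "Xiaomi": (Decimal("0.40"), Decimal("2.00")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request.

    Assumes flat per-token billing: no caching discounts and no
    per-modality pricing differences.
    """
    input_price, output_price = PRICES[provider]
    total = input_price * input_tokens + output_price * output_tokens
    return total / MILLION


if __name__ == "__main__":
    # 100,000 input tokens and 20,000 output tokens on OpenRouter.
    print(estimate_cost("OpenRouter", 100_000, 20_000))
```

`Decimal` is used instead of `float` so that small per-request amounts add up exactly when summed over many calls.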

Capabilities

Reasoning
Tool Calling
Attachments
Open Weights
Structured Output