Xiaomi: MiMo-V2-Omni

Reasoning, Tool Calling, Attachments, Open Weights

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Providers: 1

Released: Mar 18, 2026

Input Modalities: audio, image, text, video

Output Modalities: text

Tarsk Use: coding

Available Providers (1)

Provider	Model ID	Input Cost	Output Cost	Context (tokens)	Max Output (tokens)	Docs
Kilo Gateway	`xiaomi/mimo-v2-omni`	$0.40/MTok	$2.00/MTok	262.1K	65.5K	
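
To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of one request from the rates in the provider table above. The function name `estimate_cost_usd` and the example token counts are illustrative assumptions, not part of any provider API, and the sketch assumes "MTok" means one million tokens and that billing is linear in token count.

```python
# Rates copied from the provider table above (USD per million tokens).
INPUT_USD_PER_MTOK = 0.40
OUTPUT_USD_PER_MTOK = 2.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes simple linear pricing with no caching discounts or
    per-request fees, which the listing does not mention.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# Example: a 200,000-token prompt with an 8,000-token reply.
cost = estimate_cost_usd(200_000, 8_000)
print(f"Estimated cost: ${cost:.4f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long generations dominate the bill even when the prompt is large.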

Capabilities

Reasoning

Tool Calling

Attachments

Open Weights

Structured Output