Qwen/Qwen3-VL-8B-Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Available Providers (4)
| Provider | Model ID | Input Cost | Output Cost | Context | Max Output | Docs |
|---|---|---|---|---|---|---|
| | qwen/qwen3-vl-8b-instruct | $0.08/MTok | $0.50/MTok | 131.1K | 32.8K | |
| | qwen/qwen3-vl-8b-instruct | $0.08/MTok | $0.50/MTok | 131.1K | 32.8K | |
| | Qwen/Qwen3-VL-8B-Instruct | $0.18/MTok | $0.68/MTok | 262K | 262K | |
| | Qwen/Qwen3-VL-8B-Instruct | $0.18/MTok | $0.68/MTok | 262K | 262K | |
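
As a rough illustration of how the per-MTok prices in the table translate into a per-request cost, here is a minimal Python sketch. The `PRICING` dictionary and the `estimate_cost` helper are hypothetical names introduced for this example and are not part of any provider's SDK. The sketch assumes billing is strictly linear in token count at the listed rates, and that any image or video input has already been converted into a token count, which varies by provider.

```python
# Prices in USD per million tokens (MTok), copied from the provider table.
# Keys are the Model IDs as listed; values are (input price, output price).
PRICING = {
    "qwen/qwen3-vl-8b-instruct": (0.08, 0.50),
    "Qwen/Qwen3-VL-8B-Instruct": (0.18, 0.68),
}

TOKENS_PER_MTOK = 1_000_000


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes simple linear pricing with no minimum charge, caching
    discount, or separate surcharge for image input.
    """
    input_price, output_price = PRICING[model_id]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Compare both listed price tiers for the same hypothetical request.
    for model_id in PRICING:
        cost = estimate_cost(model_id, input_tokens=10_000, output_tokens=1_000)
        print(f"{model_id}: ${cost:.5f}")
```

For a request with 10,000 input tokens and 1,000 output tokens, this works out to roughly $0.0013 at the lower-priced listings and roughly $0.0025 at the higher-priced ones, under the assumptions stated above.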
Capabilities
Reasoning
Tool Calling
Attachments
Open Weights
Structured Output