ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...

Providers: 1
Released: Jul 23, 2025
Input Modalities: image, text
Output Modalities: text

Available Providers (1)

Provider      Model ID                  Input Cost  Output Cost  Context  Max Output  Docs
Kilo Gateway  bytedance/ui-tars-1.5-7b  $0.10/MTok  $0.20/MTok   128K     2.0K
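
The listed prices can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name estimate_cost_usd and the constants are hypothetical, "MTok" is read as one million tokens, and "2.0K" is read as 2,000 output tokens. How an image attachment is counted in input tokens is not stated here, so the caller supplies token counts directly.

```python
# Rough cost estimate from the listed per-token prices.
# Assumptions (not confirmed by the listing): "MTok" = 1,000,000 tokens,
# "2.0K" max output = 2,000 tokens. All names here are hypothetical.

INPUT_USD_PER_MTOK = 0.10   # listed input cost
OUTPUT_USD_PER_MTOK = 0.20  # listed output cost
MAX_OUTPUT_TOKENS = 2_000   # listed max output, read as 2,000 tokens
TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens exceeds the listed max output")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Example: a 50,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(50_000, 1_000):.4f}")
```

At these rates, output tokens cost twice as much as input tokens, but the listed 2.0K output cap means most of a large request's cost comes from the input side.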

Capabilities

Reasoning
Tool Calling
Attachments
Open Weights
Structured Output