NVIDIA Nemotron 3 in Tarsk

NVIDIA shipped Nemotron 3 in three sizes: Nano (30B), Super (120B), and Ultra (550B). All three are open-weight MoE models trained for tool use and long context. Tarsk already lists Nemotron 3 endpoints from nine providers, so you can point an agent at one without writing a custom integration.

The model line

Nemotron 3 targets agent workloads: multi-step tool calls, long files in context, retries when a command fails. NVIDIA released weights, training data (where redistribution is allowed), and recipes under the Open Model License. You can self-host or call a hosted endpoint.

The backbone mixes Mamba state-space layers with Transformer attention in a Mixture-of-Experts layout. Each forward pass activates only a small slice of the total parameters: about 3B of 30B for Nano, 12B of 120B for Super, and 55B of 550B for Ultra.
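As a quick check on those figures, the sketch below computes the active share for each tier. The parameter counts are the rounded numbers quoted in this post, and the helper name `active_fraction` is only illustrative.

```python
# Total and active parameter counts in billions, as quoted in this post (rounded).
TIERS = {
    "Nano": (30, 3),
    "Super": (120, 12),
    "Ultra": (550, 55),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used in one forward pass."""
    return active_b / total_b


for name, (total_b, active_b) in TIERS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {active_b}B of {total_b}B active ({share:.0%})")
```

With these rounded counts, each tier activates roughly a tenth of its weights per token.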

Nano, Super, Ultra

  • Nano (30B total, ~3B active) fits tight inference budgets. NVIDIA reports 4× higher throughput than Nemotron 2 Nano on comparable hardware. Good fit for debug loops, summarization, and retrieval-heavy agents.
  • Super (120B total, ~12B active) adds LatentMoE routing, Multi-Token Prediction, and NVFP4 training. NVIDIA positions it for multi-agent setups like ticket routing and IT automation.
  • Ultra (550B total, ~55B active) landed at Computex 2026. Same MoE tricks at frontier scale. NVIDIA cites up to 5× throughput versus the prior generation and roughly 30% lower operating cost on Blackwell hardware.

All three tiers accept up to 1M tokens of context. Checkpoints and training artifacts are on Hugging Face. Super and Ultra also ship NVFP4-quantized variants tuned for Blackwell GPUs.

What changes for your agents

Chat billing counts tokens per turn. Agent billing counts tokens across every tool call, planning step, and retry. Nemotron 3 optimizes for that second pattern.
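To make the difference concrete, here is a minimal sketch of the two accounting patterns. The step labels and token counts are made-up illustrations, not measurements from Tarsk or any provider.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str           # hypothetical labels: "plan", "tool_call", "retry", "answer"
    input_tokens: int   # prompt tokens sent for this step
    output_tokens: int  # tokens the model generated for this step


def total_tokens(steps: list[Step]) -> int:
    """Sum input and output tokens over every step of a task."""
    return sum(s.input_tokens + s.output_tokens for s in steps)


# A single chat turn: one request, one reply.
chat_turn = [Step("answer", 800, 300)]

# An agent task: the context grows as tool results and retries accumulate.
agent_task = [
    Step("plan", 800, 200),
    Step("tool_call", 1_200, 150),
    Step("retry", 1_500, 150),
    Step("answer", 1_900, 300),
]

print(total_tokens(chat_turn))   # 1100
print(total_tokens(agent_task))  # 6200
```

In this invented example the agent task costs several times the chat turn, even though the final answer is the same length.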

  • MoE routing limits the active parameters per token, so Super computes with about 12B of its 120B parameters per token instead of running like a dense 120B model.
  • Multi-Token Prediction generates several tokens per step, which speeds long outputs and supports speculative decoding without a separate draft model.
  • Mamba layers process context at close to linear cost in sequence length, while standard attention grows quadratically. That matters when you paste a whole repo into the prompt.
  • NeMo Gym and NeMo RL give you post-training environments if you need a domain-specific agent on top of the base checkpoints.
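The context-scaling point in the list above can be illustrated with idealized cost curves. This is a rough sketch that ignores constant factors and assumes pure O(n²) attention versus a pure O(n) state-space scan. Nemotron 3 mixes both layer types, so its real cost falls somewhere in between.

```python
def relative_cost(context_tokens: int, baseline_tokens: int, exponent: int) -> float:
    """Cost relative to the baseline for a layer whose cost grows as n**exponent."""
    return (context_tokens / baseline_tokens) ** exponent


baseline = 10_000  # arbitrary reference context length for the comparison
for n in (10_000, 100_000, 1_000_000):
    attention = relative_cost(n, baseline, exponent=2)  # idealized full attention
    linear = relative_cost(n, baseline, exponent=1)     # idealized state-space scan
    print(f"{n:>9,} tokens  attention x{attention:,.0f}  linear x{linear:,.0f}")
```

Going from 10K to 1M tokens multiplies the idealized attention cost by 10,000 but the linear cost by only 100.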

Providers in Tarsk

Tarsk syncs provider catalogs on a schedule. When OpenRouter or Together AI lists a new Nemotron endpoint, it shows up under Settings → Models without waiting for a Tarsk release.

First-party and Ollama Cloud

  • NVIDIA at build.nvidia.com: Nano, Super, Ultra, plus content-safety and embedding variants.
  • Ollama Cloud: nemotron-3-nano:30b and nemotron-3-super.

Ultra (recent catalog additions)

  • OpenRouter: nvidia/nemotron-3-ultra-550b-a55b, a free tier, and a content-safety model.
  • Together AI: nvidia/nemotron-3-ultra-550b-a55b.
  • OpenCode Zen: nemotron-3-ultra-free (replaced the deprecated Super free tier).

Nano and Super

  • Kilo Gateway: Nano 30B, Super 120B (paid and free), Nano Omni reasoning.
  • NanoGPT: Super with a thinking variant, Nano, Nano Omni reasoning.
  • Nebius Token Factory: Nano Omni, Super 120B, Nano 30B.
  • Cortecs: nemotron-3-super-120b-a12b.
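Several of the model IDs listed above end in a suffix like `-550b-a55b`, which appears to encode total and active parameter counts. The sketch below parses that suffix. The naming convention is inferred from the IDs in this post rather than from a documented scheme, and `parse_sizes` is a hypothetical helper, not part of Tarsk.

```python
import re
from typing import Optional

# Matches a trailing "-<total>b-a<active>b", e.g. "-550b-a55b" (assumed convention).
_SIZE_PATTERN = re.compile(r"-(\d+)b-a(\d+)b$")


def parse_sizes(model_id: str) -> Optional[tuple[int, int]]:
    """Return (total_b, active_b) in billions, or None if the suffix is absent."""
    match = _SIZE_PATTERN.search(model_id)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_sizes("nvidia/nemotron-3-ultra-550b-a55b"))  # (550, 55)
print(parse_sizes("nemotron-3-super-120b-a12b"))         # (120, 12)
print(parse_sizes("nemotron-3-nano:30b"))                # None
```

IDs that do not follow the pattern, such as the Ollama-style `nemotron-3-nano:30b`, return `None` rather than a guess.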

Try it in Tarsk

  1. Add a provider key under Settings → Providers (NVIDIA, OpenRouter, or Together AI are the fastest paths to Ultra).
  2. Open Settings → Models, search nemotron, enable the tier you want.
  3. Pick that model in your thread and send a task with tools turned on.

Start with Nano if you care about cost per step. Move to Super for production agent fleets. Use Ultra when a task needs long reasoning chains and you have the budget for 550B-class inference.

Summary

  1. Three checkpoints, one architecture. Nano, Super, and Ultra share MoE routing and a 1M-token context window. You pick the size that matches your latency and cost ceiling.
  2. Agent training, not chat fine-tuning. NVIDIA trained these models with multi-environment RL and ships NeMo tooling if you need further specialization.
  3. Nine providers in Tarsk today. All three tiers: NVIDIA (build.nvidia.com). Ultra: OpenRouter, Together AI, OpenCode Zen. Nano and Super: Kilo Gateway, NanoGPT, Nebius Token Factory, Cortecs, Ollama Cloud.

Run a Nemotron agent

Download Tarsk, add a provider key, and enable a Nemotron model in your next thread.