Dynamic AI Model Routing — How It Works

Overview

Dynamic Model Routing lets the BFF (backend-for-frontend) transparently select an allowed, affordable LLM model when the requested model is disallowed by policy or unaffordable under live budgets. v1 covers model selection within a single provider (OpenAI); v2 adds provider switching.

How it works (v1)

  1. The BFF receives a chat request that names a model.
  2. The PDP (policy decision point) returns constraints (model/tokens/egress) and an optional spend_snapshot.
  3. Preflight applies the prompt guard, masking, and token clamps, plus optional action mapping based on the classifier label (stream_allowed, model_override, budget_hints).
  4. A budget hold is placed based on the estimated cost and category-aware budget_hints (behavior depends on the enforcement mode; see budget modes in the Budgets how-to).
  5. If the request is denied (by policy or budget), the BFF tries allowed candidates cheapest-first, re-evaluating the PDP with each candidate's estimated_cents until one is allowed.
  6. Egress is re-pinned, the request proceeds, and a receipt is emitted.
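
Steps 3–4 above can be sketched as follows. This is a minimal illustration only: the price table, the clamp limit, and the `hold_with_hints` signature are assumptions for the sketch, not the real BFF API.

```python
# Illustrative prices (cents per 1k tokens) and clamp; not real configuration.
PRICE_CENTS_PER_1K_TOKENS = {"gpt-4o-mini": 0.015, "gpt-4.1": 0.2}
MAX_TOKENS_CLAMP = 4096  # example per-request clamp applied by preflight

def estimated_cents(model: str, requested_tokens: int) -> float:
    """Estimate the request cost after the preflight token clamp is applied."""
    clamped = min(requested_tokens, MAX_TOKENS_CLAMP)
    return PRICE_CENTS_PER_1K_TOKENS[model] * clamped / 1000

def hold_with_hints(cost_cents: float, budget_hints: dict, enforce: bool) -> bool:
    """Place a category-aware budget hold (hypothetical shape).

    In monitor mode the hold always succeeds and is only recorded;
    in enforce mode it succeeds only if the category budget covers the cost.
    """
    category = budget_hints.get("category", "default")
    remaining = budget_hints.get("remaining_cents", {}).get(category, 0.0)
    if not enforce:
        return True  # monitor mode: record only, never block
    return cost_cents <= remaining
```

The enforcement-mode flag mirrors the budget modes described in the Budgets how-to: the same estimate is computed either way, but only enforce mode can deny the request.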

Key properties

  • PDP-first: the BFF never bypasses policy
  • Budget-aware: candidates are evaluated against live budgets, with category-aware holds via hold_with_hints
  • Transparent: the x-aria-model-selected and x-aria-model-rerouted response headers expose the routing outcome
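
For example, a client can confirm whether a reroute happened by inspecting those headers. The header names come from the list above; the value formats (a model id, and "true"/"false") are assumptions for this sketch:

```python
def routing_outcome(headers: dict) -> str:
    """Summarize the routing decision from the transparency headers.

    Assumes x-aria-model-selected carries the selected model id and
    x-aria-model-rerouted carries "true"/"false" (illustrative formats).
    """
    selected = headers.get("x-aria-model-selected", "unknown")
    if headers.get("x-aria-model-rerouted") == "true":
        return f"rerouted to {selected}"
    return f"served by requested model {selected}"
```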

Real scenarios

  • The monthly cap on gpt-4.1 is hit → the request is rerouted to gpt-4o-mini
  • The tenant allows only gpt-4o-mini → requests for gpt-4.1 are routed to gpt-4o-mini
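
Both scenarios can be stated as concrete expectations. The models and the fallback come from the bullets above; the function itself is a hypothetical stand-in for the router, not its real logic:

```python
def routed_model(requested: str, allowed: set, cap_hit: set) -> str:
    """Hypothetical stand-in: fall back when the requested model is
    disallowed by policy or its monthly cap has been hit."""
    if requested in allowed and requested not in cap_hit:
        return requested
    return "gpt-4o-mini"  # cheapest allowed model in these examples

# Monthly cap hit on gpt-4.1:
assert routed_model("gpt-4.1", {"gpt-4.1", "gpt-4o-mini"}, {"gpt-4.1"}) == "gpt-4o-mini"
# Tenant allows only gpt-4o-mini:
assert routed_model("gpt-4.1", {"gpt-4o-mini"}, set()) == "gpt-4o-mini"
```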

Algorithm details (v1)
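
One way to picture the v1 selection loop: try the requested model first, then the remaining allowed candidates cheapest-first, re-evaluating the PDP with each candidate's estimated_cents until one passes. The prices, the `pdp_evaluate` stand-in, and all names below are illustrative assumptions, not the production implementation:

```python
# Illustrative prices (cents per 1k tokens); not real configuration.
PRICES = {"gpt-4o-mini": 0.015, "gpt-4.1": 0.2, "gpt-4o": 0.25}

def pdp_evaluate(model: str, estimated_cents: float, allowed: set,
                 remaining_cents: float) -> bool:
    """Stand-in for the PDP call: checks model policy and the live budget."""
    return model in allowed and estimated_cents <= remaining_cents

def select_model(requested: str, est_tokens: int, allowed: set,
                 remaining_cents: float):
    """Requested model first, then allowed candidates cheapest-first."""
    candidates = [requested] + sorted(
        (m for m in allowed if m != requested), key=PRICES.__getitem__
    )
    for model in candidates:
        cost = PRICES[model] * est_tokens / 1000
        if pdp_evaluate(model, cost, allowed, remaining_cents):
            return model, model != requested  # (selected, rerouted?)
    return None, False  # no allowed, affordable candidate: request is denied
```

Cheapest-first ordering means a reroute never picks a more expensive model than necessary, and re-running the PDP per candidate keeps the BFF PDP-first even while rerouting.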

Roadmap (v2)

  • Provider switching (OpenAI ⇄ Anthropic ⇄ Ollama) with PDP Search shortlist, ranking, and receipts.

See also: Tutorials → LLM Routing Quickstart; Reference → LLM Routing Config, LLM Routing PDP.