Dynamic AI Model Routing — How It Works
Overview
Dynamic Model Routing lets the BFF transparently select an allowed, affordable LLM model when the requested model is disallowed by policy or unaffordable given live budgets. v1 covers model selection within one provider (OpenAI); v2 adds provider switching.
How it works (v1)
- BFF receives a chat request with `model`.
- PDP returns `constraints` (model/tokens/egress) and an optional `spend_snapshot`.
- Preflight applies prompt guard, masking, token clamps, and optional action mapping based on the classifier label (`stream_allowed`, `model_override`, `budget_hints`).
- A budget hold is placed based on the estimated cost and category-aware `budget_hints` (enforcement-mode dependent; see budget modes in the Budgets how-to).
- If denied (policy or budget), the BFF tries allowed candidates cheapest-first, re-evaluating the PDP with `estimated_cents` until one is allowed.
- Egress is re-pinned, the request proceeds, and a receipt is emitted.
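For illustration, the decision data passed between the PDP and the BFF in the steps above might be modeled as follows. This is a hedged sketch: the field names mirror the identifiers mentioned in the list (`constraints`, `spend_snapshot`, `stream_allowed`, `model_override`, `budget_hints`), but the exact schema is an assumption, not documented here.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PdpDecision:
    """Hypothetical shape of a PDP response; field names follow the
    identifiers mentioned above, not a confirmed wire format."""
    allowed: bool
    constraints: dict = field(default_factory=dict)   # e.g. {"models": [...], "max_tokens": 4096, "egress": "pinned"}
    spend_snapshot: Optional[dict] = None             # optional live-budget snapshot
    stream_allowed: bool = True                       # classifier-driven action-mapping hints
    model_override: Optional[str] = None
    budget_hints: dict = field(default_factory=dict)  # category-aware hints used for the hold

# A deny decision of this shape is what would trigger the
# cheapest-first reroute described in the last two steps:
deny = PdpDecision(
    allowed=False,
    constraints={"models": ["gpt-4o-mini"]},
    spend_snapshot={"monthly_spent_cents": 9950, "monthly_cap_cents": 10000},
)
```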
Key properties
- PDP-first: BFF never bypasses policy
- Budget-aware: evaluates candidates against live budgets; category-aware holds via `hold_with_hints`
- Transparent: `x-aria-model-selected` and `x-aria-model-rerouted` headers
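A caller can detect a reroute from the transparency headers alone, without parsing the response body. A minimal sketch, assuming string-valued headers (the values shown are illustrative, not captured from a real deployment):

```python
# Illustrative response headers after a reroute; values are examples.
headers = {
    "x-aria-model-selected": "gpt-4o-mini",  # model actually used
    "x-aria-model-rerouted": "true",         # present/true when selected != requested
}

def was_rerouted(headers: dict) -> bool:
    """Detect rerouting from the transparency headers alone."""
    return headers.get("x-aria-model-rerouted", "false") == "true"
```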
Real scenarios
- Hit the monthly cap on `gpt-4.1` → reroute to `gpt-4o-mini`
- Tenant allows only `gpt-4o-mini` → requests for `gpt-4.1` are routed to `gpt-4o-mini`
Algorithm details (v1)
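The cheapest-first selection described in "How it works" can be sketched as follows. This is a minimal illustration under stated assumptions: the price table and the `pdp_allows` callback are hypothetical stand-ins for the real pricing data and PDP call, not the actual API.

```python
# Hypothetical per-1k-token prices, in cents; stand-in for real pricing data.
PRICE_CENTS_PER_1K = {"gpt-4.1": 1.0, "gpt-4o": 0.5, "gpt-4o-mini": 0.1}

def select_model(requested, allowed_models, est_tokens, pdp_allows):
    """Try the requested model first; on denial, walk allowed candidates
    cheapest-first, re-evaluating the PDP with each candidate's estimated
    cost, until one is allowed."""
    candidates = [requested] + sorted(
        (m for m in allowed_models if m != requested),
        key=lambda m: PRICE_CENTS_PER_1K[m],
    )
    for model in candidates:
        estimated_cents = PRICE_CENTS_PER_1K[model] * est_tokens / 1000
        if pdp_allows(model, estimated_cents):
            return model, model != requested  # (selected, rerouted?)
    return None, False  # no allowed, affordable model found

# Scenario from above: tenant allows only gpt-4o-mini, request asks for gpt-4.1.
selected, rerouted = select_model(
    "gpt-4.1", ["gpt-4o-mini"], est_tokens=2000,
    pdp_allows=lambda model, cents: model == "gpt-4o-mini",
)
# selected == "gpt-4o-mini", rerouted == True
```

The loop mirrors the key property above: the BFF never bypasses policy, since every candidate (including the originally requested model) goes back through the PDP with its own `estimated_cents` before being used.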
Roadmap (v2)
- Provider switching (OpenAI ⇄ Anthropic ⇄ Ollama) with PDP Search shortlist, ranking, and receipts.
See also: Tutorials → LLM Routing Quickstart; Reference → LLM Routing Config, LLM Routing PDP.