LLM Routing Configuration Reference
Environment
- `LLM_PRICING_JSON` or `LLM_PRICING_PATH` — pricing for estimation
- `REDIS_URL` — budget holds (prod); dev can use an in-process stub
- `RECEIPT_VAULT_URL` — optional receipts service
- Classifier guard env toggles (ops-side overrides):
  - `CLASSIFIER_GUARD_ENABLED` (true|false)
  - `CLASSIFIER_BLOCK_MIN_CONF` (e.g., `0.75`)
  - `CLASSIFIER_BLOCK_LABELS` (comma-separated)
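A minimal sketch of how these toggles might be parsed at service startup. The variable names come from this reference; the `load_guard_env` helper and its defaults are illustrative, not the actual implementation.

```python
import os

def load_guard_env() -> dict:
    """Read the classifier-guard overrides from the environment.

    Illustrative only: defaults shown here (enabled, 0.75, empty label
    list) are assumptions, not guaranteed service behavior.
    """
    return {
        "enabled": os.getenv("CLASSIFIER_GUARD_ENABLED", "true").lower() == "true",
        "min_conf": float(os.getenv("CLASSIFIER_BLOCK_MIN_CONF", "0.75")),
        # comma-separated list, whitespace-tolerant
        "block_labels": [
            label.strip()
            for label in os.getenv("CLASSIFIER_BLOCK_LABELS", "").split(",")
            if label.strip()
        ],
    }

cfg = load_guard_env()
```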
Headers
- Response:
  `x-aria-decision-id`, `x-aria-model-selected`, `x-aria-model-rerouted`
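A client-side sketch of reading these decision headers. The header names come from this reference; the function and the plain-dict stand-in for a response's headers are illustrative (a real client would read them off an `httpx`/`requests` response).

```python
def read_decision_headers(headers: dict) -> dict:
    """Extract routing-decision headers from a response's header mapping.

    Illustrative helper; assumes `x-aria-model-rerouted` is absent when
    no reroute happened.
    """
    return {
        "decision_id": headers.get("x-aria-decision-id"),
        "model_selected": headers.get("x-aria-model-selected"),
        "rerouted": headers.get("x-aria-model-rerouted"),  # may be None
    }

decision = read_decision_headers({
    "x-aria-decision-id": "dec_123",
    "x-aria-model-selected": "gpt-4o-mini",
})
```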
Endpoints
POST /chat/completions
Code entry points
- Endpoint: `ms_bff/src/api/v1/endpoints/llm.py`
- Enforcement: `ms_bff/src/services/llm_enforcement.py`
- Budget: `ms_bff/src/services/llm_budget.py`
- Receipts: `ms_bff/src/services/llm_receipts.py`
Classifier guard — action mapping
Optional mapping from classifier category labels to runtime actions. A `default` action and per-label actions are supported; per-label settings override the default. When configured, preflight returns additional fields that the endpoint honors: `stream_allowed`, `model_override`, and `budget_hints`.
```yaml
# ServiceConfigs/BFF/config/classifier.yaml
guard:
  enabled: true
  block_labels:
    - secrets
    - credential_theft
    - account_takeover
    - jailbreak
    - policy_bypass
    - malware
    - exfiltration
    - prompt_injection
  min_conf: 0.75
  per_label_thresholds:
    secrets: 0.60
    prompt_injection: 0.65
  actions:
    default:
      disallow_stream: false
    high_cost_request:
      cap_tokens: 2048
      route_model: gpt-4o-mini
      budget:
        hold_multiplier: 1.5
        max_cents: 5000
    tools_egress_required:
      disallow_stream: true
      cap_tokens: 1024
      budget:
        hold_multiplier: 1.2
        min_remaining_usd: 5.0
```
Supported actions:
- `disallow_stream`: bool
- `cap_tokens`: int
- `route_model`: string
- `budget`:
  - `hold_multiplier`: float (> 0)
  - `max_cents`: int (> 0)
  - `min_remaining_usd`: float (>= 0)
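The merge of `default` and per-label actions can be sketched as follows. The action keys (`disallow_stream`, `cap_tokens`, `route_model`, `budget`) and the preflight output fields (`stream_allowed`, `model_override`, `budget_hints`) come from this reference; `ActionDecision` and `resolve_action` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActionDecision:
    """Illustrative shape of what preflight could hand the endpoint."""
    stream_allowed: bool = True
    model_override: Optional[str] = None
    budget_hints: dict = field(default_factory=dict)
    cap_tokens: Optional[int] = None

def resolve_action(label: str, actions: dict) -> ActionDecision:
    """Overlay the per-label action on top of the default action.

    Assumes a shallow merge: a per-label key wins wholesale over the
    same key in `default`.
    """
    merged = dict(actions.get("default", {}))
    merged.update(actions.get(label, {}))
    return ActionDecision(
        stream_allowed=not merged.get("disallow_stream", False),
        model_override=merged.get("route_model"),
        budget_hints=merged.get("budget", {}),
        cap_tokens=merged.get("cap_tokens"),
    )

# Mirrors the tools_egress_required entry from the YAML above.
actions = {
    "default": {"disallow_stream": False},
    "tools_egress_required": {
        "disallow_stream": True,
        "cap_tokens": 1024,
        "budget": {"hold_multiplier": 1.2, "min_remaining_usd": 5.0},
    },
}
decision = resolve_action("tools_egress_required", actions)
```

A label with no entry in `actions` simply inherits the `default` action under this merge.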