Skip to main content

Alerting Cookbook — Metrics and SLO Hints

Key signals (from metrics and logs)

  • Auth flow: bff_oauth_flow_duration_seconds, par_requests_total, bff_token_refresh_*
  • ForwardAuth: reject rate (401), time to 200; dashboard router protected
  • PDP: client call success/error and latency (see PDP client structured logs)
  • DPoP: dpop_validation_failures_total

Example Prometheus rules (pseudocode)

groups:
- name: bff
rules:
- alert: HighAuth401Rate
expr: sum(rate(http_requests_total{path=~"/api/.*",status="401"}[5m])) / sum(rate(http_requests_total{path=~"/api/.*"}[5m])) > 0.2
for: 10m
- alert: PDPHighLatency
expr: histogram_quantile(0.95, sum by (le) (rate(pdp_client_request_duration_seconds_bucket[5m]))) > 0.5
for: 15m
- alert: DPoPValidationFailures
expr: increase(dpop_validation_failures_total[10m]) > 10

SLO hints

  • 95% of ForwardAuth checks < 10ms
  • 95% of PDP decisions < 500ms
  • Token refresh failure rate < 1%

Dashboards

  • Auth successes/errors, ForwardAuth 401%, PDP latency, refresh success/fail, DPoP failures, cache hit/miss.