NowConnect operational validation and health guide
Audience
DC Ops and DevOps/SRE teams supporting NowConnect in the empowernow stack.
What NowConnect is (in this stack)
- Cloud hub service:
nowconnect-cloud(WS endpoint on port 8765; TCP listeners that accept client traffic, e.g., LDAP on 389). - Premise agent service:
nowconnect-premise(maintains WS tunnel to the cloud hub and forwards TCP to the actual target). - Connector mapping: listener
addomainforwards toopenldap(config:ServiceConfigs/NowConnect/config/cloud.yaml). - Validation target in this doc: OpenLDAP via the NowConnect tunnel.
Quick “ping-pong” validation (end-to-end)
Run these anytime to prove the data path and quantify traffic. Commands are PowerShell‑friendly.
- Ensure the services are up
cd C:\source\repos\CRUDService
docker compose -f docker-compose-nowconnect.yml up -d nowconnect-cloud nowconnect-premise openldap
- Sanity logs (cloud and agent)
docker compose -f docker-compose-nowconnect.yml logs --tail 100 nowconnect-cloud
docker compose -f docker-compose-nowconnect.yml logs --tail 100 nowconnect-premise
Expect to see:
- cloud:
listener_startedon 0.0.0.0:389 with connectoraddomain, thenagent_connected, thenconnection open. - agent: “connecting to cloud”, “connection open”.
- Generate tunnel traffic (read‑only LDAP search via the gateway)
docker compose -f docker-compose-nowconnect.yml run --rm `
-e OPENLDAP_URL=ldap://nowconnect-cloud -e OPENLDAP_PORT=389 `
-e OPENLDAP_BASE_DN=dc=example,dc=org `
-e OPENLDAP_USERNAME=cn=admin,dc=example,dc=org `
-e OPENLDAP_PASSWORD=admin -e OPENLDAP_SSL=false `
crud-service pytest tests/integration/test_openldap_readonly_extended.py::test_anonymous_bind_search -q
- This connects to host
nowconnect-cloud, which proves the path goes through the tunnel (not directly toopenldap).
- Verify traffic counters (Prometheus metrics from cloud hub)
- The metrics endpoint is exposed on the cloud hub process (port 8765). From inside the container:
docker compose -f docker-compose-nowconnect.yml exec nowconnect-cloud sh -lc `
"python - <<'PY'
import urllib.request
d = urllib.request.urlopen('http://localhost:8765/metrics', timeout=3).read().decode()
for l in d.splitlines():
if l.startswith('nowconnect_tcp_connections_total') or l.startswith('nowconnect_tcp_bytes_total'):
print(l)
PY"
Expect counters like:
nowconnect_tcp_connections_total{connector="addomain"} N
nowconnect_tcp_bytes_total{direction="c2a",connector="addomain"} X
nowconnect_tcp_bytes_total{direction="a2c",connector="addomain"} Y
- Re‑run the LDAP test; the bytes counters should increase (c2a = client→agent; a2c = agent→client).
Health endpoints and what “good” looks like
- Cloud hub:
- HTTP:
GET /healthz→{"status":"ok"}(on port 8765 in‑container) - WebSocket:
/tunnel(agent connects here; presence is visible in logs) - Metrics:
GET /metrics(Prometheus exposition; see counters above)
- HTTP:
- Agent: lifecycle visible in logs (“connecting to cloud”, “connection open”).
Recommended Prometheus scrape (internal network):
- job_name: 'nowconnect'
static_configs:
- targets: ['nowconnect-cloud:8765']
metrics_path: /metrics
Operational runbook
Fast diagnostic checklist
- Is the listener up?
- Cloud logs contain:
listener_startedwith the expected bind/connector.
- Cloud logs contain:
- Is the agent connected?
- Cloud logs contain:
agent_connected(agent_id: dev-agent-1; connectors: ["addomain"]).
- Cloud logs contain:
- Is traffic flowing?
- Run the LDAP test via
nowconnect-cloudhost (above). - Confirm counter deltas on:
nowconnect_tcp_connections_total{connector="addomain"}nowconnect_tcp_bytes_total{direction="c2a|a2c",connector="addomain"}
- Run the LDAP test via
Common failure signatures and actions
- Cloud log
no_agent_for_connector:- Agent not registered or not connected; check agent logs; verify
NC_CONNECTORSincludesaddomain.
- Agent not registered or not connected; check agent logs; verify
- Cloud log
cidr_blocked:- Client source IP not in
allow_cidrs(ServiceConfigs/NowConnect/config/cloud.yaml). Update allowlist if needed.
- Client source IP not in
- Metrics fetch “connection refused”:
- Query from inside the
nowconnect-cloudcontainer onhttp://localhost:8765/metrics. - If you need host access, expose 8765 externally or route via Traefik.
- Query from inside the
- LDAP errors “attribute type not present” on group membership deletion:
- This is benign on idempotent cleanup. Tests were updated to ignore this when the member is already absent. For production, we recommend the connector treats LDAP result code 16 (noSuchAttribute) as success on delete.
Reliability and observability recommendations
Dashboards
Track at minimum:
nowconnect_tcp_connections_total{connector}rate(nowconnect_tcp_bytes_total[1m]) by (direction,connector)nowconnect_tcp_active_connections(if added later)- FIN/RST counters (FIN_SENT/FIN_RECV/RST_RECV) for anomalies
Alerts (examples)
- “No traffic” burn alert:
- No increase in
nowconnect_tcp_connections_total{connector="addomain"}for 15m during business hours.
- No increase in
- “One-way traffic”:
rate(nowconnect_tcp_bytes_total{direction="c2a"}[5m]) > 0andrate(...{direction="a2c"}[5m]) == 0(or vice versa) for 5m.
- “Agent disconnected”:
- Cloud log scrape detects loss of
agent_connectedwithout reconnect, or long gap in connections while service is up.
- Cloud log scrape detects loss of
CI/CD health gate (optional but encouraged)
Add a job that:
- Spins up
nowconnect-cloud,nowconnect-premise,openldap. - Executes the read‑only LDAP test with host
nowconnect-cloud. - Pulls and asserts metrics deltas > 0 on c2a/a2c bytes counters.
Notes and improvements
-
System definition for tunnel route
- Added
ServiceConfigs/CRUDService/config/systems/av_openldap_nowconnect.yamlwithbase_url: nowconnect-cloud. - For runs that load a system by name, you can shadow
av_openldapwith this file viaSERVICE_CONFIG_DIR=/tmp/nowconnect(copy/rename) so existing code paths use the tunnel without code edits.
- Added
-
Idempotency fix in tests
tests/integration/test_openldap_write_path.pynow ignores “attribute type not present” on member removal (safe retry behavior).- Recommended connector hardening: treat LDAP result code 16 (noSuchAttribute) as success on delete; result code 20 (typeOrValueExists) as success on add.
-
Externalizing metrics (optional)
- Today we read metrics from inside the
nowconnect-cloudcontainer. - To query from host, map port 8765 or route with Traefik and restrict access (IP allowlist).
- Today we read metrics from inside the
One-command “ping-pong and metrics” bundle (copy/paste)
# 1) Start services
docker compose -f CRUDService/docker-compose-nowconnect.yml up -d nowconnect-cloud nowconnect-premise openldap
# 2) Generate tunnel traffic
docker compose -f CRUDService/docker-compose-nowconnect.yml run --rm `
-e OPENLDAP_URL=ldap://nowconnect-cloud -e OPENLDAP_PORT=389 `
-e OPENLDAP_BASE_DN=dc=example,dc=org -e OPENLDAP_USERNAME=cn=admin,dc=example,dc=org `
-e OPENLDAP_PASSWORD=admin -e OPENLDAP_SSL=false `
crud-service pytest tests/integration/test_openldap_readonly_extended.py::test_anonymous_bind_search -q
# 3) Show counters
docker compose -f CRUDService/docker-compose-nowconnect.yml exec nowconnect-cloud sh -lc `
"python - <<'PY'
import urllib.request
d = urllib.request.urlopen('http://localhost:8765/metrics', timeout=3).read().decode()
for l in d.splitlines():
if l.startswith('nowconnect_tcp_connections_total') or l.startswith('nowconnect_tcp_bytes_total'):
print(l)
PY"
This gives you a reliable, self‑service “ping‑pong” and metrics‑based validation for NowConnect at any time.
See also:
services/nowconnect/explanation/metrics-and-reliabilityservices/crud-service/explanation/ldap-connector-idempotency