🐛 Edge Debugging & Telemetry
Debugging globally distributed serverless microservices presents unique operational challenges. Because code runs on Cloudflare’s edge V8 isolates without a persistent server host, traditional step-through debugging is replaced by structured logging, real-time edge telemetry tailing, and distributed tracing.
This document is your engineering runbook for debugging and resolving runtime anomalies in the Hoox monorepo.
⚡ 1. Real-Time Telemetry: Tailing Production Logs
The wrangler tail command opens a secure WebSocket stream from Cloudflare’s global edge nodes to your terminal and prints every console.log call, exception trace, and HTTP status code in real time.
You can trigger this streaming telemetry via the Hoox CLI or Wrangler:
# A. Recommended: Stream live console logs for a specific worker via Hoox CLI
hoox logs tail trade-worker
# B. Stream gateway traffic in real-time
hoox logs tail hoox
# C. Alternatively, invoke Wrangler directly for custom configurations
npx wrangler tail hoox --config workers/hoox/wrangler.jsonc
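Because the tail stream prints whatever a worker passes to console.log, emitting one JSON object per event makes the output easy to filter. The following is a minimal sketch, not the Hoox implementation: the LogEntry shape, formatLogLine, and logEvent are hypothetical names, and the field names are assumed to mirror the system_logs columns queried later in this runbook.

```typescript
// Hypothetical log entry shape (an assumption, not the Hoox schema).
type Severity = "debug" | "info" | "warn" | "error";

interface LogEntry {
  requestId: string;
  worker: string;
  severity: Severity;
  message: string;
}

// Serialize one entry as a single JSON line so it can be grepped or piped to jq.
function formatLogLine(entry: LogEntry, now: Date = new Date()): string {
  return JSON.stringify({
    created_at: now.toISOString(),
    request_id: entry.requestId,
    worker: entry.worker,
    severity: entry.severity,
    message: entry.message,
  });
}

// Emit through console.log, which the tail stream forwards to the terminal.
function logEvent(entry: LogEntry): void {
  console.log(formatLogLine(entry));
}
```

Keeping each event on a single line means a tail session can be narrowed to one worker or one severity with ordinary text tools.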
📝 2. Distributed Tracing: The requestId Standard
Because a single TradingView alert flows through multiple workers (Gateway → trade-worker → d1-worker → telegram-worker), locating a specific transaction log in a sea of concurrent entries is impossible without a unified trace parameter.
The RequestId Protocol
- Generation: The hoox gateway generates a unique, cryptographically secure UUIDv4 immediately upon receiving a webhook payload. This is set as the requestId in the transaction context.
- Propagation: Every internal service binding call transmits this UUID as the X-Request-Id HTTP header.
- Structured Ingestion: When logging error telemetry or writing trade fills to R2 and D1 databases, workers attach the requestId as a dedicated index column.
- Unified Auditing: You can audit an entire transaction’s lifecycle across all 9 workers by running a single database query filtering by the trace ID:
SELECT created_at, worker, message, severity
FROM system_logs
WHERE request_id = '9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d'
ORDER BY created_at ASC
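The generation and propagation steps of the protocol could be sketched roughly as follows. This is an illustration under assumptions, not the gateway's actual code: resolveRequestId and withRequestId are hypothetical helpers, and reusing an inbound ID in downstream workers is an assumed convention. Only the X-Request-Id header name and the use of a UUIDv4 come from the protocol above.

```typescript
// Header name taken from the requestId protocol.
const REQUEST_ID_HEADER = "X-Request-Id";

// Reuse an inbound trace ID if one is present (downstream workers);
// otherwise mint a new UUIDv4 (the gateway's case).
// crypto.randomUUID() is available in Workers, Bun, and recent Node.js.
function resolveRequestId(inbound: Record<string, string | undefined>): string {
  const existing =
    inbound[REQUEST_ID_HEADER] ?? inbound[REQUEST_ID_HEADER.toLowerCase()];
  return existing && existing.length > 0 ? existing : crypto.randomUUID();
}

// Build headers for an internal service binding call, carrying the trace ID
// alongside any headers the caller already wants to send.
function withRequestId(
  requestId: string,
  headers: Record<string, string> = {},
): Record<string, string> {
  return { ...headers, [REQUEST_ID_HEADER]: requestId };
}
```

A worker would pass the result of withRequestId as the headers of its outgoing binding fetch, so every hop logs under the same request_id.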
🛡️ 3. Operational Runbooks for Common Edge Failures
A. Symptom: 401 Unauthorized on Internal Worker Calls
- Diagnostics: The calling worker gets a 401 Unauthorized response when querying internal services like d1-worker or trade-worker.
- Primary Root Cause: The INTERNAL_KEY_BINDING secret does not match between the calling worker and the target worker.
- Resolution:
  - Check that the secret exists on both workers: hoox secrets check.
  - If the values are mismatched, re-inject the secret globally using one command:
    hoox secrets set INTERNAL_KEY_BINDING "your_secure_shared_internal_key"
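To see why a mismatched secret surfaces as a 401, here is a minimal sketch of a shared-key check on the target worker. It is an assumption about the general technique, not the Hoox implementation: sameKey and authStatus are hypothetical helpers, and the way the key is transmitted (for example, which header carries it) is not specified in this runbook, so the presented key is taken as a plain parameter.

```typescript
// Compare two keys without exiting at the first differing character,
// so response timing leaks less about where a mismatch occurs.
function sameKey(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = i < a.length ? a.charCodeAt(i) : 0;
    const y = i < b.length ? b.charCodeAt(i) : 0;
    diff |= x ^ y;
  }
  return diff === 0;
}

// Status an internal endpoint would answer with for a presented key.
// expectedKey stands in for the target worker's INTERNAL_KEY_BINDING value.
function authStatus(presentedKey: string | null, expectedKey: string): 200 | 401 {
  if (!presentedKey) return 401;
  return sameKey(presentedKey, expectedKey) ? 200 : 401;
}
```

Any difference between the caller's copy and the target's copy, including a trailing newline picked up when the secret was set, fails the comparison and produces the 401.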
B. Symptom: 502 Bad Gateway on Webhook Routes
- Diagnostics: An incoming signal returns a 502 error instantly.
- Primary Root Cause: A Service Binding target declared in the gateway’s wrangler.jsonc has not been deployed yet.
- Resolution:
  - Run the dependency check: hoox check health.
  - Redeploy the entire stack in the correct dependency order:
    hoox deploy all --auto
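The root cause can also be confirmed by hand by comparing the binding targets in the gateway's configuration against the workers that are actually deployed. The sketch below illustrates that comparison only; it is not how hoox check health works internally. findMissingTargets is a hypothetical helper, the config is assumed to be already parsed from wrangler.jsonc (its services array of binding/service pairs), and the list of deployed worker names is assumed to come from elsewhere.

```typescript
// Subset of a parsed wrangler.jsonc relevant to Service Bindings.
interface WranglerConfig {
  name: string;
  services?: { binding: string; service: string }[];
}

// Return the binding targets that are absent from the deployed worker names.
function findMissingTargets(config: WranglerConfig, deployed: string[]): string[] {
  const live = new Set(deployed);
  return (config.services ?? [])
    .map((s) => s.service)
    .filter((service) => !live.has(service));
}
```

An empty result means every declared target exists, in which case the 502 likely has a different cause.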
C. Symptom: D1_ERROR: no such table
- Diagnostics: Worker database transactions fail with table missing errors.
- Primary Root Cause: Relational SQLite schemas were never initialized on your production D1 instance.
- Resolution:
  - Initialize database tables and run pending Drizzle migrations:
    hoox db apply --remote
    hoox db migrate --remote
D. Symptom: KV Configuration Propagation Delays
- Diagnostics: You changed a KV configuration value (e.g. set trade:kill_switch to false), but some incoming signals are still being rejected.
- Primary Root Cause: Cloudflare KV is eventually consistent. A write is usually visible immediately at the edge location where it was made, but can take up to 60 seconds or more to reach all other global edge locations.
- Resolution: Wait at least 60 seconds for cached values to expire globally. Do not trigger rapid test webhooks immediately after modifying configuration keys.
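A test script can enforce the waiting period instead of relying on memory. The following is a small sketch under assumptions: msUntilSettled is a hypothetical helper, and windowMs is whatever propagation allowance the resolution above prescribes, passed in by the caller rather than hard-coded.

```typescript
// Milliseconds still to wait before a KV change can be assumed visible
// at all edge locations. Returns 0 once the window has fully elapsed.
function msUntilSettled(lastWriteMs: number, nowMs: number, windowMs: number): number {
  return Math.max(0, lastWriteMs + windowMs - nowMs);
}

// True when it is reasonable to start sending test webhooks.
function isSettled(lastWriteMs: number, nowMs: number, windowMs: number): boolean {
  return msUntilSettled(lastWriteMs, nowMs, windowMs) === 0;
}
```

Record the timestamp when you write the key, then gate the first test webhook on isSettled. This checks elapsed time only; it cannot observe what a remote edge location has actually cached.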
🔗 Next Steps
- Testing Framework & QA Standards — Run local unit and integration tests using Bun’s native test runner.
- Self-Healing & System Repair — Run diagnostics, targeted component repairs, and full system rebuilds.