Observability
Starting with version 3.5.0, BotCity Runner offers observability features to monitor, troubleshoot, and gain insights into automation performance.
Overview
BotCity Runner provides comprehensive observability through:
- Structured Logging — TEXT or JSON format with rich contextual metadata
- OpenTelemetry Integration — Export logs to any OTLP-compatible backend
- MDC Context Propagation — Consistent correlation across log entries
Configuration Reference
Environment Variables

| Variable | Default | Description |
|---|---|---|
| `BOTCITY_RUNNER_LOG_DIR` | `./logs` | Directory for log files |
| `BOTCITY_RUNNER_LOG_FORMAT` | `TEXT` | Log format: `TEXT` or `JSON` |
| `BOTCITY_RUNNER_LOG_LEVEL` | `INFO` | Log level for the `dev.botcity.runner` package |
| `BOTCITY_RUNNER_LOG_CONSOLE` | `false` | Enable console output |
| `BOTCITY_RUNNER_OTEL_ENABLED` | `false` | Enable OpenTelemetry export |
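When diagnosing a Runner, it can help to see which values are actually in effect. The sketch below uses a hypothetical helper, `runner_effective_config` (not part of BotCity Runner), that prints each variable from the table, falls back to the documented default when the variable is unset, and warns when the log format is neither `TEXT` nor `JSON`.

```shell
#!/bin/sh
# Hypothetical helper: print the effective Runner logging settings.
# Unset variables fall back to the defaults documented in the table.
runner_effective_config() {
  printf 'BOTCITY_RUNNER_LOG_DIR=%s\n' "${BOTCITY_RUNNER_LOG_DIR:-./logs}"
  printf 'BOTCITY_RUNNER_LOG_FORMAT=%s\n' "${BOTCITY_RUNNER_LOG_FORMAT:-TEXT}"
  printf 'BOTCITY_RUNNER_LOG_LEVEL=%s\n' "${BOTCITY_RUNNER_LOG_LEVEL:-INFO}"
  printf 'BOTCITY_RUNNER_LOG_CONSOLE=%s\n' "${BOTCITY_RUNNER_LOG_CONSOLE:-false}"
  printf 'BOTCITY_RUNNER_OTEL_ENABLED=%s\n' "${BOTCITY_RUNNER_OTEL_ENABLED:-false}"

  # Only TEXT and JSON are documented values for the log format.
  case "${BOTCITY_RUNNER_LOG_FORMAT:-TEXT}" in
    TEXT|JSON) ;;
    *) echo "warning: BOTCITY_RUNNER_LOG_FORMAT should be TEXT or JSON" >&2 ;;
  esac
}

# Example: JSON logs with console output; everything else keeps its default.
export BOTCITY_RUNNER_LOG_FORMAT=JSON
export BOTCITY_RUNNER_LOG_CONSOLE=true
runner_effective_config
```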
OpenTelemetry Variables
When `BOTCITY_RUNNER_OTEL_ENABLED=true`, the Runner honors the standard OpenTelemetry (OTEL) environment variables:

| Variable | Default | Description |
|---|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:4317` | OTLP endpoint URL |
| `OTEL_EXPORTER_OTLP_HEADERS` | (none) | Headers such as authentication, as `key=value,key2=value2`; values are percent-decoded, so write a space as `%20` |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | Protocol: `grpc` or `http/protobuf` |
| `OTEL_EXPORTER_OTLP_COMPRESSION` | (none) | Compression: `gzip` or `none` |
| `OTEL_EXPORTER_OTLP_TIMEOUT` | `10000` | Request timeout in milliseconds |
| `OTEL_SERVICE_NAME` | `botcity-runner` | Service name for identification |
| `OTEL_RESOURCE_ATTRIBUTES` | (none) | Additional resource attributes as `key=value,key2=value2` |
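Both `OTEL_EXPORTER_OTLP_HEADERS` and `OTEL_RESOURCE_ATTRIBUTES` are comma-separated `key=value` lists that OpenTelemetry defines in a W3C Baggage-style format with percent-encoded values. A value containing a space (such as `Api-Token ...` or `Basic ...`) therefore needs `%20`, and an unquoted space would also split a shell `export`. The sketch below uses a hypothetical helper, `otel_kv`, to percent-encode the characters most likely to cause trouble; it is not a complete URL encoder, and the `team` attribute key is only an example.

```shell
#!/bin/sh
# Hypothetical helper: emit one key=value entry with the value percent-encoded.
# '%' is encoded first so the escapes added afterwards are not encoded twice;
# then space, comma, '=' and '+' (some decoders turn a raw '+' into a space).
otel_kv() {
  encoded=$(printf '%s' "$2" | sed -e 's/%/%25/g' -e 's/ /%20/g' \
    -e 's/,/%2C/g' -e 's/=/%3D/g' -e 's/+/%2B/g')
  printf '%s=%s' "$1" "$encoded"
}

# Single header; quoting keeps the shell from splitting the value.
export OTEL_EXPORTER_OTLP_HEADERS="$(otel_kv Authorization 'Api-Token dt0c01.xxxxxxxxxxxx')"

# Several resource attributes joined with commas (example values).
export OTEL_RESOURCE_ATTRIBUTES="$(otel_kv deployment.environment production),$(otel_kv team 'rpa ops')"

echo "$OTEL_EXPORTER_OTLP_HEADERS"   # Authorization=Api-Token%20dt0c01.xxxxxxxxxxxx
echo "$OTEL_RESOURCE_ATTRIBUTES"     # deployment.environment=production,team=rpa%20ops
```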
Log Context Fields
Every log entry includes contextual metadata via MDC (Mapped Diagnostic Context):

| Field | Description |
|---|---|
| `organization` | Organization identifier |
| `runnerLabel` | Runner instance label |
| `runnerVersion` | Runner software version |
| `automationLabel` | Automation label |
| `technology` | Bot technology (Python, Java, etc.) |
| `botId` | Bot label/identifier |
| `botVersion` | Bot version |
| `taskId` | Current task identifier |
These fields enable filtering and correlation across distributed systems.
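Because every entry carries `taskId`, one task's history can be pulled out of the local log files. A minimal sketch, assuming one entry per line in either the TEXT or the JSON layout shown under Log Formats, and a task ID made only of letters and digits (the value is inserted into the regular expression unescaped). The helper name `task_lines` and the `*.log` file pattern are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the log lines that belong to one task.
# Matches the TEXT layout ([taskId=12345]) and the JSON layout ("taskId":"12345"),
# assuming one entry per line and an alphanumeric task ID (no regex escaping).
task_lines() {
  task_id=$1
  shift
  grep -hE "\[taskId=${task_id}\]|\"taskId\" *: *\"${task_id}\"" "$@"
}

# Usage (the *.log pattern is an assumption; look in BOTCITY_RUNNER_LOG_DIR):
#   task_lines 12345 ./logs/*.log
```

The closing `]` and `"` in the pattern keep `12345` from also matching `123456`.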
Log Formats
TEXT Format (Default)
Human-readable format suitable for local development and traditional log management:

```text
2026-04-21 13:00:00.123 INFO [org=acme] [runner=prod-01] [runnerVersion=3.5.0] [automation=invoice-bot] [technology=python] [botId=inv-001] [botVersion=1.2.0] [taskId=12345] Starting task execution
```
JSON Format
Structured JSON format compatible with log aggregation platforms (ELK, Splunk, Datadog, etc.):

```json
{
  "@timestamp": "2026-04-21T13:00:00.123Z",
  "level": "INFO",
  "message": "Starting task execution",
  "organization": "acme",
  "runnerLabel": "prod-01",
  "runnerVersion": "3.5.0",
  "automationLabel": "invoice-bot",
  "technology": "python",
  "botId": "inv-001",
  "botVersion": "1.2.0",
  "taskId": "12345",
  "application": "botcity-runner",
  "service.name": "botcity-runner"
}
```
Enable it by setting `BOTCITY_RUNNER_LOG_FORMAT=JSON` in the Runner's environment.
Observability Platform Integration
BotCity Runner supports direct integration with observability platforms via OpenTelemetry. No intermediary collector is required — the Runner sends logs directly to these services.
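Dynatrace
Dynatrace provides full-stack observability with AI-powered analytics. Its OTLP endpoint accepts OTLP over HTTP (binary protobuf) only, so the protocol must be set explicitly:

```shell
export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://{environment-id}.live.dynatrace.com/api/v2/otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Quote the value and percent-encode the space after "Api-Token"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Api-Token%20dt0c01.xxxxxxxxxxxx"
export OTEL_SERVICE_NAME=botcity-runner
```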
Setup steps:
- In Dynatrace, go to Settings → Integration → API tokens
- Create a token with the `logs.ingest` scope
- Replace `{environment-id}` with your Dynatrace environment ID
- Replace `dt0c01.xxxxxxxxxxxx` with your API token

Viewing logs:
- Navigate to Observe and explore → Logs
- Filter by `service.name = "botcity-runner"`
- Use DQL to query:

```
fetch logs | filter service.name == "botcity-runner"
```
New Relic
New Relic offers unified observability with powerful querying via NRQL.

```shell
export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
# Port 4318 is New Relic's OTLP/HTTP port, so switch from the gRPC default
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_HEADERS=api-key=YOUR_LICENSE_KEY
export OTEL_SERVICE_NAME=botcity-runner
```
Setup steps:
- In New Relic, go to API keys (user menu → API keys)
- Create or copy an ingest License key; OTLP ingest expects a License key, not a User key (User keys start with `NRAK-`)
- For the EU datacenter, use `https://otlp.eu01.nr-data.net:4318`
Viewing logs:
- Navigate to Logs
- Filter by `service.name = "botcity-runner"`
- Use NRQL:

```sql
SELECT * FROM Log WHERE service.name = 'botcity-runner'
```
Useful NRQL queries:

```sql
-- Error rate by automation
SELECT count(*) FROM Log
WHERE service.name = 'botcity-runner' AND level = 'ERROR'
FACET automationLabel TIMESERIES

-- Task execution timeline
SELECT * FROM Log
WHERE service.name = 'botcity-runner' AND taskId = '12345'
ORDER BY timestamp
```
Datadog
Datadog provides comprehensive monitoring with log analytics and APM correlation.

```shell
export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://http-intake.logs.datadoghq.com:443
export OTEL_EXPORTER_OTLP_HEADERS=DD-API-KEY=xxxxxxxxxxxxxxxxxxxx
export OTEL_SERVICE_NAME=botcity-runner
```
Setup steps:
- In Datadog, go to Organization Settings → API Keys
- Create or copy an API key
- For the EU site, use `https://http-intake.logs.datadoghq.eu:443`

Viewing logs:
- Navigate to Logs → Search
- Filter by `service:botcity-runner`
- Create facets for `@taskId`, `@botId`, and `@automationLabel` for efficient filtering

Useful queries:

```
service:botcity-runner status:error
service:botcity-runner @taskId:12345
service:botcity-runner @automationLabel:invoice-bot
```
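Grafana Cloud
Grafana Cloud provides open-source-based observability with Loki for logs.

```shell
export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
# The Grafana Cloud OTLP gateway accepts OTLP over HTTP
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# BASE64_CREDENTIALS = base64 of "instanceId:apiKey"; %20 encodes the space after "Basic"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20BASE64_CREDENTIALS"
export OTEL_SERVICE_NAME=botcity-runner
```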
Setup steps:
- In Grafana Cloud, go to your stack → Connections → OpenTelemetry
- Note your OTLP endpoint URL (it varies by region)
- Create an API token with the `logs:write` scope
- Base64-encode `instanceId:apiKey` and use the result as the credentials in the Authorization header
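To produce the Base64 credentials for the Authorization header, a minimal shell sketch follows. `grafana_otlp_header` is a hypothetical helper, and the instance ID and token are placeholders. `tr -d '\n'` removes the line wrapping that some `base64` implementations insert into long output, and the `sed` step percent-encodes `+` and `=` so the value survives the SDK's percent-decoding of header values.

```shell
#!/bin/sh
# Hypothetical helper: build the Authorization entry for OTEL_EXPORTER_OTLP_HEADERS
# from a Grafana Cloud instance ID ($1) and API token ($2).
grafana_otlp_header() {
  # printf avoids a trailing newline, which would otherwise be encoded too.
  creds=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n' | sed -e 's/+/%2B/g' -e 's/=/%3D/g')
  # %20 is the percent-encoded space between "Basic" and the credentials.
  printf 'Authorization=Basic%%20%s' "$creds"
}

# Placeholders: replace with your stack's instance ID and token.
GRAFANA_INSTANCE_ID=123456
GRAFANA_API_TOKEN=glc_xxxxxxxxxxxx
export OTEL_EXPORTER_OTLP_HEADERS="$(grafana_otlp_header "$GRAFANA_INSTANCE_ID" "$GRAFANA_API_TOKEN")"
```

With the classic Basic-auth example credentials, `grafana_otlp_header Aladdin 'open sesame'` yields `Authorization=Basic%20QWxhZGRpbjpvcGVuIHNlc2FtZQ%3D%3D`.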
Viewing logs:
- Navigate to Explore → Select the Loki data source
- Use LogQL:

```
{service_name="botcity-runner"}
```

Useful LogQL queries:

```
# Filter by task
{service_name="botcity-runner"} | json | taskId="12345"

# Error logs with context
{service_name="botcity-runner"} | json | level="ERROR"

# Search by automation
{service_name="botcity-runner"} | json | automationLabel="invoice-bot"
```
Using an OpenTelemetry Collector
For advanced scenarios (buffering, transformation, multi-destination routing), deploy an OpenTelemetry Collector as an intermediary.
When to Use a Collector
- Multi-destination routing — Send logs to multiple backends simultaneously
- Data transformation — Enrich, filter, or redact log data
- Buffering — Handle network interruptions gracefully
- Credential management — Keep API keys on the collector, not the Runner
Basic Collector Setup

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  # Example: Datadog (this exporter ships in the Collector contrib distribution)
  datadog:
    api:
      key: ${DD_API_KEY}
      site: datadoghq.com
  # Example: New Relic
  otlp/newrelic:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${NEW_RELIC_LICENSE_KEY}
  # Example: Dynatrace
  otlphttp/dynatrace:
    endpoint: https://{environment-id}.live.dynatrace.com/api/v2/otlp
    headers:
      Authorization: "Api-Token ${DT_API_TOKEN}"

service:
  pipelines:
    logs:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ datadog, otlp/newrelic, otlphttp/dynatrace ]
```
Configure the Runner to send to the collector:
```shell
export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector-host:4317
export OTEL_SERVICE_NAME=botcity-runner
```
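Best Practices
1. Use Consistent Service Naming
Set `OTEL_SERVICE_NAME` to the same value (for example, `botcity-runner`) on every Runner instance so that their logs group under a single service.
Add environment context, such as the deployment environment, through `OTEL_RESOURCE_ATTRIBUTES`.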
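For example, every Runner can share one service name while resource attributes record where it runs. This is a sketch: `deployment.environment` and `service.namespace` are OpenTelemetry semantic-convention keys (newer convention versions rename the former to `deployment.environment.name`), and the values `production` and `rpa` are examples to adapt.

```shell
#!/bin/sh
# Same service name on every Runner instance.
export OTEL_SERVICE_NAME=botcity-runner
# Per-environment context; the values are examples, adjust per deployment.
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,service.namespace=rpa"
```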
2. Enable JSON Format for Log Aggregation
When using log aggregation platforms without OTEL, enable JSON format by setting `BOTCITY_RUNNER_LOG_FORMAT=JSON`.
This enables structured parsing and field extraction.
3. Log Retention Strategy
Consider your retention needs:
- Local files — 7 days, 1GB cap (default)
- OTEL backend — Configure based on compliance requirements
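Troubleshooting
Logs Not Appearing in Backend
- Verify that OTEL export is enabled: `BOTCITY_RUNNER_OTEL_ENABLED` must be `true` in the Runner's environment
- Check endpoint connectivity: the Runner host must be able to reach the host and port in `OTEL_EXPORTER_OTLP_ENDPOINT`
- Confirm that `OTEL_EXPORTER_OTLP_PROTOCOL` matches what the endpoint accepts (HTTP-only endpoints need `http/protobuf`)
- Verify that the authentication headers are correct, including percent-encoded spaces (`%20`)
- Check the Runner startup logs for OTEL initialization errors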
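The first two checks can be scripted. The sketch below is hypothetical (`otlp_target` is not a Runner command): it warns when export is disabled and prints the host and port from `OTEL_EXPORTER_OTLP_ENDPOINT`, using the documented default when the variable is unset and the scheme's default port when the URL has none. It assumes a hostname or IPv4 address, not an IPv6 literal.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the first two troubleshooting steps.

# Step 1: warn when OTEL export is not enabled.
if [ "${BOTCITY_RUNNER_OTEL_ENABLED:-false}" != "true" ]; then
  echo "OTEL export is disabled: set BOTCITY_RUNNER_OTEL_ENABLED=true" >&2
fi

# Step 2: print "host port" for the configured endpoint.
# Falls back to the documented default endpoint, and to the scheme's
# default port (443 for https, 80 otherwise) when the URL has no port.
otlp_target() {
  endpoint=${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4317}
  scheme=${endpoint%%://*}
  rest=${endpoint#*://}
  hostport=${rest%%/*}
  host=${hostport%%:*}
  case "$hostport" in
    *:*) port=${hostport##*:} ;;
    *) if [ "$scheme" = "https" ]; then port=443; else port=80; fi ;;
  esac
  printf '%s %s\n' "$host" "$port"
}

# Reachability can then be tested with a tool such as netcat:
#   nc -vz $(otlp_target)
otlp_target
```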
Missing Context Fields
Ensure the Runner is properly setting MDC context. Check that:
- Task execution is going through the standard execution path
- No exceptions are occurring before context is set
High Latency or Dropped Logs
If using direct integration:
- Consider deploying a collector for buffering
- Check network latency to the backend
- Verify backend rate limits aren't being hit
Retry and Resilience Behavior
The OpenTelemetry SDK includes built-in retry logic for transient failures when exporting logs:
Default Behavior
| Setting | Default Value | Description |
|---|---|---|
| Retry enabled | Yes | Automatic retry on transient failures |
| Initial backoff | 1 second | Wait time before first retry |
| Max backoff | 5 seconds | Maximum wait time between retries |
| Backoff multiplier | 1.5 | Exponential backoff factor |
| Max attempts | 5 | Total attempts, including the original request, before the batch is dropped |
| Export timeout | 10 seconds | Per-request timeout (configurable via OTEL_EXPORTER_OTLP_TIMEOUT) |
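A worked example turns the table into concrete numbers. Assuming the delay before the n-th retry is `min(initial × multiplier^(n-1), max_backoff)` and that "max attempts" counts the original request, the defaults allow four retries waiting at most 1, 1.5, 2.25 and 3.375 seconds, about 8.1 seconds of backoff in total, not counting the time each attempt itself takes. Implementations may add random jitter below these ceilings. `retry_schedule` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print backoff ceilings for an exponential retry policy.
# Args: initial backoff (s), multiplier, max backoff (s), max attempts (incl. first).
retry_schedule() {
  awk -v initial="$1" -v mult="$2" -v max="$3" -v attempts="$4" 'BEGIN {
    total = 0
    # attempts - 1 retries follow the original request
    for (n = 1; n < attempts; n++) {
      d = initial * mult ^ (n - 1)
      if (d > max) d = max
      total += d
      printf "retry %d: wait up to %.3f s\n", n, d
    }
    printf "total backoff: up to %.3f s\n", total
  }'
}

# Defaults from the table above.
retry_schedule 1 1.5 5 5
```

With the defaults this prints ceilings of 1.000, 1.500, 2.250 and 3.375 s and a total of 8.125 s; the 5-second cap only takes effect for larger multipliers or more attempts.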
What Happens During Outages
- Short outages: logs are buffered in memory and failed exports are retried automatically, within the limits in the table above
- Extended outages: once the retries are exhausted, the affected logs are dropped from the export (they remain in the local log files if the file appender is enabled)
- Graceful shutdown: on JVM exit, the SDK flushes pending logs before terminating
Tuning for Your Environment
For high-latency networks or unreliable connections:

```shell
# Increase request timeout (milliseconds)
export OTEL_EXPORTER_OTLP_TIMEOUT=30000

# Enable compression to reduce bandwidth
export OTEL_EXPORTER_OTLP_COMPRESSION=gzip
```
For production environments with strict reliability requirements, consider deploying an OpenTelemetry Collector as a local buffer.