Observability¶

Starting with version 3.5.0, BotCity Runner offers observability features to monitor, troubleshoot, and gain insights into automation performance.

Overview¶

BotCity Runner provides comprehensive observability through:

Structured Logging — TEXT or JSON format with rich contextual metadata
OpenTelemetry Integration — Export logs to any OTLP-compatible backend
MDC Context Propagation — Consistent correlation across log entries

Configuration Reference¶

Environment Variables¶

Variable	Default	Description
`BOTCITY_RUNNER_LOG_DIR`	`./logs`	Directory for log files
`BOTCITY_RUNNER_LOG_FORMAT`	`TEXT`	Log format: `TEXT` or `JSON`
`BOTCITY_RUNNER_LOG_LEVEL`	`INFO`	Log level for `dev.botcity.runner` package
`BOTCITY_RUNNER_LOG_CONSOLE`	`false`	Enable console output
`BOTCITY_RUNNER_OTEL_ENABLED`	`false`	Enable OpenTelemetry export
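
As an illustration of how the defaults in this table combine with the environment, the following sketch resolves the value each variable would take. It is a hypothetical pre-flight helper for a launcher script, not the Runner's own configuration code; the function name `effective_settings` and the strict format check are assumptions, and the Runner itself may be more lenient.

```python
import os

# Defaults copied from the environment variable table above.
DEFAULTS = {
    "BOTCITY_RUNNER_LOG_DIR": "./logs",
    "BOTCITY_RUNNER_LOG_FORMAT": "TEXT",
    "BOTCITY_RUNNER_LOG_LEVEL": "INFO",
    "BOTCITY_RUNNER_LOG_CONSOLE": "false",
    "BOTCITY_RUNNER_OTEL_ENABLED": "false",
}


def effective_settings(env=None):
    """Return the value each variable would take given `env`.

    Hypothetical helper: it only mirrors the documented defaults.
    """
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}

    # Assumption: only the two documented format values are valid,
    # compared exactly as written in the table.
    if settings["BOTCITY_RUNNER_LOG_FORMAT"] not in ("TEXT", "JSON"):
        raise ValueError(
            "BOTCITY_RUNNER_LOG_FORMAT must be TEXT or JSON, got "
            + repr(settings["BOTCITY_RUNNER_LOG_FORMAT"])
        )
    return settings


if __name__ == "__main__":
    for name, value in effective_settings().items():
        print(f"{name}={value}")
```

Running the script before starting the Runner prints the settings that would be in effect for the current shell.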

OpenTelemetry Variables¶

When BOTCITY_RUNNER_OTEL_ENABLED=true, the Runner respects standard OTEL environment variables:

Variable	Default	Description
`OTEL_EXPORTER_OTLP_ENDPOINT`	`http://localhost:4317`	OTLP endpoint URL
`OTEL_EXPORTER_OTLP_HEADERS`	—	Auth headers (`key=value,key2=value2`)
`OTEL_EXPORTER_OTLP_PROTOCOL`	`grpc`	Protocol: `grpc` or `http/protobuf`
`OTEL_EXPORTER_OTLP_COMPRESSION`	—	Compression: `gzip` or `none`
`OTEL_EXPORTER_OTLP_TIMEOUT`	`10000`	Request timeout in milliseconds
`OTEL_SERVICE_NAME`	`botcity-runner`	Service name for identification
`OTEL_RESOURCE_ATTRIBUTES`	—	Additional attributes (`key=value,key2=value2`)
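
Both `OTEL_EXPORTER_OTLP_HEADERS` and `OTEL_RESOURCE_ATTRIBUTES` use the `key=value,key2=value2` layout. The sketch below shows one way to check such a string before exporting it. The function name `parse_pairs` is hypothetical, and the splitting rules are a simplified assumption, not the OpenTelemetry SDK's parser. It splits each item on the first `=` only, so values that contain `=` (such as base64 padding) are preserved.

```python
def parse_pairs(raw):
    """Parse a 'key=value,key2=value2' string into a dict.

    Simplified sketch: commas always separate items, and each item is
    split on its first '=' so the value may itself contain '='.
    """
    pairs = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and empty input
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


if __name__ == "__main__":
    print(parse_pairs("deployment.environment=production,service.version=3.5.0"))
```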

Log Context Fields¶

Every log entry includes contextual metadata via MDC (Mapped Diagnostic Context):

Field	Description
`organization`	Organization identifier
`runnerLabel`	Runner instance label
`runnerVersion`	Runner software version
`automationLabel`	Automation label
`technology`	Bot technology (Python, Java, etc.)
`botId`	Bot label or identifier
`botVersion`	Bot version
`taskId`	Current task identifier

These fields enable filtering and correlation across distributed systems.

Log Formats¶

TEXT Format (Default)¶

Human-readable format suitable for local development and traditional log management:

2026-04-21 13:00:00.123 INFO  [org=acme] [runner=prod-01] [runnerVersion=3.5.0] [automation=invoice-bot] [technology=python] [botId=inv-001] [botVersion=1.2.0] [taskId=12345] Starting task execution
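
For quick local analysis, a line in this layout can be split into its parts with a regular expression. This is a minimal sketch that assumes every line follows the sample above exactly (timestamp, level, bracketed `key=value` pairs, then the message); the pattern and the function name `parse_text_line` are illustrative and not part of the Runner.

```python
import re

# Timestamp, level, zero or more [key=value] groups, then the message.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<context>(?:\[[^\]=]+=[^\]]*\]\s*)*)"
    r"(?P<message>.*)$"
)
PAIR_RE = re.compile(r"\[([^\]=]+)=([^\]]*)\]")


def parse_text_line(line):
    """Return a dict for one TEXT-format line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "level": match["level"],
        "context": dict(PAIR_RE.findall(match["context"])),
        "message": match["message"],
    }


if __name__ == "__main__":
    sample = (
        "2026-04-21 13:00:00.123 INFO  [org=acme] [runner=prod-01] "
        "[taskId=12345] Starting task execution"
    )
    print(parse_text_line(sample))
```

Lines that do not match, such as stack-trace continuation lines, return `None` and can be skipped or attached to the previous record.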

JSON Format¶

Structured JSON format compatible with log aggregation platforms (ELK, Splunk, Datadog, etc.):

{
  "@timestamp": "2026-04-21T13:00:00.123Z",
  "level": "INFO",
  "message": "Starting task execution",
  "organization": "acme",
  "runnerLabel": "prod-01",
  "runnerVersion": "3.5.0",
  "automationLabel": "invoice-bot",
  "technology": "python",
  "botId": "inv-001",
  "botVersion": "1.2.0",
  "taskId": "12345",
  "application": "botcity-runner",
  "service.name": "botcity-runner"
}

Enable with:

export BOTCITY_RUNNER_LOG_FORMAT=JSON

Observability Platform Integration¶

BotCity Runner supports direct integration with observability platforms via OpenTelemetry. No intermediary collector is required — the Runner sends logs directly to these services.

Dynatrace¶

Dynatrace provides full-stack observability with AI-powered analytics.

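export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT="https://{environment-id}.live.dynatrace.com/api/v2/otlp"
# The Dynatrace OTLP API is HTTP-based, so override the default grpc protocol
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Quote the value: it contains a space
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Api-Token dt0c01.xxxxxxxxxxxx"
export OTEL_SERVICE_NAME=botcity-runner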

Setup steps:

In Dynatrace, go to Settings → Integration → API tokens
Create a token with logs.ingest scope
Replace {environment-id} with your Dynatrace environment ID
Replace dt0c01.xxxxxxxxxxxx with your API token

Viewing logs:

Navigate to Observe and explore → Logs
Filter by service.name = "botcity-runner"
Use DQL to query: fetch logs | filter service.name == "botcity-runner"

New Relic¶

New Relic offers unified observability with powerful querying via NRQL.

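export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
# Port 4318 is the OTLP/HTTP port, so override the default grpc protocol
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_HEADERS=api-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxNRAL
export OTEL_SERVICE_NAME=botcity-runner

Setup steps:

In New Relic, go to API keys (user menu → API keys)
Create or copy a License (ingest) key. License keys end with NRAL; keys that start with NRAK- are user keys, which are meant for the query APIs rather than data ingest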
For EU datacenter, use https://otlp.eu01.nr-data.net:4318

Viewing logs:

Navigate to Logs
Filter by service.name = "botcity-runner"
Use NRQL: SELECT * FROM Log WHERE service.name = 'botcity-runner'

Useful NRQL queries:

-- Error rate by automation
SELECT count(*) FROM Log 
WHERE service.name = 'botcity-runner' AND level = 'ERROR'
FACET automationLabel TIMESERIES

-- Task execution timeline
SELECT * FROM Log 
WHERE service.name = 'botcity-runner' AND taskId = '12345'
ORDER BY timestamp

Datadog¶

Datadog provides comprehensive monitoring with log analytics and APM correlation.

export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://http-intake.logs.datadoghq.com:443
export OTEL_EXPORTER_OTLP_HEADERS=DD-API-KEY=xxxxxxxxxxxxxxxxxxxx
export OTEL_SERVICE_NAME=botcity-runner

Setup steps:

In Datadog, go to Organization Settings → API Keys
Create or copy an API key
For EU site, use https://http-intake.logs.datadoghq.eu:443

Viewing logs:

Navigate to Logs → Search
Filter by service:botcity-runner
Create facets for @taskId, @botId, @automationLabel for efficient filtering

Useful queries:

service:botcity-runner status:error
service:botcity-runner @taskId:12345
service:botcity-runner @automationLabel:invoice-bot

Grafana Cloud¶

Grafana Cloud provides open-source-based observability with Loki for logs.

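export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
# The Grafana Cloud OTLP gateway is HTTP-based, so override the default grpc protocol
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Quote the value: it contains a space, and < > would otherwise be read as redirections
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64(instanceId:apiKey)>"
export OTEL_SERVICE_NAME=botcity-runner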

Setup steps:

In Grafana Cloud, go to your stack → Connections → OpenTelemetry
Note your OTLP endpoint URL (varies by region)
Create an API token with logs:write scope
Base64 encode instanceId:apiKey for the Authorization header:
```
echo -n "123456:glc_xxxxxxxxxxxx" | base64
```
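
Where the `base64` command is not available (for example on Windows hosts), the same header value can be built with a short script. This is a sketch only: the function name `grafana_otlp_headers` is hypothetical, and the instance ID and token below are the placeholders from the command above, not real credentials.

```python
import base64


def grafana_otlp_headers(instance_id, api_token):
    """Build the OTEL_EXPORTER_OTLP_HEADERS value for HTTP Basic auth."""
    credentials = f"{instance_id}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return f"Authorization=Basic {encoded}"


if __name__ == "__main__":
    # Placeholder values; substitute your own instance ID and token.
    print(grafana_otlp_headers("123456", "glc_xxxxxxxxxxxx"))
```

The printed string is what goes on the right-hand side of `OTEL_EXPORTER_OTLP_HEADERS`; remember to quote it in the shell because it contains a space.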

Viewing logs:

Navigate to Explore → Select Loki data source
Use LogQL: {service_name="botcity-runner"}

Useful LogQL queries:

# Filter by task
{service_name="botcity-runner"} | json | taskId="12345"

# Error logs with context
{service_name="botcity-runner"} | json | level="ERROR"

# Search by automation
{service_name="botcity-runner"} | json | automationLabel="invoice-bot"

Using an OpenTelemetry Collector¶

For advanced scenarios (buffering, transformation, multi-destination routing), deploy an OpenTelemetry Collector as an intermediary.

When to Use a Collector¶

Multi-destination routing — Send logs to multiple backends simultaneously
Data transformation — Enrich, filter, or redact log data
Buffering — Handle network interruptions gracefully
Credential management — Keep API keys on the collector, not the Runner

Basic Collector Setup¶

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  # Example: Datadog (this exporter ships with the Collector contrib distribution)
  datadog:
    api:
      key: ${DD_API_KEY}
      site: datadoghq.com

  # Example: New Relic
  otlp/newrelic:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${NEW_RELIC_LICENSE_KEY}

  # Example: Dynatrace
  otlphttp/dynatrace:
    endpoint: https://{environment-id}.live.dynatrace.com/api/v2/otlp
    headers:
      Authorization: "Api-Token ${DT_API_TOKEN}"

service:
  pipelines:
    logs:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ datadog, otlp/newrelic, otlphttp/dynatrace ]

Configure the Runner to send to the collector:

export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector-host:4317
export OTEL_SERVICE_NAME=botcity-runner

Best Practices¶

1. Use Consistent Service Naming¶

Set OTEL_SERVICE_NAME consistently across all Runner instances:

export OTEL_SERVICE_NAME=botcity-runner

Add environment context via resource attributes:

export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,service.version=3.5.0"

2. Enable JSON Format for Log Aggregation¶

When using log aggregation platforms without OTEL, enable JSON format:

export BOTCITY_RUNNER_LOG_FORMAT=JSON

This enables structured parsing and field extraction.
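
As an illustration of what structured parsing makes possible, the sketch below filters JSON log records by any of the context fields. It assumes the log file holds one JSON object per line (the pretty-printed example earlier on this page is formatted for readability); the function names `iter_records` and `filter_records` are hypothetical.

```python
import json


def iter_records(lines):
    """Yield one dict per line that parses as a JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if isinstance(record, dict):
            yield record


def filter_records(lines, **fields):
    """Return records whose fields equal all the given values."""
    return [
        record
        for record in iter_records(lines)
        if all(record.get(key) == value for key, value in fields.items())
    ]


if __name__ == "__main__":
    sample = [
        '{"level": "INFO", "taskId": "12345", "message": "Starting task execution"}',
        '{"level": "ERROR", "taskId": "999", "message": "Task failed"}',
    ]
    for record in filter_records(sample, taskId="12345"):
        print(record["level"], record["message"])
```

To run it against a real file, pass an open file object as `lines`, for example `filter_records(open(path, encoding="utf-8"), level="ERROR")`.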

3. Log Retention Strategy¶

Consider your retention needs:

Local files — 7 days, 1GB cap (default)
OTEL backend — Configure based on compliance requirements

Troubleshooting¶

Logs Not Appearing in Backend¶

Verify OTEL is enabled:

echo "$BOTCITY_RUNNER_OTEL_ENABLED"  # Should print "true"

Check endpoint connectivity:
```
curl -v "$OTEL_EXPORTER_OTLP_ENDPOINT"
```
Verify authentication headers are correct
Check Runner startup logs for OTEL initialization errors
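
An HTTP client may not produce a meaningful response from a gRPC endpoint, so it can help to separate plain network reachability from protocol or authentication problems. The following sketch only tests whether a TCP connection to the endpoint's host and port can be opened. The function names are hypothetical, and the fallback ports (443 for `https`, 80 for `http`) are the scheme defaults, used only when the URL does not name a port.

```python
import os
import socket
from urllib.parse import urlsplit


def endpoint_host_port(url):
    """Extract (host, port) from an OTLP endpoint URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(url, timeout=5.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Falls back to the documented default endpoint when the variable is unset.
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
    print(endpoint, "reachable" if can_connect(endpoint) else "NOT reachable")
```

A successful connection rules out DNS, routing, and firewall problems; it does not confirm that the headers or the protocol setting are correct.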

Missing Context Fields¶

Ensure the Runner is properly setting MDC context. Check that:

Task execution is going through the standard execution path
No exceptions are occurring before context is set

High Latency or Dropped Logs¶

If using direct integration:

Consider deploying a collector for buffering
Check network latency to the backend
Verify backend rate limits aren't being hit

Retry and Resilience Behavior¶

The OpenTelemetry SDK includes built-in retry logic for transient failures when exporting logs:

Default Behavior¶

Setting	Default Value	Description
Retry enabled	Yes	Automatic retry on transient failures
Initial backoff	1 second	Wait time before first retry
Max backoff	5 seconds	Maximum wait time between retries
Backoff multiplier	1.5	Exponential backoff factor
Max attempts	5	Total export attempts, including the first, before the batch is dropped
Export timeout	10 seconds	Per-request timeout (configurable via `OTEL_EXPORTER_OTLP_TIMEOUT`)
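
To get a feel for how long a failing export is retried under these defaults, the sketch below computes the wait before each retry. It is an approximation under stated assumptions: "max attempts" is taken to include the first attempt, and any random jitter the SDK applies is ignored, so the values are upper bounds on each wait. The function name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(initial=1.0, multiplier=1.5, max_backoff=5.0, max_attempts=5):
    """Return the wait (in seconds) before each retry, ignoring jitter.

    With N attempts in total there are N - 1 waits between them.
    """
    delays = []
    delay = initial
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_backoff))  # cap each wait at max_backoff
        delay *= multiplier
    return delays


if __name__ == "__main__":
    delays = backoff_schedule()
    export_timeout = 10.0  # seconds, the default from the table above
    worst_case = sum(delays) + export_timeout * (len(delays) + 1)
    print("waits between attempts:", delays)
    print("total backoff:", sum(delays), "s")
    print("worst case if every attempt times out:", worst_case, "s")
```

With the defaults the waits are 1, 1.5, 2.25, and 3.375 seconds, so the 5-second cap is never reached; it only matters if the multiplier or the attempt count is raised.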

What Happens During Outages¶

Short outages (< 30 seconds) — Logs are buffered and retried automatically
Extended outages — After max retries, logs are dropped (logged locally if file appender is enabled)
Graceful shutdown — On JVM exit, the SDK flushes pending logs before terminating

Tuning for Your Environment¶

For high-latency networks or unreliable connections:

# Increase request timeout (milliseconds)
export OTEL_EXPORTER_OTLP_TIMEOUT=30000

# Enable compression to reduce bandwidth
export OTEL_EXPORTER_OTLP_COMPRESSION=gzip

For production environments with strict reliability requirements, consider deploying an OpenTelemetry Collector as a local buffer.
