Observability

Starting with version 3.5.0, BotCity Runner offers observability features to monitor, troubleshoot, and gain insights into automation performance.

Overview

BotCity Runner provides comprehensive observability through:

  • Structured Logging — TEXT or JSON format with rich contextual metadata
  • OpenTelemetry Integration — Export logs to any OTLP-compatible backend
  • MDC Context Propagation — Consistent correlation across log entries

Configuration Reference

Environment Variables

Variable                      Default   Description
BOTCITY_RUNNER_LOG_DIR        ./logs    Directory for log files
BOTCITY_RUNNER_LOG_FORMAT     TEXT      Log format: TEXT or JSON
BOTCITY_RUNNER_LOG_LEVEL      INFO      Log level for dev.botcity.runner package
BOTCITY_RUNNER_LOG_CONSOLE    false     Enable console output
BOTCITY_RUNNER_OTEL_ENABLED   false     Enable OpenTelemetry export
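
To make the defaults concrete, the sketch below resolves these variables from an environment mapping, for example in a pre-flight check script. `RunnerLogSettings` and `resolve_log_settings` are hypothetical helpers, not part of BotCity Runner, and the boolean parsing rule is an assumption rather than documented Runner behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RunnerLogSettings:
    """Hypothetical container mirroring the variables in the table above."""
    log_dir: str
    log_format: str
    log_level: str
    console: bool
    otel_enabled: bool


def _as_bool(value: str) -> bool:
    # Assumption: only the literal "true" (in any case) enables a flag.
    return value.strip().lower() == "true"


def resolve_log_settings(env: Mapping[str, str] = os.environ) -> RunnerLogSettings:
    """Apply the documented defaults to whatever is set in `env`."""
    log_format = env.get("BOTCITY_RUNNER_LOG_FORMAT", "TEXT").upper()
    if log_format not in ("TEXT", "JSON"):
        raise ValueError(f"unsupported log format: {log_format}")
    return RunnerLogSettings(
        log_dir=env.get("BOTCITY_RUNNER_LOG_DIR", "./logs"),
        log_format=log_format,
        log_level=env.get("BOTCITY_RUNNER_LOG_LEVEL", "INFO"),
        console=_as_bool(env.get("BOTCITY_RUNNER_LOG_CONSOLE", "false")),
        otel_enabled=_as_bool(env.get("BOTCITY_RUNNER_OTEL_ENABLED", "false")),
    )
```

Calling `resolve_log_settings({})` returns the documented defaults, which makes it easy to spot a typo in a variable name before starting the Runner.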

OpenTelemetry Variables

When BOTCITY_RUNNER_OTEL_ENABLED=true, the Runner respects standard OTEL environment variables:

Variable                         Default                 Description
OTEL_EXPORTER_OTLP_ENDPOINT      http://localhost:4317   OTLP endpoint URL
OTEL_EXPORTER_OTLP_HEADERS       -                       Auth headers (key=value,key2=value2)
OTEL_EXPORTER_OTLP_PROTOCOL      grpc                    Protocol: grpc or http/protobuf
OTEL_EXPORTER_OTLP_COMPRESSION   -                       Compression: gzip or none
OTEL_EXPORTER_OTLP_TIMEOUT       -                       Request timeout in milliseconds
OTEL_SERVICE_NAME                botcity-runner          Service name for identification
OTEL_RESOURCE_ATTRIBUTES         -                       Additional attributes (key=value,key2=value2)
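
The `key=value,key2=value2` format used by OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES is easy to get subtly wrong. The sketch below shows one way to check such a string before exporting it. `parse_otlp_pairs` is a hypothetical helper, and it does not replicate any percent-decoding an OpenTelemetry SDK may apply.

```python
from typing import Dict


def parse_otlp_pairs(raw: str) -> Dict[str, str]:
    """Parse a `key=value,key2=value2` string into a dict.

    Each pair is split on the first `=` only, so values that contain `=`
    themselves (for example base64 padding) are kept intact.
    """
    pairs: Dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed pair: {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs
```

For example, `parse_otlp_pairs("Authorization=Basic MTIzOmFiYw==")` keeps the trailing `==` as part of the value instead of treating it as another separator.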

Log Context Fields

Every log entry includes contextual metadata via MDC (Mapped Diagnostic Context):

Field             Description
organization      Organization identifier
runnerLabel       Runner instance label
runnerVersion     Runner software version
automationLabel   Automation label
technology        Bot technology (Python, Java, etc.)
botId             Bot label or identifier
botVersion        Bot version
taskId            Current task identifier

These fields enable filtering and correlation across distributed systems.

Log Formats

TEXT Format (Default)

Human-readable format suitable for local development and traditional log management:

2026-04-21 13:00:00.123 INFO  [org=acme] [runner=prod-01] [runnerVersion=3.5.0] [automation=invoice-bot] [technology=python] [botId=inv-001] [botVersion=1.2.0] [taskId=12345] Starting task execution
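
If TEXT logs need to be inspected with a script, the bracketed context can be pulled apart with a regular expression. This is a minimal sketch based only on the sample line above. `parse_text_line` is a hypothetical helper, and the pattern assumes the layout shown (timestamp, level, bracketed `key=value` pairs, then the message).

```python
import re
from typing import Dict, Optional

# Pattern derived from the sample line; other layouts will not match.
_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<context>(?:\[[^\]=]+=[^\]]*\]\s*)*)"
    r"(?P<message>.*)$"
)
_PAIR = re.compile(r"\[([^\]=]+)=([^\]]*)\]")


def parse_text_line(line: str) -> Optional[Dict[str, object]]:
    """Split one TEXT log line into timestamp, level, context and message.

    Returns None for lines that do not match, such as stack trace
    continuation lines.
    """
    match = _LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "level": match["level"],
        "context": dict(_PAIR.findall(match["context"])),
        "message": match["message"],
    }
```

Note that the TEXT format abbreviates some keys (`org`, `runner`, `automation`), so the resulting context keys differ from the JSON field names.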

JSON Format

Structured JSON format compatible with log aggregation platforms (ELK, Splunk, Datadog, etc.):

{
  "@timestamp": "2026-04-21T13:00:00.123Z",
  "level": "INFO",
  "message": "Starting task execution",
  "organization": "acme",
  "runnerLabel": "prod-01",
  "runnerVersion": "3.5.0",
  "automationLabel": "invoice-bot",
  "technology": "python",
  "botId": "inv-001",
  "botVersion": "1.2.0",
  "taskId": "12345",
  "application": "botcity-runner",
  "service.name": "botcity-runner"
}

Enable with:

export BOTCITY_RUNNER_LOG_FORMAT=JSON
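
With JSON enabled, log entries can be filtered on the context fields without any log platform. The sketch below assumes the file contains one JSON object per line (the sample above is pretty-printed for readability, so this is an assumption). `entries_for_task` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, Iterator


def entries_for_task(lines: Iterable[str], task_id: str) -> Iterator[Dict[str, object]]:
    """Yield the JSON log entries whose taskId equals `task_id`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(entry, dict) and entry.get("taskId") == task_id:
            yield entry
```

Passing an open file object as `lines` streams the file, so large logs do not need to fit in memory.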

Observability Platform Integration

BotCity Runner supports direct integration with observability platforms via OpenTelemetry. No intermediary collector is required — the Runner sends logs directly to these services.

Dynatrace

Dynatrace provides full-stack observability with AI-powered analytics.

export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://{environment-id}.live.dynatrace.com/api/v2/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Api-Token dt0c01.xxxxxxxxxxxx"
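# The Dynatrace OTLP API accepts OTLP over HTTP, so override the grpc default
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=botcity-runner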

Setup steps:

  1. In Dynatrace, go to Settings → Integration → API tokens
  2. Create a token with logs.ingest scope
  3. Replace {environment-id} with your Dynatrace environment ID
  4. Replace dt0c01.xxxxxxxxxxxx with your API token

Viewing logs:

  • Navigate to Observe and explore → Logs
  • Filter by service.name = "botcity-runner"
  • Use DQL to query: fetch logs | filter service.name == "botcity-runner"

New Relic

New Relic offers unified observability with powerful querying via NRQL.

export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
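export OTEL_EXPORTER_OTLP_HEADERS=api-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxNRAL
# Port 4318 is the OTLP/HTTP port, so override the grpc default
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=botcity-runner

Setup steps:

  1. In New Relic, go to API keys (user menu → API keys)
  2. Create or copy a License (ingest) key and use it as the api-key header value. License keys end with NRAL; keys that start with NRAK- are user keys and are not accepted for OTLP ingest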
  3. For EU datacenter, use https://otlp.eu01.nr-data.net:4318

Viewing logs:

  • Navigate to Logs
  • Filter by service.name = "botcity-runner"
  • Use NRQL: SELECT * FROM Log WHERE service.name = 'botcity-runner'

Useful NRQL queries:

-- Error count over time, by automation
SELECT count(*) FROM Log 
WHERE service.name = 'botcity-runner' AND level = 'ERROR'
FACET automationLabel TIMESERIES

-- Task execution timeline
SELECT * FROM Log 
WHERE service.name = 'botcity-runner' AND taskId = '12345'
ORDER BY timestamp

Datadog

Datadog provides comprehensive monitoring with log analytics and APM correlation.

export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://http-intake.logs.datadoghq.com:443
export OTEL_EXPORTER_OTLP_HEADERS=DD-API-KEY=xxxxxxxxxxxxxxxxxxxx
export OTEL_SERVICE_NAME=botcity-runner

Setup steps:

  1. In Datadog, go to Organization Settings → API Keys
  2. Create or copy an API key
  3. For EU site, use https://http-intake.logs.datadoghq.eu:443

Viewing logs:

  • Navigate to Logs → Search
  • Filter by service:botcity-runner
  • Create facets for @taskId, @botId, @automationLabel for efficient filtering

Useful queries:

service:botcity-runner status:error
service:botcity-runner @taskId:12345
service:botcity-runner @automationLabel:invoice-bot

Grafana Cloud

Grafana Cloud provides open-source-based observability with Loki for logs.

export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64(instanceId:apiKey)>"
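# The Grafana Cloud OTLP gateway accepts OTLP over HTTP, so override the grpc default
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=botcity-runner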

Setup steps:

  1. In Grafana Cloud, go to your stack → Connections → OpenTelemetry
  2. Note your OTLP endpoint URL (varies by region)
  3. Create an API token with logs:write scope
  4. Base64 encode instanceId:apiKey for the Authorization header:
    echo -n "123456:glc_xxxxxxxxxxxx" | base64
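
On systems without a `base64` command, the same header value can be produced with a few lines of Python. This is a minimal sketch of the encoding in step 4. `grafana_basic_auth_header` is a hypothetical helper, and the credentials are the placeholders from the command above.

```python
import base64


def grafana_basic_auth_header(instance_id: str, api_token: str) -> str:
    """Build the OTEL_EXPORTER_OTLP_HEADERS value for HTTP Basic auth."""
    credentials = f"{instance_id}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return f"Authorization=Basic {encoded}"


if __name__ == "__main__":
    # Placeholder credentials; substitute your own instance ID and token.
    print(grafana_basic_auth_header("123456", "glc_xxxxxxxxxxxx"))
```

Because the result contains a space, quote it when exporting it from a shell.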
    

Viewing logs:

  • Navigate to Explore → Select Loki data source
  • Use LogQL: {service_name="botcity-runner"}

Useful LogQL queries:

# Filter by task
{service_name="botcity-runner"} | json | taskId="12345"

# Error logs with context
{service_name="botcity-runner"} | json | level="ERROR"

# Search by automation
{service_name="botcity-runner"} | json | automationLabel="invoice-bot"

Using an OpenTelemetry Collector

For advanced scenarios (buffering, transformation, multi-destination routing), deploy an OpenTelemetry Collector as an intermediary.

When to Use a Collector

  • Multi-destination routing — Send logs to multiple backends simultaneously
  • Data transformation — Enrich, filter, or redact log data
  • Buffering — Handle network interruptions gracefully
  • Credential management — Keep API keys on the collector, not the Runner

Basic Collector Setup

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  # Example: Datadog
  datadog:
    api:
      key: ${DD_API_KEY}
      site: datadoghq.com

  # Example: New Relic
  otlp/newrelic:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${NEW_RELIC_LICENSE_KEY}

  # Example: Dynatrace
  otlphttp/dynatrace:
    endpoint: https://{environment-id}.live.dynatrace.com/api/v2/otlp
    headers:
      Authorization: "Api-Token ${DT_API_TOKEN}"

service:
  pipelines:
    logs:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ datadog, otlp/newrelic, otlphttp/dynatrace ]

Configure the Runner to send to the collector:

export BOTCITY_RUNNER_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector-host:4317
export OTEL_SERVICE_NAME=botcity-runner

Best Practices

1. Use Consistent Service Naming

Set OTEL_SERVICE_NAME consistently across all Runner instances:

export OTEL_SERVICE_NAME=botcity-runner

Add environment context via resource attributes:

export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,service.version=3.5.0"
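
When the attribute list is generated by a deployment script, building the string from a mapping avoids separator mistakes. A minimal sketch follows. `format_resource_attributes` is a hypothetical helper, and rejecting commas and equals signs (rather than percent-encoding them) is a simplifying assumption.

```python
from typing import Mapping


def format_resource_attributes(attributes: Mapping[str, str]) -> str:
    """Join attributes into the `key=value,key2=value2` form."""
    parts = []
    for key, value in attributes.items():
        if not key or "," in key or "=" in key:
            raise ValueError(f"invalid attribute key: {key!r}")
        if "," in value:
            # A comma would be read as the start of the next pair.
            raise ValueError(f"value for {key!r} must not contain a comma")
        parts.append(f"{key}={value}")
    return ",".join(parts)
```

For the example above, `format_resource_attributes({"deployment.environment": "production", "service.version": "3.5.0"})` yields the same string that is exported by hand.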

2. Enable JSON Format for Log Aggregation

When using log aggregation platforms without OTEL, enable JSON format:

export BOTCITY_RUNNER_LOG_FORMAT=JSON

This enables structured parsing and field extraction.

3. Log Retention Strategy

Consider your retention needs:

  • Local files — 7 days, 1GB cap (default)
  • OTEL backend — Configure based on compliance requirements

Troubleshooting

Logs Not Appearing in Backend

  1. Verify OTEL is enabled:

    echo "$BOTCITY_RUNNER_OTEL_ENABLED"  # Should be "true"
    
  2. Check endpoint connectivity:

    curl -v "$OTEL_EXPORTER_OTLP_ENDPOINT"
    
  3. Verify authentication headers are correct

  4. Check Runner startup logs for OTEL initialization errors
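
For step 2, a plain TCP check can be easier to interpret than `curl` output, particularly against a gRPC endpoint. The sketch below only tests whether the host and port are reachable; it says nothing about authentication or whether the service speaks OTLP. `endpoint_host_port` and `can_connect` are hypothetical helpers.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_host_port(endpoint: str) -> Tuple[str, int]:
    """Extract host and port from an OTLP endpoint URL."""
    parts = urlsplit(endpoint)
    if parts.scheme not in _DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"not an http(s) endpoint URL: {endpoint!r}")
    # Fall back to the scheme's default port when none is given.
    return parts.hostname, parts.port or _DEFAULT_PORTS[parts.scheme]


def can_connect(endpoint: str, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

A `ValueError` from `endpoint_host_port` usually points to a missing `http://` or `https://` scheme in OTEL_EXPORTER_OTLP_ENDPOINT.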

Missing Context Fields

Ensure the Runner is properly setting MDC context. Check that:

  • Task execution is going through the standard execution path
  • No exceptions are occurring before context is set

High Latency or Dropped Logs

If using direct integration:

  • Consider deploying a collector for buffering
  • Check network latency to the backend
  • Verify backend rate limits aren't being hit

Retry and Resilience Behavior

The OpenTelemetry SDK includes built-in retry logic for transient failures when exporting logs:

Default Behavior

Setting              Default Value   Description
Retry enabled        Yes             Automatic retry on transient failures
Initial backoff      1 second        Wait time before first retry
Max backoff          5 seconds       Maximum wait time between retries
Backoff multiplier   1.5             Exponential backoff factor
Max attempts         5               Total retry attempts before dropping
Export timeout       10 seconds      Per-request timeout (configurable via OTEL_EXPORTER_OTLP_TIMEOUT)
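
To see what these defaults imply, the sketch below computes the wait before each retry. It is a simplified model, not the SDK's implementation: it ignores the random jitter that SDKs typically apply, and it assumes that the max attempts value counts the initial request. `backoff_schedule` is a hypothetical helper.

```python
from typing import List


def backoff_schedule(
    initial_seconds: float = 1.0,
    max_seconds: float = 5.0,
    multiplier: float = 1.5,
    max_attempts: int = 5,
) -> List[float]:
    """Upper bound on the wait before each retry, ignoring jitter."""
    waits: List[float] = []
    delay = initial_seconds
    # Assumption: max_attempts includes the first try,
    # which leaves max_attempts - 1 retries.
    for _ in range(max_attempts - 1):
        waits.append(min(delay, max_seconds))
        delay *= multiplier
    return waits


if __name__ == "__main__":
    waits = backoff_schedule()
    print(waits, sum(waits))
```

Under these assumptions the waits are 1, 1.5, 2.25 and 3.375 seconds, about 8 seconds in total, not counting the time each request spends before it fails or times out.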

What Happens During Outages

  1. Short outages (< 30 seconds) — Logs are buffered and retried automatically
  2. Extended outages — After max retries, logs are dropped (logged locally if file appender is enabled)
  3. Graceful shutdown — On JVM exit, the SDK flushes pending logs before terminating

Tuning for Your Environment

For high-latency networks or unreliable connections:

# Increase request timeout (milliseconds)
export OTEL_EXPORTER_OTLP_TIMEOUT=30000

# Enable compression to reduce bandwidth
export OTEL_EXPORTER_OTLP_COMPRESSION=gzip

For production environments with strict reliability requirements, consider deploying an OpenTelemetry Collector as a local buffer.

References