
Logging

Kumo emits structured tracing events. Applications choose how to collect and format those events by installing a tracing subscriber. Kumo does not install a subscriber for normal library use.

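Most examples install a subscriber with tracing_subscriber::fmt(). They take filter directives from RUST_LOG and fall back to info-level crawl and request logs when it is unset: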

tracing_subscriber::fmt()
    .with_env_filter(
        std::env::var("RUST_LOG")
            .unwrap_or_else(|_| "kumo::crawl=info,kumo::request=info".into()),
    )
    .init();

For normal production runs:

RUST_LOG=kumo::crawl=info,kumo::request=info

For debugging scheduling, cache, pipeline, or item-drop behavior:

RUST_LOG=kumo=debug

For quieter logs that keep crawl-level summaries but show only warnings and errors from request handling:

RUST_LOG=kumo::crawl=info,kumo::request=warn
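
The directive strings above can be modeled with a simplified, std-only sketch of how a filter string decides whether an event from a given target is enabled. It assumes EnvFilter's basic target rules: a directive matches targets that start with its target string, the longest matching target wins, a bare level such as warn applies everywhere, and a target given without a level enables all levels. Span and field filters are ignored. Level, max_level, and is_enabled are illustrative names, not part of Kumo or tracing-subscriber.

```rust
// Simplified, std-only model of RUST_LOG-style target filtering.
// Assumption: mirrors only EnvFilter's basic target rules (prefix match,
// longest matching target wins, bare level = global, target without a level
// = every level). Span/field filters and "off" are not modeled.

/// Verbosity levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parses a level name case-insensitively; unknown names yield None.
fn parse_level(s: &str) -> Option<Level> {
    match s.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Most verbose level enabled for `target`, or None if no directive matches.
pub fn max_level(directives: &str, target: &str) -> Option<Level> {
    // Best match so far: (length of the matched target prefix, level).
    let mut best: Option<(usize, Level)> = None;
    for directive in directives.split(',').map(str::trim).filter(|d| !d.is_empty()) {
        let (prefix, level) = match directive.split_once('=') {
            // "kumo::request=warn": target plus level; skip unknown levels.
            Some((t, l)) => match parse_level(l) {
                Some(level) => (t.trim(), level),
                None => continue,
            },
            // "warn" is a global level; "kumo" alone enables every level.
            None => match parse_level(directive) {
                Some(level) => ("", level),
                None => (directive, Level::Trace),
            },
        };
        let more_specific = best.map_or(true, |(len, _)| prefix.len() >= len);
        if target.starts_with(prefix) && more_specific {
            best = Some((prefix.len(), level));
        }
    }
    best.map(|(_, level)| level)
}

/// True if an event at `level` from `target` passes the directives.
pub fn is_enabled(directives: &str, target: &str, level: Level) -> bool {
    max_level(directives, target).map_or(false, |max| level <= max)
}
```

Under this model, kumo::crawl=info,kumo::request=warn filters out info-level events under kumo::request, drops events under kumo::cache entirely because no directive matches that target, and kumo=debug enables debug events from every kumo:: target.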

Event Targets

Kumo uses stable tracing targets for important runtime areas:

Target          Events
kumo::crawl     Crawl start, periodic metrics, interruption, abort, completion
kumo::request   Request retries, skips, robots-blocked requests, rate-limit waits
kumo::item      Item drops and pipeline drop errors
kumo::cache     HTTP cache hits, misses, bypasses, and skipped cache writes

Every important runtime event also includes an event field matching the log message. Common event names include crawl.start, crawl.metrics, crawl.complete, crawl.stream_error, request.fetch, request.ok, request.retry, request.retry_exhausted, request.skip, request.robots_blocked, request.rate_limit, request.autothrottle, request.proxy_ignored, item.drop, cache.hit, and cache.miss.
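
The event names above appear to share a prefix with the target area that emits them (crawl., request., item., cache.). If that pattern holds, a consumer that keeps only the event field can still route records by area. The sketch below assumes it; split_event and target_for_event are hypothetical helpers, and Kumo does not document this mapping as a guarantee.

```rust
// Hypothetical helpers, not part of Kumo. Assumption: event names follow
// "<area>.<action>" and <area> names the kumo::<area> target that logs them,
// a pattern inferred from the documented event names only.

/// Target areas listed in the targets table.
const AREAS: [&str; 4] = ["crawl", "request", "item", "cache"];

/// Splits "request.retry" into ("request", "retry"); None if malformed.
pub fn split_event(event: &str) -> Option<(&str, &str)> {
    let (area, action) = event.split_once('.')?;
    if area.is_empty() || action.is_empty() {
        return None;
    }
    Some((area, action))
}

/// Maps an event name to its assumed target, e.g. "cache.hit" -> "kumo::cache".
pub fn target_for_event(event: &str) -> Option<String> {
    let (area, _) = split_event(event)?;
    AREAS.contains(&area).then(|| format!("kumo::{area}"))
}
```

Names outside the four documented areas return None rather than guessing a target.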

Common Fields

Kumo keeps high-volume crawl logs machine-readable by using predictable field names:

Field          Meaning
event          Stable event name, such as request.retry
spider         Spider name returned by Spider::name()
spider_index   Spider's index in a run_all() multi-spider crawl
url            Request or response URL
domain         Normalized domain key used by crawl stats
depth          Crawl depth of the request
attempt        Current retry attempt count for request lifecycle events
max_attempts   Retry ceiling for retry-related events
retry_in_ms    Delay in milliseconds before a scheduled retry
error_kind     Stable Kumo error category
stop_reason    Final crawl stop reason
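
Predictable field names make downstream aggregation straightforward. The sketch below assumes records have already been parsed from Kumo's output (for example, JSON logs) into a hypothetical Record type that keeps only the event and domain fields. It counts one event name per domain, for example to spot domains that trigger many request.retry events. Record and count_by_domain are illustrative, not Kumo APIs.

```rust
// Hypothetical downstream aggregation over already-parsed Kumo log records.
// `Record` is an illustrative type, not part of Kumo; it keeps only the
// `event` and `domain` fields from the common field set.
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
pub struct Record {
    pub event: String,
    pub domain: Option<String>,
}

/// Counts records whose `event` equals `event_name`, grouped by `domain`.
/// Records without a `domain` field are counted under "(none)".
pub fn count_by_domain(records: &[Record], event_name: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for record in records.iter().filter(|r| r.event == event_name) {
        let key = record.domain.clone().unwrap_or_else(|| "(none)".to_string());
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

A BTreeMap keeps the output sorted by domain, which makes repeated reports easy to compare.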

JSON Logs

Use JSON logs when sending crawl output to systems such as Datadog, Loki, CloudWatch, or Vector:

tracing_subscriber::fmt()
    .json()
    .with_env_filter(
        std::env::var("RUST_LOG")
            .unwrap_or_else(|_| "kumo::crawl=info,kumo::request=info".into()),
    )
    .init();

Enable the json and env-filter features of tracing-subscriber in your application's Cargo.toml (env-filter is required for with_env_filter):

tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }

For OpenTelemetry export, enable Kumo's otel feature and see the OpenTelemetry page.

Library Boundary

Kumo logs with tracing but does not own the logging backend. This keeps the framework composable inside CLIs, services, cron jobs, and larger applications. If you need programmatic lifecycle hooks instead of logs, use the current CrawlStats and CrawlReport APIs; a typed event/signal system is planned as separate future work.