Logging

Kumo emits structured tracing events. Applications choose how to collect and format those events by installing a tracing subscriber. Kumo does not install a subscriber for normal library use, so events are discarded until the application installs one.

Most examples use tracing_subscriber::fmt(), reading the filter from RUST_LOG and falling back to a default when it is unset:

tracing_subscriber::fmt()
    .with_env_filter(
        std::env::var("RUST_LOG")
            .unwrap_or_else(|_| "kumo::crawl=info,kumo::request=info".into()),
    )
    .init();

Recommended Filters

For normal production runs:

RUST_LOG=kumo::crawl=info,kumo::request=info

For debugging scheduling, cache, pipeline, or item-drop behavior:

RUST_LOG=kumo=debug

For quieter application logs that keep crawl-level events such as the final summary but limit request logs to warnings and errors:

RUST_LOG=kumo::crawl=info,kumo::request=warn
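The three recommendations above can also be selected in code. The following is a minimal sketch, not part of Kumo: `LogProfile` and `resolve_filter` are hypothetical names introduced here, and the directive strings are copied from the recommendations above. An explicit RUST_LOG value wins; the profile is only a fallback.

```rust
/// Hypothetical logging profiles mirroring the three recommendations above.
/// These names are illustrative and are not provided by Kumo.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogProfile {
    Production,
    Verbose,
    Quiet,
}

impl LogProfile {
    /// Filter directives copied from the recommended filters.
    fn directives(self) -> &'static str {
        match self {
            LogProfile::Production => "kumo::crawl=info,kumo::request=info",
            LogProfile::Verbose => "kumo=debug",
            LogProfile::Quiet => "kumo::crawl=info,kumo::request=warn",
        }
    }
}

/// Prefer an explicit RUST_LOG value; fall back to the profile
/// when the variable is unset or blank.
fn resolve_filter(rust_log: Option<&str>, profile: LogProfile) -> String {
    match rust_log.map(str::trim) {
        Some(value) if !value.is_empty() => value.to_string(),
        _ => profile.directives().to_string(),
    }
}

fn main() {
    let rust_log = std::env::var("RUST_LOG").ok();
    let filter = resolve_filter(rust_log.as_deref(), LogProfile::Production);
    // Pass `filter` to `.with_env_filter(...)` when building the subscriber.
    println!("{filter}");
}
```

Keeping the resolution logic in a plain function makes it testable without installing a subscriber.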

Event Targets

Kumo uses stable tracing targets for important runtime areas:

Target	Events
`kumo::crawl`	Crawl start, periodic metrics, interruption, abort, completion
`kumo::request`	Request retries, skips, robots-blocked requests, rate-limit waits
`kumo::item`	Item drops and pipeline drop errors
`kumo::cache`	HTTP cache hits, misses, bypasses, and skipped cache writes

Every important runtime event also includes an event field matching the log message. Common event names include crawl.start, crawl.metrics, crawl.complete, crawl.stream_error, request.fetch, request.ok, request.retry, request.retry_exhausted, request.skip, request.robots_blocked, request.rate_limit, request.autothrottle, request.proxy_ignored, item.drop, cache.hit, and cache.miss.
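When a specific event is missing from the output, the usual fix is to raise the level of the target that emits it. The sketch below assumes that the prefix of an event name (the part before the first dot) corresponds to the last segment of its target, which holds for the names listed above but is not stated as a guarantee. `target_for_event` and `directive_for_event` are hypothetical helpers, not Kumo APIs.

```rust
/// Map an event name such as "request.retry" to the target assumed to emit it.
/// Assumption: the event prefix matches the target's last path segment.
fn target_for_event(event: &str) -> Option<&'static str> {
    let (area, _) = event.split_once('.')?;
    match area {
        "crawl" => Some("kumo::crawl"),
        "request" => Some("kumo::request"),
        "item" => Some("kumo::item"),
        "cache" => Some("kumo::cache"),
        _ => None,
    }
}

/// Build a filter directive that enables `level` for the target of `event`.
fn directive_for_event(event: &str, level: &str) -> Option<String> {
    target_for_event(event).map(|target| format!("{target}={level}"))
}

fn main() {
    for event in ["request.retry", "cache.hit", "item.drop"] {
        match directive_for_event(event, "debug") {
            Some(directive) => println!("{event}: RUST_LOG={directive}"),
            None => println!("{event}: no known target"),
        }
    }
}
```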

Common Fields

Kumo keeps high-volume crawl logs machine-readable by using predictable field names:

Field	Meaning
`event`	Stable event name, such as `request.retry`
`spider`	Spider name returned by `Spider::name()`
`spider_index`	Index of the spider in `run_all()` multi-spider crawls
`url`	Request or response URL
`domain`	Normalized domain key used by crawl stats
`depth`	Crawl depth for the request
`attempt`	Current retry attempt count for request lifecycle events
`max_attempts`	Retry ceiling for retry-related events
`retry_in_ms`	Delay before a scheduled retry
`error_kind`	Stable Kumo error category
`stop_reason`	Final crawl stop reason

JSON Logs

Use JSON logs when sending crawl output to systems such as Datadog, Loki, CloudWatch, or Vector:

tracing_subscriber::fmt()
    .json()
    .with_env_filter(
        std::env::var("RUST_LOG")
            .unwrap_or_else(|_| "kumo::crawl=info,kumo::request=info".into()),
    )
    .init();

Enable the json feature on tracing-subscriber in your application, alongside the env-filter feature that with_env_filter requires:

tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }

For OpenTelemetry export, enable Kumo's otel feature and see the OpenTelemetry page.

Library Boundary

Kumo logs with tracing but does not own the logging backend. This keeps the framework composable inside CLIs, services, cron jobs, and larger applications. If you need programmatic lifecycle hooks instead of logs, use the current CrawlStats and CrawlReport APIs; a typed event/signal system is planned as separate future work.