Logging
Kumo emits structured tracing events. Applications choose how to collect and format those events by installing a tracing subscriber. Kumo does not install a subscriber for normal library use.
Most examples use tracing_subscriber::fmt():
    // Honor RUST_LOG when it is set; otherwise log Kumo's crawl and request targets at info.
    tracing_subscriber::fmt()
        .with_env_filter(
            std::env::var("RUST_LOG")
                .unwrap_or_else(|_| "kumo::crawl=info,kumo::request=info".into()),
        )
        .init();
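One detail of the fallback above: unwrap_or_else only runs when RUST_LOG is unset or not valid Unicode, so an empty RUST_LOG value is passed through as an empty filter string. If that is not what you want, a stricter fallback might look like the sketch below. The names resolve_filter and DEFAULT_FILTER are hypothetical and are not part of Kumo or tracing-subscriber.

```rust
/// Default directives, copied from the subscriber example above.
const DEFAULT_FILTER: &str = "kumo::crawl=info,kumo::request=info";

/// Hypothetical helper (not a Kumo API): picks the filter string to hand to
/// `with_env_filter`, treating an unset, empty, or whitespace-only value as
/// "use the defaults".
fn resolve_filter(env_value: Option<String>) -> String {
    match env_value {
        // A non-blank value from the environment wins.
        Some(value) if !value.trim().is_empty() => value,
        // Unset or blank: fall back to the defaults.
        _ => DEFAULT_FILTER.to_string(),
    }
}
```

It could be called as resolve_filter(std::env::var("RUST_LOG").ok()), with the result passed to with_env_filter in place of the inline expression.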
Recommended Filters
For normal production runs:
For debugging scheduling, cache, pipeline, or item-drop behavior:
For quiet application logs with only final crawl summaries:
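An application that switches between these three cases at runtime could keep the directive strings in one place. The sketch below is only an illustration: LogProfile and filter_for are hypothetical names, and the directive strings are assumptions assembled from the targets listed under Event Targets, not Kumo's published recommendations. Which events each level admits depends on the levels Kumo assigns to them, which this page does not list.

```rust
/// Hypothetical run profiles mirroring the three cases above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogProfile {
    Production,
    Debug,
    Quiet,
}

/// Returns an illustrative `RUST_LOG`-style directive string for a profile.
/// ASSUMPTION: these strings are examples, not values taken from Kumo's docs.
fn filter_for(profile: LogProfile) -> &'static str {
    match profile {
        // Same directives as the fallback in the subscriber example.
        LogProfile::Production => "kumo::crawl=info,kumo::request=info",
        // Raise every documented target to debug.
        LogProfile::Debug => {
            "kumo::crawl=debug,kumo::request=debug,kumo::item=debug,kumo::cache=debug"
        }
        // Crawl target at info; everything else limited to warnings and errors.
        LogProfile::Quiet => "warn,kumo::crawl=info",
    }
}
```

The returned string can be passed to with_env_filter directly, or exported as RUST_LOG before the process starts.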
Event Targets
Kumo uses stable tracing targets for important runtime areas:
| Target | Events |
|---|---|
| kumo::crawl | Crawl start, periodic metrics, interruption, abort, completion |
| kumo::request | Request retries, skips, robots-blocked requests, rate-limit waits |
| kumo::item | Item drops and pipeline drop errors |
| kumo::cache | HTTP cache hits, misses, bypasses, and skipped cache writes |
Every important runtime event also includes an event field matching the log message. Common event names include crawl.start, crawl.metrics, crawl.complete, crawl.stream_error, request.fetch, request.ok, request.retry, request.retry_exhausted, request.skip, request.robots_blocked, request.rate_limit, request.autothrottle, request.proxy_ignored, item.drop, cache.hit, and cache.miss.
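The event names listed above share their prefixes with the target names in the table, so tooling that only sees the event field can often recover the target. The sketch below assumes that the part before the first dot names the runtime area; that reading is inferred from the names on this page and is not a documented guarantee. The function target_for_event is hypothetical and not a Kumo API.

```rust
/// Hypothetical helper (not a Kumo API): maps an event name such as
/// "request.retry" to the tracing target that shares its prefix.
/// ASSUMPTION: the text before the first '.' names the runtime area.
fn target_for_event(event: &str) -> Option<&'static str> {
    // Names without a dot, or with nothing after it, are rejected.
    let (area, action) = event.split_once('.')?;
    if action.is_empty() {
        return None;
    }
    match area {
        "crawl" => Some("kumo::crawl"),
        "request" => Some("kumo::request"),
        "item" => Some("kumo::item"),
        "cache" => Some("kumo::cache"),
        // Unknown areas are left for the caller to handle.
        _ => None,
    }
}
```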
Common Fields
Kumo keeps high-volume crawl logs machine-readable by using predictable field names:
| Field | Meaning |
|---|---|
| event | Stable event name, such as request.retry |
| spider | Spider name returned by Spider::name() |
| spider_index | Index for run_all() multi-spider crawls |
| url | Request or response URL |
| domain | Normalized domain key used by crawl stats |
| depth | Crawl depth for the request |
| attempt | Current retry attempt count for request lifecycle events |
| max_attempts | Retry ceiling for retry-related events |
| retry_in_ms | Delay before a scheduled retry |
| error_kind | Stable Kumo error category |
| stop_reason | Final crawl stop reason |
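Predictable field names make it straightforward to aggregate logs downstream. As a rough illustration, the sketch below counts request.retry events per domain value. The LogRecord struct and retries_per_domain function are hypothetical and stand in for whatever your log pipeline produces after parsing; they are not Kumo types.

```rust
use std::collections::BTreeMap;

/// Hypothetical, already-parsed log record holding two of the fields above.
struct LogRecord {
    event: String,
    domain: String,
}

/// Counts `request.retry` events per `domain` value.
fn retries_per_domain(records: &[LogRecord]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    // Only retry events contribute; all other events are ignored.
    for record in records.iter().filter(|r| r.event == "request.retry") {
        *counts.entry(record.domain.clone()).or_insert(0) += 1;
    }
    counts
}
```

A BTreeMap keeps the domains sorted, which gives stable output when the counts are printed or compared between runs.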
JSON Logs
Use JSON logs when sending crawl output to systems such as Datadog, Loki, CloudWatch, or Vector:
    // Same filter as the plain-text setup, with each event rendered as one JSON object.
    tracing_subscriber::fmt()
        .json()
        .with_env_filter(
            std::env::var("RUST_LOG")
                .unwrap_or_else(|_| "kumo::crawl=info,kumo::request=info".into()),
        )
        .init();
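Enable the json feature on tracing-subscriber in your application's Cargo.toml by adding "json" to that dependency's features list. The with_env_filter call also requires the env-filter feature.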
For OpenTelemetry export, enable Kumo's otel feature and see OpenTelemetry.
Library Boundary
Kumo logs with tracing but does not own the logging backend. This keeps the framework composable inside CLIs, services, cron jobs, and larger applications. If you need programmatic lifecycle hooks instead of logs, use the current CrawlStats and CrawlReport APIs; a typed event/signal system is planned as separate future work.