Examples

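All examples live in the examples/ folder. Run any of them with cargo run --example <name>; examples that depend on an optional feature also need --features <feature>, as listed with each example.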

Basic Spiders

quotes.rs - minimal spider

Scrapes all quotes from quotes.toscrape.com, following pagination. The simplest possible kumo spider - CSS selectors and JsonlStore.

cargo run --example quotes

books.rs - rate limiting + retry

Scrapes all 1000 books from books.toscrape.com across 50 pages. Demonstrates RateLimiter, retry with exponential backoff, allowed_domains, max_depth, and JsonStore.

cargo run --example books
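
To make the retry behaviour concrete, here is a minimal sketch of how an exponential backoff delay is typically computed. It uses only the standard library; backoff_delay and its parameters are hypothetical names for illustration and are not taken from kumo's API, whose retry configuration may look different.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Hypothetical helper for illustration; not part of kumo.
fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    // checked_pow avoids overflow when the attempt count is large.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 500 ms base and a 30 s cap, attempts 0, 1, 2, 3 wait 500 ms, 1 s, 2 s, 4 s, and later attempts stay at the cap.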

books_derive.rs - #[derive(Extract)]

Same as books.rs but uses #[derive(Extract)] with field annotations instead of manual CSS selectors.

cargo run --example books_derive --features derive

multi_spider.rs - multiple spiders

Runs two independent spiders (quotes + books) concurrently in a single engine using .add_spider() / .run_all().

cargo run --example multi_spider

Selectors

selectors.rs - CSS, regex, JSONPath

Demonstrates CSS, regex, and JSONPath selectors against local HTML and JSON - no network required.

# CSS + regex
cargo run --example selectors

# CSS + regex + JSONPath
cargo run --example selectors --features jsonpath

xpath.rs - XPath selectors

Demonstrates XPath selectors on an HTML response using the xpath feature.

cargo run --example xpath --features xpath

Middleware

autothrottle.rs - adaptive throttling

Shows AutoThrottle middleware adapting request delay based on server latency and 429/503 responses.

cargo run --example autothrottle
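
The idea behind adaptive throttling can be sketched in a few lines. The rule below (double the delay on 429/503, otherwise move halfway toward the observed latency, then clamp) is an assumed policy chosen for illustration; next_delay is a hypothetical function, and the actual AutoThrottle middleware may use a different formula.

```rust
use std::time::Duration;

/// Computes the next request delay from the current delay, the latency of the
/// last response, and its HTTP status. Hypothetical policy, not kumo's own.
/// Requires `min <= max` (Ord::clamp panics otherwise).
fn next_delay(
    current: Duration,
    latency: Duration,
    status: u16,
    min: Duration,
    max: Duration,
) -> Duration {
    let proposed = if status == 429 || status == 503 {
        // The server is pushing back: double the delay.
        current.saturating_mul(2)
    } else {
        // Otherwise move halfway toward the observed latency.
        current.saturating_add(latency) / 2
    };
    // Keep the delay inside the configured bounds.
    proposed.clamp(min, max)
}
```

Averaging with the previous delay smooths out single slow responses, while the doubling on 429/503 backs off quickly when the server signals overload.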

proxy_rotation.rs - proxy rotation

Demonstrates ProxyRotator middleware cycling through a list of proxy URLs.

cargo run --example proxy_rotation
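
Cycling through a proxy list amounts to round-robin selection. The sketch below shows that technique with the standard library only; RoundRobin and next_proxy are hypothetical names, and ProxyRotator itself may select proxies differently.

```rust
/// Hypothetical round-robin rotator for illustration; not kumo's ProxyRotator.
struct RoundRobin {
    proxies: Vec<String>,
    next: usize,
}

impl RoundRobin {
    fn new(proxies: Vec<String>) -> Self {
        Self { proxies, next: 0 }
    }

    /// Returns the next proxy URL, wrapping around at the end of the list.
    /// Returns None when the list is empty.
    fn next_proxy(&mut self) -> Option<&str> {
        if self.proxies.is_empty() {
            return None;
        }
        let index = self.next;
        self.next = (index + 1) % self.proxies.len();
        Some(self.proxies[index].as_str())
    }
}
```

Given two proxies, successive calls to next_proxy return the first, the second, then the first again.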

polite_crawling.rs - polite crawl scheduling

Shows PolitenessPolicy, per-domain concurrency, per-domain delay, request priority, metadata, fingerprint-based deduplication, and crawl stats.

cargo run --example polite_crawling
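
Fingerprint-based deduplication can be illustrated as follows. This is a sketch under assumptions: the fingerprint here hashes the upper-cased method plus the URL with its fragment stripped, and SeenSet, fingerprint, and first_visit are hypothetical names. kumo's own fingerprint may include other parts of the request.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical fingerprint: method (case-insensitive) + URL without fragment.
fn fingerprint(method: &str, url: &str) -> u64 {
    // "#section" never reaches the server, so it should not create a new request.
    let without_fragment = url.split('#').next().unwrap_or(url);
    let mut hasher = DefaultHasher::new();
    method.to_ascii_uppercase().hash(&mut hasher);
    without_fragment.hash(&mut hasher);
    hasher.finish()
}

/// Remembers fingerprints of requests that were already scheduled.
struct SeenSet {
    seen: HashSet<u64>,
}

impl SeenSet {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true the first time a request is seen, false for duplicates.
    fn first_visit(&mut self, method: &str, url: &str) -> bool {
        self.seen.insert(fingerprint(method, url))
    }
}
```

Storing a 64-bit hash instead of the full URL keeps memory use small on large crawls, at the cost of a very small chance of collisions.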

Stores

sqlite.rs - SQLite store

Stores scraped items into a local SQLite file.

cargo run --example sqlite --features sqlite

postgres.rs - PostgreSQL store

Stores scraped items into PostgreSQL. Requires a running Postgres instance.

cargo run --example postgres --features postgres

cloud.rs - Cloud storage (S3 / GCS / Azure / local)

Stores scraped items as JSONL via the backend-agnostic CloudStore. The example uses LocalFileSystem - no cloud credentials needed. Swap the backend for AmazonS3, GoogleCloudStorage, or MicrosoftAzure with no other code changes.

cargo run --example cloud --features cloud

LLM Extraction

llm_extract.rs - LLM extraction

Scrapes quotes.toscrape.com without any CSS selectors - the LLM reads the HTML and fills in the struct automatically.

ANTHROPIC_API_KEY=sk-ant-... cargo run --example llm_extract --features claude

Swap the feature flag and client to use a different provider:

Provider          Flag     Client
Anthropic Claude  claude   AnthropicClient
OpenAI            openai   OpenAiClient
Google Gemini     gemini   GeminiClient
Ollama (local)    ollama   OllamaClient

llm_fallback.rs - CSS + LLM fallback

Uses #[extract(llm_fallback = "hint")] - tries CSS first and falls back to the LLM only when the selector returns nothing.

ANTHROPIC_API_KEY=sk-ant-... cargo run --example llm_fallback --features claude,derive
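
The "try the selector first, call the LLM only when it returns nothing" control flow can be sketched independently of kumo. In the block below, extract_with_fallback and both closures are hypothetical stand-ins: the first represents a CSS extraction, the second an LLM call. The code that #[extract(llm_fallback = "hint")] actually generates is not shown in this page and may differ.

```rust
/// Runs `primary` first and calls `fallback` only when `primary` yields
/// nothing (None or a blank string). Hypothetical helper for illustration.
fn extract_with_fallback<P, F>(primary: P, fallback: F) -> Option<String>
where
    P: FnOnce() -> Option<String>,
    F: FnOnce() -> Option<String>,
{
    primary()
        // Treat whitespace-only matches as "selector returned nothing".
        .filter(|text| !text.trim().is_empty())
        // or_else is lazy: the fallback closure runs only when needed.
        .or_else(fallback)
}
```

Because the fallback is passed as a closure, the expensive call is skipped entirely whenever the primary extraction succeeds.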

Advanced

production_crawler.rs - production crawl controls

Combines the production defaults most crawlers need: robots.txt, per-domain concurrency, per-domain delay, jitter, Retry-After aware retries, StatusRetry, persistent FileFrontier recovery state, metrics, and JSONL storage.

cargo run --example production_crawler --features persistence

crawl_events.rs - typed lifecycle events

Subscribes to typed crawl lifecycle events with .event_channel() and prints request completion plus final crawl totals. Uses MockFetcher, so it runs without network access.

cargo run --example crawl_events

crawl_hooks.rs - crawl lifecycle hooks

Registers an async CrawlHook that counts completed requests and scraped items. Uses MockFetcher, so it runs without network access.

cargo run --example crawl_hooks

http_cache.rs - HTTP response cache

Demonstrates disk-backed response caching. Run it once to populate the cache, then run it again to see responses served from disk.

cargo run --example http_cache

link_extractor.rs - link extraction

Demonstrates LinkExtractor with allow_domains, allow, deny, restrict_css, and canonicalize.

cargo run --example link_extractor

request_scheduling.rs - request scheduling

Demonstrates CrawlRequest with custom method/body, headers, priority, and metadata.

cargo run --example request_scheduling

browser.rs - headless browser

Fetches a JS-rendered page using headless Chromium. Requires the browser feature.

cargo run --example browser --features browser

stealth.rs - stealth mode

Sends requests with a Chrome 131 TLS fingerprint using the stealth feature. Requires cmake and nasm.

cargo run --example stealth --features stealth