Skip to content

Examples

All examples live in the examples/ folder. Run any of them with cargo run --example <name>.

Basic Spiders

quotes.rs — minimal spider

Scrapes all quotes from quotes.toscrape.com, following pagination. The simplest possible kumo spider — CSS selectors and JsonlStore.

cargo run --example quotes

books.rs — rate limiting + retry

Scrapes all 1000 books from books.toscrape.com across 50 pages. Demonstrates RateLimiter, exponential retry, allowed_domains, max_depth, and JsonStore.

cargo run --example books

books_derive.rs#[derive(Extract)]

Same as books.rs but uses #[derive(Extract)] with field annotations instead of manual CSS selectors.

cargo run --example books_derive --features derive

multi_spider.rs — multiple spiders

Runs two independent spiders (quotes + books) concurrently in a single engine using .add_spider() / .run_all().

cargo run --example multi_spider

Selectors

selectors.rs — CSS, regex, JSONPath

Demonstrates CSS, regex, and JSONPath selectors against local HTML and JSON — no network required.

# CSS + regex
cargo run --example selectors

# CSS + regex + JSONPath
cargo run --example selectors --features jsonpath

xpath.rs — XPath selectors

Demonstrates XPath selectors on an HTML response using the xpath feature.

cargo run --example xpath --features xpath

Middleware

autothrottle.rs — adaptive throttling

Shows AutoThrottle middleware adapting request delay based on server latency and 429/503 responses.

cargo run --example autothrottle

proxy_rotation.rs — proxy rotation

Demonstrates ProxyRotator middleware cycling through a list of proxy URLs.

cargo run --example proxy_rotation

Stores

sqlite.rs — SQLite store

Stores scraped items into a local SQLite file.

cargo run --example sqlite --features sqlite

postgres.rs — PostgreSQL store

Stores scraped items into PostgreSQL. Requires a running Postgres instance.

cargo run --example postgres --features postgres

LLM Extraction

llm_extract.rs — LLM extraction

Scrapes quotes.toscrape.com without any CSS selectors — the LLM reads the HTML and fills in the struct automatically.

ANTHROPIC_API_KEY=sk-ant-... cargo run --example llm_extract --features claude

Swap the feature flag and client to use a different provider:

Provider Flag Client
Anthropic Claude claude AnthropicClient
OpenAI openai OpenAiClient
Google Gemini gemini GeminiClient
Ollama (local) ollama OllamaClient

llm_fallback.rs — CSS + LLM fallback

Uses #[extract(llm_fallback = "hint")] — tries CSS first and falls back to the LLM only when the selector returns nothing.

ANTHROPIC_API_KEY=sk-ant-... cargo run --example llm_fallback --features claude,derive

Advanced

http_cache.rs — HTTP response cache

Demonstrates disk-backed response caching. Run once to populate the cache, run again to see instant responses from disk.

cargo run --example http_cache

Demonstrates LinkExtractor with allow_domains, allow, deny, restrict_css, and canonicalize.

cargo run --example link_extractor

browser.rs — headless browser

Fetches a JS-rendered page using headless Chromium. Requires the browser feature.

cargo run --example browser --features browser

stealth.rs — stealth mode

Sends requests with a Chrome 131 TLS fingerprint using the stealth feature. Requires cmake and nasm.

cargo run --example stealth --features stealth