Examples
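All examples live in the examples/ folder. Run any of them with cargo run --example <name>; examples that depend on an optional feature also need --features <flag>, for instance cargo run --example selectors --features jsonpath.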
Basic Spiders
quotes.rs — minimal spider
Scrapes all quotes from quotes.toscrape.com, following pagination. The simplest possible kumo spider — CSS selectors and JsonlStore.
books.rs — rate limiting + retry
Scrapes all 1000 books from books.toscrape.com across 50 pages. Demonstrates RateLimiter, retry with exponential backoff, allowed_domains, max_depth, and JsonStore.
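
For readers unfamiliar with the retry strategy named above, here is a minimal, standard-library-only sketch of exponential backoff. It is not kumo's implementation: `backoff_delay` and `retry_with_backoff` are hypothetical names, and the doubling factor and cap are assumptions made for illustration.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    // checked_pow and saturating_mul keep large attempt counts from overflowing.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max)
}

/// Calls `op` until it succeeds or `max_retries` retries have been used,
/// sleeping with exponential backoff between attempts.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: hand the last error back to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(base, attempt, max));
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base, the waits grow as 100 ms, 200 ms, 400 ms, and so on until they hit the cap. Production retry policies often add random jitter as well, which this sketch leaves out.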
books_derive.rs — #[derive(Extract)]
Same as books.rs but uses #[derive(Extract)] with field annotations instead of manual CSS selectors.
multi_spider.rs — multiple spiders
Runs two independent spiders (quotes + books) concurrently in a single engine using .add_spider() / .run_all().
Selectors
selectors.rs — CSS, regex, JSONPath
Demonstrates CSS, regex, and JSONPath selectors against local HTML and JSON — no network required.
# CSS + regex
cargo run --example selectors
# CSS + regex + JSONPath
cargo run --example selectors --features jsonpath
xpath.rs — XPath selectors
Demonstrates XPath selectors on an HTML response using the xpath feature.
Middleware
autothrottle.rs — adaptive throttling
Shows AutoThrottle middleware adapting request delay based on server latency and 429/503 responses.
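
The idea of adapting the delay to latency and to 429/503 responses can be sketched as follows. This is a sketch under assumptions, not kumo's AutoThrottle: the `Throttle` type is hypothetical, the latency rule is loosely modeled on the one Scrapy documents for its AutoThrottle extension, and doubling the delay on 429/503 is an assumed policy.

```rust
use std::time::Duration;

/// Hypothetical throttle state; not kumo's AutoThrottle type.
struct Throttle {
    delay: Duration,
    min_delay: Duration,
    max_delay: Duration,
    /// Desired number of requests in flight against one server.
    target_concurrency: u32,
}

impl Throttle {
    /// Updates the delay after a response and returns the new value.
    fn observe(&mut self, latency: Duration, status: u16) -> Duration {
        let next = if status == 429 || status == 503 {
            // The server is pushing back: double the delay (assumed policy).
            self.delay.saturating_mul(2)
        } else {
            // Target delay that would keep `target_concurrency` requests in
            // flight, then move halfway from the current delay toward it.
            let target = latency / self.target_concurrency.max(1);
            (self.delay + target) / 2
        };
        self.delay = next.clamp(self.min_delay, self.max_delay);
        self.delay
    }
}
```

Averaging with the previous delay smooths out single slow or fast responses, while the clamp keeps the crawler from either stalling or hammering the server.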
proxy_rotation.rs — proxy rotation
Demonstrates ProxyRotator middleware cycling through a list of proxy URLs.
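
Cycling through a proxy list can be as simple as round-robin selection. The sketch below illustrates that idea only: `RoundRobin` is a hypothetical type, the proxy URLs in the usage note are placeholders, and kumo's ProxyRotator may use a different policy or API.

```rust
/// Hypothetical round-robin rotator; not kumo's ProxyRotator.
struct RoundRobin {
    proxies: Vec<String>,
    next: usize,
}

impl RoundRobin {
    fn new(proxies: Vec<String>) -> Self {
        Self { proxies, next: 0 }
    }

    /// Returns the next proxy URL, wrapping to the start after the last one.
    /// Returns None when the list is empty.
    fn next_proxy(&mut self) -> Option<&str> {
        if self.proxies.is_empty() {
            return None;
        }
        let index = self.next;
        self.next = (index + 1) % self.proxies.len();
        Some(self.proxies[index].as_str())
    }
}
```

Given the placeholder list `http://a:8080`, `http://b:8080`, successive calls to `next_proxy` yield a, b, a, and so on.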
Stores
sqlite.rs — SQLite store
Stores scraped items into a local SQLite file.
postgres.rs — PostgreSQL store
Stores scraped items into PostgreSQL. Requires a running Postgres instance.
LLM Extraction
llm_extract.rs — LLM extraction
Scrapes quotes.toscrape.com without any CSS selectors — the LLM reads the HTML and fills in the struct automatically.
Swap the feature flag and client to use a different provider:
| Provider | Flag | Client |
|---|---|---|
| Anthropic Claude | claude | AnthropicClient |
| OpenAI | openai | OpenAiClient |
| Google Gemini | gemini | GeminiClient |
| Ollama (local) | ollama | OllamaClient |
llm_fallback.rs — CSS + LLM fallback
Uses #[extract(llm_fallback = "hint")] — tries CSS first and falls back to the LLM only when the selector returns nothing.
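
The control flow behind "selector first, LLM only when the selector returns nothing" can be sketched with plain closures. Nothing here is kumo's API: `extract_with_fallback` is a hypothetical helper, `text_between` is a crude stand-in for a CSS selector, and the fallback closure stands in for an LLM call.

```rust
/// Stand-in for a CSS selector: the text between `open` and `close`, if both occur.
fn text_between(html: &str, open: &str, close: &str) -> Option<String> {
    let start = html.find(open)? + open.len();
    let end = html[start..].find(close)? + start;
    Some(html[start..end].to_string())
}

/// Returns the primary result when it is non-empty, otherwise runs the fallback.
/// The fallback is a closure, so the expensive path only runs when needed.
fn extract_with_fallback(
    html: &str,
    primary: impl Fn(&str) -> Option<String>,
    fallback: impl FnOnce(&str) -> Option<String>,
) -> Option<String> {
    match primary(html) {
        Some(text) if !text.trim().is_empty() => Some(text),
        // Selector matched nothing (or only whitespace): try the fallback.
        _ => fallback(html),
    }
}
```

Passing the fallback lazily is the point of the design: pages whose markup matches the selector never pay for an LLM request.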
Advanced
http_cache.rs — HTTP response cache
Demonstrates disk-backed response caching. Run once to populate the cache, then run again to see responses served from disk instead of the network.
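
To make the populate-then-reuse behavior concrete, here is a minimal disk cache keyed by URL, written with the standard library only. It is an illustration under assumptions rather than kumo's cache: `DiskCache` and its methods are hypothetical, the file naming scheme is invented, and the sketch ignores expiry and HTTP cache headers.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// FNV-1a hash, used here only to turn a URL into a stable file name.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical disk cache keyed by URL; not kumo's cache type.
struct DiskCache {
    dir: PathBuf,
}

impl DiskCache {
    fn new(dir: impl Into<PathBuf>) -> io::Result<Self> {
        let dir = dir.into();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    fn path_for(&self, url: &str) -> PathBuf {
        self.dir.join(format!("{:016x}.body", fnv1a(url.as_bytes())))
    }

    /// Returns the cached body, or calls `fetch` and stores its result.
    fn get_or_fetch(
        &self,
        url: &str,
        fetch: impl FnOnce(&str) -> io::Result<Vec<u8>>,
    ) -> io::Result<Vec<u8>> {
        let path = self.path_for(url);
        match fs::read(&path) {
            Ok(body) => Ok(body), // cache hit: no fetch
            Err(e) if e.kind() == io::ErrorKind::NotFound => {
                let body = fetch(url)?; // cache miss: fetch, then persist
                fs::write(&path, &body)?;
                Ok(body)
            }
            Err(e) => Err(e),
        }
    }
}
```

The first call for a URL runs `fetch` and writes the body to disk; later calls for the same URL read the file and never invoke `fetch`.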
link_extractor.rs — link extraction with filtering
Demonstrates LinkExtractor with allow_domains, allow, deny, restrict_css, and canonicalize.
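
The filtering options named above can be approximated in a few lines to show what each one does. This sketch is not kumo's LinkExtractor: `LinkFilter`, `host_of`, `canonicalize`, and `filter_links` are hypothetical, deny patterns are treated as plain substrings (real extractors often take regexes), canonicalization only drops the fragment, and restrict_css and allow are omitted.

```rust
use std::collections::HashSet;

/// Host of an absolute http(s) URL, without port; None for anything else.
/// IPv6 literals are not handled in this sketch.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let authority = rest.split(|c| c == '/' || c == '?' || c == '#').next()?;
    let host = authority.rsplit('@').next()?; // drop any userinfo
    host.split(':').next()
}

/// Hypothetical filter; not kumo's LinkExtractor.
struct LinkFilter<'a> {
    allow_domains: &'a [&'a str],
    deny: &'a [&'a str], // substrings here, by assumption
}

impl LinkFilter<'_> {
    fn accepts(&self, url: &str) -> bool {
        let host = match host_of(url) {
            Some(host) => host,
            None => return false,
        };
        // Accept the domain itself or any subdomain of it.
        let domain_ok = self.allow_domains.iter().any(|d| {
            host == *d || host.strip_suffix(*d).map_or(false, |p| p.ends_with('.'))
        });
        domain_ok && !self.deny.iter().any(|pat| url.contains(*pat))
    }
}

/// Simplified canonical form: the URL without its fragment.
fn canonicalize(url: &str) -> &str {
    url.split('#').next().unwrap_or(url)
}

/// Canonicalizes, filters, and de-duplicates links, preserving input order.
fn filter_links<'u>(filter: &LinkFilter<'_>, urls: &[&'u str]) -> Vec<&'u str> {
    let mut seen = HashSet::new();
    urls.iter()
        .copied()
        .map(canonicalize)
        .filter(|u| filter.accepts(u))
        .filter(|u| seen.insert(*u))
        .collect()
}
```

Canonicalizing before de-duplication matters: `page-2.html#top` and `page-2.html` collapse to one link, so the page is only queued once.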
browser.rs — headless browser
Fetches a JS-rendered page using headless Chromium. Requires the browser feature.
stealth.rs — stealth mode
Sends requests with a Chrome 131 TLS fingerprint using the stealth feature. Requires cmake and nasm.