# kumo

kumo (蜘蛛/雲, "spider/cloud") is an async web-crawling framework for Rust, in the spirit of Scrapy.
It gives you a trait-based, async-first API for writing spiders that scrape pages, follow links, and store structured data, with batteries included for production crawls.
## Why kumo?
| Feature | Scrapy (Python) | gocolly (Go) | kumo (Rust) |
|---|---|---|---|
| Async-first | ✅ (Twisted) | ✅ | ✅ (Tokio) |
| Type-safe items | ❌ (dicts) | ❌ | ✅ |
| CSS selectors | ✅ | ✅ | ✅ |
| XPath selectors | ✅ | ❌ | ✅ |
| LLM extraction | ❌ | ❌ | ✅ |
| Item Stream API | ❌ | ❌ | ✅ |
| OpenTelemetry | plugin | ❌ | ✅ |
| Stealth / TLS spoof | plugin | ❌ | ✅ |
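The "type-safe items" row is the core difference from Scrapy's dicts: items are plain Rust structs, so a missing or misspelled field is a compile error rather than a silent `KeyError` at crawl time. In kumo, items derive `serde::Serialize` (as in the example below) and the store handles serialization; this stdlib-only sketch hand-rolls one JSONL line just to make the idea self-contained (`to_jsonl` is illustrative, not a kumo API):

```rust
// Items are structs: the compiler checks every field at build time.
struct Quote {
    text: String,
    author: String,
}

// Illustrative helper only -- real kumo code would rely on serde_json,
// which also handles proper string escaping.
fn to_jsonl(q: &Quote) -> String {
    format!(r#"{{"text":"{}","author":"{}"}}"#, q.text, q.author)
}

fn main() {
    let q = Quote {
        text: "So it goes.".into(),
        author: "Kurt Vonnegut".into(),
    };
    println!("{}", to_jsonl(&q));
}
```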
## Quick Install

```toml
[dependencies]
kumo = "0.1"
async-trait = "0.1"
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
```
## 30-Second Example

```rust
use kumo::prelude::*;
use serde::Serialize;

#[derive(Debug, Serialize)]
struct Quote {
    text: String,
    author: String,
}

struct QuotesSpider;

#[async_trait::async_trait]
impl Spider for QuotesSpider {
    type Item = Quote;

    fn name(&self) -> &str { "quotes" }

    fn start_urls(&self) -> Vec<String> {
        vec!["https://quotes.toscrape.com".into()]
    }

    async fn parse(&self, res: &Response) -> Result<Output<Self::Item>, KumoError> {
        // Extract one Quote per ".quote" element on the page.
        let quotes: Vec<Quote> = res.css(".quote").iter().map(|el| Quote {
            text: el.css(".text").first().map(|e| e.text()).unwrap_or_default(),
            author: el.css(".author").first().map(|e| e.text()).unwrap_or_default(),
        }).collect();

        // Queue the next page, if a pagination link exists.
        let next = res.css("li.next a").first()
            .and_then(|el| el.attr("href"))
            .map(|href| res.urljoin(&href));

        let mut output = Output::new().items(quotes);
        if let Some(url) = next {
            output = output.follow(url);
        }
        Ok(output)
    }
}

#[tokio::main]
async fn main() -> Result<(), KumoError> {
    CrawlEngine::builder()
        .concurrency(5)
        .middleware(DefaultHeaders::new().user_agent("kumo/0.1"))
        .store(JsonlStore::new("quotes.jsonl")?)
        .run(QuotesSpider)
        .await?;
    Ok(())
}
```
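The `.concurrency(5)` setting caps how many requests are in flight at once; kumo does this with Tokio's async machinery, but the underlying idea is a bounded worker pool. A stdlib-only sketch of that idea (`crawl_bounded` and the example.com URLs are purely illustrative, not part of kumo's API):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Process URLs in batches of at most `concurrency`, so no more than
// that many "fetches" run at the same time. Returns the total fetched.
fn crawl_bounded(urls: Vec<String>, concurrency: usize) -> usize {
    let fetched = Arc::new(AtomicUsize::new(0));
    for batch in urls.chunks(concurrency) {
        let handles: Vec<_> = batch
            .iter()
            .cloned()
            .map(|url| {
                let fetched = Arc::clone(&fetched);
                thread::spawn(move || {
                    let _ = url; // placeholder for an HTTP request
                    fetched.fetch_add(1, Ordering::SeqCst);
                })
            })
            .collect();
        // Wait for the batch before starting the next one.
        for h in handles {
            h.join().unwrap();
        }
    }
    fetched.load(Ordering::SeqCst)
}

fn main() {
    let urls: Vec<String> = (1..=7)
        .map(|i| format!("https://example.com/page/{i}"))
        .collect();
    assert_eq!(crawl_bounded(urls, 3), 7);
}
```

An async engine improves on this batch-barrier version by refilling the pool as soon as any single request finishes, rather than waiting for the whole batch.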