PDXRust Meetup: Spidering Wikipedia Politely In Async Rust

How many pages are reachable from Wikipedia's page on the Rust programming language in two hops? Around 30,000, it turns out, including pages on wheat flour, Welsh orthography, and the zombie apocalypse.

This kind of exploration is easy to do with asynchronous Rust code. Wikipedia offers a cute little REST API for querying links, and Serde makes it simple to generate requests and parse replies. And if you're feeling guilty about flooding a precious public resource with silly API requests, rate limiting is just as easy to add.

Jim Blandy will show how to wire up Tokio, Reqwest, and Serde to do the spidering, and whip up a mock server for testing using Warp. The techniques shown work nicely for all kinds of REST API scripting, including, say, GitHub.
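
For a rough feel of the idea, here is a minimal sketch of a rate-limited two-hop crawl. It is not the code from the talk: it is synchronous and uses only the standard library, and `RateLimiter`, `fetch_links`, and `reachable` are hypothetical names. `fetch_links` reads from an in-memory map where a real spider would query the Wikipedia API.

```rust
use std::collections::{HashMap, HashSet};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Spaces out calls so that they happen at most once per `interval`.
struct RateLimiter {
    interval: Duration,
    last: Option<Instant>,
}

impl RateLimiter {
    fn new(interval: Duration) -> Self {
        RateLimiter { interval, last: None }
    }

    /// Blocks until at least `interval` has passed since the previous call.
    fn wait(&mut self) {
        if let Some(last) = self.last {
            let elapsed = last.elapsed();
            if elapsed < self.interval {
                sleep(self.interval - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}

/// Hypothetical stand-in for a link query (assumption: a real spider
/// would issue an HTTP request here and parse the reply).
fn fetch_links(graph: &HashMap<&str, Vec<&str>>, page: &str) -> Vec<String> {
    graph
        .get(page)
        .map(|links| links.iter().map(|s| s.to_string()).collect())
        .unwrap_or_default()
}

/// Returns every page reachable from `start` in at most `max_hops` hops,
/// including `start` itself, waiting on the limiter before each fetch.
fn reachable(
    graph: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_hops: usize,
    limiter: &mut RateLimiter,
) -> HashSet<String> {
    let mut seen: HashSet<String> = HashSet::new();
    seen.insert(start.to_string());
    let mut frontier = vec![start.to_string()];

    for _ in 0..max_hops {
        let mut next = Vec::new();
        for page in &frontier {
            limiter.wait(); // be polite: one fetch per interval
            for link in fetch_links(graph, page) {
                // `insert` returns true only for pages not seen before.
                if seen.insert(link.clone()) {
                    next.push(link);
                }
            }
        }
        frontier = next;
    }
    seen
}
```

Calling `reachable(&graph, "Rust", 2, &mut limiter)` with a limiter built from `RateLimiter::new(Duration::from_millis(100))` would make at most ten fetches per second. An async version would presumably swap the blocking sleep for a timer from the async runtime.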

Tags: rust

Imported from: http://calagator.org/events/1250481763

February 13, 2025 06:30 PM - 08:30 PM

Location: Portland State University Fourth Avenue Building (FAB) Room FAB 86-01: 1900 Southwest 4th Avenue, Portland OR 97201 US