Wayback URL Miner
Pull every URL the Wayback Machine has ever archived for a domain, plus the full set of unique query parameters sorted by frequency, ready to feed into Arjun or Param Miner.
Source: web.archive.org CDX API. Passive — no contact with the target domain.
How to Use the Wayback URL Miner
- Enter the root domain (e.g., example.com) — the tool queries *.example.com/*, so subdomains are included automatically.
- Click "Mine Historic URLs". The CDX API can take 10-15 seconds on large domains.
- Toggle between the URLs view (every historic URL + its capture timestamp) and the Params view (unique query parameters sorted by frequency).
- Use the filter box to narrow the URL list to a specific path (e.g., '/admin', '.php?').
- Click any URL to view it in the Wayback Machine as it looked at capture time.
- Use Copy / Download to export the list into the same pipelines you would feed gau output into, such as ffuf or nuclei.
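If you prefer scripting the lookup yourself, here is a minimal Python sketch of the kind of CDX query the "Mine Historic URLs" button performs. The endpoint and parameters are the public CDX API; the domain is a placeholder, and the record limit and timeout mirror the defaults described on this page.

```python
# Minimal sketch of the CDX lookup behind "Mine Historic URLs".
# matchType=domain matches the root domain and every subdomain;
# collapse=urlkey keeps one row per unique URL.
import json
import urllib.request

DOMAIN = "example.com"  # placeholder root domain

query = (
    f"url={DOMAIN}&matchType=domain"
    "&output=json"
    "&fl=original,timestamp,statuscode,mimetype"
    "&collapse=urlkey"
    "&limit=5000"
)

with urllib.request.urlopen(
    f"https://web.archive.org/cdx/search/cdx?{query}", timeout=15
) as resp:
    rows = json.load(resp)

if rows:
    header, captures = rows[0], rows[1:]  # first row holds the field names
    for original, timestamp, status, mime in captures[:10]:
        print(timestamp, status, original)
```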
About the Wayback Machine's CDX API
The Internet Archive's Wayback Machine has been crawling the public web since 1996 and stores over 800 billion captures. The CDX (Capture Index) API exposes the full catalogue of URLs it has seen, which makes it the single best source for passive URL enumeration on any public domain. Every archived capture includes the full URL, the capture timestamp, the HTTP status, and the content MIME type.

For reconnaissance this is gold. Old versions of a site often contain paths that are no longer linked from the current site: /admin.php, /debug.html, /api/v1/ (replaced by v2), forgotten AJAX endpoints, exposed .git directories that were later locked down. Those URLs may still work if the underlying infrastructure was never removed. The tool also extracts the unique query parameters across every historical URL and ranks them by frequency; a plain list of parameter names is exactly the input format expected by Arjun, Param Miner, and ffuf's -w wordlist flag. Parameters that appear in dozens of historical URLs are far more likely to still be honoured by the current backend than random guesses from a generic wordlist.

This tool is strictly passive. Every request goes to archive.org, not to the target. Nothing you do here will appear in the target's access logs, rate limits, or WAF telemetry, which makes it appropriate for early-stage recon on sensitive targets where you want to map the attack surface before making yourself visible.
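As a sketch of that parameter-extraction step, the following pulls unique query-parameter names from a list of historic URLs and ranks them by the number of URLs they appear in. The urls list is a stand-in for the original column returned by a CDX query like the one above.

```python
# Sketch: extract query-parameter names from historic URLs and rank
# them by how many distinct URLs they appear in.
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

urls = [  # stand-in for the `original` column of a CDX response
    "https://example.com/search?q=test&page=2",
    "https://example.com/item.php?id=7&debug=1",
    "https://example.com/item.php?id=9",
]

freq = Counter()
for u in urls:
    query = urlsplit(u).query
    # keep_blank_values=True so bare parameters like ?debug still count
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    freq.update(names)  # count each name once per URL, not per occurrence

for name, count in freq.most_common():
    print(f"{count:>4}  {name}")
```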
Frequently Asked Questions
Why are some URLs missing from the results?
The Wayback Machine only captures URLs it discovers via crawling. It favours linked, publicly reachable content. Deep endpoints behind authentication, bot-protected paths, and recently created URLs may not be archived. Conversely, historic URLs that are gone from the live site often appear; that's the entire point of the tool.
What do I do with the parameter list?
A list like ['id', 'page', 'debug', 'token', 'uid'] is the input for parameter-fuzzing tools (Arjun, Param Miner, ffuf). Historical params are much more likely to be honoured by the current code than random wordlist guesses: the developer wrote them at some point, so the handler probably still exists.
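One rough way to put that list to work is to write the names to a one-per-line wordlist, the format Arjun and ffuf both accept. The parameter names and target URL below are placeholders, not real findings.

```python
# Sketch: turn mined parameter names into a one-per-line wordlist.
# The names and the target URL are placeholders for illustration.
params = ["id", "page", "debug", "token", "uid"]

with open("params.txt", "w") as f:
    f.write("\n".join(params) + "\n")

# Example follow-ups (run separately from a shell):
#   ffuf -u "https://target.example/page.php?FUZZ=1" -w params.txt
#   arjun -u "https://target.example/page.php" -w params.txt
```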
Are there limits on how much the tool fetches?
The CDX API is read-heavy and free. A popular domain can have hundreds of thousands of captures. The tool sets a 5,000-record limit and a 15-second timeout; for very large domains you may want to narrow the query by specifying a subdomain or subpath directly in the CDX endpoint, as sketched below. Ask in our Discord for help with advanced queries.
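As a rough example of such a narrowed query, the following builds a CDX request scoped to a single subdomain and subpath using matchType=prefix. The host and path are placeholders.

```python
# Sketch: build a CDX query scoped to one subdomain and subpath.
# matchType=prefix restricts results to URLs starting with `url`.
from urllib.parse import urlencode

base = "https://web.archive.org/cdx/search/cdx"
query = urlencode({
    "url": "api.example.com/v1/",  # placeholder subdomain + subpath
    "matchType": "prefix",
    "output": "json",
    "fl": "original,timestamp",
    "collapse": "urlkey",
    "limit": "5000",
})
print(f"{base}?{query}")
```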
Why do results change over time?
The Wayback Machine has historically honoured robots.txt retroactively: if a site's current robots.txt excluded the crawler, old captures stopped being served. The Internet Archive relaxed that policy from 2017 onward and enforcement is now inconsistent, and site owners can still request exclusion, so the result set for a domain can vary over time.