Wayback URL Miner
Pull every URL the Wayback Machine has ever archived for a domain, plus the full set of unique query parameters sorted by frequency, ready to feed into Arjun or Param Miner.
Source: web.archive.org CDX API. Passive — no contact with the target domain.
How to Use the Wayback URL Miner
- Enter the root domain (e.g., example.com) — the tool queries *.example.com/*, so subdomains are included automatically.
- Click "Mine Historic URLs". The CDX API can take 10-15 seconds on large domains.
- Toggle between the URLs view (every historic URL + its capture timestamp) and the Params view (unique query parameters sorted by frequency).
- Use the filter box to narrow the URL list to a specific path (e.g., '/admin', '.php?').
- Click any URL to view it in the Wayback Machine as it looked at capture time.
- Use Copy / Download to export the list into the same pipelines you would feed gau output into, such as ffuf or nuclei.
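If you prefer scripting the lookup yourself, here is a minimal Python sketch of the kind of CDX query the "Mine Historic URLs" button performs. The endpoint and parameters are the public CDX API; the domain is a placeholder, and the record limit and timeout mirror the defaults described on this page.

```python
# Minimal sketch of the CDX lookup behind "Mine Historic URLs".
# matchType=domain matches the root domain and every subdomain;
# collapse=urlkey keeps one row per unique URL.
import json
import urllib.request

DOMAIN = "example.com"  # placeholder root domain

query = (
    f"url={DOMAIN}&matchType=domain"
    "&output=json"
    "&fl=original,timestamp,statuscode,mimetype"
    "&collapse=urlkey"
    "&limit=5000"
)

with urllib.request.urlopen(
    f"https://web.archive.org/cdx/search/cdx?{query}", timeout=15
) as resp:
    rows = json.load(resp)

if rows:
    header, captures = rows[0], rows[1:]  # first row holds the field names
    for original, timestamp, status, mime in captures[:10]:
        print(timestamp, status, original)
```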
About the Wayback Machine's CDX API
The Internet Archive's Wayback Machine has been crawling the public web since 1996 and stores over 800 billion captures. The CDX (Capture Index) API exposes the full catalogue of URLs it has seen, which makes it the single best source for passive URL enumeration on any public domain. Every archived capture includes the full URL, the capture timestamp, the HTTP status, and the content MIME type.

For reconnaissance this is gold. Old versions of a site often contain paths that are no longer linked from the current site: /admin.php, /debug.html, /api/v1/ (replaced by v2), forgotten AJAX endpoints, exposed .git directories that were later locked down. Those URLs may still work if the underlying infrastructure was never removed. The tool also extracts the unique query parameters across every historical URL and ranks them by frequency; a plain list of parameter names is exactly the input format expected by Arjun, Param Miner, and ffuf's -w wordlist flag. Parameters that appear in dozens of historical URLs are far more likely to still be honoured by the current backend than random guesses from a generic wordlist.

This tool is strictly passive. Every request goes to archive.org, not to the target. Nothing you do here will appear in the target's access logs, rate limits, or WAF telemetry, which makes it appropriate for early-stage recon on sensitive targets where you want to map the attack surface before making yourself visible.
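As a sketch of that parameter-extraction step, the following pulls unique query-parameter names from a list of historic URLs and ranks them by the number of URLs they appear in. The urls list is a stand-in for the original column returned by a CDX query like the one above.

```python
# Sketch: extract query-parameter names from historic URLs and rank
# them by how many distinct URLs they appear in.
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

urls = [  # stand-in for the `original` column of a CDX response
    "https://example.com/search?q=test&page=2",
    "https://example.com/item.php?id=7&debug=1",
    "https://example.com/item.php?id=9",
]

freq = Counter()
for u in urls:
    query = urlsplit(u).query
    # keep_blank_values=True so bare parameters like ?debug still count
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    freq.update(names)  # count each name once per URL, not per occurrence

for name, count in freq.most_common():
    print(f"{count:>4}  {name}")
```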
Frequently Asked Questions
Why are some URLs missing from the results?
The Wayback Machine only captures URLs it discovers via crawling. It favours linked, publicly reachable content. Deep endpoints behind authentication, bot-protected paths, and recently created URLs may not be archived. Conversely, historic URLs that are gone from the live site often appear; that's the entire point of the tool.
What do I do with the parameter list?
A list like ['id', 'page', 'debug', 'token', 'uid'] is the input for parameter-fuzzing tools (Arjun, Param Miner, ffuf). Historical params are much more likely to be honoured by the current code than random wordlist guesses: the developer wrote them at some point, so the handler probably still exists.
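One rough way to put that list to work is to write the names to a one-per-line wordlist, the format Arjun and ffuf both accept. The parameter names and target URL below are placeholders, not real findings.

```python
# Sketch: turn mined parameter names into a one-per-line wordlist.
# The names and the target URL are placeholders for illustration.
params = ["id", "page", "debug", "token", "uid"]

with open("params.txt", "w") as f:
    f.write("\n".join(params) + "\n")

# Example follow-ups (run separately from a shell):
#   ffuf -u "https://target.example/page.php?FUZZ=1" -w params.txt
#   arjun -u "https://target.example/page.php" -w params.txt
```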
Are there limits on how much the tool fetches?
The CDX API is read-heavy and free. A popular domain can have hundreds of thousands of captures. The tool sets a 5,000-record limit and a 15-second timeout; for very large domains you may want to narrow the query by specifying a subdomain or subpath directly in the CDX endpoint, as sketched below. Ask in our Discord for help with advanced queries.
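As a rough example of such a narrowed query, the following builds a CDX request scoped to a single subdomain and subpath using matchType=prefix. The host and path are placeholders.

```python
# Sketch: build a CDX query scoped to one subdomain and subpath.
# matchType=prefix restricts results to URLs starting with `url`.
from urllib.parse import urlencode

base = "https://web.archive.org/cdx/search/cdx"
query = urlencode({
    "url": "api.example.com/v1/",  # placeholder subdomain + subpath
    "matchType": "prefix",
    "output": "json",
    "fl": "original,timestamp",
    "collapse": "urlkey",
    "limit": "5000",
})
print(f"{base}?{query}")
```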
Why do results change over time?
The Wayback Machine has historically honoured robots.txt retroactively: if a site's current robots.txt excluded the crawler, old captures stopped being served. The Internet Archive relaxed that policy from 2017 onward and enforcement is now inconsistent, and site owners can still request exclusion, so the result set for a domain can vary over time.