Downloading a Website from archive.org
Sometimes you need to download an archived copy of a complete website. Luckily, https://web.archive.org crawls the interwebs regularly. Unluckily, it offers no direct way to download a whole website in one go.
One tool that facilitates this is the aptly named Wayback Machine Downloader:
Usage
wayback_machine_downloader https://example.com
I have been running into issues with the original version, which fails while fetching the snapshot list:
Downloading https://www.example.com to websites/www.example.com/
from Wayback Machine archives.
Getting snapshot pages../Users/abc/.rbenv/versions/3.1.1/lib/
ruby/3.1.0/open-uri.rb:364: in `open_http': 400 BAD REQUEST
(OpenURI::HTTPError)
...
This fork works:
wayback_machine_downloader https://example.com
Downloading https://example.com to websites/example.com/
from Wayback Machine archives.
Getting snapshot pages from Wayback Machine API...
.. found 6350 snapshots.
Saved snapshot list to websites/example.com/.cdx.json
...
If you only want snapshots up to a certain point in time, use the --to switch. The timestamp is written as 14 digits in the form YYYYMMDDhhmmss:
wayback_machine_downloader --to 20241231235959 https://example.com
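Typing the 14 digits by hand is easy to get wrong. As a minimal sketch, a small shell function can turn a calendar date into an end-of-day timestamp. The name wayback_to_ts is made up for this example and is not part of the tool, and the function assumes its input is already a valid YYYY-MM-DD date (it does no validation):

```shell
# Hypothetical helper, not part of wayback_machine_downloader:
# turns YYYY-MM-DD into a 14-digit timestamp for the end of that day.
wayback_to_ts() {
  # Strip the dashes from the date, then append 23:59:59.
  printf '%s235959\n' "$(printf '%s' "$1" | tr -d '-')"
}

# Prints 20241231235959, the value used in the command above.
wayback_to_ts 2024-12-31

# Usage with the downloader:
#   wayback_machine_downloader --to "$(wayback_to_ts 2024-12-31)" https://example.com
```

Appending 235959 means snapshots taken at any time on the given day are still included.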