Here's an example that I've used to get all the pages from Paul Graham's website:
```bash
$ wget --recursive --level=inf --no-remove-listing --wait=6 --random-wait \
       --adjust-extension --no-clobber --continue -e robots=off \
       --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36" \
       --domains=paulgraham.com https://paulgraham.com
```
| Parameter | Description |
|---|---|
| `--recursive` | Enables recursive downloading (following links) |
| `--level=inf` | Sets the recursion depth to infinite |
| `--no-remove-listing` | Keeps the `.listing` files that wget creates to track directory listings |
| `--wait=6` | Waits the given number of seconds between requests |
| `--random-wait` | Multiplies `--wait` by a random factor between 0.5 and 1.5 for each request |
| `--adjust-extension` | Ensures the `.html` extension is appended to downloaded HTML files |
| `--no-clobber` | Does not re-download a file that already exists locally |
| `--continue` | Resumes downloading partially downloaded files |
| `-e robots=off` | Ignores robots.txt instructions |
| `--user-agent` | Sends the given `User-Agent` header to the server |
| `--domains` | Comma-separated list of domains to be followed |
| `--span-hosts` | Allows the crawl to span to other hosts, such as subdomains (see the sketch after this table) |
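`--span-hosts` is not part of the command above. If the same site also serves pages from a subdomain, it can be combined with `--domains` so the crawl spans those hosts and nothing else. A minimal sketch; the `www.paulgraham.com` host is an assumption used purely for illustration:

```bash
# Sketch: span to subdomains while staying inside the listed domains.
# www.paulgraham.com is assumed here for illustration only.
wget --recursive --level=inf --wait=6 --random-wait \
     --adjust-extension --no-clobber --continue -e robots=off \
     --span-hosts --domains=paulgraham.com,www.paulgraham.com \
     https://paulgraham.com
```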
Other useful parameters:
| Parameter | Description |
|---|---|
| `--page-requisites` | Downloads everything needed to display a page, such as inlined images, sounds, and referenced stylesheets |
| `--span-hosts` | Allows downloading files from links that point to different hosts |
| `--convert-links` | Converts links to local links (allowing local viewing) |
| `--no-check-certificate` | Bypasses SSL certificate verification |
| `--directory-prefix=/my/directory` | Sets the destination directory |
| `--include-directories=posts` | Comma-separated list of allowed directories to be followed when crawling |
| `--reject "*?*"` | Rejects URLs that contain query strings |
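For a copy that is browsable offline, several of these options can be combined into a single command. This is only a sketch; `/my/mirror` and `example.com` are placeholders, not part of the original command:

```bash
# Sketch: make a locally browsable mirror of a site.
# /my/mirror and example.com are hypothetical placeholders.
wget --recursive --level=inf --wait=6 --random-wait \
     --page-requisites --convert-links --adjust-extension \
     --directory-prefix=/my/mirror \
     --reject "*?*" \
     https://example.com/
```

Note that `--convert-links` conflicts with `--no-clobber`: when both are given, wget ignores `--no-clobber`, so it is left out of this sketch.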