GNU Wget is a utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
Wget is non-interactive, meaning that it can work in the background while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most Web browsers require the user's constant presence, which can be a great hindrance when transferring a lot of data.
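As a minimal sketch of starting a retrieval and then disconnecting, the helper below uses wget's -b option (go to the background) and -o option (write messages to a log file). The function name and the log file name are hypothetical; the URL is the sample one used elsewhere in these notes.

```shell
# Hypothetical helper: start a download that keeps running in the background.
#   $1 = URL to fetch
#   $2 = log file that wget writes its progress messages to
fetch_in_background() {
    wget -b -o "$2" "$1"
}

# Example (not run here):
#   fetch_in_background http://www.foo.com/temp/nano.tar.gz nano.log
#   tail -f nano.log    # watch progress later, if you like
```

If -o is left out, wget -b writes its messages to a file named wget-log in the current directory.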
Wget can follow links in HTML pages and create local versions (mirrors) of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded HTML files so that they point to the local files for offline viewing.
Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying (up to 20 times by default) until the whole file has been retrieved. If the server supports regetting (resuming a partial download), it will instruct the server to continue the download from where it left off.
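The retry and resume behaviour can also be requested explicitly. The sketch below assumes you have a partially downloaded file in the current directory; the function name is hypothetical, while -c (continue), --tries and --waitretry are standard wget options.

```shell
# Hypothetical helper: resume a partial download and keep retrying.
#   -c             continue a partially downloaded file instead of starting over
#   --tries=0      retry indefinitely (0 means "infinite")
#   --waitretry=10 wait up to 10 seconds between retries of a failed download
resume_download() {
    wget -c --tries=0 --waitretry=10 "$1"
}

# Example (not run here):
#   resume_download http://www.foo.com/temp/nano.tar.gz
```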
Wget supports many options and features, for which you should consult its man page. However, some particularly useful ones for various types of administrators are as follows:
--spider makes wget act as a robot spider, working its way through web pages and checking whether the links or pages exist without actually downloading them. This is very useful if you are a web server administrator and wish to check your site for dead links. You can even use it to check your browser's bookmarks file:
wget --spider --force-html -i bookmarks.html
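-N turns on time-stamping: a file is downloaded again only if the remote copy is newer than the local one, or if the local file is missing or differs in size.
-r causes wget to grab web pages recursively, allowing local mirrors to be made. It can be combined with the following other options: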
-l specifies the maximum recursion depth. It defaults to 5 levels.
-k tells wget to convert the links after downloading. This will convert the paths originally specified to local ones. It's useful for making web backups or even for fixing unwanted absolute links in a web site.
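Putting the recursive options together, a hedged sketch of a depth-limited download with link conversion might look like this. The function name and the example depth are assumptions chosen for illustration.

```shell
# Hypothetical helper: recursive download with a depth limit and link conversion.
#   $1 = starting URL
#   $2 = maximum recursion depth (passed to -l)
fetch_recursive() {
    wget -r -l "$2" -k "$1"
}

# Example (not run here): follow links at most 2 levels deep
#   fetch_recursive http://www.foo.com/temp/ 2
```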
--mirror turns on options suitable for web-site mirroring (equivalent to "-r -N -l inf -nr").
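A mirroring run could be sketched as follows. The function name and the target directory are hypothetical; -P (directory prefix) is a standard wget option that chooses where the mirrored tree is saved, and -k is added so the local copy can be browsed offline.

```shell
# Hypothetical helper: mirror a site into a local directory.
#   $1 = site URL
#   $2 = local directory to save the mirror under (passed to -P)
mirror_site() {
    wget --mirror -k -P "$2" "$1"
}

# Example (not run here):
#   mirror_site http://www.foo.com/ ./foo-mirror
```

Because --mirror includes time-stamping, running the same command again later should only fetch files that have changed on the server.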
Though wget is a useful command-line utility, more often than not administrators find it even more valuable as a scripting element. Scripts which monitor web sites for defacement, periodically check for dead links, or even automatically back up web sites are easily written and set to run at a given interval.
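As one possible illustration of such a script, the sketch below checks a page for defacement by comparing a fresh download against a known-good copy saved earlier. The function name, the baseline file and the exit-code convention are all assumptions made for this example, not an established tool.

```shell
# Hypothetical helper: compare a live page against a saved known-good copy.
#   $1 = URL of the page to check
#   $2 = path to the baseline (known-good) copy
# Exit status: 0 = unchanged, 1 = page differs, 2 = page could not be fetched.
page_unchanged() {
    url=$1
    baseline=$2
    current=$(mktemp) || return 2
    # -q keeps wget quiet; -O writes the page to the temporary file
    if ! wget -q -O "$current" "$url"; then
        rm -f "$current"
        return 2
    fi
    cmp -s "$baseline" "$current"
    status=$?
    rm -f "$current"
    return "$status"
}

# Example (not run here):
#   page_unchanged http://www.foo.com/index.html /var/backups/index.html \
#       || echo "index.html has changed or is unreachable"
```

A script built around a function like this could be scheduled with cron to run at a given interval. Pages with dynamic content (dates, counters) will differ on every fetch, so this simple byte-for-byte comparison suits static pages only.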
Curl is somewhat of a "spiritual successor" to Wget. It has more features than wget and is also written more generally. While wget is used to fetch files from HTTP, HTTPS, or FTP servers, curl can be used to get files from nearly any communications infrastructure (at the time of this writing, this includes HTTP, HTTPS, FTP, GOPHER, DICT, TELNET, LDAP or ordinary files). Curl provides a large number of options, such as upload abilities, authentication, proxies, Kerberos, HTTP PUT and POST, and cookie handling.
Curl is also a programming library (libcurl) that allows its features and functionality to be built into other applications. It has bindings for nearly every language you can think of (C, C++, Perl, Python, PHP, Java, Ruby, etc.). It also supports multiple networking protocols (including IPv6) and offers cross-platform development across UNIX, Windows, Mac, OS/2 and just about any OS you'd want to use.
Of course, all of this functionality comes at a price: curl is much less straightforward to use than wget.
Curl's basic syntax is very similar to wget:
curl [options] [URL...]
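If you wanted to again download the same file as above, the curl command would look very similar. Note that curl writes what it fetches to standard output unless told otherwise, so -O is added to save the file under its remote name:
$ curl -O http://www.foo.com/temp/nano.tar.gz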
However, curl has extra functionality with respect to its URL specifications. For example, you can specify multiple URLs or parts of URLs by writing part sets within braces ({ }), or you can get sequences of alphanumeric series by using square brackets ([ ]), as in:
ftp://ftp.numericals.com/file[001-100].txt (with leading zeros)
It is possible to specify up to 9 sets or series for a URL, but no nesting is supported at the moment.
You can specify any number of URLs on the command line. They will be fetched in a sequential manner in the specified order.
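The brace and bracket syntax described above can be combined with curl's -o option, where "#1" in the output name stands for whatever the first set or series matched. The sketch below is illustrative only: the function name and output file pattern are hypothetical, and the brace example uses made-up file names.

```shell
# Hypothetical helper: download a URL containing one { } set or [ ] series,
# naming each saved file after the part that the set or series matched.
# The URL must be quoted so the shell passes the braces and brackets to
# curl untouched instead of trying to expand them itself.
fetch_series() {
    curl -o "file_#1.txt" "$1"
}

# Examples (not run here):
#   fetch_series 'ftp://ftp.numericals.com/file[001-100].txt'
#       saves file_001.txt, file_002.txt, ... file_100.txt
#   fetch_series 'http://www.foo.com/temp/{readme,changes}.txt'   # made-up names
#       saves file_readme.txt and file_changes.txt
```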
Curl will attempt to re-use connections for multiple file transfers, so that getting many files from the same server does not require a new connection and handshake for each one. This improves speed. Of course, this is only done for files specified on a single command line and cannot be used between separate curl invocations.
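To benefit from connection re-use, the URLs have to be handed to one curl process. A minimal sketch, assuming a reasonably recent curl that supports --remote-name-all (which behaves like giving -O for every URL); the function name is hypothetical:

```shell
# Hypothetical helper: fetch all URLs given as arguments in ONE curl run,
# so that connections to the same server can be re-used between transfers.
fetch_together() {
    curl --remote-name-all "$@"
}

# Example (not run here):
#   fetch_together http://www.foo.com/temp/a.txt http://www.foo.com/temp/b.txt
#
# By contrast, a loop such as
#   for url in "$@"; do curl -O "$url"; done
# starts a separate curl, and therefore a new connection, for every file.
```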
Again, curl has many options and features, including the several we listed above for wget. However, they are beyond the scope of this class. Since wget will work for all of the activities we have planned for this and future UNIX courses, we will not cover curl options.
Take a look at curl's man page for more information on it.