In a recent post about downloading files in Python, we learned how to use the requests module to check downloadable resources and to apply some restrictions when grabbing remote files. To go further on this topic, this post introduces the wget module, a Python counterpart of the wget command that comes with many operating systems. Now, let’s get started.
What is the wget command?
The wget command is a non-interactive utility for downloading remote files from the internet. It is available on most Unix-based operating systems. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
By default, Wget is very simple to invoke. The basic syntax is:
wget [options]... [URL]
Wget will simply download all the URLs specified on the command line, where each URL is a Uniform Resource Locator.
It isn’t too hard to learn the options of the wget command. Running man wget prints the full instructions for the command. Below is an excerpt from the output of running man wget:
OPTIONS
Option Syntax
Since Wget uses GNU getopt to process command-line arguments, every option has a long form along with the short one. Long options are more convenient to remember, but take time to type. You may freely mix different option styles, or specify options after the command-line arguments. Thus you may write:
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
The space between the option accepting an argument and the argument may be omitted. Instead of -o log you can write -olog.
You may put several options that do not require arguments together, like:
wget -drc <URL>
This is completely equivalent to:
wget -d -r -c <URL>
Since the options can be specified after the arguments, you may terminate them with --. So the following will try to download URL -x, reporting failure to log:
wget -o log -- -x
The options that accept comma-separated lists all respect the convention that specifying an empty list clears its value. This can be useful to clear the .wgetrc settings. For instance, if your .wgetrc sets "exclude_directories" to /cgi-bin, the following example will first reset it, and then set it to exclude /~nobody and /~somebody. You can also clear the lists in .wgetrc.
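Python's standard library includes a getopt module modelled on the same GNU conventions, so the rules quoted above can be tried out without touching the network. The sketch below is only an illustration: the option string drco: and the long option tries= are a small subset chosen for this demo, not Wget's full option table, and parse is a hypothetical helper name.

```python
import getopt

# Illustrative subset of options: -d, -r and -c take no argument, -o takes one,
# and --tries takes one. This is NOT Wget's complete option list.
SHORT_OPTS = "drco:"
LONG_OPTS = ["tries="]


def parse(argv):
    """Parse argv the GNU way: options and URLs may be freely mixed."""
    return getopt.gnu_getopt(argv, SHORT_OPTS, LONG_OPTS)


# Options may come after the URL.
opts, args = parse(["-r", "--tries=10", "http://fly.srk.fer.hr/", "-o", "log"])
print(opts, args)

# -olog is the same as -o log, and -drc is the same as -d -r -c.
print(parse(["-olog"])[0] == parse(["-o", "log"])[0])
print(parse(["-drc"])[0] == parse(["-d", "-r", "-c"])[0])

# Everything after -- is treated as an argument, even if it starts with a dash.
print(parse(["-o", "log", "--", "-x"]))
```

Each call returns a pair: the list of recognised (option, value) tuples and the list of remaining arguments, which is where the URLs end up.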
Example 1: Download PowerISO software without providing any options
We often download a file and keep the name its owner gave it. To do so, pass only the URL to wget, with no options.
The command downloads the file PowerISO7-x64.exe located at the URL in question and saves it in the current working directory under the same name, PowerISO7-x64.exe.
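The way the local name in Example 1 follows from the URL can be approximated in a few lines of Python: take the last component of the URL path. This is a rough sketch under stated assumptions. The URL is a made-up placeholder (the real download link is not reproduced here), default_filename is a hypothetical helper, and the real Wget covers more cases than this.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def default_filename(url):
    """Return the last path component of url, the name a download would keep."""
    path = urlsplit(url).path  # drops the scheme, host and query string
    return unquote(posixpath.basename(path))


# Placeholder URL for illustration only.
url = "https://example.com/downloads/PowerISO7-x64.exe"
print(default_filename(url))
```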
Example 2: Using Wget Command to Save the Downloaded File Under Different Name
In practice, we often want to save a downloaded file under a more meaningful name. To do this, use the -O option (an uppercase letter O), like this:
wget -O latest-hugo.zip https://github.com/gohugoio/hugo/archive/master.zip
The command above saves the latest Hugo zip archive from GitHub as latest-hugo.zip instead of under its original name.
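If you want to issue the same command from a Python script, one possible approach is to hand it to the subprocess module. The sketch below assumes the wget binary is on your PATH. The helper name wget_save_as and its run flag are invented for this example, and by default it only builds the command so you can inspect it first.

```python
import shutil
import subprocess


def wget_save_as(url, filename, run=False):
    """Build the `wget -O` command; run it only when asked and when wget exists."""
    command = ["wget", "-O", filename, url]
    if run and shutil.which("wget"):
        # check=True raises CalledProcessError if wget exits with an error.
        subprocess.run(command, check=True)
    return command


command = wget_save_as(
    "https://github.com/gohugoio/hugo/archive/master.zip", "latest-hugo.zip"
)
print(" ".join(command))
```

Passing the command as a list rather than a single string avoids shell quoting problems when a URL contains characters such as & or ?.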
Read more about this command at https://www.gnu.org/software/wget/manual/wget.html
A short introduction to the wget module
The wget module aims to give Python developers an API that makes it easier to apply the ideas of the wget command in Python programs. The latest version was released in October 2015, and there has been no update since then (https://pypi.org/project/wget/). At the time of writing, the project's source code repository on Bitbucket is no longer available. It was probably moved or removed.
I came across an archived repository on GitHub that was forked from the repository of the module's owner. Since the latest version was released, nobody appears to have continued maintaining or developing the module.
How to download files with wget
python -m wget [options] <URL>
- -o, --output FILE|DIR: output filename or directory
>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'
The skew that you see above is a documented side effect. Alternative progress bar:
>>> wget.download(url, bar=wget.bar_thermometer)
bar_thermometer returns a thermometer-style progress bar string. The `total` argument cannot be zero. The minimum size of the returned bar is 3.
Control and trailing symbols (\r and spaces) are not included. See `bar_adaptive` for more information.
Customise the progress bar
The download method has a bar argument that determines what the progress bar looks like. By default, the progress bar is a series of dots. We can customise it by applying the idea presented in the snippet below.
import wget

def bar_custom(current, total, width=80):
    # Called by wget on each progress update with the byte counts so far.
    print("Downloading: %d%% [%d / %d] bytes" % (current / total * 100, current, total))

wget.download('http://download.geonames.org/export/zip/US.zip', bar=bar_custom)
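A variation on bar_custom is to return the progress string instead of printing it. As far as I can tell from the module's source, download writes whatever string the bar callback returns on a single, continuously overwritten line, but treat that as an assumption to check against your installed version. The sketch also assumes the total may be reported as zero or negative when the server does not announce a size. The names bar_percent and download_with_bar are made up for this example.

```python
def bar_percent(current, total, width=80):
    """Return a one-line progress string for wget to display."""
    if total <= 0:
        # Size unknown (assumption: reported as 0 or a negative number).
        return "Downloading: %d bytes" % current
    percent = current * 100 // total
    return "Downloading: %d%% [%d / %d] bytes" % (percent, current, total)


def download_with_bar(url):
    """Download url with the custom bar; wget is imported lazily (third-party)."""
    import wget

    return wget.download(url, bar=bar_percent)


# The bar function can be checked on its own, without any download.
print(bar_percent(512, 2048))
```

Keeping the formatting in a plain function that returns a string makes it easy to test without a network connection. Calling download_with_bar with the GeoNames URL from the snippet above would perform the actual download.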
In this post, I have presented one simple way to download files in Python: the wget module. Personally, I prefer to use the requests module for downloading files because it combines simplicity and power. However, your project may have constraints that prevent you from using third-party libraries, in which case I would use the urllib2 module (for Python 2) or the urllib.request module (for Python 3).
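For the standard-library route on Python 3, a minimal sketch might look like the following. It is one possible approach rather than a recommended recipe: download is a hypothetical helper, and the demo uses a local file:// URL as a stand-in for a remote resource so that it runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, destination):
    """Stream url to destination using only the standard library."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Copy in chunks so large files are never held in memory at once.
        shutil.copyfileobj(response, out)
    return destination


# Offline demo: a file:// URL stands in for a remote resource.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_bytes(b"hello wget")
    target = download(source.as_uri(), Path(tmp) / "copy.txt")
    print(target.read_bytes())
```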
Which library do you prefer and why? Let us know in the comments!
 GNU Wget 1.20 Manual, https://www.gnu.org/software/wget/manual/wget.html, accessed on 6.9.2020
 wget 3.3 on PyPI, https://pypi.org/project/wget/, accessed on 6.9.2020
 Download files with progress in Python, https://medium.com/@petehouston/download-files-with-progress-in-python-96f14f6417a2, accessed on 4.9.2020