Downloading Files in Python Using wget Module


In a recent post about downloading files in Python, we learned how to use the requests module to check whether a resource is downloadable and to apply some restrictions when grabbing remote files. To extend our knowledge of this topic, I am going to introduce the wget module, a Python package modelled on the wget command that ships with many operating systems. Let's get started.

What is the wget command?

The wget command is a non-interactive utility for downloading files from the internet. It is available on most Unix-like systems and comes pre-installed on many Linux distributions. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

Invoking

By default, Wget is very simple to invoke. The basic syntax is:

wget [options]... [URL]

Wget will simply download all the URLs specified on the command line, where URL is a Uniform Resource Locator.

It isn't too hard to learn the options of the wget command. Running man wget shows its documentation; below is an excerpt from that output.

OPTIONS
   Option Syntax
       Since Wget uses GNU getopt to process command-line arguments, every option has a long form along with the short one.  Long options are more
       convenient to remember, but take time to type.  You may freely mix different option styles, or specify options after the command-line
       arguments.  Thus you may write:

               wget -r --tries=10 http://fly.srk.fer.hr/ -o log

       The space between the option accepting an argument and the argument may be omitted.  Instead of -o log you can write -olog.

       You may put several options that do not require arguments together, like:

               wget -drc <URL>

       This is completely equivalent to:

               wget -d -r -c <URL>

       Since the options can be specified after the arguments, you may terminate them with --.  So the following will try to download URL -x,
       reporting failure to log:

               wget -o log -- -x

       The options that accept comma-separated lists all respect the convention that specifying an empty list clears its value.  This can be
       useful to clear the .wgetrc settings.  For instance, if your .wgetrc sets "exclude_directories" to /cgi-bin, the following example will
       first reset it, and then set it to exclude /~nobody and /~somebody.  You can also clear the lists in .wgetrc.

               wget -X "" -X /~nobody,/~somebody
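The `--` convention matters when you call wget from Python with URLs you did not write yourself. Below is a minimal sketch, assuming the GNU wget binary is on your PATH; `build_wget_args` and `run_wget` are hypothetical helper names, not part of any library. Options go first, `--` ends option parsing, and the URLs follow, so a value such as `-x` is treated as a URL rather than as an option.

```python
import subprocess
from typing import List, Sequence


def build_wget_args(urls: Sequence[str], options: Sequence[str] = ()) -> List[str]:
    """Return an argv list: wget, then options, then "--", then the URLs.

    "--" ends option parsing, so a URL that starts with "-" (such as "-x")
    is passed to wget as a URL rather than interpreted as an option.
    """
    return ["wget", *options, "--", *urls]


def run_wget(urls: Sequence[str], options: Sequence[str] = ()) -> None:
    # A list (no shell) avoids quoting problems with special characters in URLs.
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    subprocess.run(build_wget_args(urls, options), check=True)


# -c is wget's "continue a partial download" option; example.com is a placeholder.
print(build_wget_args(["https://example.com/file.zip"], ["-c"]))
# ['wget', '-c', '--', 'https://example.com/file.zip']
```

Passing an argument list instead of a shell string is a deliberate choice here: no shell parses the URL, so characters such as `&` or `?` need no escaping.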

Examples

Example 1: Download PowerISO software without providing any options

Often we want to keep a downloaded file's original name, as chosen by its owner. The syntax for that is very simple.

wget https://d32si1eewy6hfa.cloudfront.net/ov2so5=3ex040/PowerISO7-x64.exe

The command above downloads PowerISO7-x64.exe from the given URL and saves it in the current working directory under the same name.

Example 2: Using the Wget Command to Save the Downloaded File Under a Different Name

In practice, we often want to save a downloaded file under a more meaningful name. To do that, use the -O option (uppercase letter O):

wget -O latest-hugo.zip https://github.com/gohugoio/hugo/archive/master.zip

The command above saves a zip archive of the Hugo repository's master branch from GitHub as latest-hugo.zip instead of its original name, master.zip.
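To run the same commands from a Python script, you can call the wget binary through the standard subprocess module. This is a minimal sketch assuming GNU wget is installed and on PATH; `wget_command` and `download_with_wget` are hypothetical names chosen for illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def wget_command(url: str, output: Optional[str] = None) -> List[str]:
    """Return the wget argv; with `output`, add -O to choose the saved file name."""
    cmd = ["wget"]
    if output is not None:
        cmd += ["-O", output]  # uppercase O: --output-document
    cmd.append(url)
    return cmd


def download_with_wget(url: str, output: Optional[str] = None) -> None:
    # Fail early with a clear error if the wget binary is not on PATH.
    if shutil.which("wget") is None:
        raise FileNotFoundError("wget executable not found on PATH")
    subprocess.run(wget_command(url, output), check=True)


# Usage (needs wget and network access):
# download_with_wget("https://github.com/gohugoio/hugo/archive/master.zip",
#                    "latest-hugo.zip")
```

Without `output`, the helper reproduces Example 1 (wget keeps the remote name); with it, Example 2.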

Read more about this command at https://www.gnu.org/software/wget/manual/wget.html

A short introduction to the wget module

The wget module aims to give Python developers an easy way to download files in the spirit of the wget command. Its latest release dates from October 2015 (https://pypi.org/project/wget/). At the time of writing, the project's source code repository on Bitbucket is no longer available; it was probably moved or removed.

I came across an archived repository on GitHub that was forked from the module owner's original. Since the latest release, nobody appears to have continued maintaining or developing the module.

How to download files with wget

Usage

python -m wget [options] <URL>

options:

-o --output FILE|DIR   output filename or directory
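The module's command-line interface can also be driven from a script. A minimal sketch, assuming the wget module is installed for the interpreter running the script; `wget_module_command` and `run_wget_module` are hypothetical helpers. Note that here the lowercase `-o` sets the output file or directory, whereas in GNU wget `-o` names a log file and the uppercase `-O` sets the output document.

```python
import subprocess
import sys
from typing import List, Optional


def wget_module_command(url: str, output: Optional[str] = None) -> List[str]:
    """Build a command that runs `python -m wget [options] <URL>`."""
    # sys.executable picks the same interpreter (and virtualenv) as this script.
    cmd = [sys.executable, "-m", "wget"]
    if output is not None:
        cmd += ["-o", output]  # lowercase -o: output filename or directory
    cmd.append(url)
    return cmd


def run_wget_module(url: str, output: Optional[str] = None) -> None:
    # check=True raises subprocess.CalledProcessError if the download fails.
    subprocess.run(wget_module_command(url, output), check=True)


# Usage (needs the wget module and network access):
# run_wget_module("http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3",
#                 "music/")
```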

API Usage

>>> import wget 
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3' 
>>> filename = wget.download(url) 100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'

The skew that you see above (the progress bar is not followed by a newline, so it runs into the surrounding output) is a documented side effect. Alternative progress bar:

>>> wget.download(url, bar=wget.bar_thermometer)

The bar argument takes a function that renders the progress bar. wget.bar_thermometer returns a thermometer-style progress bar string; its `total` argument cannot be zero, and the minimum size of the bar it returns is 3.

Example:

[..........            ]

Control and trailing symbols (\r and spaces) are not included. See `bar_adaptive` for more information.
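To make the docstring concrete, here is a hypothetical re-implementation of a thermometer-style bar. It is our own sketch of the behaviour described above (a fixed-width bar of dots with a minimum size of 3), not the module's actual source code.

```python
def thermometer(current: int, total: int, width: int = 80) -> str:
    """Return a bar like '[.....     ]' that is `width` characters wide.

    Hypothetical sketch for illustration; `total` must be non-zero.
    """
    width = max(width, 3)                 # '[', at least one cell, ']'
    cells = width - 2                     # room between the brackets
    filled = int(cells * current / total)
    filled = min(max(filled, 0), cells)   # clamp to the bar's bounds
    return "[" + "." * filled + " " * (cells - filled) + "]"


print(thermometer(5, 10, 12))  # [.....     ]
```

Like the module's docstring says, only the bar itself is returned; the caller is responsible for the carriage return that redraws it in place.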

Customise the progress bar

The download function has a bar argument that controls what the progress bar looks like. By default (bar_adaptive), the progress bar includes a series of dots. We can customise it with our own function, as in the snippet below.

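import wget

def bar_custom(current, total, width=80):
    # Return the text instead of printing it: wget writes it out, redrawing the same line
    percent = current / total * 100 if total > 0 else 0  # total <= 0: size unknown
    return "Downloading: %d%% [%d / %d] bytes" % (percent, current, total)

wget.download('http://download.geonames.org/export/zip/US.zip', bar=bar_custom)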
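If you would rather avoid a third-party dependency, the same callback idea can be sketched with the standard library's urllib.request.urlretrieve, whose reporthook is called once when the connection is established and once after each block is read. The `download` and `simple_bar` functions below are our own approximation, not the module's source; they pass `(current, total, width)` to the bar function like the bar_custom example above, with the width fixed at 80 as an assumption. The Python documentation describes urlretrieve as a legacy interface, so for new code you may prefer urlopen or requests.

```python
import sys
import urllib.request
from typing import Callable, Optional


def simple_bar(current: int, total: int, width: int = 80) -> str:
    # total is -1 when the server sends no Content-Length header
    if total <= 0:
        return "%d bytes" % current
    return "%3d%% [%d / %d] bytes" % (current * 100 // total, current, total)


def download(url: str, out: str,
             bar: Optional[Callable[[int, int, int], str]] = simple_bar) -> str:
    """Download `url` to `out`, redrawing one progress line; return the file name."""

    def hook(blocks: int, block_size: int, total_size: int) -> None:
        if bar is None:
            return
        current = blocks * block_size
        if total_size > 0:
            current = min(current, total_size)  # the last block may be partial
        sys.stdout.write("\r" + bar(current, total_size, 80))  # width fixed at 80
        sys.stdout.flush()

    filename, _headers = urllib.request.urlretrieve(url, out, reporthook=hook)
    if bar is not None:
        sys.stdout.write("\n")  # finish the progress line
    return filename


# Usage (needs network access):
# download("http://download.geonames.org/export/zip/US.zip", "US.zip")
```

Writing a newline after the transfer avoids the kind of skewed output mentioned for the module's own progress bar.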

Conclusion

In this post, I have introduced the wget command and the wget module as another way to download files in Python. Personally, I still prefer the requests module for downloading files because it combines simplicity and power. However, your project may have constraints that prevent you from using third-party libraries; in that case I would use the urllib2 module (for Python 2) or the urllib.request module (for Python 3).

Which library do you prefer and why? Let us know in the comments!

References

[1] GNU Wget 1.20 Manual, https://www.gnu.org/software/wget/manual/wget.html, accessed on 6.9.2020

[2] wget on PyPI, https://pypi.org/project/wget/, accessed on 6.9.2020

[3] Download files with progress in Python, https://medium.com/@petehouston/download-files-with-progress-in-python-96f14f6417a2, accessed on 4.9.2020

 
