In previous posts, we shared ways to download files from the web using the requests or wget module. Both are external modules that must be installed before use. In this post, we present an alternative: downloading remote resources with a built-in module, urllib2 (in Python 2) or urllib (in Python 3). Now, let’s get started.
Note: urllib3 is a powerful, user-friendly HTTP client for Python, but it is a third-party package, separate from the built-in urllib covered here. Much of the Python ecosystem already uses urllib3, and it brings many critical features that are missing from the Python standard library.
A short introduction to the modules
The urllib package replaces urllib2 when you migrate from Python 2 to Python 3. In other words, Python 3 has no urllib2; you use urllib instead.
The urllib module in Python 3 is a collection of modules that you can use for working with URLs. If you are coming from a Python 2 background you will note that in Python 2 you had urllib and urllib2. These are now a part of the urllib package in Python 3. The current version of urllib is made up of the following modules:
- urllib.request
- urllib.error
- urllib.parse
- urllib.robotparser
This post focuses on urllib.request, the module used for downloading. The official documentation actually recommends the third-party library requests for a higher-level HTTP client interface. However, we believe it is useful to know how to open URLs and interact with them without a third-party library, and it may also help you appreciate why the requests package is so popular.
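To give a feel for one of the other modules in the list, here is a minimal sketch that uses urllib.parse to split a URL and guess a local filename from it. The helper name filename_from_url and the fallback name "download.bin" are our own assumptions for illustration, not part of urllib.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of a URL, or `default` if there is none.

    `filename_from_url` and `default` are hypothetical names for this sketch.
    """
    path = urlsplit(url).path                  # path component only, no query
    name = posixpath.basename(unquote(path))   # decode %xx escapes, keep last segment
    return name or default


url = "http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg"
parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path)
print(filename_from_url(url))
```

This does not touch the network; it only inspects the URL string, so it is safe to experiment with before downloading anything.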
Note: urllib.request.urlretrieve is considered a “legacy interface” in Python 3, and it may be deprecated at some point in the future. Because of this, we would not recommend it over the other approaches. We’ve included it here due to its popularity in Python 2.
How to do it
Using the urllib.request module in Python 3
This script works only in Python 3.
import urllib.request

print('Beginning file download with urllib.request...')

# URL of the remote file to download
url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'
# Download the file and save it to the given local path
urllib.request.urlretrieve(url, '/Users/tnguyen/Downloads/tmp/cat.jpg')
In the snippet above, we first import the urllib.request module. Next, we create a variable url that contains the URL of the file to be downloaded. Finally, we call the urlretrieve function, passing the url variable as the first argument and “/Users/tnguyen/Downloads/tmp/cat.jpg” as the second argument, the file’s destination. Keep in mind that you can pass any filename as the second argument; that is the location and name your file will have, assuming you have the correct permissions.
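Since urlretrieve is a legacy interface, one possible alternative in Python 3 is to combine urllib.request.urlopen with shutil.copyfileobj, which copies the response to disk in chunks instead of holding it all in memory. The following is only a sketch: the function name download_file and the 30-second timeout are our own assumptions.

```python
import shutil
import urllib.request


def download_file(url, dest_path, timeout=30):
    """Stream the resource at `url` into `dest_path` and return `dest_path`.

    `download_file` and the default `timeout` are hypothetical choices
    made for this sketch.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out_file:
        # Copy the response body to the local file chunk by chunk.
        shutil.copyfileobj(response, out_file)
    return dest_path
```

You could call it much like urlretrieve, for example download_file(url, '/Users/tnguyen/Downloads/tmp/cat.jpg'), again assuming the destination directory exists and is writable.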
Using the urllib2 module in Python 2
This module only exists in Python 2. Therefore, the snippet below works only in Python 2.
import urllib2

filedata = urllib2.urlopen('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg')
datatowrite = filedata.read()

with open('/Users/tnguyen/Downloads/tmp/cat2.jpg', 'wb') as f:
    f.write(datatowrite)
The open function accepts two arguments: the path to the local file and the mode in which the file is opened. Here “wb” means the file is opened for writing (“w”) in binary mode (“b”), which is what an image needs.
Execute the above script and go to your “~/Downloads/tmp” directory. You should see the downloaded file as “cat2.jpg”.
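For readers on Python 3, a rough counterpart of the read-then-write approach might look like the sketch below. It also catches urllib.error.URLError, so a failed request does not leave a half-written file. The function name download_with_read and the timeout value are assumptions of ours, and read() still loads the whole response into memory, so this suits small files best.

```python
import urllib.error
import urllib.request


def download_with_read(url, dest_path, timeout=30):
    """Read the whole response, then write it to `dest_path`.

    Returns True on success and False if the request fails.
    `download_with_read` is a hypothetical helper for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP status errors land here too.
        print("Download failed:", exc)
        return False

    # Only open the destination once the data has been fetched successfully.
    with open(dest_path, "wb") as f:
        f.write(data)
    return True
```

Returning a boolean keeps the example short; in real code you may prefer to let the exception propagate so the caller can decide what to do.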
Conclusion
We have explained how to use the urllib module to download remote files. Alongside the methods introduced in earlier posts, we hope you can put what we have shared here to use in your work.
If you have any feedback, please don’t hesitate to leave your comments in the box below.
Please consider donating if you find the blog useful. Thanks!