In previous posts, we shared methods for downloading files from the web using the wget module. Modules like wget are external packages that need to be installed before you can use them. In this post, we present an alternative way to download remote resources using a built-in module: urllib2 (in Python 2) or urllib (in Python 3). Let's get started.
Note: despite the similar name, urllib3 is a third-party package, not part of the standard library. It is a powerful, user-friendly HTTP client for Python; much of the Python ecosystem already uses it, and it brings many critical features that are missing from the Python standard libraries.
A short introduction to the modules
The urllib package replaces urllib2 when migrating from Python 2 to Python 3. In other words, you won't find urllib2 in Python 3; its functionality now lives in the urllib package.
The urllib package in Python 3 is a collection of modules for working with URLs. If you are coming from a Python 2 background, you will note that Python 2 had both urllib and urllib2. Their functionality is now part of the urllib package in Python 3, which is made up of the following modules:
urllib.request: opening and reading URLs
urllib.error: the exceptions raised by urllib.request
urllib.parse: parsing URLs
urllib.robotparser: parsing robots.txt files
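If a script has to run under both Python 2 and Python 3, a common pattern is to try the Python 3 import first and fall back to urllib2. The sketch below assumes you only need urlopen; any other names you use would need the same treatment.

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule
    from urllib.request import urlopen
except ImportError:
    # Python 2: the same function is provided by urllib2
    from urllib2 import urlopen

# From here on, urlopen(url) can be called the same way on either version.
```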
In this post, we only need urllib.request. The official documentation actually recommends the third-party requests package for a higher-level HTTP client interface. However, we believe it is useful to know how to open URLs and interact with them without a third-party package, and doing so may also help you appreciate why requests is so popular.
urllib.request.urlretrieve is considered a "legacy interface" in Python 3 and may be deprecated at some point in the future. For new code, we would therefore recommend opening the URL with urllib.request.urlopen and writing the response to a file yourself rather than relying on urlretrieve. We have included it here because of its popularity in Python 2.
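As a rough sketch of the non-legacy route, the hypothetical helper download below opens the URL with urllib.request.urlopen and streams the response into a file with shutil.copyfileobj, so a large file is never loaded into memory in one go. The function name and its return value are our own choices for illustration, not part of urllib.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Hypothetical helper: save the resource at `url` to `dest_path` without urlretrieve."""
    # urlopen returns a file-like response; copyfileobj copies it to disk chunk by chunk
    with urllib.request.urlopen(url) as response, open(dest_path, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    return dest_path


# Example (the destination directory must already exist):
# download('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg',
#          '/Users/tnguyen/Downloads/tmp/cat.jpg')
```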
How to do it
Using the urllib.request package in Python 3
This script works only in Python 3.

```python
import urllib.request

print('Beginning file download with urllib...')
url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'
# Download the file and save it under the given local path (the directory must exist)
urllib.request.urlretrieve(url, '/Users/tnguyen/Downloads/tmp/cat.jpg')
```
In the snippet above, we first import the urllib.request module. Next, we create a variable url that holds the URL of the file to be downloaded. Finally, we call the urlretrieve function, passing the url variable as the first argument and "/Users/tnguyen/Downloads/tmp/cat.jpg" as the second argument, which is the file's destination. Keep in mind that you can pass any path as the second argument; that is the location and name your file will have, provided the directory exists and you have permission to write to it.
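Because urlretrieve will not create missing directories, one option is to create the destination folder first. The hypothetical helper fetch below does that with os.makedirs and also shows that urlretrieve returns a (filename, headers) tuple. Treat it as a sketch under those assumptions rather than a drop-in utility.

```python
import os
import urllib.request


def fetch(url, dest_path):
    """Hypothetical helper: make sure the target folder exists, then call urlretrieve."""
    # urlretrieve fails if the directory is missing, so create it (no error if it exists)
    dest_dir = os.path.dirname(dest_path)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)
    # urlretrieve returns (local filename, response headers)
    filename, headers = urllib.request.urlretrieve(url, dest_path)
    return filename, headers


# Example:
# fetch('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg',
#       '/Users/tnguyen/Downloads/tmp/cat.jpg')
```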
Using the urllib2 module in Python 2
This module only exists in Python 2. Therefore, the snippet below works only in Python 2.
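```python
import urllib2

# Open the URL and read the whole response body into memory
filedata = urllib2.urlopen('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg')
datatowrite = filedata.read()
# Write the raw bytes to a local file in binary mode
with open('/Users/tnguyen/Downloads/tmp/cat2.jpg', 'wb') as f:
    f.write(datatowrite)
```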
The built-in open function is called with two arguments: the path to the local file and the mode in which to open it. Here, "wb" opens the file for writing in binary mode, so the raw bytes returned by read() are written to the file unchanged.
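To see why the "b" in "wb" matters, here is a small Python 3 sketch with two hypothetical helpers. In Python 3, a file opened in binary mode accepts bytes, while a file opened in text mode ("w") only accepts str and raises a TypeError when handed bytes. Python 2, which the snippet above targets, is more lenient, but text mode can still alter line endings on Windows, so binary mode is the safe choice there as well.

```python
import os
import tempfile


def save_bytes(data, path):
    # Hypothetical helper. 'w' = write (truncating any existing file), 'b' = binary: bytes are written as-is
    with open(path, 'wb') as f:
        f.write(data)


def save_bytes_in_text_mode(data, path):
    # Hypothetical helper. Text mode expects str in Python 3, so writing bytes raises TypeError
    with open(path, 'w') as f:
        f.write(data)


payload = b'\xff\xd8\xff'  # the first bytes of a JPEG file
workdir = tempfile.mkdtemp()
binary_path = os.path.join(workdir, 'demo.jpg')
text_path = os.path.join(workdir, 'demo.txt')

save_bytes(payload, binary_path)
try:
    save_bytes_in_text_mode(payload, text_path)
except TypeError as exc:
    print('Text mode rejected the bytes:', exc)
```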
Run the script above and open your "~/Downloads/tmp" directory (or whichever directory you used in the destination path). You should see the downloaded image saved as "cat2.jpg".
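On Python 3, where urllib2 no longer exists, a near line-for-line translation of the urllib2 snippet is to swap urllib2.urlopen for urllib.request.urlopen. The sketch below wraps that in a hypothetical download_to_file helper that also returns the size of the saved file, which gives a quick programmatic check alongside looking in the folder. Like the original, it reads the whole file into memory, which is fine for small images.

```python
import os
import urllib.request


def download_to_file(url, dest_path):
    """Hypothetical helper: Python 3 counterpart of the urllib2 example."""
    # urllib.request.urlopen replaces urllib2.urlopen; read() returns the body as bytes
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # Write the bytes in binary mode, exactly as in the urllib2 example
    with open(dest_path, 'wb') as f:
        f.write(data)
    # Return the size on disk as a quick sanity check
    return os.path.getsize(dest_path)


# Example (the destination directory must already exist):
# download_to_file('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg',
#                  '/Users/tnguyen/Downloads/tmp/cat2.jpg')
```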
We have explained how to use the urllib module to download remote files. Together with the methods introduced in our previous posts, we hope you can put these techniques to use in your work.