Download Files in Python using the urllib Module


In previous posts, we shared methods for downloading files from the web using the requests or wget modules. Both are external modules that must be installed before use. In this post, we present an alternative that downloads remote resources with a built-in module: urllib2 in Python 2, or urllib in Python 3. Let’s get started.

Note: urllib3 is a powerful, user-friendly HTTP client for Python. Despite the similar name, it is a separate third-party package rather than part of the standard library. Much of the Python ecosystem already uses urllib3, and it brings many critical features that are missing from the Python standard library.

A short introduction to the modules

In Python 3, the urllib package replaces urllib2. In other words, Python 3 has no urllib2 module; you use urllib instead.
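If a script has to run under both interpreters, one common pattern is to try the Python 3 import first and fall back to urllib2. The following is a minimal sketch, assuming that urlopen is the only function the script needs:

```python
# Try the Python 3 location first, then fall back to the Python 2 module.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# From here on, the rest of the script can call urlopen(...) on either version.
print(urlopen.__module__)
```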

The urllib package in Python 3 is a collection of modules that you can use for working with URLs. If you are coming from a Python 2 background, you will note that Python 2 had both urllib and urllib2. These are now part of the urllib package in Python 3. The current version of urllib is made up of the following modules:

  • urllib.request
  • urllib.error
  • urllib.parse
  • urllib.robotparser
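To give a feel for what the helper modules offer, here is a small sketch that runs without any network access. The robots.txt rules and the example.com URLs are made up for illustration; the image URL is the one used later in this post.

```python
import posixpath
import urllib.error
import urllib.parse
import urllib.robotparser

url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'

# urllib.parse: split a URL into its components.
parts = urllib.parse.urlsplit(url)
print(parts.scheme)  # http
print(parts.netloc)  # i3.ytimg.com
print(parts.path)    # /vi/J---aiyznGQ/mqdefault.jpg

# The last path segment is a reasonable default for the local filename.
filename = posixpath.basename(parts.path)
print(filename)      # mqdefault.jpg

# urllib.error: HTTPError is a subclass of URLError, so catch HTTPError first.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True

# urllib.robotparser: parse robots.txt rules (hypothetical rules, given inline).
rp = urllib.robotparser.RobotFileParser()
rp.parse(['User-agent: *', 'Disallow: /private/'])
print(rp.can_fetch('*', 'http://example.com/private/page'))  # False
print(rp.can_fetch('*', 'http://example.com/public/page'))   # True
```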

This post focuses on urllib.request, the module that opens URLs. The official documentation recommends the third-party requests library for a higher-level HTTP client interface. However, we believe it is useful to know how to open URLs and interact with them without a third-party package, and doing so may also help you appreciate why requests is so popular.

Note: urllib.request.urlretrieve is considered a “legacy interface” in Python 3 and may be deprecated at some point in the future. Because of this, we recommend preferring urllib.request.urlopen in new code. We’ve included urlretrieve here due to its popularity in Python 2.

How to do it

Using the urllib.request module in Python 3

This script works only in Python 3.

import urllib.request

print('Beginning file download with urllib.request...')

# Remote file to fetch, followed by the local path to save it under
url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'
urllib.request.urlretrieve(url, '/Users/tnguyen/Downloads/tmp/cat.jpg')

In the snippet above, we first import the urllib.request module. Next, we create a variable url that holds the address of the file to download. Finally, we call the urlretrieve function, passing the url variable as the first argument and “/Users/tnguyen/Downloads/tmp/cat.jpg” as the second argument, the file’s destination. You can pass any path as the second argument; it determines the location and name of the saved file, provided the directory exists and you have permission to write to it.
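Downloads can fail for reasons outside your control, such as a missing page, an unreachable server, or a destination you cannot write to. The sketch below wraps urlretrieve with basic error handling. The download function and its True/False return convention are our own illustration, not part of urllib.

```python
import urllib.error
import urllib.request
from pathlib import Path


def download(url, destination):
    """Download url to destination; return True on success, False on failure."""
    destination = Path(destination)
    try:
        # Create the target directory if it does not exist yet.
        destination.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(destination))
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        print('HTTP error {}: {}'.format(err.code, err.reason))
    except urllib.error.URLError as err:
        # The resource could not be opened (DNS failure, refused connection, ...).
        print('Could not open the URL: {}'.format(err.reason))
    except OSError as err:
        # Local problem, e.g. missing write permission on the destination.
        print('Could not write the file: {}'.format(err))
    else:
        return True
    return False
```

With this in place, the earlier call becomes download(url, '/Users/tnguyen/Downloads/tmp/cat.jpg'). HTTPError is caught before URLError because it is a subclass of it, and URLError is in turn a subclass of OSError.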

Using the urllib2 module in Python 2

This module only exists in Python 2. Therefore, the snippet below works only in Python 2.

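import urllib2

# Open the URL and read the whole response body into memory
filedata = urllib2.urlopen('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg')
datatowrite = filedata.read()

# Write the bytes to a local file in binary mode
with open('/Users/tnguyen/Downloads/tmp/cat2.jpg', 'wb') as f:
    f.write(datatowrite)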

The built-in open function takes two arguments here: the path to the local file and the mode in which the file is opened. The mode “wb” opens the file for writing in binary mode, which is what image data requires.

Run the script above and go to your “~/Downloads/tmp” directory. You should see the downloaded file saved as “cat2.jpg”.
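The same open-then-write pattern carries over to Python 3 through urllib.request.urlopen. Here is a minimal sketch, assuming Python 3; the function name download_with_urlopen is our own. Instead of reading the whole response into memory, it streams the body to disk with shutil.copyfileobj.

```python
import shutil
import urllib.request


def download_with_urlopen(url, destination):
    """Stream the response body to destination without loading it all into memory."""
    # Both the response and the local file are closed when the block exits.
    with urllib.request.urlopen(url) as response, open(destination, 'wb') as f:
        shutil.copyfileobj(response, f)
```

For example, download_with_urlopen('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg', '/Users/tnguyen/Downloads/tmp/cat2.jpg') would save the same image as the Python 2 script above. Streaming matters little for a small thumbnail, but it keeps memory usage flat for large files.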

Conclusion

We have explained how to use the urllib module to download remote files. We hope that, alongside the requests and wget methods covered in previous posts, you can put it to use in your own work.

If you have any feedback, please don’t hesitate to leave your comments in the box below.

Please consider donating if you find the blog useful. Thanks!
