
How to use urllib in Python

Thanks to the development of the internet, network programming to exchange packets is no longer as complicated as it was a few decades ago. Python is no exception: it provides programmers with the urllib packages for interacting with applications over the HTTP protocol.

This post presents the usage of urllib2 and of urllib3, which describes itself as a sanity-friendly HTTP client. Both let us work with HTTP requests and responses. Although the post focuses primarily on urllib2, we also examine some examples that use urllib.

What is this library used for?

As the name suggests, this package is used for sending and receiving data between HTTP-based applications. A simple, typical use is to hit a web service endpoint and retrieve JSON results. Simply put, this library gives us tools to fetch content from URLs.
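The JSON use case mentioned above can be sketched as follows for Python 3. This is a minimal sketch, not a definitive implementation: the helper name `fetch_json`, the `timeout` default, and the endpoint in the usage note are assumptions made for illustration.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10):
    """Fetch `url` and decode the response body as JSON."""
    # Ask the server for JSON; servers are free to ignore this header.
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset announced by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

A call such as `fetch_json("https://api.example.com/items")` (a hypothetical endpoint) would return the decoded Python object, typically a dict or a list.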

With the urllib library in Python 2.x, we can open any resource addressed by a URL, as described in its reference [1]. We don't need to write complicated code to accomplish a simple task. The usage of this package is addressed below.

The urllib library exists in both Python 2.x and 3.x, but some functions and classes are not available in both. Bear in mind that each of the following examples works only in a specific environment.
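One common way to cope with this difference is to try the Python 3 import locations first and fall back to the Python 2 ones. The sketch below illustrates the pattern; the helper `build_url` and the example.com address are hypothetical and only serve the illustration.

```python
try:
    # Python 3: the functionality is split across urllib.request, urllib.parse, ...
    from urllib.request import urlopen
    from urllib.parse import urlencode
except ImportError:
    # Python 2: urlopen is provided by urllib2, urlencode by urllib
    from urllib2 import urlopen
    from urllib import urlencode


def build_url(base, params):
    """Append URL-encoded query parameters to `base`."""
    return base + "?" + urlencode(params)


# A list of pairs keeps the parameter order predictable.
search_url = build_url("https://www.example.com/search",
                       [("q", "urllib in python"), ("page", 2)])
```

After these imports, the rest of a script can use `urlopen` and `urlencode` without caring which interpreter runs it.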

Examples

In my experience, learning through examples and illustrated applications is the fastest way to pick up new knowledge. Without further ado, let's start with the first example of fetching text from a webpage.

Read the content of a webpage

To do this, we can use urllib.urlopen (urllib.request.urlopen in Python 3), which returns a file-like object whose read method loads the entire content of the page at the given URL. Below is a simple solution with urllib.
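A minimal sketch of this idea for Python 3 is shown here; in Python 2 the equivalent call is `urllib.urlopen(url).read()`. The function name `read_page` and the `timeout` default are assumptions for illustration.

```python
from urllib.request import urlopen


def read_page(url, timeout=10):
    """Return the entire content of the resource at `url` as bytes."""
    # The response object is file-like; the with-block closes it for us.
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For instance, `read_page("https://www.example.com/")` would return the raw HTML of that page as bytes; call `.decode("utf-8")` on the result when text is needed and the page is known to be UTF-8.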

Fetch a piece of text from a webpage

In this illustrated application, we want to fetch a set of desired paragraphs from a webpage. Below is the snippet.
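The following is a sketch under stated assumptions rather than the original snippet: it takes "desired paragraphs" to mean the text inside `<p>` elements and uses the standard-library `html.parser` module to pick them out. The names `ParagraphCollector`, `extract_paragraphs`, and `fetch_paragraphs` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphCollector(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace and skip empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the text of every <p> element in an HTML string."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_paragraphs(url, default_encoding="utf-8"):
    """Download `url` and return its paragraphs as a list of strings."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or default_encoding
        html = response.read().decode(charset, errors="replace")
    return extract_paragraphs(html)
```

Slicing the returned list, for example `fetch_paragraphs(url)[:3]`, would keep only the first few paragraphs. Real pages often contain malformed or nested markup, so a dedicated HTML parsing library may be more robust than this sketch.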

As the sample source code shows, we used urllib.request to retrieve the content of a URL.

References

  1. Open arbitrary resources by URL, Python 2.7, accessed on July 2nd, 2019.
  2. Homepage of the latest version of urllib3, accessed on July 1st, 2019.


About Nguyen Vu Ngoc Tung

I love making new professional acquaintances. Don't hesitate to contact me via nguyenvungoctung@gmail.com if you want to talk about information technology, education, and research on complex networks analysis (i.e., metabolic networks analysis), data analysis, and applications of graph theory. Specialties: researching and proposing innovative business approaches to organizations, evaluating and consulting about usability engineering, training and employee development, web technologies, software architecture.