
Using Feedparser in Python to read RSS


In this post, we take a closer look at how to fetch and parse syndicated feeds with the Feedparser library in Python. The detailed documentation of this library can be found here. As usual, the post begins with an introduction to RSS. The next section briefly explains what Feedparser is. Then we jump straight into how to install and use the library. To give a complete demonstration, we present an example in which BioModels’ Models of The Month RSS feed is fetched and processed according to the client’s needs. The post closes with conclusions and references.

What is RSS?

RSS stands for Rich Site Summary, also known as Really Simple Syndication. It allows the audience of a website or web-based application to access the latest updates in a standardised, computer-readable format, namely XML. An RSS document (also called a feed, web feed or channel) often includes full or summarised text and metadata such as the publishing date and the author’s name.

To check RSS feeds, users typically rely on a program or plugin/extension (for example, a web browser), a so-called RSS reader or news aggregator, to keep track of the many websites they want updates from. On the other side, common programming languages let developers parse RSS content through libraries. At the time of writing, the ROME API, written in Java, and the Python-based library Feedparser presented in this post are widely used.


BioModels’ Models of The Month RSS feeds fetched in Firefox

Common structure of an RSS document

Look at the link below:

https://www.ebi.ac.uk/biomodels/modelOfTheMonth/rss

That is BioModels’ Models of The Month RSS feed, where you can keep track of all models published monthly by BioModels’ data curators. The XML-based document includes required tags, which are concisely explained below.

  • The document begins with the rss tag, while channel contains title, link and description as mandatory fields. The language, copyright, managingEditor and image are optional properties. These properties carry the common information/metadata of the feed.
  • Each channel can have multiple items.
  • Each item should include title, link, description, pubDate and guid.
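The structure described in the list above can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration: the miniature feed below is hypothetical (its titles and example.org links are made up), not the real BioModels feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature feed; all titles and links are invented for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <link>https://example.org/</link>
    <description>A stand-in for a real feed</description>
    <language>en</language>
    <item>
      <title>First model</title>
      <link>https://example.org/models/1</link>
      <description>Summary of the first model</description>
      <pubDate>Mon, 01 Jan 2018 00:00:00 GMT</pubDate>
      <guid>https://example.org/models/1</guid>
    </item>
    <item>
      <title>Second model</title>
      <link>https://example.org/models/2</link>
      <description>Summary of the second model</description>
      <pubDate>Thu, 01 Feb 2018 00:00:00 GMT</pubDate>
      <guid>https://example.org/models/2</guid>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_RSS)   # the top-level <rss> element
channel = root.find("channel")     # the single <channel> child

# The three mandatory channel fields
channel_info = {tag: channel.findtext(tag) for tag in ("title", "link", "description")}

# A channel can hold several <item> elements
item_titles = [item.findtext("title") for item in channel.findall("item")]

print(root.tag, root.get("version"))
print(channel_info)
print(item_titles)
```

Reading the raw XML this way works, but you have to handle every format difference yourself, which is the job Feedparser takes over.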

What is Feedparser?

Feedparser is a Python library for parsing feeds in a variety of known formats, such as Atom, RSS and RDF. According to its development repository (see the tox.ini file), it works on Python 2.4 and later, up to Python 3.6.

Install Feedparser

I use conda to manage Python packages. The command to install Feedparser is:

conda install feedparser

Many Python developers use pip instead. The command is similar to the conda one: pip install feedparser

Verify the installation

To check whether a package is already installed on your system, run conda list or pip list. The command displays the list of installed packages, from which you can tell whether the feedparser package has been installed.

Alternatively, you can enter import feedparser in Python's interactive mode. If nothing is printed and no error is raised, the feedparser library was installed successfully.

I won’t keep you waiting any longer: it’s time to play with Feedparser.
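If you prefer a check that reports the result instead of raising an error, the following sketch uses the standard importlib module. The helper name is_installed is my own and not part of Feedparser.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


# Prints True or False depending on your environment
print("feedparser installed:", is_installed("feedparser"))
```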

Fetch and parse BioModels’ Models of The Month Feeds

As explained above, we will get familiar with Feedparser by learning how to fetch and extract information from BioModels’ Models of The Month feed.

Start your program by importing the feedparser package (import feedparser).

Fetch the document

Fetching a document means calling the parse function with the feed link as its only required argument, for example d = feedparser.parse(url), and keeping the returned result.

Access parsed data

As explained above, d['feed'] gives you the common information/metadata of the feed, such as its title, link and description.

When printed, the result is displayed as a dictionary-like structure similar to JSON, so you can easily see which field you want to retrieve.

Now let's move a bit further and get the news/feed entries. The entries are stored in d['entries'], so the number of entries/items is simply len(d['entries']).

You can access each item either by index or in a loop. For example, while looping over d['entries'], each entry's link is available as entry['link'].

The complete example can be found below.

Conclusions

We have gone through an introduction to RSS feeds and an explanation of Feedparser, then tried out the library with hands-on examples. The library is powerful enough to parse other feed formats as well. Apart from this library, the ROME API is available for Java developers. A post on how to use the ROME API to work with RSS feeds will be published soon.

References

[1] Using Feedparser in Python


About Nguyen Vu Ngoc Tung

I love making new professional acquaintances. Don't hesitate to contact me via nguyenvungoctung@gmail.com if you want to talk about information technology, education, and research on complex networks analysis (i.e., metabolic networks analysis), data analysis, and applications of graph theory. Specialties: researching and proposing innovative business approaches to organizations, evaluating and consulting about usability engineering, training and employee development, web technologies, software architecture.