How to Get Data from an API & Process XML in Python

#nlp #datascience #tutorial #beginners

Thanks to the requests library, getting data from an API in Python is quick and easy, and the standard library's ElementTree module makes it straightforward to process the XML that an API returns. This beginner-friendly tutorial walks through an end-to-end example that uses Python to get information about property listings from the Zoopla API.

We start by going to the Zoopla API page and clicking 'Register':

Fill in the form:

You'll be given an API key (a string of text) that looks something like this:

abc79x5kxxx8xxx6def4abc1

You can then use their interactive tool to search for property listings. I searched by postcode, typing in 'WR10' and leaving 'area' blank, then clicked the 'Try it!' button.

This produced a request URL, which we'll need if we want to run the same search programmatically from our Python code:

(Note: my API key has been cropped from the image, but yours will be shown after 'api_key='.)

Notice how the URL contains the postcode that was used in the search, as well as the API key.
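If you want to check those pieces from Python, the standard library's urllib.parse module can split a URL into its parts. A minimal sketch using the dummy key from this tutorial:

```python
from urllib.parse import urlsplit, parse_qs

# The request URL from the 'Try it!' tool (dummy API key, not a real one)
request_url = ("http://api.zoopla.co.uk/api/v1/property_listings.xml"
               "?postcode=WR10&api_key=abc79x5kxxx8xxx6def4abc1")

parts = urlsplit(request_url)
print(parts.netloc)  # api.zoopla.co.uk
print(parts.path)    # /api/v1/property_listings.xml

# parse_qs maps each query parameter name to a list of its values
params = parse_qs(parts.query)
print(params)  # {'postcode': ['WR10'], 'api_key': ['abc79x5kxxx8xxx6def4abc1']}
```

Changing the search is then just a matter of changing the value of the postcode parameter.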

The 'Try it!' feature also shows the XML output containing the search results:

It's a good idea to look through this to get a feel for the structure of the XML, which makes it easier to use the information later. Tip: if you paste the XML into an editor such as Sublime Text, you can collapse and expand the tags to see the hierarchy of elements more easily.
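If you'd rather inspect the structure from Python, a short recursive function can print an indented outline of the element names. This sketch runs on a made-up snippet; the tag names are illustrative assumptions, so swap in the XML you copied from the 'Try it!' output:

```python
import xml.etree.ElementTree as ET

# A made-up snippet; the tag names are assumptions for illustration only.
sample_xml = """<response>
  <result_count>1</result_count>
  <listing>
    <description>A two bedroom cottage</description>
    <property_type>Cottage</property_type>
  </listing>
</response>"""

def outline(element, depth=0):
    """Return one indented line per element, mirroring the XML hierarchy."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating over an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(sample_xml)
print("\n".join(outline(root)))
```

Every repeated element (for example, each listing) appears in the outline, so for a long response you may prefer to outline just the first one.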

Now let's start coding. First, import the following libraries (requests and pandas need installing first if you don't already have them; ElementTree ships with Python's standard library):

```python
import requests  # for calling the API
import xml.etree.ElementTree as ET  # for parsing XML
import pandas as pd  # for using dataframes
```

Then save your API key as a variable. I've called mine api_key:

```python
api_key = 'abc79x5kxxx8xxx6def4abc1'
```
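Hard-coding the key is fine for a quick experiment, but it's easy to share or commit it by accident. One alternative is to read it from an environment variable. The variable name ZOOPLA_API_KEY and the helper get_api_key below are made up for this sketch; Zoopla doesn't require either:

```python
import os

def get_api_key(var_name="ZOOPLA_API_KEY"):
    """Hypothetical helper: read the API key from an environment variable.

    ZOOPLA_API_KEY is a name chosen for this example only.
    """
    api_key = os.environ.get(var_name)
    if not api_key:
        # fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return api_key
```

Set the variable in your shell first (for example `export ZOOPLA_API_KEY=your_key_here` on macOS or Linux), then call `api_key = get_api_key()` in your code.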

Copy-paste the request URL that was created by the 'Try it!' feature on the Zoopla API website and save it as a variable called request_url:

```python
request_url = 'http://api.zoopla.co.uk/api/v1/property_listings.xml?postcode=WR10&api_key=abc79x5kxxx8xxx6def4abc1'
```

(Note: the API key at the end of the request URL above is a dummy string. You should have your own API key shown there.)
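Rather than pasting the whole URL, you can also build it from the base address and a dictionary of parameters. Python's urllib.parse.urlencode takes care of escaping, which matters if a value ever contains a space or another special character. This is a sketch; build_request_url is a helper name made up for this example, and the key is the same dummy string:

```python
from urllib.parse import urlencode

BASE_URL = "http://api.zoopla.co.uk/api/v1/property_listings.xml"

def build_request_url(postcode, api_key):
    """Hypothetical helper: join the base URL and URL-encoded query parameters."""
    query = urlencode({"postcode": postcode, "api_key": api_key})
    return f"{BASE_URL}?{query}"

api_key = "abc79x5kxxx8xxx6def4abc1"  # dummy key; use your own
request_url = build_request_url("WR10", api_key)
print(request_url)
# http://api.zoopla.co.uk/api/v1/property_listings.xml?postcode=WR10&api_key=abc79x5kxxx8xxx6def4abc1

# urlencode turns spaces into '+', so a value like "WR10 1AA" stays a valid URL
print(build_request_url("WR10 1AA", api_key))
```

The requests library can also do this encoding for you: passing `params={"postcode": "WR10", "api_key": api_key}` to `requests.get(BASE_URL, ...)` produces the same query string.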

We use this request_url to call the API from our Python code, like this:

```python
r = requests.get(request_url)  # call API
r.raise_for_status()  # stop with an HTTPError if the API returned an error status code
```

We can then use ElementTree to process the XML that the API sent back in its response:

```python
root = ET.fromstring(r.content)  # parse XML
```
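r.content holds the raw response body as bytes, and ET.fromstring() accepts bytes or a string and returns the root element of the tree. One detail worth knowing before the next step: root.iter() searches every level of the tree, whereas root.findall() with a plain tag name only looks at the root's direct children. Here is a small sketch on made-up XML (the element names are assumptions, not Zoopla's exact schema):

```python
import xml.etree.ElementTree as ET

# Stand-in for r.content: the response body arrives as bytes.
# This XML is made up and simplified.
content = b"""<response>
  <listing><description>Cottage near the river</description></listing>
  <listing><description>Modern town house</description></listing>
</response>"""

root = ET.fromstring(content)  # accepts bytes or str, returns the root Element
print(root.tag)  # response

# iter() walks the whole tree, so it finds the nested <description> elements
print([d.text for d in root.iter('description')])
# ['Cottage near the river', 'Modern town house']

# findall() with a bare tag name only checks the root's direct children...
print(root.findall('description'))  # []

# ...while the './/' prefix searches every level below the root
print(len(root.findall('.//description')))  # 2
```

That is why the loops in this tutorial use root.iter(): the description elements sit below the root rather than directly under it.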

Let's say that we're only interested in the description of each property listing. We can print each description by looping through the description elements and printing their text, like this:

```python
# loop through each "description" element in the XML
for description in root.iter('description'):
    print(description.text, '\n')  # print the description, then a blank line
```

My output looks like this:

Let's modify the loop to put the descriptions into a dataframe instead of printing them:

```python
# create a list to store the descriptions
desc_list = []

# loop through each "description" element in the XML
for description in root.iter('description'):
    # add the description text to the list of descriptions
    desc_list.append(description.text)

# convert the list of descriptions to a dataframe
properties_df = pd.DataFrame({"description": desc_list})

# view the properties dataframe in a Jupyter notebook
# (with a wider "description" column so we can see the full text)
(properties_df
    .style.set_properties(subset=['description'], **{'width': '800px'})
    .hide(axis='index'))  # hide the index column (pandas >= 1.4; older versions use .hide_index())
```

My output:

Let's get the property type for each listing and add them to the dataframe:

```python
# create a list to store the property type for each property
prop_types_list = []

# loop through the "property_type" elements in the XML
for prop_type in root.iter('property_type'):
    # add the property type to the list
    prop_types_list.append(prop_type.text)

# add the property types to the dataframe as a new column
properties_df["property_type"] = prop_types_list

# view the updated dataframe (in a Jupyter notebook)
(properties_df
    .style.set_properties(subset=['description'], **{'width': '800px'})
    .hide(axis='index'))  # hide the index column (pandas >= 1.4)
```

The output:

What if we wanted to get the property listings for multiple postcodes? This is one way of doing it using a list of postcodes to dynamically build a request URL for each postcode:

```python
# create a list of the postcodes to get property listings for
postcodes = ['PE19', 'WR10', 'PO38']

# create a list to store the request URL for each postcode
request_urls = []

# loop through the postcodes
for postcode in postcodes:

    # create a request URL for the postcode
    request_url = ("http://api.zoopla.co.uk/api/v1/property_listings.xml"
                   f"?postcode={postcode}&api_key={api_key}")

    # add the request URL to the list
    request_urls.append(request_url)

# view the request URLs
request_urls
```

My output (cropped so you can't see my API key):

Now let's loop through these request URLs, get the description and property type of every listing in each postcode, and store them in a dataframe:

```python
# create lists to store descriptions and property types
desc_list = []
prop_types_list = []

# loop through the request URLs to get
# the property descriptions and property types
for request_url in request_urls:
    r = requests.get(request_url)  # call API
    r.raise_for_status()  # stop if the API returned an error status code
    root = ET.fromstring(r.content)  # parse XML

    for description in root.iter('description'):
        desc_list.append(description.text)

    for prop_type in root.iter('property_type'):
        prop_types_list.append(prop_type.text)

# create a dataframe of the descriptions and property types
properties_df = pd.DataFrame({"description": desc_list,
                              "property_type": prop_types_list})

# view the updated dataframe (in a Jupyter notebook)
(properties_df
    .style.set_properties(subset=['description'], **{'width': '800px'})
    .hide(axis='index'))  # hide the index column (pandas >= 1.4)
```

My output:

You could make the dataframe more useful by adding the postcodes and other information from the XML, such as the street name, post town, county, number of bedrooms and much more. You could then take things further with some data analysis and data visualisations. If you want help getting started with text analysis, this tutorial might help, and if you'd like tips on creating good-looking charts in Python, you may like this tutorial.
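One caveat with the loops in this tutorial: they rely on root.iter('description') and root.iter('property_type') returning exactly one element per listing, in the same order. If a listing were missing one of those tags, the two lists would have different lengths and pandas would refuse to build the dataframe, or values could end up on the wrong rows. A sturdier sketch builds one row per listing and also records which postcode each row came from. It assumes each property sits inside a `<listing>` element (check this against your own XML), parse_listings is a helper name made up for this example, and the sample bytes are invented stand-ins for r.content:

```python
import xml.etree.ElementTree as ET
import pandas as pd

def parse_listings(content, postcode):
    """Hypothetical helper: one dict per <listing>, tagged with the searched postcode.

    '<listing>' is an assumed element name. findtext() returns None when a
    child tag is missing, so every field stays on its own listing's row.
    """
    root = ET.fromstring(content)
    return [
        {
            "postcode": postcode,
            "description": listing.findtext("description"),
            "property_type": listing.findtext("property_type"),
        }
        for listing in root.iter("listing")
    ]

# Invented stand-ins for r.content from two API calls;
# the second WR10 listing has no <property_type> tag.
responses = {
    "WR10": b"<response>"
            b"<listing><description>Barn conversion</description>"
            b"<property_type>Barn conversion</property_type></listing>"
            b"<listing><description>Building plot</description></listing>"
            b"</response>",
    "PO38": b"<response>"
            b"<listing><description>Seaside flat</description>"
            b"<property_type>Flat</property_type></listing>"
            b"</response>",
}

rows = []
for postcode, content in responses.items():
    # in the real loop: r = requests.get(request_url); content = r.content
    rows.extend(parse_listings(content, postcode))

properties_df = pd.DataFrame(rows)
print(properties_df)
```

To add more fields, such as the number of bedrooms, add another findtext() call using the matching tag name from your XML.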