Web Scraping with Python, getting the table of countries and country codes from countrycode.org

#webscraping #python

Ever needed some data that is sitting on a webpage you cannot easily copy and paste from?
I'm going to show you how to get data from webpages (web scraping, essentially) with Python. Specifically, we'll be getting the countries, ISO codes and phone country codes from the table found at https://countrycode.org/

Before we get into the code proper, we need to install two Python packages, requests and beautifulsoup4:
pip install requests beautifulsoup4

Then follow these steps.

Import required packages

from bs4 import BeautifulSoup
import json
import requests

Get the webpage's content using the requests package

url = "https://countrycode.org/"
r = requests.get(url)
r.raise_for_status()

The last line raises an exception (requests.HTTPError) if the response's status code is a 4xx or 5xx error, which stops the program unless the exception is caught.
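
To see what raise_for_status() does without hitting the network, here is a minimal sketch. It builds a Response object by hand, which is an assumption made purely so the example runs offline (in the steps above the response comes from requests.get), and the describe helper is a hypothetical name used only for illustration.

```python
import requests


def describe(status_code):
    """Return "ok" or "error" depending on how raise_for_status() treats the code."""
    # Hand-built response, for illustration only; normally requests.get() returns it.
    response = requests.Response()
    response.status_code = status_code
    try:
        response.raise_for_status()  # raises requests.HTTPError for 4xx and 5xx codes
    except requests.HTTPError:
        return "error"
    return "ok"


print(describe(200))
print(describe(404))
```

In a real script you could wrap the requests.get(url, timeout=10) call and raise_for_status() in the same try/except if you would rather print a friendly message than let the program crash.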

Create the 'soup' and select all the rows found in the table's body. This 'soup' object helps us to get particular elements from the HTML page's content.

soup = BeautifulSoup(r.content, 'html.parser')
rows = soup.select("tbody > tr") # select all the tr (table row) elements that are direct children of a tbody element
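
If you want to try the selector without downloading anything, here is a small sketch on a made-up table. The HTML and its values are placeholders, not the real page; the point is only that HTML table rows are tr elements and that a "tbody > tr" selector skips the header row sitting in thead.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page; the values are placeholders.
sample_html = """
<table>
  <thead><tr><th>Country</th><th>Code</th></tr></thead>
  <tbody>
    <tr><td><a href="/a">Country A</a></td><td>1</td></tr>
    <tr><td><a href="/b">Country B</a></td><td>2</td></tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_rows = sample_soup.select("tbody > tr")  # tr elements directly inside tbody

# Text of every cell in the first body row, with surrounding whitespace removed.
first_cells = [cell.get_text(strip=True) for cell in sample_rows[0].find_all("td")]
print(len(sample_rows), first_cells)
```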

Get the countries from the table

keys = ["name", "country_code", "iso_codes", "population", "area/km2", "gdp $USD"] # the different columns in the table
list_of_countries = []
for row in rows:
    country_object = dict.fromkeys(keys, '') # creating a dictionary for the row, with every column empty at first

    for index, cell in enumerate(row.find_all('td')): # looping through the different td elements found in this row
        if index < len(keys):
            if index == 0:
                # get the text found in the hyperlink in the cell
                country_object[keys[index]] = cell.find('a').text
            else:
                # get the text found in the cell
                country_object[keys[index]] = cell.text
    list_of_countries.append(country_object)
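
The mapping from cells to keys can also be sketched on plain strings, which makes it easy to check by hand. This is only an alternative way to write the same idea, not a replacement for the loop above: row_to_dict is a hypothetical helper, the sample values are placeholders, and stripping the whitespace around each cell is my own addition.

```python
from itertools import zip_longest


def row_to_dict(keys, cell_texts):
    """Pair each key with the cell at the same position.

    Extra cells are ignored and missing cells become empty strings.
    """
    trimmed = cell_texts[:len(keys)]  # drop cells that have no matching key
    return {
        key: text.strip()
        for key, text in zip_longest(keys, trimmed, fillvalue="")
    }


example = row_to_dict(["name", "country_code", "iso_codes"], [" Country A ", "1"])
print(example)
```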

Save the list to a JSON file

with open("countries.json", "w") as f: # replace countries.json with whatever you want
    json.dump(list_of_countries, f)
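
A quick way to check that the file was written correctly is to load it back and compare it with what you saved. The sketch below does that with placeholder data and a temporary folder so it leaves nothing behind; save_and_reload is a hypothetical helper, and the encoding, ensure_ascii and indent settings are optional choices of mine rather than something the steps above require.

```python
import json
import os
import tempfile

sample_countries = [{"name": "Country A", "country_code": "1"}]  # placeholder data


def save_and_reload(records, path):
    """Write records to path as JSON, then read the file back and return its content."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented names readable; indent=2 pretty-prints.
        json.dump(records, f, ensure_ascii=False, indent=2)
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded = save_and_reload(sample_countries, os.path.join(tmp_dir, "countries.json"))

print(reloaded == sample_countries)
```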

Voilà! You have successfully gotten the list of countries, along with their country codes, ISO codes, populations, surface areas and GDP.
