Ever needed some data that is sitting on a webpage you cannot easily copy and paste from?
I'm going to show you how to get data from webpages (web scraping, essentially) with Python. Specifically, we'll be getting the countries, ISO codes and phone number codes from the table found on https://countrycode.org/.
Before we get into the code proper, we need to install two Python packages (requests and beautifulsoup4):
pip install requests beautifulsoup4
Then follow the steps below.
Import required packages
from bs4 import BeautifulSoup
import json
import requests
Get the webpage's content using the requests package
url = "https://countrycode.org/"
r = requests.get(url)
r.raise_for_status()
The last line raises an exception if the request's response code is not a successful one, thus stopping the program.
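If you'd rather handle a failed request yourself than let the program stop, one option is to catch the exception. The sketch below is only an illustration: it builds a fake 404 response by hand so that it runs without touching the network, and the names fake_response and error_message, as well as the URL, are made up for this example.

```python
import requests

# Build a response object by hand so the example runs offline.
# In the real script this would be the result of requests.get(url).
fake_response = requests.Response()
fake_response.status_code = 404
fake_response.url = "https://countrycode.org/missing-page"  # hypothetical URL

error_message = None
try:
    fake_response.raise_for_status()
except requests.exceptions.HTTPError as error:
    # Any 4xx or 5xx status code ends up here
    error_message = str(error)

print(error_message)
```

In the real script you would wrap the requests.get(url) and r.raise_for_status() calls in the same try/except and decide there whether to retry, log the problem or exit.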
Create the 'soup' and select all the rows found in the table's body. This 'soup' object helps us to get particular elements from the HTML page's content.
soup = BeautifulSoup(r.content, 'html.parser')
rows = soup.select("tbody > tr") # select all the tr (table row) elements that are direct children of a tbody element
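To get a feel for what selecting tr elements inside a tbody gives back, here is a small offline sketch. It assumes nothing about the real page: sample_html is a hand-written table with made-up countries, and the sample_ names are only for this illustration.

```python
from bs4 import BeautifulSoup

# A tiny hand-written table standing in for the real page (made-up sample data)
sample_html = """
<table>
  <thead><tr><th>Country</th><th>Code</th></tr></thead>
  <tbody>
    <tr><td><a href="/ex">Examplestan</a></td><td>999</td></tr>
    <tr><td><a href="/te">Testland</a></td><td>998</td></tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Only rows directly inside <tbody> match, so the header row in <thead> is skipped
sample_rows = sample_soup.select("tbody > tr")

# Each row is a Tag, so we can keep searching inside it for its cells
first_cells = [cell.get_text(strip=True) for cell in sample_rows[0].find_all("td")]

print(len(sample_rows))
print(first_cells)
```

select always returns a list of matching elements, which is why we can loop over the rows in the next step.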
Get the countries from the table
list_of_countries = []
for row in rows:
    keys = ["name", "country_code", "iso_codes", "population", "area/km2", "gdp $USD"] # the different columns in the table
    country_object = {}
    for key in keys:
        country_object[key] = '' # creating a dictionary for the row
    for index, cell in enumerate(row.find_all('td')): # looping through the different td elements found in this row
        if index < len(keys):
            if index == 0:
                # get the text found in the hyperlink in the cell
                country_object[keys[index]] = cell.find('a').text
            else:
                # get the text found in the cell
                country_object[keys[index]] = cell.text
    list_of_countries.append(country_object)
Save the list to a json file
with open("countries.json", "w") as f: # replace countries.json with whatever filename you want
    json.dump(list_of_countries, f)
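If you want to double check that the file can be read back, a quick round trip like the one below can help. This is just a sketch with a made-up sample row; the encoding, ensure_ascii=False and indent=2 arguments are optional extras that are not in the script above (they keep any accented characters readable and make the file easier on the eyes).

```python
import json
import os
import tempfile

# Made-up sample row with some of the same keys as the scraped data
sample_countries = [
    {"name": "Examplestan", "country_code": "999", "iso_codes": "EX / EXA"},
]

# Write to a temporary folder so this check doesn't overwrite your real file
output_path = os.path.join(tempfile.mkdtemp(), "countries.json")
with open(output_path, "w", encoding="utf-8") as f:
    json.dump(sample_countries, f, ensure_ascii=False, indent=2)

# Read the file back and compare it with what we wrote
with open(output_path, encoding="utf-8") as f:
    loaded_countries = json.load(f)

print(loaded_countries == sample_countries)
```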
Voilà! You have successfully scraped the list of countries along with their phone number codes, ISO codes, populations, surface areas and GDP.