I always thought collecting worldwide postal codes by myself would be an easy task, because postal codes seem to be nothing more than simple short codes that are publicly available. I quickly realized this was not the case, because:
- There is no single source of truth
- Most sources were incomplete
- Data was often presented in a very unstructured way
After some general research, I soon understood that the problems above have their origin in the history of postal codes. Each country has a different format, area granularity, and way of structuring postal codes as a whole.
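To make the format differences concrete, here is a minimal sketch that checks codes against per-country patterns. The `POSTAL_PATTERNS` table and the `is_valid_postcode` helper are names I made up for illustration, and the patterns only capture the basic shape of each format; they are simplifications, not official validation rules.

```python
import re

# Simplified, illustrative patterns (assumptions, not official rules).
POSTAL_PATTERNS = {
    "AT": re.compile(r"[1-9]\d{3}"),          # Austria: four digits
    "DE": re.compile(r"\d{5}"),               # Germany: five digits
    "US": re.compile(r"\d{5}(-\d{4})?"),      # USA: ZIP or ZIP+4
    "NL": re.compile(r"[1-9]\d{3} ?[A-Z]{2}"),  # Netherlands: four digits, two letters
}


def is_valid_postcode(country, code):
    """Return True if `code` has the basic shape expected for `country`."""
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        raise ValueError(f"no pattern defined for country {country!r}")
    # fullmatch ensures the whole string fits the pattern, not just a prefix
    return pattern.fullmatch(code.strip()) is not None


print(is_valid_postcode("AT", "1010"))   # True
print(is_valid_postcode("AT", "10100"))  # False
```

Even this tiny table shows why one parser cannot cover every country: the length, the character set, and the separators all differ.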
I first tried to scrape Wikipedia with a small Python script. For this post, I will use Austria as the example.
Before running it, make sure to install all dependencies:
pip3 install lxml
pip3 install requests
pip3 install bs4
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_in_Austria'

# fire GET request and fail early on HTTP errors
response = requests.get(url)
response.raise_for_status()

# parse content
content = BeautifulSoup(response.text, 'lxml')

# get postal codes: keep list items that contain ' - '
postcodes = [
    postcode.text for postcode in content.find_all('li')
    if ' - ' in postcode.text
]

# filter edge cases: keep entries with 3 or 4 whitespace-separated tokens
# and take the first token as the postal code
postcodes = [
    postcode.split()[0] for postcode in postcodes
    if len(postcode.split()) in (3, 4)
]

# write output to file ('w' overwrites, so re-running does not duplicate entries)
with open('at_postcodes.txt', 'w') as f:
    for postcode in postcodes:
        f.write(postcode + '\n')
The obtained data set and the related approach might be enough for some use cases, but since I wanted to get global postal code data, I was not satisfied.
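The token-count filter in the script above is fairly fragile. A stricter alternative, sketched here under the assumption that every entry starts with a four-digit Austrian code, is to match the code with a regular expression and de-duplicate the result. The `extract_codes` helper and the sample entries are hypothetical; the entries are hand-written and not copied from the Wikipedia page.

```python
import re

# Assumption: an Austrian postal code is four digits with a non-zero first
# digit, and each list entry starts with the code.
AT_CODE = re.compile(r"^\s*([1-9]\d{3})\b")


def extract_codes(entries):
    """Return the sorted, de-duplicated codes found at the start of each entry."""
    codes = set()
    for entry in entries:
        match = AT_CODE.match(entry)
        if match:
            codes.add(match.group(1))
    return sorted(codes)


# Hand-written sample entries for illustration only
sample = [
    "1010 - Wien, Innere Stadt",
    "8010 - Graz",
    "8010 - Graz",             # duplicate entry
    "See also - other lists",  # navigation text without a code
]
print(extract_codes(sample))  # ['1010', '8010']
```

This does not make the source any more complete, but it does keep navigation text and duplicates out of the output file.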
I live in Austria and quickly realized that the data I had just scraped was not complete (some postal codes were missing). Considering the time it took me to build the parser and the fact that I would have to adapt it for every single data source (adaptations are needed even across Wikipedia, since every article is written differently), I decided to give up and started to look for ready-to-use solutions.
I hope this article will save you some time, in case you are trying to achieve the same.