Hello there.
Are you curious about how geographic locations can be represented as a short string of letters and numbers? Or how this technique helps in fast retrieval of location data?
This post will take you on a deep dive into geohashes, explaining what they are, how they work, and their practical uses. Whether you're a data scientist, a software engineer, or just someone with a knack for geography and data, this post will help you grasp the concept of geohashes and their significance.
The purpose of this post is to explain in depth how a geohash works and its subtleties step-by-step.
What is a geohash?
Chat GPT says:
A geohash is a location encoding system that represents a geographic location using a short string of letters and numbers. It works by dividing the world into a grid of squares, assigning each square a unique hash. The length of the hash controls the precision, with longer hashes corresponding to smaller, more precise squares.
That gives us a general idea. You use a geohash to identify areas on a map. The easiest way to visualize them is with an interactive map
Why geohashes
Databases are better optimized for text than geolocation queries. If you need fast retrieval over a large collection of data, R-Tree indexes might not be enough.
Quoting Chris Hewett's website:
Geohash avoids/minimises the use of (usually) slow geospacial functions. Who you going to call when your ST_CONTAINS() is slow and an EXPLAIN shows it's already using an index...
Also, geohashes can be used as an alternative to tile coordinates in order to load chunks of a map and possibly cache them on the client side.
Different precisions
As chat gpt said, a geohash length describes its precision. For instance, dr7
is a geohash with precision 3. That means that its parent, dr
(precision 2), encloses a larger area that also contains its neighbors.
2 types of geohashes
There are 2 types of geohash grids (width x height): 8x4
and 4x8
. That means that for any given zoom level, there are 32
possible squares inside.
We start with an 8x4
grid with the whole world inside. Each time you go one precision deeper, the type of grid you get alternates. This is made this way, because if we kept grids in the same format for all zoom levels, we would eventually get super thin areas.
Since there are 8x4
and 4x8
grids only, we use these 32 characters to encode each precision: 0123456789bcdefghjkmnpqrstuvwxyz
.
They range from 0 to z, but beware, there are a few letters missing in this specific encoding, such as a;i;l and o
.
Keep dividing by 2
If you want to find out the geohash in which a point lies into, you have to bisect it 5
times. Why 5? Because 2^5
is 32
, which is the total amount of geohashes for each precision level.
Let's do a dry run of the algorithm to figure out in which geohash1 a location L
in argentina lies into.
- purple: divide the world in 2.
L
is in the first half - red:
L
is beneath - orange:
L
is to the right - yellow:
L
is above - green:
L
is to the left - We have run 5 tests already, so the geohash is
6
Notice that we always have two possible moves for each step: to choose between the first and the second half. The moves alternate between latitude and longitude, so we are able to move in 2D.
Also, we have to somehow translate the accumulated result of those moves into an index for the string 0123456789bcdefghjkmnpqrstuvwxyz
.
The rules for that are:
- We start with an index valued 0
- If the location lies in the first half (lower than mid), multiply the index by 2
- If it's the second half (higher than mid), multiply the index by 2 and add 1
- Repeat 4 more times
That means we are using even numbers to identify first-half
moves and odd numbers for second-half
moves.
The highest value we can get is 31:
0 * 2 + 1 = 1
1 * 2 + 1 = 3
3 * 2 + 1 = 7
7 * 2 + 1 = 15
15 * 2 + 1 = 31
Since our index starts at 0, that would be 32 possibilities.
This is exactly what the code is doing:
// (c) Chris Veness 2014-2019
let latMin = -90, latMax = 90
let lonMin = -180, lonMax = 180
while (geohash.length < precision) {
if (evenBit) {
// bisect E-W longitude
const lonMid = (lonMin + lonMax) / 2
if (lon >= lonMid) {
idx = idx * 2 + 1
lonMin = lonMid
} else {
idx = idx * 2
lonMax = lonMid
}
} else {
// bisect N-S latitude
const latMid = (latMin + latMax) / 2
if (lat >= latMid) {
idx = idx * 2 + 1
latMin = latMid
} else {
idx = idx * 2
latMax = latMid
}
}
evenBit = !evenBit
if (++bit === 5) {
// 5 bits gives us a character: append it and start over
geohash += base32.charAt(idx)
bit = 0
idx = 0
}
}
console.log(geohash)
Then we stop once the precision is met.
Alternatives to geohash
If you check the h3geo comparison page, you should see plenty of alternatives to geohash, such as s2 or even h3 itself.
Glossary
ST_CONTAINS: postgis function that finds geometries inside areas;
EXPLAIN: postgres statement that calculates query cost
Top comments (0)