In the beginning
In a previous post I touched a bit on how building static websites has lately piqued my interest. I have been contemplating starting a blog and promptly fell into my usual routine of disappearing down a rabbit hole. It's kinda annoying how much yak-shaving I volunteer my brain for 🤣.
One of the things I am finding limiting is that, due to the nature of static sites, you kinda need to flip the script a little: data normally comes in at build time, not at request time. This in turn makes site search a bit weird to think about.
Normally, search would be built on some kind of database, a search index, and a server-side algorithm with a filter API.
Why?
In this particular instance I think the answer is self-explanatory: a search feature is not really a luxury anymore. It allows for better discoverability, and even though most readers arrive via backlinks from various external posts, without search the blog would look rudimentary.
How?
While you usually wouldn't fetch data for pages in a static website, the best-performing search in terms of relevance requires a server-side implementation. One example of this that is also workable on static websites is Algolia, and it's pretty good.
Algolia is paid, so you might want to take that into consideration if you are thinking about it, especially if you are not some kind of social media influencer.
I am not, so my first instinct was to look into free options. Luckily there are free options that are fully implemented on the client side.
Architecturally, they implement a dumbed-down version of the server-side alternative. As mentioned before, I started digging into Rust and Zola and came across a really amazing blog post about implementing static search using WASM and Rust.
Essentially, the dumbed-down version is an index implementation that allows for a fully embedded search experience, which is what lets it work in a static website. It is not designed for millions of pages, and performance tends to degrade as the page count grows.
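To make that concrete, here is a minimal sketch of how an embedded index can be produced at build time with fuse.js. The page list and output file name are hypothetical; the index-serialization calls (Fuse.createIndex, toJSON, parseIndex) are part of fuse.js's documented API.

import fs from 'node:fs';
import Fuse from 'fuse.js';

// hypothetical: page metadata collected at build time, e.g. from front matter
const pages = [
  { title: 'Static search with WASM', author: 'me', body: '...' },
  { title: 'Getting started with Zola', author: 'me', body: '...' }
];

// build the index once, at build time, so the browser doesn't pay for it
const index = Fuse.createIndex(['title', 'author', 'body'], pages);
fs.writeFileSync('search-index.json', JSON.stringify(index.toJSON()));

// on the client, fetch the pages and the index JSON, then rehydrate:
// const fuse = new Fuse(pages, options, Fuse.parseIndex(indexJson));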
When you get into that territory you might want to work on the performance of the index itself. There are multiple options: you could implement Bloom filters, XOR filters, or binary fuse filters (no relation to fuse.js) like the ones suggested in that blog post. If you want a further performance bump beyond that, server-side is your best bet.
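As an illustration of the idea (my own sketch, not code from that post): a Bloom filter answers "might this document contain this word?" with a tiny fixed-size bit array, at the cost of occasional false positives.

// minimal Bloom filter sketch: k seeded FNV-1a hashes over one bit array
class BloomFilter {
  constructor(bits = 1024, hashes = 3) {
    this.bits = bits;
    this.hashes = hashes;
    this.bitset = new Uint8Array(Math.ceil(bits / 8));
  }
  // FNV-1a with a seed, giving us k cheap hash functions
  hash(str, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.bits;
  }
  add(word) {
    for (let s = 0; s < this.hashes; s++) {
      const idx = this.hash(word, s);
      this.bitset[idx >> 3] |= 1 << (idx & 7);
    }
  }
  // false means "definitely absent"; true means "probably present"
  mightContain(word) {
    for (let s = 0; s < this.hashes; s++) {
      const idx = this.hash(word, s);
      if (!(this.bitset[idx >> 3] & (1 << (idx & 7)))) return false;
    }
    return true;
  }
}

const filter = new BloomFilter();
['static', 'search', 'fuse'].forEach(w => filter.add(w));
console.log(filter.mightContain('search'));  // true
console.log(filter.mightContain('algolia')); // false (almost certainly)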
Implementation quirks
I chose fuse.js for one of my projects. At a smaller scale it seems to perform just fine; I haven't hit any performance bottleneck yet.
When it comes to implementing search you want to aim for keyword density, but also facets. This is where the purely static solution may feel a bit naive.
So, we are talking static websites, which means that in order to do a faceted search you need to filter content based on metadata embedded in the pages before you pass it to the fuse.js search "engine".
You can do the filtering in plain JS before handing the subset to fuse.js, or do an exact-match search on a single field using fuse.js itself.
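If you go the plain-JS route, a pre-filter over the embedded metadata might look like this (a sketch; the pages array and meta fields are hypothetical):

// facet first: narrow the corpus down by metadata...
const facetFiltered = pages.filter(page => page.meta.category === 'rust');
// ...then fuzzy-search only the matching subset
const fuse = new Fuse(facetFiltered, { keys: ['title', 'body'] });
const results = fuse.search('static search');

The exact-match alternative uses fuse.js's extended search syntax: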
import Fuse from 'fuse.js';

const options = {
  // return the match score alongside each result
  includeScore: true,
  keys: [
    {
      name: "author"
    }
  ],
  // enables the =, ^, ! query prefixes
  useExtendedSearch: true
}

const books = [
  { title: 'The Great Gatsby', author: 'F. Scott Fitzgerald' },
  { title: 'The Da Vinci Code', author: 'Dan Brown' },
  { title: 'The Catcher in the Rye', author: 'J.D. Salinger' }
]

const fuse = new Fuse(books, options);

function main(engine) {
  // `=` asks for an exact match; the quotes keep the phrase as one token
  let results = engine.search('="Dan Brown"')
  console.log(results)
}

main(fuse)
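Running this should return just the Dan Brown entry — that's your facet subset.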
Then, once you have the facet-filtered results, you can pass them on to the regular fuzzy search, which gives you the static search. With fuse.js you can also express both conditions in a single logical query:
// note: `title` must also be listed in `keys` for this query to match anything
const fuse = new Fuse(books, { ...options, keys: ['author', 'title'] });

function main(engine) {
  let results = engine.search({
    // logical AND: fuzzy 'cott' against author AND fuzzy 'The' against title
    $and: [
      { $path: ['author'], $val: 'cott' },
      { $path: ['title'], $val: 'The' }
    ]
  })
  console.log(results)
}

main(fuse)
In a server-side implementation this would usually be done in a single operation.
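For comparison, with Algolia's JavaScript client (v4) the facet filter and the relevance query travel together in one request. This is a sketch: the credentials and index name are placeholders, and it assumes author has been configured as a facet attribute on the index.

import algoliasearch from 'algoliasearch';

// placeholders: use your own app ID, search-only API key and index name
const client = algoliasearch('APP_ID', 'SEARCH_ONLY_API_KEY');
const index = client.initIndex('books');

// one round trip: facet filter + full-text query handled server-side
const { hits } = await index.search('gatsby', {
  filters: 'author:"F. Scott Fitzgerald"'
});
console.log(hits);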
Conclusions
- you can get a good chunk of the search functionality without a backend server - think static
- while it may not scale, do you really have scale as a hard requirement? See Paul Graham's "do things that don't scale"
- this approach is good for prototyping because it lets you check whether search actually gets traction before investing in a backend