DEV Community

Arindam Roy
Arindam Roy

Posted on

Building a Simple Search Engine with Rust and WebAssembly for web

Introduction

You might wonder why we need another search engine implementation, especially one using Rust and WebAssembly. While it's true that many developers start with a TODO app, I wanted to challenge myself and explore something different for my first blog post. I chose to create an Algolia-like search engine because that the first thing I came across Algolia on dev.to and I had no better idea.

What is Algolia?

Algolia is a powerful search provider for web applications. It's known for its speed and efficiency in delivering search results. Their tagline, "Show users what they need,"

Our Project: A WebAssembly Search Engine

In this project, we'll create a WebAssembly (Wasm) package using Rust, compile it to JavaScript, and run it in the browser. While you could build a similar search engine using JavaScript frameworks or vanilla JS (possibly with comparable performance), our goal is to explore the possibilities of Rust and WebAssembly.

Let's get started!

Setting Up the Rust Project

First, create a new Rust library project:

cargo new --lib algolia-clone-wasm
Enter fullscreen mode Exit fullscreen mode

Add the following to your Cargo.toml:

[lib]
crate-type = ["cdylib"]

[dependencies]
wasm-bindgen = "0.2"
serde = { version = "1.0", features = ["derive"] }
serde-wasm-bindgen = "0.4"
Enter fullscreen mode Exit fullscreen mode

Implementing the Search Engine

Let's define the core types for our search engine:

use serde::{Serialize, Deserialize};
use std::collections::{HashMap, HashSet};
use wasm_bindgen::prelude::*;

#[derive(Serialize, Deserialize)]
struct Document {
    id: String,
    content: String,
}

#[derive(Serialize, Deserialize)]
struct SearchResult {
    id: String,
    score: usize,
}

#[wasm_bindgen]
pub struct SimpleSearch {
    documents: Vec<Document>,
    index: HashMap<String, HashSet<String>>,
    visited_documents: HashMap<String, Document>,
}
Enter fullscreen mode Exit fullscreen mode

The core of our simple search engine implementation consists of three main components:

Data Structures:

Document: Represents a searchable item with an ID and content.
SearchResult: Represents a search result with an ID and a relevance score.
SimpleSearch: The main struct that holds our search engine's state.

Now, let's implement the SimpleSearch struct and its methods:

#[wasm_bindgen]
impl SimpleSearch {
    #[wasm_bindgen(constructor)]
    pub fn new() -> Self {
        SimpleSearch {
            documents: Vec::new(),
            index: HashMap::new(),
            visited_documents: HashMap::new(),
        }
    }

    pub fn add_document(&mut self, id: String, content: String) {
        let doc = Document {
            id: id.clone(),
            content: content.clone(),
        };
        self.documents.push(doc);

        for word in content.to_lowercase().split_whitespace() {
            self.index
                .entry(word.to_string())
                .or_insert_with(HashSet::new)
                .insert(id.clone());
        }
    }

    pub fn search(&self, query: String, limit: usize) -> Result<JsValue, JsValue> {
        let lowercase_query = query.to_lowercase();
        let query_words: Vec<&str> = lowercase_query.split_whitespace().collect();
        let mut scores: HashMap<String, usize> = HashMap::new();

        for word in query_words {
            if let Some(matching_docs) = self.index.get(word) {
                for doc_id in matching_docs {
                    *scores.entry(doc_id.clone()).or_insert(0) += 1;
                }
            }
        }

        let mut results: Vec<SearchResult> = scores
            .into_iter()
            .map(|(id, score)| SearchResult { id, score })
            .collect();
        results.sort_by(|a, b| b.score.cmp(&a.score));
        results.truncate(limit);

        serde_wasm_bindgen::to_value(&results).map_err(|e| e.into())
    }

    pub fn get_document(&self, id: String) -> Result<JsValue, JsValue> {
        if let Some(doc) = self.documents.iter().find(|d| d.id == id) {
            serde_wasm_bindgen::to_value(doc).map_err(|e| e.into())
        } else {
            Ok(JsValue::NULL)
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

The search method implements a basic search algorithm:

It splits the search query into individual words.
For each word, it looks up matching documents in the inverted index.
It keeps a score for each document, incrementing it for each matching word.
Finally, it sorts the results by score and returns the top matches.

The wasm-bindgen crate provides a bridge between JavaScript and Rust types, allowing seamless interaction between the two languages.

Compiling to WebAssembly

To compile our Rust code to WebAssembly, run:

wasm-pack build --target web
Enter fullscreen mode Exit fullscreen mode

This command creates a pkg folder in your project root, containing the necessary files to run your WebAssembly code.

Using the Search Engine in a Web Page

Here's a simple HTML file demonstrating how to use our search engine:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Rust + WebAssembly Search Engine</title>
    <script type="module">
        import init, { SimpleSearch } from './pkg/algolia_clone_wasm.js';

        async function run() {
            await init();

            const searchEngine = new SimpleSearch();

            // Add sample documents
            searchEngine.add_document("1", "TypeScript is a typed superset of JavaScript");
            searchEngine.add_document("2", "JavaScript is a popular programming language");
            searchEngine.add_document("3", "Algolia provides powerful search capabilities");

            // Set up event listener for search
            document.getElementById('searchForm').addEventListener('submit', (e) => {
                e.preventDefault();
                const query = document.getElementById('searchInput').value;
                const results = searchEngine.search(query, 10);
                displayResults(results);
            });

            function displayResults(results) {
                const resultsContainer = document.getElementById('results');
                resultsContainer.innerHTML = '';
                results.forEach(result => {
                    const doc = searchEngine.get_document(result.id);
                    const resultElement = document.createElement('div');
                    resultElement.innerHTML = `<p><strong>ID:</strong> ${result.id}, <strong>Score:</strong> ${result.score}</p><p><strong>Content:</strong> ${doc.content}</p>`;
                    resultsContainer.appendChild(resultElement);
                });
            }
        }

        run();
    </script>
</head>
<body>
    <h1>Rust + WebAssembly Search Engine</h1>
    <form id="searchForm">
        <input type="text" id="searchInput" placeholder="Enter search query">
        <button type="submit">Search</button>
    </form>
    <div id="results"></div>
</body>
</html>
Enter fullscreen mode Exit fullscreen mode

Conclusion

Forgive me if you thought this will be helpful it was just for fun and as i was bored and wanted to try something it was about demonstrating the potential of combining Rust and WebAssembly to create a simple search engine. While this implementation may not be as feature-rich as Algolia, it showcases the possibilities of using Rust in web development.

In future iterations, we could:

  1. Implement an in-memory database to store visited pages
  2. Benchmark this implementation against JavaScript frameworks
  3. Add more advanced search features, such as fuzzy matching or relevance scoring

Thank you for reading!

Top comments (0)