Introduction
Have you ever searched for an item without knowing how it is supposed to be spelled?
Most e-commerce sites are not typo-tolerant, yet ease of use is a big part of a good user experience. Handling misspelled queries gracefully is one reason UX matters when building search engines.
In this tutorial, I will use the Typesense API to demonstrate how Typesense handles typos made by users. I will use data that I scraped from Jumia.
The data is in CSV format, and you can download it from here.
STEP 1: Convert our CSV data to JSONL
Typesense expects the documents to import to be formatted as newline-delimited JSON, aka the JSON Lines (JSONL) format, with one JSON object per line. We can convert the CSV file with Miller:
sudo apt install miller
mlr --c2j cat deodorants.csv > deodorants.jsonl
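Depending on the Miller version, `--c2j` may wrap the records in a JSON array instead of writing one object per line (newer releases do this by default), so it is worth checking the first lines of `deodorants.jsonl`. If Miller is not available, a minimal sketch of the same conversion with Python's standard library is shown below. It assumes the CSV has a header row with `Name` and `Price` columns (the fields used in the schema in STEP 5) and that prices are plain numbers such as `365`.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str, numeric_fields=("Price",)) -> str:
    """Convert CSV text to JSON Lines: one JSON object per line.

    Columns listed in `numeric_fields` are converted to float so they match
    the float type declared in the collection schema. "Price" is an
    assumption taken from the schema in STEP 5. Empty values are left as-is.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        for field in numeric_fields:
            if row.get(field) not in (None, ""):
                row[field] = float(row[field])
        # ensure_ascii=False keeps non-ASCII product names readable
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines)
```

To use it, read `deodorants.csv` (opened with `newline=""` and `encoding="utf-8"`), pass its contents to `csv_to_jsonl`, and write the returned string to `deodorants.jsonl`.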
STEP 2: Set up Typesense Cloud
The easiest way to run Typesense is with Typesense Cloud:
1. Go to Typesense Cloud.
2. Sign in with GitHub.
3. Pick a configuration and click Launch. You'll have a ready-to-use cluster in a few minutes.
4. Click "Generate API Key", which gives you the hostnames and API keys to use in your code.
STEP 3: Install the client
pip install typesense
STEP 4: Initialising the client
Configure the Typesense client by pointing it at your Typesense node.
Since we are using Typesense Cloud, use the hostname and API key you got from the "Generate API Key" button on the cluster page.
import typesense

client = typesense.Client({
    'nodes': [{
        'host': 'xxx.a1.typesense.net',  # replace with your cluster's hostname
        'port': '443',
        'protocol': 'https'
    }],
    'api_key': 'YOUR_API_KEY',  # replace with your admin API key and keep it private
    'connection_timeout_seconds': 2
})
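Hard-coding an admin API key makes it easy to leak by accident when sharing code. A minimal sketch of reading the connection details from environment variables instead is shown below; `TYPESENSE_HOST` and `TYPESENSE_API_KEY` are hypothetical variable names chosen for this example, and the returned dict has the same shape as the one passed to `typesense.Client` above.

```python
import os


def typesense_config_from_env(timeout_seconds: int = 2) -> dict:
    """Build the typesense.Client configuration from environment variables.

    TYPESENSE_HOST and TYPESENSE_API_KEY are hypothetical names used for
    this sketch; rename them to suit your setup.
    """
    host = os.environ.get("TYPESENSE_HOST")
    api_key = os.environ.get("TYPESENSE_API_KEY")
    if not host or not api_key:
        raise RuntimeError("Set TYPESENSE_HOST and TYPESENSE_API_KEY first")
    return {
        "nodes": [{"host": host, "port": "443", "protocol": "https"}],
        "api_key": api_key,
        "connection_timeout_seconds": timeout_seconds,
    }
```

With the variables exported in your shell, the client can then be created with `client = typesense.Client(typesense_config_from_env())`.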
STEP 5: Creating a collection
A collection is a group of related documents, roughly like a table in a relational database.
When we create a collection, we give it a name and describe the fields that will be indexed when a document is added to it.
In Typesense, every record you index is called a document.
# Create a schema for the deodorants collection
deodorants_schema = {
    'name': 'deodorants',  # the name of our collection
    'fields': [
        {'name': 'Name', 'type': 'string', 'facet': True},  # product name: a string field we can facet on
        {'name': 'Price', 'type': 'float'}  # price: a float field
    ],
    # Determines how results are sorted when no sort_by clause is provided.
    # It must name an existing numeric field, spelled exactly as declared above.
    'default_sorting_field': 'Price'
}

client.collections.create(deodorants_schema)
A facet field lets us group the search results into categories and drill down into each of those categories.
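The schema key that controls the default sort order is `default_sorting_field`, and, as I read the Typesense docs, it must name a numeric field (`int32`, `int64` or `float`) that exists in the schema. Because the comparison is an exact string match, `'price'` would not match a field declared as `'Price'`. The hypothetical helper below is a minimal local sanity check you could run before calling `client.collections.create`; it mirrors that rule and is not part of the Typesense client.

```python
NUMERIC_TYPES = {"int32", "int64", "float"}


def check_default_sorting_field(schema: dict) -> None:
    """Raise ValueError if default_sorting_field is not an existing numeric field.

    Hypothetical pre-flight check; the set of numeric types is an assumption
    based on the Typesense collection docs.
    """
    field_name = schema.get("default_sorting_field")
    if field_name is None:
        return  # the key is optional
    # Exact, case-sensitive lookup of declared field names
    types = {field["name"]: field["type"] for field in schema.get("fields", [])}
    if field_name not in types:
        raise ValueError(f"default_sorting_field {field_name!r} is not a field in the schema")
    if types[field_name] not in NUMERIC_TYPES:
        raise ValueError(
            f"default_sorting_field {field_name!r} has non-numeric type {types[field_name]!r}"
        )
```

Running `check_default_sorting_field(deodorants_schema)` raises no error for a schema whose sorting field is `'Price'`, and raises `ValueError` for `'price'`.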
STEP 6: Add documents to the collection
There are two ways of doing this.
1. Using Python
with open('deodorants.jsonl', encoding='utf-8') as jsonl_file:
    # Import every line of the file as a new document
    client.collections['deodorants'].documents.import_(jsonl_file.read().encode('utf-8'), {'action': 'create'})
2. Using Typesense Cloud
This is also very simple if you only have a few documents.
Click add document and simply enter each document inside the curly braces.
To check how many records were imported, click schema and look at "num_documents".
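An import does not stop at the first bad line: Typesense reports one result per document, such as `{"success": true}` or an object with `"success": false` and an `"error"` message. When importing from Python, it is worth keeping the value returned by `import_` and checking it. The exact return type may differ between client versions (raw JSON Lines text for a string or bytes payload, a list of dicts for a list payload), so the minimal sketch below accepts either form; `import_failures` is a hypothetical helper, not part of the client.

```python
import json


def import_failures(import_response) -> list:
    """Return the per-document import results that did not succeed.

    Accepts raw JSON Lines text (str or bytes) or an already-parsed list of
    dicts, since the client's return type may vary by version.
    """
    if isinstance(import_response, bytes):
        import_response = import_response.decode("utf-8")
    if isinstance(import_response, str):
        # One JSON object per non-empty line
        results = [json.loads(line) for line in import_response.splitlines() if line.strip()]
    else:
        results = list(import_response)
    return [result for result in results if not result.get("success", False)]
```

For example, assign the result of `documents.import_(...)` to a variable and print `import_failures(...)` on it; an empty list means every line was imported.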
STEP 7: Search for items from deodorants collection
#search for a deodorant
search_parameters ={
'q':'NIVEA',
'query_by':'Name',
}
client.collections['deodorants'].documents.search(search_parameters)
Sample Response
{'facet_counts': [],
'found': 14,
'hits': [{'document': {'Name': 'NIVEA Fresh Cherry Anti-Perspirant Spray For Women - 150ml',
'Price': 365,
'id': '61'},
'highlights': [{'field': 'Name',
'matched_tokens': ['NIVEA'],
'snippet': '<mark>NIVEA</mark> Fresh Cherry Anti-Perspirant Spray For Women - 150ml'}],
'text_match': 33514496},
{'document': {'Name': 'NIVEA MEN Cool Kick Anti-Perspirant Spray For Men, 48h - 150ml',
'Price': 375,
'id': '60'},
'highlights': [{'field': 'Name',
'matched_tokens': ['NIVEA'],
'snippet': '<mark>NIVEA</mark> MEN Cool Kick Anti-Perspirant Spray For Men, 48h - 150ml'}],
'text_match': 33514496},
{'document': {'Name': 'NIVEA MEN Black & White Invisible Original, 48h - 150ml (Pack Of 2)',
'Price': 800,
'id': '59'},
'highlights': [{'field': 'Name',
'matched_tokens': ['NIVEA'],
'snippet': '<mark>NIVEA</mark> MEN Black & White
'out_of': 14,
'page': 1,
'request_params': {'collection_name': 'deodorants',
'per_page': 10,
'q': 'NIVEA'},
'search_cutoff': False,
'search_time_ms': 0}
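Two details of this response are worth unpacking. `facet_counts` is empty because the request did not ask for facets; the `facet_by` search parameter does that, and it only works on fields declared with `'facet': True`. Also, `found` is 14 while `per_page` is 10, so the results span ⌈14 / 10⌉ = 2 pages, and the next page can be requested with the `page` parameter. The sketch below builds faceted search parameters and pulls these values out of a response dict. The response keys it reads (`field_name`, `counts`, `value`, `count`) follow my reading of the Typesense docs, and both helper names are hypothetical.

```python
import math


def facet_search_params(query: str) -> dict:
    """Search parameters that also ask for facet counts on the Name field."""
    return {
        "q": query,
        "query_by": "Name",
        "facet_by": "Name",  # only works on fields declared with 'facet': True
    }


def summarize_response(response: dict) -> dict:
    """Pull hit count, page count, hits and facet counts out of a search response."""
    per_page = response.get("request_params", {}).get("per_page", 10)
    found = response.get("found", 0)
    facets = {
        facet["field_name"]: [(c["value"], c["count"]) for c in facet.get("counts", [])]
        for facet in response.get("facet_counts", [])
    }
    return {
        "found": found,
        "pages": math.ceil(found / per_page),
        "items": [(h["document"]["Name"], h["document"]["Price"]) for h in response.get("hits", [])],
        "facets": facets,
    }
```

Because product names in this dataset are mostly unique, faceting on `Name` will mostly return counts of 1; facets become more useful on a field such as a brand or category, which is not part of this schema.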
STEP 8: Filtering results
In this step, I search for NIVEA but type it as "nyvea". I then keep only the matches that cost less than 500 and sort them by price in ascending order.
#Filtering results
search_parameters ={
'q':'nyvea',
'query_by':'Name',
'filter_by':'Price:<500',
'sort_by':'Price:asc'
}
client.collections['deodorants'].documents.search(search_parameters)
Sample response
{'facet_counts': [],
'found': 26,
'hits': [{'document': {'Name': 'NIVEA Pearl & Beauty Anti-Perspirant Rollon For Women, 48h - 50ml',
'Price': 270,
'id': '94'},
'highlights': [{'field': 'Name',
'matched_tokens': ['NIVEA'],
'snippet': '<mark>NIVEA</mark> Pearl & Beauty Anti-Perspirant Rollon For Women, 48h - 50ml'}],
'text_match': 33317888},
{'document': {'Name': 'NIVEA Fresh Cherry Anti-Perspirant Rollon For Women - 50ml',
'Price': 330,
'id': '93'},
'highlights': [{'field': 'Name',
'matched_tokens': ['NIVEA'],
'snippet': '<mark>NIVEA</mark> Fresh Cherry Anti-Perspirant Rollon For Women - 50ml'}],
'text_match': 33317888},
{'document': {'Name': 'NIVEA MEN Black & White Invisible Fresh Anti-Perspirant Rollon - 50ml',
'Price': 355,
'id': '89'},
The query "nyvea" still returns NIVEA products, which shows that Typesense is typo-tolerant.
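Typo tolerance is built on edit distance: the number of single-character insertions, deletions or substitutions needed to turn one word into another. Per the Typesense docs, a query token may by default differ from an indexed token by up to two typos, which the `num_typos` search parameter controls. The sketch below is a plain Levenshtein distance, a simplification for illustration rather than Typesense's actual implementation; it compares case-insensitively because Typesense search is case-insensitive.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    a, b = a.lower(), b.lower()  # compare case-insensitively
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```

Here `edit_distance("nyvea", "NIVEA")` is 1 (one substitution, y → i), well within the default limit of two typos, which is why the misspelled query still matched.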
Conclusion
That is how simple it is to build a search application with the Typesense API. There is a lot more you can do with Typesense, such as building a search UI; check the API reference for the full set of features. On top of all this, Typesense is open source!