Recently I was tasked with improving the existing search functionality of a web application, as part of a much greater long-term effort to improve the overall user experience of the product.
The app in question is a Software-as-a-Service (SaaS) platform targeted to small businesses and medium enterprises. The specifics of the application are not relevant to this post, only that each client gets their own "portal" in our cloud-hosted environment and can manage users scoped to their organization.
The existing search functionality worked exclusively as a way to find and navigate to the profiles of other users in the portal. However, there were several drawbacks that customers complained about and that our product team recognized could be addressed with a redesign and re-implementation. Simply put, those were:
- Lack of flexibility. The logic for finding entries was straightforward and didn't capture very common use cases. The search capabilities fell short compared to other products and did not meet user expectations.
- Lack of functionality. Much more could be baked into the search: not just finding users, but site navigation in general. It could and should be a feature capable of answering as many questions as a user might have about the app.
- Outdated design. Since it was one of the first features ever built, its looks did not match the design language used more recently elsewhere in the app.
- Performance. It was unacceptably slow, and users noticed; the response time was considerably longer than one would expect for this type of feature.
The goal of the project was to address all those items and release a more intuitive and capable new search experience that users would want to use more often, reduce the number of support cases asking simple questions, and naturally help our customers be more productive on their own.
Given those conditions, an entire rewrite made sense rather than a simple fix or changes on top of the existing code. Besides the user-facing goals of the project, this was also an opportunity for us to remove legacy code that relied on old frameworks and libraries on the client side, and replace it with a modern component written with React and carefully tested.
New Functionality
The app in question is large and complex, and over time our team had received feedback about how difficult users found it to navigate.
This is when the product team recognized that we could do something to address that with an improved search. The existing search functionality could only find other registered users in the portal and you would use it to navigate to their profiles. However, the way it was built was very simplistic and not very helpful.
First, we improved the user search by factoring other data into the filtering logic instead of just usernames or full names: connections, identification numbers, and anything else associated with the user entity in the database that made sense.
Beyond that, we also enabled it to search through the entire site map so that results would show up when keywords related to specific pages or tools were searched for. If you searched for "settings", a result would show up for the Settings page and you could just click to get to it, instead of manually relying on the regular navigation menu. This is advantageous since some of the parts in the app are hard to find and deeply nested within other menus or routes.
To achieve this we had to build a massive object that contained all of the necessary metadata of all the routes in the site. That metadata would contain properties like tool or page name, associated search keywords, and URL path, and also had to account for logged-in user permissions since not all routes are visible to everyone depending on their role.
This object had to be manually crafted and maintained since the metadata cannot be automatically derived. This means that when adding new routes to the app, we had to remember to go back and update that object, or otherwise the new page wouldn't show up in the new search tool.
To avoid this, I refactored the way our routes were defined throughout the app and created a single function that would return all the route definitions instead. I then added a check at the end of that function that would compare the collection of routes with the search tool metadata object. If there are any discrepancies, I render a full-screen error overlay in the app during development mode with instructions on how to proceed. It looks like this:
This was extremely important for us because there are four development teams with about five engineers each contributing to this repository daily in a very fast-paced environment. Without an automatic way to make sure it was kept up to date, we would not have been able to keep the search tool working as expected over time. It is not feasible for us as an organization to review every single pull request that is merged in.
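To give an idea of the shape of that check, here is a minimal sketch. The metadata fields, `getAllRouteDefinitions`, and the error handling below are illustrative, not our actual code:

```typescript
// Illustrative shapes — the real metadata object is larger and richer
interface SearchRouteMetadata {
  name: string // page or tool name shown in the results
  path: string // URL path to navigate to
  keywords: string[] // associated search keywords
  requiredPermission?: string // hide routes the logged-in user cannot access
}

declare const searchMetadata: SearchRouteMetadata[]
declare function getAllRouteDefinitions(): Array<{ path: string }>

// Development-only consistency check between route definitions and search metadata
export function assertSearchMetadataIsComplete() {
  if (process.env.NODE_ENV !== 'development') return

  const routePaths = new Set(getAllRouteDefinitions().map(route => route.path))
  const metadataPaths = new Set(searchMetadata.map(entry => entry.path))

  const missing = [...routePaths].filter(path => !metadataPaths.has(path))
  const stale = [...metadataPaths].filter(path => !routePaths.has(path))

  if (missing.length > 0 || stale.length > 0) {
    // The real app renders a full-screen overlay with instructions instead
    throw new Error(
      `Search metadata is out of sync.\n` +
        `Routes missing metadata: ${missing.join(', ')}\n` +
        `Metadata without routes: ${stale.join(', ')}`
    )
  }
}
```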
There were a few other things that the product team wanted to include in the search results that did not match the "navigation" category. We have some widgets like real-time chat and help desk support that can be used anywhere. If we wanted to promote this new search tool as an all-in-one place to find everything you need, a way to trigger those from it had to be included.
This was not particularly difficult, but the fact that the search results could be anything meant that the API design, filtering logic, and UI had to be flexible enough to support it. Beyond that, the possibility of adding different types of results in the future required an additional level of thought as well.
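For illustration, the result model can be expressed as a discriminated union so new categories can be added without touching the filtering or rendering pipeline. This is a sketch with made-up field names, not our exact API:

```typescript
// Illustrative result model — field names are hypothetical
type SearchResult =
  | { kind: 'user'; id: string; fullName: string; profileUrl: string }
  | { kind: 'page'; name: string; path: string }
  | { kind: 'action'; name: string; run: () => void } // e.g. open the chat or help desk widget

function onResultSelected(result: SearchResult, navigate: (to: string) => void) {
  switch (result.kind) {
    case 'user':
      navigate(result.profileUrl)
      break
    case 'page':
      navigate(result.path)
      break
    case 'action':
      result.run() // trigger the widget instead of navigating
      break
  }
}
```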
Another very subtle detail was added. At first, I did not think anything of it when I saw it on the designs, but it ended up becoming my overall favorite feature after implementation and release: a list of recently selected search results shown every time you focus the search input and the search panel opens. This can save the user many clicks and navigations, notably speeding up the process of moving around the app. This alone accelerates productivity and enhances the user experience tremendously.
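The persistence details are not the interesting part, but a minimal sketch of a recent-selections helper, assuming localStorage and a small cap (the storage key, cap, and entry shape are made up), could look like this:

```typescript
// Hypothetical storage key, cap, and entry shape
type RecentEntry = { id: string; label: string; path: string }

const RECENT_KEY = 'search.recentSelections'
const MAX_RECENT = 5

export function getRecentSelections(): RecentEntry[] {
  try {
    return JSON.parse(localStorage.getItem(RECENT_KEY) ?? '[]')
  } catch {
    return []
  }
}

export function rememberSelection(entry: RecentEntry) {
  // Most recent first, no duplicates, capped at MAX_RECENT
  const next = [entry, ...getRecentSelections().filter(e => e.id !== entry.id)]
  localStorage.setItem(RECENT_KEY, JSON.stringify(next.slice(0, MAX_RECENT)))
}
```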
Improving user search performance
The existing search functionality was built with Backbone.js and relied on jQuery UI Autocomplete. Its UI did not look very different from the vanilla example hosted on that site. It had a "typeahead" or "autocomplete" behavior that suggested entries to the user as they typed into the textbox. Those entries were the names of other users in the portal.
Behind the scenes, the technical approach was the usual one associated with this type of component. A debounced change event listener only triggers after the user has stopped typing for a short amount of time chosen by the developer. When that debounce timer elapses, a callback is executed with the logic to compute the suggestions. In this case, the callback was mostly an asynchronous network call to a server that would query a database and run some logic based on the input.
The debounce aspect is an optimization that reduces the amount of unnecessary work as much as possible. It does not make much sense to compute suggestions for every single keystroke, since the user is mostly interested in suggestions for the already complete or semi-complete search term.
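In plain terms, the pattern looks roughly like this. This is a generic sketch: the endpoint and render helper are placeholders, and the legacy code used jQuery UI rather than this exact snippet:

```typescript
// Generic debounce helper: run fn only after `wait` ms of silence
function debounce<T extends unknown[]>(fn: (...args: T) => void, wait: number) {
  let timerId: number | undefined
  return (...args: T) => {
    window.clearTimeout(timerId)
    timerId = window.setTimeout(() => fn(...args), wait)
  }
}

// Hypothetical endpoint and render helper, for illustration only
declare function renderSuggestions(users: unknown[]): void
declare const searchInput: HTMLInputElement

const fetchSuggestions = debounce(async (query: string) => {
  const response = await fetch(`/api/users/suggestions?q=${encodeURIComponent(query)}`)
  renderSuggestions(await response.json())
}, 250)

searchInput.addEventListener('input', e => {
  fetchSuggestions((e.target as HTMLInputElement).value)
})
```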
What I have described so far is practically the de facto way of building typeahead or autocomplete components, and almost every site out there with search functionality behaves this way.
The most natural way to improve the performance is to optimize the server code that accesses the database and computes the suggestions. After analyzing the endpoint in use, I noticed a lot of low-hanging fruit that would have a noticeable positive impact without much effort.
The endpoint in place was a general-purpose resource controller action, used in several other places of the application. It had a lot of code in it that was irrelevant to the search. This meant that not only was the execution longer, but the returned payload was also much bigger than necessary, containing an excessive amount of data that the search didn't use. The result was a longer network round-trip and a higher memory footprint.
Let's look at some real production metrics:
This shows the duration of network round-trips for this endpoint when used specifically for the legacy search functionality. The unusual random peaks obfuscate the visual information a little bit. I tried to find a significant period without one but could not, so I left it in, as it represents the real behavior of the endpoint anyway.
We can focus on the averages and minimums. Even when looking at longer periods, the average of ~500ms (half a second) is maintained. However, the reality is that the performance differs per portal.
Organizations with fewer users will experience a duration much closer to the minimum of 150 - 200 ms, whereas our biggest portals so far experience a consistent 1 - 1.1 seconds, with some peaks of up to 5 or 10 seconds occasionally.
So, if you were unlucky enough to be part of one of the biggest organizations, you would have to wait at least 1.5 seconds before the search displayed suggestions, once we account for the debounce time and the DOM rendering duration in the browser. An awful user experience.
Generally, I am a huge advocate for standard and spec-compliant RESTful APIs and very much against single-purpose endpoints in most cases. For this scenario, however, doing just that makes total technical sense given the constraints, the goal, and the return on investment.
If we created a new endpoint that did and returned only the bare minimum, the same metrics would look considerably different. This was discussed with the rest of the development team and we all agreed. Now we had a plan to move forward.
Nevertheless, after sleeping on it, it occurred to me that although that approach makes sense in general, for our particular case filtering on the client side rather than on the server could yield drastically better performance, since the number of records to search through in each portal is in the order of thousands in the worst-case scenario, rather than millions.
In other words, if you have to perform a search over millions and millions of records, without a doubt you need to execute this logic on the server and have an optimized database or search engine to do that heavy lifting. But if you are only searching through hundreds or thousands of records, up to a certain limit it makes sense to not involve a server at all and let the user's device do it.
This is our case because our haystack is the users that belong to a certain organization, and not only do we know exactly that number, we also have an established business target that caps that number to a limit that we control.
With that hypothesis in place, I needed to confirm that it was indeed a good idea. Using this approach would mean that we would have to return a payload to the browser with a set of ALL users registered so that when they used the search bar, we already had them in memory and ready to be filtered through. This brings up a few questions that would concern any experienced front-end engineer:
- What would the total size of that payload be?
- How long would it take to download that payload?
- Are there significant memory implications of having this big data set in the browser instance?
- When performing the search, wouldn't this heavy computation of filtering through thousands of array items in the client potentially freeze the browser's tab?
- How fast can the browser filter through thousands of records?
To make a technical decision we need to take business variables into account too. When dimensioning, it is wise and common to discuss worst-case scenarios, e.g. how big the total payload is for our theoretically biggest organization, but we also have to recognize that that scenario might account for 0.01% or less of the user population, and that the 99th percentile and below can involve far more reasonable numbers.
Take the payload download duration, for instance. It is true that under a 2G/EDGE or low-bandwidth connection this approach could fail to provide an acceptable user experience when the haystack is big enough, but it is not true that every application is meant to be, or will be, used over that type of connection.
This is when having good, reliable data about your users and your business audience pays off. Just as an example, it makes no sense to rule out a technical solution because it does not work on low-end mobile devices if none of your users rely on mobile to access the application in the first place. I believe this is where a lot of optimization-oriented engineers drop the ball: they fail to recognize or account for the demographics of their users.
With this in mind, I turned to our analytics and databases to pull out all the information necessary to answer the questions above at sensible percentiles. In other words, what would the answer be for 80%, 90%, 95%, 99%, 99.5% of our users, and so on? With this data, I put together low-effort proofs of concept in our testing servers that could illustrate the problem in practice and started doing some experiments.
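Those experiments were essentially quick timing runs over synthetic data of the sizes suggested by those percentiles, along these lines. This is an illustrative sketch, not the actual scripts, and the record count is arbitrary:

```typescript
// Synthetic haystack roughly the size of a large portal (arbitrary number)
const users = Array.from({ length: 5_000 }, (_, i) => ({
  id: String(i),
  fullName: `User Number ${i}`,
}))

function timeFilter(query: string) {
  const start = performance.now()
  const matches = users.filter(user =>
    user.fullName.toLowerCase().includes(query.toLowerCase())
  )
  const duration = performance.now() - start
  console.log(`${matches.length} matches in ${duration.toFixed(2)}ms`)
}

timeFilter('number 42')
```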
The results were extremely positive. The browser was much faster than I had anticipated even in environments of low computational power, and I started to get excited at how much of a perceived difference it would be in the user experience after we completed the project. It was time to start building the real thing.
Typeahead component
In the legacy implementation, I mentioned that jQuery UI's Autocomplete plugin was used in a component built with Backbone.js. For the new one, we wanted to rewrite it in React. We could have still relied on jQuery UI, but the truth is that the plugin itself had a few bugs associated with race conditions, so it was not perfect by any means.
We also wanted more flexibility, and potentially to remove any jQuery dependency from the app altogether in the future, so parting ways and doing it from scratch was a better option. Thanks to the ergonomic design of React's API, it is not that hard to build an autocomplete or typeahead anyway, so it was a no-brainer.
The component can be summarized as "a textbox that displays suggestions to the user as they type in it". As for technical acceptance criteria, we can establish:
- The suggestions are not computed on every keystroke.
- The suggestions should be computed after the user has stopped typing.
- Should be fast.
- If there are more suggestions than what can be displayed, the suggestions panel should be scrollable.
- Should support mouse and keyboard interactions.
- Arrow keys highlight the suggestion below or above.
- Home and end keys take the user to the first or last suggestion result.
- Page up and down keys scroll the suggestions panel.
- Mouse wheel scrolls the suggestions panel.
- Enter key on a highlighted suggestion selects it.
- Escape key closes the suggestions panel and clears the text in the input.
- Should be fully accessible and conform to the "listbox" role requirements as established by the Accessible Rich Internet Applications (WAI-ARIA) 1.1 specification (see https://www.w3.org/TR/wai-aria-1.1/#listbox and https://www.w3.org/TR/wai-aria-practices-1.1/#Listbox).
As far as the asynchronous nature of the interactions on the input and the suggestions computation, the Observer pattern paradigm fits perfectly well with the problem domain, so I built a solution using RxJS. The reason why it fits so well becomes clear if you try to compare the code that achieves the same visible behavior with and without it.
This is not meant to be an RxJS tutorial so I will not spend too much time focusing on the reactive details. A simple version of the subscription that achieves what we want could look like this:
import { BehaviorSubject } from 'rxjs'
import {
  debounceTime,
  distinctUntilChanged,
  filter,
  switchMap,
  retry,
} from 'rxjs/operators'

import { computeSuggestions } from './computeSuggestions'

const minLength = 2
const debounceDueTime = 200

const behaviorSubject = new BehaviorSubject('')

// ...

const subscription = behaviorSubject
  .pipe(
    debounceTime(debounceDueTime),
    distinctUntilChanged(),
    filter((query: string) => query.length >= minLength),
    switchMap((query: string, _: number) => {
      return computeSuggestions(query)
    }),
    retry(0)
  )
  .subscribe(
    value => {
      // set suggestions
    },
    error => {
      // handle errors
    }
  )

// ...

input.addEventListener('input', e => {
  behaviorSubject.next(e.currentTarget.value)
})
If we pass the input value through to the behavior subject every time the input changes, the operators piped to it guarantee that this subscription will execute the first callback passed to .subscribe() if:
a) the value is 2 or more characters long,
b) the user has stopped typing for 200 milliseconds, and
c) the last value that triggered the callback execution is not the same as the current one.
This could be easily integrated into a React component and we would have a very elegant and concise way of handling a stream of input change events in the way we need for our typeahead. Add the keyboard events handling logic, and we have all we need.
However, instead of doing that, we can offer a more flexible solution by packing this into a "headless" React hook with no UI concerns, shifting that responsibility to the consumer. This way, we achieve a true separation between logic and view that allows us to re-use this hook in any situation, no matter what design we have to adhere to.
This CodeSandbox has a complete and very similar implementation of the "useTypeahead" hook that I wrote for the feature, but with a completely different UI treatment, which demonstrates the flexibility of the API design.
Blocking the Main Thread
JavaScript is a single-threaded programming language. Doing the filtering in the browser instead of on the server means the computation is no longer an asynchronous network call, but synchronous work running on that single thread.
This is problematic because it means that as long as JavaScript is busy running our filtering logic and iterating through thousands of items the browser cannot do anything else, which results in a literal freeze of the tab. In this scenario, many interactions like JS-based animations, typing in inputs, selecting text, and others, become completely unresponsive. You have most likely experienced this before, and we usually refer to this as "blocking the Main Thread".
MDN has a much better definition of what's going on:
The main thread is where a browser processes user events and paints. By default, the browser uses a single thread to run all the JavaScript on your page, as well as to perform layout, reflows, and garbage collection. This means that long-running JavaScript functions can block the thread, leading to an unresponsive page and a bad user experience.
— MDN
Thankfully though, the browser is extremely fast. Even when filtering through thousands of records it only takes a few dozen milliseconds at worst on medium-end devices, which is not long enough for a user to notice any frozen or blocked behavior.
I wanted to be responsible and professional anyway and not block the main thread if possible. Thankfully (again), it is possible to do so by using a browser feature called "Web Workers".
Web Workers have been around for over 10 years, but for some reason they haven't gone mainstream yet. I blame it on how difficult they are to integrate ergonomically into your development and deployment flow. If you haven't heard of them, they're essentially an escape hatch that browsers provide to run code in a thread separate from the Main Thread, so as not to cause any blocking. There are certain caveats to using them, but nothing that represented a deal-breaker for my use case. The only real challenge was integrating them seamlessly into our architecture and having them work with our infrastructure.
Web Workers are a little awkward to use in the sense that you have to pass in a path to a JavaScript file where your threaded code lives, and then use asynchronous event messages to pass information back and forth.
// main.js
const worker = new Worker('../my-worker-file.js')

worker.postMessage('hello world')

// ../my-worker-file.js
onmessage = function(msg) {
  console.log(msg)
}
Just like any modern large-scale single-page application, we bundle all our code together into a few processed files that we then statically serve to the browser at runtime, so there is never a one-to-one relationship between a file in our source code and a file served to a user. Meaning, although we might have a file in our repo located at src/my-worker-file.js, that does not mean there is going to be a my-worker-file.js hosted on a server, since it is going to be prepackaged into our production bundle with the rest of the codebase.

We could simply opt not to bundle it and serve it directly as-is so that the code snippet above would work, but that would mean manually editing our bundling configuration every time we wanted to rename, add, or remove worker files, with the added risk of a disconnect between our main-thread code and those files at compile time. We would have to remember to keep these changes in sync manually, without any automated help from the build tooling. Needless to say, this is very brittle and not a good developer experience at all.
Ideally, it would be great to have an abstraction that allowed us to instantiate Web Workers anywhere in the codebase without having to update bundling configuration at all, while at the same time allowing usage of dependencies, share code across threads and keep all our compile-time checks in place like linting, import and exports checks, and type-safety.
The goal would be to have something similar to this work as expected, even when bundling is involved:
// main.js
import worker from '../my-worker-file'

worker.postMessage('hello world')

// ../my-worker-file.js
onmessage = function(msg) {
  console.log(msg)
}
Of course, one can build tooling to achieve this, but there are great ones already available in the community, like Comlink by Surma and Workerize by Jason Miller.
I used workerize since it fit my use case better, and along with workerize-loader, it provided exactly what I wanted and even more. I replicated the configuration used in this minimal set-up repo, which even includes test setups for both Jest and Mocha: https://github.com/reyronald/minimal-workerize-setup.
You can see an online demo here, which also demonstrates the Main Thread problem that I stated before pretty clearly.
(Side-by-side demo recordings: no web worker vs. using a web worker.)
I used that same set-up and moved the filtering logic into a separate thread, which guaranteed the browser's responsiveness even when heavily throttling down the CPU.
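Conceptually, the worker module just exports the filtering function and the main thread calls it asynchronously. A minimal sketch, assuming the workerize-loader set-up from the repo above (file names and types are illustrative):

```typescript
// filter.worker.ts — runs inside the Web Worker thread
export function filterUsers(users: Array<{ fullName: string }>, query: string) {
  const q = query.toLowerCase()
  return users.filter(user => user.fullName.toLowerCase().includes(q))
}

// main thread — with workerize-loader, importing the module yields a factory
// and every exported function becomes an asynchronous call into the worker
import createFilterWorker from './filter.worker'

declare const allUsers: Array<{ fullName: string }> // the preloaded users payload

const worker = createFilterWorker()

worker.filterUsers(allUsers, 'ron').then(matches => {
  // update the suggestions state with `matches`
})
```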
There is something else included in the sample repo's set-up that I want to bring attention to. While working on this part of the project, I started thinking of other places in the app that could benefit from moving code into a separate thread, but I did not want to spawn a new thread each time for every different piece of logic, because in some instances multiple could be needed on the same page.
Instead, I wanted to have a simple easy-to-use mechanism that could be leveraged to share Web Worker instances across the entire application, while making sure they were always terminated when no longer needed. This is the API I went with:
function ComponentA() {
  const [
    requestWorkerInstance,
    releaseWorkerInstance,
    getWorkerInstance,
  ] = workerManager()

  React.useEffect(() => {
    requestWorkerInstance()
    return () => {
      releaseWorkerInstance()
    }
  }, [requestWorkerInstance, releaseWorkerInstance])

  // ...

  const instance = getWorkerInstance()

  instance.doSomeHeavyAsyncWork()
}
In any component, you can get an instance of a single Web Worker thread by calling getWorkerInstance(). However, you have to make sure to call requestWorkerInstance() before it, so that a new one is spawned for you if it does not exist yet. If one is already available, you get that instead.

When you are done and will not need access to the thread anymore, you call releaseWorkerInstance(), which will terminate it as long as no other consumer is depending on it.

The references of requestWorkerInstance and releaseWorkerInstance never change, so it is safe to include them as React.useEffect's dependencies, which makes it easy to integrate this system into any component. The most common flow is requesting an instance when the component mounts and releasing it when it unmounts.
Internally, those functions keep track of how many consumers are depending on those instances at any given time so that they know when to instantiate a new one or terminate the current one. It is a singleton pattern applied to Web Worker threads.
The "worker manager"'s code is very simple and looks a little bit like this:
import workerizeFactory from './my-worker.worker'

let instance
let instanceCreated = false
let consumers = 0

const requestInstance = () => {
  if (!instanceCreated) {
    instance = workerizeFactory()
    instanceCreated = true
  }

  consumers++
}

const releaseInstance = () => {
  if (--consumers === 0) {
    instance.terminate()
    instanceCreated = false
  }
}

const getWorkerInstance = () => instance

export function workerManager() {
  return [requestInstance, releaseInstance, getWorkerInstance]
}
The actual version that I used is a little more complicated to accommodate for correct and proper type checks with TypeScript. You can see the full version in the CodeSandbox and repo posted above.
Smart Search logic
I mentioned earlier that we wanted this new search to be more flexible and smarter. I thought it would be cool if the matching algorithm worked similarly to other tools we developers use every day. I am talking about the approximate or fuzzy matching baked into the navigation search bar that apps like VSCode, Sublime Text, and even Chrome's DevTools have.
If you are not familiar, the logic will match any results that have the same input characters in the same order of appearance, but without the requirement that those characters appear consecutively. For example, the input "shnet" will match "Show Network". See the screenshot below.
Personally, I completely abuse and adore this feature in every piece of software I use that has it. To me, it was a no-brainer that this would improve the user experience, so I went with it.
We released a version of the search with this matching logic, and to my surprise, users did not like it at all. A lot of them were very confused when they saw results that did not obviously resemble what they searched for, and instead of ignoring it or accepting it, they got concerned and even reached out to the support team to report them as bugs.
After getting overwhelmed with this type of feedback, we decided to remove the fuzzy matching aspect and go with exact matches. But product managers still wanted some level of tolerance to typos, and they also wanted results to be prioritized in their order of appearance in a "smarter" way, but they could not articulate properly how they wanted this to happen.
It was up to me to come up with logic that did not just filter out items that failed to match the query, but that also ordered results sensibly and applied less aggressive approximate matching.
This was going to be a nightmare to deliver because we had to please the "gut feeling" that the results were good, without having explicit acceptance criteria items or clear requirements. It was obvious that it would require numerous iterations of design, development, release, then back to the drawing board to refine whatever heuristics were in place until the product managers and stakeholders were satisfied.
Instead of doing that, I decided to take a more unconventional approach than what we usually follow in our team when it comes to new features. I built a CodeSandbox with about 2 or 3 different filtering strategies and some sample data that displayed the results of all of them side by side on the same screen, and sent it to our product manager. He would play around with it and give me feedback on what he liked, disliked, and what he would expect. I used this feedback to build unit tests, improved the heuristics, added a new iteration of the search logic, and repeated the process.
Ultimately we ended up with about 9 different strategies before we settled on one we were comfortable with. Many different libraries were used, including Fuse.js, match-sorter, fuzzaldrin-plus, and others. Some approaches were completely zero-dependency, and others were hybrids.
The one that took the cake worked something like this:
For user search...
- Use Regex to find exact partial or complete matches of different words separately. Input terms have to be properly sanitized since the regular expression is built dynamically.
- Sort the results that matched based on the index of the match. Matches that are closer to the start of the word should show up first. E.g., for the term "ron", "RONald" should show up before "byRON".
- Break ties from the previous step alphabetically, so that if several results have the same match index, they show up A-Z in the UI, making it easier for the user to find what they want (see the sketch right after this list).
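A hedged sketch of that user-matching logic, simplified to match against the full name only; the real version considers separate words and additional fields:

```typescript
// Escape the user's input before building the dynamic regular expression
const escapeRegExp = (term: string) => term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

type User = { fullName: string }

export function searchUsers(users: User[], term: string): User[] {
  const pattern = new RegExp(escapeRegExp(term.trim()), 'i')

  return users
    .map(user => ({ user, index: user.fullName.search(pattern) }))
    .filter(({ index }) => index !== -1)
    .sort(
      (a, b) =>
        // earlier matches first, alphabetical order to break ties
        a.index - b.index || a.user.fullName.localeCompare(b.user.fullName)
    )
    .map(({ user }) => user)
}
```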
For non-user search (questions, tools, commands, pages, etc.)...
This is a little more complex since those items have search keywords associated with them in the metadata that user entities do not need to have, and these need to be factored into the logic.
- Use Regex to compare the search term with a computed string that contains both the entity's primary name or string representation, and its search tags. If the regular expression matches, we then do a direct comparison of the search term only with the name. If both match, it is pushed to the results collection with a priority of 0. In this algorithm the lower the priority score the better. If just the regular expression matches, and not the direct equal comparison, it is pushed with a priority of 1. For example, if there's an item called "Settings" and the user searches for "settings", it would be a match with a score of 0. If they searched for "setti", it would be a match with a score of 1.
- If the previous step failed, the user most likely made a typo. In this case, we cannot use a regular expression anymore. Instead, I iterate over all the separate words of the search term that are 5 characters or longer and compute the Levenshtein distance between them and all the search tags associated with each result individually. The 5-character limitation is there because the fewer characters a word has, the more other words it resembles by changing just 1 or 2 characters. In other words, there were too many mismatches otherwise.
If for all cases there is an acceptable distance, we decide that it is a match. Before we push it though, we check if the term that matched also equals the item's primary name. If it does, it is pushed with a priority of 2, otherwise 3.
Finally, we sort these results based on the aforementioned "priority" so that ones with a lower score show up first.
This produces a set of results for each search term that is very intuitive, feels organic, almost hand-picked, and is very easy to navigate through.
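To make the priorities concrete, here is a rough sketch of the scoring step. The distance helper, threshold, and shapes are illustrative; in practice, one of the libraries mentioned above can supply the distance function:

```typescript
type NavItem = { name: string; keywords: string[] }

declare function levenshtein(a: string, b: string): number // assumed distance helper

const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Returns 0-3 (lower is better) or null when the item does not match at all
export function scoreItem(item: NavItem, term: string): number | null {
  const needle = term.trim().toLowerCase()
  const haystack = `${item.name} ${item.keywords.join(' ')}`.toLowerCase()

  // Exact partial matches first: 0 when the whole name matches, 1 otherwise
  if (new RegExp(escapeRegExp(needle)).test(haystack)) {
    return needle === item.name.toLowerCase() ? 0 : 1
  }

  // Typo tolerance: only words of 5+ characters, small edit distance (threshold is illustrative)
  const words = needle.split(/\s+/).filter(word => word.length >= 5)
  const isClose = (a: string, b: string) => levenshtein(a, b) <= 2

  for (const word of words) {
    if (isClose(word, item.name.toLowerCase())) return 2
    if (item.keywords.some(keyword => isClose(word, keyword.toLowerCase()))) return 3
  }

  return null
}

// Matching items are then sorted ascending by this priority before rendering
```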
End Result
As with every release, we always try to gather as much data and feedback as possible so that we can gauge the success of every project. On this one, we included many statistical metrics to help us understand how our users were employing the new search and how we could improve either the implementation or the metadata associated with each result to bump their visibility appropriately.
A good one to discuss is usage duration. It measures how long it takes the user from the moment they focus the search input to the moment they select a search result or exit the search. This helps us know if they are finding what they need quickly enough. If it is too long, it means that the users are struggling.
The image above shows that in the last 30 days, in 73.4% of the instances a search result was selected within 0 to 5 seconds. The next bucket is 5 to 10 seconds, at 20.8%. Together these account for 94.2% of the searches, and since the largest share corresponds to the shortest duration, I consider this a positive outcome.
We also include a survey box in the app itself via Appcues. On a scale from 1-6, with one being the worst and six being the best, the new search functionality was well received with an average of 5.2 out of 6. Some quotes from participants:
I love this enhancement! This is very helpful when we work on reports.
and
I can speak to it as exciting to have such a great update.
Now let us look at the most interesting metric to me, performance. This graph is over a longer period than the legacy one, two weeks instead of just one.
| | Legacy | New |
|---|---|---|
| min | 158.21ms | 3.25ms |
| avg | 562.47ms | 17.11ms |
| max | 9,950.00ms | 121.13ms |
The difference is astounding across the board. On average, the new search is about 30 times faster than the legacy implementation. Not only that, but the duration is much more consistent across portals regardless of size and does not depend on network conditions, which means our biggest portals are seeing up to 80 times better performance, maybe even more.
This validates all of the hypotheses I made at the grooming stage of the process, so I was very satisfied to see that my predictions came true. I closely monitored this metric following the formal release to make sure there were no exceptions and everyone was having a smooth experience. No surprises were found.
Conclusion
The biggest conclusion I want to draw attention to is that even though something may sound sub-optimal in theory and not fit established best practices, that does not mean it will be sub-optimal in the real world once we factor in actual business variables and data.
A client-side approach like this would not work for the majority of search features, and that framing usually makes it harder to think outside the box and come up with alternative solutions. The nature of our problem was different, and we failed to recognize that as a team in our first discussions about the project, but thankfully we did recognize it before investing any significant effort.
Another success of the process was writing down the questions and concerns we had with the approach, and answering them experimentally with real data and low-effort proofs of concept in a spike early in the project. This gave us the confidence we needed before formally committing to any technical decisions and, above everything, real rather than just theoretical arguments to back up those decisions. This in particular is something that our team was not used to doing, had struggled with in the past, and had paid a big price for as a result.
Just for completeness' sake, the CodeSandbox below is an oversimplified visual representation of what I built. It is missing many of the details I described in the post and some others that I did not mention. For instance, it only searches one entity type (users), does not rely on Web Workers, leaves out the code we added to gather metrics, and has no automated tests.