Introduction
I'll warn you now: I may sound like an Elasticsearch salesperson in this article, but I can assure you that I'm completely impartial.
Until six months ago, I'd used Elasticsearch for little more than basic document retrieval on simple search terms. I almost resented it as another query "language" to learn. That all changed when I moved into a position which used ES extensively and I needed to get up to speed quickly.
I read the book Relevant Search (https://www.manning.com/books/relevant-search) in two weeks. It's a superb read which really highlighted the difference between basic document retrieval and real relevant search. I began to see the real power in Elasticsearch. It fired up my imagination with ways I could have used it to great effect in previous projects.
I wish I'd taken some time to understand the features it offers before now. I believe it could be very valuable for many developers to at least have an overview, which is what I hope to give here. It may enable you to make a better decision on software selection.
Notable features
Search engine
OK, so it's a search engine, but your company doesn't require a "search engine". You run some queries against your NoSQL/relational database and they do a decent job?
That's what I thought. However, the key here is relevance. It's about getting the user where they want/need to be as quickly and as simply as possible. Yes, you can write a database query which puts some data in front of a user which roughly relates to what they put in the search box. And you can tinker with the query, but it soon becomes very complex and hard to maintain.
Consider search results specific to a user weighted by their previous searches, their favourites, their age, their gender. I don't envy you having to write that SQL query!
Elasticsearch will allow you to query on any of your indexed fields. You can filter your results to create a context for your search. For example, if the user had specifically said they are searching for a book in your multi-department store, you would filter much as you would in the WHERE clause of a SQL statement.
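To make the filtering idea a little more concrete, here is a minimal sketch of a search body built as a plain Python dictionary. The field names `title` and `department` are invented for the example, and nothing is sent to a cluster; the dictionary is simply what you would serialise as the JSON body of a search request.

```python
import json


def book_search(text: str) -> dict:
    """Build a search body scored on `text`, restricted to the books department.

    `title` and `department` are hypothetical field names.
    """
    return {
        "query": {
            "bool": {
                # Scored clause: contributes to relevance.
                "must": [{"match": {"title": text}}],
                # Unscored clause: narrows the context, much like
                # WHERE department = 'books' in SQL.
                "filter": [{"term": {"department": "books"}}],
            }
        }
    }


print(json.dumps(book_search("dune"), indent=2))
```

The useful distinction is that clauses under `filter` only include or exclude documents, while clauses under `must` also influence the score.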
The good stuff comes when you start telling Elasticsearch that the result may have this term or that term, but one term should be boosted above the other by a certain factor. You may want to boost more highly rated products. Or maybe you have some stock that you need to move; that could be boosted in the results too.
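As a rough sketch of what boosting can look like, the function below builds a query that accepts either of two terms but weights one more heavily, and gives an optional bump to stock you want to move. The fields `title` and `clearance` and the boost factors are assumptions picked purely for illustration.

```python
import json


def boosted_query(preferred: str, alternative: str) -> dict:
    """Match either term in a hypothetical `title` field, favouring `preferred`."""
    either_term = {
        "bool": {
            "should": [
                # The preferred term is weighted twice as heavily.
                {"match": {"title": {"query": preferred, "boost": 2.0}}},
                {"match": {"title": {"query": alternative}}},
            ],
            # At least one of the two terms has to match.
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "bool": {
                "must": [either_term],
                # Optional clause: documents flagged as clearance stock score
                # higher, but the flag is not required for a match.
                "should": [
                    {"term": {"clearance": {"value": True, "boost": 1.5}}}
                ],
            }
        }
    }


print(json.dumps(boosted_query("oled", "lcd"), indent=2))
```

In practice the boost factors are exactly the sort of thing you end up tuning by trial and error against real searches.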
If you had a library of articles, you may want the most recent ones to be boosted above older articles. If your results were location based, you could boost on distance from the user.
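Recency and distance boosts like these are commonly expressed with decay functions inside a `function_score` query. The sketch below assumes a hypothetical index with a `body` text field, a `published` date field and a `location` geo point field; the scales are arbitrary.

```python
import json


def recent_and_nearby(text: str, lat: float, lon: float) -> dict:
    """Score matches on `text`, favouring recent articles close to the user."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"body": text}},
                "functions": [
                    # An article published 30 days ago gets half the weight
                    # of one published right now.
                    {"gauss": {"published": {
                        "origin": "now", "scale": "30d", "decay": 0.5}}},
                    # A result 5km from the user gets half the weight of one
                    # at the user's position.
                    {"gauss": {"location": {
                        "origin": {"lat": lat, "lon": lon},
                        "scale": "5km", "decay": 0.5}}},
                ],
                # Combine the two decay values, then apply them to the text score.
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }


print(json.dumps(recent_and_nearby("elasticsearch", 51.5, -0.12), indent=2))
```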
I hope this starts giving you an idea of what relevance is and how a search engine could be useful after all. But that's just the start...
Auto-suggest
ES has built-in functionality for auto-suggest. When the user is typing in the search box, you can preempt their request and put it in front of them as they type. Once again, this may have been something you could do with your current database.
ES can be "smarter" though. You can build your indexes so that it looks at any part of words or phrases across multiple fields. If indexed with careful thought, ES can return its results very quickly and with minimal overhead on the server.
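One common way to get this behaviour is to index leading fragments of each word (edge n-grams) so that a partly typed word already matches. The sketch below is only an illustration under assumptions: the small Python function mimics which fragments would be indexed, and the settings dictionary shows roughly how such an analyzer could be declared for a hypothetical `title` field, using the typeless mapping style of Elasticsearch 7 and later.

```python
def edge_ngrams(word: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Return the leading fragments of `word`, as an edge n-gram filter would."""
    word = word.lower()
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


# Index definition using an edge n-gram filter at index time only.
# The names `autocomplete`, `autocomplete_filter` and `title` are made up.
autocomplete_index = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram", "min_gram": 2, "max_gram": 10}
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "autocomplete",
                # Search with whole words so the user's input is not
                # itself broken into fragments.
                "search_analyzer": "standard",
            }
        }
    },
}

print(edge_ngrams("Kibana", 2, 4))  # ['ki', 'kib', 'kiba']
```

Edge n-grams only cover the start of each word. Matching in the middle of a word would need the plain `ngram` variant, at the cost of a larger index.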
Aggregation and Facets
ES can return aggregation results in the same result set as the query results. This alone is an appealing feature, and it really makes sense. When searching for something, if a user can't find what they were looking for in the results returned, they want an easy way to filter their result set further. Say I've just searched for TVs on Amazon and it returns thousands of results. How do I get closer to what I want? Facet filters: these are the properties that Amazon displays on the left-hand side of its desktop page. They allow me to filter further on brand, size, price range etc. ES can easily return these facets, with counts, for the result set returned.
Aggregations follow what you'd expect from any other database, and they can be chained in a pipeline to really dig into your results.
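As a hedged sketch of the facet idea, the body below asks for matching TVs together with brand and price-range counts in one request, and a small helper reads the counts back out of a response. The `brand` and `price` fields, the price bands and the sample response are all invented for illustration.

```python
def tv_search_with_facets() -> dict:
    """Search body that returns hits plus facet counts in a single request."""
    return {
        "query": {"match": {"title": "tv"}},
        "aggs": {
            # One bucket per brand, with the number of matching documents.
            "brands": {"terms": {"field": "brand", "size": 10}},
            # Fixed price bands, each with its own count.
            "price_ranges": {
                "range": {
                    "field": "price",
                    "ranges": [
                        {"to": 300},
                        {"from": 300, "to": 800},
                        {"from": 800},
                    ],
                }
            },
        },
    }


def facet_counts(response: dict, name: str) -> dict:
    """Turn the buckets of one aggregation into a {label: count} mapping."""
    buckets = response["aggregations"][name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written, heavily trimmed stand-in for a real search response.
sample_response = {
    "aggregations": {
        "brands": {
            "buckets": [
                {"key": "Acme", "doc_count": 120},
                {"key": "Globex", "doc_count": 45},
            ]
        }
    }
}

print(facet_counts(sample_response, "brands"))  # {'Acme': 120, 'Globex': 45}
```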
More like this
A very simple feature to implement but really powerful. It does exactly as it suggests. It allows you to point to a document (or more) and tell ES that you want other results similar to the document you've suggested. This can be very useful for helping a user continue their journey on your site, or for offering an alternative product similar to the one they're viewing in a shop.
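A minimal sketch of such a request might look like the following. The index name `products`, the fields compared and the tuning values are assumptions for the example only.

```python
import json


def similar_to(doc_id: str, index: str = "products") -> dict:
    """Build a query asking for documents similar to an already indexed one."""
    return {
        "query": {
            "more_like_this": {
                # Hypothetical fields to compare on.
                "fields": ["title", "description"],
                # Point at the existing document rather than pasting its text.
                "like": [{"_index": index, "_id": doc_id}],
                # Example tuning values, not recommendations.
                "min_term_freq": 1,
                "max_query_terms": 12,
            }
        }
    }


print(json.dumps(similar_to("42"), indent=2))
```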
Kibana and visualisations
Kibana is a web UI allowing you to easily build visualisations for your data. It's relatively simple to use and really helps you dig deep into your data.
It offers all the standard graphing types but also some quite imaginative alternatives. All of these can be put together in dashboards which can be shared and displayed in other pages or on a big screen.
Further reading
I've only scratched the surface in this overview of Elasticsearch, but I recommend following up with this slightly more in-depth overview: https://www.elastic.co/blog/found-uses-of-elasticsearch
Summary
To be clear, while I make several comparisons to databases, I'm not suggesting that Elasticsearch should replace your database. However, in many cases I believe it can complement it.
It could open up functionality for your product that you maybe didn't consider possible. It's very, very fast, so it can really improve your user experience in places and potentially reduce the load on your database server.
Even if you can't think of a use case within your products, it could potentially open up a whole new world for your logs and for analysis of your product performance. With the ability to analyse your data as a time series and dig deep, it could really revolutionise the way you look at your product.
Top comments (6)
Thanks Chris, for highlighting these not-widely-known capabilities of #elasticsearch. Relevance is an extremely powerful capability and possibilities are endless once you start using that. Another capability that we often forget is the capability to do analytics. I have worked with a complete, full-stack analytics platform that uses #elasticsearch at its core. Elastic is the storage engine and Kibana is the visualization layer, while Beats are used for specific capabilities as needed.
On Elastic's website, there are case studies talking about petabytes of data. Companies like Uber and FireEye have used Elastic's product to address use cases that are far beyond the classical 'enterprise search' use case, and at scale too. Assuming there's been some use case-specific, intelligent modeling and capable hardware, Elastic is very, very fast. Plus, there are ways to make it work together with your preferred analytics/ML/DS toolset -- like R or Spark -- to provide customers with storage, analysis and visualization capabilities at the same time, especially when we are talking about an experimental, innovation platform within an enterprise.
Thank you for your comments. Absolutely spot on. I must admit, it's quite a gaping omission and I should have probably explained why in the article (it was intentional!).
The analytics abilities of ES are awesome as you say. The article was aimed at developers who may have had a misconception that ES is just another data store. The handful of features (a massive subset of those available!) were picked to highlight where ES may assist these developers to get certain functionality off the ground in a project where they may have previously sought far more complex solutions using databases.
I considered the analytics side to be a more complex feature not at the introductory level.
It's definitely worth mentioning though along with the uses and use cases you highlight. Thank you.
Thanks. Will look forward to more such informative articles from you, Chris. Cheers!
Thanks Chris for the insights on Elasticsearch. I just had a quick question though: if I wanted to build a search engine, what role would Elasticsearch play, and how could I leverage it? Thanks in advance.
Hi. Elasticsearch is where the index resides for your data. This index makes your data searchable in ways not possible with databases. It's unlikely to be your main document store (although, I suspect many projects do use it as such) and you would populate your index from that. It's highly likely that your search index would have a different structure to your data store, so the data would likely go through some level of transformation on the way in.
I appreciate there seems a lot of uncertainty in the previous paragraph but most of it depends on your search requirements.
I'll run through the basic steps to set up an index which may assist with some understanding.
You would design your mappings dependent on how you want your search engine used and how you want the data indexed (to allow for efficient searches). An overview of designing mappings is available on the ES website - elastic.co/guide/en/elasticsearch/...
As part of the mapping, you would look to analyze and tokenize certain fields. This is the process of indexing parts of the field, maybe down to the individual word, phrase or part phrase - tokens.
elastic.co/guide/en/elasticsearch/...
elastic.co/guide/en/elasticsearch/...
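To give a feel for the mapping and analysis steps, here is a small sketch written as Python dictionaries. It assumes the typeless mapping style of Elasticsearch 7 and later, and the field names are made up. The second dictionary is a body for the `_analyze` endpoint, which lets you see the tokens an analyzer would produce for a piece of text.

```python
import json

# Hypothetical product index: which fields exist and how each is indexed.
index_definition = {
    "mappings": {
        "properties": {
            # Full-text field, analyzed with the built-in English analyzer,
            # plus an untouched copy for exact matching and sorting.
            "title": {
                "type": "text",
                "analyzer": "english",
                "fields": {"raw": {"type": "keyword"}},
            },
            # Exact-value fields, suited to filters and facets.
            "brand": {"type": "keyword"},
            "price": {"type": "float"},
            "published": {"type": "date"},
        }
    }
}

# Body for the _analyze endpoint, to inspect how text would be tokenized.
analyze_request = {"analyzer": "english", "text": "Running shoes for runners"}

print(json.dumps(index_definition, indent=2))
print(json.dumps(analyze_request))
```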
Once your mapping is set up, you can start populating your index. This is done simply by HTTP request.
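As an illustration of how simple that request is, the sketch below prepares one using only the Python standard library. The base URL, index name, document ID and document contents are all assumptions, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_index_request(base_url: str, index: str, doc_id: str,
                        document: dict) -> urllib.request.Request:
    """Prepare (but do not send) a request that stores one document."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local cluster and document.
request = build_index_request(
    "http://localhost:9200",
    "products",
    "1",
    {"title": "Example TV", "brand": "Acme", "price": 499.0},
)
print(request.get_method(), request.full_url)

# With a cluster running at that address, this would send it:
# urllib.request.urlopen(request)
```

For loading large amounts of data you would normally batch documents rather than send one request each.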
Then you can start building your search queries. Basic queries via DSL (elastic.co/guide/en/elasticsearch/...) are quite simple.
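A basic query can be as small as the sketch below, which searches two hypothetical fields and weights `title` above `description`. The helper shows where the ranked documents sit in a response; the sample response is hand-written and trimmed right down.

```python
def simple_search(text: str) -> dict:
    """Search two hypothetical fields, counting title matches double."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^2", "description"],
            }
        },
        "size": 10,
    }


def ranked_ids(response: dict) -> list[tuple[str, float]]:
    """Pull (document ID, score) pairs out of a search response, best first."""
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]


# Hand-written stand-in for a real response.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "7", "_score": 3.2, "_source": {"title": "Example TV"}},
            {"_id": "3", "_score": 1.1, "_source": {"title": "TV stand"}},
        ]
    }
}

print(simple_search("tv"))
print(ranked_ids(sample_response))  # [('7', 3.2), ('3', 1.1)]
```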
At this point, I find it's time for a bit of educated trial and error. Is your search query finding the expected results? If not, why not? Tweak the query to better score the expected results, bearing in mind that tweaking the query to affect the result of one document may affect the results of others.
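One way to keep that trial and error honest is to keep a small list of searches with the document you expect near the top, and re-check all of them after every tweak. This is only a sketch of the idea: `fake_search` and its canned results stand in for whatever function runs your real query and returns document IDs in ranked order.

```python
from typing import Callable


def failing_cases(search: Callable[[str], list[str]],
                  expectations: dict[str, str],
                  top_k: int = 3) -> list[str]:
    """Return the search terms whose expected document is not in the top_k."""
    failures = []
    for query_text, expected_id in expectations.items():
        if expected_id not in search(query_text)[:top_k]:
            failures.append(query_text)
    return failures


# Canned rankings standing in for real search results (made-up IDs).
canned = {
    "dune": ["b1", "b7"],
    "tv": ["p9", "p2", "p4", "p3"],
}


def fake_search(text: str) -> list[str]:
    return canned.get(text, [])


# "dune" finds b1 in the top 3, but p3 is only fourth for "tv".
print(failing_cases(fake_search, {"dune": "b1", "tv": "p3"}))  # ['tv']
```

Running the whole list each time shows quickly when improving one search has quietly made another worse.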
This may appear daunting but the Getting Started guide on the ES site is pretty good - elastic.co/guide/en/elasticsearch/...
I'm getting into ES right now. Curious that the ES docs don't have search...?