In my previous blog, we covered how precision, recall, and score play important roles in measuring the relevance of your search.
Now that we have the basics under our belt, we are ready to write some queries!
When we use a search bar, we are usually looking for sources (web pages, videos, etc.) that contain the information we want.
In Elasticsearch, queries work in a similar way. Queries are used to retrieve sources (documents) that contain the information we are looking for.
Let's break this down.
In Elasticsearch, data is stored as documents. Documents contain fields (key-value pairs) that hold information.
Take a look at the following document of a news headline:
In a JSON object, we see multiple fields that contain the following information about a news headline:
- date
- short_description
- @timestamp
- link
- category
- headline
- authors
What if we wanted to see documents that were published on "2017-11-07"?
You would write a query that retrieves documents whose field "date" contains the value "2017-11-07"!
Take Home Message
Whenever you need to retrieve documents that match specific criteria, write a query!
Read on to learn how to:
1) write queries designed to search text fields
2) build a combination of queries to answer more complex questions
3) fine-tune the relevance of search results
Prerequisite work
Watch this video from time stamp 15:00-21:46. This video will show you how to complete steps 1-3.
1) Set up Elasticsearch and Kibana
2) Add news headlines dataset to Elasticsearch
3) Open the Kibana console (AKA Dev Tools)
4) Keep two windows open side by side (this blog and the Kibana console)
We will be sending queries from Kibana to Elasticsearch to learn how queries work!
Note
1) If you would rather download Elasticsearch and Kibana on your own machine, follow the steps outlined in Downloading Elasticsearch and Kibana (macOS/Linux and Windows).
2) This blog builds on Beginner's guide to understanding the relevance of your search with Elasticsearch and Kibana.
If you have trouble understanding some of the terms and concepts mentioned in this blog, refer to the blog above for further explanation!
Additional Resources
Interested in beginner friendly workshops on Elasticsearch and Kibana? Check out my Beginner's Crash Course to Elastic Stack series!
This blog is a complementary blog to Part 3 of the Beginner's Crash Course to Elastic Stack. If you prefer learning by watching videos instead, check out the recording!
2) Beginner's Crash Course to Elastic Stack Table of Contents
This table of contents includes repos of all workshops in the series. Each repo includes resources shared during the workshop including the video recording, presentation, related blogs, Elasticsearch requests and more!
Running queries with Elasticsearch and Kibana
At this time, you should have two windows (the Kibana console and this blog) open side by side as shown below.
Look at the Kibana console on the left. It is divided into two panels.
The left panel (red box) is where you send requests from Kibana to Elasticsearch. The right panel (blue box) is where you receive a response from Elasticsearch.
We are now ready to run some queries!
Whenever we need to retrieve documents that match specific criteria, we write a query.
We will be sending queries from Kibana to Elasticsearch to see how these queries work.
In the prerequisite steps, we added the news headlines dataset to an index named news_headlines.
Get information about documents in an index
Before we ask questions about our dataset, it is helpful to see the content of our documents so we know what type of questions we can ask.
The following query retrieves information about documents that exist in the news_headlines index.
For every query we will go over, the syntax has been included for you so you can customize it for your own use case.
We will be sending the example query to Elasticsearch to see how the query works.
Syntax:
GET Enter_name_of_the_index_here/_search
Example:
GET news_headlines/_search
This query asks Elasticsearch to retrieve information about documents in the index news_headlines.
Copy and paste this query into the left panel of the Kibana console. Click on the query to make sure it is selected (dark grey bar) and click on the green arrow to send the query.
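If you would rather send the same request from a script instead of the Kibana console, the sketch below shows one way to build it with Python's standard library. It assumes Elasticsearch is reachable at http://localhost:9200 without authentication, which is an assumption for illustration and may not match your setup. The helper names build_search_request and send are made up for this example.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable here without authentication.
ES_URL = "http://localhost:9200"


def build_search_request(index, body=None):
    """Build (but do not send) a request equivalent to `GET <index>/_search`."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_search_request("news_headlines")
print(request.get_method(), request.full_url)
```

Calling send(request) would return the same JSON you see in the right panel of the Kibana console, provided the assumed address is correct for your cluster.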
Expected response from Elasticsearch:
Elasticsearch displays the number of hits and a sample of 10 search results by default. The field "_source" lists all fields (the content) included in a document.
Our document contains fields called:
- date
- short_description
- @timestamp
- link
- category
- headline
- authors
Now that we know what type of information our documents include, let's retrieve documents that match specific criteria by using queries.
Searching for search terms using the match query
The match query is a standard query for performing a full text search. This query retrieves documents that contain the search terms. The order and the proximity in which the search terms are found (i.e. phrases) are not taken into account.
Syntax:
GET Enter_name_of_index_here/_search
{
"query": {
"match": {
"Specify the field you want to search": {
"query": "Enter search terms"
}
}
}
}
Let's say we wanted to use the match query to search for news headlines about Ed Sheeran's song "Shape of you".
The following is how you would write the query. It asks Elasticsearch to search for the terms "Shape" or "of" or "you" in the field headline.
Example:
GET news_headlines/_search
{
"query": {
"match": {
"headline": {
"query": "Shape of you"
}
}
}
}
Copy and paste this query into the Kibana console and send it!
Expected response from Elasticsearch:
Elasticsearch returns more than 10,000 hits. The top hit, as well as many others in the search results, contains only the search terms "you" and "shape". These terms are not found in the same order or in the same proximity to each other as in the search terms "Shape of you".
Along with a few news headlines about the song "Shape of you", it pulls up news headlines about being in shape or what the shape of your face says about you.
When the match query is used to search for a phrase, it has high recall but low precision.
It pulls up more loosely related documents as it uses "OR" logic by default.
It pulls up documents that contain any one of the search terms in the specified field. Moreover, the order and the proximity in which the search terms are found are not taken into account.
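To make the "OR" logic concrete, here is a toy model in plain Python. It is only a sketch: the whitespace tokenizer stands in for a real Elasticsearch analyzer, there is no scoring, and the headlines are made up for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def match_any(field_text, search_terms):
    """True if the field contains at least one of the search terms ("OR" logic)."""
    return bool(set(tokenize(field_text)) & set(tokenize(search_terms)))


# Hypothetical headlines, not taken from the news_headlines dataset.
headlines = [
    "Ed Sheeran performs Shape of You live",
    "What the shape of your face says about you",
    "How to get in shape this summer",
    "Local bakery wins award",
]

hits = [h for h in headlines if match_any(h, "Shape of you")]
print(hits)
```

Three of the four headlines count as hits because each contains at least one of the terms, even though only one is about the song.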
Searching for phrases using the match_phrase query
If the order and the proximity in which the search terms are found (i.e. phrases) are important in determining the relevance of your search, you should use the match_phrase query.
Syntax:
GET Enter_name_of_index_here/_search
{
"query": {
"match_phrase": {
"Specify the field you want to search": {
"query": "Enter search terms"
}
}
}
}
Take a look at the example of a match_phrase query below. It is almost identical to the match query except that the match parameter has been replaced with the match_phrase parameter.
Example:
GET news_headlines/_search
{
"query": {
"match_phrase": {
"headline": {
"query": "Shape of You"
}
}
}
}
When the match_phrase parameter is used, all hits must meet the following criteria:
- the search terms "Shape", "of", and "you" must appear in the field headline.
- the terms must appear in that order.
- the terms must appear next to each other.
Let's copy and paste this query into the Kibana console and send it!
Expected response from Elasticsearch:
With the match_phrase parameter, we get 3 hits returned. All 3 hits satisfy the criteria mentioned above.
The match_phrase parameter yields higher precision but lower recall as it takes the order and the proximity in which the search terms are found into account.
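The three criteria can be pictured with a small Python sketch. Again, this is a simplified model with a whitespace tokenizer and made-up headlines, not how Elasticsearch implements phrase matching internally.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def match_phrase(field_text, phrase):
    """True if the phrase's terms appear next to each other and in order."""
    tokens, wanted = tokenize(field_text), tokenize(phrase)
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


# Hypothetical headlines, not taken from the news_headlines dataset.
headlines = [
    "Ed Sheeran performs Shape of You live",
    "What the shape of your face says about you",
    "How to get in shape this summer",
]

hits = [h for h in headlines if match_phrase(h, "Shape of You")]
print(hits)
```

Only the first headline survives, which mirrors the trade-off described above: fewer hits, but each one contains the exact phrase.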
Running a multi_match query against multiple fields
When designing a query, you don't always know the context of a user's search. When a user searches for "Michelle Obama", the user could be searching for statements written by Michelle Obama or articles written about her.
To accommodate these contexts, you can write a multi_match query, which searches for terms in multiple fields.
The multi_match query runs a match query on multiple fields and calculates a score for each field. Then, it assigns the highest score among the fields to the document.
This score will determine the ranking of the document within the search results.
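The "score each field, keep the highest" idea can be sketched in a few lines of Python. The scoring function here is a toy (it simply counts matching terms) and is an assumption made for illustration. Elasticsearch's real relevance scoring is more involved. The document is also made up.

```python
def field_score(field_text, search_terms):
    """Toy score: how many distinct search terms appear in the field."""
    tokens = set(field_text.lower().split())
    terms = set(search_terms.lower().split())
    return sum(1 for term in terms if term in tokens)


def multi_match_score(doc, search_terms, fields):
    """Score each field separately, then keep the highest field score."""
    return max(field_score(doc.get(field, ""), search_terms) for field in fields)


# Hypothetical document using field names from the dataset.
doc = {
    "headline": "Bernie Sanders plans national tour",
    "short_description": "Michelle Obama was mentioned once",
    "authors": "Jane Doe",
}

score = multi_match_score(
    doc, "Michelle Obama", ["headline", "short_description", "authors"]
)
print(score)
```

Notice that the document gets its score from short_description alone, even though the headline has nothing to do with the search.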
Syntax:
GET Enter_the_name_of_the_index_here/_search
{
"query": {
"multi_match": {
"query": "Enter search terms here",
"fields": [
"List the field you want to search over",
"List the field you want to search over",
"List the field you want to search over"
]
}
}
}
The following multi_match query asks Elasticsearch to retrieve documents that contain the search terms "Michelle" or "Obama" in the fields headline, short_description, or authors.
Example:
GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "Michelle Obama",
"fields": [
"headline",
"short_description",
"authors"
]
}
}
}
Expected response from Elasticsearch:
We see 5128 hits that contain the terms "Michelle" or "Obama" in the field headline or short_description or authors.
While the multi_match query increased the recall, it decreased the precision of the hits.
For example, in our search for "Michelle Obama" related headlines, the top hit is a news headline featuring Bernie Sanders as the main topic. In this headline, Michelle Obama is mentioned once in the field short_description.
How can we improve the precision of our search?
Per-field boosting
Headlines mentioning "Michelle Obama" in the field headline are more likely to be related to our search than the headlines that mention "Michelle Obama" once or twice in the field short_description.
To improve the precision of your search, you can designate one field to carry more weight than the others.
This can be done by boosting the score of the field headline (per-field boosting). This is notated by adding a caret (^) symbol and the number 2 to the desired field as shown below.
Syntax:
GET Enter_the_name_of_the_index_here/_search
{
"query": {
"multi_match": {
"query": "Enter search terms",
"fields": [
"List field you want to boost^2",
"List field you want to search over",
"List field you want to search over"
]
}
}
}
The following example boosts the score of documents that contain the search terms in the field headline. If the term "Michelle" or "Obama" is found in the field headline of a document, that document is given a higher score and is ranked higher in the search results.
Example:
GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "Michelle Obama",
"fields": [
"headline^2",
"short_description",
"authors"
]
}
}
}
Expected response from Elasticsearch:
Per-field boosting yields the same number of hits (5128). However, it changes the ranking of the hits. The hits ranked higher on the list contain the search terms "Michelle Obama" in the boosted field, headline.
The documents containing the search terms "Michelle Obama" in the field headline are more likely to be about Michelle Obama. By using per-field boosting, we have improved the precision of our search!
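Here is a rough sketch of why boosting reorders hits without changing how many there are. It assumes a toy score (a count of matching terms) multiplied by the boost, which is a simplification for illustration. The helper names and both documents are hypothetical.

```python
def parse_field(spec):
    """Split "headline^2" into ("headline", 2.0); the boost defaults to 1.0."""
    name, _, boost = spec.partition("^")
    return name, float(boost) if boost else 1.0


def field_score(field_text, search_terms):
    """Toy score: how many distinct search terms appear in the field."""
    tokens = set(field_text.lower().split())
    return sum(1 for term in set(search_terms.lower().split()) if term in tokens)


def boosted_score(doc, search_terms, field_specs):
    """Multiply each field's toy score by its boost, then keep the highest."""
    scores = []
    for spec in field_specs:
        name, boost = parse_field(spec)
        scores.append(boost * field_score(doc.get(name, ""), search_terms))
    return max(scores)


# Hypothetical documents using field names from the dataset.
doc_a = {"headline": "Michelle Obama speaks at rally", "short_description": "A short recap"}
doc_b = {"headline": "Bernie Sanders plans tour", "short_description": "Michelle Obama attended"}

plain = ["headline", "short_description"]
boosted = ["headline^2", "short_description"]
for doc in (doc_a, doc_b):
    print(boosted_score(doc, "Michelle Obama", plain), boosted_score(doc, "Michelle Obama", boosted))
```

Without the boost both documents tie. With headline^2, the document that mentions the terms in its headline pulls ahead, while both still remain hits.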
What happens when you use the multi_match query to search for a phrase?
Let's say while searching for "Michelle Obama", the user remembers that she is throwing a party for all of her friends this weekend. She searches for news headlines regarding "party planning" to get some ideas for it.
She uses the multi_match query to search for the phrase party planning.
Example:
GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "party planning",
"fields": [
"headline^2",
"short_description"
]
}
}
}
Copy and paste this query into the console and send it!
Response from Elasticsearch:
This query yields a lot of hits (2846).
But why does one of our top 10 hits feature Bernie Sanders planning a national tour for grassroots party activism?
The terms "party" and "planning" are popular terms found in many documents.
With the multi_match query, a document is considered a hit if any one of these search terms is found in any one of the specified fields. It does not take into account the order or the proximity in which these search terms are found.
Because of that, you will see loosely related search results included among the top hits.
Improving precision with phrase type match
You can improve the precision of a multi_match query by adding "type": "phrase" to the query.
The phrase type performs a match_phrase query on each field and calculates a score for each field. Then, it assigns the highest score among the fields to the document.
Syntax:
GET Enter_the_name_of_the_index_here/_search
{
"query": {
"multi_match": {
"query": "Enter search phrase",
"fields": [
"List field you want to boost^2",
"List field you want to search over",
"List field you want to search over"
],
"type": "phrase"
}
}
}
The following query asks Elasticsearch to look up the phrase "party planning" in the fields headline and short_description.
Using per-field boosting, this query assigns a higher score to documents containing the phrase "party planning" in the field headline. The documents that include the phrase "party planning" in the field headline will be ranked higher in the search results.
Copy and paste the following query into the Kibana console and send it.
Example:
GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "party planning",
"fields": [
"headline^2",
"short_description"
],
"type": "phrase"
}
}
}
Expected response from Elasticsearch:
The recall is much lower (6 vs 2846 hits) but every one of the hits has the phrase "party planning" in either the field headline or short_description or both.
Among these, the hits that have the phrase "party planning" in the boosted field headline are ranked higher and presented towards the top of the search results.
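If you end up writing many multi_match requests, a small helper can assemble the request body for you. The sketch below only builds the JSON body shown in the examples above. The function name multi_match_body and its parameters are made up for this illustration.

```python
import json


def multi_match_body(search_text, fields, boosts=None, match_type=None):
    """Build a multi_match request body with optional boosts and type."""
    boosts = boosts or {}
    field_specs = [
        f"{name}^{boosts[name]}" if name in boosts else name for name in fields
    ]
    multi_match = {"query": search_text, "fields": field_specs}
    if match_type is not None:
        multi_match["type"] = match_type
    return {"query": {"multi_match": multi_match}}


body = multi_match_body(
    "party planning",
    ["headline", "short_description"],
    boosts={"headline": 2},
    match_type="phrase",
)
print(json.dumps(body, indent=2))
```

The printed body can be pasted under GET news_headlines/_search in the Kibana console. Leaving out match_type gives you the plain multi_match version to compare against.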
Combined Queries
There will be times when a user asks a multi-faceted question that requires multiple queries to answer.
For example, a user may want to find political headlines about Michelle Obama published before the year 2016.
This search is actually a combination of three queries:
1) Query headlines that contain the search terms "Michelle Obama" in the field headline.
2) Query "Michelle Obama" headlines from the "POLITICS" category.
3) Query "Michelle Obama" headlines published before the year 2016.
One of the ways you can combine these queries is through the bool query.
Bool Query
The bool query retrieves documents matching boolean combinations of other queries.
With the bool query, you can combine multiple queries into one request and further specify boolean clauses to narrow down your search results.
There are four clauses to choose from:
- must
- must_not
- should
- filter
You can build combinations of one or more of these clauses. Each clause can contain one or multiple queries that specify the criteria of each clause.
These clauses are optional and can be mixed and matched to cater to your use case. The order in which they appear does not matter either!
Syntax:
GET name_of_index/_search
{
"query": {
"bool": {
"must": [
{One or more queries can be specified here. A document MUST match all of these queries to be considered as a hit.}
],
"must_not": [
{A document must NOT match any of the queries specified here. If it does, it is excluded from the search results.}
],
"should": [
{A document does not have to match any queries specified here. However, if it does match, this document is given a higher score.}
],
"filter": [
{These filters (queries) place documents in either a yes or a no category. Only the ones that fall into the yes category are included in the hits.}
]
}
}
}
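To see how the four clauses interact, here is a toy model in Python where each clause is a list of yes/no checks. It is a sketch under simplifying assumptions: the score is just a count of matching must and should checks, and the documents are made up. One real-world detail it does not model: when a bool query has should clauses but no must or filter clause, Elasticsearch by default requires at least one should clause to match.

```python
def bool_search(docs, must=(), must_not=(), should=(), filters=()):
    """Toy bool query: each clause is a list of predicates (doc -> bool)."""
    hits = []
    for doc in docs:
        if not all(pred(doc) for pred in must):
            continue  # every must predicate is required
        if not all(pred(doc) for pred in filters):
            continue  # filters are yes/no and do not add to the score
        if any(pred(doc) for pred in must_not):
            continue  # any must_not match excludes the document
        # Toy score: one point per must predicate plus one per matching should.
        score = len(must) + sum(1 for pred in should if pred(doc))
        hits.append((score, doc))
    ranked = sorted(hits, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked]


# Hypothetical documents using field names from the dataset.
docs = [
    {"headline": "Michelle Obama on voting", "category": "POLITICS"},
    {"headline": "Michelle Obama attends wedding", "category": "WEDDINGS"},
    {"headline": "Michelle Obama honored", "category": "BLACK VOICES"},
    {"headline": "City budget vote", "category": "POLITICS"},
]

results = bool_search(
    docs,
    must=[lambda d: "michelle obama" in d["headline"].lower()],
    must_not=[lambda d: d["category"] == "WEDDINGS"],
    should=[lambda d: d["category"] == "BLACK VOICES"],
)
print([d["category"] for d in results])
```

The must check removes the unrelated headline, must_not removes the wedding story, and should moves the "BLACK VOICES" document to the top without excluding the other hit.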
A combination of query and aggregation request
A bool query can help you answer multi-faceted questions. Before we go over the four clauses of the bool query, we need to first understand what type of questions we can ask about Michelle Obama.
Let's first figure out what headlines have been written about her.
One way to figure that out is by searching for categories of headlines that mention Michelle Obama.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"query": {
"Enter match or match_phrase here": { "Enter the name of the field": "Enter the value you are looking for" }
},
"aggregations": {
"Name your aggregation here": {
"Specify aggregation type here": {
"field": "Name the field you want to aggregate here",
"size": State how many buckets you want returned here
}
}
}
}
The following query asks Elasticsearch to retrieve all documents that have the phrase "Michelle Obama" in the headline. Then, it performs an aggregation on the retrieved documents and returns up to 100 categories that exist in them.
Example:
GET news_headlines/_search
{
"query": {
"match_phrase": {
"headline": "Michelle Obama"
}
},
"aggregations": {
"category_mentions": {
"terms": {
"field": "category",
"size": 100
}
}
}
}
Expected response from Elasticsearch:
When you minimize the hits field (line 10), you will see an aggregations report called category_mentions. This report displays an array of all the categories that exist in the queried data and the number of headlines that fall under each category.
We see that many news headlines about Michelle Obama have been written under categories such as "POLITICS", "BLACK VOICES", "PARENTING", "TASTE", and even "WEDDINGS"!
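Conceptually, this request does two things in sequence: it narrows the documents down with a query, then counts the remaining documents per category. The Python sketch below mimics that with a Counter on made-up documents. It is only an analogy for what a terms aggregation reports, not how Elasticsearch computes it.

```python
from collections import Counter

# Hypothetical documents using field names from the dataset.
docs = [
    {"headline": "Michelle Obama on voting rights", "category": "POLITICS"},
    {"headline": "Michelle Obama visits school", "category": "POLITICS"},
    {"headline": "Michelle Obama wedding surprise", "category": "WEDDINGS"},
    {"headline": "Stock market update", "category": "BUSINESS"},
]

# Step 1: the query narrows the documents down.
matching = [d for d in docs if "michelle obama" in d["headline"].lower()]

# Step 2: the aggregation counts documents per category (up to 100 buckets).
category_mentions = Counter(d["category"] for d in matching).most_common(100)
print(category_mentions)
```

The "BUSINESS" document never shows up in the counts because it was filtered out by the query before the aggregation ran.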
Now let's get back to the bool query!
With the bool query, you can combine multiple queries into one request and further specify boolean clauses to narrow down your search results.
There are four clauses to choose from:
- must
- must_not
- should
- filter
The must clause
The must clause defines all queries (criteria) a document MUST match to be returned as a hit. These criteria are expressed in the form of one or multiple queries.
All queries in the must clause must be satisfied for a document to be returned as a hit. As a result, having more queries in the must clause will increase the precision of your query.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"query": {
"bool": {
"must": [
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
},
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
]
}
}
}
The following is a bool query that uses the must clause. This query specifies that all hits must match the phrase "Michelle Obama" in the field headline and match the term "POLITICS" in the field category.
Example:
GET news_headlines/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"headline": "Michelle Obama"
}
},
{
"match": {
"category": "POLITICS"
}
}
]
}
}
}
Let's copy and paste this query into the console and send it!
Expected response from Elasticsearch:
You get 45 hits. All documents contain the phrase "Michelle Obama" in the field headline and the term "POLITICS" in the field category.
The must_not clause
The must_not clause defines queries (criteria) a document MUST NOT match to be included in the search results.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"query": {
"bool": {
"must": [
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
],
"must_not":[
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
]
}
}
}
What if you want all Michelle Obama headlines except for the ones that belong in the "WEDDINGS" category?
The following bool query specifies that all hits must contain the phrase "Michelle Obama" in the field headline. However, the hits must_not contain the term "WEDDINGS" in the field category.
Example:
GET news_headlines/_search
{
"query": {
"bool": {
"must": {
"match_phrase": {
"headline": "Michelle Obama"
}
},
"must_not":[
{
"match": {
"category": "WEDDINGS"
}
}
]
}
}
}
Let's copy and paste this query into the console and send it.
Expected response from Elasticsearch:
Compared with the previous query (45 hits), this query increases the recall (203 hits). It pulls up all the hits that contain the phrase "Michelle Obama" in the field headline. Among the hits, Elasticsearch excludes all documents that contain the term "WEDDINGS" in the field category.
The should clause
The should clause adds "nice to have" queries (criteria). The documents do not need to match the "nice to have" queries to be considered as hits. However, the ones that do will be given a higher score and are placed higher in the search results.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"query": {
"bool": {
"must": [
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
],
"should":[
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
]
}
}
}
Let's talk about a scenario where we may use the should clause. During Black History Month, a user may be looking up "Michelle Obama" in the context of the "BLACK VOICES" category rather than in the context of the "WEDDINGS", "TASTE", or "STYLE" categories.
To accommodate this scenario, you may write a query where all hits MUST contain "Michelle Obama" in the field headline. Having the phrase "BLACK VOICES" in the category is not required. However, if a document contains the phrase "BLACK VOICES" in the field category, this document should be given a higher score and should be placed higher in the search results.
The following bool query does exactly that. It specifies that all hits must match the phrase "Michelle Obama" in the field headline. Should a hit match the phrase "BLACK VOICES" in the field category, this hit will be given a higher score and will be shown higher in the search results.
Example:
GET news_headlines/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"headline": "Michelle Obama"
}
}
],
"should":[
{
"match_phrase": {
"category": "BLACK VOICES"
}
}
]
}
}
}
Let's copy and paste this query into the console and send it!
Expected response from Elasticsearch:
We should still get the same number of hits (207) as the should clause does not add or exclude more hits. However, you will notice that the ranking of the documents has been changed. The documents with the phrase "BLACK VOICES" in the field category are now presented at the top of the search results.
The filter clause
The filter clause contains filter queries that place documents into either a "yes" or a "no" category.
For example, let's say you are looking for headlines published within a certain time range. Each document either falls within this range (yes) or does not (no).
The filter clause only includes documents that fall into the yes category.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"query": {
"bool": {
"must": [
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
],
"filter":{
"range":{
"date": {
"gte": "Enter lowest value of the range here",
"lte": "Enter highest value of the range here"
}
}
}
}
}
}
Let's say we wanted to retrieve hits that must include the phrase "Michelle Obama" in the field headline. Among these hits, we only want documents published between "2014-03-25" and "2016-03-25".
Your bool query will look something like this.
Example:
GET news_headlines/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"headline": "Michelle Obama"
}
}
],
"filter":{
"range":{
"date": {
"gte": "2014-03-25",
"lte": "2016-03-25"
}
}
}
}
}
}
Let's copy and paste the query into the console and send it.
Expected response from Elasticsearch:
You will see 33 hits returned. All hits contain the phrase "Michelle Obama" in the field headline. All hits were published within the date range we specified under the filter clause.
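The "gte" and "lte" bounds stand for "greater than or equal to" and "less than or equal to", so both ends of the range are included. The yes/no nature of this filter can be sketched in Python as follows. The helper name in_range is made up for this illustration.

```python
from datetime import date


def in_range(value, gte, lte):
    """Inclusive range check, mirroring the "gte" and "lte" bounds."""
    return date.fromisoformat(gte) <= date.fromisoformat(value) <= date.fromisoformat(lte)


# Each date gets a plain yes (True) or no (False) answer.
for published in ("2014-03-25", "2015-06-01", "2016-03-26"):
    print(published, in_range(published, "2014-03-25", "2016-03-25"))
```

A document published exactly on "2014-03-25" is a yes, while one published a day after the upper bound is a no.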
All right. Now that we have mastered the bool query, let's figure out how we can fine-tune the relevance of bool queries!
Fine-tuning the relevance of bool queries
There are many ways you can fine-tune the relevance of bool queries.
One of the ways is to add multiple queries under the should clause.
Adding multiple queries under the should clause
This approach ensures that you maintain a high recall but also offers a way to present more precise search results at the top of your search results.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"query": {
"bool": {
"must": [
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
],
"should": [
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
},
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
},
{
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
}
]
}
}
}
Let's say you want to run a search for news headlines with the phrase "Michelle Obama" in the field headline. But you want to favor articles that mention her memoir "Becoming", and terms like "women" and "empower".
To do this, you can add multiple queries to the should clause.
This will cast a wider net because none of the queries in the should clause need to match. However, the documents that do match the queries under the should clause will be given a higher score and placed higher in the search results.
This approach allows you to maintain a high recall but also gives you a way to customize the precision of top hits.
Example:
GET news_headlines/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"headline": "Michelle Obama"
}
}
],
"should": [
{
"match": {
"headline": "Becoming"
}
},
{
"match": {
"headline": "women"
}
},
{
"match": {
"headline": "empower"
}
}
]
}
}
}
Expected response from Elasticsearch:
Adding many queries under the should clause did not reduce the number of hits (207). However, it favored documents that match the queries in the should clause and improved the precision of top search results.
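When the list of "nice to have" terms changes often, it can be handy to generate the should clause from a list instead of typing each query by hand. The sketch below builds the same request body as the example above. The function name bool_body and its parameters are hypothetical.

```python
import json


def bool_body(must_phrase, should_terms, field="headline"):
    """Build a bool query: one required phrase plus optional 'nice to have' terms."""
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {field: must_phrase}}],
                "should": [{"match": {field: term}} for term in should_terms],
            }
        }
    }


body = bool_body("Michelle Obama", ["Becoming", "women", "empower"])
print(json.dumps(body, indent=2))
```

Adding or removing a term from the list changes only the ranking of the hits, since the must clause alone decides which documents are returned.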
Congratulations! You have mastered the skills to:
1) write queries designed to search text fields
2) build a combination of queries to answer more complex questions
3) fine-tune the relevance of search results
The concepts covered in this blog are just the tip of the iceberg when it comes to all the awesome queries you can write with Elasticsearch.
Go explore on your own and see what you can do with Elasticsearch queries!
Top comments (6)
Let's suppose I have multiple indices with dates in their names (e.g. news_headlines-11.04.22, news_headlines-10.04.22, and so on). How can I search all the results for news_headlines-dates? How can I scan all the data?
Hey @vivek Gupta!
Are you asking if there is a way to search multiple indices at one time?
If that is what you are asking, you can set an alias for the group of indices you are looking to search. Then search against the alias in place of index name.
Check out this documentation for more details!
elastic.co/guide/en/elasticsearch/...
Thanks for the guidance. I am able to create an alias for a group of indices. Now, I am not aware of how to index or route the alias and what query I need to write to get the logs for that alias.
I am so glad you found my response helpful and you were able to create an alias for group of indices!
For your follow up question, would you please post that on discuss.elastic.co?
It's a great question that our community could really benefit from. Also, it seems like a question that needs more clarification to answer. The Discuss forum will be a better place for that! :)
Thank you SOOO MUCH ,This was SUPER HELPFUL : )
You are so welcome @sainathwingman! :)