Have you ever wondered why, after you update a document in an ElasticSearch index, that document's score goes down? Even when you have one node and one shard, and the search query does not change?
I searched all over the internet for several days for an answer to this question, but found no straightforward one. To me, lower scores on freshly updated documents are counterintuitive, and they may be for you too.
I have bad news: it is mostly unavoidable, and there is a reason for it. Fortunately, the scores return to their old values a few minutes after the update. This article pieces together a few bits of information to explain why, as best I can.
What does the ElasticSearch documentation say about it?
The only clear information about scoring problems in ElasticSearch is on the "Getting consistent scoring" documentation page. However, it focuses on setups with many nodes or shards, not on the simple case of an update lowering a document's score. It even states that:
"If you have a small dataset, the easiest way to work around this issue is to index everything into an index that has a single shard (index.number_of_shards: 1), which is the default. Then index statistics will be the same for all documents and scores will be consistent."
This is not entirely true: old versions of documents still affect the scores of the updated documents for a short time after the update (until the next shard merge). That leads to another question:
How does updating or deleting a document work under the hood?
ElasticSearch does not remove a document as soon as the user requests it; the document is only marked as deleted and stays in the shard until a merge removes deleted documents for good. The same happens on an update: the old version is marked as deleted and the new one becomes available for search. Until the merge, the deleted version is still counted in the shard's term statistics, so the searched term looks more common than it really is and the updated document scores lower.
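To see how a stale copy can lower the score, here is a minimal Python sketch of the inverse document frequency (IDF) part of BM25, the default similarity in ElasticSearch since version 5.0, using the formula from Lucene's BM25Similarity. It assumes, as a simplification, that the deleted old version still counts toward both the document count and the term's document frequency until a merge removes it. The shard sizes are made up, and the length-normalization part of the score is left out.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of Lucene's BM25: ln(1 + (N - n + 0.5) / (n + 0.5)).

    doc_count (N): documents in the shard that have the field
    doc_freq  (n): documents in the shard that contain the term
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical shard: 10 documents, and only one of them contains the searched term.
before = bm25_idf(doc_count=10, doc_freq=1)

# After that one document is updated, its old version is only marked as deleted.
# Assumption: until a merge removes it, it is still counted in both statistics.
after = bm25_idf(doc_count=11, doc_freq=2)

print(f"IDF before update: {before:.3f}")  # 1.992
print(f"IDF after update:  {after:.3f}")   # 1.569
```

With the stale copy counted, the term looks twice as common, so its IDF, and with it the document's score, drops even though neither the document's content nor the query changed.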
You can find a more in-depth explanation in this article on Medium about the ElasticSearch document life cycle. Meanwhile, we can ask another question to tackle this mystery:
Would shard refresh help?
Unfortunately, no. Refreshing the shard after an update only makes the change visible to search sooner; it does not remove the old versions of documents. Let's try another approach:
Would deleting the document and then adding it again help?
Here the answer is also no. The document version is bumped once by the delete and once by the re-index, and, as described above, the deleted copy is only marked as deleted. Even when index.gc_deletes (the length of time a deleted document's version number remains available for further versioned operations) is set to 0, the deleted version is fully removed only at the next shard merge. (One more note: do not send the value as '0s', as it does not work; it has to be the number 0.)
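If you still want to try the index.gc_deletes change, the setting can be updated through the index settings API (PUT /<index>/_settings). The Python sketch below only builds that request with the standard library and does not send it; the host http://localhost:9200 and the index name my-index are hypothetical placeholders. Following the note above, it sends the value as the number 0 rather than the string '0s'.

```python
import json
import urllib.request

# Hypothetical placeholders; replace with your own cluster address and index.
ES_URL = "http://localhost:9200"
INDEX = "my-index"


def gc_deletes_request(index: str) -> urllib.request.Request:
    """Build (but do not send) a request that sets index.gc_deletes to 0."""
    # Numeric 0, not the string "0s", per the author's observation.
    body = json.dumps({"index": {"gc_deletes": 0}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = gc_deletes_request(INDEX)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(req)
```

Keep in mind that, as explained above, this alone does not bring the score back; the deleted version still waits for a merge.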
Can I do anything about it?
If you really, really, really must, you can, but the solution can reduce ElasticSearch's efficiency a lot. You can force merge the shards after every document update to remove old versions of updated documents as well as deleted documents. This can be costly: ElasticSearch normally runs shard merges when it does not have many requests to handle, and because of that it handles requests much faster. You should always wait for the "natural" shard merge to make the most of your ElasticSearch instance.
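For completeness, a force merge is triggered with the _forcemerge API. The minimal Python sketch below builds (but does not send) such a request with only_expunge_deletes=true, which restricts the merge to segments containing deleted documents. The host and index name are hypothetical placeholders, and whether this parameter removes every stale copy in your cluster and version is an assumption worth checking against the docs.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholders; replace with your own cluster address and index.
ES_URL = "http://localhost:9200"
INDEX = "my-index"


def force_merge_request(index: str) -> urllib.request.Request:
    """Build (but do not send) a force merge request that targets deleted documents."""
    query = urllib.parse.urlencode({"only_expunge_deletes": "true"})
    return urllib.request.Request(f"{ES_URL}/{index}/_forcemerge?{query}", method="POST")


req = force_merge_request(INDEX)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Running this after every update is exactly the costly pattern described above; the ElasticSearch documentation also recommends force merging only indices you have finished writing to.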
I hope this article helped you understand your search engine better :)