DEV Community

tackme

Implementing keyword search with field-level boosting in Sitecore

I had the opportunity to implement keyword search with field-level boosting at work. It was my first time building such a feature and I had a hard time with it, so I hope this post helps if you are building something similar.

NOTE:
Sitecore has a built-in feature for field-level boosting, but it is not supported until Sitecore 9.4, so in this post boosting is implemented manually in code.

UPDATE (2020/2/5):
I made a library that generates efficient keyword search queries with field-level boosting. If you are interested, see the following link.

Problem

My first attempt looked like this:

public SearchResults<SearchResultItem> Search(string[] keywords)
{
    using (var context = index.CreateSearchContext())
    {
        // the "title" field contains all keywords. (boost: 10)
        var titlePred = PredicateBuilder.True<SearchResultItem>(); 
        foreach (var keyword in keywords) {
            titlePred = titlePred.And(item => item["title"].Contains(keyword).Boost(10));
        }

        // OR the "body" field contains all keywords. (boost: 5)
        var bodyPred = PredicateBuilder.True<SearchResultItem>();  
        foreach (var keyword in keywords) {
            bodyPred = bodyPred.And(item => item["body"].Contains(keyword).Boost(5));
        }

        var keywordSearchPred = PredicateBuilder
            .False<SearchResultItem>()
            .Or(titlePred)
            .Or(bodyPred);

        // GetResults() executes the query and returns the hits with their scores
        return context.GetQueryable<SearchResultItem>().Where(keywordSearchPred).GetResults();
    }
}

This worked well at first, but I later noticed that it fails when the keywords are spread across different fields: the query only matches when a single field contains every keyword.

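Here is an example of an item that should match but doesn't:

  • Keywords: Sitecore, Experience, Platform

Field | Value
title | What means Sitecore "XP"?
body  | XP stands for eXperience Platform.

"Sitecore" appears only in title, while "Experience" and "Platform" appear only in body, so neither field contains all three keywords and the item is not returned.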
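To make the failure concrete, here is a minimal in-memory sketch in plain C# (top-level statements, C# 9 / .NET 5 or later) that evaluates both conditions against the example item. It does not touch Sitecore: the dictionary stands in for the indexed fields, and treating Contains as a case-insensitive substring test is an assumption, since the real behavior depends on the Solr field type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The example item as an in-memory field map (no Sitecore involved).
var item = new Dictionary<string, string>
{
    ["title"] = "What means Sitecore \"XP\"?",
    ["body"] = "XP stands for eXperience Platform.",
};
var keywords = new[] { "Sitecore", "Experience", "Platform" };

// Assumption: Contains behaves like a case-insensitive substring test.
bool Has(string field, string keyword) =>
    item[field].Contains(keyword, StringComparison.OrdinalIgnoreCase);

// First attempt: a single field must contain every keyword.
bool perFieldMatch = item.Keys.Any(field => keywords.All(keyword => Has(field, keyword)));

// Intended behavior: every keyword must appear in at least one field.
bool crossFieldMatch = keywords.All(keyword => item.Keys.Any(field => Has(field, keyword)));

Console.WriteLine($"per-field: {perFieldMatch}, cross-field: {crossFieldMatch}");
// per-field: False, cross-field: True
```

The first query implements the per-field check, which is why this item is missed.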

A simple fix is to enumerate every way of assigning a field to each keyword (permutations with repetition of the fields) and combine the resulting AND-clauses with OR. For the example above, this generates the following conditions:

   (item["title"].Contains("Sitecore").Boost(10) && item["title"].Contains("Experience").Boost(10) && item["title"].Contains("Platform").Boost(10))
|| (item["title"].Contains("Sitecore").Boost(10) && item["title"].Contains("Experience").Boost(10) && item["body"].Contains("Platform").Boost(5))
|| (item["title"].Contains("Sitecore").Boost(10) && item["body"].Contains("Experience").Boost(5) && item["title"].Contains("Platform").Boost(10))
|| (item["title"].Contains("Sitecore").Boost(10) && item["body"].Contains("Experience").Boost(5) && item["body"].Contains("Platform").Boost(5))
|| (item["body"].Contains("Sitecore").Boost(5) && item["title"].Contains("Experience").Boost(10) && item["title"].Contains("Platform").Boost(10))
|| (item["body"].Contains("Sitecore").Boost(5) && item["title"].Contains("Experience").Boost(10) && item["body"].Contains("Platform").Boost(5))
|| (item["body"].Contains("Sitecore").Boost(5) && item["body"].Contains("Experience").Boost(5) && item["title"].Contains("Platform").Boost(10))
|| (item["body"].Contains("Sitecore").Boost(5) && item["body"].Contains("Experience").Boost(5) && item["body"].Contains("Platform").Boost(5))

Too long! The number of Contains conditions is given by the following formula:

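$$ N = k \cdot f^{k} $$

where f is the number of target fields and k is the number of keywords: there are f^k AND-clauses (one field chosen per keyword), each containing k conditions.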

With 5 target fields and 3 keywords, 3 × 5³ = 375 conditions are generated, so in many cases the query ends up exceeding the request size limit.
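The growth is easy to reproduce. This stand-alone C# sketch (top-level statements, .NET 5+) generates the naive query as text for the two-field example and checks the count against the closed form; `NaiveConditionCount` is a hypothetical helper name, and nothing here calls Sitecore.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Field boosts and keywords from the example above.
var fields = new Dictionary<string, int> { ["title"] = 10, ["body"] = 5 };
var keywords = new[] { "Sitecore", "Experience", "Platform" };

// Permutations with repetition: choose one field per keyword, extending keyword by keyword.
List<List<string>> assignments = new() { new List<string>() };
for (int slot = 0; slot < keywords.Length; slot++)
{
    assignments = assignments
        .SelectMany(prefix => fields.Keys, (prefix, field) => new List<string>(prefix) { field })
        .ToList();
}

// Each assignment becomes one AND-clause; the clauses are joined with OR.
var clauses = assignments
    .Select(assignment => "(" + string.Join(" && ", assignment.Select((field, i) =>
        $"item[\"{field}\"].Contains(\"{keywords[i]}\").Boost({fields[field]})")) + ")")
    .ToList();

Console.WriteLine("   " + string.Join("\n|| ", clauses));
Console.WriteLine($"{clauses.Count} clauses, {clauses.Count * keywords.Length} conditions");

// Closed form k * f^k: 5 fields and 3 keywords give 375 conditions.
static long NaiveConditionCount(int fieldCount, int keywordCount) =>
    keywordCount * (long)Math.Pow(fieldCount, keywordCount);
Console.WriteLine(NaiveConditionCount(5, 3)); // 375
```

Each additional keyword multiplies the number of clauses by the number of fields, which is what makes this approach impractical.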

Solution

To solve the problem, split the query into two parts: ① checking whether the keywords are contained in the item, and ② applying boost values to the results.

For part ①, create a "contents" field that holds the concatenated values of all the target fields. Using this field, the query can be written as follows:

item["contents"].Contains("Sitecore") && item["contents"].Contains("Experience") && item["contents"].Contains("Platform")

It's very simple.
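Here is a minimal in-memory sketch of part ① in C# (top-level statements, .NET 5+), assuming the example item's two fields as the targets and a case-insensitive substring test in place of the index's Contains. `MatchesAll` is a hypothetical helper, not a Sitecore API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Target field values of the example item (an in-memory stand-in for the indexed item).
var fieldValues = new[]
{
    "What means Sitecore \"XP\"?",          // title
    "XP stands for eXperience Platform.",   // body
};

// Part ①, step 1: the "contents" value is every target field joined with spaces.
string contents = string.Join(" ", fieldValues);

// Part ①, step 2: every keyword must occur somewhere in "contents" (no boosting).
// Assumption: Contains behaves like a case-insensitive substring test.
bool MatchesAll(IEnumerable<string> keywords) =>
    keywords.All(keyword => contents.Contains(keyword, StringComparison.OrdinalIgnoreCase));

Console.WriteLine(MatchesAll(new[] { "Sitecore", "Experience", "Platform" })); // True
Console.WriteLine(MatchesAll(new[] { "Sitecore", "Lucene" }));                 // False
```

Joining with a space keeps a keyword from matching across the boundary between two fields, as long as the keyword itself contains no space.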

Part ② is built from every combination of field and keyword: each field gets its boost when it contains a keyword, and all of these boosted conditions are combined with OR.

   item["title"].Contains("Sitecore").Boost(10)
|| item["title"].Contains("Experience").Boost(10)
|| item["title"].Contains("Platform").Boost(10)
|| item["body"].Contains("Sitecore").Boost(5)
|| item["body"].Contains("Experience").Boost(5)
|| item["body"].Contains("Platform").Boost(5)

Finally, the whole query is obtained by combining ① and ② with AND. This query has far fewer conditions than the previous one.
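Counting the conditions makes the difference concrete. With f target fields and k keywords, ① has one condition per keyword and ② has one per field/keyword pair:

$$ N_{\text{new}} = k + f \cdot k = k\,(f + 1) $$

For f = 5 and k = 3 this gives 3 × 6 = 18 conditions, compared with 3 × 5³ = 375 for the permutation approach, and it now grows linearly in both f and k.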


This query actually works well. If ① is true, every keyword appears in at least one target field, so at least one condition in ② is also true and the whole query matches; ② then only affects the score, since each matching field/keyword pair adds its field's boost. If ① is false, the whole query is false regardless of ②.
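To see how ① and ② interact, here is a rough in-memory model in C# (top-level statements, .NET 5+). It is a sketch under stated assumptions, not Solr's behavior: matching is treated as a case-insensitive substring test, and the score is modeled as a plain sum of boosts, whereas Solr's relevance score also depends on term statistics. `Score` and the `bodyOnly` item are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Field boosts as in the example (title: 10, body: 5).
var weights = new Dictionary<string, int> { ["title"] = 10, ["body"] = 5 };

// Assumption: Contains behaves like a case-insensitive substring test.
static bool Has(string text, string keyword) =>
    text.Contains(keyword, StringComparison.OrdinalIgnoreCase);

// Hypothetical helper: null if part ① fails, otherwise the summed boosts from part ②.
// A plain sum is a simplification; Solr's relevance score also uses term statistics.
int? Score(IReadOnlyDictionary<string, string> item, string[] keywords)
{
    // ①: every keyword appears in the concatenated target fields.
    string contents = string.Join(" ", weights.Keys.Select(f => item.GetValueOrDefault(f, "")));
    if (!keywords.All(k => Has(contents, k)))
    {
        return null;
    }

    // ②: each matching field/keyword pair adds that field's boost.
    return weights.Sum(w => keywords.Count(k => Has(item.GetValueOrDefault(w.Key, ""), k)) * w.Value);
}

var example = new Dictionary<string, string>
{
    ["title"] = "What means Sitecore \"XP\"?",
    ["body"] = "XP stands for eXperience Platform.",
};
var bodyOnly = new Dictionary<string, string>
{
    ["title"] = "Release notes",
    ["body"] = "An overview of Sitecore Experience Platform.",
};
var searchKeywords = new[] { "Sitecore", "Experience", "Platform" };

Console.WriteLine(Score(example, searchKeywords));   // 20 (title: 1 x 10, body: 2 x 5)
Console.WriteLine(Score(bodyOnly, searchKeywords));  // 15 (body: 3 x 5)
Console.WriteLine(Score(example, new[] { "Sitecore", "Lucene" })?.ToString() ?? "no match"); // no match
```

The cross-field example now matches, and in this model it outranks an item that has all three keywords only in its body, because one of its matches is in the higher-weighted title.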

Implementation

First, create the "contents" field used in part ①. It can be implemented as a computed index field in Sitecore.

Here is sample code:

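using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

public class ContentsField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        // Only Sitecore items can contribute a value
        if (!(indexable is SitecoreIndexableItem indexableItem))
        {
            return null;
        }

        // The fields included in keyword search
        var targetFields = new[] { "Title", "Body", "Summary", "Category", "Author" };

        // Concatenate the non-empty values of all target fields, separated by spaces
        return string.Join(" ", targetFields
            .Select(fieldName => indexableItem.Item[fieldName])
            .Where(value => !string.IsNullOrEmpty(value)));
    }
}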

This class has to be registered in the configuration. Here is a patch file that registers it:

<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
  <sitecore role:require="Standalone or ContentManagement or ContentDelivery" search:require="solr">
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration type="Sitecore.ContentSearch.SolrProvider.SolrIndexConfiguration, Sitecore.ContentSearch.SolrProvider">
          <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
            <fields  hint="raw:AddComputedIndexField">
              <!-- Add the contents field (replace NamespaceTo and Assembly with your own namespace and assembly name) -->
              <field fieldName="contents" returnType="string" type="NamespaceTo.ContentsField, Assembly"/>
            </fields>
          </documentOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

Then run "Populate Solr Managed Schema" and rebuild the indexes from the Sitecore Control Panel. The "contents" field will then be generated in sitecore_master_index (and in the web and core indexes).

The keyword search itself can be written as follows:

using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;

public class KeywordSearchApi
{
    // Target fields and their boost values for keyword search
    // (in practice, consider loading these from an item or configuration)
    protected static readonly IReadOnlyDictionary<string, int> TargetFields = new Dictionary<string, int>()
    {
        ["title"] = 10,
        ["body"] = 8,
        ["summary"] = 6,
        ["category"] = 2,
        ["author"] = 1
    };

    public static SearchResults<SearchResultItem> Search(ICollection<string> keywords)
    {
        var index = ContentSearchManager.GetIndex("sitecore_master_index");

        using (var context = index.CreateSearchContext())
        {
            // Predicate for ①: every keyword must appear in "contents" (no boosting)
            var matchPred = keywords
                .Aggregate(
                    PredicateBuilder.True<SearchResultItem>(),
                    (acc, keyword) => acc.And(item => item["contents"].Contains(keyword)));

            // Predicate for ②: one boosted condition per field/keyword pair
            var boostPred = TargetFields.Keys
                .SelectMany(_ => keywords, (field, keyword) => (field, keyword, boost: TargetFields[field]))
                .Aggregate(
                    // Always-true seed with zero boost, so ② never excludes a document on its own
                    PredicateBuilder.Create<SearchResultItem>(item => item.Name.MatchWildcard("*").Boost(0)),
                    (acc, pair) => acc.Or(item => item[pair.field].Contains(pair.keyword).Boost(pair.boost)));

            return context.GetQueryable<SearchResultItem>()
                .Filter(matchPred)   // ① narrows the results without affecting the score
                .Where(boostPred)    // Use 'Where' for ② because 'Filter' ignores boost values
                .OrderByDescending(item => item["score"])
                .GetResults();
        }
    }
}
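The only part of `Search` that is plain LINQ to Objects is the field/keyword cross product that feeds `boostPred`. This stand-alone C# sketch (top-level statements, .NET 5+, no Sitecore references) reproduces it with the same weights and prints the boosted conditions it produces, which is a quick way to sanity-check the configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same weights as TargetFields in KeywordSearchApi.
var targetFields = new Dictionary<string, int>
{
    ["title"] = 10, ["body"] = 8, ["summary"] = 6, ["category"] = 2, ["author"] = 1,
};
var keywords = new[] { "Sitecore", "Experience", "Platform" };

// The same cross product Search builds before aggregating boostPred:
// one (field, keyword, boost) tuple per field/keyword pair.
var pairs = targetFields.Keys
    .SelectMany(_ => keywords, (field, keyword) => (field, keyword, boost: targetFields[field]))
    .ToList();

foreach (var (field, keyword, boost) in pairs)
{
    Console.WriteLine($"item[\"{field}\"].Contains(\"{keyword}\").Boost({boost})");
}

Console.WriteLine(pairs.Count); // 15
```

Each printed line corresponds to one `.Or(...)` appended to `boostPred`, so five fields and three keywords give 15 boosted conditions in addition to the always-true seed.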

If you use Sitecore PowerShell Extensions, you can quickly check this method with the following script:

$keywords = "Sitecore","Experience","Platform"

[NamespaceTo.KeywordSearchApi]::Search($keywords)

Conclusion

This solution is only one of many possible approaches. If you have a smarter idea, let me know in the comments or in your own post.

Happy searching!
