I recently had the opportunity to implement keyword search with field-level boosting at work. It was my first time building this kind of functionality, and I struggled with it. If you are building something similar, this post may help you.
NOTE:
Sitecore has a built-in feature for field-level boosting, but it is not supported until Sitecore 9.4, so in this post boosting is implemented manually in code.
UPDATE (2020/2/5):
I made a library that generates an efficient keyword search query with field-level boosting. If you are interested, see the following link.
Problem
My first attempt looked like this:
public SearchResults<SearchResultItem> Search(string[] keywords)
{
    // "index" is assumed to be an ISearchIndex field initialized elsewhere in the class.
    using (var context = index.CreateSearchContext())
    {
        // the "title" field contains all keywords. (boost: 10)
        var titlePred = PredicateBuilder.True<SearchResultItem>();
        foreach (var keyword in keywords) {
            titlePred = titlePred.And(item => item["title"].Contains(keyword).Boost(10));
        }
        // OR the "body" field contains all keywords. (boost: 5)
        var bodyPred = PredicateBuilder.True<SearchResultItem>();
        foreach (var keyword in keywords) {
            bodyPred = bodyPred.And(item => item["body"].Contains(keyword).Boost(5));
        }
        var keywordSearchPred = PredicateBuilder
            .False<SearchResultItem>()
            .Or(titlePred)
            .Or(bodyPred);
        return context.GetQueryable<SearchResultItem>().Where(keywordSearchPred).GetResults();
    }
}
This worked well at first, but I noticed that it fails when the keywords are spread across multiple fields, because each predicate requires a single field to contain every keyword.
Here is an example of an item that should match but does not:
- Keywords: Sitecore, Experience, Platform
Field | Value |
---|---|
title | What means Sitecore "XP"? |
body | XP stands for eXperience Platform. |
A simple solution is to enumerate every way of assigning a field to each keyword (permutations with repetition) and combine the resulting conditions with OR. This approach generates the following query:
(item["title"].Contains("Sitecore").Boost(10) && item["title"].Contains("Experience").Boost(10) && item["title"].Contains("Platform").Boost(10))
|| (item["title"].Contains("Sitecore").Boost(10) && item["title"].Contains("Experience").Boost(10) && item["body"].Contains("Platform").Boost(5))
|| (item["title"].Contains("Sitecore").Boost(10) && item["body"].Contains("Experience").Boost(5) && item["title"].Contains("Platform").Boost(10))
|| (item["title"].Contains("Sitecore").Boost(10) && item["body"].Contains("Experience").Boost(5) && item["body"].Contains("Platform").Boost(5))
|| (item["body"].Contains("Sitecore").Boost(5) && item["title"].Contains("Experience").Boost(10) && item["title"].Contains("Platform").Boost(10))
|| (item["body"].Contains("Sitecore").Boost(5) && item["title"].Contains("Experience").Boost(10) && item["body"].Contains("Platform").Boost(5))
|| (item["body"].Contains("Sitecore").Boost(5) && item["body"].Contains("Experience").Boost(5) && item["title"].Contains("Platform").Boost(10))
|| (item["body"].Contains("Sitecore").Boost(5) && item["body"].Contains("Experience").Boost(5) && item["body"].Contains("Platform").Boost(5))
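Too long! The number of Contains conditions is given by the following formula:
(number of fields)^(number of keywords) × (number of keywords)
The example above has 2 fields and 3 keywords, so it needs 2^3 × 3 = 24 conditions. If you have 5 target fields and 3 keywords, 5^3 × 3 = 375 conditions will be generated. So in many cases, the query ends up exceeding the request size limit.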
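To make that growth concrete, here is a minimal sketch that enumerates the AND-groups of the long query above and counts their Contains conditions. It is plain C# and does not touch any Sitecore API; the names `NaiveQuerySize`, `FieldAssignments`, and `CountConditions` are made up for this illustration.

```csharp
using System.Collections.Generic;

public static class NaiveQuerySize
{
    // Yields one field assignment per AND-group: result[k] is the field checked for keyword k.
    public static IEnumerable<string[]> FieldAssignments(IReadOnlyList<string> fields, int keywordCount)
    {
        long groupCount = 1;
        for (var i = 0; i < keywordCount; i++)
        {
            groupCount *= fields.Count; // fields^keywords groups in total
        }

        for (long n = 0; n < groupCount; n++)
        {
            var assignment = new string[keywordCount];
            var rest = n;
            // Decode n as a base-(fields.Count) number; the last keyword is the lowest digit.
            for (var k = keywordCount - 1; k >= 0; k--)
            {
                assignment[k] = fields[(int)(rest % fields.Count)];
                rest /= fields.Count;
            }
            yield return assignment;
        }
    }

    // Total number of Contains conditions: one per keyword in every group.
    public static long CountConditions(IReadOnlyList<string> fields, int keywordCount)
    {
        long count = 0;
        foreach (var assignment in FieldAssignments(fields, keywordCount))
        {
            count += assignment.Length;
        }
        return count;
    }
}
```

With the fields `title` and `body` and 3 keywords, `FieldAssignments` yields the 8 groups listed above and `CountConditions` returns 24. With 5 fields and 3 keywords it returns 375.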
Solution
Now, to solve the problem, split the query into two parts: ① checking whether all the keywords are contained, and ② applying boost values to the results.
For part ①, create a "contents" field that holds the concatenated values of all the target fields. Using this field, the query can be written as follows:
item["contents"].Contains("Sitecore") && item["contents"].Contains("Experience") && item["contents"].Contains("Platform")
It's very simple.
Part ② is composed of all pairs of fields and keywords. Boost each field when it contains a keyword, and combine all the boosting conditions with OR:
item["title"].Contains("Sitecore").Boost(10)
|| item["title"].Contains("Experience").Boost(10)
|| item["title"].Contains("Platform").Boost(10)
|| item["body"].Contains("Sitecore").Boost(5)
|| item["body"].Contains("Experience").Boost(5)
|| item["body"].Contains("Platform").Boost(5)
Finally, we get the whole query by combining ① and ② with AND. This query has far fewer conditions than the previous one: with 5 target fields and 3 keywords, it needs 3 + 5 × 3 = 18 conditions instead of 375.
This query actually works well. When part ① evaluates to true, every keyword is contained in at least one of the target fields. So at least one condition in part ② is also true, and the whole query returns true. When ① is false, the whole query is naturally false.
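The two-part idea can be tried out without an index. The following is a rough in-memory sketch, not Sitecore or Solr code: a document is just a dictionary of field values, part ① is a containment check on the concatenated values, and part ② adds a field's boost for every keyword found in that field. Summing boosts is only a stand-in for relevance scoring (Solr computes its scores differently), and the names `TwoPartQuerySketch` and `Evaluate` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoPartQuerySketch
{
    private static bool ContainsIgnoreCase(string text, string keyword)
    {
        return text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Returns null when part ① fails, otherwise the summed boosts from part ②.
    public static int? Evaluate(
        IReadOnlyDictionary<string, string> doc,
        IReadOnlyDictionary<string, int> boosts,
        IReadOnlyCollection<string> keywords)
    {
        // Part ①: every keyword must appear somewhere in the concatenated target fields.
        var contents = string.Join(" ", boosts.Keys.Select(f => doc.TryGetValue(f, out var v) ? v : ""));
        if (!keywords.All(k => ContainsIgnoreCase(contents, k)))
        {
            return null;
        }

        // Part ②: each field/keyword hit contributes that field's boost.
        return boosts.Sum(b => doc.TryGetValue(b.Key, out var value)
            ? b.Value * keywords.Count(k => ContainsIgnoreCase(value, k))
            : 0);
    }
}
```

For the example item above, with boosts of 10 for `title` and 5 for `body`, the keywords Sitecore, Experience, and Platform all pass part ①, and part ② gives 10 (Sitecore in the title) plus 5 + 5 (Experience and Platform in the body). An item that lacks any one keyword is rejected regardless of how well the others match.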
Implementation
First, we need to create the "contents" field used in part ①. This field can be created as a computed field in Sitecore.
Here is some sample code:
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

public class ContentsField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        // Skip anything that is not a Sitecore item
        if (!(indexable is SitecoreIndexableItem item))
        {
            return null;
        }

        // The fields for keyword search
        var targetFields = new[] { "Title", "Body", "Summary", "Category", "Author" };

        // Concatenate the values of all the target fields
        return string.Join(" ", targetFields.Select(fieldName => item.Item[fieldName]));
    }
}
This class has to be registered in the configuration. Here is a patch file that registers it:
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
<sitecore role:require="Standalone or ContentManagement or ContentDelivery" search:require="solr">
<contentSearch>
<indexConfigurations>
<defaultSolrIndexConfiguration type="Sitecore.ContentSearch.SolrProvider.SolrIndexConfiguration, Sitecore.ContentSearch.SolrProvider">
<documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
<fields hint="raw:AddComputedIndexField">
<!-- Add contents field -->
<field fieldName="contents" returnType="string" type="NamespaceTo.ContentsField, Assembly"/>
</fields>
</documentOptions>
</defaultSolrIndexConfiguration>
</indexConfigurations>
</contentSearch>
</sitecore>
</configuration>
Then, run "Populate Solr Managed Schema" and "Rebuild Index" in Sitecore. The "contents" field will be generated in sitecore_master_index (and in the web and core indexes).
The main program of the keyword search can be written as follows:
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;

public class KeywordSearchApi
{
    // The target fields and their boost values for keyword search (you'd better load these from an item or configuration)
    protected static readonly IReadOnlyDictionary<string, int> TargetFields = new Dictionary<string, int>()
    {
        ["title"] = 10,
        ["body"] = 8,
        ["summary"] = 6,
        ["category"] = 2,
        ["author"] = 1
    };

    public static SearchResults<SearchResultItem> Search(ICollection<string> keywords)
    {
        var index = ContentSearchManager.GetIndex("sitecore_master_index");
        using (var context = index.CreateSearchContext())
        {
            // The predicate for ①
            var matchPred = keywords
                .Aggregate(
                    PredicateBuilder.True<SearchResultItem>(),
                    (acc, keyword) => acc.And(item => item["contents"].Contains(keyword))); // without boosting

            // The predicate for ②
            var boostPred = TargetFields.Keys
                // Make all pairs of field/keyword with boosting value
                .SelectMany(_ => keywords, (field, keyword) => (field, keyword, boost: TargetFields[field]))
                .Aggregate(
                    PredicateBuilder.Create<SearchResultItem>(item => item.Name.MatchWildcard("*").Boost(0)), // always true
                    (acc, pair) => acc.Or(item => item[pair.field].Contains(pair.keyword).Boost(pair.boost))); // with boosting

            return context.GetQueryable<SearchResultItem>()
                .Filter(matchPred)
                .Where(boostPred) // Use 'Where' here because 'Filter' ignores the boosting values.
                .OrderByDescending(item => item["score"])
                .GetResults();
        }
    }
}
If you use Sitecore PowerShell Extensions, you can easily try this method with the following script:
[string[]]$keywords = "Sitecore","Experience","Platform"
[NamespaceTo.KeywordSearchApi]::Search($keywords)
Conclusion
This solution is just one of many possible approaches. If you have a smarter idea, let me know in the comments or in a post of your own.
Happy searching!