Where to search
Now that you have decided what type of query you want to execute (and, phrase, or), the next step is to decide where you want this search to occur. When asked "where do you want to search?", people usually reply "everywhere, of course!". Yet it is important to step back and consider whether that is really what you want. Take a look at the query below:
Where in the document do you believe this query will try to find the terms financial and systems? The answer is: everywhere.
If you do not define exactly which fields should be searched, cX::search searches _all by default, which would translate this query to:
The _all option means that every field of each document is used for the search (with the exception of fields explicitly declared using indexOps:"storeOnly"). This means that cX::search will look for the terms financial and systems in the body field, but also in fields such as related_content or even unitsInStock, which may not be exactly what you are looking for.
The best way to avoid future problems with some undesirable fields being matched is to be explicit about which fields you want to search on. This can be done by adding the desired list of fields before the query term, as shown below:
In the example above we are being very clear about which fields should be used when looking for the query terms defined by the user, which makes it a lot easier to debug and answer questions like "why was this document returned in the results?".
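The difference between searching everywhere and searching a named list of fields can be illustrated with a small Python sketch. This is a toy model, not cX::search code: the `matching_fields` helper and the sample document values are hypothetical, and only the field names come from the text above.

```python
import re


def tokenize(value):
    """Lowercase a field value and split it into word tokens."""
    return re.findall(r"\w+", str(value).lower())


def matching_fields(document, terms, fields=None):
    """Map each searched field to the query terms found in it.

    fields=None mimics the _all behaviour: every field is searched.
    """
    searched = list(document) if fields is None else fields
    hits = {}
    for name in searched:
        tokens = tokenize(document.get(name, ""))
        found = [term for term in terms if term in tokens]
        if found:
            hits[name] = found
    return hits


# Hypothetical document; the values are invented for illustration.
document = {
    "title": "Quarterly report",
    "body": "Overview of our financial systems",
    "related_content": "See also: financial planning, legacy systems",
    "unitsInStock": 12,
}

terms = ["financial", "systems"]

# Searching everywhere also matches related_content.
everywhere = matching_fields(document, terms)

# Searching only the listed fields keeps the match explainable.
explicit = matching_fields(document, terms, fields=["title", "body"])

print(everywhere)
print(explicit)
```

With the explicit field list, the only possible answer to "why was this document returned?" is a match in title or body, which is what makes the results easier to debug.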
You know how you want to search (and, phrase, or) and also where to search (title, body, etc.), so it's time to decide which fields matter most to you among the ones selected in the previous step. As a starting point, take a look at the sample documents below:
| Document | title | body | tags |
|---|---|---|---|
| Document 1 | Market Research Findings - 2012 | This document summarizes the findings from the 2012 market research study... | research, 2012 |
| Document 2 | About the market crash of 1929 | All the available research on the market crash of 1929... | stock, market, 1929 |
| Document 3 | XYZ begins to explore new market | After a few years focused on research, company XYZ began exploring a new market... | XYZ |
And now consider the following query:
Based on the sample query and documents above, which document would you expect to be ranked higher?
Most people would say that Document 1 should be ranked higher. The reason is that search engines have trained users to expect, among other things, that anything found in the title of a document should be more relevant than something found somewhere in the body. This is a very reasonable expectation, because we tend to accept that if someone went through the trouble of choosing specific terms to put in the title of a document, then those terms must be important.
So let's change the query again to be explicit about what fields should have higher importance for us:
What the query above is defining is that cX::search should:
- look for documents containing the terms market and research;
- these terms must be found in the title, tags, or body fields; and, even more importantly
- terms found in the title have 5 times (title^5) more importance than terms found in the body (the default field boost is 1)
- terms found in the tags have 3 times (tags^3) more importance than terms found in the body
As you can see, using field boosts gives you the flexibility to be very precise about which fields matter most according to your specific business rules.
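To make the effect of field boosts concrete, here is a minimal Python sketch that scores the three sample documents for the terms market and research. It assumes a deliberately simplified model in which each occurrence of a query term adds the boost of the field it was found in. The real cX::search ranking formula is not described here and is certainly more elaborate, so treat `toy_score` as a hypothetical illustration only.

```python
import re


def tokenize(text):
    """Lowercase a field value and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def toy_score(document, terms, boosts):
    """Add the field boost once for every query term occurrence in that field."""
    score = 0
    for field, boost in boosts.items():
        tokens = tokenize(document.get(field, ""))
        score += boost * sum(tokens.count(term) for term in terms)
    return score


# The three sample documents from the table above.
documents = {
    "Document 1": {
        "title": "Market Research Findings - 2012",
        "body": "This document summarizes the findings from the 2012 market research study...",
        "tags": "research, 2012",
    },
    "Document 2": {
        "title": "About the market crash of 1929",
        "body": "All the available research on the market crash of 1929...",
        "tags": "stock, market, 1929",
    },
    "Document 3": {
        "title": "XYZ begins to explore new market",
        "body": "After a few years focused on research, company XYZ began exploring a new market...",
        "tags": "XYZ",
    },
}

terms = ["market", "research"]
boosts = {"title": 5, "tags": 3, "body": 1}  # body keeps the default boost of 1

scores = {name: toy_score(doc, terms, boosts) for name, doc in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)   # {'Document 1': 15, 'Document 2': 10, 'Document 3': 7}
print(ranking)  # ['Document 1', 'Document 2', 'Document 3']
```

Under this toy model, Document 1 comes out on top because both terms appear in its title, where each match counts five times as much as a match in the body.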
The last important piece of this puzzle of configuring basic relevance settings for your search application is to decide how results should be sorted before being returned. This is crucial because, in the end, this is what decides what results will be displayed on top.
Remember the previous example that uses field boosts to define the importance of each field? Well, now take a look at the request below:
As you can see above, this query explicitly requests that results be sorted by publication_date in descending order. This means that any field boosts are completely ignored by the search engine. Yes, they are simply ignored, since we directly requested results to be sorted by a date field instead of the default sorting, which is based on the ranking score.
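The following Python sketch shows why an explicit sort makes boosts irrelevant: once results are ordered by a date field, the ranking score no longer takes part in the ordering. The scores and publication dates are made-up values for illustration, and the sorting is plain Python, not the cX::search implementation.

```python
from datetime import date

# Hypothetical results: the scores and dates are invented for this sketch.
results = [
    {"id": "Document 1", "score": 15, "publication_date": date(2012, 6, 1)},
    {"id": "Document 2", "score": 10, "publication_date": date(2009, 10, 29)},
    {"id": "Document 3", "score": 7, "publication_date": date(2013, 3, 15)},
]

# Default behaviour: highest ranking score first, so boosts shape the order.
by_score = sorted(results, key=lambda r: r["score"], reverse=True)

# Explicit sort on publication_date, descending: the score is never consulted.
by_date = sorted(results, key=lambda r: r["publication_date"], reverse=True)

print([r["id"] for r in by_score])  # ['Document 1', 'Document 2', 'Document 3']
print([r["id"] for r in by_date])   # ['Document 3', 'Document 1', 'Document 2']
```

In the date-sorted list, the document with the lowest score ends up first simply because it is the most recent one.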
Now what if you want to mix the two options, sorting by ranking score (and therefore taking advantage of field boosts), but also taking into consideration how recent a document is? For that you need the options available in the custom ranking score, which is our next topic.