How do I set up the search to occur in the right place?

Where to search

Now that you have decided what type of queries you want to execute (and, phrase, or), the next step is to decide where you want this search to occur. When asked "where do you want to search?", people usually reply "everywhere, of course!". Yet it is important to step back and consider whether that is really what you want. Take a look at the query below:

?p_aq=query("financial systems", token-op=and)

Where in the document do you believe this query will try to find the terms financial and systems? The answer is: everywhere.

If you do not define exactly which fields should be searched on, by default cX::search will do a search using _all, which would translate this query to:

?p_aq=query(_all:"financial systems", token-op=and)

The _all option here means that all the fields of each document will be used for the search (with the exception of fields explicitly declared using indexOps:"storeOnly"). As a result, cX::search will look for the terms financial and systems in the fields title and body, but also in fields such as categoryrelated_content, or even unitsInStock, which may not be exactly what you are looking for.

The best way to avoid future problems with some undesirable fields being matched is to be explicit about which fields you want to search on. This can be done by adding the desired list of fields before the query term, as shown below:

?p_aq=query(title,body,description,tags,url,author:"financial systems", token-op=and)

In the example above we are being very clear about which fields should be used when looking for the query terms defined by the user, which makes it a lot easier to debug and answer questions like "why was this document returned in the results?".
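As a rough illustration, the explicit field list can be assembled programmatically. The sketch below is only a string-building helper: the function name build_query is hypothetical and not part of cX::search, and the output is left unencoded to match the examples in this article (a real request may need URL encoding).

```python
def build_query(terms, fields=None, token_op="and"):
    """Build a p_aq value shaped like the examples in this article.

    build_query is a hypothetical helper, not a cX::search API.
    With fields=None the field list is omitted, which (per the text
    above) makes the search fall back to _all.
    """
    prefix = ",".join(fields) + ":" if fields else ""
    return f'query({prefix}"{terms}", token-op={token_op})'


# Explicit field list, as recommended above.
explicit = build_query(
    "financial systems",
    ["title", "body", "description", "tags", "url", "author"],
)
print("?p_aq=" + explicit)

# No field list: the search would default to _all.
implicit = build_query("financial systems")
print("?p_aq=" + implicit)
```

Keeping the field list in one place like this makes it easier to see, and later adjust, exactly which fields a query is allowed to match.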

Field Importance

You know how you want to search (and, phrase, or) and also where to search (title, body, etc.), so it's time to decide which of the fields selected in the previous step matter most to you. As a starting point, take a look at the example documents below:

 
Document   | title                            | body                                                                                | tags
-----------|----------------------------------|-------------------------------------------------------------------------------------|--------------------
Document 1 | Market Research Findings - 2012  | This document summarizes the findings from the 2012 market research study...       | research, 2012
Document 2 | About the market crash of 1929   | All the available research on the market crash of 1929...                          | stock, market, 1929
Document 3 | XYZ begins to explore new market | After a few years focused on research, company XYZ began exploring a new market... | XYZ

And now consider the following query:

?p_aq=query(title,tags,body:"market research", token-op=and)

Based on the sample query and documents above, which document would you expect to be ranked higher?

Most people would say that Document 1 should be ranked higher. The reason is that search engines have trained users to expect, among other things, that anything found in the title of a document is more relevant than something found somewhere in its body. This is a very reasonable expectation: if someone went to the trouble of choosing specific terms for the title of a document, then those terms must be important.

So let's change the query again to be explicit about what fields should have higher importance for us:

?p_aq=query(title^5,tags^3,body:"market research", token-op=and)

The query above tells cX::search to:

  • look for documents containing the terms market and research;
  • require these terms to be found in the title, tags or body fields; and, even more importantly,
  • treat terms found in the title as 5 times (title^5) more important than terms found in the body (the default field boost is 1); and
  • treat terms found in the tags as 3 times (tags^3) more important than terms found in the body.

As you can see, using field boosts gives you the flexibility to be very precise about which fields matter most according to your specific business rules.
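To build some intuition for how boosts shift the ranking, here is a deliberately simplified sketch applied to the three sample documents listed above. It is a toy model only: the scoring rule (boost multiplied by the number of query terms found in each field) is an assumption made for illustration and is not the actual ranking formula used by cX::search.

```python
import re

# The three sample documents listed earlier in this article.
DOCS = {
    "Document 1": {
        "title": "Market Research Findings - 2012",
        "body": "This document summarizes the findings from the 2012 market research study...",
        "tags": "research, 2012",
    },
    "Document 2": {
        "title": "About the market crash of 1929",
        "body": "All the available research on the market crash of 1929...",
        "tags": "stock, market, 1929",
    },
    "Document 3": {
        "title": "XYZ begins to explore new market",
        "body": "After a few years focused on research, company XYZ began exploring a new market...",
        "tags": "XYZ",
    },
}

# Boosts from title^5,tags^3,body (body keeps the default boost of 1).
BOOSTS = {"title": 5, "tags": 3, "body": 1}


def toy_score(doc, terms, boosts):
    """Toy rule (an assumption, not the engine's formula):
    for each field, add boost * number of query terms present."""
    score = 0
    for field, boost in boosts.items():
        words = set(re.findall(r"\w+", doc.get(field, "").lower()))
        score += boost * sum(1 for term in terms if term in words)
    return score


terms = ["market", "research"]
scores = {name: toy_score(doc, terms, BOOSTS) for name, doc in DOCS.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, scores[name])
```

Under this toy rule, Document 1 comes out on top because both query terms appear in its boosted title, which matches the expectation described above. Real relevance scoring takes more factors into account, so treat the numbers as illustrative only.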

Sorting

The last important piece of this puzzle of configuring basic relevance settings for your search application is to decide how results should be sorted before being returned. This is crucial because, in the end, this is what decides what results will be displayed on top.

Remember the previous example that uses field boosts to define the importance of each field? Now take a look at the request below:

?p_aq=query(title^5,tags^3,body:"market research", token-op=and)&p_sm=publication_date:desc

As you can see above, this query explicitly requests that results be sorted by publication_date in descending order. This means that any field boosts are completely ignored by the search engine. Yes, they are simply ignored, because we directly requested that results be sorted by a date field instead of using the default sorting, which is based on the ranking score.
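The effect can be pictured with a small sketch. The result ids, scores, and publication dates below are made up for illustration, and the sorting is done locally in Python simply to show why an explicit date sort leaves the ranking score, and therefore the field boosts, with no influence on the final order.

```python
from datetime import date

# Hypothetical results: ids, scores and dates are invented for this example.
results = [
    {"id": "doc-a", "score": 15.0, "publication_date": date(2012, 3, 1)},
    {"id": "doc-b", "score": 10.0, "publication_date": date(2014, 7, 15)},
    {"id": "doc-c", "score": 7.0, "publication_date": date(2013, 1, 20)},
]

# Default behaviour described above: order by ranking score, highest first.
by_score = sorted(results, key=lambda r: r["score"], reverse=True)

# Equivalent of p_sm=publication_date:desc: newest first, score not consulted.
by_date = sorted(results, key=lambda r: r["publication_date"], reverse=True)

print([r["id"] for r in by_score])  # ['doc-a', 'doc-b', 'doc-c']
print([r["id"] for r in by_date])   # ['doc-b', 'doc-c', 'doc-a']
```

The best-scoring document ends up last once the date sort is applied, which is exactly why boosts stop mattering when an explicit sort field is requested.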

Now what if you want to mix the two options, sorting by ranking score (and therefore taking advantage of field boosts), but also taking into consideration how recent a document is? For that you need the options available in the custom ranking score, which is our next topic.

 
