Faceted Search 3.0

  • About 
  • Configuration 
    • Facets Configuration
      • Field Definition Properties
      • Field Types
    • Mapping of the Data
      • Index Document Preview
      • Debugging the Mapping
    • Configuring Search Endpoint 
    • Integration
      • Integrating Endpoints Naturally 
      • Using Headless Requests
      • Using a Block Function
      • Search Results object
      • Search Parameters and Configuration Files
      • Development mode
  • Using Faceted Search API 
    • General Syntax
      • HTTP Requests
        • Arrays
      • Endpoint Configuration Files
    • Basic Parameters
      • Returning Objects in Results
      • Removing Products from Results
      • Removing Facets from Results
      • Removing Product Fields from Results
    • Filter Queries
      • Filtering by Facets
      • Controlling Facets in Search Results
      • Inclusive and Exclusive Queries
      • Recursive Catalogue Lookup
    • Full-Text Queries
      • Search Modes
        • standard
        • autocomplete
      • Controlling Sensitivity of Full-Text Queries
        • Fuzziness
        • Sensitivity
      • Search Fields and the Weights
    • Controlling Search Results Scope
      • Pagination
    • The Standard Fields Definition
  • Development Environment 
  • Further Reading 

About

Faceted Search v.3 uses the Elasticsearch engine for indexing. It provides a flexible, redistributable way to configure the structure of the index and keeps the index current with real-time updates.

Configuration

To start using Faceted Search 3 (FS3), follow these four steps:

  1. Define fields and facets in a JSON configuration file
  2. Map catalogue data to the index using a mapping template
  3. Configure one or more endpoints to accept requests
  4. Send requests to the configured endpoint

Facets Configuration

Each product in the Product Catalogue is stored in the Elasticsearch engine as an Index Document. The structure of the Index Document is defined in the JSON configuration file facets.json, located at

./modules/facetedsearch3/facets.json

A typical configuration file looks like this:

Sample facets.json

Each field in the Index Document must be given a unique key; all properties are optional.

All facets are fields, but not all fields can be used as facets

Facets are the fields used for aggregation. When Faceted Search runs, it produces a result set which contains a collection of products and a collection of facets: the fields which appear in the result set, along with the most frequent values of these fields. For example, "Colour" is a facet, and the values of this facet could be "Red", "Green", "Blue", etc.
Some fields are just fields and are not used for aggregation, but they may be used as a filter, for full-text search, or just as a stored value which can be displayed in the results (e.g. blurb or description).
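
As a rough illustration of what "aggregation" means here, the Python sketch below counts the most frequent values of one field across a small result set. The sample documents and the facet_values helper are hypothetical; in Faceted Search the counting is performed by Elasticsearch, not by application code.

```python
from collections import Counter

# Hypothetical index documents; real ones come from the Elasticsearch index.
documents = [
    {"name": "Mug", "colour": "Red"},
    {"name": "Cup", "colour": "Blue"},
    {"name": "Bowl", "colour": "Red"},
    {"name": "Plate", "colour": "Red"},
    {"name": "Jug", "colour": "Blue"},
    {"name": "Vase", "colour": "Green"},
]


def facet_values(docs, field, limit=10):
    """Return (value, count) pairs for one field, most frequent first."""
    counts = Counter(doc[field] for doc in docs if doc.get(field) is not None)
    return counts.most_common(limit)


print(facet_values(documents, "colour"))
```

A field such as "name" could be counted the same way, but the counts would rarely be useful, which is why not every field is worth exposing as a facet.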

There are 16 standard or "core" fields, most of which are related to category. They do not need to be configured or mapped, and are ready for use:

All core facets can be re-defined in the facets.json file and then re-mapped in configuration.html, like any custom field/facet. The full definition of the core fields is given in The Standard Fields Definition.

Field Definition Properties

  
| Property | Default Value | Description |
|---|---|---|
| name | "Humanized" version of the key, e.g. "My Field" for "my_field" | Name of the field. |
| display_name | The value of the "name" property | Displayed version of the field name. It can be something like "Select a Colour". |
| type | keyword | Defines how the value is stored in the Elasticsearch index. The available types are discussed below. |
| fulltext | false | When set to true, a text version of the same field will be created in Elasticsearch. This is useful when the same field (e.g. of keyword type) needs to be used as a facet and for full-text search at the same time. |
| case_sensitive | true | By default, the values of keyword fields are indexed as is, which makes facet values case-sensitive, so a filter query ?size=xl will not match a document with size XL. This behavior can be changed by setting the case_sensitive property to false. |
| hidden | false | All fields are assumed to be used as facets and appear in search results. If there is no intention to display the facet (for example, the field is used only for filtering), it can be hidden. |
| values_limit | 10 | By default, the facet will display only the 10 most popular values. This number can be adjusted with this property. |
| values | (empty array) | If a fixed list of facet values is required, it can be set with this property. |
| order | value count | By default, facet values are ordered by the count of matching documents, with the most popular value coming first. The sorting order of facet values can be changed with this property. For possible values refer to the Elasticsearch documentation. |
| output_order | (none) | Re-orders facet values after they have been fetched from Elasticsearch. For example, by using the order and values_limit properties it is possible to fetch the 10 most popular values and then, by applying the output_order property, sort these 10 values in alphabetical order. |
| unit | (none) | Unit of measure. The value can be any string, such as "in" or "mm". |
| props | (empty array) | Passes an array with arbitrary data to the SearchResults object. The arbitrary data may include CSS class names or HTML attributes hinting to the front-end how to render the facet on the page. |
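
The interplay of order, values_limit and output_order described above can be pictured as a two-step process. The Python sketch below is only an illustration under that reading: the function name and the sample data are hypothetical, and the first step is what Elasticsearch would do before Faceted Search re-orders the values.

```python
from collections import Counter


def top_values_alphabetical(values, values_limit=10):
    """Keep the most popular values, then re-sort the survivors by label."""
    # Step 1: most popular values first, cut off at values_limit
    # (the effect of the default "order" together with "values_limit").
    top = Counter(values).most_common(values_limit)
    # Step 2: re-order only the values that survived the cut
    # (the kind of re-sorting that "output_order" is described as doing).
    return sorted(top, key=lambda pair: pair[0])


sizes = ["M"] * 5 + ["XL"] * 4 + ["S"] * 3 + ["L"] * 2 + ["XS"]
print(top_values_alphabetical(sizes, values_limit=3))
```

The point of doing it in two steps is that the alphabetical sort never brings back a value that was cut by values_limit; it only changes the display order of the popular ones.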

Field Types

   
| Type | Use | Can Be Used as a Facet | Can Be Used in Full-Text Search |
|---|---|---|---|
| keyword | The value is stored "as is". No analysis is applied. The default type for Faceted Search. | Yes (recommended) | No |
| integer | Used to store whole numbers, such as ID values. | Yes | No |
| double | Used to store floating point numbers, such as price. | Yes | No |
| date | Used to store date fields. | No | No |
| boolean | true / false | Yes | No |
| text | Used to store long strings of text for full-text search. | No | Yes |

Mapping of the Data

Product data is mapped to the index document using a mapping template in the following location:

./modules/facetedsearch3/templates/configuration.html

The purpose of this template is to map the relevant data from the $product object to the $index array which represents the Index Document:

Sample configuration.html file
<{* Typically, product custom fields can be mapped to the index field *}>
<{$index['rating'] = $product->getCustomFieldValue('Rating')}>
<{* With the use of basic Smarty functions it is possible to expand a comma-separated list into an array ... *}>
<{$index['retailers'] = array_map('trim', explode(',', $product->getCustomFieldValue('Show_Retailers')))}>
<{* ... or create a range from the "start" and the "end" numbers *}>
<{$index['year_range'] = range($product->getCustomFieldValue('year_start'), $product->getCustomFieldValue('year_end'))}>
<{$index['tags'] = ($product->getTags()) ? array_map('trim', explode(',', $product->getTags())) : null}>
<{* The values should be set to null if they are not available *}>
<{$index['brand'] = ($product->getBrand()) ? $product->getBrand()->getName() : null}>
<{* A field/facet in the index can be a completely new value based on some calculation *}>
<{$index['on_sale'] = ($product->getFinalPrice() < $product->getBasePrice())}>
<{* Group the names of related products in the 'fits' category by the lower-cased relation description *}>
<{foreach $product->getRelationInfo(true) as $relation}>
 <{if $relation.category eq 'fits'}>
 <{$something = $relation.description|strtolower}>
 <{$index['relation'][$something][] = $relation.product->getName()}>
 <{/if}>
<{/foreach}>

Even the core fields can be re-assigned, if needed:

Sample configuration.html with the core field 'code' being re-mapped.
<{$index['code'] = [
 $index.code,
 $product->getCustomFieldValue('ProxyCode'),
 $product->getCustomFieldValue('VIN'),
 $product->getCustomFieldValue('ISDN'),
]}>

Index Document Preview

To verify that the facets have been configured correctly, it is possible to see how the data is mapped to the Index Document. A special endpoint previews the Index Document based on the ID or URL Slug of the corresponding product:

/fsearch/preview/{slug-or-id}

The URL below will display an Index Document in JSON format for the product with URL Slug "my-test-product":

/fsearch/preview/my-test-product

If a product is not published, does not have public view permissions, SEO settings prevent it from indexing, or for whatever other reason it cannot be indexed, an error will be displayed. 

To see the actual reason why the product cannot be indexed, turn on development mode:

/fsearch/preview/my-test-product?dev=1
The Routing module allows the base path of the FS module to be changed from /fsearch/ to anything else (e.g. /search/).

Debugging the Mapping

The mapping template, like any other Smarty template, can produce errors, or there might be a need to print out some variables. Faceted Search has a special endpoint which prints the output of the mapping template for a product identified by its ID or URL Slug:

/fsearch/debug/{slug-or-id}

The URL below will display any output from the mapping template for the product with URL Slug "my-test-product":

/fsearch/debug/my-test-product

The mapping template is only used to assign values. It must not produce any output! Use the /fsearch/debug endpoint to ensure that it produces none.

Configuring Search Endpoint 

A Search Endpoint is a profile used to accept search queries of the same type. For example, one endpoint may be responsible for servicing requests from an auto-complete search bar, while another endpoint services filter requests from the product catalogue.

The endpoint configuration consists of the URL slug, a template to render search results (optional) and the endpoint configuration file with preset parameters which will apply to all queries to this endpoint. Basically, any Search API parameters can be set in the endpoint configuration file. 

Any valid URL Slug can be used as an endpoint name, except the following list of the reserved URLs:

If the endpoint is called "demo", it will have the following path to the endpoint configuration file:

./modules/facetedsearch3/templates/demo.json

... the results template:

./modules/facetedsearch3/templates/demo.html

... and the endpoint URL:

https://www.example.com/fsearch/demo
The Routing module allows the base path of the FS module to be changed from /fsearch/ to anything else (e.g. /catalog/).

Integration

There are three ways to send requests to FS endpoints and process responses: integrate a template, make headless AJAX calls, or use a block function.

Integrating Endpoints Naturally 

Just integrate a template corresponding to the endpoint and use the Search Results object to display results, facets, pagination, etc. The Search Results object is available in the $search variable. 

Using Headless Requests

Use the headless API, which returns the Standard JSON Response, via AJAX calls and work with the data directly.

Using a Block Function

The block function facetedsearch3_search can be used to pull faceted search results into a template used by any other module. It accepts two arguments: "form" (mandatory) and "params" (optional). The first argument is the endpoint name, the second is an array with Search API request parameters. 

Sample use of the block function
<{$params = [
 'objects' => true,
 'recursive' => true,
 'category.slug' => $categorySlug
]}>
<{show_block module="facetedsearch3"
 block='search'
 form='demo'
 params=$params
 template_name='_blank'
}>
The block function does not take any parameters from the HTTP request (i.e. passed in the URL). It only works with the parameters passed to the block function in the params argument and those set in the endpoint configuration file.

Search Results object

Search Parameters and Configuration Files

Any Faceted Search API parameters can be set in the endpoint JSON configuration file. These parameters will be fixed for the endpoint and cannot be overridden with URL parameters or block function arguments. This is useful for securing search parameters which are not meant to be manipulated by the end user, for example the number of items per page or the sorting criteria.

There is also a way to specify default parameter values in the configuration file: the "default" parameter. Any parameters placed under "default" can be overridden by the search query.

Here is an example of JSON configuration file for the "demo" endpoint and HTTP request made to that endpoint:

demo.json
{
  "parent_category.slug": "products",
  "size": 30,
  "no_facets": true,
  "fields": [
    "name^3",
    "description.short^2"
  ],
  "default": {
    "mode": "autocomplete",
    "foo": "bar"
  }
}
/fsearch/demo?size=1000&mode=standard&fields=category.name,brand

The request above will result in the following query:

Resulting API Request
{
  "parent_category.slug": "products",
  "size": 30,
  "no_facets": true,
  "fields": [
    "name^3",
    "description.short^2"
  ],
  "mode": "standard",
  "foo": "bar"
}

Use development mode to see the resulting requests.
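
The precedence rules described in this section can be summarised as: values under "default" have the lowest priority, request parameters override them, and fixed values from the configuration file always win. The Python sketch below models that precedence and reproduces the "demo" example; the resolve_request function is a hypothetical illustration, not the module's actual code.

```python
def resolve_request(endpoint_config, request_params):
    """Combine endpoint configuration with request parameters."""
    # Everything outside "default" is fixed for the endpoint.
    fixed = {k: v for k, v in endpoint_config.items() if k != "default"}
    resolved = dict(endpoint_config.get("default", {}))  # lowest priority
    resolved.update(request_params)                      # overrides defaults
    resolved.update(fixed)                               # fixed values always win
    return resolved


demo_config = {
    "parent_category.slug": "products",
    "size": 30,
    "no_facets": True,
    "fields": ["name^3", "description.short^2"],
    "default": {"mode": "autocomplete", "foo": "bar"},
}

# Parameters of /fsearch/demo?size=1000&mode=standard&fields=category.name,brand
request = {"size": 1000, "mode": "standard", "fields": ["category.name", "brand"]}

print(resolve_request(demo_config, request))
```

Only mode is taken from the request, because it is the only requested parameter that the configuration file lists under "default" rather than as a fixed value.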

Development mode

Development mode can be enabled by passing ?dev=1 in the URL or as an argument to the block function. It allows to:

Using Faceted Search API

General Syntax

HTTP Requests

The Search API Request parameters can be passed either in the query string of HTTP Requests (e.g. ?param1=value1&param2=value2) or in the POST data.

Arrays

Some parameters accept multiple values. There are two notations for this: the PHP array style and the comma-separated list. The following two requests are equivalent:

/fsearch/demo/?facets=size,color,brand
/fsearch/demo/?facets[]=size&facets[]=color&facets[]=brand
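
To make the equivalence concrete, the Python sketch below normalises both notations into the same list. The list_param helper is hypothetical and only mimics how a server might read such a parameter; it is not part of Faceted Search.

```python
from urllib.parse import parse_qs


def list_param(query_string, name):
    """Read a multi-value parameter written in either notation."""
    parsed = parse_qs(query_string)
    # "facets=a,b" arrives under "facets"; "facets[]=a&facets[]=b" under "facets[]".
    raw = parsed.get(name, []) + parsed.get(name + "[]", [])
    return [item for value in raw for item in value.split(",") if item]


print(list_param("facets=size,color,brand", "facets"))
print(list_param("facets[]=size&facets[]=color&facets[]=brand", "facets"))
```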

Endpoint Configuration Files

The endpoint configuration files use JSON format.

Basic Parameters

Search results normally contain an array of Facets with their values and the matching Index Documents. An Index Document is an associative array of fields and values stored in the Elasticsearch index. Sometimes this is enough to display search results, but sometimes it is necessary to work with instances of the actual CatProduct class.

Returning Objects in Results

When the objects parameter is set to true, the results will contain an array of CatProduct instances accessible through $search->getObjects():

/fsearch/demo/?objects=true

Removing Products from Results

Sometimes there is no need to have any product data in search results at all (e.g. only facets are required). This can be achieved by setting the no_products parameter to true:

/fsearch/demo?no_products=true

The request above will return no products in search results, but the facets will be populated for the scope of the query.

Removing Facets from Results

Sometimes facets are not required in the search results (e.g. for an autocomplete function). This can be achieved by setting no_facets to true:

/fsearch/demo?no_facets=true

The request above will effectively make all facets hidden. This means the facets will not be included in the Elasticsearch query, which saves some resources. It also means that $search->getFacets()->getVisible() and $search->getFacets()->getAvailable() will return an empty array.

Removing Product Fields from Results

By default, faceted search returns the whole document stored in Elasticsearch. Sometimes only specific fields need to be fetched. This is especially useful when large search result payloads affect performance. The return parameter specifies the list of Product fields to be returned.

/fsearch/demo?return=id,slug,name

Filter Queries

A filter query is a request in the form of ?field1=value1&field2=value2 sent to the configured endpoint. It returns the products matching the filter. A filter can be used in conjunction with a full-text query.

Filtering by Facets

The following sample request will search for red, green or blue shoes in size "XL":

/fsearch/sample?color=red,green,blue&size=XL&category.slug=shoes
When a field is specified in the filter but no value is provided (e.g. ?field1=&field2=), the filter will return documents where this field exists but is set to null. To exclude a field from the filter, omit it from the query. Alternatively, set the ignore_empty_filters parameter to "true".
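
When filter values come from a form, some of them are often empty. The Python sketch below is one client-side way to build a filter request that leaves empty fields out entirely, so they are not interpreted as "field is null". The build_filter_url helper and the sample filter values are hypothetical.

```python
from urllib.parse import urlencode


def build_filter_url(endpoint, filters):
    """Build a filter request, omitting fields that have no value."""
    query = {}
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # multiple values as a comma-separated list
        if value in ("", None):
            continue  # omit the field instead of sending an empty filter
        query[field] = value
    if not query:
        return endpoint
    # safe="," keeps the commas readable instead of encoding them as %2C
    return endpoint + "?" + urlencode(query, safe=",")


url = build_filter_url(
    "/fsearch/sample",
    {"color": ["red", "green", "blue"], "size": "XL", "brand": "", "category.slug": "shoes"},
)
print(url)
```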

Controlling Facets in Search Results

By default, all visible configured facets are sent to the Elasticsearch engine to be populated, even though many facets apply only to a subset of products. For example, the "Spindle Length", "Spindle Diameter" and "Spindle Material" facets may apply only to products in the "Spindles" category. Displaying all available facets in search results may be overwhelming, so it is sometimes beneficial to hide some of them. This can be achieved with the facets parameter in the search query. It accepts a list of facets which will be populated by the query and made available in the search results through $search->getFacets()->getVisible(). The facets parameter can also make a hidden facet visible, if required. However, no_facets=true overrides any setting of the facets parameter, and no facets will be displayed.

Inclusive and Exclusive Queries

By default, all the field/value pairs in the filter request are exclusive (i.e. they are combined with the "AND" operator).

The request below will match products from the "Classic" collection which are made of stainless steel and have extended warranty:

/fsearch/products?collection=Classic&material=Stainless+Steel&warranty=Extended

The should parameter applies the "OR" operator to the specified list of fields.

The request below will match products having extended warranty either from the "Classic" collection or made of stainless steel.

/fsearch/products?collection=Classic&material=Stainless+Steel&warranty=Extended&should=collection,material

The minimum_should_match parameter controls how many of the fields included in the should query must match. The default value is 1, which makes "should" queries work as the classic "OR" (but sorted by the number of matching fields). Setting minimum_should_match to a higher number draws a threshold: results which do not match the required number of fields are excluded. Setting minimum_should_match to the number of fields listed in should effectively turns the query into an "AND" query.
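
The combination of "AND" fields, "OR" fields and the threshold can be modelled in a few lines. The Python sketch below is a simplified illustration of the matching rule only (it ignores relevance sorting); the matches function and the sample product are hypothetical.

```python
def matches(document, filters, should=(), minimum_should_match=1):
    """Decide whether a document passes a filter request."""
    should = set(should)

    def hit(field):
        return document.get(field) == filters[field]

    must_fields = [f for f in filters if f not in should]
    should_fields = [f for f in filters if f in should]

    # Fields not listed in "should" keep the default "AND" behaviour.
    if not all(hit(f) for f in must_fields):
        return False
    if not should_fields:
        return True
    # At least minimum_should_match of the "should" fields must match.
    return sum(hit(f) for f in should_fields) >= minimum_should_match


filters = {"collection": "Classic", "material": "Stainless Steel", "warranty": "Extended"}
product = {"collection": "Modern", "material": "Stainless Steel", "warranty": "Extended"}

print(matches(product, filters))
print(matches(product, filters, should=["collection", "material"]))
print(matches(product, filters, should=["collection", "material"], minimum_should_match=2))
```

The sample product fails the plain "AND" request, passes once collection and material are listed in should, and fails again when both of them are required to match.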

Recursive Catalogue Lookup

Imagine the product catalogue has the following structure:

  • Shoes
    • Winter Shoes
    • Summer Shoes

The following request will yield no results:

/fsearch/products?category.slug=shoes

This is because the "Shoes" category has no products in it. All products belong to its child categories, namely "Winter Shoes" and "Summer Shoes".

Certainly, it is possible to use any of the parent_category fields: 

/fsearch/products?parent_category.slug=shoes

But if we want to filter down just to the child category "Winter Shoes", the field name will have to change to category.slug.

To avoid this situation, the recursive parameter can be set to true which makes any search on the category fields inclusive of its child categories.

/fsearch/products?category.slug=shoes&recursive=true

The request above will return all products in "Shoes", "Winter Shoes" and "Summer Shoes" categories.
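
The effect of recursive=true can be pictured as walking up the category tree from the product's own category. The Python sketch below illustrates that idea only; the category slugs, the parent map and the in_category function are hypothetical and do not reflect how the index stores categories.

```python
# Hypothetical category tree: slug -> parent slug (None for a top-level category).
CATEGORY_PARENT = {
    "shoes": None,
    "winter-shoes": "shoes",
    "summer-shoes": "shoes",
}


def in_category(product_category, wanted, recursive=False):
    """Check whether a product's category satisfies a category filter."""
    if product_category == wanted:
        return True
    if not recursive:
        return False
    # With recursive lookup, any ancestor of the product's category also counts.
    parent = CATEGORY_PARENT.get(product_category)
    while parent is not None:
        if parent == wanted:
            return True
        parent = CATEGORY_PARENT.get(parent)
    return False


print(in_category("winter-shoes", "shoes"))
print(in_category("winter-shoes", "shoes", recursive=True))
```

With this behaviour the same field name (category.slug) works at every level of the tree, which is the situation the recursive parameter is meant to simplify.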

Full-Text Queries

A full-text query is a search phrase passed in the q parameter of the request (e.g. ?q=quick+brown+fox). Full-text queries are performed on all analysed fields, which include text fields and fields whose fulltext property is set to true.

Search Modes

The search mode can be set using the optional mode parameter:

/fsearch/demo?q=quick+brown+f&mode=autocomplete

Currently, two search modes are supported:

standard

The standard search mode breaks down the search phrase into search terms by stripping off all punctuation and special characters, and splitting it by spaces. Then it applies several filters which:

The search phrase "A Cow Jumped over the Moon" will be converted into the following search tokens: "cow", "jump", "over", "moon". The tokens are then matched against the inverted index, and the relevance of results is determined on "the more matches the better" principle. According to the sensitivity setting, the least relevant results are filtered out. According to the fuzziness setting, the query can match more products by expanding search terms into variants. For example, fuzzy logic can expand the term "moon" into "noon", "soon", "mood", "moan", and even "moo" and "moron".
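
The conversion of the example phrase into tokens can be approximated in a few lines of Python. This is a toy sketch only: the stop-word list and the suffix stripping are hypothetical stand-ins chosen to reproduce the example above, and they are far cruder than the analysis Elasticsearch actually performs.

```python
import re

STOP_WORDS = {"a", "an", "the"}  # hypothetical, deliberately tiny list
SUFFIXES = ("ing", "ed")         # naive stand-in for real stemming


def tokenize(phrase):
    """Lower-case, drop punctuation and stop words, strip common suffixes."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    tokens = []
    for word in words:
        if word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Only strip when a reasonably long stem remains.
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens


print(tokenize("A Cow Jumped over the Moon"))
```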

autocomplete

In autocomplete mode the search phrase is taken as a prefix and matched against any document that contains this prefix. This mode supports stemming and some basic transformations, but fuzziness is not available and the sensitivity setting has no effect. The search phrase "cow j" will match "cow jumped", "cow jogged" and "cow jigged", but will not match "cow has jumped". The search phrase "cow jumping o" will, however, match "cow jumped over the moon", because autocomplete mode still supports stemming.
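
The "cow j" examples follow from treating the phrase as a sequence of whole words followed by one partial word. The Python sketch below models just that rule; the matches_prefix_phrase function is hypothetical, and stemming is deliberately left out, so it does not reproduce the "cow jumping o" example.

```python
def matches_prefix_phrase(query, text):
    """True if the text contains the query words in order, the last one as a prefix."""
    q = query.lower().split()
    t = text.lower().split()
    if not q:
        return False
    # Slide a window of the query's length over the text.
    for start in range(len(t) - len(q) + 1):
        window = t[start:start + len(q)]
        # All words but the last must match exactly; the last is a prefix.
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


for candidate in ["cow jumped", "cow jogged", "cow has jumped"]:
    print(candidate, matches_prefix_phrase("cow j", candidate))
```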

Controlling Sensitivity of Full-Text Queries

Fuzziness

Fuzziness can be enabled for full-text queries with the fuzzy parameter. By default, it is off.

The fuzzy query uses similarity based on the Levenshtein edit distance. Effectively, it auto-corrects one or two typos in the search term, depending on the term length, and can even expand the search term by a few characters:

/fsearch/country_lookup?q=austrai&fuzzy=1

The query above will match "Austria" because the fuzzy logic can switch the last two letters. It will also match "Australia", because the Elasticsearch engine can expand the original term by adding an "l" and an additional "a".
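
The distances involved in this example can be checked with a textbook implementation of the Levenshtein edit distance. The Python sketch below is only an illustration of the metric, not of Elasticsearch internals. Note that plain Levenshtein distance counts a swap of two adjacent letters as two edits.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character from a
                current[j - 1] + 1,            # insert a character into a
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("austrai", "austria"))
print(levenshtein("austrai", "australia"))
```

Both country names are within two edits of the search term, which is why both appear in the results of the query above.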

Sensitivity

The sensitivity parameter has a range of 1 to 100, with a default value of 70. It sets the minimum percentage of search terms in the query (words excluding articles and grammar constructions) which must match, rounded down. With 4 words and the default sensitivity of 70%, only two words from the query need to match for the product to appear in search results (70% of 4 is 2.8, which rounds down to 2). If sensitivity is increased to 75%, then 3 words from the query must match.
Unlike traditional search engines, which allow only "AND" and "OR" operators in the search phrase, the sensitivity parameter in Faceted Search makes it possible to fine-tune the search behavior on a scale from 1 (effectively an "OR" query) to 100 (effectively an "AND" query, where all terms in the search phrase must match).

However, it is important to understand that not all words will be considered as search terms (see Search Modes). In the standard mode all high frequency words are removed.

/fsearch/demo?q=a+cow+jumped+over+the+moon&sensitivity=75

The query above will require at minimum three of the following four words to be found in the document: "cow", "jump", "over", "moon".
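
The arithmetic behind these examples is a percentage rounded down. The Python sketch below reproduces it; the required_matches function is hypothetical, and the floor of one matching term is an assumption made here so that a sensitivity of 1 behaves like the "OR" query described above.

```python
def required_matches(term_count, sensitivity=70):
    """Number of search terms that must match, rounded down."""
    if not 1 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 1 and 100")
    # Integer arithmetic avoids floating-point surprises when rounding down.
    # Assumption: at least one term must always match.
    return max(1, term_count * sensitivity // 100)


for sensitivity in (1, 70, 75, 100):
    print(sensitivity, required_matches(4, sensitivity))
```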

Search Fields and the Weights

By default, full-text queries are performed on all analysed fields (the text fields and the fields where fulltext property is set to true), and all fields have equal weight. So, a document containing "almonds" in the product's "name" field will have equal weight to the documents containing "almonds" in the "description" field. 

The fields parameter specifies the list of fields to be used in the search and can boost the weight of each field using the caret ^ notation.

/fsearch/demo?q=almonds&fields=name^3,code^3,description.short^2,description.long

In the query above, the search phrase "almonds" will be searched only in four fields: name, code, description.short and description.long. The "name" and "code" fields have the highest weight, followed by the short description; the long description has the lowest weight.
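
The caret notation is easy to read programmatically. The Python sketch below parses the value of the fields parameter into a field-to-weight map; the parse_fields helper is hypothetical, and the weight of 1 for fields without a caret is an assumption consistent with the long description having the lowest weight in the example above.

```python
def parse_fields(fields):
    """Parse 'name^3,code' style notation into a {field: weight} map."""
    weights = {}
    for item in fields.split(","):
        name, _, boost = item.strip().partition("^")
        # Assumption: a field without an explicit boost has weight 1.
        weights[name] = float(boost) if boost else 1.0
    return weights


print(parse_fields("name^3,code^3,description.short^2,description.long"))
```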

Controlling Search Results Scope

Pagination

Pagination of results is controlled in two possible ways.

By specifying size and from

/fsearch/demo?size=16&from=32

OR by specifying size and page

/fsearch/demo?size=16&page=3

The two requests above are equivalent: both display the 3rd page of the search results, with 16 products per page.
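
The relationship between the two notations is a simple offset calculation, with pages counted from 1. The Python sketch below shows it; the page_to_from helper is hypothetical and only illustrates the arithmetic.

```python
def page_to_from(page, size):
    """Convert a 1-based page number into the offset of its first result."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


print(page_to_from(page=3, size=16))
```

Page 3 with 16 products per page starts after the 32 products shown on the first two pages, which is why from=32 and page=3 select the same results.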


Development Environments and Debugging

The "/reindex" endpoint is safe to use in production, as it adds, updates or deletes products in the index one by one and does not affect the user experience.

Please note, there is also an option to pass the individual product id to the URL and reindex just one product, if it is required for testing. E.g. https://examplecom.corewebdna.com/fsearch/reindex/19435

The "/rebuild" endpoint should be used in production only when structural changes are made to the facets.json file. When the index is rebuilt, it is literally destroyed, created again and repopulated with data. As a result, all products disappear from search results for a moment and then reappear gradually.

Changes to configuration.html only affect how the data is mapped to the index, so they do not require rebuilding the index; reindexing is sufficient.

The staging environment (examplecom-stage.corewebdna.com) is not designed for testing faceted search changes, because it uses the same index as production (similar to the way it uses the production database). It should only be used to test templates, CSS and JS. Any changes to configuration.html should be made in your development environment (something-agencyname.corewebdna.com).

You can also use the preview URL to see how a product will appear in the index. Note that it does not fetch the product from the index; it shows the result of the mapping applied in configuration.html. This is the best way to test changes to configuration.html without reindexing anything.

For example: https://examplecom-agencyname.corewebdna.com/fsearch/preview/19435

Sometimes HTML content in some fields may skew the output, so it is better to use "view source" mode or to copy and paste the JSON into a JSON editor.

You can also use the "debug" view to see any debug output you need from the configuration.html file. In production, configuration.html must not produce any output (it should only assign variables); if it does, indexing will fail with an error. In the development environment, however, you may want to print the contents of a variable used in configuration.html for debugging purposes. In this case, use the "debug" view: https://examplecom-agencyname.corewebdna.com/fsearch/debug/19435

 

Further Reading