Faceted Search 3.0
- About
- Configuration
- Facets Configuration
- Field Definition Properties
- Field Types
- Mapping of the Data
- Index Document Preview
- Debugging the Mapping
- Configuring Search Endpoint
- Integration
- Integrating Endpoints Naturally
- Using Headless Requests
- Using a Block Function
- Search Results object
- Search Parameters and Configuration Files
- Development mode
- Facets Configuration
- Using Faceted Search API
- General Syntax
- HTTP Requests
- Arrays
- Endpoint Configuration Files
- HTTP Requests
- Basic Parameters
- Returning Objects in Results
- Removing Products from Results
- Removing Facets from Results
- Removing Product Fields from Results
- Filter Queries
- Filtering by Facets
- Controlling Facets in Search Results
- Inclusive and Exclusive Queries
- Recursive Catalogue Lookup
- Full-Text Queries
- Search Modes
- standard
- autocomplete
- Controlling Sensitivity of Full-Text Queries
- Fuzziness
- Sensitivity
- Search Fields and the Weights
- Search Modes
- Controlling Search Results Scope
- Pagination
- The Standard Fields Definition
- General Syntax
- Development Environment
- Further Reading
About
Faceted Search v.3 uses the Elasticsearch engine for indexing and provides a flexible and redistributable way to configure the structure of the index and maintain it with real-time updates.
Configuration
To start using Faceted Search 3 (FS3), follow four simple steps:
- Define fields and facets using a JSON configuration file
- Map catalogue data to the index using a mapping template
- Configure one or more endpoints to accept requests
- Send requests to the configured endpoint
Facets Configuration
Each product in the Product Catalogue is stored in the Elasticsearch engine as an Index Document. The structure of the Index Document is defined in the JSON configuration file facets.json, located at
./modules/facetedsearch3/facets.json
A typical configuration file looks like this:
Each field in the Index Document must be given a unique key. All properties are optional.
All facets are fields, but not all fields can be used as facets
Facets are the fields used for aggregation. When Faceted Search runs, it produces a result set which contains a collection of products and a collection of facets: the fields which appear in the result set along with the most frequent values of these fields. For example, "Colour" is a facet, and the values of this facet could be "Red", "Green", "Blue", etc.
Some fields are just fields and are not used for aggregation, but may be used as a filter, for full-text search or just as a stored value which can be displayed in the results (e.g. blurb or description).
There are 16 standard or "core" fields, most of which are related to category. They do not need to be configured or mapped, and are ready for use:
- id
- code
- name
- slug
- price
- category.name
- category.slug
- category.id
- parent_category.name
- parent_category.slug
- parent_category.id
- category_path.name
- category_path.slug
- category_path.id
- description.short
- description.long
All core facets can be re-defined in the facets.json file and then re-mapped in configuration.html, like any custom field/facet. The full definition of the core fields can be found in "The Standard Fields Definition".
Field Definition Properties
Property | Default Value | Description |
---|---|---|
name | "Humanized" version of the key. E.g. "My Field" for "my_field" | Name of the field. |
display_name | The value of the "name" property | Displayed version of field name. It can be something like "Select a Colour" |
type | keyword | Defines how the value is stored in Elasticsearch index. The available types are discussed below. |
fulltext | false | When set to true , a text version of the same field will be created in Elasticsearch. This is useful when the same field (e.g. of keyword type) needs to be used as a facet and for full-text search at the same time. |
case_sensitive | true | By default, the values of keyword fields are indexed as is, which makes facet values case-sensitive, so a filter query ?size=xl will not match a document with size XL. This behavior can be changed by setting the case_sensitive property to false. |
hidden | false | All fields are assumed to be used as facets and appear in search result. If there is no intention to display the facet (perhaps, use this field only for filtering purposes), it can be hidden. |
values_limit | 10 | By default, the facet will display only the top 10 most popular values. This number can be adjusted with this property. |
values | (empty array) | If a fixed list of facet values is required, it can be set with this property. |
order | value count | By default, facet values are ordered by the count of matching documents, with the most popular value coming first. The sorting order of facet values can be changed with this property. For possible values refer to the Elasticsearch documentation. |
output_order | (none) | The output_order property re-orders facet values after they have been fetched from Elasticsearch. For example, by using the order and the values_limit properties it is possible to fetch the 10 most popular values and then, by applying the output_order property, sort these 10 facet values in alphabetical order. |
unit | (none) | Unit of Measure. The value can be any string, such as "in" or "mm". |
props | (empty array) | This property passes an array with arbitrary data to the SearchResults object. The arbitrary data may include CSS class names or HTML attributes hinting to the front-end how to render the facet on the page. |
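The defaults in the table above can be pictured as a small merge step. The sketch below is a hypothetical Python model and not part of Faceted Search: `humanize` and `with_defaults` are invented names, and the humanizing rule (underscores to spaces, title case) is an assumption inferred from the "my_field" example.

```python
import copy
import json

# Defaults copied from the property table. Properties whose default is
# "(none)" or is decided on the Elasticsearch side are left out of this sketch.
DEFAULTS = {
    "type": "keyword",
    "fulltext": False,
    "case_sensitive": True,
    "hidden": False,
    "values_limit": 10,
    "values": [],
    "props": [],
}


def humanize(key: str) -> str:
    """Assumed humanizing rule: 'my_field' becomes 'My Field'."""
    return key.replace("_", " ").title()


def with_defaults(key: str, definition: dict) -> dict:
    """Return a field definition with the documented defaults filled in."""
    field = copy.deepcopy(DEFAULTS)
    field.update(definition)
    field.setdefault("name", humanize(key))
    # display_name falls back to the value of "name".
    field.setdefault("display_name", field["name"])
    return field


if __name__ == "__main__":
    # Hypothetical field: a case-insensitive facet showing up to 20 values.
    print(json.dumps(with_defaults("shoe_size", {"case_sensitive": False, "values_limit": 20}), indent=2))
```

Only the properties passed explicitly differ from the table; everything else falls back to the documented default.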
Field Types
Type | Use | Can Be Used as a Facet | Can be Used in Full-Text Search |
---|---|---|---|
keyword | The value is stored "as is". No analysis is applied. The default type for Faceted Search. | Yes (recommended) | No |
integer | Used to store whole numbers, such as ID values. | Yes | No |
double | Used to store floating point numbers, such as price. | Yes | No |
date | Used to store date fields | No | No |
boolean | true / false | Yes | No |
text | Used to store long strings of text for full-text search | No | Yes |
Mapping of the Data
Product data is mapped to the index document using a mapping template in the following location:
./modules/facetedsearch3/templates/configuration.html
The purpose of this template is to map the relevant data from the $product object to the $index array which represents the Index Document:
<{* Typically, product custom fields can be mapped to the index field *}>
<{$index['rating'] = $product->getCustomFieldValue('Rating')}>
<{* With the use of basic Smarty functions it is possible to expand a comma-separated list into an array ... *}>
<{$index['retailers'] = array_map('trim', explode(',', $product->getCustomFieldValue('Show_Retailers')))}>
<{* ... or create a range from the "start" and the "end" numbers *}>
<{$index['year_range'] = range($product->getCustomFieldValue('year_start'), $product->getCustomFieldValue('year_end'))}>
<{$index['tags'] = ($product->getTags()) ? array_map('trim', explode(',', $product->getTags())) : null}>
<{* The values should be set to null if they are not available *}>
<{$index['brand'] = ($product->getBrand()) ? $product->getBrand()->getName() : null}>
<{* A field/facet in the index can be a completely new thing based on some calculations *}>
<{$index['on_sale'] = ($product->getFinalPrice() < $product->getBasePrice())}>
<{* Collect the names of related products in the "fits" category, grouped by relation description *}>
<{foreach $product->getRelationInfo(true) as $relation}>
  <{if $relation.category eq 'fits'}>
    <{$something = $relation.description|strtolower}>
    <{$index['relation'][$something][] = $relation.product->getName()}>
  <{/if}>
<{/foreach}>
Even the core fields can be re-assigned, if needed:
<{$index['code'] = [
  $index.code,
  $product->getCustomFieldValue('ProxyCode'),
  $product->getCustomFieldValue('VIN'),
  $product->getCustomFieldValue('ISDN'),
]}>
Index Document Preview
To ensure the facets have been configured correctly, it is possible to see how the data is mapped to the Index Document. There is a special endpoint to preview the Index Document based on the ID or URL Slug of the corresponding product:
/fsearch/preview/{slug-or-id}
The URL below will display an Index Document in JSON format for the product with URL Slug "my-test-product":
/fsearch/preview/my-test-product
If a product is not published, does not have public view permissions, SEO settings prevent it from indexing, or for whatever other reason it cannot be indexed, an error will be displayed.
To see the actual reason why the product cannot be indexed, turn on development mode:
/fsearch/preview/my-test-product?dev=1
Debugging the Mapping
The mapping template, like any other Smarty template, can produce errors, or there might be a need to print out some variables. Faceted Search has a special endpoint to print output of the mapping template for ID or URL Slug of the corresponding product:
/fsearch/debug/{slug-or-id}
The URL below will display any output from the mapping template for the product with URL Slug "my-test-product":
/fsearch/debug/my-test-product
The mapping template is only used to assign values. It must not produce any output! Use /fsearch/debug endpoint to ensure no output is made by it.
Configuring Search Endpoint
A Search Endpoint is a profile used to accept search queries of the same type. For example, one endpoint may be responsible for servicing requests from an auto-complete search bar, and another endpoint may service filter requests from the product catalogue.
The endpoint configuration consists of the URL slug, a template to render search results (optional) and the endpoint configuration file with preset parameters which will apply to all queries to this endpoint. Basically, any Search API parameters can be set in the endpoint configuration file.
Any valid URL Slug can be used as an endpoint name, except the following list of the reserved URLs:
- preview
- debug
- reindex
- rebuild
- configuration
If the endpoint is called "demo", it will have the following path to the endpoint configuration file:
./modules/facetedsearch3/templates/demo.json
... the results template:
./modules/facetedsearch3/templates/demo.html
... and the endpoint URL:
https://www.example.com/fsearch/demo
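The naming convention above can be summarised in a few lines of Python. This is only an illustrative sketch: `endpoint_locations` is a hypothetical helper, the base URL is the example domain used above, and the check covers only the reserved-name list, not general URL slug validity.

```python
# Reserved URLs listed in the documentation; they cannot be endpoint names.
RESERVED = {"preview", "debug", "reindex", "rebuild", "configuration"}
TEMPLATE_DIR = "./modules/facetedsearch3/templates"


def endpoint_locations(name: str, base_url: str = "https://www.example.com") -> dict:
    """Return the config file, results template and URL for an endpoint name."""
    if name in RESERVED:
        raise ValueError(f"'{name}' is a reserved URL and cannot be used as an endpoint name")
    return {
        "config": f"{TEMPLATE_DIR}/{name}.json",
        "template": f"{TEMPLATE_DIR}/{name}.html",  # the results template is optional
        "url": f"{base_url}/fsearch/{name}",
    }


if __name__ == "__main__":
    for kind, location in endpoint_locations("demo").items():
        print(f"{kind}: {location}")
```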
Integration
There are three ways to send requests to FS endpoints and process responses: integrate a template, make headless AJAX calls or use a block function.
Integrating Endpoints Naturally
Just integrate a template corresponding to the endpoint and use the Search Results object to display results, facets, pagination, etc. The Search Results object is available in the $search variable.
Using Headless Requests
Leverage the Standard JSON Response headless API via AJAX calls and work with the data directly.
Using a Block Function
The block function facetedsearch3_search can be used to pull faceted search results into a template used by any other module. It accepts two arguments: "form" (mandatory) and "params" (optional). The first argument is the endpoint name, the second is an array with Search API request parameters.
<{$params = ['objects' => true, 'recursive' => true, 'category.slug' => $categorySlug]}>
<{show_block module="facetedsearch3" block='search' form='demo' params=$params template_name='_blank'}>
Search Results object
Search Parameters and Configuration Files
Any Faceted Search API parameters can be set in the endpoint JSON configuration file. These parameters will be fixed for the endpoint and cannot be overridden with URL parameters or block function arguments. This is useful for securing search parameters which are not meant to be manipulated by the end user, for example the number of items per page or the sorting criteria.
There is also a way to specify the default parameter values in the configuration file - the "default" parameter. Any parameters passed under the "default" can be overridden by the search query.
Here is an example of JSON configuration file for the "demo" endpoint and HTTP request made to that endpoint:
{
  "parent_category.slug": "products",
  "size": 30,
  "no_facets": true,
  "fields": ["name^3", "description.short^2"],
  "default": {
    "mode": "autocomplete",
    "foo": "bar"
  }
}
/fsearch/demo?size=1000&mode=standard&fields=category.name,brand
The request above will produce the following resulting query:
{
  "parent_category.slug": "products",
  "size": 30,
  "no_facets": true,
  "fields": ["name^3", "description.short^2"],
  "mode": "standard",
  "foo": "bar"
}
Use development mode to see the resulting requests.
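The precedence rules described in this section (fixed parameters win over the request, and the request wins over "default") can be sketched as a three-way merge. This Python model is an illustration only; `resolve_params` is a hypothetical name and the real module may resolve parameters differently in detail.

```python
def resolve_params(config: dict, request: dict) -> dict:
    """Merge endpoint configuration with request parameters.

    Precedence, lowest to highest: "default" values, request values,
    fixed values from the top level of the configuration file.
    """
    fixed = {key: value for key, value in config.items() if key != "default"}
    defaults = config.get("default", {})
    return {**defaults, **request, **fixed}


if __name__ == "__main__":
    demo_config = {
        "parent_category.slug": "products",
        "size": 30,
        "no_facets": True,
        "fields": ["name^3", "description.short^2"],
        "default": {"mode": "autocomplete", "foo": "bar"},
    }
    # Parameters of /fsearch/demo?size=1000&mode=standard&fields=category.name,brand
    demo_request = {"size": "1000", "mode": "standard", "fields": "category.name,brand"}
    print(resolve_params(demo_config, demo_request))
```

With the "demo" configuration, the request's size and fields are discarded because they are fixed, while mode is taken from the request because it is only a default.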
Development mode
Development mode can be enabled by passing ?dev=1 in the URL or as an argument to the block function. It allows you to:
- see more verbose errors
- see raw Elasticsearch query
Using Faceted Search API
General Syntax
HTTP Requests
The Search API Request parameters can be passed either in the query string of HTTP Requests (e.g. ?param1=value1&param2=value2) or in the POST data.
Arrays
Some parameters accept multiple values. There are two notations for this: the PHP array style and the comma-separated list. The following two requests are equivalent:
/fsearch/demo/?facets=size,color,brand
/fsearch/demo/?facets[]=size&facets[]=color&facets[]=brand
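To make the equivalence concrete, the Python sketch below normalises both notations into the same list. `list_param` is a hypothetical helper written for this illustration; it only mimics how a multi-value parameter could be read and is not the module's actual parser.

```python
from urllib.parse import parse_qs, urlsplit


def list_param(url: str, name: str) -> list[str]:
    """Read a multi-value parameter written in either notation."""
    query = parse_qs(urlsplit(url).query)
    # PHP array style: facets[]=size&facets[]=color
    if f"{name}[]" in query:
        return query[f"{name}[]"]
    # Comma-separated style: facets=size,color
    return [item for value in query.get(name, []) for item in value.split(",")]


if __name__ == "__main__":
    print(list_param("/fsearch/demo/?facets=size,color,brand", "facets"))
    print(list_param("/fsearch/demo/?facets[]=size&facets[]=color&facets[]=brand", "facets"))
```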
Endpoint Configuration Files
The endpoint configuration files use JSON format.
Basic Parameters
Search results normally contain an array of Facets with their values and the matching Index Documents. An Index Document is an associative array of fields and values which are stored in the Elasticsearch index. Sometimes this is enough to display search results, but sometimes it is necessary to work with instances of the actual CatProduct class.
Returning Objects in Results
When the objects parameter is set to true, the results will contain an array of CatProduct instances accessible through $search->getObjects():
/fsearch/demo/?objects=true
Removing Products from Results
Sometimes there is no need to have any product data in search results at all (e.g. only facets are required). This can be achieved by setting the no_products parameter to true:
/fsearch/demo?no_products=true
The request above will return no products in search results, but the facets will be populated for the scope of the query.
Removing Facets from Results
Sometimes facets are not required in the search results (e.g. for an autocomplete function). This can be achieved by setting no_facets to true:
/fsearch/demo?no_facets=true
The request above will effectively make all facets hidden. This means the facets will not be included in the Elasticsearch query, which saves some resources. This also means that $search->getFacets()->getVisible() and $search->getFacets()->getAvailable() will return an empty array.
Removing Product Fields from Results
By default, faceted search returns the whole document stored in Elasticsearch. Sometimes only specific fields need to be fetched. This is especially useful when large search result payloads affect performance. With the return parameter it is possible to specify the list of Product fields to be returned.
/fsearch/demo?return=id,slug,name
Filter Queries
A filter query is a request in the form of ?field1=value1&field2=value2 sent to the configured endpoint. It will return products matching the filter. A filter can be used in conjunction with the full-text query.
Filtering by Facets
- It is possible to filter by multiple values of the same facet. In this case the "OR" logic will apply to the values
- When two facets are used in the query, the "AND" logic is applied by default (unless the fields are listed as the should fields)
- Hidden facets (fields) can be used for filtering too
The following sample request will search for red, green or blue shoes in size "XL":
/fsearch/sample?color=red,green,blue&size=XL&category.slug=shoes
A field passed with an empty value is matched against null. To exclude a field from the filter, omit it from the query or set the ignore_empty_filters parameter to "true".
Controlling Facets in Search Results
By default, all visible configured facets will be sent to the Elasticsearch engine to get populated, though many facets may apply only to a subset of products. For example, the "Spindle Length", "Spindle Diameter" and "Spindle Material" facets may only apply to the products in the "Spindles" category. Displaying all available facets in search results may be overwhelming, and sometimes it is beneficial to hide some facets from view. This can be achieved with the facets parameter in the search query. It accepts a list of facets which will be populated by the query and made available in the search results with $search->getFacets()->getVisible(). The facets parameter can make a hidden facet visible, if required. However, the no_facets=true parameter overrides any setting of the facets parameter and no facets will be displayed.
Inclusive and Exclusive Queries
By default, all the field/value pairs in the filter request are exclusive (i.e. they are combined with the "AND" operator).
The request below will match products from the "Classic" collection which are made of stainless steel and have extended warranty:
/fsearch/products?collection=Classic&material=Stainless+Steel&warranty=Extended
The should parameter applies the "OR" operator to the specified list of fields.
The request below will match products which have extended warranty and are either from the "Classic" collection or made of stainless steel:
/fsearch/products?collection=Classic&material=Stainless+Steel&warranty=Extended&should=collection,material
The minimum_should_match parameter controls how many of the fields listed in the should parameter must match. The default value is 1, which makes the "should" queries work as the classic "OR" (but sorted by the number of matching fields). Setting minimum_should_match to a higher number sets a threshold: results which match fewer than the required number of fields are excluded. Setting minimum_should_match to the number of fields listed in should effectively turns the query into an "AND" query.
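The interplay of the default "AND" logic, the should list and minimum_should_match can be modelled locally. The Python sketch below is a simplified in-memory model and not the query Faceted Search sends to Elasticsearch; `matches` and the sample products are hypothetical.

```python
def matches(doc: dict, filters: dict, should: tuple = (), minimum_should_match: int = 1) -> bool:
    """Decide whether a document passes the filter.

    Fields not listed in `should` must all match ("AND").
    Of the fields listed in `should`, at least `minimum_should_match` must match.
    """
    must = {field: value for field, value in filters.items() if field not in should}
    optional = {field: value for field, value in filters.items() if field in should}
    if any(doc.get(field) != value for field, value in must.items()):
        return False
    if not optional:
        return True
    hits = sum(doc.get(field) == value for field, value in optional.items())
    return hits >= minimum_should_match


if __name__ == "__main__":
    filters = {"collection": "Classic", "material": "Stainless Steel", "warranty": "Extended"}
    brass_classic = {"collection": "Classic", "material": "Brass", "warranty": "Extended"}
    print(matches(brass_classic, filters))
    print(matches(brass_classic, filters, should=("collection", "material")))
    print(matches(brass_classic, filters, should=("collection", "material"), minimum_should_match=2))
```

In this model a brass product from the "Classic" collection is rejected by the plain filter, accepted once collection and material are listed in should, and rejected again when minimum_should_match is raised to 2.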
Recursive Catalogue Lookup
Imagine the product catalogue has the following structure:
- Home
- Shoes
- Winter Shoes
- Summer Shoes
- Knitware
- Shoes
The following request will yield no results:
/fsearch/products?category.slug=shoes
This is because the "Shoes" category has no products in it: all products belong to its child categories, namely "Winter Shoes" and "Summer Shoes".
Certainly, it is possible to use any of the parent_category fields:
/fsearch/products?parent_category.slug=shoes
But if we want to filter down just to the child category "Winter Shoes", the field name will have to change to category.slug.
To avoid this situation, the recursive parameter can be set to true, which makes any search on the category fields inclusive of its child categories.
/fsearch/products?category.slug=shoes&recursive=true
The request above will return all products in "Shoes", "Winter Shoes" and "Summer Shoes" categories.
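The difference between the plain and the recursive lookup can be illustrated with a small in-memory model. Everything in this Python sketch is hypothetical: the product names and slugs are invented, and modelling the recursive lookup as a match anywhere in the category path is an assumption about the behaviour, not a description of the actual Elasticsearch query.

```python
# Hypothetical products; "category_path" lists the slugs from the root down
# to the product's own category.
PRODUCTS = [
    {"name": "Snow Boot", "category": "winter-shoes", "category_path": ["home", "shoes", "winter-shoes"]},
    {"name": "Sandal", "category": "summer-shoes", "category_path": ["home", "shoes", "summer-shoes"]},
    {"name": "Wool Jumper", "category": "knitware", "category_path": ["home", "knitware"]},
]


def by_category(products: list, slug: str, recursive: bool = False) -> list:
    """Return names of products filed under a category slug.

    Without `recursive` only the product's own category is compared;
    with it, any category on the path to the product counts as a match.
    """
    if recursive:
        return [p["name"] for p in products if slug in p["category_path"]]
    return [p["name"] for p in products if p["category"] == slug]


if __name__ == "__main__":
    print(by_category(PRODUCTS, "shoes"))
    print(by_category(PRODUCTS, "shoes", recursive=True))
    print(by_category(PRODUCTS, "winter-shoes", recursive=True))
```

The same field name works for both a parent category and a leaf category, which is the point of the recursive parameter.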
Full-Text Queries
A full-text query is a search phrase passed to the q parameter of the request (e.g. ?q=quick+brown+fox). Full-text queries are performed on all analysed fields, which include text fields and those fields where the fulltext property is set to true.
Search Modes
The search mode can be set using the optional mode parameter:
/fsearch/demo?q=quick+brown+f&mode=autocomplete
Currently, two search modes are supported:
- standard
- autocomplete
standard
The standard search mode breaks down the search phrase into search terms by stripping off all punctuation and special characters, and splitting it by spaces. Then it applies several filters which:
- Convert all accents, such as acute (é) or grave (è), to their base form (e)
- Convert all words to their base form (stemming)
- Remove any high-frequency stop words, such as articles
- Remove any apostrophes
- Remove any HTML tags
- Convert everything to lowercase
The search phrase "A Cow Jumped over the Moon" will be converted into the following search tokens: "cow", "jump", "over", "moon". Then the tokens will be matched against the inverted index and the relevance of results will be determined on "the more matches the better" principle. According to the sensitivity setting, the least relevant results will be filtered out. According to the fuzziness setting, the query can match some more products by expanding search terms into variants. For example, fuzzy logic can expand the term "moon" into "noon", "soon", "mood", "moan", and even "moo" and "moron".
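The filter chain described above can be imitated with a toy analyzer. The Python sketch below is deliberately crude and is not the Elasticsearch analyzer that Faceted Search configures: the stop-word list and the suffix-stripping "stemmer" are assumptions chosen only so that the example phrase produces the tokens quoted in the text.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "the"}   # tiny illustrative list, not the real one
SUFFIXES = ("ing", "ed", "s")     # crude stand-in for a real stemmer


def strip_accents(text: str) -> str:
    """Fold accented letters to their base form (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(phrase: str) -> list[str]:
    """Turn a search phrase into lowercase, stemmed search tokens."""
    text = re.sub(r"<[^>]+>", " ", phrase)               # remove HTML tags
    text = strip_accents(text).lower()                   # fold accents, lowercase
    text = text.replace("'", "").replace("\u2019", "")   # remove apostrophes
    words = re.findall(r"[a-z0-9]+", text)               # drop punctuation, split
    return [stem(word) for word in words if word not in STOP_WORDS]


if __name__ == "__main__":
    print(analyze("A Cow Jumped over the Moon"))
```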
autocomplete
In autocomplete mode the search phrase is taken as a prefix and is matched against any documents containing this prefix. This mode supports stemming and some basic transformations, but fuzziness is not available and the sensitivity setting has no effect. A search phrase "cow j" will match "cow jumped", "cow jogged" and "cow jigged", but will not match "cow has jumped". However, the search phrase "cow jumping o" will match "cow jumped over the moon", because autocomplete mode still supports stemming.
Controlling Sensitivity of Full-Text Queries
Fuzziness
Fuzziness can be enabled for full-text queries with the fuzzy parameter. By default, it is off.
The fuzzy query uses similarity based on Levenshtein edit distance. Effectively, it can auto-correct one or two typos in the search term, depending on the term length, or even expand the search term by a few characters:
/fsearch/country_lookup?q=austrai&fuzzy=1
The query above will match "Austria" because the fuzzy logic can switch the last two letters. But it will also match "Australia" because the Elasticsearch engine can expand the original term by adding "l" and additional "a".
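The two matches can be checked by counting edits. The Python sketch below computes an edit distance in which swapping two adjacent letters counts as a single edit (the optimal string alignment variant), which is an assumption made here to mirror the "switch the last two letters" explanation above. It is a standalone illustration and does not call Elasticsearch.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance with insertions, deletions, substitutions and
    swaps of two adjacent characters, each counted as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all i characters of a
    for j in range(cols):
        d[0][j] = j          # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


if __name__ == "__main__":
    for candidate in ("austria", "australia"):
        print(candidate, edit_distance("austrai", candidate))
```

Under this measure "austria" is one edit away from "austrai" and "australia" is two edits away, so both fall within a one-or-two-typo budget.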
Sensitivity
The sensitivity parameter has a range of 1 to 100 with a default value of 70. This parameter sets the minimum percentage of search terms in the query (words excluding articles and grammar constructions) which must match, rounded down. With 4 words and the default sensitivity of 70%, 2.8 is rounded down, so only two words from the query need to match for the product to appear in search results. If sensitivity is increased to 75%, then 3 words from the query must match.
Unlike traditional search engines which allow only "AND" and "OR" operators in the search phrase, the sensitivity parameter in Faceted Search makes it possible to fine-tune the search behavior on a scale from 1 (effectively an "OR" query) to 100 (effectively an "AND" query where all terms in the search phrase must match).
However, it is important to understand that not all words will be considered as search terms (see Search Modes). In the standard mode all high-frequency words are removed.
/fsearch/demo?q=a+cow+jumped+over+the+moon&sensitivity=75
The query above will require at minimum three of the following four words to be found in the document: "cow", "jump", "over", "moon".
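The rounding rule can be written down as a one-line calculation. In the Python sketch below, `required_matches` is a hypothetical helper; the floor of at least one matching term is an assumption added so that a sensitivity of 1 behaves like the "OR" query described above.

```python
def required_matches(term_count: int, sensitivity: int = 70) -> int:
    """Minimum number of search terms that must match, rounded down."""
    if not 1 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 1 and 100")
    if term_count <= 0:
        return 0
    # Integer arithmetic avoids floating-point surprises when rounding down.
    # Assumption: at least one term must always match.
    return max(1, term_count * sensitivity // 100)


if __name__ == "__main__":
    # "a cow jumped over the moon" leaves four search terms in standard mode.
    for sensitivity in (1, 70, 75, 100):
        print(sensitivity, required_matches(4, sensitivity))
```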
Search Fields and the Weights
By default, full-text queries are performed on all analysed fields (the text fields and the fields where the fulltext property is set to true), and all fields have equal weight. So, a document containing "almonds" in the product's "name" field will have equal weight to a document containing "almonds" in the "description" field.
The fields parameter specifies the list of fields which will be used in the search and can boost the weight of each field using the caret ^ notation.
/fsearch/demo?q=almonds&fields=name^3,code^3,description.short^2,description.long
In the query above the search phrase "almonds" will be searched only in four fields: name, code, description.short and description.long. The "name" and "code" fields will have the highest weight, followed by the short description; the long description will have the least weight.
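The caret notation can be unpacked into a field-to-weight table. The Python sketch below is illustrative only: `parse_fields` is a hypothetical helper, and treating a field without a caret as weight 1 is an assumption consistent with the long description having the least weight in the example.

```python
def parse_fields(spec: str) -> dict[str, float]:
    """Split 'name^3,description.long' into {field: weight}."""
    weights = {}
    for item in spec.split(","):
        name, _, boost = item.strip().partition("^")
        # Assumption: a field listed without a boost gets weight 1.
        weights[name] = float(boost) if boost else 1.0
    return weights


if __name__ == "__main__":
    weights = parse_fields("name^3,code^3,description.short^2,description.long")
    for field, weight in sorted(weights.items(), key=lambda pair: -pair[1]):
        print(f"{field}: {weight}")
```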
Controlling Search Results Scope
Pagination
Pagination of results is controlled in two possible ways.
By specifying size and from:
/fsearch/demo?size=16&from=32
OR by specifying size and page:
/fsearch/demo?size=16&page=3
The two requests above are equivalent: both display the 3rd page of the search results with 16 products per page.
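The relationship between the two notations is simple arithmetic. The Python sketch below assumes pages are numbered from 1 and from is a zero-based offset, which is what the example (size=16, page=3, from=32) implies; the function names are hypothetical.

```python
def page_to_from(page: int, size: int) -> int:
    """Zero-based offset of the first result on a 1-based page."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


def from_to_page(offset: int, size: int) -> int:
    """1-based page number that contains the given zero-based offset."""
    if offset < 0 or size < 1:
        raise ValueError("offset must be non-negative and size positive")
    return offset // size + 1


if __name__ == "__main__":
    print(page_to_from(page=3, size=16))
    print(from_to_page(offset=32, size=16))
```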
Development Environments and Debugging
The "/reindex" endpoint is absolutely safe to use in production, as it adds, updates or deletes the products in the index one by one and does not affect the user experience.
Please note, there is also an option to pass an individual product ID in the URL and reindex just one product, if this is required for testing. E.g. https://examplecom.corewebdna.com/fsearch/reindex/19435
The "/rebuild" endpoint should only be used in production when structural changes are made to the facets.json file. When the index is rebuilt, it is literally destroyed, created again and then populated with the data. So, all the products will disappear from search results for a moment and then gradually start to reappear.
Changes to configuration.html only affect how the data is mapped to the index and therefore do not require rebuilding the index; reindexing is enough.
The staging environment (examplecom-stage.corewebdna.com) is not designed to test faceted search changes, as it uses the same index as production (similar to the way it uses the production database). It should only be used to test templates, CSS and JS. Any changes to configuration.html should be made in your development environment (something-agencyname.corewebdna.com).
Also, you can use the preview URL to see how the product will appear in the index (note that it does not fetch the product from the index; it shows the result of the mapping applied in configuration.html). This is the best way to test changes to configuration.html without reindexing anything.
For example: https://examplecom-agencyname.corewebdna.com/fsearch/preview/19435
Sometimes HTML content in some fields may skew the output, so it is better to use "view source" mode or copy-paste the JSON into a JSON editor.
Also, you can use the "debug" view to see any debug output you need from the configuration.html file. Normally, in production, the configuration.html file must not produce any output (it should just assign variables). If it does, it will cause errors with indexing. In a development environment, however, you may want to print the contents of a variable used in configuration.html for debugging purposes. In this case you should use the "debug" view: https://examplecom-agencyname.corewebdna.com/fsearch/debug/19435