Elastic Site Search
- About
- Site Search
- Search Items
- Configuration
- Fields Configuration
- Field Definition Properties
- Field Types
- Mapping of the Data
- Index Document Preview
- Debugging the Mapping
- Configuring Search Endpoint
- Integration
- Integrating Endpoints Naturally
- Using Headless Requests
- Using a Block Function
- Search Results object
- Search Parameters and Configuration Files
- Development mode
- Fields Configuration
- Using Fielded Search API
- General Syntax
- HTTP Requests
- Arrays
- Endpoint Configuration Files
- HTTP Requests
- Basic Parameters
- Returning Objects in Results
- Removing Products from Results
- Removing fields from Results
- Filter Queries
- Filtering by fields
- Controlling fields in Search Results
- Inclusive and Exclusive Queries
- Full-Text Queries
- Search Modes
- standard
- autocomplete
- Controlling the Sensitivity of Full-Text Queries
- Fuzziness
- Sensitivity
- Search Fields and the Weights
- Search Modes
- Controlling Search Results Scope
- Pagination
- The Standard Fields Definition
- General Syntax
- Development Environment
- Template Examples
- search.html
- search-item.html
- search.json
- Further Reading
About
Elastic Site Search (ESS) uses the Elasticsearch engine for indexing. It provides a flexible and redistributable way to configure the structure of the index and to keep it current with real-time updates.
Site Search
The following items within Core dna are indexed:
Module | Object Returned | Endpoint |
---|---|---|
Pages | Page | /search/pages/action |
Blogs | Blog, BlogPost | /search/blogs/action |
Faq (Help) | Help | /search/help/action |
Products | CatProduct | /search/products/action |
To use ESS, navigate to "yoursite.com/search". By default, this URL loads search.html from the site directory (modules/search/templates/search.html).
Every time you update an item in one of the modules above, an ESS update is triggered, which keeps all records in the search index up to date. To trigger an ESS action (e.g. reindex or preview) for a particular type of entity, specify the corresponding module endpoint from the table above.
Refer to the Template Examples section below for example code.
Search Items
An "item" is a single search result from one of the modules listed above; items are returned when a search is executed. Each item knows its own module, which can be retrieved with the method $item->getType().
It is also possible to define highlights, which show snippets of the matching text for each item. They can be retrieved with the method $item->getHighlights().
When displaying results, you can use $search->getGroupedItems() to return the results grouped by module. Alternatively, the standard $search->getResults() returns the weighted item results.
If you need more information about a search result, call $item->getObject(). This returns the original object related to the search item (for example, the full product object), so all of the normal object's functionality is available. Only do this when needed, as every call to fetch these objects can be expensive.
Configuration
To start using Elastic Site Search (ESS), follow these four steps:
- Define fields using JSON configuration file
- Map catalogue data to index using a mapping template
- Configure one or more endpoints to accept requests
- Send requests to the configured endpoint
Fields Configuration
Each item in the supported modules is stored in the Elasticsearch engine as an Index Document. The structure of the Index Document is defined in the JSON configuration file fields.json, located at
./modules/search/fields.json
A typical configuration file looks like this:
Each field in the Index Document must be given a unique key; all of its properties are optional. There are 16 standard, or "core", fields. They do not need to be configured or mapped and are ready for use. They include:
- id
- type
- item_id
- name
- url
- blurb
- content
- description.short
- description.long
All core fields can be redefined in the fields.json file and then remapped in configuration.html, like any custom field.
Field Definition Properties
Property | Default Value | Description |
---|---|---|
name | "Humanized" version of the key. E.g. "My Field" for "my_field" | Name of the field. |
display_name | The value of the "name" property | Displayed version of the field name, e.g. "Select a Colour". |
type | keyword | Defines how the value is stored in the Elasticsearch index. The available types are discussed below. |
fulltext | false | When set to true, a text version of the same field is also created in Elasticsearch. This is useful when the same field (e.g. of keyword type) needs to be used both as a field and for full-text search. |
case_sensitive | true | By default, the values of keyword fields are indexed as is, which makes field values case-sensitive: a filter query ?size=xl will not match a document with size XL. This behavior can be changed by setting the case_sensitive property to false. |
hidden | false | By default, all fields are used as fields and appear in search results. If a field is not intended for display (for example, it is used only for filtering), it can be hidden. |
values_limit | 10 | By default, the field displays only its 10 most popular values. This number can be adjusted with this property. |
values | (empty array) | If a fixed list of field values is required, it can be set with this property. |
order | value count | By default, field values are ordered by the count of matching documents, with the most popular value first. The sort order of field values can be changed with this property. For possible values, refer to the Elasticsearch documentation. |
output_order | (none) | The output_order property re-orders field values after they have been fetched from Elasticsearch. For example, the order and values_limit properties can be used to fetch the 10 most popular values, and the output_order property can then sort these 10 values alphabetically. |
unit | (none) | Unit of measure. The value can be any string, such as "in" or "mm". |
props | (empty array) | Passes an array of arbitrary data to the SearchResults object. This data may include CSS class names or HTML attributes that hint to the front end how to render the field on the page. |
Field Types
Type | Use | Can Be Used as a Field | Can Be Used in Full-Text Search |
---|---|---|---|
keyword | The value is stored "as is"; no analysis is applied. The default type for fielded search. | Yes (recommended) | No |
integer | Used to store whole numbers, such as ID values. | Yes | No |
double | Used to store floating-point numbers, such as price. | Yes | No |
date | Used to store dates. | No | No |
boolean | true / false | Yes | No |
text | Used to store long strings of text for full-text search | No | Yes |
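The defaults in the two tables above can be summarised as a small model. The sketch below is a hypothetical Python illustration, not ESS code. It makes several assumptions:

- A field definition is a plain key/value object, as a JSON object in fields.json would decode to.
- Each missing property receives the documented default.
- The type is checked against the listed field types.
- The "humanizing" rule (underscores become spaces, words are title-cased) is inferred from the "my_field" / "My Field" example only.

```python
from typing import Optional

# Field types listed in the Field Types table.
FIELD_TYPES = {"keyword", "integer", "double", "date", "boolean", "text"}


def humanize(key: str) -> str:
    """Assumed humanizing rule: "my_field" -> "My Field"."""
    return key.replace("_", " ").title()


def resolve_field(key: str, definition: Optional[dict] = None) -> dict:
    """Return a copy of a field definition with the documented defaults filled in."""
    field = dict(definition or {})
    field.setdefault("name", humanize(key))
    # display_name falls back to the (possibly defaulted) name
    field.setdefault("display_name", field["name"])
    field.setdefault("type", "keyword")
    field.setdefault("fulltext", False)
    field.setdefault("case_sensitive", True)
    field.setdefault("hidden", False)
    field.setdefault("values_limit", 10)
    field.setdefault("values", [])
    field.setdefault("props", [])
    if field["type"] not in FIELD_TYPES:
        raise ValueError(f"unknown field type {field['type']!r} for field {key!r}")
    return field
```

For example, resolve_field("colour", {"display_name": "Select a Colour", "case_sensitive": False}) keeps the explicit display name and case setting and takes every other property from the defaults.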
Mapping of the Data
Item data is mapped to the index document using a mapping template in the following location:
./modules/search/templates/configuration.html
The purpose of this template is to map the relevant data from the $item object to the $index array, which represents the Index Document:

```smarty
<{* Typically, product custom fields can be mapped to an index field *}>
<{$index['rating'] = $item->getCustomFieldValue('Rating')}>

<{* With basic Smarty functions it is possible to expand a comma-separated list into an array ... *}>
<{$index['retailers'] = array_map('trim', explode(',', $item->getCustomFieldValue('Show_Retailers')))}>

<{* ... or create a range from the "start" and "end" numbers *}>
<{$index['year_range'] = range($item->getCustomFieldValue('year_start'), $item->getCustomFieldValue('year_end'))}>

<{$index['tags'] = ($item->getTags()) ? array_map('trim', explode(',', $item->getTags())) : null}>

<{* Values should be set to null if they are not available *}>
<{$index['brand'] = ($item->getBrand()) ? $item->getBrand()->getName() : null}>

<{* A field in the index can be an entirely new value calculated from other data *}>
<{$index['on_sale'] = ($item->getFinalPrice() < $item->getBasePrice())}>

<{* Group related "fits" products by their lower-cased relation description *}>
<{foreach $item->getRelationInfo(true) as $relation}>
  <{if $relation.category eq 'fits'}>
    <{$something = $relation.description|strtolower}>
    <{$index['relation'][$something][] = $relation.product->getName()}>
  <{/if}>
<{/foreach}>
```
Even the core fields can be re-assigned if needed:
```smarty
<{* Keep the original code and add alternative codes from custom fields *}>
<{$index['code'] = [
    $index.code,
    $item->getCustomFieldValue('ProxyCode'),
    $item->getCustomFieldValue('VIN'),
    $item->getCustomFieldValue('ISDN')
]}>
```
Index Document Preview
To check that the fields have been configured and mapped correctly, you can preview how the data is mapped to the Index Document. A special endpoint displays the Index Document for a product, identified by its ID or URL slug:
/search/preview/products/{slug-or-id}
The URL below will display an Index Document in JSON format for the product with URL Slug "my-test-product":
/search/preview/products/my-test-product
If a product is not published, does not have public view permissions, is excluded from indexing by its SEO settings, or cannot be indexed for any other reason, an error is displayed.
To see the actual reason why the product cannot be indexed, turn on development mode:
/search/preview/products/my-test-product?dev=1
Debugging the Mapping
The mapping template, like any other Smarty template, can produce errors, or you may need to print out some variables. Fielded search has a special endpoint that prints the output of the mapping template for a product, identified by its ID or URL slug:
/search/debug/products/{slug-or-id}
The URL below will display any output from the mapping template for the product with URL Slug "my-test-product":
/search/debug/products/my-test-product
The mapping template is only used to assign values. It must not produce any output! Use the /search/debug endpoint to ensure it produces none.
Configuring Search Endpoint
Search Endpoint is a profile used to accept search queries of the same type. For example, one endpoint may be made responsible for servicing requests from the auto-complete search bar, and another endpoint may be servicing filter requests from the product catalogue.
The endpoint configuration consists of the URL slug, an optional template to render search results, and the endpoint configuration file with preset parameters that apply to all queries to this endpoint. Any Search API parameter can be set in the endpoint configuration file.
Any valid URL Slug can be used as an endpoint name, except the following list of the reserved URLs:
- preview
- debug
- reindex
- rebuild
- configuration
For example, the following URLs use reserved names:
mysite.com/search/reindex/products
mysite.com/search/reindex/pages
mysite.com/search/rebuild
mysite.com/search/preview/products/1234
If the endpoint is called "demo", it will have the following path to the endpoint configuration file:
./modules/search/templates/demo.json
... the results template:
./modules/search/templates/demo.html
... and the endpoint URL:
https://www.example.com/search/demo
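The naming convention above can be expressed as a small helper. This is a hypothetical Python sketch, not part of ESS. It assumes only what is stated above: the reserved names are rejected, and the configuration file, results template and URL are all derived from the endpoint name. The default site URL is the example domain used in this guide.

```python
# Reserved URLs that cannot be used as endpoint names (from the list above).
RESERVED = {"preview", "debug", "reindex", "rebuild", "configuration"}


def endpoint_paths(name: str, site: str = "https://www.example.com") -> dict:
    """Derive the config file, results template and URL for an endpoint name."""
    if name in RESERVED:
        raise ValueError(f"{name!r} is a reserved URL and cannot be an endpoint name")
    return {
        "config": f"./modules/search/templates/{name}.json",
        "template": f"./modules/search/templates/{name}.html",
        "url": f"{site}/search/{name}",
    }
```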
Integration
There are three ways to send requests to search endpoints and process the responses: integrate a template, make headless AJAX calls, or use a block function.
Integrating Endpoints Naturally
Integrate a template corresponding to the endpoint and use the Search Results object to display results, fields, pagination, etc. The Search Results object is available in the $search variable.
Using Headless Requests
Use the Standard JSON Response headless API via AJAX calls and work with the data directly.
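A headless client mainly needs to build the request URL and decode the JSON it gets back. The Python sketch below is a minimal illustration under several assumptions:

- The endpoint lives at /search/{endpoint} on the site.
- Array parameters use the comma-separated notation described under Arrays.
- Booleans are sent as "true"/"false".
- The endpoint returns JSON for this request. How the JSON response is selected, and the shape of that JSON, are not specified here.

The helper names build_search_url and fetch_results are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(site: str, endpoint: str, params: dict) -> str:
    """Build a Search API request URL from a dict of parameters."""
    query = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            # Comma-separated array notation, e.g. fields=name^3,brand
            value = ",".join(str(v) for v in value)
        elif isinstance(value, bool):
            value = "true" if value else "false"
        query[key] = value
    return f"{site}/search/{endpoint}?{urlencode(query)}"


def fetch_results(url: str, timeout: float = 10.0):
    """Assumption: the endpoint answers this request with a JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

In a browser the same URL would be requested with an AJAX call (for example fetch or XMLHttpRequest). The URL-building logic is the same.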
Using a Block Function
The block function search_search can be used to pull fielded search results into a template used by any other module. It accepts two arguments: "form" (mandatory) and "params" (optional). The first argument is the endpoint name, the second is an array with Search API request parameters.
```smarty
<{$params = ['objects' => true, 'recursive' => true, 'category.slug' => $categorySlug]}>
<{show_block module="search" block="search" form="demo" params=$params template_name="_blank"}>
```
Search Results object
Search Parameters and Configuration Files
Any Search API parameter can be preset in the endpoint JSON configuration file. These parameters are used as default values for the endpoint and can be overridden by parameters passed in the URL or as arguments of the block function. This applies to all values except arrays, which are merged.
Here is an example of JSON configuration file for the "demo" endpoint and HTTP request made to that endpoint:
```json
{
  "parent_category.slug": "products",
  "size": 30,
  "mode": "autocomplete",
  "no_fields": true,
  "fields": ["name^3", "description.short^2"]
}
```
/search/demo?size=12&mode=standard&fields=category.name,brand
The request above results in the following query:

```json
{
  "parent_category.slug": "products",
  "size": 12,
  "mode": "standard",
  "no_fields": true,
  "fields": ["name^3", "description.short^2", "category.name", "brand"]
}
```
Use development mode to see the resulting requests.
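The merge rule (request values override scalar defaults, while arrays are merged) can be modelled in a few lines. This Python sketch is illustrative, not the ESS implementation. It assumes request values have already been parsed into native types, and it skips duplicate array entries, which is an assumption the documentation does not state.

```python
def merge_params(defaults: dict, request: dict) -> dict:
    """Combine endpoint defaults with request parameters."""
    merged = dict(defaults)
    for key, value in request.items():
        current = merged.get(key)
        if isinstance(current, list) and isinstance(value, list):
            # Arrays are merged: defaults first, then new request values (duplicates skipped)
            merged[key] = current + [v for v in value if v not in current]
        else:
            # Any other value from the request overrides the default
            merged[key] = value
    return merged
```

Applied to the "demo" configuration and the request above, this reproduces the resulting query shown earlier.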
Development mode
Development mode can be enabled by passing ?dev=1 in the URL or as an argument to the block function. It allows you to:
- see more verbose errors
- see raw Elasticsearch query
Using Fielded Search API
General Syntax
HTTP Requests
The Search API request parameters can be passed either in the query string of HTTP requests (e.g. ?param1=value1&param2=value2) or in the POST data.
Arrays
Some parameters accept multiple values. There are two notations for this: the PHP array style and a comma-separated list. The following two requests are equivalent:
/search/demo/?fields=size,color,brand
/search/demo/?fields[]=size&fields[]=color&fields[]=brand
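To see why the two notations are equivalent, here is a Python sketch that reads an array parameter written in either style into the same list. It uses only the standard urllib.parse module. The function name array_param is hypothetical, and splitting only the comma-separated form on commas is an assumption.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def array_param(url: str, name: str) -> List[str]:
    """Collect the values of an array parameter given in either notation."""
    values: List[str] = []
    for key, value in parse_qsl(urlsplit(url).query):
        if key == name:
            # Comma-separated notation: ?fields=size,color,brand
            values.extend(v for v in value.split(",") if v)
        elif key == name + "[]":
            # PHP array notation: ?fields[]=size&fields[]=color
            values.append(value)
    return values
```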
Endpoint Configuration Files
The endpoint configuration files use JSON format.
Basic Parameters
Search results normally contain an array of fields with their values, and the matching Index Documents. An Index Document is an associative array of fields and values stored in the Elasticsearch index. Sometimes this is enough to display search results, but sometimes it is necessary to work with instances of the actual object classes.
Returning Objects in Results
When the objects parameter is set to true, the results contain an array of CatProduct instances accessible through $search->getObjects():
/search/demo/?objects=true
Removing Products from Results
Sometimes product data is not needed in the search results at all (e.g. only the fields are required). This can be achieved by setting the no_products parameter to true:
/search/demo?no_products=true
The request above will return no products in search results, but the fields will be populated for the scope of the query.
Removing fields from Results
Sometimes fields are not required in the search results (e.g. for an autocomplete function). This can be achieved by setting no_fields to true:
/search/demo?no_fields=true
The request above effectively makes all fields hidden. The fields are not included in the Elasticsearch query, which saves some resources. It also means that $search->getfields()->getVisible() and $search->getfields()->getAvailable() return an empty array.
Filter Queries
A filter query is a request of the form ?field1=value1&field2=value2 sent to the configured endpoint. It returns the products matching the filter. A filter can be used in conjunction with a full-text query.
Filtering by fields
- It is possible to filter by multiple values of the same field. In this case, "OR" logic applies to the values.
- When two fields are used in the query, "AND" logic is applied by default (unless the fields are listed in the should parameter).
- Hidden fields can be used for filtering too.
The following sample request will search for red, green or blue shoes in size "XL":
/search/sample?color=red,green,blue&size=XL&category.slug=shoes
Use the type filter to restrict a search to a single module or to several modules:
/search/sample?type=products&q=mytestsearch
/search/sample?type=products,faq,blogs&q=mytestsearch
To exclude a field from the filter, omit it from the query. Alternatively, set the ignore_empty_filters parameter to true.
Controlling fields in Search Results
By default, all visible configured fields are sent to the Elasticsearch engine to be populated, although many fields may apply only to a subset of products. For example, the "Spindle Length", "Spindle Diameter" and "Spindle Material" fields may only apply to products in the "Spindles" category. Displaying all available fields in the search results may be overwhelming, so it is sometimes useful to hide some fields from view. This can be achieved with the fields parameter in the search query. It accepts a list of fields that will be populated by the query and made available in the search results via $search->getfields()->getVisible(). The fields parameter can also make a hidden field visible if required. However, the no_fields=true parameter overrides any setting of the fields parameter, and no fields will be displayed.
Inclusive and Exclusive Queries
By default, all the field/value pairs in the filter request are exclusive (i.e. employ "AND" operator).
The request below matches products from the "Classic" collection that are made of stainless steel and have an extended warranty:
/search/products?collection=Classic&material=Stainless+Steel&warranty=Extended
The should parameter applies the "OR" operator to the specified list of fields.
The request below will match products having extended warranty either from the "Classic" collection or made of stainless steel.
/search/products?collection=Classic&material=Stainless+Steel&warranty=Extended&should=collection,material
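The filter rules can be modelled as follows:

- "OR" applies across values of the same field.
- "AND" applies across fields.
- Fields listed in should are combined with "OR".

The Python sketch below is an illustration only. ESS builds an Elasticsearch query rather than evaluating documents like this. It assumes each document stores its field values as lists, and it uses exact string comparison, which mirrors the default case_sensitive behaviour.

```python
def matches(doc: dict, filters: dict, should: tuple = ()) -> bool:
    """Model of the filter logic: OR within a field, AND across fields, OR across should fields."""

    def field_matches(field: str) -> bool:
        # Any of the requested values for this field is enough ("OR")
        return any(value in doc.get(field, []) for value in filters[field])

    required = [f for f in filters if f not in should]
    optional = [f for f in filters if f in should]
    # Every field outside the should list must match ("AND")
    if not all(field_matches(f) for f in required):
        return False
    # At least one of the should fields must match, if any were given
    return not optional or any(field_matches(f) for f in optional)
```

With should=("collection", "material"), a brass product from the "Classic" collection with an extended warranty matches. Without should, the same product does not match.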
Full-Text Queries
A full-text query is a search phrase passed in the q parameter of the request (e.g. ?q=quick+brown+fox). Full-text queries are performed on all analysed fields, which include text fields and the fields whose fulltext property is set to true.
Search Modes
The search mode can be set using the optional mode parameter:
/search/demo?q=quick+brown+f&mode=autocomplete
Currently, two search modes are supported:
- standard
- autocomplete
standard
The standard search mode breaks the search phrase down into search terms by stripping all punctuation and special characters and splitting it on spaces. It then applies several filters which:
- Convert all accents, such as acute (é) or grave (è), to their base form (e)
- Convert all words to their base form (stemming)
- Remove any high-frequency stop words, such as articles
- Remove any apostrophes
- Remove any HTML tags
- Convert everything to lowercase
The search phrase "A Cow Jumped over the Moon" is converted into the following search tokens: "cow", "jump", "over", "moon". The tokens are then matched against the inverted index, and the relevance of the results is determined on a "the more matches the better" principle. Depending on the sensitivity setting, the least relevant results are filtered out. Depending on the fuzziness setting, the query can match additional products by expanding search terms into variants. For example, fuzzy logic can expand the term "moon" into "noon", "soon", "mood", "moan", and even "moo" and "moron".
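The analysis steps listed above can be imitated with a toy pipeline. This Python sketch is a rough stand-in, not the Elasticsearch analyzer that ESS uses:

- The stop-word set is modelled on Elasticsearch's default English list.
- naive_stem is a crude suffix stripper in place of a real stemmer, so it will mis-stem words such as "moss".

It does reproduce the tokens from the example phrase.

```python
import re
import unicodedata

# Assumption: a stop-word set modelled on Elasticsearch's default English list.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def fold_accents(text: str) -> str:
    """Convert accented letters such as é or è to their base form e."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def naive_stem(word: str) -> str:
    """Crude suffix stripping standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list:
    text = re.sub(r"<[^>]+>", " ", text)                 # remove HTML tags
    text = text.replace("'", "").replace("\u2019", "")   # remove apostrophes
    text = fold_accents(text).lower()                    # fold accents, lowercase
    words = re.findall(r"[a-z0-9]+", text)               # split on spaces and punctuation
    return [naive_stem(w) for w in words if w not in STOP_WORDS]
```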
autocomplete
In autocomplete mode, the search phrase is treated as a prefix and is matched against any document containing that prefix. Although this mode supports stemming and some basic transformations, fuzziness is not available, and the sensitivity setting has no effect. The search phrase "cow j" will match "cow jumped", "cow jogged" and "cow jigged", but will not match "cow has jumped". However, the search phrase "cow jumping o" will match "cow jumped over the moon", because autocomplete mode still supports stemming.
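The matching rule in the examples above works as follows:

- Every word except the last must appear consecutively, after stemming.
- The last word only has to be a prefix of the next word.

The Python sketch below is a simplified model of that rule, not ESS code. It splits on whitespace only and reuses a crude suffix-stripping stemmer, which is an assumption.

```python
def _stem(word: str) -> str:
    """Crude suffix stripping standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def phrase_prefix_match(query: str, text: str) -> bool:
    """True if the query words occur consecutively in text, the last one as a prefix."""
    q = query.lower().split()
    t = text.lower().split()
    if not q:
        return False
    head = [_stem(w) for w in q[:-1]]  # complete words: compared after stemming
    prefix = q[-1]                     # last word: compared as a raw prefix
    for start in range(len(t) - len(q) + 1):
        window = t[start:start + len(q)]
        if [_stem(w) for w in window[:-1]] == head and window[-1].startswith(prefix):
            return True
    return False
```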
Controlling the Sensitivity of Full-Text Queries
Fuzziness
Fuzziness can be enabled for full-text queries with the fuzzy parameter. By default, it is off.
The fuzzy query uses similarity based on the Levenshtein edit distance. Effectively, it can auto-correct one or two typos in the search term, depending on the term length, or even expand the search term by a few characters:
/search/country_lookup?q=austrai&fuzzy=1
The query above will match "Austria", because the fuzzy logic can swap the last two letters. It will also match "Australia", because the Elasticsearch engine can expand the original term by adding an "l" and an additional "a".
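The "Austria"/"Australia" example can be checked with an edit-distance calculation. The Python sketch below uses the optimal string alignment variant of Levenshtein distance, which counts swapping two adjacent letters as one edit, as Elasticsearch does by default. It pairs this with an edit budget based on term length (0 edits for terms of up to 2 characters, 1 for 3 to 5, 2 for longer terms). That budget is Elasticsearch's AUTO fuzziness rule and is assumed here, not confirmed for ESS.

```python
def osa_distance(a: str, b: str) -> int:
    """Levenshtein distance where an adjacent transposition also costs one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def auto_max_edits(term: str) -> int:
    """Assumed edit budget by term length (Elasticsearch AUTO fuzziness)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term: str, indexed_term: str) -> bool:
    return osa_distance(query_term, indexed_term) <= auto_max_edits(query_term)
```

"austrai" is 7 characters long, so 2 edits are allowed. "austria" is 1 edit away (a swap) and "australia" is 2 edits away (two insertions), so both match.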
Sensitivity
The sensitivity parameter has a range of 1 to 100, with a default value of 70. It sets the minimum percentage of search terms in the query (words excluding articles and grammatical constructions) that must match, rounded down. For a query of 4 words with the default sensitivity of 70%, only two words from the query need to match for a product to appear in the search results. If sensitivity is increased to 75%, three words from the query must match.
Unlike traditional search engines, which allow only "AND" and "OR" operators in the search phrase, the sensitivity parameter in fielded search allows the search behavior to be fine-tuned on a scale from 1 (effectively an "OR" query) to 100 (effectively an "AND" query, where all terms in the search phrase must match).
However, it is important to understand that not all words are considered search terms (see Search Modes). In standard mode, all high-frequency words are removed.
/search/demo?q=a+cow+jumped+over+the+moon&sensitivity=75
The query above will require at minimum three of the following four words to be found in the document: "cow", "jump", "over", "moon".
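The rounding rule can be expressed as a single calculation. In this Python sketch, the minimum is the term count multiplied by the sensitivity percentage, rounded down. Integer arithmetic is used to avoid floating-point error. Never requiring fewer than one term is an assumption that reflects the statement that sensitivity 1 behaves like an "OR" query.

```python
def min_terms_to_match(term_count: int, sensitivity: int = 70) -> int:
    """Minimum number of search terms that must match for a given sensitivity."""
    if not 1 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 1 and 100")
    if term_count <= 0:
        return 0
    # Percentage of the search terms, rounded down
    required = term_count * sensitivity // 100
    # Assumption: at least one term must always match
    return max(1, required)
```

For the four terms "cow", "jump", "over" and "moon", the default gives 4 × 70 // 100 = 2, and sensitivity=75 gives 3.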
Search Fields and the Weights
By default, full-text queries are performed on all analysed fields (the text fields and the fields whose fulltext property is set to true), and all fields have equal weight. So, a document containing "almonds" in the product's "name" field will have the same weight as documents containing "almonds" in the "description" field.
The fields parameter specifies the list of fields used in the search and can boost the weight of each field using caret (^) notation.
/search/demo?q=almonds&fields=name^3,code^3,description.short^2,description.long
In the query above, the search phrase "almonds" is searched only in four fields: name, code, description.short and description.long. The "name" and "code" fields have the highest weight, followed by the short description, while the long description has the lowest weight.
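The caret notation maps each field to a boost factor, with fields that have no caret defaulting to a weight of 1. The Python sketch below parses the fields value from the example. It also adds a toy weighted_hits score that only sums the boosts of fields containing the term. Real Elasticsearch relevance scoring is far more involved, so the score is purely illustrative, and both function names are hypothetical.

```python
def parse_field_weights(spec: str) -> dict:
    """Parse "name^3,code^3,description.long" into {field: boost}."""
    weights = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        field, _, boost = item.partition("^")
        weights[field] = float(boost) if boost else 1.0  # no caret means weight 1
    return weights


def weighted_hits(term: str, doc: dict, weights: dict) -> float:
    """Toy score: sum of the boosts of the searched fields containing the term."""
    return sum(w for field, w in weights.items()
               if term in doc.get(field, "").lower().split())
```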
Controlling Search Results Scope
Pagination
Pagination of results can be controlled in two ways.
By specifying size and from:
/search/demo?size=16&from=32
Or by specifying size and page:
/search/demo?size=16&page=3
The two above statements are equivalent and display the 3rd page of the search results with 16 products displayed per page.
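The equivalence follows from treating page as 1-based: from = (page - 1) × size, so page 3 with 16 products per page starts at offset 32. A minimal Python sketch of the conversion:

```python
def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number into the equivalent zero-based 'from' offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive integers")
    return (page - 1) * size
```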
Sorting
The Standard Fields Definition
Development Environment
Elastic Site Search is using worker scripts running in the background and re-indexing
Template Examples
A simple search template you can use to quickly set up ESS.
directory: sitedir/modules/search
search.html
modules/search/templates/search.html
search-item.html
modules/search/templates/search-item.html
search.json
modules/search/templates/search.json