Search is all around us
I can’t remember the last time I visited a website that did not have a “Search” box. From blogs to wikis, online shopping to social media, every website can benefit from providing search.
Indeed, Google, the very gateway to the web and one of the largest companies in the world, started out as just a very powerful search engine (other search engines are available).
At Strategies we build and host many different types of website, but they all share the same need for a fast, high-quality search function. In fact, search has recently become so fast and useful that it has been used to power nearly all of some websites, serving millions of listing pages.
No prizes for guessing our search engine of choice: it is, of course, ElasticSearch. In this article I am going to give a brief outline of ElasticSearch and discuss some of the features that we make the most of here at Strategies.
ElasticSearch – The Elastic Fantastic
ElasticSearch is built on Apache Lucene, a full-text search library, which it has in common with its main competitor, Apache Solr. Lucene provides much of the core search functionality; ElasticSearch extends it and, most importantly, wraps it in a REST API backed by a fully distributed, automatically scalable and redundant cluster.
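Because everything goes through that REST API, a search is simply an HTTP request with a JSON body. The Python sketch below builds (but does not send) such a request using only the standard library. It assumes a node on localhost listening on the default port 9200 and a hypothetical index called blog_posts with a title field; neither comes from our own setup.

```python
import json
import urllib.request

# Assumptions: a node listening on the default port 9200 on localhost, and a
# hypothetical index named "blog_posts" with a "title" field.
ES_URL = "http://localhost:9200"


def build_search_request(index: str, body: dict) -> urllib.request.Request:
    """Build, but do not send, a JSON search request for an index's _search endpoint."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("blog_posts", {"query": {"match": {"title": "lucene"}}})
print(request.get_method(), request.full_url)
# prints: POST http://localhost:9200/blog_posts/_search
# urllib.request.urlopen(request) would send it, given a running cluster.
```

Any HTTP client in any language can do the same, which is a large part of why ElasticSearch is so easy to slot into an existing stack.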
Setting up an ElasticSearch cluster is simple enough that I won’t explain it here. Far more worthy of discussion are ElasticSearch’s features, so let’s get started.
Total Search Control
In a basic full-text search engine, your query is likely to specify search terms and not much else. In ElasticSearch, however, you can also specify:
- How the terms are matched (token, prefix or partial match)
- Which fields in the documents are searched
- Which fields are more relevant (“Boosting”)
- Whether newer documents are more relevant (“Decay”)
- Filters (to only search a subset of the data)
- Aggregations (to return metadata over the main search results)
- and much more
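To give a feel for how these options combine, here is a minimal sketch of a single Query DSL body, built as a Python dictionary. The field names (title, body, author, status, published_at), the "published" status filter and the 30-day decay scale are all hypothetical, and author is assumed to be mapped as a keyword field so it can be aggregated.

```python
import json
from typing import Optional

# Assumed (hypothetical) blog-post index fields: "title", "body", "author"
# (a keyword field), "status" and "published_at" (a date field).


def build_blog_search(terms: str, author: Optional[str] = None, prefix: bool = False) -> dict:
    """Query DSL body combining matching, field boosts, filters, decay and aggregations."""
    multi_match = {
        "query": terms,
        # Which fields are searched, and boosting: title matches count three times as much.
        "fields": ["title^3", "body"],
    }
    if prefix:
        # How terms are matched: the last word is treated as a prefix ("elast" finds "elasticsearch").
        multi_match["type"] = "phrase_prefix"

    # Filters: only search a subset of the data; they do not affect relevance scores.
    filters = [{"term": {"status": "published"}}]
    if author is not None:
        filters.append({"term": {"author": author}})

    return {
        "query": {
            "function_score": {
                "query": {"bool": {"must": [{"multi_match": multi_match}], "filter": filters}},
                # Decay: newer posts score higher; the multiplier falls with age,
                # reaching 0.5 (the default decay) for posts 30 days old.
                "functions": [{"gauss": {"published_at": {"origin": "now", "scale": "30d"}}}],
            }
        },
        # Aggregations: post counts per author, returned alongside the hits.
        "aggs": {"authors": {"terms": {"field": "author"}}},
    }


print(json.dumps(build_blog_search("elast", author="alice", prefix=True), indent=2))
```

Wrapping the bool query in function_score keeps the text relevance and the recency decay separate, so either can be tuned without touching the other.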
You really do have full control over the search, but be careful, because that is a double-edged sword. ElasticSearch is very powerful, but I recommend doing a lot of research before committing to it: if you send the wrong query, it can be tricky to understand what has been returned, and it is also possible to send a query that will be slow.
Aggregations
ElasticSearch is by no means the first data source to offer aggregations over its data; MySQL and MongoDB, for example, both offer them. However, in both of those databases, getting the matching records and a summary of them typically means running a separate query, with the extra time a second round trip takes. In ElasticSearch, aggregations can be sent alongside the main query and their results arrive with the search results, so there is very little extra overhead. The amount of information you can gather with a single ElasticSearch query (with aggregations) is staggering.
At Strategies we use aggregations to populate the filters shown alongside search results. For example, a query over blog posts can also return each author along with how many of the matching posts they wrote.
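A minimal sketch of the blog-post example, assuming a hypothetical blog_posts index whose author field is mapped as a keyword: one request body asks for the matching posts plus a terms aggregation, and a small helper turns the returned buckets into filter options. By default ElasticSearch orders terms buckets by document count, highest first. The sample response is hand-written in the shape ElasticSearch returns, with made-up names and counts.

```python
from typing import List, Tuple

# Assumptions: a hypothetical "blog_posts" index whose "author" field is mapped
# as a keyword; the aggregation name "authors" is our own choice.


def posts_with_author_counts(terms: str) -> dict:
    """Search body: matching posts plus a terms aggregation counting posts per author."""
    return {
        "query": {"match": {"body": terms}},
        "aggs": {"authors": {"terms": {"field": "author", "size": 10}}},
    }


def author_filter_options(response: dict) -> List[Tuple[str, int]]:
    """Turn the aggregation buckets in a search response into (author, count) pairs."""
    buckets = response["aggregations"]["authors"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# A trimmed, hand-written response in the shape ElasticSearch returns;
# the names and counts are made up for illustration.
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "authors": {
            "buckets": [
                {"key": "alice", "doc_count": 12},
                {"key": "bob", "doc_count": 5},
            ]
        }
    },
}
print(author_filter_options(sample_response))  # [('alice', 12), ('bob', 5)]
```

The hits and the filter options come back in the same response, so the page can render both without a second query.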
Percolate Queries
Percolate queries are the opposite of standard queries. With a standard query, you first store your data in the index and then send queries, which are tested against the index to see which data matches. With a percolate query, you store the queries in the index; you can then send data in the form of documents, which are tested to see which of the stored queries match.
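As a rough sketch, assuming ElasticSearch 7 or later (typeless mappings), a hypothetical index called alerts and a text field called body, the three request bodies involved look like this. The index holds a field of the percolator type for the stored queries, and the fields those queries refer to must be mapped in the same index.

```python
import json

# Assumptions (not from the article): ElasticSearch 7 or later (typeless
# mappings), a hypothetical index named "alerts", and a "body" text field.

# 1. Mapping: the "query" field stores queries, and "body" must be mapped so the
#    stored queries know how to analyse the documents sent to them later.
ALERTS_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "body": {"type": "text"},
        }
    }
}


def stored_query(terms: str) -> dict:
    """2. A document for the "alerts" index whose "query" field holds a search."""
    return {"query": {"match": {"body": terms}}}


def percolate_request(document: dict) -> dict:
    """3. A search that sends a document and asks which stored queries match it."""
    return {"query": {"percolate": {"field": "query", "document": document}}}


# These bodies would be sent as: PUT /alerts (mapping), PUT /alerts/_doc/<id>
# (each stored query), then GET /alerts/_search (the percolate request).
print(json.dumps(percolate_request({"body": "Scaling ElasticSearch clusters"}), indent=2))
```

The hits returned by the percolate search are the stored queries themselves, so each one can carry whatever extra fields (such as an owner) you need to act on a match.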
Don’t see the point? Consider a system of email alerts: users save searches, and the system emails them when a saved search matches new data. Implementing email alerts used to mean re-running every saved search on a schedule, say once a day, to see whether any new data matched. With percolate queries you no longer need to re-run the saved searches; even better, you can notify a user as soon as matching data is added to the site.
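To show the shape of that alert flow, here is a toy, in-memory model in plain Python. It does not use ElasticSearch at all: the crude word-set matching and all the names (AlertRegistry, save_search, on_new_document) are hypothetical stand-ins. The point is the direction of the work: searches are stored once, and each new document is checked against them the moment it arrives, instead of every search being re-run on a schedule. In a real system the matching loop would be replaced by a percolate search.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


def tokenize(text: str) -> FrozenSet[str]:
    """Very rough stand-in for an analyser: lower-case alphanumeric words."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass(frozen=True)
class SavedSearch:
    user_email: str
    terms: FrozenSet[str]  # every one of these words must appear in a document


class AlertRegistry:
    """Toy, in-memory percolator: searches are stored, documents are matched against them."""

    def __init__(self) -> None:
        self._searches: List[SavedSearch] = []

    def save_search(self, user_email: str, query: str) -> None:
        self._searches.append(SavedSearch(user_email, tokenize(query)))

    def on_new_document(self, text: str) -> List[str]:
        """Called once when content is added; returns the users to email straight away."""
        words = tokenize(text)
        # Skip searches with no words, which would otherwise match everything.
        return [s.user_email for s in self._searches if s.terms and s.terms <= words]


alerts = AlertRegistry()
alerts.save_search("alice@example.com", "ElasticSearch aggregations")
alerts.save_search("bob@example.com", "Solr")
print(alerts.on_new_document("New post: getting started with ElasticSearch aggregations"))
# ['alice@example.com']
```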
These are far from all of ElasticSearch’s features; if you would like to learn more, I recommend reading the documentation at https://elastic.co