Elasticsearch and Drupal

Presented by Bec White

Prepared for BADCamp, November 2014

Bec White

Senior Engineer and Team Lead at Palantir.net

@becw on Twitter

What is a search server?

  • Analyzes text content from your site
  • Analyzes queries from your users
  • Finds matches between the two

When is a search server necessary?

  • For full text search
  • When you have lots of content
  • When you want to do something "weird"

Aside: what is "weird"?

  • Listing tens of thousands to millions of nodes
  • Improved sorting
  • Search with different levels of precision
  • Autocomplete or autosuggest
  • Related content based on textual analysis
  • Add spelling suggestions and "more like this" to site search
  • Sharing lightly-structured data across applications

What is Elasticsearch?

  • An open source search server
  • Based on the Lucene search engine (like Solr)
  • Distributed and scalable
  • Speaks JSON

Running Elasticsearch for development

  • Run a local Vagrant box (gist)
  • Beware: Elasticsearch itself doesn't do access control

Three stories about search

Site search

Inline lookup widget

Auto suggest

Site search

  • Do it because you want fulltext search
  • Use Search API + Elasticsearch Connector
  • Use Solr instead of Elasticsearch for the general case

Out of the box, this isn't actually very good search...

  • Weird result order
  • Terms combined using OR

Tokens

  • Case and punctuation
  • Stemming ("searching" => "search")
  • Stop words ("a", "an", "the", "for")
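
The three token steps above can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch or Lucene implement analysis: the `analyze` and `naive_stem` helpers are hypothetical names, the stop word list is just the examples from the slide, and the suffix stripping is a crude stand-in for a real stemmer.

```python
import re

# Stop words taken from the slide's examples; real lists are longer.
STOP_WORDS = {"a", "an", "the", "for"}


def naive_stem(token: str) -> str:
    """Strip a few common suffixes (a rough stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, drop punctuation, remove stop words, then stem."""
    # Splitting on anything that is not a letter or digit handles
    # both case and punctuation in one pass.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("Searching for the Answers!"))
```

Because the same analysis runs on content and on queries, "Searching for the Answers!" and "search answer" end up as the same tokens and can match.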

Inline lookup widget

Why?

Content editors needed to find precise matches among lots of records

How?

  • Search API + Elasticsearch Connector
  • Elasticsearch JavaScript library
  • Store the rendered result markup in Elasticsearch

Query building

  • Queries vs. filters
  • Combine queries and filters with nesting

In this case...

  • Start with a full text search query
  • Filter by content type and editorial status
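
As a rough sketch of that structure, the request body might be built like this. The field names (`title`, `type`, `status`) and the `build_lookup_query` helper are assumptions for illustration, not taken from the talk. The sketch uses a `bool` query with a `filter` clause, which is how current Elasticsearch versions combine a scored query with unscored filters; Elasticsearch releases from the time of this talk expressed the same idea with a `filtered` query.

```python
import json


def build_lookup_query(text: str, content_type: str, status: str) -> dict:
    """Full text query on the title, narrowed by two exact-match filters."""
    return {
        "query": {
            "bool": {
                # The query part is scored: better text matches rank higher.
                "must": [{"match": {"title": text}}],
                # The filter part is yes/no only and does not affect scoring.
                "filter": [
                    {"term": {"type": content_type}},
                    {"term": {"status": status}},
                ],
            }
        }
    }


body = build_lookup_query("dark knight", "article", "published")
print(json.dumps(body, indent=2))
```

Keeping content type and editorial status in the filter part means they narrow the result set without distorting relevance order.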

Inline lookup widget

  • Do it to find precise results fast
  • Query and filter structure is important
  • Don't forget about controlling access to the search server

Auto suggest

Why?

Search-based navigation for a dataset that is mainly titles.

How?

  • Elasticsearch analyzer configuration
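
For orientation, index settings for this kind of analyzer setup might look roughly like the following. This is a hedged sketch, not the configuration used in the talk: the analyzer and filter names (`title_index`, `title_search`, `title_shingle`, `title_edge_ngram`) and the size limits are invented for illustration, and the token filter type names follow current Elasticsearch documentation, which may differ from the version available in 2014.

```python
import json

# Hypothetical index settings: one analyzer for indexing titles,
# a simpler one for analyzing what the user types.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Emit word groups ("the dark", "dark knight") as tokens,
                # alongside the single words.
                "title_shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    "output_unigrams": True,
                },
                # Emit every leading prefix of each token.
                "title_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                },
            },
            "analyzer": {
                "title_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "title_shingle", "title_edge_ngram"],
                },
                # The keyword tokenizer keeps the whole input as one token.
                "title_search": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                },
            },
        }
    }
}

print(json.dumps(index_settings, indent=2))
```

A title field would then name the first analyzer in its `analyzer` mapping parameter and the second in `search_analyzer`.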

How would you search for "The Dark Knight"?

  • The Dark Knight
  • the dark knight
  • dark knight

Tokenizing titles

  • Standard tokens
  • Shingles
  • NGrams

"The Dark Knight"

  • Tokenized: the, dark, knight
  • Shingled: the dark, the dark knight, dark knight
  • NGrams: ..., d, da, dark, dark k, dark kn, ...
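
The three token styles can be reproduced with a few lines of Python. This is a minimal sketch of the concepts only; the function names are hypothetical, and the n-grams shown are the "edge" kind (prefixes anchored at the start of a token), which is an assumption based on the examples in the slide.

```python
def tokenize(title: str) -> list[str]:
    """Standard tokens: lowercase words."""
    return title.lower().split()


def shingles(tokens: list[str], min_size: int = 2, max_size: int = 3) -> list[str]:
    """Word-level groups: every run of min_size..max_size adjacent tokens."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Character-level prefixes of a single token."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


tokens = tokenize("The Dark Knight")
print(tokens)                      # ['the', 'dark', 'knight']
print(shingles(tokens))            # ['the dark', 'the dark knight', 'dark knight']
print(edge_ngrams("dark knight"))  # 'd', 'da', 'dar', 'dark', 'dark ', 'dark k', ...
```

Applying edge n-grams to the shingles, not just to the single words, is what lets a partial phrase such as "dark kn" appear as an indexed token.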

Checking your work

Matching

  • Indexed content turns into lots of progressive tokens
  • Search string must be ONE token
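
The asymmetry between index time and search time can be sketched as follows. This is a simplified model under stated assumptions, not Elasticsearch's actual matching: `index_tokens`, `search_token`, and `matches` are hypothetical helpers, and the index side simply stores every prefix of every run of adjacent words.

```python
def index_tokens(title: str) -> set[str]:
    """Index time: expand a title into many progressive tokens."""
    words = title.lower().split()
    # Every run of adjacent words: "the", "the dark", "dark knight", ...
    phrases = {
        " ".join(words[start:end])
        for start in range(len(words))
        for end in range(start + 1, len(words) + 1)
    }
    # Every leading prefix of every phrase: "d", "da", ..., "dark kn", ...
    return {phrase[:n] for phrase in phrases for n in range(1, len(phrase) + 1)}


def search_token(query: str) -> str:
    """Search time: the whole query becomes ONE lowercase token."""
    return " ".join(query.lower().split())


def matches(query: str, title: str) -> bool:
    # A hit means the single search token equals one of the indexed tokens.
    return search_token(query) in index_tokens(title)


print(matches("dark kn", "The Dark Knight"))      # True
print(matches("knight dark", "The Dark Knight"))  # False
```

If the search string were also broken into n-grams, a query like "dark kn" would match any title containing a word starting with "d", which is why the expansion happens only on the indexed side.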

How would you search for "W.C. Fields"?

  • W.C. Fields
  • w. c. fields
  • w c fields
  • wc fields
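
One way to make these variants agree is to normalize away punctuation and spacing before comparing. The sketch below is an assumption about a possible approach, not the configuration from the talk, and `normalize_name` is a hypothetical helper; in Elasticsearch a similar effect would come from analyzer configuration, for example a character filter that rewrites the text before tokenizing.

```python
import re


def normalize_name(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


variants = ["W.C. Fields", "w. c. fields", "w c fields", "wc fields"]
for variant in variants:
    print(repr(variant), "->", normalize_name(variant))
```

The trade-off is that word boundaries are lost, so this kind of normalization suits a dedicated title field better than general full text search.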

Auto suggest

  • Do it when users navigate content by title
  • It's all in the analyzer configuration
  • Iterate on matching

In summary...

Use Elasticsearch if you want to do weird things

Understanding tokens, queries, and filters will improve your matching

Search feature development benefits from iteration

Palantir.net

Let's make something good together
