Activate 2018 has ended
Wednesday, October 17 • 12:05pm - 12:45pm
Query Hundreds of Fields at Scale

What are the challenges of querying hundreds of fields with Lucene?

We present the work we’ve done at Salesforce, where every customer can personalize the indexing schema of their structured data.
In this context, controlling access rights while preserving performance is a challenge. When so many fields must be indexed separately, and some or all of them queried, what problems arise, and what solutions keep performance high and memory consumption under control?

We explain the context of searching so many fields and the constraints on memory in a highly multi-tenant system.
Then we provide the technical details for the solution we chose:

* A new postings format that wraps the default one, adding a field-virtualization layer and custom segment writing/merging.
* Optimizing index seeks and scans.
* Caching at different levels.
* A customized MergePolicy/Scheduler.
* Query parser adaptations.
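
To make the field-virtualization idea in the first bullet more concrete, here is a minimal toy sketch in Python. It is not the Salesforce implementation and does not use Lucene; it only assumes that "virtualization" means many logical fields sharing one physical term dictionary, with each term keyed by a small field id. The class `VirtualFieldIndex` and its methods are hypothetical names chosen for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


class VirtualFieldIndex:
    """Toy inverted index: many logical fields share one physical term dictionary.

    Assumption for illustration only: terms are keyed by (field_id, term),
    so a query over N fields does N seeks in a single sorted dictionary
    instead of opening N separate per-field dictionaries.
    """

    def __init__(self):
        self._field_ids = {}               # logical field name -> small integer id
        self._postings = defaultdict(set)  # (field_id, term) -> set of doc ids
        self._sorted_keys = None           # sorted term dictionary, built lazily

    def _field_id(self, name):
        # Assign ids in order of first appearance.
        return self._field_ids.setdefault(name, len(self._field_ids))

    def add(self, doc_id, fields):
        """Index one document given as {logical field name: text}."""
        for name, text in fields.items():
            fid = self._field_id(name)
            for term in text.lower().split():
                self._postings[(fid, term)].add(doc_id)
        self._sorted_keys = None  # invalidate the sorted dictionary

    def search(self, term, field_names):
        """Return doc ids matching `term` in any of the allowed fields."""
        if self._sorted_keys is None:
            self._sorted_keys = sorted(self._postings)
        hits = set()
        for name in field_names:
            fid = self._field_ids.get(name)
            if fid is None:
                continue  # unknown field: nothing indexed under it
            key = (fid, term.lower())
            # One binary-search seek per field in the shared dictionary.
            pos = bisect_left(self._sorted_keys, key)
            if pos < len(self._sorted_keys) and self._sorted_keys[pos] == key:
                hits |= self._postings[key]
        return hits


index = VirtualFieldIndex()
index.add(1, {"name": "Acme Corp", "city": "Grenoble"})
index.add(2, {"name": "Grenoble Labs", "notes": "acme partner"})
print(index.search("acme", ["name", "notes"]))
print(index.search("grenoble", ["name"]))
```

Passing only the fields a user may read to `search` is one simple way to picture how access rights could restrict a multi-field query in such a layout.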

Finally, we present measures of success: how a query on 100+ fields comes to take only about twice as long as a query on a single aggregated field.

Yannis Hector

Software Engineer, Salesforce
Yannis is a lead software engineer at Salesforce. He joined the Search team in the Grenoble (France) R&D office in 2014. Since then he has dug deep into Apache Lucene and Solr internals to tackle challenging performance and scalability issues.

David Smiley

D W Smiley LLC
I'm a Lucene/Solr committer/PMC member. I do search consulting/development work. My particular interests in search are geospatial/spatial.

Wednesday October 17, 2018 12:05pm - 12:45pm EDT
Jarry & Joyce