Activate 2018 has ended

Monday, October 15
 

9:00am

Pre-Conference Training - Day 1
Learn more about our Pre-Conference Training offerings here

Monday October 15, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Advanced Solr Development
Course Description:
This course will deep-dive into advanced querying capabilities such as geospatial, phrase, and function queries; tuning relevance to meet specific domain requirements; and the Solr architecture. At the end of this course, you will be able to build sophisticated, horizontally-scalable search applications.

Best For:
Developers who plan to build a search platform from scratch with Solr. (Those who plan to use Lucidworks Fusion should instead attend the Fusion courses.)

Pre-Requisite:
All attendees must first take the Solr Foundations course or have at least 1 year of experience working with Apache Solr.

Monday October 15, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Foundations of Lucidworks Fusion
Course Description:
Come learn about how Fusion 4 can help maximize your data accessibility. Explore the latest capabilities of Fusion, Lucidworks’ flexible and user-friendly search application development platform. Fusion is built with an open core of Apache Solr and Spark and enhanced with additional features that accelerate the creation and deployment of modern search applications. You will learn the fundamental concepts of working with Fusion, including ingestion and indexing, query and syntax, and a review of multi-node topologies. By understanding the full extent of the platform’s distributed capabilities, you will be able to construct and optimize sophisticated search and data-driven applications.

Best for:
This course is the starting point for everyone looking to learn anything about Fusion. Those who have worked with Fusion 2.x or 3.x will learn about the newest features in Fusion 4. Architects, developers, and business users who are working on or considering a Fusion-based solution implementation will benefit from this course.

Pre-Requisite:
None. Apache Solr or search application experience is not required. Experience with command line tools is helpful.

Monday October 15, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Lucidworks Fusion AI
Course Description:
The Lucidworks Fusion AI training is the right choice for those interested in taking Fusion applications to the next level. This course explores how you can enrich your data through techniques such as automatic classification, document clustering, anomaly detection, and natural language processing, and make it available to your applications. You will also learn how to use techniques such as query intent detection, signals and recommendations, automatic synonym generation, and automatic mis-spelling corrections in order to deliver the most personalized, contextual information and insights using the enriched data.

Best For:
Architects and developers who are working on or considering a Fusion-based solution and are planning to implement Fusion’s machine learning capabilities.

Pre-Requisite:
This training will introduce advanced Fusion 4 topics and requires that all attendees first take the Lucidworks Fusion Foundations course.

Monday October 15, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Lucidworks Fusion for Advanced Search Application Developers
Course Description:
Learn how to deliver powerful, intuitive search experiences at large internet scales. Expanding on topics introduced in the Foundations of Lucidworks Fusion class, this course will teach you how to calibrate your search apps for optimal relevance. Explore Fusion’s experimentation framework and the use of signals and telemetry to enhance user experiences and allow for in-depth usage analysis. You will also be introduced to Fusion SQL, which enables self-serve analytics at massive scale.

Best For:
Architects and developers who are working on or considering a Fusion-based solution and want to fine-tune their Fusion applications to significantly improve search and user experience.

Pre-Requisite:
All attendees must first take the Foundations of Lucidworks Fusion course.


Monday October 15, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Solr Foundations
This course introduces Apache Solr, the most widely used open source search engine. You will learn fundamental search concepts and the key role that Solr plays in the modern data ecosystem. At the end of this course, you’ll be able to index and query data to and from Solr, and build basic search applications.

Monday October 15, 2018 9:00am - 5:00pm
TBA
 
Tuesday, October 16
 

9:00am

Pre-Conference Training - Day 2
Learn more about our Pre-Conference Training offerings here

Tuesday October 16, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Advanced Solr Development
Course Description:
This course will deep-dive into advanced querying capabilities such as geospatial, phrase, and function queries; tuning relevance to meet specific domain requirements; and the Solr architecture. At the end of this course, you will be able to build sophisticated, horizontally-scalable search applications.

Best For:
Developers who plan to build a search platform from scratch with Solr. (Those who plan to use Lucidworks Fusion should instead attend the Fusion courses.)

Pre-Requisite:
All attendees must first take the Solr Foundations course or have at least 1 year of experience working with Apache Solr.

Tuesday October 16, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Lucidworks Fusion AI
Course Description:
The Lucidworks Fusion AI training is the right choice for those interested in taking Fusion applications to the next level. This course explores how you can enrich your data through techniques such as automatic classification, document clustering, anomaly detection, and natural language processing, and make it available to your applications. You will also learn how to use techniques such as query intent detection, signals and recommendations, automatic synonym generation, and automatic mis-spelling corrections in order to deliver the most personalized, contextual information and insights using the enriched data.

Best For:
Architects and developers who are working on or considering a Fusion-based solution and are planning to implement Fusion’s machine learning capabilities.

Pre-Requisite:
This training will introduce advanced Fusion 4 topics and requires that all attendees first take the Lucidworks Fusion Foundations course.

Tuesday October 16, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Lucidworks Fusion App Studio
Course Description:
Learn how to connect people with data by building beautiful, modern, front-end applications using App Studio. With this intuitive toolkit, full stack developers with limited UI experience can easily build custom applications in days rather than months. In this course, you will learn a powerful markup language that encapsulates commonly used UX patterns, and see how it can be used to effectively present insights across multiple use cases and verticals. You will also explore wizard-driven templates that provide starter applications for use cases such as site search and enterprise search within minutes and learn how to tailor them to your needs.

Best For:
Architects and developers who are working on or considering a Fusion-based solution and want to considerably reduce development time to build a custom front-end application.

Pre-Requisite:
All attendees must first take the Foundations of Lucidworks Fusion course. Experience with UI development is helpful, but not required.

Tuesday October 16, 2018 9:00am - 5:00pm
TBA

9:00am

Pre-Conference Training: Lucidworks Fusion for Advanced Search Application Developers
Course Description:
Learn how to deliver powerful, intuitive search experiences at large internet scales. Expanding on topics introduced in the Foundations of Lucidworks Fusion class, this course will teach you how to calibrate your search apps for optimal relevance. Explore Fusion’s experimentation framework and the use of signals and telemetry to enhance user experiences and allow for in-depth usage analysis. You will also be introduced to Fusion SQL, which enables self-serve analytics at massive scale.

Best For:
Architects and developers who are working on or considering a Fusion-based solution and want to fine-tune their Fusion applications to significantly improve search and user experience.

Pre-Requisite:
All attendees must first take the Foundations of Lucidworks Fusion course.

Tuesday October 16, 2018 9:00am - 5:00pm
TBA

5:00pm

Welcome Reception
Tuesday October 16, 2018 5:00pm - 7:00pm
Level 4 Foyer
 
Wednesday, October 17
 

8:00am

Breakfast & Registration
Wednesday October 17, 2018 8:00am - 9:00am

9:00am

Welcome and Opening Remarks
Will joined Lucidworks in 2013 as Chief Product Officer and was appointed CEO in 2014. He has over 15 years of product, marketing, and business development experience. Prior to Lucidworks he was head of technical business development for Splunk, where he was responsible for defining the company’s market category and key product feature sets. He created and led the company’s global partner program, building an ecosystem of consultants, developers, resellers, system integrators, service providers, and technology partners. Earlier in his career, Hayes served as a software engineer at Genentech.

Wednesday October 17, 2018 9:00am - 10:00am
Salle de bal

10:00am

Keynote: Deep Learning for AI
There has been much progress in AI thanks to advances in deep learning in recent years, especially in areas such as computer vision, speech recognition, natural language processing, playing games, robotics, machine translation, etc. This presentation aims at introducing some of the core concepts and motivations behind deep learning and representation learning. Deep learning builds on many of the ideas introduced decades earlier with the connectionist approach to machine learning, inspired by the brain. These essential early contributions include the notion of distributed representation and the back-propagation algorithm for training multi-layer neural networks, but also the architecture of recurrent neural networks and convolutional neural networks. In addition to the substantial increase in computing power and dataset sizes, many modern additions have contributed to the recent successes. Thanks to soft-attention mechanisms neural nets have moved from pattern recognition devices working on vectors to general-purpose differentiable modular machines which can handle arbitrary data structures. The talk will end with a discussion of some major open problems for AI which are at the forefront of research in deep learning and reinforcement learning.

Speakers
Dr. Yoshua Bengio

Scientific Director | Professor, MILA | University of Montreal
Yoshua Bengio is Full Professor of the Department of Computer Science and Operations Research at University of Montreal, head of the Montreal Institute for Learning Algorithms (MILA), co-director of the CIFAR program on Learning in Machines and Brains, and Cana...


Wednesday October 17, 2018 10:00am - 10:45am
Salle de bal

10:45am

AM Break
Wednesday October 17, 2018 10:45am - 11:15am

11:15am

Strategic Value from Enterprise Search and Insights
Enterprise search is becoming a required tool for employees of large corporations, helping them find relevant information quickly so that they can execute their daily job functions. Saving time looking for information translates into increased productivity and revenue. This session will cover the Enterprise Search value proposition, provide a Strategy and Roadmap, discuss the journey to building the product, and include demos of the Enterprise Search Product, Chatbot, and Machine Learning reading comprehension model.

Speakers
Viren Patel

Director - Chief Data Office - Enterprise Search, PricewaterhouseCoopers
Viren Patel works within PwC’s Chief Data Office to utilize data assets and technology to create strategic value and competitive advantage. He has worked with data and analytics for over 16 years delivering consulting services to Fortune 500 clients. Currently he is the product...


Wednesday October 17, 2018 11:15am - 11:35am
Salon 1

11:15am

Learning to Rank: From Theory to Production
Learning to Rank is awesome. Even more awesome is the fact that Apache Solr/Lucene is the first open source search engine that can do it out of the box. But all that is for nought if you don't hunt down the necessary features, make it interoperate with all the other functionality, and do this fast enough on a production system for such ranking to be feasible.

This talk, by the engineers at Bloomberg who built this functionality into Solr in the first place, is a war story of how the company's real-time, low-latency news search engine was tamed to learn how to rank. Join us on a journey that will teach you how to take your LtR system to your clients, and more importantly, the many ways not to do it. There will be drama, excitement, and despair (and even Gandalf, if you pay attention)! Now grab that popcorn...

Speakers
Diego Ceccarelli

Software Engineer, Bloomberg
Diego is a Software Engineer at Bloomberg LP, working in the News Search R&D team. His work focuses on improving search relevance for financial news. Before joining Bloomberg, Diego was a researcher in Information Retrieval at the National Council of Research in Italy, whilst completing...

Malvina Josephidou

Software Engineer, Bloomberg
I am a software engineer in the News R&D team at Bloomberg. My work focuses on using machine learning to improve search relevance and discoverability on the news search engine. Previous to this I completed my PhD at the University of Cambridge in statistics and computational biology...


Wednesday October 17, 2018 11:15am - 11:55am
Drummond East

11:15am

Content Analytics Studio – The Visualization, Machine Learning and Application Workflow Tool which Kicks Kibana’s Butt – For Solr!
One of the common knocks on Solr as a platform is that it is lacking a visualization tool which is as clean and as powerful as Kibana for Elasticsearch. Previous attempts at a solution have involved forking Kibana, with predictable results, namely nice solutions that are quickly out-of-date. This session will present a third alternative, “Content Analytics Studio” and will show how such tools can elevate themselves from simple visualization to become end-to-end Machine Learning + Business Integration systems. This encourages the use of Solr in a much wider range of applications and use-cases, and more thoroughly integrates it into real, day-to-day business processes to drive substantially larger ROI opportunities.

Speakers
Andrés Aguilar Umaña

Software Developer, Accenture
Andres Aguilar is a software developer with more than 6 years of experience working in search-related projects with Search Technologies, now part of Accenture Digital. He is the lead developer on the Content Analytics Studio initiative, but has also focused on content processing tools...


Wednesday October 17, 2018 11:15am - 11:55am
Jarry & Joyce

11:15am

Solr on Kubernetes
Enterprise applications have seen an evolution over the years in terms of deployments from bare metal to virtual machines (VMs) and now to containers. In our experience over the years, as we moved from bare metal to VMs to containers, we have experienced increased flexibility and cost optimization, with a slight performance degradation as the trade-off.

Even though Docker is the most widely accepted technology for containerization, container orchestration was, until recently, an open problem. With Kubernetes finally winning the container orchestration wars, we have moved some of our Solr clusters to Kubernetes-based infrastructure in production.

In this talk, we present our learnings about caveats & pitfalls in moving from VMs to Kubernetes in the following broad areas: (a) storage, (b) fault tolerance, (c) scaling, (d) network performance. The talk will be followed by a brief demo of using Solr on Kubernetes with scaling.

Speakers
Apoorv Bhawsar

Search Engineer, Unbxd


Wednesday October 17, 2018 11:15am - 11:55am
Drummond West

11:15am

Image-Based E-Commerce Product Discovery: A Deep Learning Case Study
To further improve discoverability of Macy’s product catalog online, we introduced an easy shopping experience for finding products which are hard to describe using text-based search. The feature allows customers to use an existing product on the website and find all matching products which are similar to the original product based on its image. Deep-learning algorithms are applied to product images to provide this experience. To enhance it further, a new component was developed to evaluate signals other than image similarity, such as product attribute similarity, the customer’s shopping preferences, and business rules. This talk gives some insights into the implementation and the overall feature architecture.

Speakers
Peter Gazaryan

Senior Architect, Search & Browse, Macy's
Peter joined Macy’s in 2013 as a Technical Solution Manager. In his current role Peter is leading the projects integrating the different software systems that provide search and browse functionality to macys.com customers. Peter received his Master's degree in Electrical Engineering from South-Russian State Technical University...

Denis Kamotsky

Principal Engineer, Macy's
Denis Kamotsky joined Macy’s in 2001 at the time of the company's first Java-based web site launch and thereafter has been actively contributing in multiple areas of macys.com development as an engineer and a solution architect. Since early 2011 Denis has been leading a team responsible...


Wednesday October 17, 2018 11:15am - 11:55am
Salon 6&7

11:40am

Overcoming Obstacles: Implementing Search in an Era of Strong Cybersecurity and Federal Data Center Consolidation
How CAPE Overcame Obstacles to Implementing Search in a Government (DoD)/Enterprise Environment

-- Procurement. Industry business models can change on a dime, but government procurement is a big ship to turn for course correction. In recent years, the license + life cycle replacement model changed to a subscription model. CAPE ran into an obstacle with Contracting Officers and Specialists neither aware of nor prepared for that change.

-- Access Control. In the days before Federal Data Center Consolidation and the rise of Enterprise Service Providers, it was a straightforward matter to gain access across the network to authenticate users, determine their access rights, and match those up to permissions. As the Enterprise establishes its dominion to meet compliance regulations, the design of network domains can introduce obstacles to accessing resources needed for security trimming and access control.

-- Connecting the Dots with Enterprise. In golf, if your shot is in danger of hitting another group you yell "fore!" Centralization and de-centralization of computing resources have their pros and cons. In a centralized model, enterprise level requirements impact a large user community. Sudden obstacles can jump out unannounced when new policies and patches are pushed out (with or without fair warning).

Speakers
Phyllis Kolmus

OSD Programs & AT&T Information Management and Analysis Group, Deputy Group Director & Contractor Lead
Phyllis Kolmus leads AT&T’s Information Management & Analysis Group, a $50M program supporting the OSD Cost Assessment and Program Evaluation (OSD/CAPE). She runs a 40-person team of software engineers and defense/IT analysts delivering a wide range of technology and business solutions...


Wednesday October 17, 2018 11:40am - 12:00pm
Salon 1

12:05pm

The SAS Search Journey: Using AI to Move from Google to Lucidworks
In 2016, SAS began a journey to move from the Google Search Appliance and Ultraseek to Solr. This talk will describe, at a high level, how SAS rebuilt their Enterprise Search experience for a global audience, integrating localization and boosting relevancy, while implementing a cross-datacenter infrastructure that runs Lucidworks Fusion. With a team of three, SAS has created a robust search tool for both our Intranet and Internet global sites.

Speakers
Alex Flynn

Senior Manager, IT Operations, SAS
Alex currently manages Cloud Operations and Infrastructure Engineering teams that are responsible for the design and support of Enterprise Class (internal and external) Applications, Web Services, Identity Management (IAM), Productivity Tools, Enterprise Search (Solr), and related...


Wednesday October 17, 2018 12:05pm - 12:25pm
Salon 1

12:05pm

Cybersecurity with Apache Metron and Apache Solr
Cybersecurity is all about drowning in data and not having enough people to keep up. Criminal organizations and nation states have huge resources and collaborate on the attacking side, while traditional systems keep slow silos and short memories. Apache Metron takes a big data, open source community and data science-centric approach to give security analysts a fighting chance of keeping up. With behavior profiling as part of the real-time stream, a unique ability to slit windowed analytics over long periods of time, Metron also provides a platform for security data science, and deep, relevant, personalization of security response and advanced detection. Apache Solr plays a key role as the backend for the Metron SOC Analyst and SOC investigator dashboards for threat triage, workflow management, and visualization. The talk features a demonstration of Apache Metron in which Lucidworks Fusion application logs will be ingested in real-time and will be analyzed for anomalous behavior by Fusion users.

Speakers
Ward Bekker

Pre-Sales Solutions Engineer II, Hortonworks
Ward Bekker - Solutions Engineer Hortonworks & Apache Metron Contributor.

Scott Cote

Senior Software Engineer, Lucidworks
Scott Cote is a data science evangelist and open source promoter who organized DFW Data Science - a 2800+ member user group focused on promoting knowledge sharing, opportunity, and growth for the Dallas/Ft. Worth Community. During the day, he works as a Senior Software Engineer for...


Wednesday October 17, 2018 12:05pm - 12:45pm
Drummond East

12:05pm

Identifying Parts of an E-commerce Query on Target.com Using Search Logs
We present here a neural network model which, in contrast to the earlier dictionary-based models, uses search logs to tag various parts of a user query. Our current work is focused on brand, color, price, item type, gender, age, and dimension.

Query tags help the downstream search relevancy models perform better when these entity tags are supplied along with the query. They are also helpful in efficiently applying filters and facets a priori while serving search results. For example: white flower skirts for a 3 year old girl $50. If identified properly, we can switch on brand filters: (A New Day), gender: girl, age: 3 years, price: around $50, color: white, item type: skirt, and pattern: flowered. This improves the precision of results and aids user experience by providing a query-based filtering approach.
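To make the example above concrete, here is a minimal Python sketch of turning a query into filters. It is only an illustration of the simple dictionary-based baseline that the abstract contrasts against, not the neural model presented in the talk. The vocabularies, the regular expressions, and the `tag_query` function name are all hypothetical and invented for this example.

```python
import re

# Hypothetical vocabularies, invented for illustration only.
COLORS = {"white", "black", "red"}
ITEM_TYPES = {"skirt": "skirt", "skirts": "skirt", "dress": "dress"}
GENDERS = {"girl", "boy", "women", "men"}
PATTERNS = {"flower", "floral", "striped"}


def tag_query(query):
    """Return a dict mapping filter name -> value detected in the query."""
    filters = {}

    # Price: a dollar sign followed by digits, e.g. "$50".
    price = re.search(r"\$(\d+)", query)
    if price:
        filters["price"] = int(price.group(1))

    # Age: digits followed by "year old", e.g. "3 year old".
    age = re.search(r"(\d+)\s*year\s*old", query)
    if age:
        filters["age"] = int(age.group(1))

    # Remaining tags: plain dictionary lookups on lowercase word tokens.
    for token in re.findall(r"[a-z]+", query.lower()):
        if token in COLORS:
            filters["color"] = token
        elif token in ITEM_TYPES:
            filters["item_type"] = ITEM_TYPES[token]
        elif token in GENDERS:
            filters["gender"] = token
        elif token in PATTERNS:
            filters["pattern"] = token
    return filters


tags = tag_query("white flower skirts for a 3 year old girl $50")
print(tags)
```

A lookup like this cannot disambiguate words that belong to several tags or recognize unseen phrases, which is the kind of limitation a model trained on search logs is meant to address.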

The Model Architecture:
Our training data corpus is click-through logs from Target's digital search. Training on textual data presents us with a data sparsity challenge. We present here how we tackled the problem of using distributed word representations (w2v, glove), and present observed results on the appropriate datasets for generating the distributed word representations. We use bi-directional LSTMs to model word sequences in a query and mix in signals from dictionary search results to train the model. We discuss the results obtained from various objective functions that were used as the loss metric for back-propagation. We observed good accuracies and modeling of phrases, and disambiguation of words between various tags.

Speakers
Vijayender Reddy Karnaty

Senior Software Engineer, Target Corporation
Vijayender is an Engineer at Target, enabling relevancy for the e-commerce search platform. He has worked on platforms and pipelines built with Spark and Tensorflow to process query logs into meaningful insights. Prior to this, he worked in the networking industry and did his bachelors...

Vidhya Sundaram

Senior Engineering Manager, Target
Vidhya is currently heading Relevance Search for Target.com, trying to create impact for Target through Search. Managing Solr & AI powered systems to spearhead the change.


Wednesday October 17, 2018 12:05pm - 12:45pm
Salon 4&5

12:05pm

Cross Data Center Replication Options - A Practical Guide to CDCR
Implementing Solr in an Enterprise Data Center often includes a requirement to support multiple data centers in the solution. While this provides redundancy and High Availability, it also significantly complicates the implementation since identical data must be replicated from one data center to another.

There are several ways that data replication in Solr can be accomplished, and they all have their own pros and cons. This talk will discuss those options, what the merits of each are, and when each may or may not be appropriate.

Speakers
Patrick Hoeffel

Lucidworks Fusion Consultant, Polaris Alpha
Patrick Hoeffel is a Software Engineering Manager at Polaris Alpha. A veteran of commercial software solutions for almost 30 years, Patrick has been involved in products ranging from Online Services to early Internet Startups to Enterprise Applications to Military Intelligence. He has...


Wednesday October 17, 2018 12:05pm - 12:45pm
Drummond West

12:05pm

Query Hundreds of Fields at Scale
What are the challenges of querying hundreds of fields with Lucene?

We present the work we’ve done at Salesforce where every customer can personalize the indexing schema of the structured data.
In this context, controlling access rights while preserving performance is a challenge. When we have to index so many fields separately, and to query some or all of them, what are the problems and solutions for still keeping high performance and controlling the memory consumption?

We explain the context of searching so many fields and the constraints on memory in a highly multi-tenant system.
Then we provide the technical details for the solution we chose:


* A new posting format, which wraps the default one, with a field-virtualization layer, and custom segment writing/merging.
* Optimizing index seeks and scans.
* Caching at different levels.
* A customized MergePolicy/Scheduler.
* Query parser adaptations.
   

Finally, we provide measures of success: how a query on 100+ fields comes to take only about twice as long as a query on a single aggregated field.

Speakers
Yannis Hector

Software Engineer, Salesforce
Yannis is a lead software engineer at Salesforce. He joined the Search team in the Grenoble (France) R&D office in 2014. Since then he has deeply dived into Apache Lucene and Solr internals to tackle challenging performance and scalability issues.

David Smiley

D W Smiley LLC
I'm a Lucene/Solr committer/PMC member. I do search consulting/development work. My particular interests in search are geospatial/spatial.


Wednesday October 17, 2018 12:05pm - 12:45pm
Jarry & Joyce

12:05pm

Making Reddit Search Relevant and Scalable
In the past year, we've rebuilt Search at Reddit to be more scalable while improving our relevancy. In this session, we will talk about problems we've faced and solutions we've implemented to improve the search experience on Reddit so that users can easily discover the communities and posts they're looking for. We'll also dive into using user click signals to gain insight into what users are looking for, which areas are most engaging for our users, and how we increase time-on-site through relevant content discovery. We'll also talk about challenges we've faced in scaling our Solr cluster to the 350M+ users that visit Reddit every month.

Speakers
Jerry Bao

Software Engineer, Reddit
Jerry has been working on search for over two years, with expertise in managing and scaling search infrastructure to millions of queries per day. He spends his off-time flying small aircraft and traveling the world.

Anupama Joshi

Senior Engineering Manager, Reddit
Anupama manages search at Reddit, from ingestion to results and infra to ranking.


Wednesday October 17, 2018 12:05pm - 12:45pm
Salon 6&7

12:45pm

Lunch
Wednesday October 17, 2018 12:45pm - 2:15pm

2:15pm

How eCommerce Leaders are Gaining Competitive Advantage with Machine Learning: The Key Use Cases You Need to Know
Speakers
Kevin Vondemkamp

Vice President – Web, Social & eCommerce, Appen
Kevin Vondemkamp is a senior executive with extensive domestic & international experience in Sales, Marketing & Business Development for both start-ups and major corporations. In his current role at Appen, he works with leading global technology companies to improve their machine...


Wednesday October 17, 2018 2:15pm - 2:35pm
Salon 1

2:15pm

Practical End-to-End Learning to Rank Using Fusion
Learning-to-rank (LTR) is a powerful technique which utilizes supervised machine learning to address the problem of search relevancy. While recent versions of Solr include an LTR component, there are still significant practical barriers to using LTR. This talk will demonstrate both the engineering and the data science necessary to build a production-grade, end-to-end LTR system on a real world dataset.

The talk is divided into three parts: First, I will show how to set up, configure, and train a simple LTR model using both Fusion and Solr. Secondly, I will demonstrate how to include more complex features and show improvement in model accuracy, in an iterative workflow that is typical in data science. Particular emphasis will be given to best practices around utilizing time-sensitive user-generated signals. Lastly, I will explore some of the tradeoffs between engineering and data science, as well as Solr querying/indexing strategies (sidecar indexes, payloads) to effectively deploy a model that is both production-grade and accurate.

Speakers
Andy Liu

Senior Data Engineer, Lucidworks
Andy Liu is Senior Data Engineer at Lucidworks, where he researches and builds next-generation capabilities for the Fusion platform and works with clients to operationalize machine learning. He has spent the last 10+ years building products at the intersection of big data, search...


Wednesday October 17, 2018 2:15pm - 2:55pm
Salon 4&5

2:15pm

The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep Learning
We will present the integration of MyRobotLab and Solr to power the InMoov robot. The InMoov robot is the world's first life-size humanoid 3D printed open source robot. The InMoov robot was designed by French sculptor Gael Langevin and MyRobotLab was started by software developer Greg Perry.

Speakers
Kevin Watters

Founder, KMW Technology
Kevin is a long time user of and contributor to Solr. He has been running a small search engine professional services firm in Boston called KMW Technology.


Wednesday October 17, 2018 2:15pm - 2:55pm
Drummond East

2:15pm

ROI: Return On Information – Mapping the Evolutionary Path of Economic Value
Humans have built tools for tens of thousands of years – some of these tools have been used for good and some for evil. As technologists, we continue to build tools to advance the human race. On the cutting edge of these developments is the way we handle information.
 
In this session, we’ll explore general technological and sociological trends that affect our relationship with information. Today information is distributed so rapidly that the line between the written word and the spoken word is blurred. We’ve translated the way we write into a new language based on conversational speech. In the enterprise, these changes are affecting the way we do business, especially in terms of information retrieval and governance. So in this session, we’ll look at what lies ahead with new search technologies and how we can enable extraordinary gains for organizations and the people who work there by building great software tools.

Speakers

Frederic Bourget

CTO, NetGovern
As the CTO of NetGovern, Frédéric takes care of product vision, marketing, and overall business strategy. After completing studies in engineering physics at Laval University, Frédéric started his career at Matrox, where he was part of defining one of the first small business Internet...


Wednesday October 17, 2018 2:15pm - 2:55pm
Jarry & Joyce

2:15pm

JSON in Solr: From Top to Bottom
These days, JSON is a popular format. Solr has supported JSON as a data input and output format for a very long time, but recently its JSON support has expanded significantly to other purposes. In fact, sometimes using JSON is a better choice than the traditional approach. Yet, for those who do want to use JSON as much as possible, the documentation is scattered and many examples are minimalistic. In this session, we will comprehensively review all the different ways one can work with Solr using JSON alone, using a custom-built example. We will also discuss where JSON is not yet an option.

Speakers

Alexandre Rafalovitch

Website Officer, United Nations
Alexandre is a full-stack IT specialist with more than 20 years of industry and non-profit experience, including in Java, C# and HTML/CSS/JavaScript. He develops projects on Windows, Mac and Linux. Alexandre has been an Apache Lucene/Solr committer since August 2016 and chooses to focus...


Wednesday October 17, 2018 2:15pm - 2:55pm
Drummond West

2:15pm

Using Opinion Mining and Sentiment Analysis to Discover Hidden Product Features for E-Commerce Search
Customers often search for very specific product features that are difficult to discover from only the supplier-provided product information. For example, how do you rank products for customers searching for an “easy to clean rug” or a “comfortable to sit on couch”? To help e-commerce search engines evolve and embrace this challenge, we need to look beyond the product catalog to improve query understanding and product ranking. In this session we will show how to address this problem by leveraging customer feedback datasets such as product reviews. We will take a deep dive into natural language processing techniques such as opinion mining and sentiment analysis to discover, refine, and rank useful insights from reviews, apply sentiment models to label the mined data, and then show how to integrate this information into Solr in order to surface the right products in search results. We will also discuss how this work improved conversion rate for some obscure customer searches on Wayfair.com.

Speakers

John Castillo

Software Engineer, Wayfair
John’s areas of interest include: automation, performance and scalability, predictive search, natural language processing (NLP), and information retrieval. They work together on improving search relevance by applying NLP techniques to discover meaningful product features from user...

Suyash Sonawane

Senior Software Engineer, Wayfair
Suyash Sonawane is a Software Engineer at Wayfair--the fastest growing online destination for all things home. Suyash is passionate about solving the unique problem of developing, scaling, and maintaining systems that answer millions of furniture-specific queries per day. Suyash works...


Wednesday October 17, 2018 2:15pm - 2:55pm
Salon 6&7

2:40pm

Measuring ROI on Enterprise Search
When it comes to measuring ROI, an eCommerce implementation makes things easy: conversion rates, time spent in a shopping cart, or perhaps repeat visits and purchases. But what about an Enterprise Search solution, such as a company intranet or public-facing resource utility? Without purchases, adds to cart, auto-ship signups, etc., how do we measure the investment we make in Enterprise Search with so many qualitative, "soft" factors? In this session, you will discover ways to chart the return on your investment in Enterprise Search, not only with the smiles on the faces of your end-users, but with quantitative evidence you can use to prove your investment was worthwhile.

Speakers

John Lenker

Senior Sales Engineer, Lucidworks
John Lenker is an IT professional with 20 years of experience as a developer, consultant, and engineer, working both for and with the world's largest and most successful companies across a wide spectrum of industries, such as transportation, legal, energy, telecommunications, insurance...


Wednesday October 17, 2018 2:40pm - 3:00pm
Salon 1

3:05pm

Transforming our Enterprise Search Experience
With Intel's recent enterprise search upgrade, the team deployed a new scalable platform and a much improved user experience. The team worked with a design agency and their internal user experience group to deliver a new UI design driven by employee feedback and usability tests. They leveraged App Studio to deliver a clean, simple UI that met brand guidelines and facilitated the search needs of employees worldwide. In this session you'll learn how Intel's small search team transformed the search user experience and their plans to continually improve the overall experience with ML/AI capabilities.

Speakers

Ryan Gale

Senior Program Manager, Intel Corporation
Ryan is a senior program manager with 15 years of experience delivering global digital solutions. Ryan has been leading search projects at Intel for the last 4 years, launching a new platform for Intel.com in 2016, and in June of this year he led the enterprise search transition from GSA to Solr. Ryan also has an eye for design and believes enterprise technology solutions need to deliver "consumer-grade" user experiences...


Wednesday October 17, 2018 3:05pm - 3:25pm
Salon 1

3:05pm

The Neural Search Frontier
Is search the next industry to be revolutionized by deep learning? Lately, researchers have been applying neural networks to search applications with impressive gains. Search users use different language than what's contained in the corpus. For example, doctors write articles using jargon like 'myocardial infarction' but patients search using lay terms like 'heart attack.' Mapping vocabularies using expert-created taxonomies or word embeddings (word2vec, LDA, etc.) can help. Manual approaches can take a great amount of work, or don't map between searcher and document vocabulary. When clear associations between relevant documents and queries can be made, neural search can learn the patterns between query and document language embeddings, with tremendous gains on text search. Such embeddings can also be used to provide alternative representations of user queries in order to better capture user intent.

Join Doug Turnbull, author of 'Relevant Search', as we explore this promising frontier. Is it a silver bullet? What are the pros and cons? And how can it fit into your search infrastructure using Solr, Elasticsearch, or Lucene?

Speakers

Doug Turnbull

Chief Technical Officer, OpenSource Connections
Search relevance consultant. Author of Relevant Search. Doug crafts search/recommendation solutions that “get” users. To do this, Doug uses Solr, sprinkling a little natural language processing and machine learning on top for good measure. Through writing and speaking Doug wants...


Wednesday October 17, 2018 3:05pm - 3:45pm
Salon 4&5

3:05pm

Activating your Data, with a Faster Path to Results
Many organizations face mixed success with their forays into the world of big data by prematurely investing in big data technology before they truly understand their business goals and underlying information strategy. Loading up a data lake with huge amounts of data does not necessarily speed the path to results and outcomes.

Join Commvault to discuss an alternate method to accelerate value using your existing data with Commvault Activate, Solr and Lucidworks Fusion, BEFORE you make the heavy investment in traditional big data technologies:
  • Envision the problem and the outcomes
  • Leverage data you’ve already collected through backup, archive and analysis against data sources directly
  • Retain your information governance policies and access controls
  • Search, profile, source and test data that is germane to business outcomes
  • Model insights and prove value from the content index
  • Drive earlier results and buy-in from project stakeholders

Speakers

Aaron Murphy

Senior Director, Office of CTO, Commvault
Aaron Murphy is a global business development leader. With a background in information governance, search, business intelligence and content analytics, Aaron specializes in incubating and launching innovative data-centric market offerings. Comfortable in conversations at all levels...


Wednesday October 17, 2018 3:05pm - 3:45pm
Jarry & Joyce

3:05pm

Applied Mathematical Modeling with Apache Solr
This session will explore the practical applications of the mathematical models available in Apache Solr's math expressions library. The models explored will include: linear and non-linear regression, logistic regression, curve fitting, probability distributions and Monte Carlo simulations, and time series analysis. The session will cover how the different types of models can be used to analyze data, make predictions, and find anomalies within the data.

Speakers

Joel Bernstein

Senior Data Engineer, Lucidworks
Joel is an active Lucene/Solr committer and PMC member. His primary focus in recent years has been in developing the streaming, SQL and advanced mathematical features in Solr. Joel is also a principal software engineer on the search team at Alfresco Software where he designs and implements...


Wednesday October 17, 2018 3:05pm - 3:45pm
Drummond East

3:05pm

Security in Apache Solr
According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years. With decreasing storage costs and the fear of missing out on important data, this period is only expected to get shorter, translating to even higher data growth. With so much data being stored in order to be searched, it’s almost a given that it resides on multi-tenant systems that talk to each other. This makes it important to think about and build security mechanisms both within and around the search engine, the platform that holds all of the data.

Security can mean different things for different use cases or users. It may mean providing basic authentication and authorization in a closed network, with the added requirement of document-level filtering when running in a multi-tenant setup with shared collections. An often overlooked aspect of securing Solr is its communication with satellite systems such as log aggregators and monitoring systems. These systems need special consideration when setting up Solr with security.

While setting up and running a search system at scale isn’t a trivial task to begin with, it can be orders of magnitude more complicated when doing it with security enabled. When hosting critical data, however, there is really no other way out. Insecure systems are unstable for certain datasets, or at the very least much less desirable. In addition, security is a good thing to have when hosting multi-tenant setups.

Understanding what security should mean when we talk about Solr is as important as knowing how to set it up to ensure that the data can only be accessed by trusted systems or users. This talk will highlight what securing Solr generally means, which nuts and bolts ship with Solr, and what else is needed for a secure Solr setup. It will also point out things that are good to have in certain cases but can be deferred until later, while specifically covering the parts that cannot be missed. By the end of this talk, attendees will have a much better understanding of security in Solr, the components that are needed, and the concepts that are integral to running a secure Solr setup.

Speakers

Anshum Gupta

Lucene/Solr Committer, Apple
Anshum Gupta is a software engineer at Apple and an Apache Lucene/Solr committer and PMC member with over 10 years of experience with search and related technologies. He started dabbling with Lucene over 10 years ago and since then has worked at various organizations. Prior to joining...


Wednesday October 17, 2018 3:05pm - 3:45pm
Drummond West

3:05pm

Ticket Search By Voice and NLP at StubHub
This session presents how voice search has enabled fans to discover not only events but also tickets at StubHub. Using AI for entity detection, entity disambiguation, ranking and understanding unstructured data, we are able to provide a rich and powerful experience. Combining homegrown NLP and external NLP libraries and services, we were able to productize SolrCloud and AI capabilities to discover best value tickets for a price range and seat location.

We will go over the design and architecture of the NLP system and the query classifier that distinguishes NLP queries from normal search queries. Voice search using mobile devices is a fast-growing driver of traffic for StubHub and e-commerce site search. We will share how it is helping fans find the right seats in the shortest time, and hence improving conversion, when they have a choice of many traditional filters to find tickets by attributes like price, quantity, venue section, and seat location such as aisle or handicap seating.

Speakers

William Yu

Software Engineer, StubHub
William Yu is a Software Engineer at StubHub who has been with the Search team for 1 year. He has 2+ years of experience in developing Java applications. At StubHub, he works on query understanding and improving the search and suggestion experiences, and was also involved with...

Charles Zhang

Senior Engineering Manager Search and Catalog, StubHub
Charles Zhang is a senior engineering manager for the Search and Catalog team at StubHub. He has 20+ years of experience in developing Java applications using open-source technology. At StubHub, he has designed and architected distributed systems for processing large volumes of data and...


Wednesday October 17, 2018 3:05pm - 3:45pm
Salon 6&7

3:30pm

Using Search to Elevate Customer Experience at Moody's Analytics
Speakers

Muthu Periaswamy

Director, Moody's Analytics
Muthu Periaswamy is a Director at Moody’s Analytics (MA) and leads platform strategy for the content solutions business that includes our marquee products CreditView2.0 and RDS. Muthu also built and managed the MIS Commercial Group’s foundational cloud-based application portfolio...


Wednesday October 17, 2018 3:30pm - 3:50pm
Salon 1

3:45pm

PM Break
Wednesday October 17, 2018 3:45pm - 4:15pm

4:15pm

Why We Picked Lucidworks and How They Improved Our Customer Experiences
Main Takeaways:
1. High-level comparison with other vendors
2. Importance of AI features for future growth
3. Customer impacts

Speakers

Aneil Singh

VP, Security & Technology Research, Igloo Software
Has been supporting and building SaaS solutions since 2000. Has researched, hired, and managed outsourced development teams worldwide for many years. Designs, supports, and manages IT, focusing on server environments and deploying new solutions. Executive focused on solutions, problem solving...


Wednesday October 17, 2018 4:15pm - 4:35pm
Salon 1

4:15pm

Automatically Build Solr Synonyms List Using Machine Learning
A synonyms list plays an important part in search. However, it usually takes the search or ontology group in a company a long time to detect and maintain synonyms. In this talk, we will discuss how to automatically detect synonyms from user click data and compare this with popular methods such as word2vec (which is able to find related words, but not necessarily ones that are interchangeable for search purposes). We will also demo how to generate those analytical results and use them to improve search relevancy with a system that combines the power of Solr with the power of a fast distributed compute engine like Apache Spark, to bring data science into production.

Speakers

Chao Han

VP of Research, Lucidworks
Chao is a data scientist with over 10 years of analytical experience in both academia and industry. She earned a PhD in Statistics from Virginia Tech in 2012 (with 8 publications). After graduation, she worked at JPMorgan Chase R&D supporting projects in the areas of transaction text...


Wednesday October 17, 2018 4:15pm - 4:55pm
Salon 4&5

4:15pm

Journey of Search: What Every Enterprise Must Have for Search Success
Join us for an informative panel discussion on the “Journey of Search”. Search experts take a deep dive into what makes enterprise search a success now and in the future. We’ll share the benefits of an effective business case, utilizing Artificial Intelligence, implementing tools for success and the importance of utilizing technical support. Hear the valuable perspective of the technical team on how to drive a successful deployment and the advantages of the Lucidworks Partner Program.
 

Speakers

Chris Cook

Senior Deployment Engineer, Onix
Chris has 30 years of experience in the Information Technology industry. The past 10 years have been focused specifically on search in the enterprise. He brings with him a vast array of industry knowledge and real-world experience pertaining to technology deployments, networking, security...

Ryan Donnelly

Enterprise Account Manager, Onix
Ryan has 5 years of experience in the enterprise search space. He specializes in helping organizations identify high-value use cases for search applications. He works with 50+ customers daily across all industries and verticals to support their ever-evolving needs as it relates to quickly...

Jose Pagan

Senior Project Manager, Onix
Jose has over 20 years of project management and project deployment experience within various vertical markets that span from manufacturing, education, the financial sector, healthcare, and technology. He specializes in resource allocation, various project management methodologies...

Daisy Urfer

Partner Alliance Manager, Onix
Daisy Urfer has 6 years of experience specializing in Enterprise Search and channel partner relations. She specializes in the utilization of Value Added Partner programs to drive success for all organizations.


Wednesday October 17, 2018 4:15pm - 4:55pm
Jarry & Joyce

4:15pm

Embracing Diversity: Searching over Multiple Languages
Although a lot of online content is written in English, there are tons of non-English-speaking users out there who still need to retrieve information. When searching, especially for tech-related topics, it's common to compose queries in English; however, such users may prefer search results written in their own native language.

We'll see how statistical machine translation tools can help in the above scenario to perform text translation at query time, resulting in improved recall and precision for the search engine queries.

We'll look at how cross language information retrieval can be implemented on top of Apache Solr with the help of a Neural machine translation toolkit and also leverage Pointer-Generator Networks to summarize the retrieved and translated results from different sources.

The audience will gain a better understanding of how to run search queries against a multilingual corpus indexed in Apache Solr and retrieve all of the relevant search results in different languages.

Speakers

Suneel Marthi

AWS
Suneel is a Member of the Apache Software Foundation and is a committer and PMC member on Apache Mahout, Apache OpenNLP, and Apache Streams. He's presented in the past at Flink Forward, Hadoop Summit, Berlin Buzzwords, Machine Learning Conference, Big Data Tech Warsaw and Apache Big Data.

Jeff Zemerick

Cloud Architect, Mountain Fog
Jeff is a software engineer and cloud architect. He is a committer and PMC member on Apache OpenNLP. Jeff currently works on natural language processing pipeline projects and resides outside of Morgantown, WV.


Wednesday October 17, 2018 4:15pm - 4:55pm
Drummond East

4:15pm

Query-time Nonparametric Regression with Temporally Bounded Models
Discussion and demonstration of an architecture that knits several pieces of Solr’s infrastructure together, with further detail on Solr’s new Time Routed Aliases (TRAs). The system is a machine learning system based on a non-parametric regression methodology taken from habitat ecology. The model is partially pre-calculated and stored in Solr so that it can be assembled on the fly to recommend what documents a user may be interested in based on recent data. “Recent” is defined by a Solr filter query. Solr TRAs are used to help scale and sunset old data from the system. Technologies discussed in this talk include predictive modeling, Solr streaming expressions, indexing with JesterJ, and Solr Time Routed Aliases (TRAs). The latter half of this presentation goes into some depth regarding TRAs. TRAs are useful for avoiding performance degradation due to index growth in systems based on continuously acquired timestamped data (similar to the system presented). Both presenters helped build Solr’s TRA capability.

Speakers

Patrick Heck

Owner, Needham Software LLC
Patrick (Gus) Heck is the Owner of Needham Software LLC and has been solving search problems since 2010, been an independent Solr Consultant since 2012, and a frequent contributor to the Apache Solr project since 2013.

David Smiley

D W Smiley LLC
I'm a Lucene/Solr committer/PMC member. I do search consulting/development work. My particular interests in search are geospatial/spatial.


Wednesday October 17, 2018 4:15pm - 4:55pm
Drummond West

4:15pm

Shape of the Cloud: Search Infrastructure as Code
Setting up initial infrastructure and making changes to it can be challenging, especially for devs without much ops knowledge. Treating infrastructure as code empowers developers to cut through the fear of the unknown. Thanks to Terraform and other config management tools it’s no longer difficult or time-consuming to deploy fully-operational Fusion/Solr clusters. It also enables version control over infrastructure changes, just like over app code changes.
At NRHL we utilize Terraform, Chef, CircleCI and Docker to build out our autoscaling search clusters (Fusion/Solr/ZooKeeper) in AWS. This talk will be about our journey at Nordstromrack | Hautelook of moving from Solr 6 to Fusion/Solr 7, as well as to using Terraform and treating infrastructure as code. We will also briefly cover how we gradually introduced the new setup to our end users using A/B testing, measuring and tuning. We hope to provide an overview for developers of how tools like these can be used and to start making sense of what would otherwise be a cryptic process.

Speakers

Vasily Volkov

NordstromRack | Hautelook

Akasha Yi

Senior Platform Engineer, Nordstromrack | Hautelook
Akasha is a Senior Platform Engineer at Nordstromrack | Hautelook and will be presenting the talks on Terraform and Automation. She has a strong system engineering background and has worked for a large CDN provider in the past.


Wednesday October 17, 2018 4:15pm - 4:55pm
Salon 6&7

5:05pm

Building Self-Aware Machines
Speakers

Kord Campbell

Lucidworks
Kord is a developer marketing consultant for Lucidworks. In between founding Loggly, a log search service, and Grub, the distributed web crawler, Kord was Splunk's evangelist, where he worked on various developer-centric learning programs. In his free time, Kord may be found tinkering...


Wednesday October 17, 2018 5:05pm - 5:45pm
Jarry & Joyce

5:05pm

How to Build a Semantic Search System
Building a semantic search system - one that can correctly parse and interpret end-user intent and return the ideal results for users’ queries - is not an easy task. It requires semantically parsing the terms, phrases, and structure within queries, disambiguating polysemous terms, correcting misspellings, expanding to conceptually synonymous or related concepts, and rewriting queries in a way that maps the correct interpretation of each end user’s query into the ideal representation of features and weights that will return the best results for that user. Not only that, but the above must often be done within the confines of a very specific domain - rife with its own jargon and linguistic and conceptual nuances.

This talk will walk through the anatomy of a semantic search system and how each of the pieces described above fits together to deliver a final solution. We'll leverage several recently-released capabilities in Apache Solr (the Semantic Knowledge Graph, Solr Text Tagger, Statistical Phrase Identifier) and Lucidworks Fusion (query log mining, misspelling job, word2vec job, query pipelines, relevancy experiment backtesting) to show you an end-to-end working Semantic Search system that can automatically learn the nuances of any domain and deliver a substantially more relevant search experience.

Speakers

Trey Grainger

SVP of Engineering, Lucidworks
Trey is the SVP of Engineering at Lucidworks, where he leads their engineering efforts around Lucidworks Fusion, Apache Lucene/Solr, and their other open source and commercial offerings. Trey is also the co-author of the book Solr in Action, as well as a published researcher and frequent...


Wednesday October 17, 2018 5:05pm - 5:45pm
Salon 4&5

5:05pm

Inside the Black Box: How Does a Neural Network Understand Names?
A cornerstone of customer relationship management, chatbot analytics, and research automation systems, Named Entity Recognition (NER) is a key commercial application of Natural Language Processing (NLP). State of the art approaches to NER are purely data driven, leveraging deep neural networks to identify named entity mentions—such as people, organizations, and locations—in lakes of text data. In this talk, I will present our latest research on NER and provide real-life examples of how we are applying these cutting-edge techniques to ten different languages, including Spanish, English, Arabic, Persian, Korean, and Japanese. We'll look at accuracy, speed, and memory footprint, while comparing some of the best known deep architectures with a basic statistical approach. I will focus on the interpretation of the network, when assigned to learn names across many languages.

We’ll start with a detailed description of our neural architecture for NER, which is based on a generic Long Short-Term Memory (LSTM) implementation, a specific flavour of recurrent neural network for sequence tagging. We encode word as well as letter embeddings as a single neural pipeline. Our decoder is based on Conditional Random Fields (CRF), leveraging label distributions from across the entire input text. We will then look into the internal network activation values, on different input conditions, with a special focus on highly inflected languages. Our latest findings show key neurons that get activated for different linguistic aspects.

Speakers

Philip Blair

Senior Research Engineer, Basis Technology
Philip Blair is a Senior Research Engineer on Basis Technology’s R&D team. He investigates practical applications of deep learning technologies for use in text analytics. Philip also leads Basis Technology’s machine learning infrastructure team, focused on deploying cutting-edge...


Wednesday October 17, 2018 5:05pm - 5:45pm
Salon 1

5:05pm

Building Analytics Applications with Streaming Expressions in Apache Solr
Businesses have a strong need for effective real-time analysis and visualization of collected and correlated data to gain insights. Streaming Expressions, introduced in Apache Solr 6.0, provide a powerful streaming language for SolrCloud.

This session will begin with the challenges faced in building near-real-time analytics applications on large datasets. We introduce Streaming Expressions in Apache Solr and briefly discuss the concepts and key components they are built upon. The session moves on to discuss various real-life use cases built on top of Streaming Expressions with the statistical functions available in the latest versions, along with their performance complexity. The session concludes by listing the newest Streaming Expressions added in recent versions.

Speakers

Amrit Sarkar

Engineer, Lucidworks
Amrit Sarkar is a Search Engineer and Consultant at Lucidworks Inc, a California-based enterprise search technology company, with 3+ years of experience in the search domain and big data, e-commerce and product. LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Blog: https://www.medium.com...


Wednesday October 17, 2018 5:05pm - 5:45pm
Drummond West

5:05pm

Fundamental Linguistics for Search Applications – Why Are Linguistics Important?
One important difference between search applications and traditional database applications is the linguistic processing of search engines.

We will explain why linguistics are so important for every search, be it for Solr or Fusion. We will walk through basic approaches - both on the content processing and the query side - and will present search-specific challenges for some of the most important languages based on real world examples.

This allows for providing recommendations on how to successfully deliver a search project where multi-language support is key. Besides methods that are available with open source components, we will give an outlook on integrating commercial linguistic offerings based on a real, multi-language project implementation.

Speakers

Bastian Mathes

Project Manager / Solution Engineer, Raytion GmbH
Bastian Mathes is a solution engineer and project manager at Raytion focusing on delivering search solutions. He has hands-on experience with several open-source and commercial search stacks, NLP tools and accompanying libraries and components. Bastian delivered search solution in...

Christian Vogt

Senior Consultant, VP Service Delivery, Raytion
Christian Vogt is Senior Consultant and VP Service Delivery at Raytion, an internationally operating consultancy for information management focused on enterprise search. With about 15 years of experience in search-related projects he supports customers transforming business needs...


Wednesday October 17, 2018 5:05pm - 5:45pm
Drummond East

5:55pm

AI In Practice: the Good, the Bad and the Ugly (Panel Discussion)
AI is definitely the future, but is it ready for your organization? Is your organization ready for AI?
Grant Ingersoll, CTO and co-founder of Lucidworks, assembles some of the best minds and takes a real-world look at AI, going beyond the hype to answer honest and thoughtful questions.

Sure, they can enumerate successes -- but Grant wants to know about when things don’t go quite right. What can be learned from their less than stellar experiences?  When do you need AI – and when do other options suffice? And if you are going to embark, how do you assemble the right people to help transform your company? Can you retrain those you have? Or do you need to start from scratch?

Joining Grant to answer these and other questions are:
•    Daniel Tunkelang – Self, High-Class Consultant
•    Kavita Ganesan – GitHub Inc., Senior Data Scientist
•    Josh Wills – Slack, Search, Learning, and Intelligence Engineer
•    Anupama Joshi – Reddit, Senior Engineering Manager


Speakers

Kavita Ganesan

Senior Data Scientist, GitHub Inc.
Kavita Ganesan is a Machine Learning and NLP Data Scientist at GitHub. She was previously at 3M Health Information Systems where much of her work was on Clinical Text Mining. At GitHub, Kavita helped launch the first Machine Learning and Natural Language Processing pipeline with the... Read More →

Grant Ingersoll

CTO, Lucidworks
Grant Ingersoll is the CTO and co-founder of Lucidworks as well as an active member of the Apache Lucene community – a Lucene and Solr committer, and co-founder of the Apache Mahout machine learning project. He is also the lead author of “Taming Text” from Manning Publications... Read More →

Anupama Joshi

Senior Engineering Manager, Reddit
Anupama manages search at Reddit, from ingestion to results and from infrastructure to ranking.

Daniel Tunkelang

High-Class Consultant, Self
Daniel Tunkelang is a high-class consultant. In addition to working with leading retailers to improve their ecommerce search, Daniel has helped companies like Apple, Cisco, eBay, Elsevier, Etsy, Flipkart, Pinterest, Salesforce, and Yelp address some of their search challenges. He... Read More →

Josh Wills

Software Engineer, Slack
Josh Wills is a software engineer at Slack who has worked on Slack's configuration and experimentation system, search infrastructure, and data infrastructure. He is a recovering manager, having formerly led the data engineering team at Slack and the data science team at Cloudera, a member... Read More →


Wednesday October 17, 2018 5:55pm - 7:00pm
Salle de bal

7:30pm

Conference Party at 1909 Taverne Moderne
Wednesday October 17, 2018 7:30pm - 9:30pm
 
Thursday, October 18
 

8:00am

Breakfast
Thursday October 18, 2018 8:00am - 9:00am

9:10am

Fireside Chat with Beena Ammanath: Injecting Moral Code Into AI
The AI revolution continues to accelerate, and new AI products are being developed to solve key problems faced by consumers, businesses and the world at large. In the very near future, almost all new technology will incorporate some form of AI, driving human-machine engagement to unimaginable heights. As our reliance on AI deepens, many far-reaching ethical issues will arise, affecting everyone, including public citizens, small businesses utilizing AI or entrepreneurs developing the latest AI technology. We will discuss the moral code of AI, how we can solve some of the world's largest problems with AI and being human in the age of AI.

Speakers

Beena Ammanath

Global Vice President | Founder & CEO, HPE | Humans For AI
Beena is an award-winning senior digital transformation leader with extensive global experience in Artificial Intelligence, big data, and IoT. Her knowledge spans across e-commerce, financial, marketing, telecom, retail, software products, services and industrial domains with companies... Read More →


Thursday October 18, 2018 9:10am - 10:10am
Salle de bal

10:10am

AM Break
Thursday October 18, 2018 10:10am - 10:30am

10:30am

Vectors in Search – Towards More Semantic Matching
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, such as learning sparse representations of vectors, clustering, and learning binary vectors. Finally, I will discuss some of the pitfalls of vector-based search, and how to get the best of both worlds by combining vector-based scoring with traditional relevancy metrics such as BM25.
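
As a rough illustration of the last point, combining vector-based scoring with a lexical score such as BM25 might look like the sketch below. This is not taken from the talk: the linear blend, the `alpha` weight, the max-normalization of BM25, and all function names are assumptions chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(bm25, max_bm25, query_vec, doc_vec, alpha=0.5):
    """Blend a normalized BM25 score with vector similarity.

    alpha is a hypothetical mixing weight: 1.0 means purely lexical,
    0.0 means purely vector-based.
    """
    lexical = bm25 / max_bm25 if max_bm25 > 0 else 0.0
    semantic = cosine_similarity(query_vec, doc_vec)
    return alpha * lexical + (1.0 - alpha) * semantic


def rerank(candidates, query_vec, alpha=0.5):
    """Re-rank candidates given as (doc_id, bm25_score, doc_vector) tuples."""
    max_bm25 = max((c[1] for c in candidates), default=0.0)
    scored = [
        (doc_id, hybrid_score(bm25, max_bm25, query_vec, vec, alpha))
        for doc_id, bm25, vec in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a setup like this, the search engine would typically supply the top candidates with their BM25 scores, and the blend would only be applied to that short list.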

Speakers

Simon Hughes

Chief Data Scientist, Dice.com
Simon is currently the Chief Data Scientist at Dice.com, the technology professional recruiting site. He is also a PhD candidate at DePaul University, studying machine learning and natural language processing. At Dice, he has developed multiple recommender engines for matching... Read More →


Thursday October 18, 2018 10:30am - 11:10am
Drummond East, Level 3

10:30am

Intro to Lucidworks Fusion
This is an interactive session for the ultimate Fusion newcomer. We will walk through launching a Fusion app in less than 30 seconds, crawling a website, transforming and loading our data into a Fusion collection, and doing some pretty impressive query-side magic. It is highly recommended for attendees to bring a laptop with internet access to this session.

Speakers

Kord Campbell

Lucidworks
Kord is a developer marketing consultant for Lucidworks. In between founding Loggly, a log search service, and Grub, the distributed web crawler, Kord was Splunk's evangelist, where he worked on various developer centric learning programs. In his free time, Kord may be found tinkering... Read More →

Esther Quansah

Solutions Engineer, Lucidworks
Esther Quansah is a Solutions Engineer at Lucidworks. She works on the Customer Success team ensuring that Lucidworks customers are fully adopting and leveraging the most powerful features of Fusion which align to solve their complex problems.


Thursday October 18, 2018 10:30am - 11:10am
Jarry & Joyce

10:30am

Lucene/Solr 8: The Next Major Release
Lucene and Solr 8 will be released near the end of 2018. This session will explore new features, performance improvements and breaking changes.

Speakers

Steve Rowe

Senior Software Engineer, Lucidworks
Steve Rowe is a Member of the Apache Software Foundation, and a committer and PMC member on the Lucene/Solr Project. Prior to joining Lucidworks in 2012, he spent 10 years working on NLP as a Research Software Engineer at the Center for Natural Language Processing at Syracuse Uni... Read More →


Thursday October 18, 2018 10:30am - 11:10am
Drummond West

10:30am

Speeding Up Lucene with GPUs
To scale Solr indexes to hundreds of millions of documents (up to billions of documents) with the best possible search performance, the general advice is to leverage multi-core parallelism as much as possible. In other words, more CPU processor cores/threads ensure better search throughput and latency of queries. With commodity GPUs capable of running thousands of parallel threads and having as much as 12 GB of fast device memory, the possibility of offloading certain workloads (querying or indexing) seems tempting. However, GPU memory and threads are very different in nature from system memory and CPU threads, and mixing them is not straightforward.

Query scoring seems like a very compelling candidate for GPU offloading, as complex function queries or complex re-ranking (LTR) of queries might be slow on CPU. Through our initial benchmarks on the geonames dataset (11M data points), scoring documents using the haversine distance from a given query point, we observed a 10X improvement when performed on a GPU compared with a regular CPU-based Solr query. Similarly, we saw sub-second performance on scoring of 100M results on a regular gaming GPU.
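
For readers unfamiliar with the scoring function mentioned above, the haversine (great-circle) distance can be sketched in plain Python as follows. This is only a CPU-side illustration of the formula, not the GPU implementation from the talk; the Earth radius constant and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(query_point, docs):
    """Order (doc_id, lat, lon) tuples by distance from query_point, nearest first."""
    q_lat, q_lon = query_point
    return sorted(docs, key=lambda d: haversine_km(q_lat, q_lon, d[1], d[2]))
```

Each document's distance is computed independently of the others, which is the property that makes this kind of scoring a natural fit for thousands of parallel threads.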

In this talk, we wish to present (a) the results of our initial experiments, (b) integration into Lucene, (c) licensing challenges, and (d) ideas for future explorations. (Reference: https://issues.apache.org/jira/browse/LUCENE-7745)

Speakers

Ishan Chattopadhyaya

Committer & PMC Member, Lucene/Solr, Unbxd


Thursday October 18, 2018 10:30am - 11:10am
Salon 4&5, Level 2

10:30am

How Does the USA Today Network Provide Its Readers With Meaningful Content?
The 110 million monthly consumers of the USA Today Network's 109 sites and mobile apps expect not only a certain level of freshness in their content, but also a sense of what the community as a whole is consuming. While some of these problems can be solved with algorithms such as collaborative filtering, the short shelf life of news can sometimes yield less than desirable results. Whether it is the My Topics subscriptions, the content backfill problem, or a slew of other discovery problems, the USA Today Network leverages Solr to solve most of them.

In this presentation, we will go over how the USA Today Network is efficiently serving its users with content that is fresh, popular, and relevant to the community through smarter content backfill and the My Topics feature.

Speakers

Devansh Dhutia

Development Manager, USA Today Network
Devansh is a Dev. Manager at the USA Today Network on the Platform Engineering team. He has been actively involved in improving content discovery both within the network as well as syndication with external vendors. When Devansh isn't working or spending quality time with his family... Read More →


Thursday October 18, 2018 10:30am - 11:10am
Salon 6&7

11:20am

Enabling Opinion Driven Decision Making
People use the opinions of others for all sorts of decision-making tasks, from which mobile device to purchase to which hotel to stay at, often by reading hundreds of opinions on the Web. Such an abundance of opinions, while useful, can become overwhelming due to its sheer volume. This talk will walk you through how you can transform large amounts of user reviews into valuable information to support consumer decision making with search and summarization technologies.

Speakers

Kavita Ganesan

Senior Data Scientist, GitHub Inc.
Kavita Ganesan is a Machine Learning and NLP Data Scientist at GitHub. She was previously at 3M Health Information Systems where much of her work was on Clinical Text Mining. At GitHub, Kavita helped launch the first Machine Learning and Natural Language Processing pipeline with the... Read More →


Thursday October 18, 2018 11:20am - 12:00pm
Salon 4&5

11:20am

Lessons from the Field: Common Mistakes Made Deploying a Search App with Fusion
Starting your new search application? Enhancing the old one? Join Solution Architects Michael and Josh for a discussion on the 'dos and don'ts' of deploying a search application utilizing the Lucidworks Fusion stack. From AI to UI, we will cover a range of topics to help define what drives an engaging user interface coupled with a stellar search relevancy experience.

Speakers

Josh Goldstein

Solution Architect, Lucidworks

Michael Hunn

Solution Architect, Lucidworks


Thursday October 18, 2018 11:20am - 12:00pm
Salon 1

11:20am

Challenges of Simple Documents: When Basic isn't so Basic
Since 2017, when the Solr Reference Guide became a set of static HTML pages hosted on the Solr website, the inability to search the Guide has been a major loss in functionality. In trying to resolve that gap, the Lucene/Solr community has wrestled with many of the same questions users face when implementing Solr.

Looking at only one aspect of the problem - indexing the content - the Guide seems an astonishingly simple content set: there are fewer than 300 HTML pages, and they are reasonably well-structured. We could even make them more structured if we want. It's easy, right?

Well, maybe it isn't. When choosing how to index documents, we must consider both internal and external factors: what the content really makes available to us, what we may need to add during the indexing process to improve the user experience, what our users expect from the experience, and the realities of how we'll maintain the index as we add new content.

Using a series of demos indexing the Guide, we'll explore the benefits and trade-offs of using the options available with Solr and Fusion, and consider how even the most basic content set can present projects with implementation challenges.

Speakers

Cassandra Targett

Director of Engineering, Lucidworks
Cassandra has 20 years of experience in search and knowledge management. She has been a Lucene/Solr committer since 2013 and a member of the PMC since 2016. As Director of Engineering at Lucidworks, she manages the Solr and partner development teams.


Thursday October 18, 2018 11:20am - 12:00pm
Drummond East

11:20am

SolrJ: Power and Pitfalls
Anyone building a non-trivial search application needs a client for making requests to Solr. SolrJ is the community supported answer to that demand, and is the most complete Solr client currently available. Knowing how to use it effectively is crucial to unlocking Solr’s potential at your company. This talk will be a technical deep-dive on this often overlooked part of Solr, covering its history (briefly), current state, and future direction. Listeners will hear answers to questions like:

- What’s included in SolrJ?
- Where does it shine, and where does it fall short?
- What APIs and options does SolrJ support? Which ones still need coverage?
- What best practices are there for using SolrJ?
- How does SolrJ measure up against other clients out there?
- What’s changed in SolrJ recently, and what directions will SolrJ likely be heading in the future?

This talk aims to have something for attendees of all experience levels. Beginners will leave knowing enough to create a simple search app using SolrJ. Intermediate users should learn some tips for debugging SolrJ applications, and some common problem areas. Advanced users will leave knowing how to contribute back to SolrJ, improving it for future users.

Speakers

Jason Gerlowski

Solr Integrations Engineer, Lucidworks
Jason has been working on Search since 2013 through projects at Vivisimo, IBM Watson, and at Lucidworks.  He began working on Apache Solr in 2015 and was designated a Lucene/Solr committer this past year.  He currently works for Lucidworks where he's excited about making Solr easier... Read More →


Thursday October 18, 2018 11:20am - 12:00pm
Drummond West

11:20am

Empowering Customers to Self Solve - A Findability Journey
With a growing product portfolio and increasing support volume, it is essential to adopt automation to scale support delivery. Red Hat constantly evaluates how to empower customers to self-solve using search and by building tools for engineers to resolve cases faster. Self-solve rate and time to close (TTC) serve as the primary KPIs for determining success. In this session we will cover the evolution of different search techniques in our Solution Engine and the customers’ search journey. We will identify the challenges of providing an accurate and relevant solution for customer issues before they open a support case. We will dive into the query parsing for human vs machine generated data, relevancy model for the wide array of products and evaluation aspects. We will also describe how adopting ML classification techniques helped improve language detection and speed up case routing to the specialists. This session will give insights into how to leverage these techniques to promote customer self-solving behavior and findability by using search and machine learning.

Speakers

Manikandan Sivanesan

Team Lead, Red Hat
Manikandan Sivanesan leads the Customer Platform Search team at Red Hat which provides the platform for several integrated applications like Customer Portal, Solution Engine, Container Catalog, internal case search. He works on a number of search areas like textual analysis, distributed... Read More →

Rutvij Vyas

Senior Software Engineer, Red Hat


Thursday October 18, 2018 11:20am - 12:00pm
Salon 6&7

11:20am

Solr Under the Hood at S&P Global
This talk will cover:

- An introduction to Solr and searching functionalities overall
- Use case of Solr implementation at S&P Global
- How we benefitted from Solr vs RDBMS full text search etc.
- Adoption and moving towards SolrCloud for better leverage of cloud based features and enhancements

Speakers

Sumit Vadhera

Senior Manager (DPS Team), S&P Global
Around 12 years of experience in big data technologies and RDBMS (Oracle and MySQL). Has worked on Solr (3, 4, 5, 6), Vertica, Cassandra, Hadoop (HW and Cloudera), PostgreSQL, and AWS (DynamoDB, Aurora, Redshift), among others.


Thursday October 18, 2018 11:20am - 12:00pm
Jarry & Joyce

12:00pm

Lunch
Thursday October 18, 2018 12:00pm - 1:15pm

1:15pm

Deep Learning for Unified Personalized Search and Recommendations
One of the really nice things about modern neural network architectures is that they are easily capable of incorporating many different heterogeneous sources as inputs. In this talk, we'll go over how to create a ranking model, trained on user-identified clicks, which can learn a pointwise ranking function from (user_id, query_string, document snippet) tuples, which takes into account the users' past query/click history to personalize the ranking to their preferences.

The modeling will be described in Keras (with python), and the runtime examples will use Tensorflow's Java API to allow easier integration with a Solr LTR plugin.

Speakers

Jake Mannix

Chief Data Engineer, Lucidworks
Jake Mannix is the Chief Data Engineer at Lucidworks. Before joining Lucidworks, Jake worked on the Semantic Scholar project at the Allen Institute for Artificial Intelligence, and prior to that was tech lead for Twitter’s data science and data engineering teams, building both the... Read More →


Thursday October 18, 2018 1:15pm - 1:55pm
Salon 4&5

1:15pm

Building a Fast and Powerful Search App with Lucidworks Site Search
Lucidworks Site Search is an embeddable, easy-to-configure, out-of-the-box site search solution that runs anywhere. It is a fully functional search application. Once you have configured your data and interface, just point users to the URL we provide. Site Search can be deployed on-prem, in the cloud, or on hybrid architectures so you can choose the deployment model that best fits your security and operational requirements.

Speakers

Josh Ellinger

Senior UX Engineer, Lucidworks
Josh is a Senior UX Engineer at Lucidworks, where he also works on Lucidworks Fusion. In the past he has worked on complex problems such as building a tool to fact check the internet.

Andrew Thanalertvisuti

Solutions Architecture, Lucidworks
Andrew is a Solutions Architect at Lucidworks, where he has developed the Banana (a fork of Kibana) open-source project to visualize data in Solr. He has been working on visualization projects and analytics solutions across different teams at Lucidworks. Currently, he is working on... Read More →


Thursday October 18, 2018 1:15pm - 1:55pm
Salon 1

1:15pm

How SolrCloud Solved Recovery Issues
This talk is about long-standing recovery issues in SolrCloud (SOLR-9555, SOLR-7065, etc.) and how Solr 7.3 solved them with a completely new, safer design.

Speakers

Dat Cao Manh

Software Engineer, Lucidworks
Lucene/Solr committer and PMC member who has spent a lot of time improving SolrCloud, including recovery, search, and indexing.


Thursday October 18, 2018 1:15pm - 1:55pm
Drummond East

1:15pm

State of the JSON Facet API
This talk will cover recent developments in Solr's JSON Facet API, including distributed refinement, field collapsing, parent-child document operations, and domain changes. We'll also cover some internals, including how to develop a new faceted aggregation. Finally, there will be a discussion of features on the horizon related to faceting.
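
To give newcomers a feel for what a JSON Facet request looks like, here is a minimal sketch that builds one as a Python dictionary. The field names `category` and `price` and the facet labels are hypothetical; the request shape (a `terms` facet with `refine` enabled and a nested `avg` aggregation) is an illustration rather than an example from the talk.

```python
import json


def build_facet_request(field, metric_field, limit=5):
    """Build a JSON Facet request body with a terms facet and a nested average.

    The field names passed in are hypothetical and depend on the schema in use.
    """
    return {
        "query": "*:*",
        "limit": 0,  # return only facet counts, no documents
        "facet": {
            "top_values": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "refine": True,  # ask for distributed refinement of the buckets
                "facet": {"avg_metric": f"avg({metric_field})"},
            }
        },
    }


body = json.dumps(build_facet_request("category", "price"), indent=2)
print(body)
```

The resulting body would be sent to a collection's query endpoint; the transport is omitted here to keep the sketch self-contained.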

Speakers

Yonik Seeley

Search Engineer, Cloudera
Yonik Seeley is the creator of Solr. He works at Cloudera integrating and leveraging "Big Search" technologies into their advanced platform for machine learning and analytics. Yonik was a co-founder of LucidWorks, and he holds a master's degree in computer science from Stanford U... Read More →


Thursday October 18, 2018 1:15pm - 1:55pm
Drummond West

1:15pm

100 Billion Documents And Counting: Rebuilding Message Search at Slack
Slack kicked off a project in 2017 to migrate our message search index from an old-school Solr 4 cluster to the latest and greatest release of SolrCloud. It took us about a year to get the new system launched, and we learned a ton about Solr along the way. Come listen to our stories, learn from our experiences, laugh at our mistakes, and see what we have in store for Slack search on Solr in the years to come.

Speakers

John Gallagher

Software Engineer, Slack
John Gallagher is a Software Engineer on the Search Infrastructure at Slack, where he is focused on search quality and data scalability.  Prior to Slack, John worked on search and infrastructure teams at Foursquare and researched concurrent systems.

Josh Wills

Software Engineer, Slack
Josh Wills is a software engineer at Slack who has worked on Slack's configuration and experimentation system, search infrastructure, and data infrastructure. He is a recovering manager, having formerly led the data engineering team at Slack and the data science team at Cloudera, a member... Read More →


Thursday October 18, 2018 1:15pm - 1:55pm
Jarry & Joyce

1:15pm

Scaling Box-Search: Gearing up for Petabyte Scale
Search is an integral part of Box. It enables millions of users, across thousands of enterprises, to find relevant content. At its core, this is powered by Solr clusters across multiple data-centers, hosting hundreds of terabytes of sharded inverted index. Each day we are ingesting millions of files which causes this index to grow at a staggering pace. At this rate we will soon reach a petabyte scale search index that still needs to support near realtime indexing, low latency queries, high-availability and multi-tenancy.

In this session we will talk about some of the key scalability challenges we have been facing at Box and how we have addressed them by implementing a dynamic sharding scheme using Key Range Partitioning and Bin-packing. We will discuss the high level architecture of this new system, including some key aspects of it such as load-balancing, fault tolerance and failure-recovery. Finally we will share some lessons we have learnt in the process of building this system on top of Solr, which will hopefully help others who intend to undertake similar endeavors.
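
The abstract does not describe Box's actual algorithm, so the following is only a generic sketch of the bin-packing idea: key ranges with known index sizes are placed onto shards of fixed capacity using the classic first-fit decreasing heuristic. The range names, sizes, and capacity are all made up for illustration.

```python
def pack_ranges(range_sizes, capacity):
    """Assign key ranges to shards with first-fit decreasing bin-packing.

    range_sizes maps a key-range label to its index size; capacity is the
    maximum total size a shard may hold. Returns a list of shards, each a
    list of range labels.
    """
    shards = []  # each entry: {"free": remaining capacity, "ranges": [...]}
    ordered = sorted(range_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > capacity:
            raise ValueError(f"range {name!r} does not fit in any shard")
        for shard in shards:
            if shard["free"] >= size:
                shard["ranges"].append(name)
                shard["free"] -= size
                break
        else:
            # No existing shard has room, so open a new one.
            shards.append({"free": capacity - size, "ranges": [name]})
    return [shard["ranges"] for shard in shards]
```

A dynamic scheme would additionally need to split ranges that outgrow a shard and rebalance as data arrives, which this static sketch leaves out.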

Speakers

Shubhro Roy

Senior Software Engineer, Box
Shubhro enjoys working with data, be it indexing, mining or analyzing it. Currently he is part of the Search team at Box, building infrastructure components that enable millions of users to find relevant content. Prior to Box, Shubhro worked on full text database search at Oracle... Read More →

Anthony Urbanowicz

Staff Engineer, Box
Anthony Urbanowicz is a staff engineer at Box. Before Box, Anthony worked at companies ranging from startups to multinationals, including Microsoft and Uber, along with a multi-winter stint teaching snowboarding. As a relevance engineer at Microsoft, Anthony worked on Bing's speller, search results ranking... Read More →


Thursday October 18, 2018 1:15pm - 1:55pm
Salon 6&7

2:05pm

Entity Extraction for Product Searches
A user looking for “awesome smartphone 2018” is likely really after “+review:awesome +category:smartphone +release_date:2018”. A clever use of (e)dismax might get us pretty close to where we want, but it’s not real query understanding. There are other ways, of course, like training a model that will, based on the keyword, guess which field it’s looking into. In this session, we’ll discuss some of the ways, their pros and cons and how you’d implement them on top of Solr. We’ll specifically look into existing open-source tools that you can re-use in order to build such a system.
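
The simplest version of this idea, a dictionary and pattern lookup rather than a trained model, can be sketched as below. The vocabulary, the year pattern, and the default `review` field are assumptions built around the example query in the abstract; the sketch also performs no escaping of query syntax characters.

```python
import re

# Hypothetical lookup table mapping fields to known values.
FIELD_VOCAB = {
    "category": {"smartphone", "laptop", "tablet"},
}
YEAR_PATTERN = re.compile(r"^(19|20)\d{2}$")


def tag_query(query, default_field="review"):
    """Rewrite free-text keywords into fielded, required query clauses."""
    clauses = []
    for token in query.lower().split():
        field = default_field
        if YEAR_PATTERN.match(token):
            field = "release_date"
        else:
            for name, vocab in FIELD_VOCAB.items():
                if token in vocab:
                    field = name
                    break
        clauses.append(f"+{field}:{token}")
    return " ".join(clauses)


print(tag_query("awesome smartphone 2018"))
# prints: +review:awesome +category:smartphone +release_date:2018
```

A lookup like this breaks down on ambiguous or unseen terms, which is where the model-based approaches discussed in the session come in.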

Speakers

Radu Gheorghe

Search Consultant & Software Engineer, Sematext Group, Inc.
Radu Gheorghe is a search consultant, software engineer and trainer at Sematext, working mainly with Solr, Elasticsearch and logging-related projects.

Rafał Kuć

Software Engineer, Sematext Group, Inc.
In his professional life, Rafał is a Sematext trainer, consultant and software engineer, co-founder of http://solr.pl, and author of the Solr Cookbook and Elasticsearch Server books. In his personal life, Rafał is a father and a husband.


Thursday October 18, 2018 2:05pm - 2:45pm
Salon 4&5

2:05pm

Generating Faster ML Predictions in Fusion
Lucidworks Fusion is the enterprise platform for intelligent search and analytics built on top of Apache Solr and Apache Spark. Machine Learning components inside Fusion allow customers to train, test, and serve models in production using query and index pipelines. Serving predictions at query time needs a performant engine with very little overhead. In this talk, we will go through the ML model life cycle in Fusion and the techniques we use to generate faster predictions in pipelines.

Speakers

Kiran Chitturi

Data Engineer, Lucidworks
Kiran Chitturi is a Data Engineer at Lucidworks. He works on Fusion, Lucidworks' enterprise product, as part of the Smart Data team, building analytics and ML features.


Thursday October 18, 2018 2:05pm - 2:45pm
Salon 1

2:05pm

SQL Analytics for Search Engineers
Building a modern search application takes more than just tuning queries in Solr. Today's search engineer needs a broad set of tools to aggregate user activity to improve query relevance, generate recommendations, and leverage machine learning models for ranking and content enrichment. In addition, search teams are often asked to integrate diverse data sets into the search experience. At Lucidworks, we've combined the power of Spark SQL and Solr to solve a number of common problems that arise in modern search applications using tried and true SQL. In this talk, I'll show how to use SQL to:

- Aggregate documents in Solr to compute metrics for recommendations and query boosting based on user activity.
- Compute ranking experiment outcomes across variants.
- Wrangle data to join and index documents from NoSQL databases and other popular big data systems like Cassandra and HBase.
- Enable self-service analytics with BI tools, such as Tableau / Power BI, using JDBC / SQL.
- Leverage Solr's analytics capabilities, such as facets and streaming expressions, to optimize SQL queries.
- Generate predictions from ML models, such as Spark-NLP, using simple SQL functions.

Attendees will come away from this talk with a solid understanding of, and concrete examples of, where they can use SQL to complement their Solr skills in building powerful search experiences.
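
As a small taste of the experiment-outcome use case in the list above, the sketch below computes click-through rate per variant with a plain SQL aggregation. It uses Python's built-in SQLite as a stand-in for Spark SQL over Solr, and the `signals` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3


def ctr_by_variant(rows):
    """Compute impressions, clicks and click-through rate per experiment variant.

    rows are (variant, query, clicked) tuples, where clicked is 0 or 1.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signals (variant TEXT, query TEXT, clicked INTEGER)"
    )
    conn.executemany("INSERT INTO signals VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT variant, COUNT(*) AS impressions, SUM(clicked) AS clicks, "
        "1.0 * SUM(clicked) / COUNT(*) AS ctr "
        "FROM signals GROUP BY variant ORDER BY variant"
    ).fetchall()
    conn.close()
    return result


sample = [("A", "ipad", 1), ("A", "ipad", 0), ("B", "ipad", 1), ("B", "ipad", 1)]
print(ctr_by_variant(sample))
```

The same `GROUP BY` query shape carries over to other SQL engines; only the connection and the source of the signal data would change.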

Speakers

Tim Potter

Manager Smart Data at Lucidworks; Apache Solr Committer / PMC, Lucidworks
Timothy Potter is a senior member of the engineering team at Lucidworks and a committer on the Apache Solr project. Previously, Tim was an architect on the Big Data team at a social media analytics company, where he worked on large-scale machine learning, text mining, and social network... Read More →


Thursday October 18, 2018 2:05pm - 2:45pm
Drummond West

2:05pm

Cluster Dynamics in Solr Autoscaling
This talk will show how to better understand the Solr Autoscaling framework as a control system. Interesting dynamic behaviors of clusters will be presented and analyzed, along with how autoscaling affects them. The talk will also present how the new metrics history API and the simulation framework help us to better understand and test the behavior of large dynamic Solr clusters.

Speakers

Andrzej Białecki

Senior Software Engineer, Lucidworks
Andrzej Białecki has over 20 years of experience in software engineering, ranging from system integration, to OS development to information retrieval, to standardization of e-commerce models. He’s been actively involved in Open Source since 1997. Currently he’s an Apache Lucene/Solr... Read More →


Thursday October 18, 2018 2:05pm - 2:45pm
Drummond East

2:05pm

Apply Learning to Rank in The Home Depot Type Ahead Service
Type Ahead (TA) is a universal feature that large e-commerce companies use to automatically complete a user’s query based on a prefix. For The Home Depot, the U.S.’s 4th largest e-commerce site, the TA service is one of the most important services for delivering an optimal user experience.

TA results are often based on the previous popularity and conversion rate of the queries. However, such hardcoded rules are not optimal. With massive TA usage data being generated each day, we employ a machine learning model to learn the “why” behind what drives users to select the suggested terms from TA. This model learns important features behind each decision ranging from statistical probabilities to semantics and seasonality to past user satisfaction. Our model then re-ranks TA suggestions such that only the most relevant results are shown to the user, thus creating a seamless customer experience. This talk will showcase the effects of machine learning to create a more powerful and robust online shopping experience.

Speakers

Rongkai Zhao

Software Engineering Manager, The Home Depot
Rongkai Zhao is a Software Engineering Manager and Architect at The Home Depot where he oversees the research and development of system components in search, personalization, and call center intelligence. He has worked on e-commerce search engines since 2010 and has a wide range of... Read More →


Thursday October 18, 2018 2:05pm - 2:45pm
Jarry & Joyce

2:05pm

Realtime Solr Analytics and Triage for Video Delivery Workflow
When you are running an ingest and delivery workflow system for millions of video-on-demand assets, you want to make sure you can run analysis in real-time so that operations can explore and triage the assets to troubleshoot.

In this presentation, we’ll discuss our distributed video delivery system, the real-time analytical requirements we need to meet, and the technology decisions we made when designing the system. We'll review lessons we have learned and what you should watch out for in your own environment.

We will also discuss how our system has enabled interactive visual exploration of real-time statistics before users drill down to the specifics to troubleshoot, and how, after users identify the triage method, they can enter that knowledge into the rule engines so that triage and repair can be automated.

Speakers

Julia Li

Senior Engineer, Comcast
Julia is a senior engineer at Comcast. She has 20 years of experience in the telecom industry building network management and operation systems that integrate relational and big data technologies. Most recently, Julia has been focusing on adopting new...


Thursday October 18, 2018 2:05pm - 2:45pm
Salon 6&7

2:55pm

Enriching Solr with Deep Learning for a Question Answering System
Information Retrieval (IR) based question answering (QA) systems have many applications in the real world. Recent advances in deep learning (DL) give us a great opportunity to improve IR applications and engines and to incorporate systems like chatbots. In this talk we will present our study comparing traditional machine learning (ML) models with DL models (both supervised and unsupervised) on different QA tasks, such as answer paragraph selection, question-question similarity (FAQ matching), and answer span selection, and discuss the pros and cons of each method. For instance, modern state-of-the-art DL models are quite expensive to run and do not scale easily, so we will present how to leverage Solr's payloads and indexes to improve the runtime performance of DL and other ML models.

Speakers

Savva Kolbachev

ObjectStyle

Sanket Shahane

Research Engineer, Lucidworks
Sanket is a Research Engineer at Lucidworks Inc. who is passionate about machine learning and search. His work focuses on researching and developing methodologies to solve complex problems in the search domain, such as the cold start problem (in a search context), developing Question Answering...


Thursday October 18, 2018 2:55pm - 3:35pm
Salon 4&5

2:55pm

Fusion on Kubernetes
We will cover the Lucidworks Cloud Platform team's journey of containerizing and orchestrating Fusion with Docker and Kubernetes. We will discuss lessons learned, architectural approaches, and how to keep up with a rapidly evolving technology. Tech: Docker, Kubernetes, Helm, operators, Prometheus, logging

Speakers

Alan

Sr. Director of Engineering Services, Lucidworks
Java dev for 20+ years, Engineering Manager for 10

Joe Streeky

Lucidworks


Thursday October 18, 2018 2:55pm - 3:35pm
Salon 1

2:55pm

Autoscaling Suggestions: Simplifying Operations
Speakers

Varun Thacker

Lucene/Solr Committer and PMC member, Lucidworks


Thursday October 18, 2018 2:55pm - 3:35pm
Drummond East

2:55pm

How To Be a Solr Contributor
Every bug fix starts with a bug report.
Every feature starts with an idea.
Every line of code, every page of documentation, every automated test case -- they all exist because of communication and collaboration.

In this session, we'll discuss the ways in which all Solr users (regardless of Java know-how) can make meaningful contributions to Solr: helping to diagnose and fix bugs; improving documentation; designing and implementing new features; etc.

Speakers

Chris Hostetter

Software Engineer, Lucidworks
Chris 'Hoss' Hostetter is a Member of the Apache Software Foundation, and a committer on the Lucene/Solr Project. Prior to joining Lucidworks in 2010 to work full time on Solr development, he spent 11 years as a Principal Software Engineer for CNET Networks thinking about searching...


Thursday October 18, 2018 2:55pm - 3:35pm
Drummond West

2:55pm

Migration Station - DevOps for Fusion with Version Control and Continuous Integration
How do you move Fusion changes from Test to Production? How can you revert changes if the bits hit the fan?

We have a method that takes the fear out of promoting changes to production.

The typical workflow for developing with Fusion is to work live on the server. When you create a datasource from the “Datasources” screen, it is updated immediately on the server. This makes it easy to work interactively with the search engine. But from the web UI, there is no way to save these changes to a source file, and there is no way to move them from a test server into production. More importantly, there is no way to back out problematic changes from production.

We have developed a Java-based command-line interface that uses Fusion’s REST API to implement version control and continuous integration. Specifically, we use Git, Bamboo or Jenkins, and Java JAX-RS to serialize the Fusion objects. Then we use a Jersey client to call Fusion’s REST API to apply changes, just like the web UI does.

Speakers

Todd Lack

Sr. IT Software Developer, SAS


Thursday October 18, 2018 2:55pm - 3:35pm
Jarry & Joyce

2:55pm

Who Moved My State? A Blob Storage Solr Story
Salesforce implemented and runs Solr clusters serving customers whose data volumes and query characteristics are hard to predict. These clusters automatically handle index growth and data rebalancing.

But to more efficiently use hardware, adapt to variable total and core-specific loads, and run in cloud infrastructures, changes were required. We introduced a new architecture in which cores are persisted in a Blob Store (S3) with Solr servers’ local SSD storage used only as a cache.

Cores are then loaded by servers as needed, and popular cores are replicated and maintained. Inactive cores are removed (but kept in Blob Store!). Servers can be shut down when overall load is low, because the remaining servers can serve any core. And performance is globally unaffected, because querying and indexing use a local core copy after the core is initially loaded from the Blob Store.
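The load-on-demand and evict-but-keep behavior described above can be sketched as a small least-recently-used cache sitting in front of a durable store. This is only an illustration under stated assumptions, not the Salesforce implementation: `BlobStore`, `CoreCache`, `get_core`, and the `capacity` parameter are hypothetical names, a dictionary stands in for S3, and a core is represented by a plain string rather than index segments.

```python
from collections import OrderedDict


class BlobStore:
    """Hypothetical stand-in for a durable blob store such as S3."""

    def __init__(self):
        self._blobs = {}

    def put(self, core_name, data):
        self._blobs[core_name] = data

    def get(self, core_name):
        return self._blobs[core_name]


class CoreCache:
    """Hypothetical local cache holding at most `capacity` cores."""

    def __init__(self, blob_store, capacity):
        self.blob_store = blob_store
        self.capacity = capacity
        # Maps core name to its local copy, least recently used first.
        self.local = OrderedDict()
        # Counts how many times a core had to be pulled from the blob store.
        self.loads = 0

    def get_core(self, core_name):
        if core_name in self.local:
            # Cache hit: serve the local copy and mark it as recently used.
            self.local.move_to_end(core_name)
            return self.local[core_name]
        # Cache miss: pull the core from the blob store.
        data = self.blob_store.get(core_name)
        self.loads += 1
        self.local[core_name] = data
        if len(self.local) > self.capacity:
            # Drop the least recently used local copy; the blob store
            # still holds the core, so nothing is lost.
            self.local.popitem(last=False)
        return data


store = BlobStore()
for name in ("core_a", "core_b", "core_c"):
    store.put(name, f"index data for {name}")

cache = CoreCache(store, capacity=2)
cache.get_core("core_a")  # loaded from the blob store
cache.get_core("core_b")  # loaded from the blob store
cache.get_core("core_a")  # served from the local copy
cache.get_core("core_c")  # loaded; core_b is evicted locally
```

Because eviction only discards the local copy, any server with access to the blob store can reload an evicted core later, which is the property that lets servers be shut down when load is low.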

Core segment updates between Solr servers and the Blob Store build on Solr Replication logic, replacing the dialogue between two servers with Blob Store persisted metadata.

This architecture provides an elastic, highly available and scalable search cluster with a relatively simple implementation.

Speakers

Ilan Ginzburg

Architect, Salesforce
Ilan works on search infrastructure and integration problems from the Salesforce office in Grenoble, France. He holds business administration and computer science engineering degrees and a PhD in parallel computing. Prior to Salesforce, Ilan worked at Intel, HP Labs in Palo Alto, a...


Thursday October 18, 2018 2:55pm - 3:35pm
Salon 6&7

3:35pm

PM Break
Thursday October 18, 2018 3:35pm - 3:50pm

3:50pm

Closing Session: The Future of Search & AI
Speakers

Trey Grainger

SVP of Engineering, Lucidworks
Trey is the SVP of Engineering at Lucidworks, where he leads their engineering efforts around Lucidworks Fusion, Apache Lucene/Solr, and their other open source and commercial offerings. Trey is also the co-author of the book Solr in Action, as well as a published researcher and frequent...


Thursday October 18, 2018 3:50pm - 4:30pm
Salle de bal