Lucene/Solr Architecture


Presentation transcript:

Lucene/Solr Architecture

Architecture diagram (labels only; the original layout is not recoverable):
- Request Handlers: /admin, /select, /spell
- Response Writers: XML, Binary, JSON
- Update Handlers: XML, CSV, binary
- Extracting Request Handler (PDF/Word)
- Search Components: Query, Highlighting, Spelling, Statistics, Faceting, Debug, More like this, Clustering
- Update Processors: Signature, Logging, Indexing
- Other Solr layers: Schema, Config, Query Parsing, Analysis, Distributed Search, Data Import Handler (SQL/RSS), Apache Tika
- Core features: Faceting, Filtering, Search, Caching, Highlighting, Index Replication
- Apache Lucene Core: Search (IndexReader/Searcher), Indexing (IndexWriter), Text Analysis

Lucene/Solr plugins

- RequestHandlers: handle a request at a URL such as /select
- SearchComponents: parts of a SearchHandler, a componentized request handler
  - Include Query, Facet, Highlight, Debug, Stats
  - Distributed Search capable
- UpdateHandlers: handle an indexing request
- Update Processor Chains: per-handler componentized chains that handle updates
- Query Parser plugins
  - Mix and match query types in a single request
  - Function plugins for Function Query
- Text Analysis plugins: Analyzers, Tokenizers, TokenFilters
- ResponseWriters: serialize and stream the response to the client
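To make the plugin roles above concrete, here is a minimal Python sketch of the idea that a request handler is looked up by URL path and a response writer is chosen by the `wt` parameter. It is only an illustration of the pattern: `HANDLERS`, `WRITERS`, `dispatch`, and the toy handler and writers are hypothetical names, not Solr APIs, and the XML default for `wt` is an assumption of this sketch.

```python
import json
from typing import Callable, Dict
from xml.sax.saxutils import escape

Params = Dict[str, str]


def select_handler(params: Params) -> dict:
    """Toy request handler: echoes the query instead of searching an index."""
    return {"response": {"query": params.get("q", ""), "docs": []}}


def write_json(result: dict) -> str:
    """Toy JSON response writer."""
    return json.dumps(result, sort_keys=True)


def write_xml(result: dict) -> str:
    """Toy XML response writer; only serializes the echoed query."""
    query = escape(result["response"]["query"])
    return f"<response><query>{query}</query></response>"


# Hypothetical registries: URL path -> handler, wt value -> writer.
HANDLERS: Dict[str, Callable[[Params], dict]] = {"/select": select_handler}
WRITERS: Dict[str, Callable[[dict], str]] = {"json": write_json, "xml": write_xml}


def dispatch(path: str, params: Params) -> str:
    """Route a request to its handler, then serialize with the chosen writer."""
    handler = HANDLERS[path]
    writer = WRITERS[params.get("wt", "xml")]  # assumed default for this sketch
    return writer(handler(params))
```

Adding a handler or writer in this sketch is just another registry entry, which loosely mirrors how the slides describe plugins being registered by configuration rather than by code changes.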

Lucene/Solr Query Plugin Architecture

Declarative Analysis per-field
- Tokenizer to split text
- TokenFilter to transform tokens
- Analyzer for completely custom analysis
- Separate query / index analyzer

QParser plugins
- Support different query syntaxes
- Support different query execution
- Function Query supports pluggable custom functions
- Excellent support for nesting/mixing different query types in the same request

schema.xml
    // declaratively defines types
    // and analyzers for fields
    <fieldType name="text1">
      <filter="whitespace">
      <filter="customFilter" …>
      <filter="synonyms" file=..>
      <filter="porter" except=..>
    <field name="title" type="text1"
    <field name="cust1" class=…

- Analyzer for "title": Whitespace Tokenizer, CustomFilter, SynonymFilter, Porter Stemmer
- Analyzer for "cust1": potentially completely custom architecture, not using tokenizer/filters

solrconfig.xml
    <index configuration />
    <caching configuration />
    <request handler config />
    <search component config />
    <update processor config />
    <misc – HTTP cache, JMX>
    <parser name="mycustom" …
    <func name="custom" class=…

- QParsers: MyCustom QParser, Lucene QParser, Function QParser, XML QParser, DisMax QParser, Function Range Q
- Functions for the Function QParser: sqrt, sum, pow, custom, max, log
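The per-field analysis chain described above (a tokenizer followed by token filters, with separate index and query analyzers) can be sketched in a few lines of Python. This is a toy model, not Lucene code: `Analyzer`, `make_synonym_filter`, and `plural_stripper` are hypothetical names, and `plural_stripper` is a deliberately crude stand-in for a stemmer, not the Porter algorithm.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split text on runs of whitespace."""
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def make_synonym_filter(synonyms: Dict[str, str]) -> TokenFilter:
    """Build a filter that replaces each token by its mapped synonym, if any."""
    def apply(tokens: List[str]) -> List[str]:
        return [synonyms.get(t, t) for t in tokens]
    return apply


def plural_stripper(tokens: List[str]) -> List[str]:
    """Toy stand-in for a stemmer (NOT Porter): drop a trailing 's' from longer tokens."""
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


class Analyzer:
    """A tokenizer followed by an ordered list of token filters."""

    def __init__(self, tokenizer: Tokenizer, filters: List[TokenFilter]) -> None:
        self.tokenizer = tokenizer
        self.filters = filters

    def analyze(self, text: str) -> List[str]:
        tokens = self.tokenizer(text)
        for token_filter in self.filters:
            tokens = token_filter(tokens)
        return tokens


# Hypothetical per-field setup: synonyms are applied at index time only.
synonyms = make_synonym_filter({"autos": "cars"})
FIELD_ANALYZERS = {
    "title": {
        "index": Analyzer(whitespace_tokenizer, [lowercase_filter, synonyms, plural_stripper]),
        "query": Analyzer(whitespace_tokenizer, [lowercase_filter, plural_stripper]),
    }
}
```

With this setup, indexing "Fast Autos" and querying "Cars" both reduce to the token `car`, which is the point of running compatible chains on both sides.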

Lucene/Solr Request Plugins

Example request: http://.../select?q=cheese&wt=json
Example response (truncated on the slide): {"response"={ "docs"={

Request handlers and their URLs:
- /select: RequestHandler
- /admin/luke: Request Handler (non-component based)
- /mypath: Request Handler (custom)

Search components: Query Component, Facet Component, Highlight Component, Debug Component, Distributed Search

Response writers: XML, XSLT, Binary, JSON, Custom (all fed from the Query Response)

Additional plug-n-play search components: TermVector, QueryElevation, Spellcheck, Terms, MoreLikeThis, Statistics, Clustering, My Custom

- Each request handler can be mapped to a different URL.
- SearchHandler is a componentized RequestHandler that allows search components to be chained together and also provides the framework for distributed search operations.
- Each SearchHandler can have its own custom set of search components, along with default or invariant parameters.
- All of the configuration is declarative, including adding new request handlers or search components.
- The QueryResponse object is very generic and can return any type of data.
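The chaining of search components and the role of default versus invariant parameters can be illustrated with a small Python sketch. Everything here is hypothetical (the `SearchHandler` class, the two components, and the in-memory `DOCS` list are not Solr code). The sketch assumes that defaults fill in parameters the request omits, while invariants override whatever the request sends.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import parse_qsl

Params = Dict[str, str]
Component = Callable[[Params, dict], None]

# Toy in-memory "index"; a real component would search Lucene instead.
DOCS = [
    {"id": "1", "title": "Cheese plate", "type": "food"},
    {"id": "2", "title": "Blue cheese", "type": "food"},
    {"id": "3", "title": "Cheese knife", "type": "tool"},
]


def query_component(params: Params, response: dict) -> None:
    """Add docs whose title contains the query string, limited by 'rows'."""
    needle = params.get("q", "").lower()
    matches = [d for d in DOCS if needle in d["title"].lower()]
    response["docs"] = matches[: int(params.get("rows", "10"))]


def facet_component(params: Params, response: dict) -> None:
    """Count the returned docs per 'type' when faceting is switched on."""
    if params.get("facet") != "true":
        return
    counts: Dict[str, int] = {}
    for doc in response.get("docs", []):
        counts[doc["type"]] = counts.get(doc["type"], 0) + 1
    response["facets"] = counts


class SearchHandler:
    """Runs an ordered chain of components over one shared response object."""

    def __init__(self, components: List[Component],
                 defaults: Optional[Params] = None,
                 invariants: Optional[Params] = None) -> None:
        self.components = components
        self.defaults = defaults or {}
        self.invariants = invariants or {}

    def handle(self, query_string: str) -> dict:
        request = dict(parse_qsl(query_string))
        # Defaults lose to the request; invariants win over it.
        params = {**self.defaults, **request, **self.invariants}
        response: dict = {}
        for component in self.components:
            component(params, response)
        return response


handler = SearchHandler([query_component, facet_component],
                        defaults={"facet": "true"}, invariants={"rows": "2"})
```

Calling `handler.handle("q=cheese&wt=json&rows=100")` still returns at most two documents, because the invariant `rows` value replaces the one in the request, whereas a request with `facet=false` overrides the default and skips faceting.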

Lucene/Solr Indexing

Inputs arrive by HTTP POST (for example XML <doc> <title> documents, CSV, or PDF files) or are pulled by the Data Import Handler.

Update handlers and their URLs:
- /update: XML Update Handler
- /update/csv: CSV Update Handler
- /update/xml: XML Update with custom processor chain
- /update/extract: Extracting RequestHandler (PDF, Word, …)

Data Import Handler: database pull (SQL DB), RSS pull (RSS feed), simple transforms

Update Processor Chain (per handler): Remove Duplicates processor, Custom Transform, Logging, Index

Text Index Analyzers, then the Lucene Index

- Just like all request handlers, update handlers can be mapped to a specific URL and have their own set of default or invariant parameters.
- Each update handler can have its own Update Processor Chain that can perform document-level operations prior to indexing, redirect indexing to a different server, or create multiple documents (or zero) from a single one.
- All of the configuration is declarative, including the specification of update processor chains.
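The statement that an update processor can turn one incoming document into several, or into none, is easy to show with a toy chain in Python. The names below (`run_chain`, `split_paragraphs`, `make_dedupe_processor`) are hypothetical, and hashing selected fields with SHA-1 is just one assumed way to compute a duplicate signature, not a description of Solr's own processor.

```python
import hashlib
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, str]
Processor = Callable[[Doc], List[Doc]]


def split_paragraphs(doc: Doc) -> List[Doc]:
    """Emit one document per blank-line-separated paragraph of 'body'."""
    parts = [p for p in doc.get("body", "").split("\n\n") if p.strip()]
    if len(parts) <= 1:
        return [doc]
    return [{**doc, "id": f"{doc['id']}-{i}", "body": p}
            for i, p in enumerate(parts, start=1)]


def make_dedupe_processor(fields: List[str]) -> Processor:
    """Drop any document whose signature over 'fields' was already seen."""
    seen = set()

    def process(doc: Doc) -> List[Doc]:
        raw = "|".join(doc.get(f, "") for f in fields)
        signature = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        if signature in seen:
            return []  # zero documents out: the duplicate is dropped
        seen.add(signature)
        return [{**doc, "signature": signature}]

    return process


def run_chain(processors: List[Processor], docs: Iterable[Doc]) -> List[Doc]:
    """Feed every document through each processor in order."""
    current = list(docs)
    for processor in processors:
        produced: List[Doc] = []
        for doc in current:
            produced.extend(processor(doc))
        current = produced
    return current
```

For example, running `[split_paragraphs, make_dedupe_processor(["body"])]` over three documents, where one has two paragraphs and another repeats an earlier paragraph, yields three output documents: the split pair plus the one remaining unique document.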