Lucene/Solr Architecture


1 Lucene/Solr Architecture
Request and update layer:
- Request Handlers: /admin, /select, /spell
- Response Writers: XML, Binary, JSON
- Update Handlers: XML, CSV, binary
- Extracting Request Handler (PDF/WORD), using Apache Tika
- Data Import Handler (SQL/RSS)

Pluggable processing:
- Search Components: Query, Highlighting, Spelling, Statistics, Faceting, Debug, More like this, Clustering
- Update Processors: Signature, Logging, Indexing
- Query Parsing
- Distributed Search

Configuration:
- Schema
- Config

Core services:
- Analysis
- Faceting
- Filtering
- Search
- Caching
- Highlighting
- Index Replication

Apache Lucene Core:
- Search: IndexReader/Searcher
- Indexing: IndexWriter
- Text Analysis

2 Lucene/Solr plugins
- RequestHandlers: handle a request at a URL like /select
- SearchComponents: part of a SearchHandler, a componentized request handler
  - Includes Query, Facet, Highlight, Debug, Stats
  - Distributed Search capable
- UpdateHandlers: handle an indexing request
- Update Processor Chains: per-handler componentized chains that handle updates
- Query Parser plugins
  - Mix and match query types in a single request
  - Function plugins for Function Query
- Text Analysis plugins: Analyzers, Tokenizers, TokenFilters
- ResponseWriters: serialize and stream the response to the client
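
To make the handler and writer roles concrete, here is a minimal sketch of how a client might address a request handler by URL and pick a response writer with the `wt` parameter. The host, port, core name (`mycore`), query, and facet field are assumptions for illustration only; the helper `build_select_url` is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, query, response_format="json", facet_field=None):
    """Build a URL for the /select request handler (illustrative helper)."""
    # q selects the query, wt selects the response writer.
    params = [("q", query), ("wt", response_format)]
    if facet_field is not None:
        # Ask the facet search component to count values of one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local install and core name; adjust for a real deployment.
url = build_select_url(
    "http://localhost:8983/solr/mycore", "title:lucene", facet_field="category"
)
print(url)
```

The same query could be sent to a different handler path or with a different `wt` value without changing anything else, which is the point of keeping handlers and writers as separate plugins.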

3 Lucene/Solr Query Plugin Architecture
Declarative Analysis per-field
- Tokenizer to split text
- TokenFilter to transform tokens
- Analyzer for completely custom analysis
- Separate query / index analyzers

QParser plugins
- Support different query syntaxes
- Support different query execution
- Function Query supports pluggable custom functions
- Excellent support for nesting/mixing different query types in the same request

schema.xml

    // declaratively defines types
    // and analyzers for fields
    <fieldType name="text1">
      <filter="whitespace">
      <filter="customFilter" …>
      <filter="synonyms" file=..>
      <filter="porter" except=..>
    <field name="title" type="text1"
    <field name="cust1" class=…

- Analyzer for "title": Whitespace Tokenizer, then CustomFilter, SynonymFilter, Porter Stemmer
- Analyzer for "cust1": potentially completely custom architecture not using tokenizer/filters

solrconfig.xml

    < index configuration />
    < caching configuration />
    < request handler config />
    < search component config />
    < update processor config />
    < misc – HTTP cache, JMX >
    <parser name="mycustom" …
    <func name="custom" class=…

Query parsers
- MyCustom QParser
- Lucene QParser
- Function QParser (functions: sqrt, sum, pow, custom, max, log)
- XML QParser
- DisMax QParser
- Function Range Q
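
The analyzer for "title" above is a tokenizer followed by an ordered list of token filters. The following is a conceptual sketch of that chaining in plain Python, not Lucene's API: every function name is hypothetical, the lowercase step stands in for the slide's unspecified CustomFilter, and the stemmer is a toy suffix stripper rather than a real Porter stemmer.

```python
def whitespace_tokenizer(text):
    """Split text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Stand-in for the slide's CustomFilter (assumption)."""
    return [t.lower() for t in tokens]


def synonym_filter(tokens, synonyms):
    """Replace each token with its mapped synonym, if one exists."""
    return [synonyms.get(t, t) for t in tokens]


def toy_stem_filter(tokens):
    """Toy stemmer: strips a trailing 's'. NOT the Porter algorithm."""
    return [
        t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
        for t in tokens
    ]


def make_analyzer(tokenizer, *filters):
    """Compose a tokenizer and filters into one callable, applied in order."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyze_title = make_analyzer(
    whitespace_tokenizer,
    lowercase_filter,
    lambda tokens: synonym_filter(tokens, {"fast": "quick"}),
    toy_stem_filter,
)
tokens = analyze_title("Fast Cars")
print(tokens)
```

Because the chain is just an ordered list, a separate query-time analyzer can be built from the same pieces with a different filter list, which mirrors the "separate query / index analyzers" bullet.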

4 Lucene/Solr Request Plugins
Request handlers, each mapped to a URL:
- /select: RequestHandler (SearchHandler)
- /admin/luke: Request Handler (non-component based)
- /mypath: Request Handler (custom)

Search components chained inside a SearchHandler:
- Query Component
- Facet Component
- Highlight Component
- Debug Component
- Distributed Search

Additional plug-n-play search components:
- TermVector
- QueryElevation
- Spellcheck
- Terms
- MoreLikeThis
- Statistics
- Clustering
- My Custom

Response writers, fed by the Query Response:
- XML response writer
- XSLT response writer
- Binary response writer
- JSON response writer, with output such as {"response"={ "docs"={
- Custom response writer

Notes:
- Each request handler can be mapped to a different URL.
- SearchHandler is a componentized RequestHandler that allows search components to be chained together and also provides the framework for distributed search operations.
- Each SearchHandler can have its own custom set of search components, along with default or invariant parameters.
- All of the configuration is declarative, including adding new request handlers or search components.
- The QueryResponse object is very generic and can return any type of data.
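
The interaction of chained components with default and invariant parameters can be sketched as follows. This is a conceptual model in plain Python, not Solr's Java API: the `SearchHandler` class, the component functions, and the in-memory `DOCS` list are all hypothetical. It assumes defaults apply only when the request omits a parameter, and invariants always override the request.

```python
import json
from collections import Counter

# Tiny in-memory stand-in for an index (made-up documents).
DOCS = [
    {"id": "1", "title": "lucene in action", "cat": "book"},
    {"id": "2", "title": "solr reference guide", "cat": "book"},
    {"id": "3", "title": "apache lucene home", "cat": "site"},
]


def query_component(params, response):
    """Add documents whose title contains the query string."""
    matches = [d for d in DOCS if params["q"] in d["title"]]
    response["docs"] = matches[: int(params["rows"])]


def facet_component(params, response):
    """Add value counts for one field over the matched documents."""
    if params.get("facet") == "true":
        field = params["facet.field"]
        response["facets"] = dict(Counter(d[field] for d in response["docs"]))


class SearchHandler:
    """Runs a configurable chain of components over one shared response."""

    def __init__(self, components, defaults=None, invariants=None):
        self.components = components
        self.defaults = defaults or {}
        self.invariants = invariants or {}

    def handle(self, params):
        # Request params beat defaults; invariants beat request params.
        merged = {**self.defaults, **params, **self.invariants}
        response = {}
        for component in self.components:
            component(merged, response)
        return response


handler = SearchHandler(
    [query_component, facet_component],
    defaults={"rows": 10},
    invariants={"facet": "true", "facet.field": "cat"},
)
response = handler.handle({"q": "lucene", "facet": "false"})
# A response writer is then just a serializer over the generic response.
print(json.dumps(response))
```

Keeping the response a plain dictionary is what lets any writer (JSON here) serialize whatever the components happened to add.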

5 Lucene/Solr Indexing

Push indexing over HTTP POST (for example a PDF, or an XML <doc> with a <title>):
- /update: XML Update Handler
- /update/csv: CSV Update Handler
- /update/xml: XML Update with custom processor chain
- /update/extract: Extracting RequestHandler (PDF, Word, …)

Pull indexing with the Data Import Handler:
- Database pull (SQL DB)
- RSS pull (RSS feed)
- Simple transforms

Update Processor Chain (per handler):
- Remove Duplicates processor
- Custom Transform
- Logging
- Index

After the processor chain, text passes through the Text Index Analyzers and is written to the Lucene Index.

Notes:
- Just like all request handlers, update handlers can be mapped to a specific URL and have their own set of default or invariant parameters.
- Each update handler can have its own Update Processor Chain that can do document-level operations prior to indexing, or even redirect indexing to a different server or create multiple documents (or zero) from a single one.
- All of the configuration is declarative, including the specification of update processor chains.
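
A processor chain that can emit zero, one, or several documents per input can be modeled as below. This is a conceptual sketch in plain Python rather than Solr's update processor API; all names are hypothetical, the SHA-1 signature over selected fields is an assumed way to detect duplicates, and a Python list stands in for the Lucene index.

```python
import hashlib


def normalize_title(doc):
    """Custom transform: trim and lowercase the title (illustrative)."""
    return [{**doc, "title": doc["title"].strip().lower()}]


def make_dedupe_processor(fields):
    """Drop documents whose signature over `fields` was already seen."""
    seen = set()

    def process(doc):
        raw = "|".join(str(doc.get(f, "")) for f in fields)
        signature = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        if signature in seen:
            return []  # zero documents: the duplicate is discarded
        seen.add(signature)
        return [doc]

    return process


def make_index_processor(index):
    """Final step: append the document to the stand-in index."""
    def process(doc):
        index.append(doc)
        return [doc]

    return process


def run_chain(doc, processors):
    """Feed one document through the chain; each step returns a list."""
    docs = [doc]
    for processor in processors:
        docs = [out for d in docs for out in processor(d)]
    return docs


index = []
chain = [normalize_title, make_dedupe_processor(["title"]), make_index_processor(index)]
incoming_docs = [
    {"id": "1", "title": "Lucene"},
    {"id": "2", "title": " lucene "},  # duplicate after normalization
    {"id": "3", "title": "Solr"},
]
for incoming in incoming_docs:
    run_chain(incoming, chain)
print([d["id"] for d in index])
```

Having every processor return a list is the design choice that allows a single step to drop a document or fan it out into several without changing the chain runner.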

