Presentation transcript:

Lucene Open Source Search Engine

Lucene - Overview
Complete search engine in a Java library
Stand-alone only, no server – but can use Solr
Handles indexing and query
Fully featured – but not 100% complete
Customizable – to an extent
Fully open source
Current version: 3.6.1

Lucene Implementations
LinkedIn – open-source software for integer list compression
Eclipse IDE – for searching documentation
Jira
Twitter
Comcast – XfinityTV.com, some set-top boxes
Care.com
MusicBrainz
Apple, Disney
BobDylan.com

Indexing Lucene

Lucene - Indexing
Directory = a reference to an index
 – RAMDirectory, SimpleFSDirectory
IndexWriter = writes to the index; options:
 – Limited or unlimited field lengths
 – Auto commit
 – Analyzer (how to do text processing; more on this later)
 – Deletion policy (only for deleting old temporary data)
Document – holds fields to index
Field – a name/value pair + index/store flags

Lucene – Indexer Outline
SimpleFSDirectory fsDir = new SimpleFSDirectory(File)
IndexWriter iWriter = new IndexWriter(fsDir, …)
Loop: fetch text for each document {
  Document doc = new Document();
  doc.add(new Field(…));  // for each field
  iWriter.addDocument(doc);
}
iWriter.commit();
iWriter.close();
fsDir.close();
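The outline above can be fleshed out into a compilable sketch. This is an assumption-laden illustration, not code from the slides: it targets the Lucene 3.x API the deck references (IndexWriterConfig, Version.LUCENE_36, the four-argument Field constructor), and the field name "title", the sample titles, and the RAMDirectory (used so the sketch is self-contained) are all illustrative choices.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class IndexerSketch {
  // Builds a small index in the given Directory; returns the number of documents added
  public static int buildIndex(Directory dir) throws IOException {
    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
    IndexWriter iWriter = new IndexWriter(dir, cfg);
    for (String title : new String[] { "Star Wars", "Blazing Saddles" }) {
      Document doc = new Document();
      // Store.YES keeps the original value; Index.ANALYZED runs it through the analyzer
      doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
      iWriter.addDocument(doc);
    }
    iWriter.commit();
    int count = iWriter.numDocs();
    iWriter.close();
    return count;
  }

  public static void main(String[] args) throws IOException {
    // RAMDirectory keeps the example self-contained; SimpleFSDirectory writes to disk
    System.out.println(buildIndex(new RAMDirectory()));
  }
}
```

Swapping `new RAMDirectory()` for `new SimpleFSDirectory(new File("index"))` gives the on-disk variant the outline shows.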

Class Materials
SharePoint link – use “search\[flast]” username
 – sharepoint.searchtechnologies.com
 – Annual Kickoff – Shared Documents – FY2013 Presentations – Introduction to Lucene
 – lucene-training-src-FY2013.zip

Lucene – Index – Exercise 1
Create a new Maven project
 – mvn archetype:generate -DgroupId=com.searchtechnologies -DartifactId=lucene-training -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
 – Right-click pom.xml, Maven.. Add Dependency; type lucene-core in the search box
 – Choose Expand Maven Dependencies.. right-click lucene-core.. Maven download sources
 – Source code level = 1.6
Copy source file LuceneIndexExercise.java
 – Into the com.searchtechnologies package
Copy the data directory to your project
Follow the instructions in the file

Query Lucene

Lucene - Query
Directory = an index reference
IndexReader = reads the index, typically associated with reading document fields
 – readOnly
IndexSearcher = searches the index
QueryParser – parses a string to a Query
 – QueryParser = standard Lucene parser
 – Constructor: Version, default field, analyzer
Query – query expression to execute
 – Returned by qParser.parse(String)
 – Search Tech’s QPL can generate Query objects

Lucene – Query part 2
Executing a search
 – TopDocs td = iSearcher.search(query, n)
TopDocs – holds statistics on the search plus the top N documents
 – totalHits, scoreDocs[], maxScore
ScoreDoc – information on a single document
 – Doc ID and score
Use IndexReader to fetch any Document from a Doc ID
 – (includes all fields for the document)

Lucene – Search Outline
SimpleFSDirectory fsDir = new SimpleFSDirectory(File f)
IndexReader iReader = new IndexReader(fsDir, …)
IndexSearcher iSearcher = new IndexSearcher(iReader)
StandardAnalyzer sa = new StandardAnalyzer(…)
QueryParser qParser = new QueryParser(…)
Loop: fetch a query from the user {
  Query q = qParser.parse(queryString)
  TopDocs tds = iSearcher.search(q, 10);
  Loop: for every document in tds.scoreDocs {
    Document doc = iReader.document(tds.scoreDocs[i].doc);
    Print: tds.scoreDocs[i].score, doc.get("field")
  }
}
// Close the StandardAnalyzer, iSearcher, and iReader
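A self-contained version of this outline might look as follows. As with the indexing sketch, this assumes the Lucene 3.x API; the "title" field, the sample documents, and the in-memory RAMDirectory are illustrative, and a fixed query string stands in for the user-input loop.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SearchSketch {
  // Indexes a few titles, runs one parsed query, prints score + title, returns the hit count
  public static int searchTitles(String queryString) throws IOException, ParseException {
    Directory dir = new RAMDirectory();
    StandardAnalyzer sa = new StandardAnalyzer(Version.LUCENE_36);
    IndexWriter iWriter = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_36, sa));
    for (String title : new String[] { "Star Wars", "Star Trek", "Blazing Saddles" }) {
      Document doc = new Document();
      doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
      iWriter.addDocument(doc);
    }
    iWriter.close();

    IndexReader iReader = IndexReader.open(dir);
    IndexSearcher iSearcher = new IndexSearcher(iReader);
    QueryParser qParser = new QueryParser(Version.LUCENE_36, "title", sa);
    Query q = qParser.parse(queryString);
    TopDocs tds = iSearcher.search(q, 10);
    for (ScoreDoc sd : tds.scoreDocs)
      System.out.println(sd.score + " " + iReader.document(sd.doc).get("title"));
    iSearcher.close();
    iReader.close();
    return tds.totalHits;
  }

  public static void main(String[] args) throws Exception {
    searchTitles("star");
  }
}
```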

Lucene – Query – Exercise 2
Open source file LuceneQueryExercise.java
Follow the instructions in the file

Relevancy Tuning Lucene

Lucene Extras – Fun Things You Can Do
iWriter.updateDocument(Term, Document)
 – Updates a document which contains the “Term”
 – “Term” in this case is a field/value pair, such as “id” = “ ”
doc.setBoost(boost)
 – Multiplies term weights in the doc by the boost value
 – Part of “fieldNorm” when you do an “explain”
field.setBoost(boost)
 – Multiplies term weights in the field by the boost value
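The update mechanism described above (delete-by-term, then add) can be sketched as follows. This assumes the Lucene 3.x API; the "id"/"title" field names and the id value "42" are hypothetical examples, not from the slides.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class UpdateSketch {
  // Adds a document, then updates it in place; returns the final doc count
  public static int addThenUpdate() throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter iWriter = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));

    Document doc = new Document();
    // NOT_ANALYZED keeps the id as one exact token, so the Term below can match it
    doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("title", "original title", Field.Store.YES, Field.Index.ANALYZED));
    iWriter.addDocument(doc);

    Document revised = new Document();
    revised.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
    revised.add(new Field("title", "revised title", Field.Store.YES, Field.Index.ANALYZED));
    // updateDocument = delete all docs matching the Term, then add the new doc
    iWriter.updateDocument(new Term("id", "42"), revised);

    iWriter.commit();
    int count = iWriter.numDocs();  // still 1: the old version was deleted, not duplicated
    iWriter.close();
    return count;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(addThenUpdate());
  }
}
```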

Explain - Example
iSearcher.explain(query, doc-number)
Query: star OR catch^0.6 for document 903
(MATCH) product of:
  (MATCH) sum of:
    (MATCH) weight(title:catch^0.6 in 903), product of:
      queryWeight(title:catch^0.6), product of:
        0.6 = boost
        idf(docFreq=1, maxDocs=1005)
        queryNorm
      (MATCH) fieldWeight(title:catch in 903), product of:
        1.0 = tf(termFreq(title:catch)=1)
        idf(docFreq=1, maxDocs=1005)
        fieldNorm(field=title, doc=903)
  0.5 = coord(1/2)

Lucene – Query – Exercise 3
Add explain to your query program
 – Explanation exp = iSearcher.explain(… )
Call it for all documents produced by your search
Simply use toString() on the result of explain() to display the results

Boosting – Other Issues
Similarity class Javadoc documentation
 – Very useful discussion of boosting formulas
Similarity.encodeNormValue() – 8-bit floating point!
 – Many nearby boost values encode to the same byte, e.g.:
   0.90 => 7B, 1.00 => 7C, 1.10 => 7C, 1.20 => 7C, 1.30 => 7D, 1.40 => 7D,
   1.50 => 7E, 1.60 => 7E, 1.70 => 7E, 1.80 => 7F, 1.90 => 7F, …

Lucene Query Objects
Query objects are used to execute the search:
  Query String → Query Parser → Query → iSearcher.search() → Top Docs
All derived from the Lucene Query class

Lucene Query Objects - Example
(george AND washington) OR (thomas AND jefferson)
BooleanQuery (clauses = SHOULD)
  BooleanQuery (clauses = MUST)
    TermQuery george
    TermQuery washington
  BooleanQuery (clauses = MUST)
    TermQuery thomas
    TermQuery jefferson
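That tree can be built programmatically. A minimal sketch, assuming the Lucene 3.x BooleanQuery/TermQuery API; the field name passed in is an illustrative choice:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryTreeSketch {
  // Builds the tree for (george AND washington) OR (thomas AND jefferson)
  public static Query build(String field) {
    BooleanQuery george = new BooleanQuery();  // MUST + MUST acts like AND
    george.add(new TermQuery(new Term(field, "george")), Occur.MUST);
    george.add(new TermQuery(new Term(field, "washington")), Occur.MUST);

    BooleanQuery thomas = new BooleanQuery();
    thomas.add(new TermQuery(new Term(field, "thomas")), Occur.MUST);
    thomas.add(new TermQuery(new Term(field, "jefferson")), Occur.MUST);

    BooleanQuery top = new BooleanQuery();  // SHOULD + SHOULD acts like OR
    top.add(george, Occur.SHOULD);
    top.add(thomas, Occur.SHOULD);
    return top;
  }

  public static void main(String[] args) {
    // toString() renders the nested clause structure
    System.out.println(build("text"));
  }
}
```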

Lucene BooleanQuery
george +washington -martha
jefferson -thomas +sally
WORKS LIKE AND:
  BooleanQuery bq = new BooleanQuery();
  bq.add(X, Occur.MUST);
  bq.add(Y, Occur.MUST);
WORKS LIKE OR:
  BooleanQuery bq = new BooleanQuery();
  bq.add(X, Occur.SHOULD);
  bq.add(Y, Occur.SHOULD);
WORKS LIKE X AND (X OR Y):
  BooleanQuery bq = new BooleanQuery();
  bq.add(X, Occur.MUST);
  bq.add(Y, Occur.SHOULD);

Lucene – Query – Exercise 4
Create BooleanQuery and TermQuery objects as necessary to create a query without the query parser
Goal: (star AND wars) OR (blazing AND saddles)
TermQuery: tq = new TermQuery(new Term("field", "token"))
BooleanQuery:
  BooleanQuery bq = new BooleanQuery();
  bq.add(query, Occur.MUST);
Occur – Occur.MUST, Occur.SHOULD, Occur.MUST_NOT
TermQuery and BooleanQuery derive from Query
 – Any “Query” object can be passed to iSearcher.search()

Lucene Proximity Queries
“Spanning” queries → return matching “spans”
DOCUMENT: Four score and seven years ago, our forefathers brought forth…
Query → Returns:
  four before/5 seven → 0:4
  (four before/5 seven) before forefathers → 0:8
  brought near/3 ago → 5:9
  (four adj score) or (brought adj forth) → 0:2, 8:10
Word positions mark word boundaries

Proximity Queries: Available Operators
(standard) SpanTermQuery – for terms inside spanning queries
(standard) SpanNearQuery – inOrder flag → handles both near and before
(standard) SpanOrQuery
(standard) SpanMultiTermQueryWrapper – fka SpanRegexQuery
(Search Tech) SpanAndQuery
(Search Tech) SpanBetweenQuery – between(start, end, positive-content, not-content)
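The "four before/5 seven" example from the previous slide maps onto the standard span classes like this. A hedged sketch assuming the Lucene 3.x spans API; the "text" field name and the in-memory index are illustrative:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SpanSketch {
  // Runs "four before/5 seven" against the example document; returns the hit count
  public static int fourBeforeSeven() throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
    Document doc = new Document();
    doc.add(new Field("text", "Four score and seven years ago, our forefathers brought forth",
        Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
    w.close();

    SpanQuery four = new SpanTermQuery(new Term("text", "four"));
    SpanQuery seven = new SpanTermQuery(new Term("text", "seven"));
    // slop = 5; inOrder = true makes this a "before" operator (false would be plain "near")
    SpanNearQuery before5 = new SpanNearQuery(new SpanQuery[] { four, seven }, 5, true);

    IndexReader r = IndexReader.open(dir);
    IndexSearcher s = new IndexSearcher(r);
    int hits = s.search(before5, 10).totalHits;
    s.close();
    r.close();
    return hits;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(fourBeforeSeven());
  }
}
```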

Span Queries – Demo of LuceneSpanDemo.java

Analysis Lucene

Analyzers
“Analysis” = “text processing” in Lucene
Includes:
 – Tokenization
   Since 1955, the B-52… → Since, 1955, the, B, 52
 – Token filtering: splitting, joining, replacing, filtering, etc.
   Since, 1955, the, B, 52 → 1955, B, 52
   George, Lincoln → george, lincoln
   MasteringBiology → Mastering, Biology
   B-52 → B52, B-52, B, 52
 – Stemming
   tables → table
   carried → carry

Analyzer
Analyzer, Tokenizer, TokenFilter
Tokenizer: text → TokenStream
TokenFilter: TokenStream → TokenStream
Analyzer: a complete text processing function (one Tokenizer + multiple TokenFilters)
 – Manufactures TokenStreams
string → Tokenizer → TokenFilter → …
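A custom Analyzer composed this way (one Tokenizer plus two TokenFilters), together with the attribute-based consumption loop shown later, can be sketched as follows. This assumes the Lucene 3.x API (the pre-4.0 `tokenStream(String, Reader)` override); the analyzer name and the stop-word set choice are illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalyzerSketch {
  // One Tokenizer + two TokenFilters, wrapped as an Analyzer
  public static final class LowercaseStopAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
      TokenStream ts = new StandardTokenizer(Version.LUCENE_36, reader);
      ts = new LowerCaseFilter(Version.LUCENE_36, ts);
      ts = new StopFilter(Version.LUCENE_36, ts, StandardAnalyzer.STOP_WORDS_SET);
      return ts;
    }
  }

  // Consumes the manufactured TokenStream and collects the surviving terms
  public static List<String> analyze(String text) throws IOException {
    List<String> terms = new ArrayList<String>();
    TokenStream ts = new LowercaseStopAnalyzer().tokenStream("text", new StringReader(text));
    CharTermAttribute termAtt = ts.getAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken())
      terms.add(termAtt.toString());
    ts.close();
    return terms;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(analyze("Since 1955, the B-52"));
  }
}
```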

Existing Analyzers, Tokenizers, Filters
Tokenizer
 – (standard) CharTokenizer, WhitespaceTokenizer, KeywordTokenizer, ChineseTokenizer, CJKTokenizer, StandardTokenizer, WikipediaTokenizer (more)
 – (Search Tech) UscodeTokenizer (produces each HTML tag as a separate token)
TokenFilter
 – Stemmers: (standard) many language-specific stemmers, PorterStemFilter, SnowballFilter
 – Stemmers: (Search Tech) Lemmatizer

Existing Analyzers, Tokenizers, Filters
TokenFilters (continued)
 – LengthFilter, LowerCaseFilter, StopFilter, SynonymTokenFilter (don’t use), WordDelimiterFilter (Solr only)
Analyzers
 – WhitespaceAnalyzer, StandardAnalyzer, various language analyzers, PatternAnalyzer
Analyzers almost always need to be customized.

Creating and Using TokenStream
TokenStream tokenStream = new SomeTokenizer(…);
tokenStream = new SomeTokenFilter1(tokenStream);
tokenStream = new SomeTokenFilter2(tokenStream);
CharTermAttribute charTermAtt = tokenStream.getAttribute(CharTermAttribute.class);
OffsetAttribute offsetAtt = tokenStream.getAttribute(OffsetAttribute.class);
while (tokenStream.incrementToken()) {
  charTermAtt → now contains info on the token’s term
  offsetAtt.startOffset() → now contains the token’s start offset
}

Token Streams - How They Work
The consumer calls incrementToken() on the outermost TokenFilter
Each TokenFilter calls incrementToken() on its input
The Tokenizer gets the next token from the Reader and stores it in Attribute objects
Each TokenFilter modifies the Attribute objects and returns
The consumer then uses the Attribute objects

Creating and Using TokenStream DEMO

Replacement Pattern
incrementToken() → call input.incrementToken() → modify Attribute objects and return
Token filters simply modify attributes that pass through

Token Filter – Replacement Pattern
public final class LowerCaseFilter extends TokenFilter {
  private CharTermAttribute termAtt;

  public LowerCaseFilter(TokenStream input) {
    super(input);
    termAtt = (CharTermAttribute) addAttribute(CharTermAttribute.class);
  }

  public final boolean incrementToken() throws IOException {
    if (input.incrementToken()) {
      final char[] buffer = termAtt.buffer();
      final int length = termAtt.length();
      for (int i = 0; i < length; i++)
        buffer[i] = Character.toLowerCase(buffer[i]);
      return true;
    } else
      return false;
  }
}

Deletion Pattern
Token filters check token attributes and may call input.incrementToken() multiple times
Keep looping until a good token is found, then return it

Token Filter – Deletion Pattern
public final class TokenLengthLessThan50CharsFilter extends TokenFilter {
  private CharTermAttribute termAtt;
  private PositionIncrementAttribute posIncrAtt;

  public TokenLengthLessThan50CharsFilter(TokenStream in) {
    super(in);
    termAtt = (CharTermAttribute) addAttribute(CharTermAttribute.class);
    posIncrAtt = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
  }

  public final boolean incrementToken() throws IOException {
    int skippedPositions = 0;
    while (input.incrementToken()) {
      final int length = termAtt.length();
      if (length > 50) {
        skippedPositions += posIncrAtt.getPositionIncrement();
        continue;
      }
      posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      return true;
    }
    return false;
  }
}

Splitting Tokens Pattern – First Call
When splitting a token, return the first half and save the second half aside for later

Splitting Tokens Pattern – Second Call
When called the second time, just return the saved token

Token Filter – Splitting Pattern
public final class SplitDashFilter extends TokenFilter {
  private CharTermAttribute termAtt;
  char[] saveToken = new char[100];  // Buffer to hold tokens from previous incrementToken() call
  int saveLen = 0;

  public SplitDashFilter(TokenStream in) {
    super(in);
    termAtt = (CharTermAttribute) addAttribute(CharTermAttribute.class);
  }

  public final boolean incrementToken() throws IOException {
    if (saveLen > 0) {  // Output previously saved token
      termAtt.setEmpty();
      termAtt.append(new String(saveToken, 0, saveLen));
      saveLen = 0;
      return true;
    }
    if (input.incrementToken()) {  // Get a new token to split
      final char[] buffer = termAtt.buffer();
      final int length = termAtt.length();
      boolean foundDash = false;
      for (int i = 0; i < length; i++) {  // Scan token looking for '-' to split it
        if (buffer[i] == '-') {
          foundDash = true;
          termAtt.setLength(i);  // Set length so termAtt = first half now
        } else if (foundDash)
          saveToken[saveLen++] = buffer[i];  // Save second half for later
      }
      return true;  // Output first half right away
    } else
      return false;
  }
}

Token Splitting DEMO

Stemmers and Lemmatizers
Stemmers available in Lucene
 – Snowball, Porter
 – They are both terrible [much too aggressive]
 – For example: mining → min
Kstem
 – Publicly available stemmer with a Lucene TokenFilter implementation
 – Better, but still too aggressive: searchmining → searchmine
Search Technologies Lemmatizer
 – Based on the GCIDE dictionary
 – Extremely accurate; only reduces words to dictionary entries
 – Also does irregular spelling reduction: mice → mouse
 – STILL A WORK IN PROGRESS: needs one more refactor

ST Query Processing Lucene

Search Technologies Query Parser
Originally written for GPO
 – Query → FAST FQL
Converted to .Net for CPA
Refactored for Lucene for Aspermont
Refactored to be more componentized and pipeline-oriented for OLRC
Still a work in progress
 – Lacks documentation, wiki, etc.

Search Technologies Query Processing
Query Parser – parses the user’s entered query
Query Processing Pipeline – a sequence of query processing components which can be mixed and matched
Lucene Query Builder
Other query builders possible
 – FAST, Google, etc.
 – No others implemented yet
Query Configuration File – holds query parsing and processing parameters

Our Own Query Processing: Why?
Gives us more control
 – Can exactly meet the user’s query syntax
Exposes operators not available through Lucene syntax
 – Example: the before proximity operator
“Behind the scenes” query tweaking
 – Field weighting
 – Token merging: rio tinto → url:riotinto
 – Exact case and exact suffix matching
 – True lemmatization (not just stemming)

ST Query Parser – Overall Structure
Query String → Parser → Processor → … → Processor → Lucene Builder → Top Docs
The parser and processors operate on generic “AQNode” structures; the Lucene builder produces Lucene Query structures

The Search Technologies Query Structure
userQuery (query string), nodeQuery (generic AQNode structures), finalQuery (Lucene Query structures)
Holds references to all query representations
 – Therefore, query processors can process any query representation
Everything is a QueryProcessor
 – Parsing, processing, and query building

Query Parser: Features
AND, OR, NOT, parentheses
 – ((star and wars) or (star and trek))
 – star and not born {broken}
+, -
 – + = query boost
 – - = not {broken}
Proximity operators
 – within/3, near/3, adj
Phrases
field: searches
 – title:(star and wars) and description:(the original)

Using Query Processors
Load the query configuration:
  QueryConfig qConfig = new QueryConfig("data/query-config.xml");
Create a query processor:
  IQueryProcessor iqp2 = new TokenizationQueryProcessor();
Initialize the query processor:
  iqp2.initialize(qConfig);
Use query processors (simply call in sequence):
  iqp1.process(query);
  iqp2.process(query);
  iqp3.process(query);

Query Processors: Other Notes
Types of processors (off the shelf):
 – Lemmatizer, tokenization, lower case
QueryParser and Query classes may need to be fully qualified:
  com.searchtechnologies.queryprocessor.Query query =
    new com.searchtechnologies.queryprocessor.Query(queryString);
The query parser only splits on whitespace:
  star-trek or star-wars → or(star-trek, star-wars)
Use TokenizationQueryProcessor to split fully:
  → or(phrase(star,trek), phrase(star,wars))

ST Query Processor – Exercise 5
Add the ST QueryProcessor to your Lucene query
Add the dependency to your pom.xml:
 – com.searchtechnologies:st-queryparser:0.3
Add processors:
 – com.searchtechnologies.queryprocessor.QueryParser
 – TokenizationQueryProcessor()
 – LowercaseQueryProcessor()
 – LuceneQueryBuilder()
Initialize config, construct processors, initialize processors, execute processors

Creating Your Own Query Processor
AQNode – “Aspire Query Node”
 – Operands – list of operands (references to other AQNodes)
 – Operator – enumerated list (AND, OR, NEAR, …)
 – Proximity window (int)
 – From value, to value (objects)
   Use the from value for token strings
   Use from + to values for date ranges, int ranges, etc.
 – startChar, endChar (in the original user’s query string)
 – Enclosing field name
 – Other stuff for future expansion
   Attached data objects
   Custom query builder

Query Processor Outline
public class MyQueryProcessor implements IQueryProcessor {
  public void initialize(QueryConfig config) throws QueryProcessorException {
    // Read any parameters you need from the config
    // config is an AXML (a wrapper around a W3C DOM object)
  }

  public void process(Query query) throws QueryProcessorException {
    // Process the query
    // query.getNodeQuery() → the AQNode version of the query
    // query.getUserQuery() → the original query string
    // query.getFinalQuery() → the final (typically Lucene) query structure
  }
}

Query Processor Example
public class LowercaseQueryProcessor implements IQueryProcessor {
  public void initialize(QueryConfig config) throws QueryProcessorException {
  }

  public void process(Query query) throws QueryProcessorException {
    convertToLowerCaseAQNodes(query.getNodeQuery());
  }

  void convertToLowerCaseAQNodes(AQNode aqn) {
    if (aqn.getOperator() == AQNode.OperatorEnum.TERM) {
      String termText = (String) aqn.getFromValue();
      aqn.setFromValue(termText.toLowerCase());
      return;
    }
    if (aqn.getOperands() == null) return;
    for (AQNode childAqn : aqn.getOperands()) {
      convertToLowerCaseAQNodes(childAqn);
    }
  }
}

ST Query Processor – Exercise 6
Copy the FixStarQueryProcessor
 – Looks for “sta” and changes it to “star”
Fill out the contents of the QueryProcessor
Add the QueryProcessor to your query program
Run the program and query on “sta”
 – Add to STQueryProcessorExercise5.java

Query Processing New Features
Template substitution (OLRC)
 – field:() searches are substituted for arbitrary query expressions
Lemmatization (OLRC, BNA)
Wildcard handling (OLRC)
Refactor Aspermont query processors
 – Semantic network expansion (ontology)
 – Add boost/reduce tokens (field:HI, field:LO)
 – Proximity boost
 – Composite fields and query field boost

Custom Hit Collector
collect() method called for each matching doc
 – Should be fast
 – Throw an exception to break out of the loop
 – Relation to Scorer
DecadesCollector
 – Custom collector to take the top-scoring document from each decade
 – One main collector that wraps one TopScoreDocCollector per decade
 – See source: DecadesCollector.java
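DecadesCollector itself is not shown in the deck, but the Collector contract it builds on can be sketched. This assumes the Lucene 3.x abstract Collector class (setScorer, collect, setNextReader, acceptsDocsOutOfOrder); the collector name and the tiny test index are illustrative, and a real collector would keep collect() as lean as possible.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CollectorSketch {
  // Minimal Collector: records the global doc ID of every matching document
  public static final class AllDocsCollector extends Collector {
    private final List<Integer> docs = new ArrayList<Integer>();
    private int docBase;

    @Override public void setScorer(Scorer scorer) {}  // scores ignored here
    @Override public void collect(int doc) { docs.add(docBase + doc); }  // keep this fast
    @Override public void setNextReader(IndexReader reader, int docBase) { this.docBase = docBase; }
    @Override public boolean acceptsDocsOutOfOrder() { return true; }

    public List<Integer> getDocs() { return docs; }
  }

  // Indexes three documents and counts them via the custom collector
  public static int countAll() throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
    for (String title : new String[] { "one", "two", "three" }) {
      Document doc = new Document();
      doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
      w.addDocument(doc);
    }
    w.close();

    IndexReader r = IndexReader.open(dir);
    IndexSearcher s = new IndexSearcher(r);
    AllDocsCollector collector = new AllDocsCollector();
    s.search(new MatchAllDocsQuery(), collector);
    s.close();
    r.close();
    return collector.getDocs().size();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(countAll());
  }
}
```

A decade-bucketing collector would follow the same shape, delegating from collect() to one wrapped per-decade collector chosen by the document's date field.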

Complete Open Source Search Engine