Lucene/SOLR 2: Lucene search API


1 Lucene/SOLR 2: Lucene search API
Starter: Searcher, Term, Sort, Filter
Main course: Query, Similarity, QueryParser
Dessert: Hits, Highlighter, SpellChecker
TU Delft Library, Digitale Productontwikkeling (Digital Product Development), Egbert Gramsbergen

2 org.apache.lucene.search.Searcher
Bastardized UML class diagram. Legend: * constructor, --- argument, --> return value, ... optional, ([]) single value or array.
Searcher methods:
doc(int i) → Document
docFreq(Term ([])) → int ([])
explain(Query, int doc) → Explanation
search(Query, optional Filter, optional Sort) → Hits
getSimilarity() → Similarity
setSimilarity(Similarity)
+ lower-level methods (performance tuning)

3 org.apache.lucene.search.Searcher
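Searcher subclasses: IndexSearcher, MultiSearcher, ParallelMultiSearcher
IndexSearcher: constructor (Directory | String path | IndexReader)
MultiSearcher, ParallelMultiSearcher: constructor (Searchable[])
Searchable: the interface implemented by Searcher; RemoteSearchable: constructor (Searchable)
Directory implementations: FSDirectory, RAMDirectory, DbDirectory, JEDirectory
IndexReader subclasses: FilterIndexReader, MultiReader, ParallelReader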

4 org.apache.lucene.index.Term
Term: constructor (String field, String text)
createTerm(String text) → Term
field() → String
text() → String
compareTo(Term) → int
Use: among other things, a building block of Query and Filter
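To make the role of compareTo concrete, here is a minimal sketch of the ordering Lucene documents for terms: first by field, then by text. SimpleTerm is a hypothetical stand-in written for this illustration, not the Lucene class, so the sketch runs without Lucene on the classpath.

```java
import java.util.Arrays;

// Hypothetical stand-in for org.apache.lucene.index.Term (NOT the real class).
final class SimpleTerm implements Comparable<SimpleTerm> {
    final String field;
    final String text;

    SimpleTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Same field, different text: mirrors the createTerm idea on the slide.
    SimpleTerm createTerm(String newText) {
        return new SimpleTerm(field, newText);
    }

    // Order by field first, then by text.
    @Override
    public int compareTo(SimpleTerm other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

class SimpleTermDemo {
    public static void main(String[] args) {
        SimpleTerm[] terms = {
            new SimpleTerm("title", "lucene"),
            new SimpleTerm("author", "smith"),
            new SimpleTerm("title", "apache"),
        };
        Arrays.sort(terms); // uses compareTo: field, then text
        System.out.println(Arrays.toString(terms));
    }
}
```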

5 org.apache.lucene.search.Sort
N.B. Lucene has no strongly typed fields; SOLR does.
Sort: constructor (String field, boolean reverse | String[] fields | SortField ([]))
setSort(same argument variants as the constructor)
getSort() → SortField[]
SortField: constructor (String field, boolean reverse, and one of: int type | SortComparatorSource | Locale)
SortField types (int): AUTO, CUSTOM, DOC, SCORE, INT, LONG, FLOAT, DOUBLE, STRING
Locale: constructor (String language, String country, String variant)

6 org.apache.lucene.search.Filter
Filter subclasses: BooleanFilter, ChainedFilter, DuplicateFilter, PrefixFilter, QueryWrapperFilter, RangeFilter, SpanFilter, CachingWrapperFilter, TermsFilter, more…
Use: e.g. in faceted search
Example: TermsFilter: constructor; addTerm(Term)

7 org.apache.lucene.search.Query
Query and the subclasses shown on the slide: TermQuery, MultiTermQuery, FuzzyQuery, WildcardQuery, RegexQuery, BooleanQuery, PhraseQuery, MultiPhraseQuery, PrefixQuery, RangeQuery, ConstantScoreQuery, ConstantScoreRangeQuery, DisjunctionMaxQuery, FilteredQuery, MatchAllDocsQuery, BoostingQuery, FuzzyLikeThisQuery, MoreLikeThisQuery, ValueSourceQuery, FieldScoreQuery, CustomScoreQuery
Span queries: SpanQuery, SpanFirstQuery, SpanNearQuery, SpanNotQuery, SpanOrQuery, SpanRegexQuery, SpanTermQuery, BoostingTermQuery

8 org.apache.lucene.search.Query
Query: setBoost(float boost); getBoost() → float; rewrite(IndexReader) → Query
TermQuery: constructor (Term); getTerm() → Term
PhraseQuery: constructor; add(Term, optional int position); getTerms() → Term[]; setSlop(int slop)

9 org.apache.lucene.search.BooleanQuery
BooleanQuery: constructor (optional boolean disableCoord); add(BooleanClause) or add(Query, BooleanClause.Occur); getClauses() → BooleanClause[]; setMinimumNumberShouldMatch(int)
BooleanClause: constructor (Query, BooleanClause.Occur)
BooleanClause.Occur: MUST, MUST_NOT, SHOULD
Example: an and/or-ish query
    BooleanQuery bq;
    float andNess = 0.5f; // 0.0: OR (default), 1.0: AND
    …
    BooleanClause[] clauses = bq.getClauses();
    int numOpt = 0; // number of optional (SHOULD) clauses
    for (int i = 0; i < clauses.length; i++) {
        if (clauses[i].getOccur() == BooleanClause.Occur.SHOULD) numOpt++;
    }
    bq.setMinimumNumberShouldMatch(Math.round(numOpt * andNess));
    // NOTE: if there is no MUST clause, at least 1 SHOULD clause must match
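The arithmetic behind the and/or-ish example can be tried without an index. The sketch below is a minimal, Lucene-free version: the Occur enum is a hypothetical mirror of BooleanClause.Occur, and minimumShouldMatch is a helper name invented here; only the rounding rule comes from the slide.

```java
import java.util.Arrays;
import java.util.List;

class AndNess {
    // Hypothetical mirror of BooleanClause.Occur, so the sketch runs without Lucene.
    enum Occur { MUST, MUST_NOT, SHOULD }

    // How many SHOULD clauses must match for a given andNess in [0, 1]
    // (0.0 behaves like OR, 1.0 like AND).
    static int minimumShouldMatch(List<Occur> clauses, float andNess) {
        int numOpt = 0;
        for (Occur occur : clauses) {
            if (occur == Occur.SHOULD) numOpt++;
        }
        return Math.round(numOpt * andNess);
    }

    public static void main(String[] args) {
        List<Occur> clauses = Arrays.asList(
            Occur.MUST, Occur.SHOULD, Occur.SHOULD, Occur.SHOULD, Occur.SHOULD);
        for (float andNess : new float[] {0f, 0.5f, 1f}) {
            System.out.println(andNess + " -> " + minimumShouldMatch(clauses, andNess));
        }
    }
}
```

In real code the returned value would be passed to setMinimumNumberShouldMatch on the BooleanQuery, as in the slide's example.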

10 org.apache.lucene.search.function.CustomScoreQuery
CustomScoreQuery: constructor (Query, ValueSourceQuery ([]))
Override customScore(int doc, float subQueryScore, float ([]) valSrcScore(s)) → float
Default: subQueryScore * valSrcScores[0] * valSrcScores[1] * …
FieldScoreQuery: constructor (String field, FieldScoreQuery.Type)
FieldScoreQuery.Type: BYTE, SHORT, INT, FLOAT
Use cases:
* Weighting in publication type + year (library)
* Geographic proximity (search "pizza")
Publication year: score = 1 + a/(1+τ), τ = (t − tp)/t0
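A minimal sketch of the two formulas on this slide, in plain Java without Lucene. The slide does not define its symbols, so the reading used here is an assumption: t is the current year, tp the publication year, t0 a time scale, and a the maximum extra boost. The class and method names are invented for illustration.

```java
class PubYearBoost {
    // score = 1 + a / (1 + tau), tau = (t - tp) / t0
    // ASSUMPTION: t = current year, tp = publication year, t0 = time scale,
    // a = maximum extra boost. The slide does not spell these out.
    static float boost(float a, float t, float tp, float t0) {
        float tau = (t - tp) / t0;
        return 1f + a / (1f + tau);
    }

    // The default combination named on the slide:
    // subQueryScore * valSrcScores[0] * valSrcScores[1] * ...
    static float defaultCustomScore(float subQueryScore, float... valSrcScores) {
        float score = subQueryScore;
        for (float v : valSrcScores) {
            score *= v;
        }
        return score;
    }

    public static void main(String[] args) {
        float fresh = boost(1f, 2008f, 2008f, 5f); // published "now"
        float older = boost(1f, 2008f, 2003f, 5f); // one time scale ago
        System.out.println("fresh=" + fresh + " older=" + older);
        System.out.println("combined=" + defaultCustomScore(2f, older));
    }
}
```

With this shape a brand-new publication gets the factor 1 + a and the factor decays toward 1 as the publication ages, so old documents are never pushed below their text score.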

11 org.apache.lucene.search.Similarity
This is where the real work is done: org/apache/lucene/search/Similarity.html
Query, Document → Score, according to the Vector Space Model
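To give a feel for what "Query, Document → Score" involves, here is a simplified sketch in plain Java. The individual factors follow the formulas documented for Lucene's DefaultSimilarity of that era (tf = sqrt(freq), idf = ln(numDocs/(docFreq+1)) + 1, lengthNorm = 1/sqrt(numTerms), coord = overlap/maxOverlap); the way they are combined below deliberately omits query normalization and boosts, so treat it as an illustration rather than Lucene's actual scoring code.

```java
class VectorSpaceSketch {
    // Term frequency factor: grows with the square root of the raw count.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rare terms weigh more.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Shorter fields count for more.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Fraction of the query terms found in the document.
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // Simplified score: coord * sum over matching terms of tf * idf^2 * lengthNorm.
    // ASSUMPTION: queryNorm and all boosts are left out for brevity.
    // freqs[i]    = occurrences of query term i in the document field
    // docFreqs[i] = number of documents containing query term i
    static double score(int[] freqs, int[] docFreqs, int numDocs, int fieldLength) {
        double sum = 0.0;
        int overlap = 0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] == 0) continue; // term not in this document
            overlap++;
            double w = idf(docFreqs[i], numDocs);
            sum += tf(freqs[i]) * w * w * lengthNorm(fieldLength);
        }
        return coord(overlap, freqs.length) * sum;
    }

    public static void main(String[] args) {
        // Two query terms; only the first occurs (4 times) in a 4-term field.
        System.out.println(score(new int[] {4, 0}, new int[] {9, 9}, 10, 4));
    }
}
```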

12 org.apache.lucene.queryParser.QueryParser
String  Query (hoera!) ::=def. ()nesting *repetition []optional |or | | | | | Query ::= ( Clause )* | | Clause ::= ["+"|"-"] [<TERM> ":"] ( <TERM> | "(" Query ")" ) | | | | | AND NOT field | nested query single term or phrase Voorbeelden: aaa bbb ccc year:[2000 TO 2005] (inclusive) +aaa bbb –ccc price:{020 TO 100} (not inclusive) "aaa bbb" aaa^3 bbb (boost) title:aaa "aaa bbb"^0.5 title:(+aaa bbb) AND author:"ddd e f" 1/ (/ escape char) aaa* bb*b cc?c aaa~ (fuzzy/min.similarity) "aaa bbb"~10 (proximity/slop) gaat ook nog door Analyzer  Strings: 20<100 Lucene: alleen Strings SOLR: strongly typed fields!  NIET: "aaa* bbb"  NIET: *aaa, ?aaa

13 org.apache.lucene.queryParser.QueryParser
Not every Query can be built by QueryParser (too complex, or to protect performance):
* "New Yor*"
* *ork
* "New York" within 10 words of "Broadway" and at most 5 words from the start of the field
Not every Query wants to be built by QueryParser. Design the interface, e.g.:
* free-text input (passed through QueryParser)
* separate interface elements for:
  * fields
  * ranges
  * facets, more like this, …

14 org.apache.lucene.queryParser.QueryParser
QueryParser: constructor (String defaultField, Analyzer)
parse(String query) → Query
setDefaultOperator(QueryParser.Operator)
setPhraseSlop(int)
setFuzzyMinSim(float)
…
QueryParser.Operator: AND_OPERATOR, OR_OPERATOR
Analyzer subclasses: StandardAnalyzer, RussianAnalyzer, BrazilianAnalyzer, DutchAnalyzer
DutchAnalyzer: constructor (File stopwords | String[] stopwords | HashSet stopwords)

15 org.apache.lucene.search.Hits
Searcher: search → Hits
Hits: doc(int n) → Document; score(int n) → float; iterator() → HitIterator; length() → int
HitIterator: next() → Hit; hasNext() → boolean; length() → int
Hit: getDocument() → Document; getScore() → float
Document: get(String fieldName) → String value; getFields() → List fields; …
Field: name, getValue, …
N.B. Use HitCollector (low-level API) for large numbers of hits.

16 org.apache.lucene.search.highlight.Highlighter
Highlighter: constructor (optional Formatter, Scorer fragmentScorer)
setTextFragmenter(Fragmenter)
getBestFragments(Analyzer, String fieldName, String text, int maxNumFragments) → String[] bestFragments
…
QueryScorer (a Scorer): constructor (Query, optional IndexReader, optional String fieldName)
Formatter implementations: SimpleHTMLFormatter, GradientFormatter, SpanGradientFormatter
SimpleHTMLFormatter: constructor (String preTag, String postTag)
GradientFormatter: constructor (float maxScore, String minForegroundColor, String maxForegroundColor, String minBackgroundColor, String maxBackgroundColor)
Fragmenter implementation: SimpleFragmenter: constructor (int fragmentSize)
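The preTag/postTag idea behind SimpleHTMLFormatter can be shown with a few lines of plain Java. This is only a sketch of the formatting step: it matches one literal term with a regular expression, whereas the real Highlighter scores fragments against a Query and tokenizes with an Analyzer. The class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {
    // Wrap every whole-word, case-insensitive occurrence of term in preTag/postTag.
    static String highlight(String text, String term, String preTag, String postTag) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        // quoteReplacement keeps '$' and '\' in the tags from being treated as group references.
        String replacement = Matcher.quoteReplacement(preTag) + "$0"
                           + Matcher.quoteReplacement(postTag);
        return p.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(
            highlight("Lucene is a search library", "search", "<B>", "</B>"));
    }
}
```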

17 org.apache.lucene.search.spell.SpellChecker
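SpellChecker: constructor (Directory spellIndex, which holds an N-gram index)
indexDictionary(Dictionary)
suggestSimilar(String word, int numSug, optional: IndexReader, String field, boolean morePopular) → String[] words
setAccuracy(float minScore)
…
Dictionary implementations: PlainTextDictionary, LuceneDictionary
PlainTextDictionary: constructor (File | InputStream | Reader)
LuceneDictionary: constructor (IndexReader, String field)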
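As a rough illustration of why an N-gram index helps with spelling suggestions, the sketch below splits words into character n-grams and counts how many a misspelling shares with a candidate. It shows the general idea only and makes no claim about how SpellChecker builds or queries its index; the class and method names are invented here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NGramSketch {
    // All character n-grams of a word, in order of appearance.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Number of distinct n-grams two words have in common:
    // a crude signal that one may be a misspelling of the other.
    static int sharedGrams(String a, String b, int n) {
        Set<String> shared = new HashSet<>(ngrams(a, n));
        shared.retainAll(new HashSet<>(ngrams(b, n)));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3));
        System.out.println(sharedGrams("lucene", "lucine", 2));
    }
}
```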

