Lucene/SOLR 2: Lucene search API

Starter: Searcher, Term, Sort, Filter
Main course: Query, Similarity, QueryParser
Dessert: Hits, Highlighter, SpellChecker
TU Delft Library, Digitale Productontwikkeling, Egbert Gramsbergen

org.apache.lucene.search.Searcher
Bastardized UML class diagram. Legend: class; methods; * constructor; --- argument; --> return value; ... optional; ([]) single item or array.
Searcher methods:
doc(int i) --> Document
docFreq(Term ([])) --> int ([])
explain(Query, int doc) --> Explanation
search(Query, optional Filter, optional Sort) --> Hits
getSimilarity() --> Similarity
setSimilarity(Similarity)
+ lower-level methods (performance tuning)

org.apache.lucene.search.Searcher
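Searcher and Searchable implementations:
IndexSearcher: constructor (Directory, String path, or IndexReader)
MultiSearcher: constructor (Searchable[])
ParallelMultiSearcher: constructor (Searchable[])
RemoteSearchable
Directory implementations: FSDirectory, RAMDirectory, DbDirectory, JEDirectory
IndexReader implementations: FilterIndexReader, MultiReader, ParallelReader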

org.apache.lucene.index.Term
Term: constructor (String field, String text)
createTerm(String text) --> Term
field() --> String
text() --> String
compareTo(Term) --> int
Use: among other things, a building block of Query and Filter

org.apache.lucene.search.Sort
N.B. Lucene has no strongly typed fields; SOLR does.
Sort: the constructor and setSort take a String field (with optional boolean reverse), a String[] of fields, a SortField, or a SortField[]
getSort() --> SortField[]
SortField: the constructor takes a String field and an optional boolean reverse, plus optionally an int type, a SortComparatorSource, or a Locale
int type: AUTO, CUSTOM, DOC, SCORE, INT, LONG, FLOAT, DOUBLE, STRING
Locale: constructor (String language, String country, String variant)

org.apache.lucene.search.Filter
Use: e.g. in faceted search
Filter implementations: BooleanFilter, ChainedFilter, DuplicateFilter, PrefixFilter, QueryWrapperFilter, RangeFilter, SpanFilter, CachingWrapperFilter, TermsFilter, more…
Example: TermsFilter: constructor; addTerm(Term)

org.apache.lucene.search.Query
Query implementations:
TermQuery, BooleanQuery, PhraseQuery, MultiPhraseQuery, PrefixQuery, RangeQuery
MultiTermQuery: FuzzyQuery, WildcardQuery, RegexQuery
SpanQuery: SpanFirstQuery, SpanNearQuery, SpanNotQuery, SpanOrQuery, SpanRegexQuery, SpanTermQuery (BoostingTermQuery)
BoostingQuery, ConstantScoreQuery, ConstantScoreRangeQuery, DisjunctionMaxQuery, FilteredQuery, MatchAllDocsQuery
FuzzyLikeThisQuery, MoreLikeThisQuery
ValueSourceQuery (FieldScoreQuery), CustomScoreQuery

org.apache.lucene.search.Query
Query: setBoost(float boost); getBoost() --> float; rewrite(IndexReader) --> Query
TermQuery: constructor (Term); getTerm() --> Term
PhraseQuery: constructor; add(Term, optional int position); getTerms() --> Term[]; setSlop(int slop)

org.apache.lucene.search.BooleanQuery
BooleanQuery: constructor (optional boolean disableCoord); add(BooleanClause) or add(Query, BooleanClause.Occur); getClauses() --> BooleanClause[]; setMinimumNumberShouldMatch(int)
BooleanClause.Occur: MUST, MUST_NOT, SHOULD

// Example: an and/or-ish query
BooleanQuery bq;
float andNess = 0.5f; // 0.0: OR (default), 1.0: AND
…
BooleanClause[] clauses = bq.getClauses();
int numOpt = 0; // number of optional (SHOULD) clauses
for (int i = 0; i < clauses.length; i++) {
    if (clauses[i].getOccur() == BooleanClause.Occur.SHOULD) numOpt++;
}
bq.setMinimumNumberShouldMatch(Math.round(numOpt * andNess));
// NOTE: if there is no MUST clause, at least 1 SHOULD clause must match
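To try the and/or-ish arithmetic without Lucene on the classpath, the sketch below isolates the counting and rounding step. The `Occur` enum is a stand-in for `BooleanClause.Occur`, and `AndNess` and `minimumShouldMatch` are hypothetical names introduced only for this illustration.

```java
// Stand-in for BooleanClause.Occur so the sketch runs without Lucene (assumption).
enum Occur { MUST, MUST_NOT, SHOULD }

final class AndNess {
    /** Number of SHOULD clauses that must match for an andNess between 0 (OR) and 1 (AND). */
    static int minimumShouldMatch(Occur[] occurs, float andNess) {
        int numOpt = 0; // count only the optional clauses
        for (Occur o : occurs) {
            if (o == Occur.SHOULD) numOpt++;
        }
        return Math.round(numOpt * andNess);
    }

    public static void main(String[] args) {
        Occur[] occurs = { Occur.MUST, Occur.SHOULD, Occur.SHOULD, Occur.SHOULD, Occur.SHOULD };
        // Four optional clauses at andNess 0.5: half of them have to match.
        System.out.println(minimumShouldMatch(occurs, 0.5f));
    }
}
```

In real code the result would be passed to `bq.setMinimumNumberShouldMatch(...)`, as in the slide's example.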

org.apache.lucene.search.function.CustomScoreQuery
CustomScoreQuery: constructor (Query, optional ValueSourceQuery ([]))
customScore(int doc, float subQueryScore, float ([]) valSrcScore(s)) --> float (override this)
Default: subQueryScore * valSrcScores[0] * valSrcScores[1] * …
FieldScoreQuery: constructor (String field, FieldScoreQuery.Type)
FieldScoreQuery.Type: BYTE, SHORT, INT, FLOAT
Use cases:
* Weighting by publication type + year (library)
* Geographic proximity (search “pizza”)
Publication year: score = 1+a/(1+τ), τ=(t-tp)/t0
(Graph omitted; labels: a, t, t-tp)
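The publication-year formula can be checked numerically with a small sketch. The slide does not define its symbols, so the readings here are assumptions: t is the current year, tp the publication year, t0 a time scale in years, and a the maximum extra boost. `PubYearBoost` and its methods are hypothetical names and not part of Lucene.

```java
final class PubYearBoost {
    /** score = 1 + a / (1 + tau), with tau = (t - tp) / t0 (symbol meanings are assumptions). */
    static float boost(float a, float t, float tp, float t0) {
        float tau = (t - tp) / t0; // age of the publication in units of t0
        return 1f + a / (1f + tau);
    }

    /** Default combination from the slide, for a single value source: subQueryScore * valSrcScore. */
    static float customScore(float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    public static void main(String[] args) {
        // A brand-new publication gets the full boost 1 + a; older ones decay towards 1.
        System.out.println(boost(1f, 2010f, 2010f, 5f));
        System.out.println(boost(1f, 2010f, 2005f, 5f));
    }
}
```

With this shape the boost never drops below 1, so old publications keep their text-relevance score and recent ones are at most multiplied by 1 + a.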

org.apache.lucene.search.Similarity
This is where the real work is done: org/apache/lucene/search/Similarity.html
Query, Document --> Score according to the Vector Space Model
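As a rough illustration of the vector-space scoring factors, the sketch below reproduces the formulas documented for DefaultSimilarity (tf, idf, lengthNorm) in plain Java. It is a simplified single-term score that leaves out boosts, coord and queryNorm; `SimilaritySketch` and `termScore` are hypothetical names, not Lucene classes.

```java
final class SimilaritySketch {
    /** Term frequency factor: square root of the raw frequency. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)). */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Length normalization: shorter fields weigh more. */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Simplified score for one term: tf * idf^2 * lengthNorm (assumption: no boosts, coord or queryNorm). */
    static float termScore(float freq, int docFreq, int numDocs, int numTerms) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A term occurring 4 times in a 4-term field, present in 9 of 10 documents.
        System.out.println(termScore(4f, 9, 10, 4));
    }
}
```

A custom Similarity would override these factors and be installed with setSimilarity on the Searcher.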

org.apache.lucene.queryParser.QueryParser
String --> Query (hooray!)
Notation: ::= definition, () nesting, * repetition, [] optional, | or
Query ::= ( Clause )*
Clause ::= ["+"|"-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
"+" and "-": AND and NOT; <TERM> ":": field; <TERM>: single term or phrase; "(" Query ")": nested query
Examples:
aaa bbb ccc
+aaa bbb -ccc
"aaa bbb"
title:aaa
title:(+aaa bbb) AND author:"ddd e f"
aaa* bb*b cc?c
year:[2000 TO 2005] (inclusive)
price:{020 TO 100} (not inclusive)
aaa^3 bbb (boost)
"aaa bbb"^0.5
1\ (\ is the escape character)
aaa~ (fuzzy/min. similarity)
"aaa bbb"~10 (proximity/slop)
Query text also still passes through the Analyzer.
Ranges compare Strings, so 20 < 100 only holds after zero-padding (020). Lucene: Strings only. SOLR: strongly typed fields!
NOT allowed: "aaa* bbb"
NOT allowed: *aaa, ?aaa
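The zero-padding in the price range example matters because range bounds are compared as Strings. The sketch below shows the effect in plain Java, assuming non-negative integers; `PaddedRange`, `pad` and `inExclusiveRange` are hypothetical helper names and not part of Lucene.

```java
final class PaddedRange {
    /** Left-pads a non-negative integer with zeros so String order matches numeric order. */
    static String pad(int value, int width) {
        return String.format("%0" + width + "d", value);
    }

    /** Exclusive range test on Strings, mirroring a {lower TO upper} range. */
    static boolean inExclusiveRange(String term, String lower, String upper) {
        return term.compareTo(lower) > 0 && term.compareTo(upper) < 0;
    }

    public static void main(String[] args) {
        // Unpadded: "50" sorts after "100", so it falls outside {20 TO 100}.
        System.out.println(inExclusiveRange("50", "20", "100"));
        // Padded: "050" lies between "020" and "100".
        System.out.println(inExclusiveRange(pad(50, 3), pad(20, 3), pad(100, 3)));
    }
}
```

The same padding has to be applied at index time and at query time, otherwise the indexed terms and the range bounds will not line up.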

Not every Query can be built by QueryParser (too complicated, or to protect performance):
* “New Yor*”
* *ork
* “New York” within 10 words of “Broadway” and at most 5 words after the start of the field
Not every Query wants to be built by QueryParser.
Do some interface design, e.g.
* free text input (run through QueryParser)
* separate interface elements for:
* fields
* ranges
* facets, more like this, …

QueryParser: constructor (String defaultField, Analyzer); parse(String query) --> Query; setDefaultOperator(QueryParser.Operator); setPhraseSlop(int); setFuzzyMinSim(float); …
QueryParser.Operator: AND_OPERATOR, OR_OPERATOR
Analyzer implementations: StandardAnalyzer, RussianAnalyzer, BrazilianAnalyzer, DutchAnalyzer, …
Analyzer constructors optionally take stop words: File stopwords, String[] stopwords, or HashSet stopwords

org.apache.lucene.search.Hits
Searcher: search --> Hits
Hits: doc(int n) --> Document; score(int n) --> float; iterator() --> HitIterator; length() --> int
HitIterator: next() --> Hit; hasNext() --> boolean; length() --> int
Hit: getDocument() --> Document; getScore() --> float
Document: get(String fieldName) --> String value; getFields() --> List fields; …
Field: name, getValue, …
N.B. use HitCollector (low-level API) for large numbers of hits

org.apache.lucene.search.highlight.Highlighter
Highlighter: constructor (optional Formatter, Scorer (fragmentScorer)); setTextFragmenter(Fragmenter); getBestFragments(Analyzer, String fieldName, String text, int maxNumFragments) --> String[] bestFragments; …
QueryScorer: constructor (Query, optional IndexReader, String fieldName)
Formatter implementations:
SimpleHTMLFormatter: constructor (String preTag, String postTag)
GradientFormatter: constructor (float maxScore, String minForegroundColor, String maxForegroundColor, String minBackgroundColor, String maxBackgroundColor)
SpanGradientFormatter
Fragmenter implementations:
SimpleFragmenter: constructor (int fragmentSize)

org.apache.lucene.search.spell.SpellChecker
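Based on an N-gram index.
SpellChecker: constructor (Directory (spellIndex)); indexDictionary(Dictionary); suggestSimilar(String word, int numSug, optional IndexReader, String field, boolean morePopular) --> String[] words; setAccuracy(float minScore); …
Dictionary implementations:
PlainTextDictionary: constructor (File, InputStream, or Reader)
LuceneDictionary: constructor (IndexReader, String field)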
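To give an idea of what an N-gram index is built from, the sketch below splits a word into character n-grams and counts how many a misspelling shares with a dictionary word. It only illustrates the principle and is not how SpellChecker is implemented; `NGrams`, `ngrams` and `shared` are hypothetical names, and the fixed gram size is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class NGrams {
    /** All contiguous character n-grams of a word, in order. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Number of n-grams of a that also occur in b: a crude similarity signal. */
    static int shared(String a, String b, int n) {
        List<String> gramsOfB = ngrams(b, n);
        int count = 0;
        for (String gram : ngrams(a, n)) {
            if (gramsOfB.contains(gram)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3));
        System.out.println(shared("lucene", "lucine", 3));
    }
}
```

Words that share many n-grams with the misspelled input are candidates for suggestSimilar; the minScore passed to setAccuracy then filters out weak candidates.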
