Published by Roy Johns. Modified over 9 years ago.
1
Iccha Sethi Serdar Aslan Team 1 Virginia Tech Information Storage and Retrieval CS 5604 Instructor: Dr. Edward Fox 10/11/2010
2
Outline History What’s Lucene What’s Solr Getting Started with Solr (Indexing, updating, deleting) Querying Data Other features of Solr IR Concepts and Solr Light demo of Solr Questions
3
History A search for a replacement search platform found that commercial options carried high license fees and open-source options offered no full solutions. CNET grants code to Apache, Solr enters Incubator (17 Jan 2006). Solr is a Lucene sub-project.
4
What is Lucene? Solr uses the Lucene search library and extends it. Lucene is an open source, high-performance text search engine library. Lucene is not a server and not a web crawler either. It uses scoring algorithms based on Information Retrieval principles. It offers a rich set of text analyzers and a query syntax with a parser.
5
Lucene’s index (conceptual) An index contains documents, a document contains fields, and each field has a name and a value. Figure 1: Lucene index (Kataria S., Khabsa M., Document Indexing and Scoring algorithm, 2010)
6
What is Solr Solr is an open source enterprise search platform. Used by iTunes, CNET, Zappos, Netflix as well as intranet sites. Written in Java. XML/HTTP interface. Schema to define types and fields. Web administration interface. Figure 2: Common Solr Usage (diagram showing Solr with DB and Web data)
7
Major Features of Solr Powerful full-text search Hit highlighting Faceted search Dynamic clustering Database integration
8
Architecture of Solr Components shown in the diagram: HTTP Request Servlet, Update Servlet, Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, Update Handler, XML Update Interface, XML Response Writer, Solr Core, Config, Schema, Analysis, Caching, Concurrency, Replication, Lucene. Figure 3: Architecture of Solr (Seeley Y., Apache Solr, 2006)
9
Solr Documents Solr accepts well-formed XML documents. Example field values (the XML markup was lost in this transcript): www.cnn.com; CNN Breaking News – Obama wins; Barack Obama is the 44th president of the USA; 2008-11-06T23:59:59.999Z
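A minimal sketch of how such a document could be expressed in Solr's XML update format (`<add><doc><field name="...">`) follows. The field names `url`, `title`, `text`, and `date` are assumptions made for illustration; the slide's original field names did not survive, and the real names must match whatever the Solr schema defines.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build a Solr XML <add> message containing one document.

    `fields` maps field names to string values. The names used in the
    example below are hypothetical, not taken from the slides.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


message = build_add_message({
    "url": "www.cnn.com",                                    # assumed field name
    "title": "CNN Breaking News – Obama wins",               # assumed field name
    "text": "Barack Obama is the 44th president of the USA",  # assumed field name
    "date": "2008-11-06T23:59:59.999Z",                      # assumed field name
})
print(message)
```

Saving the printed message to an `.xml` file would give something that post.jar could send, provided the schema actually declares those fields.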
10
Getting Started with Solr How to run Solr on the IBM cloud system: Log in to the system using PuTTY and the generated private key. Go to team1->apache-solr->example. Start the Solr server. Load http://localhost:8983/solr/admin/ in your web browser.
11
Indexing Data Solr server is up and running. To index data: Open a new terminal Follow path team1/apache-solr/example/example-docs/ Run "java -jar post.jar" on some of the XML files in that directory
12
Indexing Data Cont’d To index all data: Run "java -jar post.jar *.xml". This indexes all sample files in the example directory.
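Conceptually, posting a file to Solr is an HTTP POST of an XML message to the server's update handler. The sketch below only prepares such a request and does not send it. The URL `http://localhost:8983/solr/update` is an assumption based on the example server's host and port; it does not appear on the slides.

```python
import urllib.request

# Assumed update endpoint of the example server; adjust if yours differs.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body, url=SOLR_UPDATE_URL):
    """Prepare (but do not send) an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_update_request("<commit/>")
print(request.get_method(), request.full_url)

# To send it to a running server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the request construction separate from the network call makes it possible to inspect what would be sent before a server is available.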
13
Solr Admin page Open http://localhost:8983/solr/admin in your web browser.
14
Updating Data The user can edit an existing XML file to change its data, then run the "java -jar post.jar" command on that file again to re-index it.
15
Deleting Data A delete operation can be done by:
Posting a delete command that specifies the value of a document’s unique key field: java -Ddata=args -Dcommit=no -jar post.jar "<delete><id>SP2514N</id></delete>"
Posting a delete command with a query that matches multiple documents: java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
The first command uses -Dcommit=no, so don’t forget to commit the change afterwards by running "java -jar post.jar".
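The two delete commands are small XML messages: one names a unique key value, the other carries a query. A sketch of building both, using the values from the slide (`SP2514N` and `name:DDR`); the helper function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a delete message that targets one document by its unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a delete message that targets every document matching `query`."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


by_id = delete_by_id("SP2514N")
by_query = delete_by_query("name:DDR")
print(by_id)
print(by_query)
```

Building the message with an XML library rather than string concatenation means special characters in a query are escaped correctly.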
16
Querying Data Searches are done with the query string in the q parameter. Example query: q=video. A number of request parameters can be passed to control what information is returned. Example: the "fl" parameter controls which stored fields are returned. Example query: q=video&fl=name,id,score (also returns the estimated relevancy score)
17
Querying Data cont’d Example query : q=video The response shows the number of documents found in the collection and the different fields of each retrieved document.
18
Querying Data cont’d Example query : q=name:video
19
Querying Data cont’d Example query : q=video&fl=name,id,score
20
Querying Data cont’d Example query : q=video&fl=*,score (returns all stored fields, as well as the estimated relevancy score)
21
Querying Data cont’d Example query : q=video&sort=price desc&fl=name,id,price
22
Querying Data cont’d Example query : q=video&wt=json The wt value can also be python, php, ruby, or xml.
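The query examples above are all request parameters appended to a search URL. A small sketch of assembling them programmatically follows. The endpoint `http://localhost:8983/solr/select` is an assumption (the slides show only the parameter strings), and `build_query_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed query endpoint of the example server; not given on the slides.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(q, **params):
    """Return a search URL for query string `q` plus optional parameters
    such as fl (fields to return), sort, or wt (response format)."""
    query = {"q": q}
    query.update(params)
    # urlencode percent-encodes commas and turns spaces into '+'.
    return SOLR_SELECT_URL + "?" + urlencode(query)


plain_url = build_query_url("video")
fields_url = build_query_url("video", fl="name,id,score")
sorted_url = build_query_url("video", sort="price desc", fl="name,id,price")
json_url = build_query_url("video", wt="json")

for url in (plain_url, fields_url, sorted_url, json_url):
    print(url)
```

Letting `urlencode` handle escaping avoids hand-editing characters like the space in `price desc`.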
23
Highlighting Example query: ...&q=video card&fl=name,id&hl=true&hl.fl=name,features Highlighted fields are listed at the bottom of the page.
24
Faceted Search A dynamic clustering of search results into categories. Allows users to refine their search results. Generates counts for various properties or categories. Also called faceted browsing or faceted navigation. The benefits: superior feedback; no surprises or dead ends; no selection hierarchy is imposed.
25
Faceted Search Example : CNET website
26
Faceted Search Example query: ...&q=*:*&facet=true&facet.field=cat Here q=*:* refers to all documents, and the response lists the generated counts for each value of the cat field.
27
Faceted Search Example query: ...&q=ipod&facet=true&facet.query=price:[0 TO 100]&facet.query=price:[100 TO *] The response lists the generated counts for each price range.
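To make the idea of "generated counts" concrete, the sketch below computes field facets and range facets over a tiny in-memory list. It illustrates what the counts mean; it is not how Solr implements faceting. The documents and their values are hypothetical; only the field names `cat` and `price` come from the example queries.

```python
from collections import Counter

# Hypothetical documents, invented for this example.
docs = [
    {"id": "a", "cat": "electronics", "price": 19.95},
    {"id": "b", "cat": "electronics", "price": 399.0},
    {"id": "c", "cat": "memory", "price": 74.99},
    {"id": "d", "cat": "memory", "price": 185.0},
    {"id": "e", "cat": "camera", "price": 329.95},
]


def facet_field(documents, field):
    """Count documents per distinct value of `field` (the idea behind facet.field)."""
    return Counter(doc[field] for doc in documents)


def facet_range(documents, field, low, high=None):
    """Count documents with low <= value <= high (the idea behind a range
    facet.query). high=None stands for the open upper bound `*`."""
    return sum(
        1
        for doc in documents
        if doc[field] >= low and (high is None or doc[field] <= high)
    )


category_counts = facet_field(docs, "cat")
cheap = facet_range(docs, "price", 0, 100)      # like price:[0 TO 100]
expensive = facet_range(docs, "price", 100)     # like price:[100 TO *]
print(dict(category_counts), cheap, expensive)
```

Both bounds are treated as inclusive here, so a document priced at exactly 100 would be counted in both ranges.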
28
Search Relevancy Document analysis: "PowerShot SD 500" passes through WhitespaceTokenizer, WordDelimiterFilter (catenateWords=1), and LowercaseFilter. Query analysis: "power-shot sd500" passes through WhitespaceTokenizer, WordDelimiterFilter (catenateWords=0), and LowercaseFilter. The resulting tokens line up, giving a match. Figure 4: Search Relevancy (Seeley Y., Apache Solr, 2006)
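The analysis chain in Figure 4 can be approximated in a few lines. This is a simplified simulation written for illustration, not Lucene's actual filter implementations: the word-delimiter step here only splits on case changes, letter/digit boundaries, and non-alphanumeric characters, and the function names are hypothetical.

```python
import re
from itertools import groupby


def whitespace_tokenize(text):
    """Split on whitespace only, keeping punctuation inside tokens."""
    return text.split()


def word_delimiter(tokens, catenate_words):
    """Split each token into word and number parts. If catenate_words is
    set, also emit the concatenation of each run of adjacent word parts."""
    out = []
    for token in tokens:
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", token)
        out.extend(parts)
        if catenate_words:
            for is_word, run in groupby(parts, key=str.isalpha):
                run = list(run)
                if is_word and len(run) > 1:
                    out.append("".join(run))
    return out


def lowercase(tokens):
    return [token.lower() for token in tokens]


def analyze(text, catenate_words):
    return lowercase(word_delimiter(whitespace_tokenize(text), catenate_words))


doc_tokens = analyze("PowerShot SD 500", catenate_words=True)
query_tokens = analyze("power-shot sd500", catenate_words=False)
is_match = set(query_tokens) <= set(doc_tokens)
print(doc_tokens, query_tokens, is_match)
```

The point of the asymmetric settings is that the document side indexes extra catenated forms, so differently written queries can still find it.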
29
What we’ve Covered Basic information about Solr. Structure of Solr. How to run a Solr instance. Adding, deleting, and updating documents. Making changes to the index. Making a query and running it. Using the Solr admin interface.
30
Other features of Solr Distributed search Numeric field statistics Search result clustering Function queries Boosting More Like This
31
Relation with IR Concepts Tokenization. Scoring: tf-idf (Lucene class Similarity). Lucene Practical Scoring: boosting of documents and queries. Wildcard queries (te?t, test*, te*t). Clustering (result clustering via Carrot2). Lucene’s conjunctive search algorithm uses skip pointers.
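As a rough illustration of tf-idf scoring with boosting, the sketch below uses the term-frequency and inverse-document-frequency formulas that, to the best of our knowledge, Lucene's classic default Similarity documents: tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)). It deliberately omits the coordination factor, query normalization, and length norms, so it is a simplified sketch rather than Lucene's full practical scoring function. The tiny corpus is hypothetical.

```python
import math


def tf(freq):
    """Term frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(query_terms, doc, corpus, boosts=None):
    """Sum tf * idf^2 * boost over the query terms (simplified)."""
    boosts = boosts or {}
    total = 0.0
    for term in query_terms:
        freq = doc.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for other in corpus if term in other)
        total += tf(freq) * idf(doc_freq, len(corpus)) ** 2 * boosts.get(term, 1.0)
    return total


# Hypothetical corpus of already-tokenized documents.
corpus = [
    ["video", "card"],
    ["video", "video", "game"],
    ["memory", "card"],
]
scores = [score(["video"], doc, corpus) for doc in corpus]
boosted = score(["video"], corpus[0], corpus, boosts={"video": 2.0})
print(scores, boosted)
```

A document that repeats the query term scores higher than one that mentions it once, and a query-time boost scales a term's contribution linearly.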
32
Relation with IR Concepts Figure 5: Chapter 7, Information Storage and Retrieval (Christopher D. Manning, Prabhakar Raghavan and Hinrich Schutze) Figure 6: Chapter 1, Lucene In Action (Otis Gospodnetic and Erik Hatcher)
33
Video file:///C:/Users/Sethi/Documents/Camtasia%20Studio/Apache-solr-team1/Apache-solr-team1.html
34
Questions Any questions??? Are you ready for exercises???