Lucene : Text Search IG5 – TILE Esther Pacitti. Basic Architecture.

Presentation transcript:

Lucene : Text Search IG5 – TILE Esther Pacitti

Basic Architecture

Indexing
Documents are the textual objects you want to index.
A Document is a set of fields.
Each field has a key and a value (e.g. title = "Lucene…").
The IndexWriter is the Java object that processes documents and adds them to the Lucene index.

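Indexing
Document doc = new Document();
// TextField is analyzed and indexed; Field.Store.YES keeps the value so it can be shown with search results
Field f = new TextField("title", "the document title", Field.Store.YES);
doc.add(f);
iwriter.addDocument(doc);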

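Indexing (full code)
// Classes come from org.apache.lucene.store, analysis, analysis.standard, index, document and util (Lucene 4.3)
Directory directory = new RAMDirectory(); // the Lucene index, held in memory
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43); // how strings are tokenized
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_43, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
Field f = new TextField("title", "the document title", Field.Store.YES); // analyzed, indexed and stored
doc.add(f);
iwriter.addDocument(doc);
iwriter.close(); // commits the document so that a reader opened afterwards can see it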
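To make the indexing step less abstract, here is a minimal plain-Java sketch of what an index does conceptually: it stores each document as a set of key/value fields and keeps, for every analyzed term, the list of documents containing it. This is not how Lucene is implemented. The class TinyIndex, its methods, and the simple lowercase-and-split "analyzer" are all hypothetical names and simplifications introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical teaching class, not part of Lucene.
class TinyIndex {
    // Stored documents: each one is a map of field name -> field value.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Crude stand-in for an analyzer: lowercase, then split on non-letter/digit runs.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Adds a document and returns its id (its position in the store).
    int addDocument(Map<String, String> fields) {
        int id = docs.size();
        docs.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String term : analyze(field.getValue())) {
                String key = field.getKey() + ":" + term;
                postings.computeIfAbsent(key, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Ids of the documents whose given field contains the term, in ascending order.
    List<Integer> lookup(String field, String term) {
        Set<Integer> ids = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        List<Integer> result = new ArrayList<>();
        if (ids != null) {
            result.addAll(ids);
        }
        return result;
    }

    // Returns the stored value of a field, like reading a stored field back.
    String get(int id, String field) {
        return docs.get(id).get(field);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.addDocument(Map.of("title", "Lucene in Action"));
        index.addDocument(Map.of("title", "Text search basics"));
        index.addDocument(Map.of("title", "Searching with Lucene"));
        System.out.println(index.lookup("title", "lucene")); // prints [0, 2]
    }
}
```

The sketch separates the stored values (used to display a result) from the term-to-document map (used to find results). That is the same distinction a stored versus indexed field makes.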

Searching
Users submit keyword queries.
The IndexSearcher finds the most relevant items (the hits) for a Query among the documents indexed in the Lucene index Directory.
Each hit is associated with a score representing its relevance to the Query.
Hits are sorted in decreasing order of score.

Searching
Query q = parser.parse("my query");
ScoreDoc[] hits = isearcher.search(q, null, 10).scoreDocs; // 10 is the maximum number of documents returned
for (int i = 0; i < hits.length; i++) {
    Document hitDoc = isearcher.doc(hits[i].doc);
    System.out.println(hitDoc.get("title") + " with a score of " + hits[i].score);
}

Searching (full code)
IndexSearcher isearcher = new IndexSearcher(DirectoryReader.open(directory));
QueryParser parser = new QueryParser(Version.LUCENE_43, "title", analyzer); // names the document field the query is searched against by default
Query q = parser.parse("my query"); // may throw ParseException
ScoreDoc[] hits = isearcher.search(q, null, 10).scoreDocs; // at most 10 hits, best first
for (int i = 0; i < hits.length; i++) {
    Document hitDoc = isearcher.doc(hits[i].doc);
    System.out.println(hitDoc.get("title") + " with a score of " + hits[i].score);
}
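The idea of scored hits returned best first, capped at a maximum count, can be sketched in plain Java as follows. This is an illustrative assumption, not Lucene's scoring: the score here is simply the number of times the query terms occur in a document, and TinySearcher, Hit, score and search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical teaching class, not part of Lucene.
class TinySearcher {
    // One hit: a document id plus its score for the query.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Assumed toy score: total number of occurrences of the query terms in the text.
    static float score(String text, String query) {
        String[] words = text.toLowerCase(Locale.ROOT).split("\\W+");
        float total = 0;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            for (String word : words) {
                if (word.equals(term)) {
                    total++;
                }
            }
        }
        return total;
    }

    // Returns at most n hits with a positive score, in decreasing order of score.
    static List<Hit> search(List<String> docs, String query, int n) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            float s = score(docs.get(i), query);
            if (s > 0) {
                hits.add(new Hit(i, s));
            }
        }
        // Highest score first; the sort is stable, so ties keep document order.
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "lucene text search",
                "search search engine",
                "cooking recipes");
        for (Hit hit : search(docs, "search engine", 10)) {
            System.out.println(hit.doc + " with a score of " + hit.score);
        }
        // prints:
        // 1 with a score of 3.0
        // 0 with a score of 1.0
    }
}
```

Document 2 does not appear in the output because it matches no query term, and lowering n to 1 would keep only the best hit.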