Using the Lucene Search Engine

Presentation transcript:

1 Using the Lucene Search Engine

2 Team
• Phil Corcoran, Project Leader: 10 years in software; Telecoms, Finance, Manufacturing; Requirements, Design, Test
• Derek O’Keeffe, Senior Developer: 10 years in software; Java, ERP; Joomla, Knowledge Tree; Design, Code

3 Concepts

4 Lucene
• Full-text search
• Cross-platform
• Lucene Document
• Inverted index
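The inverted index idea on this slide can be illustrated with a small sketch. This is not Lucene code and not how Lucene stores its index; it is a toy in plain Java, and the class name `TinyInvertedIndex` and its methods are made up for illustration. It only shows the principle: each term maps to the ids of the documents that contain it, so a query is a lookup rather than a scan of every document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index (hypothetical): maps each term to the sorted ids of the documents containing it. */
final class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Stores the text, records every term it contains, and returns the new document id. */
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        // Very crude analysis: lower-case the text and split on non-word characters.
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Looks up a single term; no document text is scanned at query time. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene is a full text search library");
        index.add("Luke is a toolbox for a Lucene index");
        System.out.println(index.search("lucene"));  // ids of both documents
        System.out.println(index.search("toolbox")); // id of the second document only
    }
}
```

The work of splitting and normalising text happens once, when a document is added, which is the same trade-off the later slides describe as doing the work while indexing.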

5 Lucene:
• Inserts index records
• Searches the index
• Gets Lucene documents
You:
• Manage the indexing: select data files, parse files
• Manage querying: accept the user’s query, display results to the user, retrieve original documents

6 iViewXT

7 Search Improvements

8 Test Document Collections 1. UAT 2.

9 Super Mario

10 Implementation Derek

11 Performance

12 Lucene Implementation

13 Lucene Implementation: Indexing

14 Lucene Implementation: Indexing

15 Lucene Implementation: Indexing

16 Lucene Indexing

17 Lucene Indexing Step 1 of 5

18 Lucene Indexing Step 2 of 5

19 Lucene Indexing Step 3 of 5

20 Lucene Indexing Step 4 of 5

21 Lucene Indexing Step 5 of 5

22 Lucene Indexing

23 Text Extraction
• Lucene is not a complete application.
• PDF files: text extraction
• Microsoft files: text extraction

24 Lucene Implementation

25 Lucene Implementation

26 Searching:

27 Searching: Step 1 of 6

28 Searching: Step 2 of 6

29 Searching: Step 3 of 6

30 Searching: Step 4&5 of 6

31 Searching: Step 6 of 6

32 Searching:

33 Luke - Lucene Index Toolbox
• Client application that links directly into your index.
• Java Web Start app.
• Handy for testing searches and performance.

34 Some problems encountered
• Max clause count exception: take care when automatically adding wildcards!
• Performance: do the work while indexing, not while searching. Pagination: get one page at a time from the Hits.
• Our security model: stored the collection of allowed containers in UserSession.
• Visibility of the indexing job: added logging, e.g. “Indexing document 426 of 204,532”.
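The pagination advice can be sketched in plain Java. This is an illustration under assumptions, not the team's code and not a Lucene API: `ResultPager`, `page`, and the `loadHit` callback are hypothetical names, and the callback stands in for whatever fetches one result by its rank from the search results. The point is that only the hits on the requested page are loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical helper: loads only the hits that belong to one page of results. */
final class ResultPager {

    /**
     * Returns the results for one page.
     *
     * @param totalHits  number of hits the search reported (assumed non-negative)
     * @param pageNumber zero-based page number
     * @param pageSize   number of results per page
     * @param loadHit    stand-in for fetching the hit at a given rank
     */
    static <T> List<T> page(int totalHits, int pageNumber, int pageSize, IntFunction<T> loadHit) {
        if (totalHits < 0 || pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits and pageNumber must be >= 0, pageSize > 0");
        }
        long firstRank = (long) pageNumber * pageSize; // long avoids int overflow
        int start = (int) Math.min(firstRank, totalHits);
        int end = (int) Math.min(firstRank + pageSize, totalHits);
        List<T> results = new ArrayList<>();
        for (int rank = start; rank < end; rank++) {
            results.add(loadHit.apply(rank)); // only this page's hits are ever loaded
        }
        return results;
    }

    public static void main(String[] args) {
        // Third page (page number 2) of ten results each: ranks 20 to 29.
        List<String> thirdPage = page(204_532, 2, 10, rank -> "document " + rank);
        System.out.println(thirdPage);
    }
}
```

A page past the end of the results comes back empty rather than throwing, which keeps "next page" links simple to handle.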

35 Resources (general)
• An open source document management system in PHP with a Java Lucene search engine.
• Handy Ajax autocomplete component.

36 Resources (text extraction)
• Text extractor for PDF files.
• JXL: text extractor for Excel files.
• Text extractor for Word documents.
• API to access Microsoft format files (xls/doc/ppt). I would recommend this one over JXL or text-mining above.

37 Summary
• Lucene querying is fast (take care what you do with the results).
• Indexing is slow (make the indexing job visible).
• Use Luke.
• Add lots to the index (do the work while indexing).
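One way to make a slow indexing job visible is a periodic progress message in the style of the one quoted earlier in the talk. The sketch below is an assumption about how that might look, not the team's implementation: `IndexingProgress`, `message`, and the reporting interval are hypothetical, and the actual indexing call is left as a comment.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical progress reporter for a long-running indexing job. */
final class IndexingProgress {

    /** Builds a message such as "Indexing document 426 of 204,532". */
    static String message(int current, int total) {
        NumberFormat numbers = NumberFormat.getIntegerInstance(Locale.US); // comma-grouped digits
        return "Indexing document " + numbers.format(current) + " of " + numbers.format(total);
    }

    public static void main(String[] args) {
        int total = 204_532;
        int interval = 50_000; // assumed reporting interval; tune to taste
        for (int current = 1; current <= total; current++) {
            // The real job would index document number `current` here.
            if (current % interval == 0 || current == total) {
                System.out.println(message(current, total));
            }
        }
    }
}
```

Reporting every N documents rather than every document keeps the log readable on a collection of this size.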

38 END