Using the Lucene Search Engine

Presentation transcript:

1 Using the Lucene Search Engine

2 Team. Phil Corcoran: Project Leader; 10 years in software; telecoms, finance, manufacturing; requirements, design, test. Derek O'Keeffe: Senior Developer; 10 years in software; Java, ERP; Joomla, Knowledge Tree; design, code.

3 Concepts

4 Lucene: full-text search; cross-platform; Lucene Document; inverted index
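
The inverted index named on this slide can be illustrated with a toy sketch in plain Java. This is not Lucene's implementation or API: the class `InvertedIndexSketch` and its methods are hypothetical, and the tokenizer (lowercase, split on non-word characters) is an assumption made only for illustration. The idea it shows is that each term maps to the list of documents containing it, so a search is a lookup rather than a scan of every document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: maps each term to the sorted set of document ids containing it. */
public class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Stores the text, indexes its terms, and returns the new document id. */
    public int add(String text) {
        int docId = documents.size();
        documents.add(text);
        // Assumed tokenizer: lowercase, split on runs of non-word characters.
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Returns the ids of documents containing the term, in ascending order. */
    public List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase());
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    /** Returns the original text stored for a document id. */
    public String get(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene is a search library");
        index.add("Full text search with an inverted index");
        index.add("Luke inspects a Lucene index");
        System.out.println(index.search("lucene")); // prints [0, 2]
        System.out.println(index.search("search")); // prints [0, 1]
    }
}
```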

5 Lucene: inserts index records; searches the index; gets Lucene documents. You: manage the indexing (select data files, parse files); manage querying (accept the user's query, display results to the user, retrieve original documents).

6 iViewXT

7 Search Improvements

8 Test Document Collections 1. UAT 2.

9 Super Mario

10 Implementation Derek

11 Performance

12 Lucene Implementation

13 Lucene Implementation: Indexing

14 Lucene Implementation: Indexing

15 Lucene Implementation: Indexing

16 Lucene Indexing

17 Lucene Indexing Step 1 of 5

18 Lucene Indexing Step 2 of 5

19 Lucene Indexing Step 3 of 5

20 Lucene Indexing Step 4 of 5

21 Lucene Indexing Step 5 of 5

22 Lucene Indexing

23 Text Extraction • Lucene is not a complete application. • Text extraction from PDF files • Text extraction from Microsoft files

24 Lucene Implementation

25 Lucene Implementation

26 Searching:

27 Searching: Step 1 of 6

28 Searching: Step 2 of 6

29 Searching: Step 3 of 6

30 Searching: Step 4&5 of 6

31 Searching: Step 6 of 6

32 Searching:

33 Luke - Lucene Index Toolbox • Client application that links directly into your index. • Java Web Start app • Handy for testing searches and performance.

34 Some problems encountered • Max clause count exception: take care when automatically adding wildcards! • Performance: do the work while indexing, not while searching. Pagination: get one page at a time from the Hits. • Our security model: stored the collection of allowed containers in UserSession. • Visibility of the indexing job: added logging, e.g. "Indexing document 426 of 204,532".
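
The pagination advice above can be sketched in plain Java. This is a minimal illustration, not the team's code and not a Lucene API: `PageSketch`, `page`, and the `fetchHit` callback are hypothetical names, with `fetchHit` standing in for whatever call loads one hit by its position in the result set. The point is that only the hits on the requested page are ever fetched, however many hits the query matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Fetches one page of results at a time instead of loading every hit. */
public class PageSketch {

    /**
     * Returns the hits on the requested page (pageNumber is zero-based).
     * fetchHit is called only for positions that fall on that page.
     */
    public static <T> List<T> page(int totalHits, int pageNumber, int pageSize,
                                   IntFunction<T> fetchHit) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        long start = (long) pageNumber * pageSize; // long avoids int overflow
        int from = (int) Math.min(start, totalHits);
        int to = (int) Math.min(start + pageSize, totalHits);
        List<T> results = new ArrayList<>();
        for (int i = from; i < to; i++) {
            results.add(fetchHit.apply(i));
        }
        return results;
    }

    public static void main(String[] args) {
        // 25 hits in total, 10 per page: the third page holds the last 5.
        List<String> lastPage = page(25, 2, 10, i -> "doc-" + i);
        System.out.println(lastPage); // prints [doc-20, doc-21, doc-22, doc-23, doc-24]
    }
}
```

A page number past the end returns an empty list rather than throwing, which keeps the calling code for a results screen simple.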

35 Resources (general): An open source document management system in PHP with a Java Lucene search engine. Handy Ajax autocomplete component.

36 Resources (text extraction): Text extractor for PDF files. JXL: text extractor for Excel files. Text extractor for Word documents. API to access Microsoft format files (xls/doc/ppt); I would recommend this one over JXL or text-mining above.

37 Summary • Lucene querying is fast (take care what you do with the results). • Indexing is slow (make the indexing job visible). • Use Luke. • Add lots to the index (do the work while indexing).
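
Making a slow indexing job visible can be as simple as a progress line in the log. The sketch below is an assumption about how such a message might be produced, not the team's code: `IndexingProgressSketch`, `progressMessage`, `shouldLog`, and the logging interval are hypothetical. It formats the counts with thousands separators, in the style of the "Indexing document 426 of 204,532" message quoted in the slides.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Builds progress messages for a long-running indexing job. */
public class IndexingProgressSketch {

    /** Formats a message such as "Indexing document 426 of 204,532". */
    public static String progressMessage(int current, int total) {
        NumberFormat fmt = NumberFormat.getIntegerInstance(Locale.US);
        return "Indexing document " + fmt.format(current) + " of " + fmt.format(total);
    }

    /** Logs the first and last documents, plus every `every`-th one in between. */
    public static boolean shouldLog(int current, int total, int every) {
        return current == 1 || current == total || current % every == 0;
    }

    public static void main(String[] args) {
        int total = 2500;
        for (int current = 1; current <= total; current++) {
            // The real indexing work for one document would happen here.
            if (shouldLog(current, total, 1000)) {
                System.out.println(progressMessage(current, total));
            }
        }
        // Logs documents 1, 1,000, 2,000 and 2,500.
    }
}
```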

38 END