By: Namrata Lele
Mentors: Dave Vieglais, Bruce Wilson
VDC/TWG Meeting, August 09



• Problem Overview
• Implementation Details
• Deliverables
• Results

Document A, Document B → Parser to parse document format and extract terms → Measure semantic relatedness of terms from documents A and B

Implemented parsers for the following metadata documents:
• DublinCore
• DarwinCore
• EML
Deliverables:
• DublinCore Parser
• DarwinCore Parser
• EML Parser
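A minimal sketch of what such a parser might do, assuming a simplified input layout. The `<terms>`/`<term>`/`<description>` element names and the `extract_terms` function are hypothetical and are not the actual DublinCore, DarwinCore or EML formats. It extracts each term, its description and an XPath, the fields the mappers later consume.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata vocabulary (not a real schema file).
SAMPLE = """<terms>
  <term name="title"><description>A name given to the resource.</description></term>
  <term name="creator"><description>An entity responsible for making the resource.</description></term>
</terms>"""


def extract_terms(xml_text):
    """Return one record (term, description, xpath) per <term> element."""
    root = ET.fromstring(xml_text)
    records = []
    for position, term in enumerate(root.findall("term"), start=1):
        records.append({
            "term": term.get("name"),
            "description": term.findtext("description", default="").strip(),
            # XPath positions are 1-based.
            "xpath": f"/{root.tag}/term[{position}]",
        })
    return records


records = extract_terms(SAMPLE)
```

Each record can then be handed to either vocabulary term mapper as a term-description pair.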

Semantic relatedness of terms is measured using the following libraries:
• Lucene: a full-featured text search engine written entirely in Java; it is an open source project available for free download.
• GTM (General Text Matcher): measures the similarity between texts. GTM is written in Java and is open source, released under the BSD license.

• Lucene scoring uses a combination of the Vector Space Model (VSM) of Information Retrieval and the Boolean model to determine how relevant a given Document is to a User's query.
• The idea behind the VSM is that the more times a query term appears in a document, relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query.
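The VSM idea above can be illustrated with a simplified tf-idf score. This is a sketch under stated assumptions, not Lucene's actual scoring formula: the function name `tf_idf_score`, the toy documents and the plain `tf * log(N / df)` weighting are all chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_tokens, all_docs):
    """Sum tf * idf over the query terms for one document.

    tf  = occurrences of the term in this document
    idf = log(N / df), where df is the number of documents containing the term
    """
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        tf = counts[term]
        df = sum(1 for doc in all_docs if term in doc)
        if tf == 0 or df == 0:
            continue
        score += tf * math.log(len(all_docs) / df)
    return score


# Toy collection: each document is a list of tokens (illustrative data only).
docs = [
    ["full", "name", "country"],
    ["name", "of", "the", "collector"],
    ["name", "date", "name"],
]

rare = tf_idf_score(["country", "name"], docs[0], docs)
common = tf_idf_score(["name"], docs[2], docs)
```

Here "name" appears in every document, so it contributes nothing to any score, while "country" appears in only one document and lifts that document above the others.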

• Lucene Index Builder (stemming and stop-word filtering; expand description for synonyms using Wordnet index): parsed input document B (XML file) → Lucene index
• Vocabulary Term Mapper, input: parsed input document A (XML file)
  1) Read term-description from document A
  2) Stem, stop-word filter and expand description (query) for synonyms using Wordnet index
  3) Search term and description in Lucene index for document B
• Get matching terms (Lucene documents)
• Output: create XML output file
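The flow on this slide can be sketched in plain Python. Everything here is a stand-in and none of it calls Lucene or WordNet: `STOP_WORDS`, `SYNONYMS`, the toy `stem` function and the dictionary-based inverted index are hypothetical simplifications of the stop-word filter, the Wordnet index, the stemmer and the Lucene index.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "a", "an", "and"}
# Hypothetical synonym table standing in for the Wordnet index.
SYNONYMS = {"entire": ["full", "total"], "nation": ["country", "land"]}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Toy suffix stripper; a real stemmer would be used in practice."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(term_descriptions):
    """Index document B: map each stemmed token to the terms that use it."""
    index = {}
    for term, description in term_descriptions.items():
        for word in tokenize(description):
            if word not in STOP_WORDS:
                index.setdefault(stem(word), set()).add(term)
    return index


def expand_query(description):
    """Stop-word filter, stem, and add synonyms for a document A description."""
    expanded = []
    for word in tokenize(description):
        if word in STOP_WORDS:
            continue
        expanded.append(stem(word))
        expanded.extend(stem(s) for s in SYNONYMS.get(word, []))
    return expanded


def search(index, query_tokens):
    """Rank document B terms by how many distinct query tokens they match."""
    hits = Counter()
    for token in set(query_tokens):
        for term in index.get(token, ()):
            hits[term] += 1
    return hits.most_common()


doc_b = {
    "country": "The full name of the country",
    "collector": "The person who collected the specimen",
}
index = build_index(doc_b)
matches = search(index, expand_query("Entire name of the nation"))
```

The document A description shares only "name" with the document B description directly; the synonym expansion adds "full" and "country", which is what makes the match strong.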

• Lucene Index Builder
• Lucene Vocabulary Term Mapper
• Wordnet Index Builder

• Original description: the full, unabbreviated name of the country
• Stop-word filtered, stemmed and synonym-expanded description: the full entire fully good total undivided wax wide unabbrevi name advert appoint call cite constitute describe diagnose discover distinguish epithet figure gens identify key list make mention nominate refer of countri

• Vocabulary Term Mapper, inputs: parsed input document A (XML file) and parsed input document B (XML file)
• Read term-description from document A (stem and stop-word filter)
• Read term-description from document B (stem and stop-word filter)
• Get similarity score using GTM
• Maintain top 5 scores for every term-description in document A
• Output Score, Term, Desc, XPath to XML file
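A minimal sketch of the "maintain top 5 scores" step. The GTM library itself is not called: `f_measure` is a plain unigram F-measure used as a stand-in similarity, and the function names and sample descriptions are hypothetical.

```python
import heapq
from collections import Counter


def f_measure(a_tokens, b_tokens):
    """Unigram F-measure between two token lists (stand-in for GTM)."""
    if not a_tokens or not b_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(a_tokens) & Counter(b_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(b_tokens)
    recall = overlap / len(a_tokens)
    return 2 * precision * recall / (precision + recall)


def top_matches(a_records, b_records, k=5):
    """For every document A term, keep the k best-scoring document B terms."""
    results = {}
    for a_term, a_desc in a_records.items():
        scored = [
            (f_measure(a_desc.split(), b_desc.split()), b_term)
            for b_term, b_desc in b_records.items()
        ]
        results[a_term] = heapq.nlargest(k, scored)
    return results


# Illustrative, already stemmed and stop-word-filtered descriptions.
doc_a = {"title": "name given to resource"}
doc_b = {
    "title": "name of resource",
    "creator": "person who made it",
    "date": "point in time",
    "format": "file type",
    "rights": "legal statement",
    "coverage": "spatial extent",
}
best = top_matches(doc_a, doc_b)
```

`heapq.nlargest` keeps only the five highest-scoring candidates per document A term, so the sixth document B term is dropped from the output.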

• Modified GTM Library
• GTM Vocabulary Term Mapper

The results obtained from the various mappings are as follows:
• DublinCore – DarwinCore
• DarwinCore – DublinCore
• EML – DarwinCore
• DarwinCore – EML
• EML – DublinCore
• DublinCore – EML

The following terms obtained from the Lucene Vocabulary Term Mapper match the existing mapping between DublinCore and EML:
• Title – Title
• Creator – Creator
• Publisher – Publisher
• Format – Physical
• Coverage – Coverage
• Rights – Intellectual Rights

The results obtained from the various mappings are as follows:
• DublinCore – DarwinCore
• DarwinCore – DublinCore


• Fix a bug in the EML parser.
• Provide two versions of the EML parser: one that has descriptions for all terms in the hierarchy and one that has a description for only the current term.

• Got a chance to learn new libraries (Lucene and GTM)
• Learnt new concepts about semantic similarity
• Honed my XML skills
• Enjoyed working with this team

• General Text Matcher
• ml
• ache/lucene/wordnet/package-summary.html
