Information Retrieval – and projects we have done. Group Members: Aditya Tiwari (08005036) Harshit Mittal (08005032) Rohit Kumar Saraf (08005040) Vinay.


Information Retrieval – and projects we have done.
Group Members: Aditya Tiwari, Harshit Mittal, Rohit Kumar Saraf, Vinay Surana
Guided by Prof. Pushpak Bhattacharyya

Motivation
The Web, documents, and encyclopedias all hold a tremendous amount of data and information. As published, that information serves only the intent of whoever created or collected the data. The same data can have other uses as well. The need is to mine the right information from the data and use it appropriately.

Information Retrieval

Applications
◦ Web search: Google, Yahoo
◦ Querying/QA systems such as Watson (developed by IBM)
◦ Spam filtering
◦ Automatic summarization
◦ Cross-lingual retrieval
Source: en.wikipedia.org/wiki/Information_retrieval_applications

Information Retrieval
IR is the area of study concerned with searching for documents and for metadata about documents, as well as searching relational databases and the WWW. The data objects collected can be images, documents, videos, mind maps, or music.
Source: en.wikipedia.org/wiki/Information_Retrieval

Wiki Mind Mapping
Harshit Mittal (IIT-B), Aditya Tiwari (IIT-B), Akhil Bhiwal (VIT University)

Project Idea
Represent textual information in graphical form, which is easier to understand and more intuitive to read. The visual representation should also summarize the text.

Research Goal
◦ Use of phrases to represent semantic information.
◦ Hierarchical representation of the information in a given text.

Mind maps
A mind map is a diagram used to represent words, ideas, tasks, or other items linked to and arranged around a central key word or idea. An example mind map is on the next slide.

Mind map

What’s the difficult part?
◦ Information from an article cannot be placed in a mind map as it is. That would make the map incoherent and cluttered.
◦ Phrase extraction
◦ General rules of grammar don’t apply here.

Possible Solution
◦ Develop new linguistic rules for representing text in visual form.
◦ Use existing summarization tools to generate a summary and try to represent that in a mind map.

How we did it
◦ Pull the article out of the Wikipedia page section by section.
◦ Parse each section sentence by sentence using the Stanford parser.
◦ Extract “relevant” phrases using Tregex (another Stanford tool).
◦ Put these phrases into a mind map, section by section.
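
The last step, placing extracted phrases into a mind map section by section, can be pictured with a small Python sketch. This is only an illustration, not the project's code: the function names `build_mind_map` and `render`, the dictionary layout, and the placeholder phrases are all assumptions. It builds a three-level tree (article title, sections, phrases) and renders it as an indented outline.

```python
def build_mind_map(title, section_phrases):
    """Build a three-level tree: central idea -> sections -> phrases.

    `section_phrases` maps a section name to the list of phrases
    extracted from that section (hypothetical input format).
    """
    return {
        "label": title,
        "children": [
            {
                "label": section,
                "children": [{"label": p, "children": []} for p in phrases],
            }
            for section, phrases in section_phrases.items()
        ],
    }


def render(node, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    lines = ["  " * depth + node["label"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Placeholder phrases; real input would come from the extraction step.
    demo = build_mind_map(
        "Article title",
        {"Section 1": ["phrase A", "phrase B"], "Section 2": ["phrase C"]},
    )
    print("\n".join(render(demo)))
```

A nested dictionary keeps the sketch independent of any drawing library; a real renderer would walk the same tree to lay out nodes around the central idea.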

Extraction of relevant information
Identify the important subtrees in the parse tree of a sentence. This was done using a few heuristics, such as:
◦ Presence of a superlative adjective in a noun phrase

Extraction of relevant information
◦ Presence of a cardinal number in a noun phrase
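
The two noun-phrase heuristics can be sketched in plain Python over a Penn Treebank style bracketed parse, the format the Stanford parser prints. The slides do not show the actual Tregex expressions, so this is an assumption about their general shape: it behaves roughly like the Tregex patterns `NP < JJS` and `NP < CD` (an NP that directly dominates a superlative adjective or a cardinal number). The helper names and the two sample parses are made up for illustration.

```python
import re


def parse_tree(text):
    """Parse a bracketed parse string into (label, children) tuples.

    Words (leaves) are kept as plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def words(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def subtrees(node):
    """Yield every non-leaf node in pre-order."""
    if isinstance(node, str):
        return
    yield node
    for child in node[1]:
        yield from subtrees(child)


def relevant_noun_phrases(tree, trigger_tags=("JJS", "CD")):
    """Collect NPs that have a direct child tagged JJS or CD."""
    hits = []
    for label, children in subtrees(tree):
        if label == "NP" and any(
            not isinstance(c, str) and c[0] in trigger_tags for c in children
        ):
            hits.append(" ".join(words((label, children))))
    return hits


if __name__ == "__main__":
    # Hand-written parse of a made-up sentence.
    sample = "(ROOT (S (NP (DT The) (NN region)) (VP (VBZ has) (NP (CD three) (NNS rivers)))))"
    print(relevant_noun_phrases(parse_tree(sample)))
```

Only direct children are checked, so a superlative buried deeper in a nested phrase selects the inner NP rather than every NP above it.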

Extraction of relevant information
◦ Matching a verb against the bag of verbs considered relevant for a particular article section. For example, for the history section, verbs like find, discover, settle, and decline were considered “more useful”, whereas verbs like derive and deduce were considered useful for some other section.

Extraction of relevant information
Example: The name India is derived from Indus.
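
The verb heuristic can be sketched as a lookup of each verb's base form in a per-section bag. The history verbs below are the ones named on the slide. Filing derive and deduce under a section called "etymology" is an assumption, since the slides only say "some other section". The tiny `LEMMAS` table stands in for a real lemmatizer, and the POS tags on the example sentence are written by hand.

```python
# Per-section bags of "useful" verbs. The "etymology" key is hypothetical.
SECTION_VERBS = {
    "history": {"find", "discover", "settle", "decline"},
    "etymology": {"derive", "deduce"},
}

# Hand-written stand-in for a lemmatizer (assumption, not the project's method).
LEMMAS = {
    "found": "find",
    "discovered": "discover",
    "settled": "settle",
    "declined": "decline",
    "derived": "derive",
    "deduced": "deduce",
}


def matching_verbs(tagged_tokens, section):
    """Return base forms of verbs in the sentence that are in the section's bag.

    `tagged_tokens` is a list of (word, Penn Treebank POS tag) pairs.
    """
    bag = SECTION_VERBS.get(section, set())
    hits = []
    for word, tag in tagged_tokens:
        if tag.startswith("VB"):      # VB, VBD, VBG, VBN, VBP, VBZ
            lemma = LEMMAS.get(word.lower(), word.lower())
            if lemma in bag:
                hits.append(lemma)
    return hits


# The slide's example sentence, tagged by hand.
SENTENCE = [
    ("The", "DT"), ("name", "NN"), ("India", "NNP"), ("is", "VBZ"),
    ("derived", "VBN"), ("from", "IN"), ("Indus", "NNP"),
]

if __name__ == "__main__":
    print(matching_verbs(SENTENCE, "etymology"))
    print(matching_verbs(SENTENCE, "history"))
```

Under these assumptions the example sentence is kept when it appears in the section whose bag contains derive and skipped in the history section.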

Code-generated mind map

Evaluation

Evaluation
Survey based:
◦ Ask one person to generate 10 questions from a given article.
◦ Ask another person to answer those questions with the help of the mind map.
◦ Repeat the same exercise in reverse for another article.

Observations
Pros:
◦ The right information is extracted with high accuracy.
◦ The concept of phrase extraction works well.
◦ High precision values were obtained (between ).

Observations
Cons:
◦ Information presented in a mind map of low depth is cluttered.
◦ Low recall values (0.2 – 0.4).
◦ Linking node phrases with their apt descriptions remains a problem.
◦ The heuristics defining “important phrases” need to be refined.
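
The slides report precision and recall but do not define them for the survey. One plausible reading, stated here purely as an assumption, is that precision is the share of attempted answers that were correct and recall is the share of all questions answered correctly from the mind map. The function name `survey_scores` and the sample outcomes are hypothetical.

```python
def survey_scores(outcomes):
    """Compute (precision, recall) from per-question outcomes.

    Each outcome is True (answered correctly from the mind map),
    False (answered incorrectly), or None (could not be answered).
    These definitions are an assumption, not taken from the slides.
    """
    total = len(outcomes)
    attempted = [o for o in outcomes if o is not None]
    correct = sum(1 for o in attempted if o)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up outcomes for 10 questions: 3 correct, 1 wrong, 6 unanswerable.
    outcomes = [True, True, True, False] + [None] * 6
    print(survey_scores(outcomes))
```

Under this reading, a mind map that states few facts but states them correctly scores high on precision and low on recall, which matches the pattern reported above.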

Limitations
◦ The bags of words and the Tregex expressions are hand-coded instead of machine-learned.
◦ Garbage phrases are being generated.
◦ The hierarchy is limited to 3 levels.

Future work
◦ Use machine learning to determine the important keywords for a given sentence.
◦ Explore the possibility of finding patterns in subtree expressions using a machine-learned approach.
◦ Refine the generated phrases.

References Tool : Stanford Parser and Stanford Tregex Match

Vision Based Attribute Segmentation from lists in Web Pages - by Rohit Kumar Saraf