Gimme’ The Context: Context-driven Automatic Semantic Annotation with C-PANKOW Philipp Cimiano et al.

Introduction
Develop the Semantic Web:
- Automated annotation
PANKOW (Pattern-based ANnotation through Knowledge On the Web):
- Identify each instance
- Use Google queries to find pages about these instances
- Use the pages Google finds to annotate each instance with an appropriate concept

Context
‘Niger’
- is a country
- is a state
- is a river
- is a region
Statistical distribution of ‘is a’-patterns for Niger

C-PANKOW (context-driven)
1. Instances are extracted from a web page.
2. For each instance discovered, queries are sent to Google, and the abstracts of the first n hits are downloaded.
3. The similarity between each abstract and the web page is calculated. Only abstracts with a similarity above a threshold, t, are analyzed.
4. The concept label for each instance is updated according to the calculated similarity.
5. Each instance is annotated with the concept that has the largest number of hits as well as the most contextually relevant hits.
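Steps 3 to 5 can be sketched as a small scoring routine. This is only an illustration under stated assumptions: the function name `choose_concept`, the input shape (one `(concept, similarity)` pair per pattern match), and the choice to sum similarities per concept are hypothetical and not taken from the slides. The default threshold of 0.05 is the value the deck reports choosing.

```python
from collections import defaultdict


def choose_concept(hits, threshold=0.05, weighted=True):
    """Pick a concept label for one instance.

    hits: iterable of (concept, similarity) pairs, one per pattern match
          found in a downloaded abstract (hypothetical input format).
    """
    scores = defaultdict(float)
    for concept, similarity in hits:
        # Step 3: ignore abstracts that are not similar enough to the page.
        if similarity <= threshold:
            continue
        # Step 4: update the concept's score, either by similarity or by count.
        scores[concept] += similarity if weighted else 1.0
    if not scores:
        return None
    # Step 5: annotate with the best-scoring concept.
    return max(scores, key=scores.get)


hits = [("country", 0.30), ("river", 0.02), ("river", 0.04),
        ("country", 0.12), ("state", 0.08)]
print(choose_concept(hits))
```

With `weighted=False` the same routine falls back to plain hit counting over the abstracts that pass the threshold.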

Instance Recognition
Instances are detected via the following regular expression:
INSTANCE := (\w+{DT})? ([a-z]+{JJ})? PRE (MID POST)?
PRE := POST := (([A-Z][a-z]*){NNS|NNP|NN|NP|JJ|UH})+
MID := the{DT} | of{IN} | -{-} | ‘{POS} | (de|la|los|las|del){FW} | [a-z]+{NP|NPS|NN|NNS}
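The grammar above can be approximated with Python's `re` module. This is a rough sketch, assuming the page has already been part-of-speech tagged and serialized as space-separated `word{TAG}` tokens. That serialization, the helper name `extract_instances`, and the use of a plain apostrophe for the possessive marker are assumptions made for illustration.

```python
import re

# One capitalized token carrying one of the allowed tags.
CAP = r"[A-Z][a-z]*\{(?:NNS|NNP|NN|NP|JJ|UH)\}"
# PRE and POST share the same definition: one or more capitalized tokens.
PRE = CAP + r"(?: " + CAP + r")*"
# Connectors allowed between PRE and POST.
MID = (r"(?:the\{DT\}|of\{IN\}|-\{-\}|'\{POS\}"
       r"|(?:de|la|los|las|del)\{FW\}"
       r"|[a-z]+\{(?:NP|NPS|NN|NNS)\})")
INSTANCE = re.compile(
    r"(?<!\S)(?:\w+\{DT\} )?(?:[a-z]+\{JJ\} )?"
    + PRE + r"(?: " + MID + r" " + PRE + r")?(?!\S)"
)


def extract_instances(tagged_text):
    """Return candidate instances with their POS tags stripped."""
    return [re.sub(r"\{[^}]*\}", "", m.group(0))
            for m in INSTANCE.finditer(tagged_text)]


sample = ("the{DT} Hotel{NNP} Schwarzer{NNP} Adler{NNP} is{VBZ} "
          "in{IN} St{NNP} Anton{NNP}")
print(extract_instances(sample))
```

The lookarounds `(?<!\S)` and `(?!\S)` are an addition that keeps matches aligned to token boundaries. They are not part of the grammar on the slide.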

Downloading Google Abstracts
For each instance, i, seven queries are made to Google:
1. such as i
2. especially i
3. including i
4. i and other
5. i or other
6. the i
7. i is
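Generating the seven query strings for an instance is mechanical. A minimal sketch follows, with the seven patterns taken from the list above. Wrapping each query in double quotes to request an exact phrase match is an assumption, as is the helper name `build_queries`.

```python
# The seven query patterns from the slide, with {i} standing for the instance.
QUERY_TEMPLATES = [
    "such as {i}",
    "especially {i}",
    "including {i}",
    "{i} and other",
    "{i} or other",
    "the {i}",
    "{i} is",
]


def build_queries(instance):
    """Return one quoted phrase query per pattern (quoting is an assumption)."""
    return ['"' + template.format(i=instance) + '"'
            for template in QUERY_TEMPLATES]


for query in build_queries("Niger"):
    print(query)
```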

Similarity Assessment
Remove stopwords from documents.
Adopt a bag-of-words model to create vectors of word counts.
Similarity is the cosine of the angle between two vectors.
Only consider abstracts with a similarity over the threshold, t.
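The similarity computation described above can be sketched in a few lines of Python. The tiny stopword list, the regular-expression tokenizer, and the function names are illustrative assumptions. The slides do not say which stopword list or tokenizer was used.

```python
import math
import re
from collections import Counter

# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "or", "the", "to"}


def bag_of_words(text):
    """Lowercase, tokenize on letters, drop stopwords, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


page = bag_of_words("The Niger is a river in West Africa.")
abstract = bag_of_words("Rivers such as the Niger cross West Africa.")
print(cosine(page, abstract) > 0.05)  # compare against the threshold t
```

Because word counts are never negative, the result lies between 0 and 1, so a small threshold such as 0.05 only filters out abstracts that share almost no vocabulary with the page.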

Updating Concept Labels Search abstracts for concepts via the following patterns:

C-PANKOW

Run Time and Query Size Complexity
Runtime: O(|I| * |P| * n)
- |I| is the total number of instances
- |P| is the number of patterns
- n is the maximum number of pages downloaded
- |P| and n are constants
The Google API allows the retrieval of 10 documents per query.
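To make the bound concrete, the number of API requests can be estimated as follows. This is a back-of-the-envelope sketch that assumes each pattern query is paged, 10 results at a time, until n abstracts have been retrieved. The function name `api_requests` and that paging behaviour are assumptions rather than something the slides state.

```python
import math


def api_requests(num_instances, num_patterns=7, n=100, docs_per_request=10):
    """Estimate API requests: instances x patterns x result pages per pattern."""
    pages_per_pattern = math.ceil(n / docs_per_request)
    return num_instances * num_patterns * pages_per_pattern


print(api_requests(1))    # requests for a single instance with n = 100
print(api_requests(50))   # requests for a page containing 50 instances
```

Since |P| and n are fixed, the per-instance cost is a constant and the total grows linearly with the number of instances.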

Evaluation
Used a pruned version of the tourism ontology in GETESS. It consists of 682 concepts.
Two human annotators annotated 30 texts from destination descriptions.

Instance Detection
Precision: P = 43.75%
Recall: R = 57.20%
F-measure: F = 48.39%

Instance Classification
Accuracy: Acc’
Accuracy: Acc”
Learning Accuracy: LA
Precision: P
Recall: R
F-measure: F

Threshold
Choose threshold of 0.05
Results of varying the threshold (no weighting, n=100)

Similarity
Use similarity weighting
Impact of using the similarity measure (n=100)

Number of Pages
Choose 100 pages
Results of varying the number of pages (t=0.05)

A posteriori evaluation
307 news stories from total annotations produced
One annotator analyzed the annotations a posteriori, rating each annotation from 0 (incorrect) to 3 (totally correct).

A posteriori evaluation
Average score: 1.81
- P3 score: 54.88%
- P2 score: 57.95%
- P1 score: 68.66%
Average score: 2.1
- P3 score: 58.14%
- P2 score: 71.1%
- P1 score: 76.8%
Lonely Planet Dataset

WordNet as Ontology
Used the general-purpose ontology WordNet.
Implemented a simple word sense disambiguation algorithm.
Results using 307 news stories:
- P3 score: 27.91%
- P2 score: 33.47%
- P1 score: 43.43%

Related Work

Conclusions
By linguistically analyzing and normalizing pages, the recall of the pattern-matching process is improved.
The number of queries to the Google API is reduced (it is now constant for each instance being annotated).
More accurate annotations are made because of contextualization.

Future Work
Learn to annotate conceptual relations between discovered instances.
Learn new patterns that indicate a given relation by means of a rule induction process.