Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier Article Summary by Mark Vickers.

Presentation Layout
- Introduction to Research
- Methods Used
- Overview of GermaNet
- Overview of SPPC
- Details of Their Approach
- Results
- Conclusion

Goal
Automatic acquisition of domain-relevant terms and their relations
How?
- Single-word terms: TFIDF classification
- Domain-relevant relations: lexico-syntactic patterns, learned from
  - existing ontologies
  - collocation methods

Input
- No seed words
- No syntactic patterns
- Just a collection of classified documents

Methods Used
Builds on other systems:
- GermaNet, for accessing semantic relations (the authors built an ontology inference machine to search GermaNet)
- SPPC (Shallow Processing Production Center), for linguistic annotation

Accessing Semantic Relations: GermaNet
- Developed within the LSD Project at the Division of Computational Linguistics of the Linguistics Department at the University of Tübingen, Germany
- A lexical-semantic net: German nouns, verbs, and adjectives are grouped by an underlying lexical concept (much like a thesaurus); these groups are called synsets
- Synsets are connected by semantic relations:
  - Lexical relations include synonymy, antonymy, and “pertains to”
  - Conceptual relations include hyponymy (‘is-a’), meronymy (‘has-a’), entailment, and cause
- Based on the technology of WordNet (Princeton)
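The Python sketch below makes the synset idea concrete with a WordNet/GermaNet-style data model. The `Synset` class, the `link` helper, the `LEXICAL` table, and the German entries are all hypothetical illustrations. They are not GermaNet's actual data format or API. In this model, synonyms share one synset, conceptual relations link whole synsets, and lexical relations such as antonymy link individual words.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Synset:
    """One lexical concept: the synonymous words that express it, plus
    typed conceptual links (e.g. 'hypernym', 'meronym') to other synsets.
    Hypothetical toy model, not GermaNet's storage format."""
    words: set
    relations: dict = field(default_factory=dict)  # relation name -> list of Synsets

    def link(self, relation, other):
        """Add a directed conceptual relation from this synset to another."""
        self.relations.setdefault(relation, []).append(other)


# Lexical relations hold between individual words rather than whole synsets.
LEXICAL = {("warm", "antonym"): ["kalt"], ("kalt", "antonym"): ["warm"]}

# Toy net: "Auto" and "Wagen" are synonyms, so they share a single synset.
vehicle = Synset({"Fahrzeug"})
car = Synset({"Auto", "Wagen"})
wheel = Synset({"Rad"})
car.link("hypernym", vehicle)  # 'is-a': a car is a vehicle
car.link("meronym", wheel)     # 'has-a': a car has wheels
```

`eq=False` keeps identity-based comparison. Two distinct concepts that happen to share a word therefore stay distinct objects.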

Accessing Semantic Relations: WordNet

Accessing Semantic Relations: Inference Machine
Allows GermaNet’s relations to be searched by other applications. Provides three functions:
1. Retrieval of relations assigned to words
   Example: “Find all synonyms for the word bar” → rod, saloon, …
2. Retrieval of relations between words
   Example: “Find relations between Internet-Service-Provider and Company” → hyponym (so an ISP is a company)
3. Navigation in the GermaNet graph
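The three functions can be pictured with a small hypothetical graph. This sketch is not the authors' inference machine. `SYNSETS`, `HYPERNYM`, and all function names are invented for illustration, and the English words simply mirror the slide's examples. Only synonymy and hyponymy are modelled.

```python
# Hypothetical toy graph: synset id -> words in that synset
SYNSETS = {
    "bar.rod": {"bar", "rod"},
    "bar.saloon": {"bar", "saloon"},
    "isp": {"Internet-Service-Provider", "ISP"},
    "company": {"Company", "firm"},
    "organization": {"organization"},
}
# 'is-a' edges: child synset id -> parent synset id
HYPERNYM = {"isp": "company", "company": "organization"}


def synsets_of(word):
    """All senses (synset ids) that contain the word."""
    return [sid for sid, words in SYNSETS.items() if word in words]


def synonyms(word):
    """Function 1: relations assigned to a word (synonyms across all its senses)."""
    return sorted({w for sid in synsets_of(word) for w in SYNSETS[sid]} - {word})


def ancestors(sid):
    """Function 3 (navigation): walk up the hypernym chain from a synset."""
    chain = []
    while sid in HYPERNYM:
        sid = HYPERNYM[sid]
        chain.append(sid)
    return chain


def relation_between(word1, word2):
    """Function 2: relation between two words (only synonymy/hyponymy checked here)."""
    for s1 in synsets_of(word1):
        for s2 in synsets_of(word2):
            if s1 == s2:
                return "synonym"
            if s2 in ancestors(s1):
                return "hyponym"   # word1 is-a word2
            if s1 in ancestors(s2):
                return "hypernym"  # word2 is-a word1
    return None
```

With this toy data, `synonyms("bar")` returns `["rod", "saloon"]` and `relation_between("Internet-Service-Provider", "Company")` returns `"hyponym"`.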

Linguistic Annotation: SPPC
SPPC (Shallow Processing Production Center) is a robust German NLP system that uses cascaded, optimized, weighted finite-state devices. SPPC parts:
- Tokenizer
- Lexical processor
- Part-of-speech filtering
- Named-entity finder
- Chunk recognizer
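In a cascade, each stage consumes the annotations produced by the stage before it. The Python sketch below illustrates only that cascading idea, using plain functions and a hypothetical mini-lexicon. It does not use weighted finite-state devices and is not SPPC's API. For brevity, named-entity finding is folded into the lexicon (the `NE` tag).

```python
import re

# Hypothetical mini-lexicon standing in for a real morphological lexicon
LEXICON = {"die": "ART", "polizei": "NN", "fand": "VV", "kokain": "NN",
           "in": "APPR", "berlin": "NE"}


def tokenizer(text):
    """Stage 1: split raw text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def lexical_processor(tokens):
    """Stage 2: attach a part-of-speech tag from the lexicon, or 'UNK'."""
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


def pos_filter(tagged):
    """Stage 3: resolve unknown tags with crude heuristics."""
    resolved = []
    for tok, tag in tagged:
        if tag == "UNK":
            if not tok[0].isalnum():
                tag = "PUNCT"
            elif tok[0].isupper():
                tag = "NN"  # German nouns are capitalized
        resolved.append((tok, tag))
    return resolved


def chunk_recognizer(tagged):
    """Stage 4: group an optional article plus a noun or name into an NP chunk."""
    chunks, i = [], 0
    while i < len(tagged):
        tok, tag = tagged[i]
        if tag == "ART" and i + 1 < len(tagged) and tagged[i + 1][1] in ("NN", "NE"):
            chunks.append(("NP", [tok, tagged[i + 1][0]]))
            i += 2
        else:
            chunks.append(("NP" if tag in ("NN", "NE") else tag, [tok]))
            i += 1
    return chunks


def cascade(text, stages):
    """Feed the output of each stage into the next one."""
    result = text
    for stage in stages:
        result = stage(result)
    return result


STAGES = [tokenizer, lexical_processor, pos_filter, chunk_recognizer]
```

For example, `cascade("Die Polizei fand Kokain in Berlin.", STAGES)` produces NP chunks for "Die Polizei", "Kokain", and "Berlin".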

Their Extraction Engine
Three main components:
1. TFIDF-based single-word term classifier
2. Lexico-syntactic pattern finder, which
   a. learns patterns based on known relations
   b. learns patterns based on term collocation methods
3. Relation extractor

Their Extraction Engine Introduction Methods Used GermaNet SPPC  Approach Results Conclusion 1. Extract Single- word terms 2. Learn multi-word terms & identify syntactic patterns 3. Learn patterns from known relations 4. Extract related terms using found lexico- syntactic patterns Single-word term extraction (KFIDF)

Discovering Domain-Relevant Terms
Apply a TFIDF measure: KFIDF
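The slide does not give the KFIDF formula. The sketch below therefore assumes a category-based TFIDF variant, which should be checked against the paper:

kfidf(w, cat) = docs(w, cat) · log(n · |cats| / cats(w) + 1)

Here docs(w, cat) counts the documents in category cat that contain w, cats(w) counts the categories in which w occurs, and n is a smoothing factor. The default n = 1.0 and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def kfidf(corpus, n=1.0):
    """Score words per category with an ASSUMED KFIDF form:
        kfidf(w, cat) = docs(w, cat) * log(n * |cats| / cats(w) + 1)

    corpus: dict mapping category name -> list of documents (lists of tokens)
    n: smoothing factor (hypothetical default)
    Returns: dict mapping category -> {word: score}
    """
    # docs(w, cat): number of documents in cat containing w at least once
    doc_freq = {cat: Counter(w for doc in docs for w in set(doc))
                for cat, docs in corpus.items()}
    # cats(w): number of categories in which w occurs at all
    cat_freq = Counter(w for counts in doc_freq.values() for w in counts)
    num_cats = len(corpus)
    return {
        cat: {w: df * math.log(n * num_cats / cat_freq[w] + 1)
              for w, df in counts.items()}
        for cat, counts in doc_freq.items()
    }


corpus = {
    "drugs": [["kokain", "haschisch", "polizei"], ["kokain", "lsd"]],
    "sport": [["fussball", "polizei"], ["fussball", "tor"]],
}
scores = kfidf(corpus)
```

A word that appears in many documents of one category but in few categories overall scores highest. Here "kokain" scores 2·log 3, while "polizei", which occurs in both categories, scores only log 2.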

Their Extraction Engine Introduction Methods Used GermaNet SPPC  Approach Results Conclusion Collocation learner

Learning Term Collocations
Examples: man-eating shark, dead serious, depend on, blue-collar
Measures:
- Mutual information (probabilities)
  - The occurrence of one word predicts the occurrence of another
  - Unreliable for sparse (low-frequency) data
- Log-likelihood measures (contingency tables)
  - Tell how much more likely the observed pair counts are if the words are dependent than if they are independent
- T-test
  - Accept or reject the null hypothesis that the terms are independent
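All three measures can be computed from four counts. The sketch below follows standard textbook definitions: pointwise mutual information, the bigram t-score approximation, and Dunning's log-likelihood ratio over a 2x2 contingency table. It illustrates the measures and is not the authors' implementation. The function name is hypothetical, and the pair count must be positive.

```python
import math


def collocation_scores(c12, c1, c2, n):
    """Return (pmi, t_score, log_likelihood) for a word pair (w1, w2).

    c12: count of the pair w1 w2 (must be > 0)
    c1, c2: counts of w1 and w2
    n: total number of bigrams in the corpus
    """
    # Pointwise mutual information: log2(observed / expected under independence)
    pmi = math.log2(c12 * n / (c1 * c2))

    # t-score: (observed - expected) / sqrt(observed); variance approximated by c12
    t_score = (c12 - c1 * c2 / n) / math.sqrt(c12)

    # Dunning's log-likelihood ratio (G^2) over the 2x2 contingency table:
    # rows = w1 present/absent, columns = w2 present/absent
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    expected = [c1 * c2 / n, c1 * (n - c2) / n,
                (n - c1) * c2 / n, (n - c1) * (n - c2) / n]
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return pmi, t_score, g2
```

When the counts exactly match independence, as in `collocation_scores(1, 10, 10, 100)`, all three scores are zero. Stronger association pushes them upward.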

Their Extraction Engine Introduction Methods Used GermaNet SPPC  Approach Results Conclusion Relation Extractor

Learning Relations with Lexico-syntactic patterns Introduction Methods Used GermaNet SPPC  Approach Results Conclusion Example of a lexico-syntactic pattern finding relations Pattern: “or other” Sentence: Bruises, wounds, or other injuries are common. Hyponym Relations: (Bruises, Injuries), (Wounds, Injuries) Pattern: “as well as” Sentence: Cocaine as well as Hashish, and LSD… Near synonyms? -- Now we can match LSD to Drug domain

Learning Relations with Lexico-syntactic patterns Introduction Methods Used GermaNet SPPC  Approach Results Conclusion Extracted terms GermaNet (semantic relationships) Terms with semantic relations (synonymy, hyponymy, meronymy) Put semantically similar fragments Into Landau-Finkelstien and Morin’s Algorithm to cluster patterns Domain independent patterns Domain specific patterns Term relation extractor applies newly extracted lecixo-syntactic patterns With Near Synonyms – search GermaNet to find common hyponyms, then assign the newly found hyponymous relation to the term not encode in the GermaNet List of related terms with possible hyponymous relations

Results Introduction Methods Used GermaNet SPPC Approach  Results Conclusion There’s a correlation between corpus size and precision LogLike delivers best result compared to Mutual Information And T-Test Noun-Verb collocations were most prominent and had best results In Drug domain, N-V = 56% precision and N-N = 41% precision

Conclusion
- KFIDF proves promising for single-word term extraction.
- Statistical measures are suitable for free-word-order languages like German.
- Extracting term relations is useful for real-world IE.

My Evaluation
+ Uses well-known existing systems
+ Seemingly no human interaction
+ Domain adaptive (robust)
- Precision does not seem very impressive, and what about recall? I’d like to see more results.
We see from the past few papers that automatic ontology generation approaches combine multiple strategies (statistics, existing ontologies) and have a cyclic, machine-learning nature.