Automatic Extraction of Hierarchical Relations from Text
Authors: Ting Wang, Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Ji Wang
Presented by: Khalifeh Al-Jadda



Outline
Introduction; Motivation; Contribution; Experiment and Results; Conclusion; Discussion points

Introduction
What is Information Extraction (IE)? IE is a process that takes unseen texts as input and produces fixed-format, unambiguous data as output. It involves processing text documents to identify selected information, such as particular named entities or the relations among them.

Introduction (cont.)
Most research has focused on using IE to populate ontologies with concept instances. Examples:
Handschuh, S., Staab, S., Ciravegna, F.: S-CREAM: Semi-automatic CREAtion of Metadata.
Motta, E., Vargas-Vera, M., Domingue, J., Lanzoni, M., Stutt, A., Ciravegna, F.: MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup, 2002.

Motivation
An ontology-based application cannot be adapted to work with different domains. Machine Learning (ML) techniques have been used to overcome this problem. ML techniques: Hidden Markov Models (HMM), Conditional Random Fields (CRF), Maximum Entropy Models (MEM), and Support Vector Machines (SVM), the last of which performs best.

Contribution
The paper proposes a new technique that applies SVM with new features to discover whether a relation holds between two entities and then to determine the type of that relation. This technique can be applied to any domain. The Information Extraction framework used as the basis for the proposed technique is Automatic Content Extraction (ACE).

The Automatic Content Extraction (ACE)
ACE is a relation extraction program that performs Relation Detection and Characterization (RDC) according to a predefined entity type system. ACE2004 introduced a type and subtype hierarchy for both entities and relations. Entities are categorized in a two-level hierarchy consisting of 7 types and 44 subtypes.

ACE2004

Why SVM?
Although SVM is a binary classifier, it can easily be extended to a multi-class classifier using simple techniques such as one-against-all or one-against-one. It is scalable, meaning it can work with large-scale, complex data sets. It can start with a huge number of features and then effectively ignore the unnecessary ones.
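
To make the one-against-all idea concrete, here is a minimal sketch in plain Python. It is not the implementation used in the paper: the binary learner is a simple linear SVM trained by subgradient descent on the hinge loss, and the three indicator features and the label strings are assumptions chosen only for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM via subgradient descent on the hinge loss (labels are +1/-1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # L2 regularisation: shrink the weights a little on every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # The sample violates the margin, so move the hyperplane towards it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def train_one_against_all(samples, labels):
    """Train one binary SVM per class: that class versus all the others."""
    models = {}
    for cls in sorted(set(labels)):
        binary = [1 if label == cls else -1 for label in labels]
        models[cls] = train_binary_svm(samples, binary)
    return models

def predict(models, x):
    """Pick the class whose binary SVM gives the highest score."""
    return max(models, key=lambda cls: dot(models[cls][0], x) + models[cls][1])

# Hypothetical toy data: three indicator features, one relation type each.
samples = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["PHYS", "PER-SOC", "EMP-ORG"]
models = train_one_against_all(samples, labels)
predictions = [predict(models, x) for x in samples]
```

One-against-one would instead train a classifier for every pair of classes and let them vote; one-against-all needs only as many classifiers as there are classes.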

Features for relation extraction
The researchers used the General Architecture for Text Engineering (GATE) for feature extraction. The following example sentence illustrates the different types of features: Atlanta has many cars

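Features for relation extraction (cont.)
Word Features: 14 features, including the entity mentions (Atlanta, cars), the two heads (the two words before the entity and the two words after), and the list of words between the two entities.
POS Tag Features: part-of-speech tagging, e.g. Atlanta/NNP has/VBZ many/JJ cars/NNS, where NNP is a proper noun, VBZ is a third-person singular present verb, JJ is an adjective, and NNS is a plural noun.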
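
The word and POS tag features for the example sentence can be sketched as follows. This is only an illustration, not GATE's feature extractor: the function name, the feature names, the `<PAD>` marker, and the assumption that each mention is a single token are all hypothetical, and the POS tags are copied from the slide rather than produced by a tagger.

```python
def word_and_pos_features(tokens, tags, i, j):
    """Features for the mention pair at token indices i < j (single-token mentions assumed)."""
    def at(seq, k):
        # Pad positions that fall outside the sentence.
        return seq[k] if 0 <= k < len(seq) else "<PAD>"

    return {
        "mention1": tokens[i],
        "mention2": tokens[j],
        "before_mention1": [at(tokens, i - 2), at(tokens, i - 1)],
        "after_mention2": [at(tokens, j + 1), at(tokens, j + 2)],
        "words_between": tokens[i + 1:j],
        "pos_mention1": tags[i],
        "pos_mention2": tags[j],
        "pos_between": tags[i + 1:j],
    }

tokens = ["Atlanta", "has", "many", "cars"]
tags = ["NNP", "VBZ", "JJ", "NNS"]  # taken from the slide
features = word_and_pos_features(tokens, tags, 0, 3)
```

In a real system each of these values would then be turned into sparse binary indicators before being passed to the SVM.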

Features for relation extraction (cont.)
Entity Features: ACE2004 classifies each entity into its proper type, subtype, and class. For example, Atlanta is a GPE.
Mention Features: include the mention type (Atlanta → NAM, cars → NOM) and role information (only for GPE).
Overlap Features: concern the relative position of the two entities: the number of words separating them, the number of other entity mentions in between, and whether one mention contains the other.
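
The three overlap features can be computed directly from token offsets. The sketch below assumes that each mention is given as a `(start, end)` token span with an exclusive end; this representation and the function name are assumptions made for illustration, not the paper's data format.

```python
def overlap_features(m1, m2, other_mentions=()):
    """Overlap features for two mentions given as (start, end) token spans, end exclusive."""
    (s1, e1), (s2, e2) = sorted([m1, m2])  # order the mentions by position
    contains = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    # Tokens strictly between the two mentions (zero if they touch or overlap).
    words_between = max(0, s2 - e1)
    # Other mentions lying entirely inside the gap between the two mentions.
    mentions_between = sum(1 for s, e in other_mentions if s >= e1 and e <= s2)
    return {
        "words_between": words_between,
        "mentions_between": mentions_between,
        "one_contains_other": contains,
    }

# "Atlanta has many cars": Atlanta is token 0, cars is token 3.
example = overlap_features((0, 1), (3, 4))
# Hypothetical spans: a mention nested inside another, and a third mention in the gap.
nested = overlap_features((0, 5), (2, 3))
with_third = overlap_features((0, 1), (3, 4), other_mentions=[(1, 2)])
```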

Features for relation extraction (cont.)
Chunk Features: GATE integrates two chunk parsers: a noun phrase (NP) chunker → (Atlanta, cars) and a verb phrase (VP) chunker → (has).
Dependency Features: capture the dependency relationships between the words of a sentence.
Parse Tree Features: syntactic-level features are extracted from the parse tree. The BuChart parser was used in this research.

Features for relation extraction (cont.)
Semantic Features from SQLF: BuChart provides semantic analysis that produces a simplified quasi-logical form (SQLF) for each phrasal constituent.
Semantic Features from WordNet: the synset-id list of the two entity mentions and the synset-ids of the heads (the two words before and the two words after).

Experiment Results
To assess the accuracy of classification, these measures are used: Precision, Recall, and F-measure.
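
As a reminder of how these measures are defined, here is a small sketch that scores a set of predicted relation instances against a gold standard. The tuple format `(mention1, mention2, relation_type)` and the example instances are hypothetical; the balanced F-measure (F1) is assumed, since the slide does not state which weighting is used.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and balanced F-measure over sets of relation instances."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical relation instances: (mention1, mention2, relation_type).
gold = [("m1", "m2", "PHYS"), ("m3", "m4", "EMP-ORG"),
        ("m5", "m6", "PER-SOC"), ("m7", "m8", "PHYS")]
predicted = [("m1", "m2", "PHYS"), ("m3", "m4", "PHYS"), ("m5", "m6", "PER-SOC")]

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here two of the three predictions are correct and two of the four gold relations are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.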

Data Set

Results with different kernels

Results with different features

Results at different classification levels

Conclusion
This research investigated SVM-based classification for relation extraction and explored a diverse set of NLP features. It introduces some new features, including POS tags, entity subtype, and entity mention role. The experiments show that these features make an important contribution to performance improvements.

Any Questions?

Discussion points
Is this technique suitable for automating ontology building? Are you for or against using a huge number of features (94 in this case) to represent a relation? How many people think this is an applicable and useful technique for relation extraction? Why or why not?

Thank You