Ontology based Information Extraction


Ontology based Information Extraction Jin Mao Postdoc, School of Information, University of Arizona Oct. 9th, 2015

Outline
- Definition
- Common Architectures
- Information Extraction Methods
- Ontology Construction/Enhancement
- Performance Evaluation

Information Extraction
- The process of obtaining pertinent information (facts) from documents.
- Example: "The forest area in India extended to about 75 million hectares, which in terms of geographical area is approximately 22 percent of the total land."
- What is the relationship between forest area and geographical area?

Ontology Based Information Extraction (OBIE): Terminology
- Ontology Based Information Extraction (Wimalasuriya and Dou, 2010)
- Ontology-driven Information Extraction (Yildiz and Miksch, 2007)
  - The same as Ontology Based Information Extraction
  - The distinction is whether the ontology is part of the system (Yildiz and Miksch, 2007)

Ontology Based Information Extraction (OBIE): Key Characteristics
- Process unstructured or semi-structured natural language text
- Present the output using ontologies
  - Ontology as input (Li and Bontcheva, 2007), released
- Use an IE process guided by an ontology
  - No new IE method is needed; an existing one is oriented to identify the components of an ontology (classes, properties and instances)
  - Do the extractors belong to the ontology? (linguistic rules)

Ontology Based Information Extraction (OBIE): Why
- An ontology helps to clarify a domain's semantics, e.g., concepts and their relationships
- To alleviate a wide variety of natural language ambiguities

Ontology Based Information Extraction (OBIE): Applications
- Business Intelligence (BI) in e-business
- Social media (Twitter)
- Metadata generation for digital resources
- ……

Common Architectures: Major Challenges
- Information Extraction: identify instances from the ontology in the text.
  - Classes, Instances, Mentions, Properties, Property Values
- Free text in natural language.
  - Example 1: Classical fried egg Mycoplasma-type colonies were not observed on 1% agar medium.
  - Example 2: The cells are not motile, are not lysed in 1% SDS (wt/vol), and stain Gram positively.

Common Architectures: Major Challenges
- Ontology Enhancement / Updating
  - Upgrade the ontology with new instances to better cover the knowledge in a domain
  - Not in the common architecture.

Common Architectures: General Architecture

Common Architectures: First Step
- Define the semantic elements to be extracted
- An example (Muller et al., 2004):
  - Concept (C): named entities for parts of the human body, such as heart, lung, kidney…
  - Name of Disease (N): words or phrases that are disease names.
  - Description (D): any words or phrases that describe Concepts, i.e. that relate semantically to Concepts.
  - Pair of Concept and Description (P): all possible combinations of Concepts and Descriptions. Each combination carries the full meaning of the relationship between C and D.
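
The Pair (P) element above is described as every combination of a Concept and a Description, which can be sketched as a Cartesian product. This is only an illustration: the description words and the function name `concept_description_pairs` are hypothetical and do not come from the cited system.

```python
from itertools import product

# Concepts (C) are taken from the slide; the descriptions (D) are
# made-up placeholders used only to illustrate the pairing step.
concepts = ["heart", "lung", "kidney"]
descriptions = ["enlarged", "inflamed"]


def concept_description_pairs(concepts, descriptions):
    """Return every (Concept, Description) combination, i.e. the P element."""
    return list(product(concepts, descriptions))


pairs = concept_description_pairs(concepts, descriptions)
print(len(pairs))  # 3 concepts x 2 descriptions = 6 pairs
print(pairs[0])
```

In practice only some of these candidate pairs would be meaningful, so a real system would still need a filtering or validation step after generating them.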

Information Extraction Methods: Linguistic rules
- Use regular expressions/patterns, e.g. (watched|seen) <NP>, where <NP> denotes a noun phrase identified through part-of-speech tagging
- Implemented using finite-state transducers, which consist of a series of finite-state automata
- Regular rules can be generated automatically: "[Ii]nteract(s|ed|ing)?" matches "interact," "interacts," "interacted," "interacting," "Interact," "Interacts," "Interacted," and "Interacting."
- Simple, with surprisingly good results
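
The "[Ii]nteract(s|ed|ing)?" rule can be tried directly with Python's standard `re` module. The sketch below checks the eight word forms listed on the slide and then scans a sentence; the helper name `find_mentions`, the added word boundaries, and the sample sentence are assumptions made for illustration.

```python
import re

# The pattern from the slide, unchanged.
INTERACT = re.compile(r"[Ii]nteract(s|ed|ing)?")

forms = [
    "interact", "interacts", "interacted", "interacting",
    "Interact", "Interacts", "Interacted", "Interacting",
]
# Every listed form is matched in full by the pattern.
all_forms_match = all(INTERACT.fullmatch(form) for form in forms)


def find_mentions(text):
    """Return whole-word matches of the rule in running text.

    Word boundaries (\\b) are an addition for this sketch so that longer
    words such as "interaction" are not partially matched.
    """
    return [m.group(0) for m in re.finditer(r"\b[Ii]nteract(?:s|ed|ing)?\b", text)]


sample = "Protein A interacts with B. Interacting partners exist; interaction is skipped."
print(all_forms_match)
print(find_mentions(sample))
```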

Information Extraction Methods: Linguistic rules
- Automatically mine extraction rules from text
  - A dictionary inductive learning algorithm (Vargas-Vera et al., 2001)
  - Solving the longest common subsequence problem (Romano et al., 2006)
  - Relational learning (Califf and Mooney, 1999), a bottom-up learning approach
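
To give a feel for the longest common subsequence idea, here is a generic dynamic-programming sketch over word tokens. It is the textbook algorithm, not the specific procedure of Romano et al. (2006), and the two sentences are invented examples.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back through the table to recover the tokens themselves.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


s1 = "no evidence of pneumonia was seen".split()
s2 = "no signs or evidence of fracture".split()
shared = longest_common_subsequence(s1, s2)
print(shared)
```

The shared tokens hint at a candidate pattern skeleton; turning such a skeleton into an actual extraction rule would need further steps that are not shown here.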

Information Extraction Methods: Gazetteer Lists
- Recognize individual words or phrases
- Widely used in named-entity recognition
- E.g., to recognize states of the US or countries of the world
- Conditions:
  - Specify exactly what is being identified by the gazetteer.
  - Specify where the information for the gazetteer lists was obtained from.
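
A gazetteer lookup can be sketched in a few lines. The `Gazetteer` class below is hypothetical; it stores the two pieces of information the conditions above ask for (what is identified, and where the list came from) next to the entries. The four state names are a hand-typed subset used only as an example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Gazetteer:
    label: str          # what exactly the list identifies
    source: str         # where the entries were obtained from
    entries: frozenset  # the words or phrases to recognize

    def find(self, text):
        """Return (start, end, phrase) for each whole-word entry in the text."""
        # Longer entries go first so that the longest entry wins when
        # several entries could match at the same position.
        ordered = sorted(self.entries, key=len, reverse=True)
        pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
        return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


us_states = Gazetteer(
    label="US state",
    source="hand-typed illustrative subset",
    entries=frozenset({"Arizona", "New Mexico", "Virginia", "West Virginia"}),
)

hits = us_states.find("She moved from West Virginia to Arizona.")
print([phrase for _, _, phrase in hits])
```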

Information Extraction Methods: Classification Techniques
- Linguistic features such as POS tags, capitalization information and individual words
- Parts of IE are cast as classification problems:
  - whether a word token is the start/end of an entity (Li et al., 2004)
  - identifying different components of an ontology, such as instances (Li and Bontcheva, 2007) and property values (Wu and Weld, 2007)
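
The start/end formulation can be made concrete by showing what the per-token features and labels might look like before any classifier is trained. This is a sketch under assumptions: the function names, the feature set, and the example sentence are hypothetical, and the POS tags are supplied by hand rather than by a tagger.

```python
def token_features(tokens, pos_tags, i):
    """Build a feature dictionary for the token at position i."""
    token = tokens[i]
    return {
        "word": token.lower(),
        "pos": pos_tags[i],
        "is_capitalized": token[:1].isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


def boundary_labels(tokens, entity_spans):
    """Return (is_start, is_end) per token from inclusive (start, end) spans."""
    starts = {start for start, _ in entity_spans}
    ends = {end for _, end in entity_spans}
    return [(i in starts, i in ends) for i in range(len(tokens))]


tokens = ["Acme", "Corp", "acquired", "Widget", "Ltd"]
pos_tags = ["NNP", "NNP", "VBD", "NNP", "NNP"]   # hand-assigned for the example
entity_spans = [(0, 1), (3, 4)]                  # two entities, by token index

features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
labels = boundary_labels(tokens, entity_spans)
print(features[0])
print(labels)
```

Two binary classifiers, one for the start label and one for the end label, could then be trained on such feature/label pairs.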

Information Extraction Methods: Syntax/Shallow NLP
- A semantically annotated parse tree for the text as a part of the IE process
- Linguistic extraction rules with partial parse trees (Todirascu et al., 2002).

Ontology Construction
- Consider the ontology as an input to the system
- Construct an ontology as a part of the OBIE process

Ontology Enhancement
- Update the ontology by adding new classes and properties through the IE process.
  - NOT instances and their property values
- Such systems include the implementations by Maedche et al. (2003) and Dung and Kameyama (2007).
- Fuzzy Relationship Rule:
  - Define rules according to the relationships among semantic elements.
  - Generate a suggestion list for the domain experts to extract real semantic elements.

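Performance Evaluation
- Measure the accuracy of identifying instances and property values.
- Most IE systems face a trade-off between improving precision and recall.
- Precision (P) and recall (R) are commonly combined in the F-measure, F = (β² + 1)·P·R / (β²·P + R); when β² < 1, precision is weighted more heavily.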
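
The effect of β is easiest to see numerically. The sketch below assumes the standard weighted F-measure, F = (1 + β²)·P·R / (β²·P + R); the function names and the counts are made up for illustration.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from raw counts (0.0 when undefined)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """Weighted F-measure: beta < 1 favours precision, beta > 1 favours recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical system: 8 correct extractions, 2 spurious, 12 missed.
p, r = precision_recall(true_positives=8, false_positives=2, false_negatives=12)
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 3))
```

With precision 0.8 and recall 0.4, the score moves toward the precision value as β shrinks below 1 and toward the recall value as β grows above 1.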

Performance Evaluation
- Evaluation in different scales (Maynard et al., 2004): each answer is usually categorized as correct or incorrect; however, different degrees of correctness should be allowed.
- Learning Accuracy (LA): measures the closeness of the assigned class label to the correct class label based on the hierarchy of the ontology (Cimiano et al., 2005).
- Multi-dimensional evaluation beyond Precision and Recall

Performance Evaluation
- Cost-based metrics (Maynard et al., 2004)
  - A cost would typically be associated with a miss and with a false alarm (spurious answer)
  - Augmented precision (AP)
  - Augmented recall (AR)

Potentials
- Automatically processing the information contained in natural language text
- Creating semantic content for the Semantic Web
  - automatic metadata generation
  - semantic annotation
- Improving the quality of ontologies

ACKNOWLEDGEMENT
Most of the materials are adapted from:
- Wimalasuriya, D. C., & Dou, D. (2010). Ontology-based information extraction: An introduction and a survey of current approaches. Journal of Information Science.
Other References (part):
- Muhammad, A., & Dey, L. (2005). Biological Ontology enhancement with Fuzzy Relation: A Text Mining Framework. In International Conference on Web Intelligence WI (Vol. 5).
- Romano, R., Rokach, L., & Maimon, O. (2006). Automatic discovery of regular expression patterns representing negated findings in medical narrative reports. In Proceedings of the 6th International Workshop on Next Generation Information Technologies and Systems. Springer, Berlin.
- Muller, H. M., Kenny, E. E., & Sternberg, P. W. (2004). Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol, 2(11), e309.
- Dung, T. Q., & Kameyama, W. (2007). Ontology-based information extraction and information retrieval in health care domain. In Data Warehousing and Knowledge Discovery (pp. 323-333). Springer Berlin Heidelberg.

Thank you!