Information Extraction from Biomedical Text Jerry R. Hobbs Artificial Intelligence Center SRI International.

Introduction
Information Extraction:
- Extracts entities, relations, and events
- Captures structured information
- Domain specific
- Focuses only on the relevant parts of a text
- Mainly applied to domains of economic and military interest?
- Biomedical domain

Cascaded Finite-State Transducers
- Separate processing into several stages
- FASTUS (Finite-State Automaton Text Understanding System)
Earlier stages:
- Smaller linguistic objects
- Domain independent
Later stages:
- Domain-dependent patterns

Cascaded Finite-State Transducers
1. Complex Words
2. Basic Phrases
3. Complex Phrases
4. Domain Patterns
5. Merging Structures
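One way to picture the cascade is as function composition: each stage consumes the output of the stage before it. The sketch below is not FASTUS code. `run_cascade` and `merge_pair` are hypothetical names, and the two toy stages only stand in for the first two levels.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def run_cascade(stages: List[Stage], tokens: List[str]) -> List[str]:
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda data, stage: stage(data), stages, tokens)


def merge_pair(first: str, second: str) -> Stage:
    """Build a toy stage that fuses two adjacent units into one."""
    def stage(tokens: List[str]) -> List[str]:
        out, i = [], 0
        while i < len(tokens):
            if tokens[i:i + 2] == [first, second]:
                out.append(first + " " + second)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        return out
    return stage


stages = [
    merge_pair("Escherichia", "coli"),         # stands in for "complex words"
    merge_pair("Escherichia coli", "strain"),  # stands in for "basic phrases"
]
result = run_cascade(stages, "from an Escherichia coli strain".split())
print(result)
```

The second stage can only match because the first stage has already produced the unit "Escherichia coli", which is the point of ordering the stages from smaller to larger linguistic objects.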

Example
gamma-Glutamyl kinase, the first enzyme of the proline biosynthetic pathway, was purified to homogeneity from an Escherichia coli strain resistant to the proline analog 3,4-dehydroproline. The enzyme had a native molecular weight of 236,000 and was apparently comprised of six identical 40,000-dalton subunits.

Target Database
Reaction Object, attributes:
- ID
- Pathway
- Enzyme
- ..
Enzyme Object, attributes:
- ID
- Name
- Molecular-Weight
- Subunit-Component
- Subunit-Number
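The two objects could be modelled as simple records. This is a minimal sketch: the attribute names follow the slide, but the Python types, the `Optional` defaults, and the IDs "E1" and "R1" are assumptions made for illustration. Attributes elided on the slide ("..") are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enzyme:
    id: str
    name: Optional[str] = None
    molecular_weight: Optional[int] = None
    subunit_component: Optional[str] = None  # type is an assumption
    subunit_number: Optional[int] = None


@dataclass
class Reaction:
    id: str
    pathway: Optional[str] = None
    enzyme: Optional[Enzyme] = None


# Values taken from the example sentence; the IDs are hypothetical.
enzyme = Enzyme(
    id="E1",
    name="gamma-Glutamyl kinase",
    molecular_weight=236000,
    subunit_number=6,
)
reaction = Reaction(id="R1", pathway="proline biosynthetic pathway", enzyme=enzyme)
print(reaction)
```

Unfilled attributes default to `None`, which matches the way a template is filled in incrementally as information is found in the text.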

Complex Words
gamma-Glutamyl kinase, the first enzyme of the proline biosynthetic pathway, was purified to homogeneity from an Escherichia coli strain resistant to the proline analog 3,4-dehydroproline. The enzyme had a native molecular weight of 236,000 and was apparently comprised of six identical 40,000-dalton subunits.
- Recognizes multiword fixed phrases and proper names
- Both are plentiful in the biological domain
- Uses a lexicon, or machine-learning and statistical methods
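For the lexicon option, a greedy longest-match lookup is one simple possibility. The sketch below assumes a hand-built lexicon of token tuples and exact, case-sensitive matching. `mark_complex_words` and `LEXICON` are hypothetical names, not part of any described system.

```python
from typing import List, Set, Tuple


def mark_complex_words(tokens: List[str], lexicon: Set[Tuple[str, ...]]) -> List[str]:
    """Greedily join the longest lexicon entry starting at each position."""
    max_len = max((len(entry) for entry in lexicon), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword entry starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


LEXICON = {
    ("gamma-Glutamyl", "kinase"),
    ("Escherichia", "coli"),
    ("molecular", "weight"),
}
marked = mark_complex_words("purified from an Escherichia coli strain".split(), LEXICON)
print(marked)
```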

Basic Phrases
- Segments a sentence into noun groups, verb groups, and particles
- Uses the Sager 1981 grammar
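The segmentation step can be illustrated with a deliberately crude chunker. It does not implement the Sager grammar: it assumes the tokens already carry part-of-speech tags (hand-assigned here) and simply groups maximal runs of noun-like or verb-like tags. The tag sets and the names `group_of` and `chunk` are assumptions for this sketch.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed tag sets; a real noun-group grammar also constrains word order.
NOUN_TAGS = {"DT", "JJ", "CD", "NN", "NNS", "NNP"}
VERB_TAGS = {"MD", "RB", "VB", "VBD", "VBN", "VBZ"}


def group_of(tag: str) -> str:
    """Map a part-of-speech tag to noun group, verb group, or particle."""
    if tag in NOUN_TAGS:
        return "NG"
    if tag in VERB_TAGS:
        return "VG"
    return "P"


def chunk(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group consecutive tokens of the same kind; particles stay single."""
    chunks = []
    for label, run in groupby(tagged, key=lambda pair: group_of(pair[1])):
        words = [word for word, _ in run]
        if label == "P":
            chunks.extend(("P", word) for word in words)
        else:
            chunks.append((label, " ".join(words)))
    return chunks


tagged = [
    ("The", "DT"), ("enzyme", "NN"), ("had", "VBD"),
    ("a", "DT"), ("native", "JJ"), ("molecular weight", "NN"),
    ("of", "IN"), ("236,000", "CD"),
]
chunks = chunk(tagged)
print(chunks)
```

Note that "molecular weight" arrives as a single token, as it would after the complex-words stage.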

Complex Phrases
- Attaches appositives to their head noun groups
- Attaches “of” prepositional phrases to their head noun groups

Complex Phrases
Structures of basic and complex phrases, entities and events

Clause-Level Domain Patterns
The enzyme had a native molecular weight of 236,000 and was apparently comprised of six identical 40,000-dalton subunits.
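As a rough illustration of what a domain pattern extracts from this sentence, the sketch below matches two hand-written regular expressions against the raw string. This is a simplification: a cascaded system would match patterns over the phrase structures built by the earlier stages, not over characters. The pattern names, `NUMBER_WORDS`, and `extract_enzyme_facts` are hypothetical.

```python
import re
from typing import Dict, Optional

SENTENCE = (
    "The enzyme had a native molecular weight of 236,000 and was "
    "apparently comprised of six identical 40,000-dalton subunits."
)

WEIGHT = re.compile(r"molecular weight of (?P<weight>[\d,]+)")
SUBUNITS = re.compile(
    r"comprised of (?P<count>\w+) identical [\d,]+-dalton subunits"
)
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def extract_enzyme_facts(sentence: str) -> Dict[str, Optional[int]]:
    """Fill Enzyme attributes from the patterns that match the sentence."""
    facts: Dict[str, Optional[int]] = {}
    match = WEIGHT.search(sentence)
    if match:
        facts["Molecular-Weight"] = int(match.group("weight").replace(",", ""))
    match = SUBUNITS.search(sentence)
    if match:
        count = match.group("count").lower()
        # Accept digits or a small set of number words; unknown words give None.
        facts["Subunit-Number"] = int(count) if count.isdigit() else NUMBER_WORDS.get(count)
    return facts


facts = extract_enzyme_facts(SENTENCE)
print(facts)
```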

Merging Structures
- The first four levels operate within a single sentence
- This level collects and combines the information for one entity or relationship
Three criteria for merging two structures:
- The internal structure of the noun groups
- Their nearness along some metric
- The consistency and compatibility of the two structures
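Of the three criteria, consistency is the easiest to sketch. The function below treats each structure as a dictionary of attribute values and refuses to merge when two filled attributes disagree. It ignores the noun-group and nearness criteria entirely, and `merge` is a hypothetical helper, not the system's actual procedure.

```python
from typing import Dict, Optional

Structure = Dict[str, object]


def merge(a: Structure, b: Structure) -> Optional[Structure]:
    """Combine two partial structures, or return None if they conflict."""
    for key in a.keys() & b.keys():
        if a[key] is not None and b[key] is not None and a[key] != b[key]:
            return None  # inconsistent values: do not merge
    merged = dict(a)
    for key, value in b.items():
        if merged.get(key) is None:
            merged[key] = value
    return merged


# One structure per sentence of the running example.
first = {"Name": "gamma-Glutamyl kinase", "Molecular-Weight": None}
second = {"Molecular-Weight": 236000, "Subunit-Number": 6}

combined = merge(first, second)
conflict = merge({"Molecular-Weight": 236000}, {"Molecular-Weight": 40000})
print(combined, conflict)
```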

Compile-Time Transformations
Subject-Verb-Object pattern -> linguistic patterns (passive, relative clauses, etc.)
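The idea is that the pattern writer states one Subject-Verb-Object pattern and the variants are generated from it ahead of time. A toy version with string templates is shown below. The slot notation, the three variants chosen, and the name `expand_svo` are assumptions for illustration only.

```python
from typing import Dict


def expand_svo(subject: str, verb_past: str, verb_participle: str, obj: str) -> Dict[str, str]:
    """Generate surface variants of one Subject-Verb-Object pattern."""
    return {
        "active": f"{subject} {verb_past} {obj}",
        "passive": f"{obj} was {verb_participle} by {subject}",
        "relative": f"{subject} that {verb_past} {obj}",
    }


# "<Agent>" and "<Enzyme>" are hypothetical slot names.
variants = expand_svo("<Agent>", "purified", "purified", "<Enzyme>")
for name, pattern in variants.items():
    print(f"{name}: {pattern}")
```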

Types of Specialized Domains
“Noun-driven” approach:
- The type of an entity is highly predictive of its role in an event
- Loose S-V-O patterns
“Verb-driven” approach:
- The role of the entities in events cannot be predicted from their type
- Tight S-V-O patterns

Limitations of IE Technology
MUC (1990):
- Name recognition: ~95% recall and precision
- Event recognition: ~60% recall and precision
Possible reasons:
- The process of merging
- Only works with explicit information
- Common cases are covered, but what about the rare cases?
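For reference, recall and precision can be computed as below. The event labels are made up to give a 60% case and are not MUC data. `precision_recall` is a hypothetical helper.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision: share of extracted items that are correct.
    Recall: share of gold items that were extracted."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Illustrative labels only: 3 of 5 extracted events are in the gold set.
gold = {"e1", "e2", "e3", "e4", "e5"}
extracted = {"e1", "e2", "e3", "x1", "x2"}
precision, recall = precision_recall(extracted, gold)
print(precision, recall)
```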