6/27/031 Integrating Syntactic and Semantic Annotation of Biomedical Text Seth Kulick, Mark Liberman, Martha Palmer and Andrew Schein The University of.

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.
Advertisements

1 Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE Ryan McDonald Fernando Pereira Seth Kulick CIS and IRCS, University.
Bio-Medical Interaction Extractor Syed Toufeeq Ahmed ASU.
FP7 meeting - Gent - Carlos Rodríguez - April 18 WP4: Conceptual Mining from Text for Knowledge Engineering State of the Art WP Coordinators: Alfonso Valencia.
QA-LaSIE Components The question document and each candidate answer document pass through all nine components of the QA-LaSIE system in the order shown.
Social networks, in the form of bibliographies and citations, have long been an integral part of the scientific process. We examine how to leverage the.
Layering Semantics (Putting meaning into trees) Treebank Workshop Martha Palmer April 26, 2007.
Multilinugual PennTools that capture parses and predicate-argument structures, and their use in Applications Martha Palmer, Aravind Joshi, Mitch Marcus,
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Overview of the Hindi-Urdu Treebank Fei Xia University of Washington 7/23/2011.
BioContrasts: Extracting and Exploiting Protein-protein Contrastive Relations from Biomedical Literature Jung-jae Kim 1, Zhuo Zhang 2, Jong C. Park 1 and.
Semantic Role Labeling Abdul-Lateef Yussiff
Towards Parsing Unrestricted Text into PropBank Predicate- Argument Structures ACL4 Project NCLT Seminar Presentation, 7th June 2006 Conor Cafferkey.
1 Words and the Lexicon September 10th 2009 Lecture #3.
IntEx: A Syntactic Role Driven Protein-Protein Interaction Extractor for Bio-Medical Text Syed Toufeeq Ahmed Deepthi Chidambaram Hasan Davulcu Chitta Baral.
Sunita Sarawagi.  Enables richer forms of queries  Facilitates source integration and queries spanning sources “Information Extraction refers to the.
Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.
6/29/051 New Frontiers in Corpus Annotation Workshop, 6/29/05 Ann Bies – Linguistic Data Consortium* Seth Kulick – Institute for Research in Cognitive.
BIOI7791 Spring 2005 Projects in bioinformatics: natural language processing March 31, 2005 © Kevin Cohen.
An Overview of Text Mining Rebecca Hwa 4/25/2002 References M. Hearst, “Untangling Text Data Mining,” in the Proceedings of the 37 th Annual Meeting of.
Biomedical Information Extraction. Outline Intro to biomedical information extraction PASTA [Demetriou and Gaizauskas] Biomedical named entities Name.
OntoNotes project Treebank Syntax Training Data Decoders Propositions Verb Senses and verbal ontology links Noun Senses and targeted nominalizations Coreference.
 Text mining for biology and medicine: Glasgow, Feb , 2008 Biomedical information extraction at the University of Pennsylvania Mark Liberman
 Copyright 2009 Digital Enterprise Research Institute. All rights reserved Digital Enterprise Research Institute Ontologies & Natural Language.
Detection of Relations in Textual Documents Manuela Kunze, Dietmar Rösner University of Magdeburg C Knowledge Based Systems and Document Processing.
EMPOWER 2 Empirical Methods for Multilingual Processing, ‘Onoring Words, Enabling Rapid Ramp-up Martha Palmer, Aravind Joshi, Mitch Marcus, Mark Liberman,
SMBM Talks SMBM, Cambridge, April (Edinburgh May 2) NLP for Biomedical Text Mining.
Recognition of Multi-sentence n-ary Subcellular Localization Mentions in Biomedical Abstracts G. Melli, M. Ester, A. Sarkar Dec. 6, 2007
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning Author: Chaitanya Chemudugunta America Holloway Padhraic Smyth.
Survey of Semantic Annotation Platforms
Authors: Ting Wang, Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Ji Wang Presented by: Khalifeh Al-Jadda Automatic Extraction of Hierarchical Relations.
GLOSSARY COMPILATION Alex Kotov (akotov2) Hanna Zhong (hzhong) Hoa Nguyen (hnguyen4) Zhenyu Yang (zyang2)
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi and introduced in Tree-adjoining grammars are somewhat similar to context-free.
Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali and Vasileios Hatzivassiloglou Human Language Technology Research Institute The.
Information Extraction from the Cancer Literature The Pediatric Hematology/Oncology Seminar Series Children’s Hospital of Philadelphia March 8, 2005 Philadelphia,
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
Extracting Semantic Constraint from Description Text for Semantic Web Service Discovery Dengping Wei, Ting Wang, Ji Wang, and Yaodong Chen Reporter: Ting.
BioNLP related talks and demos at ACL and CONLL ‘05 Presented by Beatrice Alex BioNLP meeting 11 th of July 2005.
Combining terminology resources and statistical methods for entity recognition: an evaluation Angus Roberts, Robert Gaizauskas, Mark Hepple, Yikun Guo.
1 Automated recognition of malignancy mentions in biomedical literature BMC Bioinformatics 2006, 7:492 Speaker: Yu-Ching Fang Advisors: Hsueh-Fen Juan.
©2003 Paula Matuszek CSC 9010: Information Extraction Dr. Paula Matuszek (610) Fall, 2003.
Recognizing Names in Biomedical Texts: a Machine Learning Approach GuoDong Zhou 1,*, Jie Zhang 1,2, Jian Su 1, Dan Shen 1,2 and ChewLim Tan 2 1 Institute.
Modelling Human Thematic Fit Judgments IGK Colloquium 3/2/2005 Ulrike Padó.
LREC 2008 Marrakech1 Clustering Related Terms with Definitions Scott Piao, John McNaught and Sophia Ananiadou
Artificial Intelligence Research Center Pereslavl-Zalessky, Russia Program Systems Institute, RAS.
Using Semantic Relations to Improve Passage Retrieval for Question Answering Tom Morton.
Labeling and Enhancing Life Science Links S. Heymann*, F. Naumann*, L. Raschid +, P. Rieger * * Humboldt Universität zu Berlin + University of Maryland.
Domain Adaptation for Biomedical Information Extraction Jing Jiang BeeSpace Seminar Oct 17, 2007.
MedKAT Medical Knowledge Analysis Tool December 2009.
1 Semantic Relations for Interpreting DNA Microarray Data and for Novel Hypotheses Generation Dimitar Hristovski, 1 PhD, Andrej Kastrin, 2 Borut Peterlin,
4. Relationship Extraction Part 4 of Information Extraction Sunita Sarawagi 9/7/2012CS 652, Peter Lindes1.
Text Mining and Knowledge Management Junichi Tsujii GENIA Project, Kototoi Project ( tokyo.ac.jp/GENIA/) Computer Science, University.
Support Vector Machines and Kernel Methods for Co-Reference Resolution 2007 Summer Workshop on Human Language Technology Center for Language and Speech.
5/6/04Biolink1 Integrated Annotation for Biomedical IE Mining the Bibliome: Information Extraction from the Biomedical Literature NSF ITR grant EIA
Multilinugual PennTools that capture parses and predicate-argument structures, for use in Applications Martha Palmer, Aravind Joshi, Mitch Marcus, Mark.
©2012 Paula Matuszek CSC 9010: Information Extraction Overview Dr. Paula Matuszek (610) Spring, 2012.
Open Health Natural Language Processing Consortium
Overview of Statistical NLP IR Group Meeting March 7, 2006.
1 GAPSCORE: Finding Gene and Protein Names one Word at a Time Jeffery T. Chang 1, Hinrich Schutze 2 & Russ B. Altman 1 1 Department of Genetics, Stanford.
1 Survey of Biodata Analysis from a Data Mining Perspective Peter Bajcsy Jiawei Han Lei Liu Jiong Yang.
A Database of Narrative Schemas A 2010 paper by Nathaniel Chambers and Dan Jurafsky Presentation by Julia Kelly.
©2003 Paula Matuszek CSC 9010: AeroText, Ontologies, AeroDAML Dr. Paula Matuszek (610)
English Proposition Bank: Status Report
A Brief Introduction to Distant Supervision
INAGO Project Automatic Knowledge Base Generation from Text for Interactive Question Answering.
Improving a Pipeline Architecture for Shallow Discourse Parsing
Towards comprehensive syntactic and semantic annotations of the clinical narrative Daniel Albright, Arrick Lanfranchi, Anwen Fredriksen, William F Styler.
Introduction Task: extracting relational facts from text
CS224N Section 3: Corpora, etc.
Presentation transcript:

6/27/031 Integrating Syntactic and Semantic Annotation of Biomedical Text Seth Kulick, Mark Liberman, Martha Palmer and Andrew Schein The University of Pennsylvania Support from: NSF ITR-EIA

6/27/032 Contributors The University of Pennsylvania zAnn Bies, Susan Davidson, Hubert Jin, Aravind Joshi, Seth Kulick, Jeremy Lacivita, Mark Liberman, Mark Mandel, Mitch Marcus, Marty McCormick, Tom Morton, Martha Palmer, Eric Pancoast, Fernando Pereira, Andrew Schein, Val Tannen, Lyle Ungar, Peng Wang eGenome (Children’s Hospital of Philadelphia) zYang Jin, Peter White, Scott Winters GlaxoSmithKline zJim Butler, Paula Matuszek, Robin McEntire Other zRobert Gaizauskas, Jun-ichi Tsujii, Bonnie Webber

6/27/033 Goal zInformation Extraction from the biomedical literature, particularly Medline yEnzyme Inhibition Relations Expression of CYP3A11 and PXR was suppressed by inactivation of HNF4alpha customer: GlaxoSmithKline yMutation/Malignancy Relations Ki-ras mutations were detected in 17.2% of the adenomas. customer: eGenome zAnnotate 1-10K abstracts for each domain

6/27/034 Approach to Information Extraction zPhase 1: yDevelop definitions and ontologies yAnnotate data according to definitions zPhase 2: Train corpus-based algorithms exploiting various annotation: xParsing xPredicate-argument analysis xReference resolution zPhase 3: “Active Annotation”

6/27/035 Active Annotation Machine Learning Selective Sampling/ Labeling Hand Correction Hand Annotation Selected Documents

6/27/036 Challenge: Diversity in Expression 1.“Activation of the C-Ki-ras genes by point mutations in codons 12 or 13...” 2.“Point mutations in codons 12 and 13 activated C- Ki-ras” 3.“Point mutations in codons 12 and 13 were activators of C-Ki-ras gene” Want to populate a factbank with: activation(C-Ki-ras, point mutation in codon 12) activation(C-Ki-ras, point mutation in codon 13)

6/27/037 Approaches to Handling Diversity zCurrent Approach is to either: yHand build extraction patterns to cover all variant expressions or yAnnotate lots of data to get examples of variant expressions (for machine learning) zProposed Approach: Linguistic analysis of the sentences

6/27/038 Information Extraction Approaches Lexical Info Extracted Relations Extraction Algorithm Linguistic Annotation Common Approach Proposed Approach

6/27/039 Our Annotation Effort Together for the first time… Annotations include: yTreebank (Syntax) yProbank (predicate-argument structure) yEntities (genes, malignancies) yReference and Coreference yFactbanking (end goal)

6/27/0310 NP PP Activation the of Nom genes c-ki-ras NP PP point by mutations PP NP inNP Nom or Nom Codons 12 Nom t 13 Syntactic Structure (Treebank Annotation)

6/27/0311 More Examples of Coordination  “the ortho and meta positions”  (= the ortho positions and meta positions)  “PLC and cytochrome P450 arachidonate epoxygenase activity”  (= PLC arachidonate epoxygenase activity and cytochrome P450 arachidonate…)  “enhanced CYP2C9 expression and 11,12 EET production”  (= enhanced CYP2C9 expression and enhanced 11,12 EET production)

6/27/0312 Predicate-Argument Annotation: Propbank z“Point mutations in codons 12 and 13 were activators of C-K-ras genes” z“Activation of the C-K-ras genes by point mutations in Codons 12 or 13...” zPredicate-Argument Structure (Propbank): yREL: activation activatee: c-ki-ras genes activator: point mutations in codons 12 or 13 yREL: mutations type: point position(s): Codons 12 or 13

6/27/0313 Why Combine Treebank and Propbank? zTreebank indicates constituents ysubject, verb, direct object, etc. zPropbank indicates roles of constituents y“agent,” “theme,” “quantification”, etc. yinhibitor, inhibitee, inhibition rate zPrior work combines Treebank/Propbank for financial text IE: (Surdeneau et al., 2003, Gildea and Palmer, 2002)

6/27/0314 Entity Annotation z Entities we annotate include: “gene”, “protein”, “substance”, “malignancy” zMetonymy Issues: yis a reference a gene or a protein? yWe use subtypes, following ACE conference convention yGene is broken in to three categories: “Generic,” “Gene/RNA” and “Protein”

6/27/0315 The Gene Entity Generic Gene/RNA Protein

6/27/0316 WordFreak Annotation Tool Morton, Lacivita, Pancoast:

6/27/0317 Reference and Co-reference Annotation zCo-reference is an equivalence relation zsubtypes prevent nonsense in a co-ref graph Example of reference types: “K-Ras is a member of the Ras family of Oncogenes. The protein form is actively expressed in…” class-membership(K-Ras, Ras family) anaphor(K-Ras_protein, protein form)

6/27/0318 Current Activities zIn Progress: yEntity Annotation of “Gene,” “Chemical,” “Malignancy,” “genetic variation,” etc. yPOS annotation yTraining Treebank Syntactic Annotators zStarting Up: yStart coreference annotation yBuild our first entity tagging models

6/27/0319 zJanuary Entity tagging and coreference on oncology domain complete. We publish: annotation guidelines data baseline statistical taggers zMay First draft syntactic analysis of oncology domain (1-10K Medline abstracts) Some Projected Milestone Dates

6/27/0320 Some Annotation Projects and Related Research zGENIA Project and U Tokyo Work: zPasta system and Sheffield Work:  GENIES system and Columbia/CUNY Work zModeling Linguistic Phenomenon: yRay/Craven, IJCAI-2001 yPustejovsky et al. 2003

6/27/0321 The End.

6/27/0322 Some Examples Follow

6/27/0323 Reference and Co-reference zOur reference subtypes are: yAcronyms (definitions and linkages) yAnaphor (such as pronouns) yClasses versus their members y“Is-a” relation, i.e. “{CYP450}, {an enzyme} found in…” yStandardized database reference

6/27/0324 Complex Coordination Example Inhibition of CB -52 and -101 metabolism Note coordination of “CB” and also “metabolism”! The sentence above can be represented as: Inhibition of CB-52 metabolism and CB-101 metabolism)