Bio-Medical Interaction Extractor Syed Toufeeq Ahmed ASU.

Matching with BioMedical Ontology
Gene List (508,477) from LocusLink
Interaction List (1500) from UMLS
A noun phrase is tagged as a GENE (G) if it matches a gene from the Gene List.
Any word is tagged as an INTERACTION (I) if it matches an interaction from the Interaction List (after stemming).
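The tagging rule above can be sketched as follows. This is only an illustration: GENE_LIST and INTERACTION_STEMS are tiny made-up samples rather than the LocusLink and UMLS lists, and crude_stem is a hypothetical suffix stripper standing in for whatever stemmer the system actually uses.

```python
# Illustrative sketch only; the lists and the stemmer are assumptions.
GENE_LIST = {"pcna", "gene1", "gene2"}                      # sample gene names
INTERACTION_STEMS = {"inhibit", "regulat", "phosphorylat"}  # sample stems

SUFFIXES = ("ion", "ing", "ed", "es", "s", "e")


def crude_stem(word):
    """Strip one common suffix; a placeholder for a real stemmer."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least four characters so short words stay intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 4:
            return w[: -len(suffix)]
    return w


def tag_phrase(noun_phrase):
    """Return "G" if the noun phrase matches an entry of the gene list."""
    return "G" if noun_phrase.lower() in GENE_LIST else None


def tag_word(word):
    """Return "I" if the stemmed word matches an interaction stem."""
    return "I" if crude_stem(word) in INTERACTION_STEMS else None
```

For example, tag_word("regulation") and tag_word("inhibits") both stem to an entry of the sample interaction list, while tag_phrase("PCNA") matches the sample gene list.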

Syntactic Roles with Link Grammar
“HMBA could inhibit the MEC-1 cell proliferation by down-regulation of PCNA expression.”
Subject: “HMBA”
Verb: “inhibit”
Object: “the MEC-1 cell proliferation”
Modifying Phrase: “by down-regulation of PCNA expression”

Scopes
Various syntactic roles (such as Subject, Object and Modifying Phrase) and their linguistically significant combinations make up SCOPES.
A SCOPE MATCHING is:
Elementary (E): if the scope contains a Gene/Protein (G) name or an interaction word (I).
Partial (P): if the scope has a Gene/Protein (G) name and an interaction word (I).
Complete (C): if the scope has at least two Gene/Protein (G) names and an interaction word (I).
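The three match levels can be expressed as a small function over the tags found in a scope. This is a minimal sketch, assuming a scope has already been reduced to a list of "G" and "I" tags; the name classify_scope is hypothetical. The most specific level is tested first, because a Complete scope also satisfies the Partial and Elementary conditions.

```python
def classify_scope(tags):
    """Classify a scope from its list of "G" (gene) and "I" (interaction) tags."""
    genes = tags.count("G")
    interactions = tags.count("I")
    if genes >= 2 and interactions >= 1:
        return "C"  # Complete: at least two genes and an interaction word
    if genes >= 1 and interactions >= 1:
        return "P"  # Partial: a gene and an interaction word
    if genes >= 1 or interactions >= 1:
        return "E"  # Elementary: a gene or an interaction word
    return None     # no match at all
```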

Scopes
“HMBA could inhibit the MEC-1 cell proliferation by down-regulation of PCNA expression.”
Subject: Elementary
Verb: Interaction
Object: Elementary
Modifying Phrase: Partial

Scopes & Matches
“The kinase phosphorylation of Gene1 by Gene2 could inhibit Gene3.”
Subject: Complete

Algorithm of Interaction Extractor (flowchart)
Scopes: Subject (S), Object (O), Modifying Phrase (MP); combinations: S-O and S-MP.
In any of S, O or MP: complete (G,I,G) -> interact: {G,I,G}
Elementary (G1) with Elementary (G2): is the main verb an interaction (I)? -> Interaction: {G1, I, G2}
Elementary (G1) with Partial (I,G2) -> Interaction: {G1, I, G2}

Algorithm
1) Using the linkage given by the Link Grammar parser, the Subject, the Object and the Modifying Phrase scopes are obtained (S, O and MP respectively).
2) If S, O or MP has a complete interaction, then we use the preposition-based approach to find agent, theme and action to extract the interaction.
3) a) Identify the main verb of the sentence and extract the interaction from the combination of Subject and Object scopes.
   b) If the above step gives a complete interaction from the subject-object combination (S4 = C) and the scope of the modifying phrase is Elementary, then skip STEP 4.
4) Extract the interaction from the combination of Subject and Modifying Phrase scopes.

Different possible cases for subject-object combination when main verb is NOT an interaction word:

    Subject            Object             Extracted interaction
a)  S = E (G1)         O = P (I1,G2)      {G1,I1,G2}
b)  S = P (G1,I1)      O = E (G2)         {G1,I1,G2}
c)  S = C (G1,I1,G2)   O = P (I2,G3)      {(G1,I1,G2), I2, G3}
d)  S = P (G1,I1)      O = C (G2,I2,G3)   {G1,I1,(G2,I2,G3)}
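The four cases can be read as a lookup on the pair of scope types. The sketch below assumes a hypothetical Scope record (not part of the original system) in which an Elementary scope carries a gene, a Partial scope a gene plus an interaction word, and a Complete scope an already extracted triple.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scope:
    kind: str                          # "E", "P" or "C"
    gene: Optional[str] = None         # gene name, for E and P scopes
    interaction: Optional[str] = None  # interaction word, for P scopes
    triple: Optional[Tuple] = None     # extracted (G, I, G), for C scopes


def combine_without_verb(s, other):
    """Combine Subject scope `s` with an Object (or Modifying Phrase) scope
    when the main verb is not an interaction word."""
    pair = (s.kind, other.kind)
    if pair == ("E", "P"):   # case a
        return (s.gene, other.interaction, other.gene)
    if pair == ("P", "E"):   # case b
        return (s.gene, s.interaction, other.gene)
    if pair == ("C", "P"):   # case c: nested subject interaction
        return (s.triple, other.interaction, other.gene)
    if pair == ("P", "C"):   # case d: nested object interaction
        return (s.gene, s.interaction, other.triple)
    return None              # combination not listed in the table
```

Applied to case a) with S = E ("HMBA") and a Partial scope holding "down-regulation" and "PCNA expression", this yields the triple ("HMBA", "down-regulation", "PCNA expression").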

Different possible cases for subject-object combination when main verb is an interaction word (I1):

    Subject            Object             Extracted interaction
a)  S = E (G1)         O = E (G2)         {G1,I1,G2}
b)  S = E (G1)         O = P (I2,G2)      {G1,I1,(I2/G2)}
c)  S = P (G1,I2)      O = E (G2)         {(G1/I2), I1, G2}
d)  S = P (G1,I2)      O = P (I3,G2)      {(G1/I2), I1, (I3/G2)}
e)  S = C (G1,I2,G2)   O = E (G3)         {(G1,I2,G2), I1, G3}
f)  S = E (G1)         O = C (G2,I2,G3)   {G1,I1,(G2,I2,G3)}
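When the main verb is itself an interaction word, every listed case has the same shape: the verb sits between whatever the two scopes contribute. A minimal sketch, assuming each scope is a hypothetical (kind, items) pair whose items keep the order used in the table:

```python
# Scope pairs listed in the table above (cases a to f).
LISTED_PAIRS = {("E", "E"), ("E", "P"), ("P", "E"),
                ("P", "P"), ("C", "E"), ("E", "C")}


def argument(scope):
    """An Elementary scope contributes its single gene; Partial and
    Complete scopes contribute their whole tuple as a nested argument."""
    kind, items = scope
    return items[0] if kind == "E" else tuple(items)


def combine_with_verb(subject, main_verb, other):
    """Build {S-argument, main verb, O-argument} for the listed cases."""
    if (subject[0], other[0]) not in LISTED_PAIRS:
        return None
    return (argument(subject), main_verb, argument(other))
```

With subject ("E", ("HMBA",)), main verb "inhibit" and object ("E", ("the MEC-1 cell proliferation",)), this gives case a): ("HMBA", "inhibit", "the MEC-1 cell proliferation").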

Different possible cases for subject-modifying phrase combination when main verb is NOT an interaction word:

    Subject            Modifying Phrase    Extracted interaction
a)  S = E (G1)         MP = P (I1,G2)      {G1,I1,G2}
b)  S = P (G1,I1)      MP = E (G2)         {G1,I1,G2}
c)  S = C (G1,I1,G2)   MP = P (I2,G3)      {(G1,I1,G2), I2, G3}
d)  S = P (G1,I1)      MP = C (G2,I2,G3)   {G1,I1,(G2,I2,G3)}

Different possible cases for subject-modifying phrase combination when main verb is an interaction word (I1):

    Subject            Modifying Phrase    Extracted interaction
a)  S = E (G1)         MP = E (G2)         {G1,I1,G2}
b)  S = E (G1)         MP = P (I2,G2)      {G1,I1,(I2/G2)}
c)  S = P (G1,I2)      MP = E (G2)         {(G1/I2), I1, G2}
d)  S = P (G1,I2)      MP = P (I3,G2)      {(G1/I2), I1, (I3/G2)}
e)  S = C (G1,I2,G2)   MP = E (G3)         {(G1,I2,G2), I1, G3}
f)  S = E (G1)         MP = C (G2,I2,G3)   {G1,I1,(G2,I2,G3)}

Example
“HMBA could inhibit the MEC-1 cell proliferation by down-regulation of PCNA expression.”
Subject: Elementary (G)
Main Verb: (I)
Object: Elementary (G)
Modifying Phrase: Partial
Extracted interactions:
{ “HMBA”, “inhibit”, “the MEC-1 cell proliferation” }
{ “HMBA”, “down-regulation”, “PCNA expression” }

Example
“HMBA could inhibit the MEC-1 cell proliferation by down-regulation of PCNA expression.”
1) The main verb (“inhibit”) is identified:
Subject: “HMBA” (Elementary)
Object: “the MEC-1 cell proliferation” (Elementary)
Modifying Phrase: “by down-regulation of PCNA expression” (Partial)

3) Interaction between subject and object is extracted.

    Subject            Object             Extracted interaction
a)  S = E (G1)         O = E (G2)         {G1,I1,G2}
b)  S = E (G1)         O = P (I2,G2)      {G1,I1,(I2/G2)}
c)  S = P (G1,I2)      O = E (G2)         {(G1/I2), I1, G2}
d)  S = P (G1,I2)      O = P (I3,G2)      {(G1/I2), I1, (I3/G2)}
e)  S = C (G1,I2,G2)   O = E (G3)         {(G1,I2,G2), I1, G3}
f)  S = E (G1)         O = C (G2,I2,G3)   {G1,I1,(G2,I2,G3)}

Subject and Object are both Elementary and the main verb is an interaction word, so case a) applies:
{ “HMBA”, “inhibit”, “the MEC-1 cell proliferation” }

4) Now we extract the interaction between Subject and Modifying Phrase.

    Subject            Modifying Phrase    Extracted interaction
a)  S = E (G1)         MP = P (I1,G2)      {G1,I1,G2}
b)  S = P (G1,I1)      MP = E (G2)         {G1,I1,G2}
c)  S = C (G1,I1,G2)   MP = P (I2,G3)      {(G1,I1,G2), I2, G3}
d)  S = P (G1,I1)      MP = C (G2,I2,G3)   {G1,I1,(G2,I2,G3)}

{ “HMBA”, “down-regulation”, “PCNA expression” }

Example 2
“The kinase phosphorylation of Gene1 by Gene2 could inhibit Gene3.”
Subject: Complete
Main verb: “inhibit”
Object: Elementary

Preposition-based patterns
Subject is Complete and matches the of/by pattern: “... of ... by ...”
“The kinase phosphorylation of Gene1 by Gene2”
{ “Gene2”, “phosphorylation”, “Gene1” }
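The of/by pattern can be approximated with a regular expression in which the word before "of" is the action, the phrase after "of" is the theme and the phrase after "by" is the agent. This is a rough sketch rather than the parser-based method used with Link Grammar; the names OF_BY and extract_of_by are hypothetical.

```python
import re

# "<action> of <theme> by <agent>", e.g. "phosphorylation of Gene1 by Gene2".
OF_BY = re.compile(
    r"(?P<action>\S+)\s+of\s+(?P<theme>.+?)\s+by\s+(?P<agent>.+)$",
    re.IGNORECASE,
)


def extract_of_by(phrase):
    """Return (agent, action, theme) if the phrase fits the of/by pattern."""
    match = OF_BY.search(phrase.strip())
    if match is None:
        return None
    return (match.group("agent"), match.group("action"), match.group("theme"))
```

On the subject phrase of Example 2 this returns ("Gene2", "phosphorylation", "Gene1"), the agent-first ordering shown on the slide.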

“The kinase phosphorylation of Gene1 by Gene2 could inhibit Gene3.”
Sub: “The kinase phosphorylation of Gene1 by Gene2” -> { “Gene2”, “phosphorylation”, “Gene1” }
Obj: “Gene3”
Verb: “inhibit”
Nested Interaction: { { “Gene2”, “phosphorylation”, “Gene1” }, “inhibit”, “Gene3” }

Next Steps
Handling negations in the sentences (such as “not interact”, “fails to induce”, “does not inhibit”).
Extraction of detailed contextual attributes of interactions (such as bio-chemical context or location) by interpreting modifiers:
    Location/Position modifiers (in, at, on, into, up, over…)
    Agent/Accompaniment modifiers (by, with…)
    Purpose modifiers (for…)
    Theme/association modifiers (of…)
Extraction of relationships between interactions from among multiple sentences in abstracts (signaling pathways).

Next Steps Visualization of Signaling Pathways

Preliminary Results
Dataset, Precision %, Recall %
Curated text: 95.4 %
Abstracts: [...] %, 89.18 %

References
Link Grammar
LocusLink
UMLS