Complex Sentence Processor
Using Link Grammar to simplify complex sentences
12/24/2018 Deepthi Chidambaram
Problem Statement
  Extraction of gene-gene interactions from unstructured biomedical text.
  Corpus: biomedical abstracts
    Curated text
    Rich in interactions
    Freely available
  Approach: verb-based extraction
    John played the pipes. (Crux of the sentence)
  Interactions in a noun phrase are also extracted (detailed by Toufeeq).
Sentences in abstracts
  Interactions are specified in 'creative' ways:
    HMBA inhibits MEC-1 cell proliferation.
    GBMs commonly overexpress the oncogenes EGFR and PDGFR, and contain mutations and deletions of tumor suppressor genes PTEN and TP53.
    Protein kinase B (PKB) has emerged as the focal point for many signal transduction pathways, regulating multiple cellular processes such as glucose metabolism, transcription, apoptosis, cell proliferation, angiogenesis, and cell motility.
Problems that come up
  Anaphora resolution [Anaphora]
    Pronominals: It activates HMBA.
    Sortal anaphora: Both enzymes are phosphorylated.
    Event anaphora: This reaction acts in a mediated environment.
  Multiple interactions: complex sentences
    Most of the tumor-suppressive properties of Pten are dependent on its lipid phosphatase activity, which inhibits the phosphatidylinositol-3'-kinase (PI3K)/Akt signaling pathway through dephosphorylation of phosphatidylinositol-(3,4,5)-triphosphate.
Our solution: Pronoun resolution
  Pronouns in abstracts are third person: it, itself, them, themselves.
  Replace each pronoun with the first noun group that matches it in number.
  References in the absence of pronouns are handled by Link Grammar.
Pronoun Resolution: walkthrough
  Before: Ku loads onto dsDNA ends and it can diffuse along the DNA in an energy-independent manner.
  After: Ku loads onto dsDNA ends and Ku can diffuse along the DNA in an energy-independent manner.
  Before: When breast cancers were examined for NGAL mRNA and protein levels, they were found to exhibit heterogeneous expression.
  After: When breast cancers were examined for NGAL mRNA and protein levels , breast cancers were found to exhibit heterogeneous expression .
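The replacement rule can be sketched in a few lines of Python. This is only an illustration, not the Prolog module used in the project: it assumes the noun groups and their grammatical number are already available (for example from a chunker), and the names `PRONOUN_NUMBER` and `resolve_pronouns` are hypothetical. The pronoun "they" is added to the table because it appears in the walkthrough.

```python
import re

# Grammatical number of each third-person pronoun handled by this sketch.
PRONOUN_NUMBER = {
    "it": "singular",
    "itself": "singular",
    "they": "plural",
    "them": "plural",
    "themselves": "plural",
}


def resolve_pronouns(sentence, noun_groups):
    """Replace each pronoun with the first noun group of matching number.

    noun_groups is a list of (text, number) pairs in sentence order.
    Producing that list is assumed to be done elsewhere.
    """
    def replace(match):
        number = PRONOUN_NUMBER[match.group(0).lower()]
        for text, group_number in noun_groups:
            if group_number == number:
                return text
        return match.group(0)  # no candidate found: keep the pronoun

    pattern = r"\b(" + "|".join(PRONOUN_NUMBER) + r")\b"
    return re.sub(pattern, replace, sentence, flags=re.IGNORECASE)


sentence = ("Ku loads onto dsDNA ends and it can diffuse along the DNA "
            "in an energy-independent manner.")
groups = [("Ku", "singular"), ("dsDNA ends", "plural")]
print(resolve_pronouns(sentence, groups))
```

Because the rule always takes the first matching noun group, it can pick the wrong antecedent when several groups share the same number; the sketch makes no attempt to handle that case.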
Complex Sentence Structures
  Independent clauses with connectives
    Gene14 binds to Gene15 in response to 1-b-Gene16 or methylmethanesulfonate ; this interaction does not require Gene17-Gene18-Gene19.
  Many dependent clauses with one independent clause, with or without connectives
    Gene57-Gene58-Gene59-Gene60 is blocked by Gene61, which binds to Gene62-Gene63-Gene64-Gene65.
  Multiple agents and goals in a single clause
    Gene96 or Gene97 competes with Gene98 for binding to Gene99 and Gene100 or Gene101 stimulates Gene102-Gene103-Gene104 in vitro in the absence of Gene105.
Our Solution: Complex Sentences
  Identify clauses in complex sentences.
  Build simple sentences from the clauses.
  Tool used: Link Grammar Parser [Link]
  Clause format: Subject | Verb | Object | Modifying phrase (adverbial phrase / prepositional phrase)
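As a rough illustration of the clause format, the four slots can be held in a small record from which a simple sentence is rebuilt. The class name `Clause` and its field names are hypothetical; identifying the clauses themselves (the Link Grammar step) is outside this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    """One clause in Subject | Verb | Object | Modifying phrase form."""
    subject: str
    verb: str
    obj: str = ""
    modifiers: list = field(default_factory=list)

    def to_sentence(self):
        # Join the non-empty slots in order and terminate with a period.
        parts = [self.subject, self.verb, self.obj, *self.modifiers]
        text = " ".join(p.strip() for p in parts if p.strip())
        if not text:
            return ""
        return text[:1].upper() + text[1:] + "."


clause = Clause("Gene102", "is replaced", modifiers=["by Gene103"])
print(clause.to_sentence())
```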
CSP: Goal
  Input:
    Upon growth factor stimulation of quiescent cells, Gene100 declines late in Gene101 and Gene102 is replaced by Gene103, which is absent in quiescent cells.
  Output:
    Upon growth factor stimulation of quiescent cells, Gene100 declines late in Gene101.
    Gene102 is replaced by Gene103.
    Gene103 is absent in quiescent cells.
Complex Sentence Processor
  Input:
    E|18|Upon growth factor stimulation of quiescent cells, Gene100 declines late in Gene101 and Gene102 is replaced by Gene103, which is absent in quiescent cells.
    C|2|In Gene11-Gene12, Gene13 stimulates Gene14-Gene15-Gene16-Gene17.
  CSP output:
    E|18|upon growth factor stimulation of quiescent cells , Gene100|declines||late#in Gene101#|
    E|18|Gene102|is replaced||by Gene103 , which#|
    E|18|Gene103 |is absent||in quiescent cells#|
    C|2|in Gene11-Gene12 , Gene13|stimulates|Gene14-Gene15-Gene16-Gene17#||
  Output fields:
    Subject | Verb | Objects | Modifying Phrases
    Upon… | declines | | late # in Gene101# …
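A reader for this output could look like the following sketch. It relies on two things inferred from the examples rather than stated on the slide: fields are separated by `|`, and `#` ends each item inside the Objects and Modifying Phrases fields. The slide does not explain the first two fields (`E`/`C` and the number), so they are kept as opaque values under the hypothetical names `tag` and `sentence_id`; `parse_csp_line` is likewise a made-up name.

```python
def parse_csp_line(line):
    """Split one CSP output line into its fields.

    Assumed layout: tag|sentence_id|subject|verb|objects|modifiers|
    with '#' ending each item inside the last two fields.
    """
    fields = line.rstrip("\n").split("|")
    if len(fields) < 6:
        raise ValueError("expected at least 6 '|'-separated fields")
    tag, sentence_id, subject, verb, objects, modifiers = fields[:6]

    def items(text):
        # Drop the empty piece left after a trailing '#'.
        return [item.strip() for item in text.split("#") if item.strip()]

    return {
        "tag": tag,
        "sentence_id": sentence_id,
        "subject": subject.strip(),
        "verb": verb.strip(),
        "objects": items(objects),
        "modifiers": items(modifiers),
    }


row = parse_csp_line("E|18|Gene102|is replaced||by Gene103 , which#|")
print(row["subject"], "/", row["verb"], "/", row["modifiers"])
```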
Complex Sentence Processor (CSP): Data Flow
  Diagram components:
    Pronoun Resolution module (Prolog)
    Abstracts
    Gene Tagger
    Pre-Processor
    Complex Sentence Processor (Link Grammar, Java)
    Sentence database
Illustration
Partial List of References
  [Link] Sleator, D. and D. Temperley (1991). "Parsing English with a Link Grammar." Carnegie Mellon University Computer Science technical report CMU-CS-91-196, October 1991.
  [Kohn] Kohn, K. W. (1999). "Molecular Interaction Map of the Mammalian Cell Cycle Control and DNA Repair Systems." Molecular Biology of the Cell 10: 2703-2734.
  [Locuslink] Pruitt, K. D. and D. R. Maglott (2001). "RefSeq and LocusLink: NCBI gene-centered resources." Nucleic Acids Res 29(1): 137-140. (http://www.ncbi.nlm.nih.gov/LocusLink/)
  [Anaphora] Castaño, J., Zhang, J., Pustejovsky, J. "Anaphora Resolution in Biomedical Literature."