Biological information extraction from natural language text Chitta Baral Arizona State University.

Goal
Extract 'simple' information from text. This is somewhat simpler than complete natural language understanding.
Examples of 'simple' information (structure is anticipated):
– John was in Phoenix in March. at(John, Phoenix, March)
– Protein x in the presence of enzyme y breaks down to components z and w. breaks_in_presence_of(x, y, [z, w])
Not so 'simple' information (meta-information, unanticipated or untargeted structure):
– John only visits cities where he has a friend.

Main approach
Use extraction rules that can extract the targeted information.
– Extract P(X,Y,Z) from a sentence if, in that sentence, X is a proper noun, Y is a verb that immediately follows X, and Z is a noun phrase that immediately follows Y.
Coming up with extraction rules:
– Manually
– Learning extraction rules
  – Develop your own learning program.
  – Cast your problem appropriately so as to use existing learning programs (such as Progol, FOIL, etc.).
  – Take an existing information extraction system and make appropriate changes to it so that it applies to our case.
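The P(X,Y,Z) rule above can be tried out on a small hand-tagged sentence. The following is a minimal Python sketch, not the Prolog used in this presentation. The chunk representation (kind, tag, text), the function name extract_triples and the sample sentence are all assumptions made for illustration.

```python
def extract_triples(chunks):
    """Apply the rule: proper noun X, then a verb Y, then a noun phrase Z,
    each immediately following the previous one.

    Each chunk is a (kind, tag, text) tuple. kind is "word" or "ng" (noun
    group); tag is a Penn Treebank tag for words and None for groups.
    This representation is an assumption of the sketch.
    """
    triples = []
    for i in range(len(chunks) - 2):
        x, y, z = chunks[i], chunks[i + 1], chunks[i + 2]
        if (x[0] == "word" and x[1] == "NNP"
                and y[0] == "word" and y[1].startswith("VB")
                and z[0] == "ng"):
            triples.append((x[2], y[2], z[2]))
    return triples


# Hypothetical, simplified input written for this example.
sentence = [
    ("word", "NNP", "HMBA"),
    ("word", "VBZ", "inhibits"),
    ("ng", None, "MEC-1 cell proliferation"),
    ("word", "IN", "by"),
    ("ng", None, "PCNA down-regulation"),
]

print(extract_triples(sentence))
```

Because the rule demands strict adjacency, a modal or adverb between the noun and the verb would already block the match, which is one reason rules usually need refinement after a first attempt.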

Learning extraction rules
– Mark the text that is to be extracted.
– Parse the marked text and do part-of-speech tagging.
– Extract a pattern.
– Use the pattern on other text, and add conditions or modify the pattern to avoid false positives.
– Repeat the above steps until acceptable performance is achieved.
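One round of this loop can be sketched in Python as follows. This is an illustrative sketch, not the presentation's learner. The tuple layout (kind, tag, text, role), the function names, the allowed_verbs condition and the "John visit the city" sentence are assumptions introduced here.

```python
def learn_pattern(marked_chunks):
    """Generalize a marked sentence into a pattern.

    Only chunks carrying a role mark (arg1, arg2, arg3) are kept, and their
    text is dropped, so the pattern is a list of (kind, tag, role) slots.
    """
    return [(kind, tag, role)
            for kind, tag, _text, role in marked_chunks
            if role is not None]


def match(pattern, chunks, allowed_verbs=None):
    """Fill the pattern slots left to right, skipping chunks that do not fit.

    allowed_verbs is an optional extra condition on the arg2 slot, standing
    in for the 'add conditions' step. Returns {role: text} or None.
    """
    binding, i = {}, 0
    for kind, tag, text in chunks:
        if i == len(pattern):
            break
        p_kind, p_tag, role = pattern[i]
        if (kind, tag) != (p_kind, p_tag):
            continue
        if role == "arg2" and allowed_verbs is not None and text not in allowed_verbs:
            continue
        binding[role] = text
        i += 1
    return binding if i == len(pattern) else None


# Marked training sentence (simplified and flattened for this sketch).
marked = [
    ("word", "NNP", "HMBA", "arg1"),
    ("word", "MD", "could", None),
    ("word", "VB", "inhibit", "arg2"),
    ("ng", None, "the MEC-1 cell proliferation", "arg3"),
    ("word", "IN", "by", None),
]
pattern = learn_pattern(marked)

# Hypothetical unrelated sentence: the raw pattern matches it (a false positive).
other = [("word", "NNP", "John"), ("word", "VB", "visit"), ("ng", None, "the city")]
print(match(pattern, other))
# With an added condition on the verb slot, the false positive goes away.
print(match(pattern, other, allowed_verbs={"inhibit", "induce"}))
```

Restricting the verb slot is only one possible condition. Which conditions are worth adding depends on the false positives actually seen on the other text.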

An example
HMBA could inhibit the MEC-1 cell proliferation by down-regulation of PCNA expression, it could also induce apoptosis effectively that might be through the way of up-regulation of bax and bcl-2 gene expression.
Interaction(HMBA, inhibit, MEC-1 cell proliferation)
Interaction(HMBA, down-regulation, PCNA expression)

Parsing and POS tagging
[ word([tag = 'NNP', arg(1)], 'HMBA'),
  vg([ word([tag = 'MD'], 'could'),
       word([tag = 'VB', arg(2)], 'inhibit') ]),
  ng([arg(3)],
     [ word([tag = 'DT'], 'the'),
       word([tag = 'NNP'], 'MEC-1'),
       word([tag = 'NN'], 'cell'),
       word([tag = 'NN'], 'proliferation') ]),
  word([tag = 'IN'], 'by'),
  word([tag = 'NN'], 'down-regulation'),
  word([tag = 'IN'], 'of'),
  ng([ word([tag = 'NNP'], 'PCNA'),
       word([tag = 'NN'], 'expression') ]),
  word([tag = ','], ','),
  word([tag = 'PRP'], 'it'),
  vg([ word([tag = 'MD'], 'could'),
       word([tag = 'RB'], 'also'),
       word([tag = 'VB'], 'induce') ]),
  word([tag = 'NN'], 'apoptosis'),
  word([tag = 'RB'], 'effectively'),
  word([tag = 'WDT'], 'that'),
  vg([ word([tag = 'MD'], 'might'),
       word([tag = 'VB'], 'be') ]),
  word([tag = 'IN'], 'through'),
  ng([ word([tag = 'DT'], 'the'),
       word([tag = 'NN'], 'way') ]),
  word([tag = 'IN'], 'of'),
  word([tag = 'NN'], 'up-regulation'),
  word([tag = 'IN'], 'of'),
  word([tag = 'NN'], 'bax'),
  word([tag = 'CC'], 'and'),
  ng([ word([tag = 'JJ'], 'bcl-2'),
       word([tag = 'NN'], 'gene'),
       word([tag = 'NN'], 'expression') ])
]

An alternate way to code
sentence(s).
first(s, p1).
next(p1,p2). next(p2,p3). next(p3,p4). next(p4,p5). next(p5,p6).
next(p6,p7). next(p7,p8). next(p8,p9). next(p9,p10). next(p10,p11).
next(p11,p12). next(p12,p13). next(p13,p14). next(p14,p15). next(p15,p16).
next(p16,p17). next(p17,p18). next(p18,p19). next(p19,p20). next(p20,empty).
type(p1, word). tag(p1, nnp). content(p1, hmba). marked(p1,arg1).
type(p2, vg).
…
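Facts in this flat style could be produced mechanically from a list of chunks. The Python sketch below shows one way to do it. It covers only the top-level positions, because how the words inside a vg or ng group are encoded is not shown above. The dictionary keys and the helper names atom and encode_sentence are assumptions of this sketch.

```python
import re


def atom(text):
    """Render text as a Prolog atom: lowercase it, and quote it unless it is
    a plain identifier (so 'MEC-1' becomes the quoted atom 'mec-1')."""
    lowered = text.lower()
    if re.fullmatch(r"[a-z][a-z0-9_]*", lowered):
        return lowered
    return "'" + lowered.replace("'", "''") + "'"


def encode_sentence(sent_id, chunks):
    """Return first/next/type/tag/content/marked facts for top-level chunks.

    Each chunk is a dict with a required 'type' and optional 'tag',
    'content' and 'marked' entries (an assumed input format).
    """
    ids = [f"p{i}" for i in range(1, len(chunks) + 1)]
    facts = [f"sentence({sent_id})."]
    if not ids:
        return facts
    facts.append(f"first({sent_id}, {ids[0]}).")
    # Chain each position to its successor; the last one points to 'empty'.
    for here, following in zip(ids, ids[1:] + ["empty"]):
        facts.append(f"next({here},{following}).")
    for pid, chunk in zip(ids, chunks):
        facts.append(f"type({pid}, {chunk['type']}).")
        for key in ("tag", "content", "marked"):
            if key in chunk:
                facts.append(f"{key}({pid}, {atom(chunk[key])}).")
    return facts


chunks = [
    {"type": "word", "tag": "NNP", "content": "HMBA", "marked": "arg1"},
    {"type": "vg"},
]
print("\n".join(encode_sentence("s", chunks)))
```

A flat encoding like this is the kind of input relational learners generally expect, which is presumably why it is offered as an alternative to the nested list.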

POS tags
NNP – proper noun
MD – modal
VB – verb, base form
DT – determiner
NN – common noun
IN – preposition
PRP – personal pronoun
RB – adverb
WDT – wh-determiner
CC – coordinating conjunction
JJ – adjective

Extracted interaction rule
extract( [ word([tag = 'NNP'], _h18724),
           word([tag = 'VB'], _h18725),
           ng(_h18726) ],
         interact(_h18724, _h18725, _h18726),
         true ).

Tagged text
interact(HMBA,
         [word([tag = MD], could), word([tag = VB], inhibit)],
         [word([tag = DT], the), word([tag = NNP], MEC-1),
          word([tag = NN], cell), word([tag = NN], proliferation)]).
interact(HMBA,
         down-regulation,
         [word([tag = NNP], PCNA), word([tag = NN], expression)]).

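Prolog code for learning extraction rules
:- import append/3 from basics.

% learn(S): S is the tagged text, I the interaction fact,
% P the extraction pattern.
learn(S) :-
    find_interact(S, I, P),
    nl, write(I),
    nl, write(P),
    write_file(P, I).

% The word marked arg(1) contributes a slot to the pattern. Its content is
% ignored, so A stays unbound and the learned pattern generalizes over it.
find_interact([word([T, arg(1)], _) | R], interact(A, B, C), P) :-
    pattern([word([T], A) | PR], P),
    find_interact(R, interact(A, B, C), PR).
% More rules for find_interact.

pattern(W, P) :- P = W.

% Append the learned rule to extract.P. writeq/2 keeps the quotes on atoms
% such as 'NNP', so the file can be loaded back later.
write_file(P, I) :-
    E = extract(P, I, true),
    open('extract.P', append, F),
    writeq(F, E), write(F, '.'), nl(F),
    close(F).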

A set of extraction patterns
extract([word([tag = 'NNP'],_h13664), word([tag = 'VB'],_h13665), word([tag = 'NNP'],_h13666)],
        interact(_h13664,_h13665,_h13666), true).
extract([word([tag = 'NNP'],_h62915), vg(_h62916), ng(_h62917)],
        interact(_h62915,_h62916,_h62917), true).
extract([word([tag = 'NNP'],_h112469), word([tag = 'NN'],_h112470), ng(_h112471)],
        interact(_h112469,_h112470,_h112471), true).
extract([word([tag = 'NNP'],_h161953), word([tag = 'NN'],_h161954), word([tag = 'NNP'],_h161955)],
        interact(_h161953,_h161954,_h161955), true).
extract([word([tag = 'VB'],_h17857), vg(_h17858), ng(_h17859)],
        interact(_h17857,_h17858,_h17859), true).
extract([word([tag = 'NNP'],_h42739), word([tag = 'NN'],_h42740), ng(_h42741)],
        interact(_h42739,_h42740,_h42741), true).
extract([word([tag = 'NNP'],_h44071), word([tag = 'NN'],_h44072), ng(_h44073)],
        interact(_h44071,_h44072,_h44073), true).
extract([word([tag = 'NNP'],_h16431), word([tag = 'NN'],_h16432), ng(_h16433)],
        interact(_h16431,_h16432,_h16433), true).

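Code that extracts patterns
:- load_dyn('extract.P').

% matcher(Sentence, Pattern, _): the pattern elements occur in the sentence
% in order; sentence elements that do not match are skipped.
matcher(_, [], _).
matcher([SH|ST], [SH|PT], _) :- matcher(ST, PT, _).
matcher([SH|ST], [PH|PT], _) :- SH \== PH, matcher(ST, [PH|PT], _).

run(S) :- process(S).

% Failure-driven loop: try every stored rule and record each match.
process(S) :-
    extract(P, F, _),
    matcher(S, P, _),
    write_file(F),
    fail.
process(_).

% Append an extracted interaction fact to interact.P.
write_file(I) :-
    open('interact.P', append, File),
    write(File, I), write(File, '.'), nl(File),
    close(File).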
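The matching behaviour can be illustrated outside Prolog. Because the pattern elements contain unbound variables, the matcher may skip any sentence element, and the failure-driven loop in process enumerates every alignment. The Python generator below imitates that under simplifying assumptions: chunks are flat (kind, tag, text) tuples, patterns are (kind, tag, slot) tuples, and the names matches and run_patterns are made up for this sketch.

```python
def matches(chunks, pattern, binding=()):
    """Yield every way to align the pattern slots, in order, with a
    subsequence of the chunks. Each result maps slot name to chunk text."""
    if not pattern:
        yield dict(binding)
        return
    if not chunks:
        return
    kind, tag, slot = pattern[0]
    c_kind, c_tag, c_text = chunks[0]
    # First alternative: consume this chunk for the current slot.
    if (c_kind, c_tag) == (kind, tag):
        yield from matches(chunks[1:], pattern[1:], binding + ((slot, c_text),))
    # Second alternative: skip this chunk and keep looking.
    yield from matches(chunks[1:], pattern, binding)


def run_patterns(chunks, patterns):
    """Try every pattern and collect every binding, like the fail loop."""
    return [b for p in patterns for b in matches(chunks, p)]


# Simplified, flattened version of the example sentence (an assumption).
chunks = [
    ("word", "NNP", "HMBA"),
    ("word", "MD", "could"),
    ("word", "VB", "inhibit"),
    ("ng", None, "the MEC-1 cell proliferation"),
    ("word", "IN", "by"),
    ("ng", None, "PCNA expression"),
]
pattern = [("word", "NNP", "x"), ("word", "VB", "y"), ("ng", None, "z")]

for found in run_patterns(chunks, [pattern]):
    print(found)
```

On this input the sketch yields two bindings, one with each noun group in the z slot. The second pairs "inhibit" with "PCNA expression", which shows how skipping elements can over-generate and why patterns may need extra conditions.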

Applications of interest
– Finding interactions between genes and proteins.
– Given a set of genes, say obtained from microarray experiments, use the extracted information to get a rough idea of the genes and proteins that interact with them. Then build a pathway.
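As a rough illustration of the lookup step, extracted interact facts can be indexed by participant so that the partners of a given gene set are easy to list. The Python sketch below assumes the facts are available as (agent, relation, target) triples. The function names are hypothetical, and building an actual pathway would need considerably more than this.

```python
from collections import defaultdict


def build_interaction_map(interactions):
    """Index (agent, relation, target) triples by participant, treating
    each interaction as an undirected link between agent and target."""
    partners = defaultdict(set)
    for agent, _relation, target in interactions:
        partners[agent].add(target)
        partners[target].add(agent)
    return partners


def partners_of(genes, partners):
    """For each gene of interest, list its known partners in sorted order."""
    return {gene: sorted(partners.get(gene, ())) for gene in genes}


# The two interactions from the example sentence, as triples.
interactions = [
    ("HMBA", "inhibit", "MEC-1 cell proliferation"),
    ("HMBA", "down-regulation", "PCNA expression"),
]
graph = build_interaction_map(interactions)
print(partners_of(["HMBA", "PCNA expression"], graph))
```

The relation label is ignored here for brevity. Keeping it on the edges would be the natural next step when direction and type of interaction matter.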