Biological information extraction from natural language text Chitta Baral Arizona State University.


1 Biological information extraction from natural language text Chitta Baral Arizona State University

2 Goal
Extract 'simple' information from text. This is somewhat simpler than complete natural language understanding.
Examples of 'simple' information (the structure is anticipated):
– "John was in Phoenix in March" gives at(John, Phoenix, March)
– "Protein x, in the presence of enzyme y, breaks down into components z and w" gives breaks_in_presence_of(x, y, [z, w])
Not so 'simple' information (meta-information, unanticipated or untargeted structure):
– "John only visits cities where he has a friend"

3 Main approach
Use extraction rules that can extract the targeted information:
– Extract P(X,Y,Z) from a sentence if, in that sentence, X is a proper noun, Y is a verb that immediately follows X, and Z is a noun phrase that immediately follows Y.
Coming up with extraction rules:
– Manually
– Learning extraction rules
  – Develop your own learning program
  – Cast your problem appropriately so as to use existing learning programs (such as Progol, FOIL, etc.)
  – Take an existing information extraction system and change it so that it applies to our case
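The P(X,Y,Z) rule above can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the chunk representation (kind, tag, text), the function name extract_p, and the sample sentence are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Assumed representation: (kind, tag, text), where kind is "word" or
# "ng" (noun group) and tag is a POS tag (None for a noun group).
Chunk = Tuple[str, Optional[str], str]

def extract_p(chunks: List[Chunk]) -> List[Tuple[str, str, str]]:
    """Return (X, Y, Z) wherever a proper noun is immediately followed
    by a verb, which is immediately followed by a noun group."""
    found = []
    for i in range(len(chunks) - 2):
        x, y, z = chunks[i], chunks[i + 1], chunks[i + 2]
        is_proper_noun = x[0] == "word" and x[1] == "NNP"
        is_verb = y[0] == "word" and (y[1] or "").startswith("VB")
        is_noun_group = z[0] == "ng"
        if is_proper_noun and is_verb and is_noun_group:
            found.append((x[2], y[2], z[2]))
    return found

# Hypothetical, already chunked sentence: "HMBA inhibits MEC-1 cell proliferation"
sentence = [
    ("word", "NNP", "HMBA"),
    ("word", "VBZ", "inhibits"),
    ("ng", None, "MEC-1 cell proliferation"),
]
print(extract_p(sentence))
```

Because the rule requires strict adjacency, an intervening modal such as "could" would block the match; handling such cases is what the learned patterns on the later slides are for.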

4 Learning extraction rules
– Mark, in the text, what is to be extracted.
– Parse the marked text and do part-of-speech tagging.
– Extract a pattern.
– Use the pattern on other text, and add conditions or modify the pattern to avoid false positives.
– Repeat the above steps until acceptable performance is achieved.

5 An example
Sentence: "HMBA could inhibit the MEC-1 cell proliferation by down-regulation of PCNA expression, it could also induce apoptosis effectively that might be through the way of up-regulation of bax and bcl-2 gene expression."
Targeted extractions:
Interaction(HMBA, inhibit, MEC-1 cell proliferation)
Interaction(HMBA, down-regulation, PCNA expression)

6 Parsing and POS tagging
    [ word([tag = 'NNP', arg(1)], 'HMBA'),
      vg([ word([tag = 'MD'], 'could'),
           word([tag = 'VB', arg(2)], 'inhibit') ]),
      ng([arg(3)],
         [ word([tag = 'DT'], 'the'),
           word([tag = 'NNP'], 'MEC-1'),
           word([tag = 'NN'], 'cell'),
           word([tag = 'NN'], 'proliferation') ]),
      word([tag = 'IN'], 'by'),
      word([tag = 'NN'], 'down-regulation'),
      word([tag = 'IN'], 'of'),
      ng([ word([tag = 'NNP'], 'PCNA'),
           word([tag = 'NN'], 'expression') ]),
      word([tag = ','], ','),
      word([tag = 'PRP'], 'it'),
      vg([ word([tag = 'MD'], 'could'),
           word([tag = 'RB'], 'also'),
           word([tag = 'VB'], 'induce') ]),
      word([tag = 'NN'], 'apoptosis'),
      word([tag = 'RB'], 'effectively'),
      word([tag = 'WDT'], 'that'),
      vg([ word([tag = 'MD'], 'might'),
           word([tag = 'VB'], 'be') ]),
      word([tag = 'IN'], 'through'),
      ng([ word([tag = 'DT'], 'the'),
           word([tag = 'NN'], 'way') ]),
      word([tag = 'IN'], 'of'),
      word([tag = 'NN'], 'up-regulation'),
      word([tag = 'IN'], 'of'),
      word([tag = 'NN'], 'bax'),
      word([tag = 'CC'], 'and'),
      ng([ word([tag = 'JJ'], 'bcl-2'),
           word([tag = 'NN'], 'gene'),
           word([tag = 'NN'], 'expression') ]) ]

7 An alternate way to code
    sentence(s).
    first(s, p1).
    next(p1, p2).   next(p2, p3).   next(p3, p4).   next(p4, p5).
    next(p5, p6).   next(p6, p7).   next(p7, p8).   next(p8, p9).
    next(p9, p10).  next(p10, p11). next(p11, p12). next(p12, p13).
    next(p13, p14). next(p14, p15). next(p15, p16). next(p16, p17).
    next(p17, p18). next(p18, p19). next(p19, p20). next(p20, empty).
    type(p1, word). tag(p1, nnp). content(p1, hmba). marked(p1, arg1).
    type(p2, vg). …
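A flat encoding of this kind can be generated mechanically from a chunk list. The Python sketch below shows one way to do it; the function name to_facts and the dictionary keys are assumptions for illustration, and the sketch lowercases values as the slide does without quoting atoms that contain hyphens or other special characters.

```python
def to_facts(sentence_id, chunks):
    """Turn a list of chunk dicts into first/next/type/tag/content/marked facts.

    Each chunk is a dict with a required "type" key and optional
    "tag", "content" and "marked" keys (assumed layout).
    """
    facts = [f"sentence({sentence_id})."]
    ids = [f"p{i}" for i in range(1, len(chunks) + 1)]
    if ids:
        facts.append(f"first({sentence_id}, {ids[0]}).")
    # Chain the positions; the last one points to the constant 'empty'.
    for current, following in zip(ids, ids[1:] + ["empty"]):
        facts.append(f"next({current}, {following}).")
    for pid, chunk in zip(ids, chunks):
        facts.append(f"type({pid}, {chunk['type']}).")
        for key in ("tag", "content", "marked"):
            if key in chunk:
                facts.append(f"{key}({pid}, {str(chunk[key]).lower()}).")
    return facts

# First two positions of the example sentence.
chunks = [
    {"type": "word", "tag": "NNP", "content": "HMBA", "marked": "arg1"},
    {"type": "vg"},
]
for fact in to_facts("s", chunks):
    print(fact)
```

A relational form like this is the kind of input that learners such as Progol or FOIL work on, which is presumably why the slide offers it as an alternative to the nested list.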

8 POS tags
NNP – proper noun
MD – modal
VB – verb, base form
DT – determiner
NN – common noun
IN – preposition
PRP – personal pronoun
RB – adverb
WDT – wh-determiner
CC – coordinating conjunction
JJ – adjective

9 Extracted interaction rule
    extract( [ word([tag = 'NNP'], _h18724),
               word([tag = 'VB'], _h18725),
               ng(_h18726) ],
             interact(_h18724, _h18725, _h18726),
             true).
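The step from a marked sentence to a rule like this one can be sketched as follows. This is an illustrative Python sketch, not the Prolog learner from the slides: the name learn_pattern and the flat (kind, tag, arg) input are assumptions, with verb groups taken to be already flattened into their words.

```python
def learn_pattern(marked_chunks):
    """Keep only the chunks marked as arguments and drop their content.

    marked_chunks: list of (kind, tag, arg), where arg is 1, 2 or 3 for
    a marked chunk and None otherwise (assumed representation).
    Returns the pattern as a list of (kind, tag) pairs in sentence order;
    each pair stands for one variable slot in the rule.
    """
    return [(kind, tag) for kind, tag, arg in marked_chunks if arg is not None]

# Start of the example sentence: HMBA could inhibit [the MEC-1 cell proliferation]
marked = [
    ("word", "NNP", 1),    # HMBA, marked arg(1)
    ("word", "MD", None),  # could, unmarked
    ("word", "VB", 2),     # inhibit, marked arg(2)
    ("ng", None, 3),       # noun group, marked arg(3)
]
print(learn_pattern(marked))
```

The result corresponds to the three pattern elements of the rule: a word tagged NNP, a word tagged VB, and a noun group.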

10 Tagged text
    interact('HMBA',
             [word([tag = 'MD'], could), word([tag = 'VB'], inhibit)],
             [word([tag = 'DT'], the), word([tag = 'NNP'], 'MEC-1'),
              word([tag = 'NN'], cell), word([tag = 'NN'], proliferation)]).
    interact('HMBA',
             'down-regulation',
             [word([tag = 'NNP'], 'PCNA'), word([tag = 'NN'], expression)]).

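11 Prolog code for learning extraction rules
    :- import append/3 from basics.

    % learn(S): S is the tagged text, I the interaction fact,
    % P the extraction pattern derived from it.
    learn(S) :-
        find_interact(S, I, P),
        nl, write(I),
        nl, write(P),
        write_file(P, I).

    % A word marked arg(1) becomes a pattern element whose content is
    % the variable A, shared with the interaction fact.
    find_interact([word([T, arg(1)], _) | R], interact(A, B, C), P) :-
        pattern([word([T], A) | PR], P),
        find_interact(R, interact(A, B, C), PR).
    % More rules for find_interact.

    pattern(W, P) :- P = W.

    % Append the rule to extract.P. writeq/2 keeps the quotes around
    % atoms such as 'NNP', so they are not read back as variables.
    write_file(P, I) :-
        E = extract(P, I, true),
        open('extract.P', append, F),
        writeq(F, E), write(F, '.'), nl(F),
        close(F).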

12 A set of extraction patterns
    extract([word([tag = 'NNP'],_h13664), word([tag = 'VB'],_h13665), word([tag = 'NNP'],_h13666)],
            interact(_h13664,_h13665,_h13666), true).
    extract([word([tag = 'NNP'],_h62915), vg(_h62916), ng(_h62917)],
            interact(_h62915,_h62916,_h62917), true).
    extract([word([tag = 'NNP'],_h112469), word([tag = 'NN'],_h112470), ng(_h112471)],
            interact(_h112469,_h112470,_h112471), true).
    extract([word([tag = 'NNP'],_h161953), word([tag = 'NN'],_h161954), word([tag = 'NNP'],_h161955)],
            interact(_h161953,_h161954,_h161955), true).
    extract([word([tag = 'VB'],_h17857), vg(_h17858), ng(_h17859)],
            interact(_h17857,_h17858,_h17859), true).
    extract([word([tag = 'NNP'],_h42739), word([tag = 'NN'],_h42740), ng(_h42741)],
            interact(_h42739,_h42740,_h42741), true).
    extract([word([tag = 'NNP'],_h44071), word([tag = 'NN'],_h44072), ng(_h44073)],
            interact(_h44071,_h44072,_h44073), true).
    extract([word([tag = 'NNP'],_h16431), word([tag = 'NN'],_h16432), ng(_h16433)],
            interact(_h16431,_h16432,_h16433), true).

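13 Code that extracts patterns
    :- load_dyn('extract.P').

    % matcher(Sentence, Pattern, _): match the pattern elements in order,
    % skipping sentence elements that are not identical to the current
    % pattern element.
    matcher(_, [], _).
    matcher([SH|ST], [SH|PT], _) :- matcher(ST, PT, _).
    matcher([SH|ST], [PH|PT], _) :- SH \== PH, matcher(ST, [PH|PT], _).

    run(S) :- process(S).

    % Failure-driven loop: try every stored pattern against S and
    % write each resulting interaction fact.
    process(S) :- extract(P, F, _), matcher(S, P, _), write_file(F), fail.
    process(_).

    % Append one interaction fact to interact.P.
    write_file(I) :-
        open('interact.P', append, File),
        write(File, I), write(File, '.'), nl(File),
        close(File).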
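The matching step can be illustrated outside Prolog as well. The Python sketch below mirrors the matcher only loosely: it returns the first greedy match, allows gaps between pattern elements as the Prolog matcher appears to, and uses an assumed (kind, tag, text) chunk layout and the hypothetical function name match_pattern.

```python
def match_pattern(chunks, pattern):
    """Match pattern elements in order, skipping chunks that do not fit.

    chunks:  list of (kind, tag, text) tuples for one sentence.
    pattern: list of (kind, tag) pairs; a tag of None matches any tag.
    Returns the texts bound to the pattern slots, or None if the
    pattern cannot be completed.
    """
    bound = []
    position = 0  # index of the next pattern element to satisfy
    for kind, tag, text in chunks:
        if position == len(pattern):
            break
        wanted_kind, wanted_tag = pattern[position]
        if kind == wanted_kind and (wanted_tag is None or tag == wanted_tag):
            bound.append(text)
            position += 1
    return tuple(bound) if position == len(pattern) else None

sentence = [
    ("word", "NNP", "HMBA"),
    ("word", "MD", "could"),
    ("word", "VB", "inhibit"),
    ("ng", None, "the MEC-1 cell proliferation"),
]
pattern = [("word", "NNP"), ("word", "VB"), ("ng", None)]
print(match_pattern(sentence, pattern))
```

Allowing gaps lets the pattern skip the modal "could", but it also means a pattern can bind words that are far apart in the sentence, which is one source of the false positives mentioned on slide 4.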

14 Applications of interest
– Finding interactions between genes and proteins.
– Given a set of genes, say obtained from microarray experiments, use the extracted information to get a rough idea of the genes and proteins that interact with them.
– Then build a pathway.
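As a rough illustration of the second point, extracted interaction triples can be filtered by a gene set to list each gene's interaction partners. The sketch below is a minimal Python example under stated assumptions: the function name partners is invented here, and the gene and protein names are hypothetical placeholders, not extracted data.

```python
from collections import defaultdict

def partners(interactions, genes):
    """Map each gene of interest to the entities it interacts with.

    interactions: iterable of (agent, relation, target) triples.
    genes:        set of gene names, e.g. from a microarray experiment.
    """
    graph = defaultdict(set)
    for agent, _relation, target in interactions:
        if agent in genes:
            graph[agent].add(target)
        if target in genes:
            graph[target].add(agent)
    return dict(graph)

# Hypothetical placeholder triples, for illustration only.
interactions = [
    ("geneA", "inhibit", "proteinB"),
    ("proteinC", "activate", "geneA"),
    ("proteinD", "bind", "proteinE"),
]
print(partners(interactions, {"geneA"}))
```

The resulting adjacency map is only a starting point; assembling a pathway would also need the relation types and their direction.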

