
1 Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction. Mark Goadrich, Louis Oliphant, and Jude Shavlik, Department of Computer Sciences, University of Wisconsin–Madison, USA. 19 Sept 2004

2 Talk Outline: Inductive Logic Programming; Biomedical Information Extraction; Our Gleaner Approach; Aleph Ensembles; Evaluation and Results; Future Work

3 Inductive Logic Programming. Machine learning: classify data into categories; divide the data into train and test sets; generate hypotheses on the train set and then measure performance on the test set. In ILP, data are objects (person, block, molecule, word, phrase, …) and relations between them (grandfather, has_bond, is_member, …).

4 Learning daughter(A,B)
Positive examples: daughter(mary, ann), daughter(eve, tom)
Negative examples: daughter(tom, ann), daughter(eve, ann), daughter(ian, tom), daughter(ian, ann), …
Background knowledge: mother(ann, mary), mother(ann, tom), father(tom, eve), father(tom, ian), female(ann), female(mary), female(eve), male(tom), male(ian)
(Family-tree figure: Ann is the mother of Mary and Tom; Tom is the father of Eve and Ian.)
Possible rules:
daughter(A,B) :- true.
daughter(A,B) :- female(A).
daughter(A,B) :- female(A), male(B).
daughter(A,B) :- female(A), father(B,A).
daughter(A,B) :- female(A), mother(B,A).
…
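
To make the idea concrete, here is a minimal coverage-testing sketch, not from the talk: the background facts are taken from the slide, but the data layout and the covers helper are illustrative assumptions.

```python
# Background knowledge as a set of ground facts (from the slide above).
facts = {
    ("mother", "ann", "mary"), ("mother", "ann", "tom"),
    ("father", "tom", "eve"), ("father", "tom", "ian"),
    ("female", "ann"), ("female", "mary"), ("female", "eve"),
    ("male", "tom"), ("male", "ian"),
}

def covers(rule_body, a, b):
    """Check whether a rule body, e.g. [("female", "A"), ("mother", "B", "A")],
    is satisfied when A = a and B = b under the background facts."""
    binding = {"A": a, "B": b}
    return all(
        (pred, *[binding.get(arg, arg) for arg in args]) in facts
        for pred, *args in rule_body
    )

positives = [("mary", "ann"), ("eve", "tom")]
negatives = [("tom", "ann"), ("eve", "ann"), ("ian", "tom"), ("ian", "ann")]

rule1 = [("female", "A")]                       # daughter(A,B) :- female(A).
rule2 = [("female", "A"), ("mother", "B", "A")]  # daughter(A,B) :- female(A), mother(B,A).

print([covers(rule1, a, b) for a, b in positives])  # [True, True]  - covers all positives
print([covers(rule1, a, b) for a, b in negatives])  # [False, True, False, False]  - but one negative too
print([covers(rule2, a, b) for a, b in positives])  # [True, False]  - more precise, lower recall
print([covers(rule2, a, b) for a, b in negatives])  # [False, False, False, False]
```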

5 ILP Domains. Object learning: Trains, Carcinogenesis. Link learning: binary predicates relating pairs of objects.

6 Biomedical Information Extraction *image courtesy of National Human Genome Research Institute

7 Yeast Protein Database

8 Biomedical Information Extraction. Given: medical journal abstracts tagged with protein-localization relations. Do: construct a system to extract protein-localization phrases from unseen text. Example sentence: "NPL3 encodes a nuclear protein with an RNA recognition motif and similarities to a family of proteins involved in RNA metabolism."

9 Biomedical Information Extraction. (Figure: the sentence "NPL3 encodes a nuclear protein with …" annotated with part-of-speech tags such as verb, noun, article, adjective, and preposition, grouped into noun, verb, and prepositional phrases, with phrases flagged as containing alphanumeric words or marked location words.)

10 Sample Extraction Structure. Find structures using ILP. (Figure: an example clause drawn as a graph over the sentence: the sentence node S links phrase nodes, including a protein noun phrase P and a location noun phrase L, via predicates such as "contains alphanumeric", "contains marked location", and constraints on the words between the two phrases.)

11 Protein Localization Extraction. Hand-labeled dataset (Ray & Craven '01): 7,245 sentences from 871 abstracts. Examples are phrase-phrase combinations: 1,810 positive and 279,154 negative. 1.6 GB of background knowledge (structural, statistical, lexical, and ontological); in total, 200+ distinct background predicates.

12 Our Generate-and-Test Approach. (Figure: the sentence "NPL3 encodes a nuclear protein with …" is parsed, and every pair of candidate noun phrases generates a candidate rel(Prot, Loc) example for the learner to test.)
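
A minimal sketch of this candidate-generation step (illustrative only; the phrase list and function name are assumptions, not the authors' code):

```python
from itertools import permutations

def candidate_relations(noun_phrases):
    """Given the noun phrases of a parsed sentence, emit every ordered
    pair as a candidate rel(Prot, Loc) example for the learner to test."""
    return [("rel", prot, loc) for prot, loc in permutations(noun_phrases, 2)]

# Noun phrases as they might come from a shallow parse of the example sentence.
nps = ["NPL3", "a nuclear protein", "an RNA recognition motif"]
for cand in candidate_relations(nps):
    print(cand)   # e.g. ('rel', 'NPL3', 'a nuclear protein'), ...
```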

13 Some Ranking Predicates. High-scoring words in protein phrases: repressor, ypt1p, nucleoporin. High-scoring words in location phrases: cytoskeleton, inner, predominately. High-scoring words between the protein and location phrases: cofraction, mainly, primarily, …, locate. Stemming seemed to hurt here. Warning: these word statistics must be computed per fold, from training data only.

14 Some Biomedical Predicates. On-Line Medical Dictionary: a natural source for semantic classes, e.g., a word occurs in the category 'cell biology'. Medical Subject Headings (MeSH): the canonical method for indexing biomedical articles, an ISA hierarchy of words and subcategories. Gene Ontology (GO): another ISA hierarchy of biological knowledge.

15 Some More Predicates. Look-ahead phrase predicates: few_POS_in_phrase(Phrase, POS), phrase_contains_specific_word_triple(Phrase, W1, W2, W3), phrase_contains_some_marked_up_arg(Phrase, Arg#, Word, Fold). Relative location of phrases: protein_before_location(ExampleID), word_pair_in_between_target_phrases(ExampleID, W1, W2).

16 Link Learning. Large skew toward negatives: 500 relational objects with 5,000 positive links means 245,000 negative links. Difficult to measure success: an always-negative classifier is 98% accurate, and ROC curves look overly optimistic. Enormous quantity of data: 4,285,199,774 web pages indexed by Google; PubMed includes over 15 million citations.

17 Our Approach. Develop fast ensemble algorithms focused on recall and precision evaluation. Key ideas of Gleaner: keep a wide range of clauses, and create separate theories for different recall ranges. Evaluation: Area Under the Recall-Precision Curve (AURPC); Time = number of clauses considered.

18 Gleaner - Background. Prediction vs. actual: each example is predicted positive or negative, and the prediction is true or false, giving the confusion-matrix counts TP, FP, FN, and TN. We focus on the positive examples: Recall = TP / (TP + FN), Precision = TP / (TP + FP).
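
A quick sketch (an assumed helper, not from the slides) of the two metrics computed from raw counts; the example numbers come from the interpolation example on slide 27:

```python
def recall_precision(tp, fp, fn):
    """Recall and precision from confusion-matrix counts (TN is not needed)."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

print(recall_precision(tp=750, fp=4750, fn=250))  # (0.75, ~0.136)
```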

19 Gleaner - Background. Seed example: a positive example that our clause must cover. Bottom clause: all predicates which are true about the seed example. Rapid Random Restart (Zelezny et al., ILP 2002): stochastic selection of a starting clause followed by time-limited local heuristic search. We store a variety of clauses (based on recall).

20 Gleaner - Learning. (Figure: precision-recall space split into recall bins.) Create B bins, generate clauses, record the best clause found in each bin, and repeat for K seeds.

21 Gleaner - Combining. Combine the K clauses per bin: if at least L of the K clauses match, call the example positive. How to choose L? L = 1 gives high recall, low precision; L = K gives low recall, high precision. Our method: choose L such that the ensemble recall matches bin b; bin b's precision should then be higher than any clause in it. We should now have a set of high-precision rule sets spanning the space of recall levels (see the sketch below).
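
A minimal sketch of this "at least L of K" combination, assuming each clause is represented by the set of examples it matches on a tuning set (names and data layout are illustrative, not the authors' code):

```python
def choose_threshold(clause_matches, positives, target_recall):
    """Pick the largest L whose 'at least L of K clauses match' ensemble
    still reaches the bin's target recall on the positive examples."""
    K = len(clause_matches)
    for L in range(K, 0, -1):
        covered = sum(
            1 for ex in positives
            if sum(ex in matches for matches in clause_matches) >= L
        )
        if covered / len(positives) >= target_recall:
            return L
    return 1

def ensemble_predict(example, clause_matches, L):
    """Classify positive iff at least L of the K clauses match the example."""
    return sum(example in matches for matches in clause_matches) >= L
```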

22 How to Use Gleaner. Generate the recall-precision curve; the user selects a recall bin; return classifications with a precision confidence (e.g., Recall = 0.50, Precision = 0.70).

23 Aleph - Learning. Aleph learns theories of clauses (Srinivasan, v4, 2003): pick a positive seed example and find its bottom clause; use heuristic search to find the best clause; pick a new seed from the uncovered positives and repeat until a threshold of positives is covered. A theory produces one recall-precision point, and learning complete theories is time-consuming, but ensembles of theories can produce a ranking.
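
A schematic of this covering loop (a simplification, not Aleph's actual code; the search routine and the clause's covered method are assumed interfaces):

```python
def learn_theory(positives, negatives, search_best_clause, coverage_threshold=0.95):
    """Covering loop in the style of Aleph. `search_best_clause(seed, pos, neg)`
    is an assumed search routine returning a clause object with a
    .covered(examples) method, or None if no acceptable clause is found."""
    theory, uncovered = [], set(positives)
    while len(uncovered) > (1 - coverage_threshold) * len(positives):
        seed = next(iter(uncovered))                      # pick an uncovered positive
        clause = search_best_clause(seed, uncovered, negatives)
        if clause is None:
            break
        newly_covered = clause.covered(uncovered)
        if not newly_covered:
            break
        theory.append(clause)
        uncovered -= newly_covered                        # remove what the clause explains
    return theory
```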

24 Aleph Ensembles. We compare to ensembles of theories, following the algorithm of Dutra et al. (ILP 2002): use K different initial seeds, learn K theories each containing C clauses, and rank examples by the number of theories that cover them. C must be balanced for high performance: small C leads to low recall, while large C leads to converging theories.
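
A sketch of that ranking step (illustrative; here each theory is assumed to be a callable that returns whether it covers an example):

```python
def rank_by_theory_votes(examples, theories):
    """Score each example by how many of the K theories cover it, then sort
    by votes. Sweeping a threshold over these scores traces out a
    recall-precision curve for the ensemble."""
    scored = [(sum(theory(ex) for theory in theories), ex) for ex in examples]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```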

25 Aleph Ensembles (100 theories)

26 Evaluation Metrics. Area Under the Recall-Precision Curve (AURPC): all curves are standardized to cover the full recall range, and AURPC is averaged over the 5 folds. Number of clauses considered: a rough estimate of time. Both are "stop anytime" parallel algorithms. (Figure: recall-precision axes from 0 to 1.0.)

27 AURPC Interpolation. Is convex (straight-line) interpolation valid in RP space? No: precision interpolation is counterintuitive. Example with 1,000 positive and 9,000 negative examples:

TP    FP    TP Rate  FP Rate  Recall  Prec
500   500   0.50     0.06     0.50    0.50
750   4750  0.75     0.53     0.75    0.14
1000  9000  1.00     1.00     1.00    0.10

The middle row is the point halfway between the endpoints in ROC space; in RP space its precision (0.14) lies far below the straight line between 0.50 and 0.10. (Figure panels: example counts, RP curves, ROC curves.)
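
A sketch of the resulting non-linear interpolation between two RP points, following the TP/FP counting argument above (parameter names are mine; this is a standard construction, not code from the talk):

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_pos):
    """Interpolate between points A and B in recall-precision space by stepping
    TP one at a time and adding false positives at the local rate, rather than
    drawing a straight line between the two precision values. Assumes tp_b > tp_a."""
    points = []
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)          # local skew between A and B
    for extra_tp in range(tp_b - tp_a + 1):
        tp = tp_a + extra_tp
        fp = fp_a + extra_tp * fp_per_tp
        points.append((tp / total_pos, tp / (tp + fp)))  # (recall, precision)
    return points

curve = interpolate_pr(tp_a=500, fp_a=500, tp_b=1000, fp_b=9000, total_pos=1000)
print(curve[250])   # the midpoint: recall 0.75, precision ~0.14, not 0.30
```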

28 AURPC Interpolation

29 Experimental Methodology. Performed five-fold cross-validation, varying the parameters of each system. Gleaner (20 recall bins): # seeds = {25, 50, 75, 100}; # clauses = {1K, 10K, 25K, 50K, 100K, 250K, 500K}. Aleph ensembles (0.75 minacc, 35,000 nodes): # theories = {10, 25, 50, 75, 100}; # clauses per theory = {1, 5, 10, 15, 20, 25, 50}.
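
The same parameter sweep written out as a configuration sketch (values copied from the slide; the dictionary layout is my own):

```python
# Parameter grids for the five-fold cross-validation sweep.
gleaner_grid = {
    "recall_bins": 20,
    "num_seeds": [25, 50, 75, 100],
    "clauses_considered": [1_000, 10_000, 25_000, 50_000, 100_000, 250_000, 500_000],
}
aleph_ensemble_grid = {
    "minacc": 0.75,
    "search_nodes": 35_000,
    "num_theories": [10, 25, 50, 75, 100],
    "clauses_per_theory": [1, 5, 10, 15, 20, 25, 50],
}
```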

30 Results: Testfold 5 at 1,000,000 clauses. (Figure: recall-precision curves for the Aleph ensembles and for Gleaner.)

31 Results: Gleaner vs Aleph Ensembles

32 Further Results

33 Conclusions. Gleaner: focuses on recall and precision, keeps a wide spectrum of clauses, and gives good results in few CPU cycles. Aleph ensembles: 'early stopping' is helpful, but they require more CPU cycles. AURPC: a useful metric for comparison, though its interpolation is unintuitive.

34 Future Work. Improve Gleaner performance over time; explore alternate clause combinations; gain a better understanding of AURPC and search for clauses that optimize it; examine more ILP link-learning datasets; use Gleaner with other ML algorithms.

35 Acknowledgements. USA NLM Grant 5T15LM007359-02; USA NLM Grant 1R01LM07050-01; USA DARPA Grant F30602-01-2-0571; USA Air Force Grant F30602-01-2-0571. Thanks to the Condor Group; David Page; Vitor Santos Costa, Ines Dutra; Soumya Ray, Marios Skounakis, Mark Craven. Dataset available at (URL in proceedings): ftp://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location

36 Deleted Scenes. Clause Weighting; Gleaner Algorithm; Director Commentary (on/off).

37 Take-Home Message. Definition of a gleaner: one who gathers grain left behind by reapers. Gleaner and ILP: many clauses are constructed and evaluated during ILP hypothesis search, and we need to make better use of those that are not the highest-scoring ones. Thanks. Questions?

38 Clause Weighting. Single-theory ensemble: rank examples by how many clauses cover them. Weight clauses using tuning-set statistics: CN2 (average precision of matching clauses), lowest false-positive-rate score, cumulative F1 score, recall, precision, diversity.
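
A minimal sketch of one such weighted vote, the CN2-style average precision of matching clauses (the data layout and the tuning-set precision estimates are assumptions):

```python
def cn2_score(example, clauses):
    """Score an example by the average tuning-set precision of the clauses
    that match it; examples matched only by imprecise clauses score low."""
    precisions = [c["precision"] for c in clauses if example in c["matches"]]
    return sum(precisions) / len(precisions) if precisions else 0.0
```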

39 Clause Weighting

40 Gleaner Algorithm.
Create B equal-sized recall bins.
For each of K different seeds: generate rules using Rapid Random Restart, and record the best rule (by precision x recall) found for each bin.
For each recall bin b: find the threshold L of K clauses such that the recall of "at least L of K clauses match" equals the recall for this bin.
Find recall and precision on the test set using each bin's "at least L of K" decision process. (A sketch of this procedure appears below.)
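
A compact sketch of the whole procedure, reusing the choose_threshold helper sketched after slide 21. It assumes `search(seed)` yields candidate clauses carrying tuning-set recall, precision, and coverage; all names here are illustrative, not the authors' implementation.

```python
def gleaner(seeds, search, positives, B=20):
    """Gleaner sketch: keep the best clause per recall bin per seed, then turn
    each bin into an 'at least L of K' voting classifier. Each clause from
    `search(seed)` is assumed to carry .recall, .precision, and .covered
    (the set of positive example IDs it matches) measured on a tuning set."""
    bins = [[] for _ in range(B)]
    for seed in seeds:
        best = [None] * B                      # best clause per recall bin for this seed
        for clause in search(seed):
            b = min(int(clause.recall * B), B - 1)
            if best[b] is None or clause.precision * clause.recall > best[b].precision * best[b].recall:
                best[b] = clause
        for b, clause in enumerate(best):
            if clause is not None:
                bins[b].append(clause)
    classifiers = []                           # one "at least L of K" voter per bin
    for b, clauses in enumerate(bins):
        if not clauses:
            continue
        target_recall = (b + 1) / B
        L = choose_threshold([c.covered for c in clauses], positives, target_recall)
        classifiers.append((clauses, L))
    return classifiers
```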

