Learning Ensembles of First-Order Clauses for Recall-Precision Curves. Preliminary Thesis Proposal. Mark Goadrich, Department of Computer Sciences, University of Wisconsin – Madison.


Learning Ensembles of First-Order Clauses for Recall-Precision Curves. Preliminary Thesis Proposal. Mark Goadrich, Department of Computer Sciences, University of Wisconsin – Madison, USA. 17 Dec 2004.

Talk Outline. Background: Inductive Logic Programming, Evaluation Metrics, Biomedical Information Extraction. Preliminary Work: Three Ensemble Approaches, Empirical Results. Proposed Work: Extensions to Algorithms, Theoretical Results.

Inductive Logic Programming. Machine Learning: classify data into positive and negative categories; divide data into train and test sets; generate hypotheses on the train set, then measure performance on the test set. In ILP, data are Objects (person, block, molecule, word, phrase, …) and Relations between them (grandfather, has_bond, is_member, …).

Learning daughter(A,B).
Positive Examples: daughter(mary, ann). daughter(eve, tom).
Negative Examples: daughter(tom, ann). daughter(eve, ann). daughter(ian, tom). daughter(ian, ann). …
Background Knowledge: mother(ann, mary). mother(ann, tom). father(tom, eve). father(tom, ian). female(ann). female(mary). female(eve). male(tom). male(ian).
[Figure: family tree in which Ann is the mother of Mary and Tom, and Tom is the father of Eve and Ian.]
Possible Clauses: daughter(A,B) :- true. daughter(A,B) :- female(A). daughter(A,B) :- female(A), male(B). daughter(A,B) :- female(A), father(B,A). daughter(A,B) :- female(A), mother(B,A). …
Correct Theory: daughter(A,B) :- female(A), mother(B,A). daughter(A,B) :- female(A), father(B,A).
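A minimal sketch (hypothetical Python encoding of the facts above, not part of the talk) showing that the correct theory covers both positive examples and none of the negatives:

```python
# Background knowledge as ground facts (tuples).
background = {
    ("mother", "ann", "mary"), ("mother", "ann", "tom"),
    ("father", "tom", "eve"), ("father", "tom", "ian"),
    ("female", "ann"), ("female", "mary"), ("female", "eve"),
    ("male", "tom"), ("male", "ian"),
}

positives = [("mary", "ann"), ("eve", "tom")]
negatives = [("tom", "ann"), ("eve", "ann"), ("ian", "tom"), ("ian", "ann")]

def covers(a, b):
    """The correct theory: daughter(A,B) :- female(A), mother(B,A).
    daughter(A,B) :- female(A), father(B,A)."""
    return ("female", a) in background and (
        ("mother", b, a) in background or ("father", b, a) in background)

tp = sum(covers(a, b) for a, b in positives)   # 2: covers all positives
fp = sum(covers(a, b) for a, b in negatives)   # 0: covers no negatives
```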

ILP Domains. Object Learning: Trains, Carcinogenesis. Link Learning: binary predicates.

Link Learning. Large skew toward negatives: 500 relational objects with 5,000 positive links leaves 245,000 negative links. Enormous quantity of data: 4,285,199,774 web pages indexed by Google; PubMed includes over 15 million citations. Difficult to measure success: an always-negative classifier is 98% accurate, and ROC curves look overly optimistic.
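The 98%-accuracy claim follows directly from the counts on this slide; a quick check in Python:

```python
# Arithmetic behind the slide's skew claim: with 500 relational objects
# there are 500 * 500 = 250,000 candidate links (ordered pairs), so
# 5,000 positives leave 245,000 negatives.
positives = 5_000
negatives = 245_000
total = positives + negatives

# A classifier that always answers "negative" is still 98% accurate...
always_negative_accuracy = negatives / total
# ...yet it finds none of the 5,000 true links (recall = 0).
```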

Evaluation Metrics. Classification (positive or negative) crossed with correctness (true or false) gives the confusion-matrix counts TP, FP, FN, TN.
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
True Positive Rate = TP / (TP + FN)
False Positive Rate = FP / (FP + TN)
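A minimal sketch of these metrics as functions of the confusion-matrix counts (the numbers below are toy values, not from the talk):

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of true positives recovered (same as true positive rate)."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Fraction of negatives incorrectly called positive."""
    return fp / (fp + tn)

# Toy counts: 40 TP, 10 FP, 60 FN, 990 TN.
p = precision(40, 10)               # 0.8
r = recall(40, 60)                  # 0.4
fpr = false_positive_rate(10, 990)  # 0.01
```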

Evaluation Metrics. Area Under the Recall-Precision Curve (AURPC): a cumulative measure over recall-precision space. All curves are standardized to cover the full recall range, and AURPC is averaged over the 5 folds.

AURPC Interpolation. Is convex (linear) interpolation valid in RP space? No: precision interpolation is counterintuitive. Example: 1,000 positive and 9,000 negative examples. [Table and figures: example counts (TP, FP, TP Rate, FP Rate, Recall, Prec) with the resulting RP and ROC curves.]
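The counterintuitive behavior can be made concrete with a sketch of the TP-stepping interpolation (the approach later published by Davis and Goadrich): between two RP points, step TP one unit at a time and let FP grow linearly in TP, which traces a non-linear curve in RP space. The counts below are illustrative, not taken from the slide's table:

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_pos):
    """Intermediate RP points between points A and B: step TP by one,
    let FP rise linearly in TP, then convert counts to (recall, precision)."""
    slope = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for step in range(tp_b - tp_a + 1):
        tp = tp_a + step
        fp = fp_a + slope * step
        points.append((tp / total_pos, tp / (tp + fp)))
    return points

# Illustrative: 1,000 positives; A = (TP=200, FP=100), B = (TP=400, FP=900).
curve = interpolate_pr(200, 100, 400, 900, 1000)
# The midpoint in recall is NOT the linear average of the two precisions.
```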

AURPC Interpolation

Biomedical Information Extraction *image courtesy of National Human Genome Research Institute

Biomedical Information Extraction. Given: Medical Journal abstracts tagged with protein localization relations. Do: construct a system to extract protein localization phrases from unseen text. Example: "NPL3 encodes a nuclear protein with an RNA recognition motif and similarities to a family of proteins involved in RNA metabolism."

Biomedical Information Extraction. Hand-labeled dataset (Ray & Craven '01): 7,245 sentences from 871 abstracts. Examples are phrase-phrase combinations: 1,810 positive & 279,154 negative. 1.6 GB of background knowledge (structural, statistical, lexical and ontological); in total, 200+ distinct background predicates.

Biomedical Information Extraction. [Figure: parse of "NPL3 encodes a nuclear protein with …" annotated with part-of-speech tags (verb, noun, article, adj, prep), phrase structure (sentence, noun phrase, verb phrase, prep phrase), and features such as alphanumeric and marked location.]

Related Work. Bagging in ILP (Dutra et al.); Boosting FOIL (Quinlan); Boosting ILP (Hoche); Structural HMMs (Ray and Craven); WAWA-IE (Eliassi-Rad and Shavlik); Markov Logic Networks (Richardson and Domingos); ELCS (Bunescu et al.).

Talk Outline. Background: Inductive Logic Programming, Evaluation Metrics, Biomedical Information Extraction. Preliminary Work: Three Ensemble Approaches, Empirical Results. Proposed Work: Extensions to Algorithms, Theoretical Results.

Aleph - Background. Seed Example: a positive example that our clause must cover. Bottom Clause: all predicates which are true about the seed example.

Aleph - Learning. Aleph learns theories of clauses (Srinivasan, v4, 2003): pick a positive seed example and find its bottom clause; use heuristic search to find the best clause; pick a new seed from the uncovered positives and repeat until a threshold of positives is covered. A theory produces one recall-precision point. Learning complete theories is time-consuming, but ensembles can produce a ranking.
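The covering loop can be sketched with a toy stand-in (Python; each candidate clause is abstracted to the set of positive-example indices it covers, and "best" simply means most uncovered positives, rather than Aleph's real heuristic search over the bottom clause):

```python
def learn_theory(n_pos, clause_pool, threshold=1.0):
    """Greedy sequential covering: repeatedly pick an uncovered seed,
    keep the best clause covering it, until enough positives are covered.
    clause_pool: list of sets of positive-example indices each clause covers."""
    uncovered = set(range(n_pos))
    theory = []
    while len(uncovered) > (1 - threshold) * n_pos:
        seed = min(uncovered)
        candidates = [c for c in clause_pool if seed in c]
        if not candidates:
            break
        best = max(candidates, key=lambda c: len(c & uncovered))
        theory.append(best)
        uncovered -= best
    return theory

theory = learn_theory(5, [{0, 1, 2}, {2, 3}, {3, 4}])
# Two clauses suffice: {0,1,2} for seed 0, then {3,4} for seed 3.
```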

ILP Ensembles. Three approaches: Aleph ensembles of multiple theories; clause weighting of one theory; Gleaner. Evaluation: Area Under Recall-Precision Curve (AURPC), with time measured as the number of clauses considered.

Aleph Ensembles. We construct ensembles of theories. Algorithm (Dutra et al., ILP 2002): use K different initial seeds; learn K theories each containing C clauses; rank examples by the number of theories that cover them. C must be balanced for high performance: small C leads to low recall, while large C leads to converging theories.
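A sketch of the ranking step, assuming each learned theory is represented as a predicate over examples (hypothetical representation, not Aleph's actual interface):

```python
def rank_by_theory_votes(examples, theories):
    """Score each example by how many theories classify it positive, then
    sort descending; sweeping a threshold over this ranking traces a curve."""
    scores = {e: sum(theory(e) for theory in theories) for e in examples}
    return sorted(examples, key=lambda e: -scores[e])

# Toy theories over integers, standing in for learned clause theories.
theories = [lambda x: x > 2, lambda x: x > 5, lambda x: x % 2 == 0]
ranked = rank_by_theory_votes([1, 4, 6, 7], theories)
# 6 gets 3 votes (top of the ranking); 1 gets 0 votes (bottom).
```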

Aleph Ensembles (100 theories)

Clause Weighting. Single Theory Ensemble: rank examples by how many clauses cover them. Weight clauses using tuning-set statistics: Ordered (rank by precision or lowest false positive rate), Average (among all matching clauses), or Cumulative, over statistics such as precision, diversity on negatives, F1 score, and recall.
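One of the schemes above, cumulative precision, can be sketched as follows (hypothetical representation: each clause is a coverage test paired with its tuning-set precision):

```python
def cumulative_precision_score(example, weighted_clauses):
    """An example's score is the sum of the tuning-set precisions of
    all clauses that cover it; ranking by this score yields a curve."""
    return sum(prec for covers, prec in weighted_clauses if covers(example))

# Two toy clauses over integers with assumed tuning-set precisions.
clauses = [(lambda x: x > 0, 0.9), (lambda x: x > 10, 0.6)]
# An example matched by both clauses outranks one matched by only one.
```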

Clause Weighting

Gleaner. Goal: develop fast ensemble algorithms focused on recall and precision evaluation. Definition of a gleaner: one who gathers grain left behind by reapers. Key ideas: keep a wide range of clauses, and create separate theories for different recall ranges.

Gleaner - Background. Rapid Random Restart (Zelezny et al., ILP 2002): stochastic selection of an initial clause; time-limited local heuristic search; randomly choose a new initial clause and repeat.

Gleaner - Learning. Create B bins across recall-precision space; generate clauses; record the best clause per bin; repeat for K seeds.

Gleaner - Combining. Combine the K clauses per bin: if at least L of the K clauses match, call the example positive. How to choose L? L=1 gives high recall, low precision; L=K gives low recall, high precision. Our method: choose L such that the ensemble recall matches bin b; bin b's precision should then be higher than any clause in it. We now have a set of high-precision rule sets spanning the space of recall levels.
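A sketch of the L-of-K combination, with a hypothetical pick_L that selects the largest L still reaching the bin's target recall (larger L keeps precision higher, per the trade-off above):

```python
def l_of_k_predict(example, clauses, L):
    """Call the example positive iff at least L of the K clauses cover it."""
    return sum(c(example) for c in clauses) >= L

def pick_L(clauses, positives, target_recall):
    """Scan L from K down to 1; return the largest L whose ensemble
    recall on the positives still reaches the bin's target recall."""
    for L in range(len(clauses), 0, -1):
        covered = sum(l_of_k_predict(p, clauses, L) for p in positives)
        if covered / len(positives) >= target_recall:
            return L
    return 1

# Toy bin of three clauses over integers; positives are 1..4.
clauses = [lambda x: x >= 1, lambda x: x >= 2, lambda x: x >= 3]
# Recall is 0.5 at L=3 and 0.75 at L=2, so a 0.7 target picks L=2.
L = pick_L(clauses, [1, 2, 3, 4], 0.7)
```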

How to use Gleaner. Generate the curve; the user selects a recall bin; return classifications with precision confidence. Example: Recall = 0.50, Precision = 0.70.

Experimental Methodology. Performed five-fold cross-validation, varying parameters:
Gleaner (20 recall bins): seeds = {25, 50, 75, 100}; clauses = {1K, 10K, 25K, 50K, 100K, 250K, 500K}.
Aleph Ensembles (0.75 minacc, 35,000 nodes): theories = {10, 25, 50, 75, 100}; clauses per theory = {1, 5, 10, 15, 20, 25, 50}.
Clause Weighting (1 Aleph theory): clauses = {25, 50, 100, 271}.

Empirical Results

Results: Testfold 5 at 1,000,000 clauses. [Figure: recall-precision curves for the Aleph ensembles and Gleaner.]

Results: Testfold 5 at 1,000,000 clauses

Conclusions. Gleaner focuses on recall and precision and keeps a wide spectrum of clauses. For Aleph ensembles, 'early stopping' is helpful. For clause weighting, cumulative statistics are important. AURPC is a useful metric for comparison, though its interpolation is unintuitive.

Talk Outline. Background: Inductive Logic Programming, Evaluation Metrics, Biomedical Information Extraction. Preliminary Work: Three Ensemble Approaches, Empirical Results. Proposed Work: Extensions to Algorithms, Theoretical Results.

Proposed Work. Improve Gleaner in high-recall areas: needs more emphasis on diverse clauses. Search for clauses that optimize AURPC, using RankBoost and an AURPC heuristic. Examine more ILP link-learning datasets, focusing within information extraction. Develop a better understanding of AURPC and its relationship with ROC curves and the F1 score.

Gleaner – Precision Bins. Create B bins; generate clauses; record the best clause per bin; repeat for K seeds.

Gleaner – Save Per Jump. Rapid Random Restart makes jumps: every 1,000 clauses, it moves to a new space to search. Saving the best clause per jump will increase diversity.

Gleaner – Negative Seeds. High-recall clauses are found at the top of the lattice. Perform breadth-first search, biasing the search away from negative examples.

ROC vs. RP Curves

ROC vs RP Curves. What is the relationship between ROC curves and RP curves? Will optimizing one optimize the other?

Optimizing AURPC WARNING! SLIDE INCOMPLETE!

Acknowledgements. USA NLM Grant 5T15LM. USA NLM Grant 1R01LM. USA DARPA Grant F. USA Air Force Grant F. Condor Group. David Page. Vitor Santos Costa, Ines Dutra. Soumya Ray, Marios Skounakis, Mark Craven.