Biomedical Information Extraction using Inductive Logic Programming
Mark Goadrich and Louis Oliphant
Advisor: Jude Shavlik
Acknowledgements to NLM training grant 5T15LM

Abstract

Automated methods are needed for finding relevant information in the large volume of biomedical literature. Information extraction (IE) is the process of finding facts in unstructured text, such as biomedical journal articles, and placing those facts in an organized system. Our research mines facts about a relationship (e.g., protein localization) from PubMed abstracts. We use Inductive Logic Programming (ILP) to learn a set of logical rules that explain when and where a relationship occurs in a sentence. We build rules by finding patterns in the syntactic as well as semantic information for each sentence in a training corpus that has previously been marked with the relationship. These rules can then be applied to unmarked text to find new instances of the relation. The major research issues in this approach include handling unbalanced data, searching the enormous space of clauses, learning probabilistic logical rules, and incorporating expert background knowledge.

The Central Dogma

Discoveries:
- protein-protein interactions
- protein localizations
- genetic diseases

Most knowledge is stored in articles. Just Google it?

*image courtesy of National Human Genome Research Institute

World of Publishing

Current:
- authors write articles in Word, LaTeX, etc., and publish in conferences, journals, and so on
- humans index and extract the relevant information (time- and cost-intensive)

Future?
- all published articles available on the Web
- semantic web: an extension of HTML for content
- articles automatically annotated and indexed into searchable databases

Information Extraction

Given: a set of abstracts tagged with biological relationships between phrases
Do: learn a theory (e.g., a set of inference rules) that accurately extracts these relations

Training Data

Why Use ILP?

KDD Cup 2002:
- handcrafted logical rules did best on the IE task

Hypotheses are comprehensible:
- written in first-order predicate calculus (FOPC)
- aim to cover only the positive examples

Background knowledge is easily incorporated:
- expert advice
- linguistic knowledge of English parse trees
- biomedical knowledge (e.g., MeSH)

ILP Example: Family Tree

Positive examples:
- daughter(mary, ann)
- daughter(eve, tom)

Negative examples:
- daughter(tom, ann)
- daughter(eve, ann)
- daughter(ian, tom)
- daughter(ian, ann)
- ...

Background knowledge:
mother(ann, mary)  mother(ann, tom)  father(tom, eve)  father(tom, ian)
female(ann)  female(mary)  female(eve)  male(tom)  male(ian)

(family-tree diagram: Ann is the mother of Mary and Tom; Tom is the father of Eve and Ian)

Possible rules:
- daughter(A,B) if male(A) and father(B,A)
- daughter(A,B) if mother(B,A)
- daughter(A,B) if female(A) and male(B)
- daughter(A,B) if female(A) and mother(B,A)
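
To make the rule search concrete, here is a minimal Python sketch (ours, not the actual ILP system) that scores the candidate rules above by how many of the positive and negative examples each one covers:

```python
# Minimal sketch (not the actual ILP system): score candidate rules for
# daughter(A,B) by how many positive and negative examples each covers.

background = {
    "mother": {("ann", "mary"), ("ann", "tom")},
    "father": {("tom", "eve"), ("tom", "ian")},
    "female": {"ann", "mary", "eve"},
    "male": {"tom", "ian"},
}

positives = {("mary", "ann"), ("eve", "tom")}
negatives = {("tom", "ann"), ("eve", "ann"), ("ian", "tom"), ("ian", "ann")}

# Each candidate rule body is a boolean function of (A, B).
rules = {
    "female(A) and mother(B,A)":
        lambda a, b: a in background["female"] and (b, a) in background["mother"],
    "female(A) and male(B)":
        lambda a, b: a in background["female"] and b in background["male"],
}

for name, body in rules.items():
    tp = sum(body(a, b) for a, b in positives)
    fp = sum(body(a, b) for a, b in negatives)
    print(f"daughter(A,B) if {name}: covers {tp} positives, {fp} negatives")
```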

Sundance Parsing

Shallow parse of "smf1 and smf2 are mitochondrial membrane_proteins":
- NP-Conj segment: smf1 (unk), and (conj), smf2 (unk)
- VP segment: are (cop)
- NP segment: mitochondrial (unk), membrane_proteins (noun)

Sentence structure predicates:
parent(smf1, np-conj seg), parent(np-conj seg, sentence), child(np-conj seg, smf1), child(sentence, np-conj seg), next(smf1, and), next(np-conj seg, vp seg), after(np-conj seg, np seg), ...

Part-of-speech predicates:
noun(membrane_proteins), verb(are), unk(smf1), noun_phrase(np seg), verb_phrase(vp seg), ...

Lexical word predicates:
novelword(smf1), novelword(smf2), alphabetic(and), alphanumeric(smf1), ...

Biomedical knowledge predicates:
in_med_dict(mitochondrial), go_mitochondrial_membrane(smf1), go_mitochondrion(smf1), ...
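
A minimal Python sketch (the parse representation is assumed; this is not the actual Sundance code) of how such structure predicates can be emitted from a shallow parse:

```python
# Sketch (assumed representation, not the actual Sundance code): emit
# sentence-structure predicates from a shallow parse, given as
# (segment_label, [tokens]) pairs.

parse = [
    ("np_conj_seg", ["smf1", "and", "smf2"]),
    ("vp_seg", ["are"]),
    ("np_seg", ["mitochondrial", "membrane_proteins"]),
]

facts = []
for label, tokens in parse:
    facts.append(f"child(sentence,{label})")
    facts.append(f"parent({label},sentence)")
    for tok in tokens:
        facts.append(f"child({label},{tok})")
        facts.append(f"parent({tok},{label})")
    for left, right in zip(tokens, tokens[1:]):
        facts.append(f"next({left},{right})")  # adjacent words
for (left, _), (right, _) in zip(parse, parse[1:]):
    facts.append(f"next({left},{right})")      # adjacent segments

print("\n".join(facts))
```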

Sample Learned Rule

gene_disease(E,A) :-
    isa_np_segment(E), isa_np_segment(A),
    prev(A,B), pp_segment(B),
    child(A,C), next(C,D), alphabetic(D), novelword(C),
    child(E,F), alphanumeric(F).

(diagram: a sentence containing noun phrases E and A; a prepositional phrase B precedes A; A contains a novel word C followed by an alphabetic word D; E contains an alphanumeric word F)

Ensembles for Rules

N heads are better than one...
- learn multiple (sets of) rules with the training data
- aggregate the results by voting on the classification of testing data

Bagging (Breiman ’96):
- each rule-set gets one vote

Boosting (Freund and Schapire ’96):
- each rule gets a weighted vote
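
A minimal Python sketch of the bagging variant; `learn_rules` and the callable rule-sets are assumptions standing in for the actual learner:

```python
import random
from collections import Counter

# Bagging sketch (assumptions: learn_rules(examples) returns a rule-set
# that can be called on an example and returns a class label).

def bag_rule_sets(train, learn_rules, n_models=10, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap replicate: sample |train| examples with replacement.
        replicate = [rng.choice(train) for _ in train]
        models.append(learn_rules(replicate))
    return models

def classify(models, example):
    # Each rule-set gets one unweighted vote, as in bagging.
    votes = Counter(model(example) for model in models)
    return votes.most_common(1)[0][0]
```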

Drawing a PR Curve

(table and plot lost in extraction: test predictions sorted by decreasing confidence, with columns Conf, Class, Prec, and Rec; plotting the (Recall, Precision) pair at each confidence threshold traces the curve)
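
A minimal Python sketch of the computation, on made-up confidence scores: sweep the threshold down the sorted predictions, recording precision and recall after each example:

```python
# Sketch: trace a PR curve from confidence-scored predictions.
# The (confidence, is_positive) pairs are made-up illustrative data.
preds = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False)]

total_pos = sum(is_pos for _, is_pos in preds)
tp = fp = 0
for conf, is_pos in sorted(preds, reverse=True):
    tp += is_pos
    fp += not is_pos
    print(f"conf >= {conf:.2f}: precision = {tp / (tp + fp):.2f},"
          f" recall = {tp / total_pos:.2f}")
```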

Testset Results

(figure: testset precision-recall curves comparing Bagging, Boosting, and Rule Quality against the Craven Group's system)

Handling Large Skewed Data

5-fold cross-validation:
- train: 1007 positive / 240,874 negative
- test: 284 positive / 243,862 negative

With a 95% accurate rule set...
- 270 true positives
- 12,193 false positives!
- recall = 270 / 284 ≈ 95%
- precision = 270 / 12,463 ≈ 2.2%
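
The arithmetic, spelled out in Python (rounding gives the ≈95% and ≈2% figures above):

```python
pos, neg = 284, 243_862            # testing-fold counts from above
tp = round(0.95 * pos)             # 95% of positives found  -> 270
fp = round(0.05 * neg)             # 5% of negatives covered -> 12,193
print(f"recall    = {tp} / {pos} = {tp / pos:.1%}")
print(f"precision = {tp} / {tp + fp} = {tp / (tp + fp):.1%}")
```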

Handling Large Skewed Data

Ways to handle the data:
- assign different costs to each class (it is much more important not to cover negatives)
- under-sample with bagging (negatives are under-represented, so the key is to pick good negatives)
- filter the data to restore an equal ratio in the testing data (use naïve Bayes to learn relational parts)

Filters to Reduce Negatives

(pipeline diagram: pos/neg examples → noun-phrase filter → split into parts (genes, diseases) → naïve Bayes filter → join back → pos/neg; the negative-to-positive ratio improves stage by stage, reading 1 : [number lost in extraction], 1 : 1,979, and 1 : 39)
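
A minimal sketch of the naïve Bayes filtering step, with hypothetical phrases and threshold, using scikit-learn rather than the authors' own implementation:

```python
# Naive Bayes filter sketch (hypothetical training phrases and threshold;
# uses scikit-learn rather than the authors' own implementation).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

phrases = ["the gene smf1", "the yeast gene smf2", "the patient", "these results"]
is_gene = [1, 1, 0, 0]   # does the noun phrase mention a gene?

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(phrases), is_gene)

# Discard candidates the filter is confident are negatives; the threshold
# is kept low so that few true positives are filtered away.
candidates = ["the gene ygl153w", "the previous study"]
scores = clf.predict_proba(vec.transform(candidates))[:, 1]
kept = [c for c, s in zip(candidates, scores) if s > 0.1]
print(kept)
```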

Probabilistic Rules

Logical rules are too strict and often overfit. Instead, add a probabilistic weight to each rule, based on its accuracy on a tuning set.

Learning the parameters:
- make each rule a binary feature
- use any standard machine learning algorithm (naïve Bayes, perceptron, logistic regression, ...) to learn the weights
- assign a probability to each example based on the weights
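
A minimal sketch of the rules-as-binary-features idea, with made-up rule firings and scikit-learn's logistic regression standing in for the weight learner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each learned rule becomes one binary feature: 1 if the rule fires on an
# example, 0 otherwise. The firings and labels here are made up.
rule_fires = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [0, 1, 1],
                       [0, 0, 0]])
labels = np.array([1, 1, 0, 0])   # tuning-set annotations

model = LogisticRegression().fit(rule_fires, labels)
print("per-rule weights:", model.coef_[0])
print("P(relation):", model.predict_proba(rule_fires)[:, 1])
```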

Weighted Exponential Model

$P(\text{pos} \mid x) = \frac{1}{Z} \exp\Big(\sum_i w_i f_i(x)\Big)$

where $w_i$ is a weight for each feature $f_i$. Taking logs we get

$\log P(\text{pos} \mid x) = \sum_i w_i f_i(x) - \log Z$

We need to set the weights $w_i$ to maximize the log probability of the tuning set.
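
A minimal numpy sketch of that maximization by gradient ascent, assuming the two-class case (where the normalized model reduces to a logistic form):

```python
import numpy as np

# Gradient-ascent sketch for setting the weights (two-class assumption,
# under which the normalized exponential model is a logistic model).
def learn_weights(F, y, lr=0.1, steps=1000):
    """F: binary feature matrix over the tuning set; y: 0/1 labels."""
    w = np.zeros(F.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(F @ w)))   # P(pos | x) under current w
        w += lr * F.T @ (y - p)              # gradient of the log-likelihood
    return w
```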

Incorporating Background Knowledge

Creation of predicates that capture salient features:
- endsIn(word, 'ase')
- occursInAbstractNtimes(word, 5)

Incorporation of prior knowledge into the learning system:
- protein(word) if endsIn(word, 'ase') and occursInAbstractNtimes(word, 5)
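
As a plain-Python illustration (our reading of the predicates, e.g. treating occursInAbstractNtimes as "at least N times"):

```python
# Plain-Python stand-ins for the feature predicates above (we interpret
# occursInAbstractNtimes as "appears at least n times").
def ends_in(word: str, suffix: str) -> bool:
    return word.endswith(suffix)

def occurs_in_abstract_n_times(word: str, n: int, abstract: str) -> bool:
    return abstract.lower().split().count(word.lower()) >= n

def protein(word: str, abstract: str) -> bool:
    # The prior-knowledge rule from the slide, as a conjunction.
    return ends_in(word, "ase") and occurs_in_abstract_n_times(word, 5, abstract)
```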

Searching in Large Spaces

Probabilistic bottom clause:
- probabilistically remove the least significant predicates from the "bottom clause"

Random rule generation:
- in place of hill-climbing, randomly select rules of a given length from the bottom clause
- retain only those rules which do well on a tuning set

Learning the coverage of clauses:
- neural networks, Bayesian learning, etc.
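
A minimal Python sketch of random rule generation; the predicate names and the tuning-set `score` function are illustrative assumptions:

```python
import random

# Random rule generation sketch (predicate names are illustrative; `score`
# is assumed to measure a rule body's quality on a tuning set).
bottom_clause = ["isa_np_segment(A)", "prev(A,B)", "pp_segment(B)",
                 "child(A,C)", "novelword(C)", "alphanumeric(C)"]

def random_rules(score, n_draws=1000, rule_length=3, threshold=0.5, seed=0):
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        body = rng.sample(bottom_clause, rule_length)  # random fixed-length body
        if score(body) >= threshold:                   # keep only good rules
            kept.append(body)
    return kept
```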

References

- Stuart J. Nelson, Tammy Powell, and Betsy L. Humphreys. The Unified Medical Language System (UMLS) Project. In: Encyclopedia of Library and Information Science. Forthcoming.
- Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
- Ellen Riloff. The Sundance Sentence Analyzer. 2002.
- Inês de Castro Dutra et al. An Empirical Evaluation of Bagging in Inductive Logic Programming. In Proceedings of the International Conference on Inductive Logic Programming, Sydney, Australia.
- Dayne Freitag and Nicholas Kushmerick. Boosted Wrapper Induction. In Proceedings of AAAI-2000.
- Soumya Ray and Mark Craven. Representing Sentence Structure in Hidden Markov Models for Information Extraction. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI-2001).
- Tina Eliassi-Rad and Jude Shavlik. A Theory-Refinement Approach to Information Extraction. In Proceedings of the 18th International Conference on Machine Learning.
- M. Craven and J. Kumlien. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, Germany, 1999.
- Leo Breiman. Bagging Predictors. Machine Learning, 24(2), 1996.
- Yoav Freund and Robert E. Schapire. Experiments with a New Boosting Algorithm. In Proceedings of the International Conference on Machine Learning, 1996.