Classification by Machine Learning Approaches Michael J. Kerner – Center for Biological Sequence Analysis Technical University of Denmark

Outline Introduction to Machine Learning Datasets, Features Feature Selection Machine Learning Approaches (Classifiers) Model Evaluation and Interpretation Examples, Exercise

Machine Learning – Data-Driven Prediction To learn: “to gain knowledge or understanding of or skill in by study, instruction, or experience” (Merriam-Webster English Dictionary, 2005). Machine Learning: learning the theory automatically from the data, through a process of inference, model fitting, or learning from examples; the automated extraction of useful information from a body of data by building good probabilistic models. Ideally suited for areas with lots of data in the absence of a general theory.

Why do we need Machine Learning? Some tasks cannot be defined well, except by examples (e.g. recognition of faces or people). Large amounts of data may contain hidden relationships and correlations that only automated approaches can detect. The amount of knowledge about a certain problem or task may be too large for explicit encoding by humans (e.g. in medical diagnostics). Environments change over time, and new knowledge is constantly being discovered; a continuous redesign of the systems “by hand” may be difficult.

The Machine Learning Approach [Diagram: Input Data (e.g. gene expression profiles, …) → Machine Learning (ML) → Classifier → Prediction: Yes / No]

Machine Learning Learning Task: –What do we want to learn or predict? Data and assumptions: –What data do we have available? –What is their quality? –What can we assume about the given problem? Representation: –What is a suitable representation of the examples to be classified? Method and Estimation: –Are there possible hypotheses? –Can we adjust our predictions based on the given results? Evaluation: –How well does the method perform? –Might another approach/model perform better?

Learning Tasks Classification: –Prediction of an item class. Forecasting: –Prediction of a parameter value. Characterization: –Find hypotheses that describe groups of items. Clustering: –Partitioning of the (unassigned) data set into clusters with common properties. (Unsupervised learning)

Emergence of Large Datasets Dataset examples: Image processing Spam detection Text mining DNA micro-array data Protein function Protein localization Protein-protein interaction …

Dataset Examples Edible or poisonous ?

Dataset Examples

mRNA Splicing

mRNA Splice Site Prediction

Protein Function Prediction: ProtFun Predict as many biologically relevant features as we can from the sequence Train artificial neural networks for each category Assign a probability for each category from the NN outputs

[Example ProtFun 2.2 output for KCNA1_HUMAN. For each category the output reports a probability and an odds score, grouped into four blocks: Functional category (Amino_acid_biosynthesis, Biosynthesis_of_cofactors, Cell_envelope, Cellular_processes, Central_intermediary_metabolism, Energy_metabolism, Fatty_acid_metabolism, Purines_and_pyrimidines, Regulatory_functions, Replication_and_transcription, Translation, Transport_and_binding), Enzyme/nonenzyme (Enzyme, Nonenzyme), Enzyme class (Oxidoreductase, Transferase, Hydrolase, Lyase, Isomerase, Ligase), and Gene Ontology category (Signal_transducer, Receptor, Hormone, Structural_protein, Transporter, Ion_channel, Voltage-gated_ion_channel, Cation_channel, Transcription, Transcription_regulation, Stress_response, Immune_response, Growth_factor, Metal_ion_transport). In the transcript the “=>” markers follow Transport_and_binding, Nonenzyme and Voltage-gated_ion_channel; the numeric probability and odds columns are not preserved.]

Emergence of Large Datasets Complexity of datasets: –Many instances (examples) –Instances with multiple features (properties / characteristics) –Dependencies between the features (correlations)

Data Preprocessing Instance selection: –Remove identical / inconsistent / incomplete instances (e.g. reduction of homologous genes, removal of wrongly annotated genes) Feature transformation / selection: –Projection techniques (e.g. principal components analysis) –Compression techniques (e.g. minimum description length) –Feature selection techniques

Benefits of Feature Selection Attain good and often even better classification performance using a small subset of features –Less noise in the data Provide more cost-effective classifiers –Fewer features to take into account → smaller datasets → faster classifiers Identification of (biologically) relevant features for the given problem

Feature Selection [Diagram comparing the two strategies: Filter approach: All Features → Feature Subset Selection → Learning Algorithm → Optimal Features. Wrapper approach: All Features → Feature Subset Search Algorithm, which proposes candidate subsets and uses the Learning Algorithm's evaluation as the Selection Criterion → Selected Features → Learning Algorithm → Optimal Features.]

Filter Approach –Independent of the classification model –A relevance measure for each feature is calculated –Features with a value lower than a selected threshold t will be removed –Example: feature–class entropy, which measures the “uncertainty” about the class when observing feature i [Example table with features f1–f4 and a class column]
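As a rough illustration of such a filter (not code from the lecture), the sketch below scores each column of a small 0/1 feature matrix by information gain, i.e. how much the class entropy drops when the feature is observed, and keeps the features whose score reaches a threshold t; the toy data and the threshold are made up.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Class entropy minus the class entropy conditioned on the feature."""
    total = entropy(labels)
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        conditional += mask.mean() * entropy(labels[mask])
    return total - conditional

# Toy data: four binary features f1..f4 and a binary class.
X = np.array([[1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 0, 1]])
y = np.array([1, 1, 0, 0, 1])

t = 0.1  # relevance threshold
scores = [information_gain(X[:, j], y) for j in range(X.shape[1])]
selected = [j for j, s in enumerate(scores) if s >= t]
print(scores, "-> keep features", selected)
```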

Wrapper approach Specific to a classification algorithm The search for a good feature subset is guided by a search algorithm The algorithm uses the evaluation of the classifier as a guide to find good feature subsets Search algorithm examples: sequential forward or backward search, genetic algorithms Sequential backward elimination –Starts with the set of all features –Iteratively discards the feature whose removal results in the best classification performance

Wrapper approach [Diagram: sequential backward elimination starting from the full feature set {f1, f2, f3, f4}. Each subset obtained by dropping one feature is evaluated (f2,f3,f4: 0.7; f1,f3,f4: 0.8; f1,f2,f4: 0.1; f1,f2,f3: score not legible in the transcript), the best one (f1,f3,f4) is kept, and the search continues over its subsets (f3,f4; f1,f4: 0.1; f1,f3: 0.8) down to single features (f4: 0.2; f3: 0.7).]
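A minimal sketch of this backward search, assuming a scikit-learn-style classifier and using cross-validated accuracy as the evaluation score; the data, feature count and estimator are placeholders, not the lecture's setup.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def backward_elimination(X, y, estimator, min_features=1, cv=5):
    """Greedy sequential backward elimination guided by cross-validated accuracy."""
    remaining = list(range(X.shape[1]))
    best_subset = list(remaining)
    best_score = cross_val_score(estimator, X[:, remaining], y, cv=cv).mean()
    while len(remaining) > min_features:
        # Evaluate every subset obtained by dropping one of the remaining features.
        trials = []
        for f in remaining:
            subset = [g for g in remaining if g != f]
            score = cross_val_score(estimator, X[:, subset], y, cv=cv).mean()
            trials.append((score, subset))
        score, subset = max(trials)          # keep the best-scoring reduced subset
        remaining = subset
        if score >= best_score:              # remember the best subset seen so far
            best_score, best_subset = score, subset
    return best_subset, best_score

# Toy usage with random data standing in for f1..f4.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 4)).astype(float)
y = (X[:, 0] + X[:, 2] > 1).astype(int)      # class depends only on f1 and f3
print(backward_elimination(X, y, DecisionTreeClassifier(random_state=0)))
```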

Classification Methods -Decision trees -Hidden Markov Models (HMMs) -Support vector machines -Artificial Neural Networks -Bayesian methods -…

Decision Trees Simple, practical and easy to interpret Given a set of instances (with a set of features), a tree is constructed with internal nodes as the features and the leaves as the classes

Example Dataset: Shall we play golf? (day identifies the instance; outlook, temperature, humidity and windy are the attributes / features; Play Golf? is the class)

day    outlook   temperature  humidity  windy  Play Golf?
1      sunny     hot          high      FALSE  no
2      sunny     hot          high      TRUE   no
3      overcast  hot          high      FALSE  yes
4      rainy     mild         high      FALSE  yes
5      rainy     cool         normal    FALSE  yes
6      rainy     cool         normal    TRUE   no
7      overcast  cool         normal    TRUE   yes
8      sunny     mild         high      FALSE  no
9      sunny     cool         normal    FALSE  yes
10     rainy     mild         normal    FALSE  yes
11     sunny     mild         normal    TRUE   yes
12     overcast  mild         high      TRUE   yes
13     overcast  hot          normal    FALSE  yes
14     rainy     mild         high      TRUE   no
today  sunny     cool         high      TRUE   ?

Example: Shall we play golf today? WEKA data file (ARFF format)

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no

(The slide also shows the same instance table as the previous slide: Day as the instance identifier, Outlook, Temperature, Humidity and Windy as the independent features / attributes, and Play Golf? as the class.)

Feature compositions [Figure: the golf attributes and their values – outlook: sunny / overcast / rainy; temperature: hot / cool / mild; humidity: high / normal; windy: True / False – composed into a decision tree whose leaves carry the class labels YES / NO.]

Decision Trees

J48 pruned tree
outlook = sunny
|   humidity = high: no (3.0)
|   humidity = normal: yes (2.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves : 5
Size of the tree : 8

(The slide annotates outlook, humidity and windy as attributes / features, values such as sunny or high as attribute values, and yes / no as the classes.)
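WEKA's J48 implements C4.5; the exercise uses WEKA, but for comparison a similar tree can be grown with other toolkits. The sketch below (my own, not part of the course material) one-hot-encodes the golf data, fits scikit-learn's DecisionTreeClassifier with the entropy criterion, prints the learned rules, and classifies the "today" instance.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14 golf instances from the slides.
data = pd.DataFrame({
    "outlook":     ["sunny","sunny","overcast","rainy","rainy","rainy","overcast",
                    "sunny","sunny","rainy","sunny","overcast","overcast","rainy"],
    "temperature": ["hot","hot","hot","mild","cool","cool","cool",
                    "mild","cool","mild","mild","mild","hot","mild"],
    "humidity":    ["high","high","high","high","normal","normal","normal",
                    "high","normal","normal","normal","high","normal","high"],
    "windy":       [False,True,False,False,False,True,True,
                    False,False,False,True,True,False,True],
    "play":        ["no","no","yes","yes","yes","no","yes",
                    "no","yes","yes","yes","yes","yes","no"],
})

X = pd.get_dummies(data.drop(columns="play"))   # nominal attributes -> binary indicators
y = data["play"]
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# Classify "today": sunny, cool, high humidity, windy.
today = pd.DataFrame([{c: 0 for c in X.columns}])
today[["outlook_sunny", "temperature_cool", "humidity_high", "windy"]] = 1
print(tree.predict(today))   # -> ['no']: sunny + high humidity, as in the J48 output above
```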

Artificial Neural Networks (ANNs) Artificial Neuron Neural Network
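The slide's figures are not reproduced in this transcript, but the computation a single artificial neuron performs is compact: a weighted sum of its inputs plus a bias, passed through a non-linear activation. The sketch below, with arbitrarily chosen weights, shows one sigmoid neuron and a tiny two-layer network assembled from such neurons.

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def tiny_network(x, W_hidden, b_hidden, w_out, b_out):
    """A minimal feed-forward net: one hidden layer of neurons, one output neuron."""
    hidden = np.array([neuron(x, w, b) for w, b in zip(W_hidden, b_hidden)])
    return neuron(hidden, w_out, b_out)

x = np.array([0.2, 0.7, 1.0])                      # example input features
W_hidden = np.array([[0.5, -1.0, 0.3],
                     [1.2,  0.4, -0.7]])           # weights of 2 hidden neurons
b_hidden = np.array([0.1, -0.2])
w_out = np.array([1.5, -0.8])
b_out = 0.0
print(tiny_network(x, W_hidden, b_hidden, w_out, b_out))   # output in (0, 1)
```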

Overfitting Overfitting: a classifier that performs well on the training examples, but poorly on new examples. Training and testing on the same data will generally produce a classifier that looks good on that dataset but overfits heavily. To avoid overfitting: –Use separate training and testing data –Use cross-validation –Use the simplest model possible

Performance Evaluation – Cross-Validation (10-fold) [Diagram: the data are split into ten parts; in each round a classifier is trained (ML) on the training set (9/10 of the data) and evaluated on the test set (the remaining 1/10); this is repeated 10×, so every part serves once as the test set.]
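WEKA offers 10-fold cross-validation as a built-in evaluation option; the protocol itself can be sketched as below, with a decision tree and synthetic data standing in for the real classifier and dataset. Each instance ends up in the test fold exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                 # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # placeholder class labels

scores = []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
for train_idx, test_idx in cv.split(X, y):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])       # train on 9/10 of the data
    pred = clf.predict(X[test_idx])           # evaluate on the held-out 1/10
    scores.append(accuracy_score(y[test_idx], pred))

print(f"10-fold CV accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```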

Performance Evaluation – Confusion Matrix

                          Predicted label
                          positive    negative
Known label   positive      TP          FN
              negative      FP          TN

TP  True Positives
TN  True Negatives
FP  False Positives
FN  False Negatives

Performance Evaluation

Precision (PPV) = TP / (TP + FP)
 –Percentage of correct positive predictions
Recall / Sensitivity = TP / (TP + FN)
 –Percentage of positively labeled instances also predicted as positive
Specificity = TN / (TN + FP)
 –Percentage of negatively labeled instances also predicted as negative
Accuracy = (TP + TN) / (TP + TN + FP + FN)
 –Percentage of correct predictions
Correlation Coefficient = (TP * TN – FP * FN) / sqrt((TP+FP) * (FP+TN) * (TN+FN) * (FN+TP))
 –-1 ≤ cc ≤ 1; cc = 1: no FP or FN; cc = 0: random; cc = -1: only FP and FN
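Given the confusion-matrix counts, these measures are a few lines of arithmetic. A sketch following the definitions above (the correlation coefficient is Matthews' correlation coefficient; the example counts are invented):

```python
import math

def evaluate(tp, tn, fp, fn):
    """Performance measures derived from the confusion-matrix counts."""
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)              # recall
    specificity = tn / (tn + fp)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    cc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, sensitivity, specificity, accuracy, cc

# Hypothetical counts for illustration.
print(evaluate(tp=80, tn=90, fp=10, fn=20))
```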

ROC – Receiver Operating Characteristic [Plot axes: x = False Positive Rate = FP / (FP + TN) = 1 - Specificity; y = True Positive Rate = Sensitivity = TP / (TP + FN)]

ROC – Receiver Operating Characteristic [Example ROC curves, plotted as Sensitivity versus 1 - Specificity]
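An ROC curve is traced by sweeping the decision threshold over the classifier's scores and recording sensitivity against 1 - specificity at each threshold. A sketch with scikit-learn, using made-up labels and scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Made-up true labels and classifier scores (e.g. predicted probabilities).
y_true  = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.65, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # FPR = 1 - specificity
print("AUC =", auc(fpr, tpr))
for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold {thr}:  FPR={f:.2f}  TPR={t:.2f}")
```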

Case Study - Splice Site Prediction

Splice site prediction: Correctly identify the borders of introns and exons in genes (splice sites) Important for gene prediction Split up into 2 tasks: –Donor prediction (exon -> intron) –Acceptor prediction (intron -> exon)

Case Study - Splice Site Prediction Splice sites are characterized by a conserved dinucleotide in the intron part of the sequence –Donor sites: GT –Acceptor sites: AG Classification problem: –Distinguish between true GT, AG and false GT, AG.

Case Study - Splice Site Prediction Features: –Position dependent features, e.g. an A on position 1, a C on position 17, … –Position independent features, e.g. subsequence “TCG” occurs, “GAG” occurs, … Example sequence: atcgatcagtatcgat GT ctgagctatgag
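A sketch of how the two feature types could be extracted from such a sequence window; the function names and the 3-mer choice are my own, not taken from the exercise:

```python
from itertools import product

def position_dependent_features(seq):
    """One feature per (position, base): e.g. 'pos1_A' is 1 if position 1 is an A."""
    feats = {}
    for i, base in enumerate(seq.upper(), start=1):
        for b in "ATCG":
            feats[f"pos{i}_{b}"] = int(base == b)
    return feats

def position_independent_features(seq, k=3):
    """One feature per k-mer: 1 if the subsequence occurs anywhere in the window."""
    seq = seq.upper()
    feats = {}
    for kmer in ("".join(p) for p in product("ATCG", repeat=k)):
        feats[kmer] = int(kmer in seq)
    return feats

window = "atcgatcagtatcgatGTctgagctatgag"   # local context around a donor site
print(position_dependent_features(window)["pos1_A"])   # 1: position 1 is an A
print(position_independent_features(window)["TCG"])    # 1: 'TCG' occurs in the window
```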

Original Data – Human Acceptor Splice Sites
>HUMGLUT4B_3535
GGGCCCCTAGCGGAAGGAAAAAAATCATGGTTCCATGTGACATGCTGTGTCTTTGTGTCTGCCTGTTCAGGATGGGGAACCCCCTCAGCA
>HUMGLUT4B_3763
GAGGACAGGTGTCTCGGGGGTGGTGGAAAGGGGACGGTCTGCAGGAAATCTGTCCTCTGCTGTCCCCCAGGTGATTGAACAGAGCTACAA
>HUMGLUT4B_4028
TGGGGGAAACAGGAAGGGAGCCACTGCTGGGTGCCCTCACCCTCACAGCCTCACTCTGTCTGCCTGCCAGGAAAAGGGCCATGCTGGTCA
>HUMGLUT4B_4276
TGGGCTTTCAGATGGGAATGGACACCTGCCCTCAGCCCTCTCTTCTTCCCTCGCCCAGGGCTGACATCAGGGCTGGTGCCCATGTACGTG
>HUMGLUT4B_4507
ATATGGTGGGCTTCCAAGGTAAGGCAGAAGGGCTGAGTGACCTGCCTTCTTTCCCAACCTTCTCCCACAGGTGCTGGGCTTGGAGTCCCT
>HUMGLUT4B_4775
GCCTCCGCCTCATCTTGCTAGCACCTGGCTTCCTCTCAGGTCCCCTCAGGCCTGACCTTCCCTTCTCCAGGTCTGAAGCGCCTGACAGGC
>HUMGLUT4B_5125
CCAGCCTGTTGTGGCTGGAGTAGAGGAAGGGGCATTCCTGCCATCACTTCTTCTTCTCCCCCACCTCTAGGTTTTCTATTATTCGACCAG
>HUMGLUT4B_5378
CCTCACCCACGCGGCCCCTCCTACTTCCCGTGCCCAAAAGGCTGGGGTCAAGCTCCGACTCTCCCCGCAGGTGTTGTTGGTGGAGCGGGC
>HUMGLUT4B_5995
CTGAGTTGAGGGCAAGGGAAGATCAGAAAGGCCTCAACTGGATTCTCCACCCTCCCTGTCTGGCCCCTAGGAGCGAGTTCCAGCCATGAG
>HUMGLUT4B_6716
CTGGTTGCCTGAAACTACCCCTTCCCTCCCCACCTCACTCCGTCAACACCTCTTTCTCCACCTGTCCCAGGAGGCTATGGGGCCCTACGT
>HSRPS6G_1493
CTTTGTAGATGGCTCTACAATTACCTGTATAGATAGTTTCGTAAACTATTTCCCCCCTTTTAATCCTTAGCTGAACATCTCCTTCCCAGC
[...]

Arff Data File

@attribute -68_A {0,1}
@attribute -68_T {0,1}
@attribute -68_C {0,1}
@attribute -68_G {0,1}
@attribute -67_A {0,1}
@attribute -67_T {0,1}
@attribute -67_C {0,1}
@attribute -67_G {0,1}
[...]
@attribute 20_A {0,1}
@attribute 20_T {0,1}
@attribute 20_C {0,1}
@attribute 20_G {0,1}
@attribute class {true, false}
@data
0,0,0,1,0,0,0,1, [...],1,0,0,0,true
0,0,0,1,1,0,0,0, [...],1,0,0,0,true
0,1,0,0,0,0,0,1, [...],1,0,0,0,true
0,1,0,0,0,0,0,1, [...],0,0,0,1,true
[...]
1,0,0,0,0,1,0,0, [...],0,1,0,0,true
0,0,0,1,0,0,1,0, [...],0,0,1,0,true
0,0,1,0,0,0,1,0, [...],0,0,0,1,true
0,0,1,0,0,0,1,0, [...],0,0,1,0,true

The original sequence files in FASTA format have been converted to represent the four DNA bases in a binary fashion: A: 1000, T: 0100, C: 0010, G: 0001.

Case Study - Splice Site Prediction Local context of 88 nucleotides around the splice site: 88 position-dependent features; A=1000, T=0100, C=0010, G=0001 → 352 binary features. Reduce the dataset to contain fewer but relevant features.
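A sketch of that encoding. The position labels follow the -68_… to 20_… attribute names in the ARFF slide above; the exact window coordinates (here -68 to -1 and +1 to +20) and the placeholder sequence are assumptions.

```python
import numpy as np

BASES = "ATCG"
ENCODING = {"A": (1, 0, 0, 0), "T": (0, 1, 0, 0), "C": (0, 0, 1, 0), "G": (0, 0, 0, 1)}

def encode_window(window, positions):
    """Turn an 88-nt window into 4 binary features per position (352 in total)."""
    assert len(window) == len(positions)
    row, names = [], []
    for pos, base in zip(positions, window.upper()):
        bits = ENCODING.get(base, (0, 0, 0, 0))      # unknown base -> all zeros
        row.extend(bits)
        names.extend(f"{pos}_{b}" for b in BASES)
    return np.array(row), names

# Positions -68 .. -1 upstream and +1 .. +20 downstream of the acceptor site (88 total).
positions = list(range(-68, 0)) + list(range(1, 21))
window = "A" * 88                                    # placeholder sequence window
row, names = encode_window(window, positions)
print(len(names), names[:4], row[:4])                # 352 ['-68_A', '-68_T', ...] [1 0 0 0]
```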

352 Binary features

15 Binary features

Case Study – Splice Site Sequence Logos Acceptor Sites: Donor Sites:

Exercise: Building a prediction tool for human mRNA splice sites Feature selection for classification of splice sites Tool: The WEKA machine learning toolkit. Go to and follow the instructions

Acknowledgements Slides and Exercises Adapted from and inspired by: Søren Brunak David Gilbert, Aik Choon Tan Yvan Saeys