Classification Using Top Scoring Pair Based Methods
Tina Gui
Outline: Introduction, Top Scoring Pair, Experiments Design, Future Work, Conclusion
Introduction
Current methods for classifying DNA microarray data have two limitations [1]:
1. Small samples
2. Lack of interpretability
Objective: differentiate between two classes by finding pairs of genes whose expression levels typically invert from one class to the other.
[1] D. Geman, C. d'Avignon, D. Naiman and R. Winslow (2004). "Classifying gene expression profiles from pairwise mRNA comparisons".
Approaches
Rank-based approach. Drawback: information is lost when expression values are replaced by their ranks.
Comparison-based approach. In some cases, accurate prediction can be achieved by comparing the expression levels of a single pair of genes.
A simple example of classifying gene expression profiles this way is the Top Scoring Pair (TSP) classifier.
Top Scoring Pair
Each profile holds the expression levels of $G$ genes, $X = \{X_1, X_2, \dots, X_G\}$.
Each profile $X$ has a true class label in $\{1, 2, \dots, C\}$; here $C = 2$.
Marker gene pairs $(i, j)$ are pairs for which the probability of $X_i < X_j$ differs significantly between class 1 and class 2.
Profile classification is then based on the collection of distinguished pairs.
Top Scoring Pair
The quantities of interest are the probabilities of observing $X_i < X_j$ in each class, where $X_i$ and $X_j$ are expression values:
$p_{ij}(c) = P(X_i < X_j \mid c), \quad c = 1, 2$
The score of the pair $(i, j)$ is
$\Delta_{ij} = |p_{ij}(1) - p_{ij}(2)|$
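The two quantities above can be estimated from training data by counting, within each class, how often $X_i < X_j$ holds. The following is a minimal sketch, assuming NumPy and a small data matrix; the function name `tsp_scores` and the toy values are illustrative, not taken from the slides.

```python
import numpy as np

def tsp_scores(X, y):
    """Estimate p_ij(1), p_ij(2) and the score Delta_ij for every gene pair.

    X: array of shape (n_profiles, n_genes) with expression values.
    y: array of shape (n_profiles,) with class labels 1 or 2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[n, i, j] is True when X_i < X_j in profile n.
    # This uses n_profiles * n_genes^2 booleans, so it only suits small inputs.
    less = X[:, :, None] < X[:, None, :]
    p1 = less[y == 1].mean(axis=0)  # relative frequency of X_i < X_j in class 1
    p2 = less[y == 2].mean(axis=0)  # relative frequency of X_i < X_j in class 2
    delta = np.abs(p1 - p2)         # score of pair (i, j)
    return p1, p2, delta

# Hypothetical toy data: 4 profiles, 3 genes.
X = np.array([
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.0],
    [7.0, 2.0, 4.0],
    [8.0, 3.0, 5.0],
])
y = np.array([1, 1, 2, 2])
p1, p2, delta = tsp_scores(X, y)
```

In this toy data, genes 0 and 1 invert their order between the two classes, so that pair receives the largest possible score.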
Top Scoring Pair
Rank the expression values.
Rank the scores $\Delta_{ij}$ from largest to smallest.
Select all pairs achieving the top score.
Example of scoring a gene pair: with 52 profiles in class 1 and 50 profiles in class 2, $p_{ij}(1) = 50/52$ and $p_{ij}(2) = 3/50$.
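Ranking the scores and keeping every pair that reaches the top score can be sketched as below. This assumes the scores are already available as a square matrix `delta` (a hypothetical input here) and that there are no tied expression values, in which case the score of $(i, j)$ equals the score of $(j, i)$ and only pairs with $i < j$ need to be considered.

```python
import numpy as np

def top_scoring_pairs(delta):
    """Return all pairs (i, j), i < j, that achieve the maximum score."""
    delta = np.asarray(delta, dtype=float)
    # Indices of the upper triangle, excluding the diagonal.
    iu, ju = np.triu_indices(delta.shape[0], k=1)
    scores = delta[iu, ju]
    best = scores.max()
    # isclose guards against floating-point noise in equal scores.
    keep = np.isclose(scores, best)
    pairs = [(int(i), int(j)) for i, j in zip(iu[keep], ju[keep])]
    return pairs, float(best)

# Hypothetical score matrix for 3 genes.
delta = np.array([
    [0.0, 0.9, 0.4],
    [0.9, 0.0, 0.9],
    [0.4, 0.9, 0.0],
])
pairs, best = top_scoring_pairs(delta)
```

Here two pairs share the top score, so both are returned.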
Top Scoring Pair Classifier
Computing the score: $\Delta_{ij} = |50/52 - 3/50| \approx 0.90$.
Note: since $p_{ij}(1) > p_{ij}(2)$, the classifier based on this gene pair votes for class 1 for a profile with $X_i < X_j$ and for class 2 otherwise.
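The score of the example pair and the resulting voting rule can be checked with a few lines of code. This is a sketch only; the function name `tsp_vote` is illustrative, and exact fractions are used so that the arithmetic is not blurred by rounding.

```python
from fractions import Fraction

# Counts from the example: profiles with X_i < X_j out of each class.
p1 = Fraction(50, 52)  # p_ij(1)
p2 = Fraction(3, 50)   # p_ij(2)
score = abs(p1 - p2)   # Delta_ij

def tsp_vote(x_i, x_j, p1, p2):
    """Return the class (1 or 2) that a single gene pair votes for."""
    # The class in which X_i < X_j is more probable gets the vote
    # whenever X_i < X_j is observed; the other class gets it otherwise.
    class_if_less = 1 if p1 > p2 else 2
    class_otherwise = 2 if class_if_less == 1 else 1
    return class_if_less if x_i < x_j else class_otherwise

vote_less = tsp_vote(1.0, 2.0, p1, p2)     # profile with X_i < X_j
vote_greater = tsp_vote(2.0, 1.0, p1, p2)  # profile with X_i > X_j
```

A pair with $p_{ij}(1) = p_{ij}(2)$ has score zero and would never be selected, so the sketch does not treat that case specially.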
K-TSP Classifier
In some instances, the TSPs may change when the training data are perturbed by adding or deleting a few examples.
The K-TSP classifier uses the k top-scoring disjoint gene pairs from the ranked list.
Goal: increase the accuracy of the TSP classifier.
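Choosing k disjoint pairs from a ranked list, and combining their votes, might look like the sketch below. Two things are assumptions rather than content from the slide: the pairs are combined by an unweighted majority vote, and `class_if_less` is a hypothetical lookup giving, for each pair, the class it votes for when $X_i < X_j$.

```python
def select_disjoint_pairs(ranked_pairs, k):
    """Greedily take the first k pairs that share no gene with an earlier pick."""
    chosen, used = [], set()
    for i, j in ranked_pairs:
        if i in used or j in used:
            continue  # this pair reuses a gene, so it is not disjoint
        chosen.append((i, j))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen

def ktsp_predict(x, pairs, class_if_less):
    """Unweighted majority vote over the chosen pairs (assumed voting rule)."""
    votes = []
    for i, j in pairs:
        c = class_if_less[(i, j)]
        # Vote c when x_i < x_j, otherwise the other class (3 - c).
        votes.append(c if x[i] < x[j] else 3 - c)
    # An odd k avoids ties between the two classes.
    return 1 if votes.count(1) > votes.count(2) else 2

# Ranked list of gene-index pairs; the last one reuses gene 1.
ranked = [(13, 45), (7, 21), (1, 72), (1, 25)]
pairs = select_disjoint_pairs(ranked, 3)

# Hypothetical profile (gene index -> expression value) and vote directions.
x = {13: 2.0, 45: 5.0, 7: 4.0, 21: 1.0, 1: 0.5, 72: 0.9}
class_if_less = {(13, 45): 1, (7, 21): 1, (1, 72): 2}
label = ktsp_predict(x, pairs, class_if_less)
```

The greedy pass keeps the ranking order intact, so a lower-ranked pair is only used when every higher-ranked pair touching its genes has already been skipped or taken.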
Experiments Design: Baseline, Augmented Space, Alternate Space
Baseline
[Figure: the raw data, a matrix of N rows and M columns A_1 .. A_M, is passed to the TSP classifier, which outputs a ranked list of pairs: (A_13 : A_45), (A_7 : A_21), (A_1 : A_72), (A_1 : A_25), ..., (A_x : A_y).]
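Augment
[Figure: the K top-ranked pairs from the K-TSP classifier's list, (A_13 : A_45), (A_7 : A_21), (A_1 : A_72), (A_1 : A_25), ..., (A_a : A_b), are added to the raw data as new columns A_7_45, A_13_21, A_1_72, .., A_a_b, giving a matrix of N rows and M + K columns.]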
Alteration
[Figure: only the K-TSP columns A_7_45, A_13_21, A_1_72, .., A_x_y are kept, giving a matrix of N rows and K columns.]
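The three feature spaces of the experiment design can be sketched as follows. The slides do not say what value a pair column such as A_7_45 holds, so the 0/1 indicator of $X_i < X_j$ used here is an assumption, as are the function name `pair_columns` and the toy data.

```python
import numpy as np

def pair_columns(X, pairs):
    """One column per pair (i, j): 1.0 where X_i < X_j, else 0.0 (assumed encoding)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([(X[:, i] < X[:, j]).astype(float) for i, j in pairs])

# Hypothetical raw data: N = 3 rows, M = 4 attributes.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [5.0, 6.0, 8.0, 7.0],
])
pairs = [(0, 1), (2, 3)]  # K = 2 disjoint pairs, assumed already selected

baseline = X                            # N x M: the raw attributes only
alternate = pair_columns(X, pairs)      # N x K: the pair columns only
augmented = np.hstack([X, alternate])   # N x (M + K): raw plus pair columns
```

Any downstream classifier could then be trained on each of the three matrices in turn to compare the spaces.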
Future Work
Combination of Decision Tree and Top Scoring Pairs [1]
[1] M. Czajkowski and M. Krętowski (2011). "Top Scoring Pair Decision Tree for Gene Expression Data Analysis".
Conclusion
TSP classifier predictions are based entirely on the top-scoring pairs.
Beauty of Top Scoring Pair: simplicity.
Main goal: improve the classification accuracy.