Pfizer HTS Machine Learning Algorithms: November 2002


Paul Hsiung (hsiung+@cs.cmu.edu), Paul Komarek (komarek@cs.cmu.edu), Ting Liu (tingliu@cs.cmu.edu), Andrew W. Moore (awm@cs.cmu.edu)
Auton Lab, Carnegie Mellon University School of Computer Science, www.autonlab.org

Datasets

Our Name | Num. Records | Num. Attributes | Non-zero Input Cells | Positive Outputs | Description
train1   | 26,733       | 6,348           | 3.7M                 | 804              | The original dataset sent to CMU in February 2002.
test1    | 1,456        | 6,121           | 0.2M                 | 878              | The test set associated with the above training set.
jun-3-1  | 88,358       | 1,143,054       | 30M                  | 423              | The large "TEST3" dataset sent to us in May 2002; the "-1" denotes that we used the first of the four activation columns.
combined |              |                 |                      | 211              | The "TEST3" datasets combined; the activation in combined is positive if and only if at least two of the four original activations were positive.

Projections

Original Dataset | 100-Dimensional Projection | 10-Dimensional Projection
train1           | train100                   | train10
test1            | test100                    | test10
train1 (PLS)     | train-pls-100              | train-pls-10
test1 (PLS)      | test-pls-100               | test-pls-10
jun-3-1          | n/a                        | n/a
combined         | n/a                        | n/a
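The slides do not say how the projections were computed beyond naming PCA and PLS. As a hedged illustration, the sketch below produces a train100/test100-style pair from sparse data; scikit-learn's TruncatedSVD is a modern stand-in for whatever PCA code was actually used in 2002, and all matrix sizes and names are illustrative.

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Stand-ins for the sparse train1/test1 matrices (the real ones are
# 26,733 x 6,348 and 1,456 x 6,121); sizes here are small for illustration.
X_train = sparse_random(1000, 6348, density=0.02, format="csr", random_state=0)
X_test = sparse_random(100, 6348, density=0.02, format="csr", random_state=1)

# Fit the 100-dimensional projection on the training data only, then apply
# it to the test set, mirroring the train100/test100 pairing in the table.
proj100 = TruncatedSVD(n_components=100, random_state=0)
train100 = proj100.fit_transform(X_train)  # dense array, shape (1000, 100)
test100 = proj100.transform(X_test)        # dense array, shape (100, 100)
```

The 10-dimensional projections in the table would be produced the same way with n_components=10.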

Previous Algorithms

BC (Bayes Classifier): On the original data, a naïve categorical classifier was used; on real-valued projected data, a naïve Gaussian classifier was used.
Dtree (Decision Tree): This technique is also known as recursive partitioning and as CART. It was implemented only for the original data.
SVM (Support Vector Machine): Except where stated otherwise, a linear SVM was used. We could not find a significant performance difference between the linear SVM and a radial basis function SVM across a variety of RBF parameters.
k-NN (k-nearest neighbor): Except where stated otherwise, k = 9 neighbors were used. Implemented only for projected data.
LR (Logistic Regression): Except where stated otherwise, conjugate gradient was used to perform the intermediate weighted regressions, using a newly developed technique.
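To make the baseline suite concrete, here is a minimal sketch that fits scikit-learn stand-ins for BC, Dtree, the linear SVM, k-NN with k = 9, and LR on synthetic data and compares them by AUC. These are not the Auton Lab implementations; in particular, scikit-learn's LogisticRegression does not use the conjugate-gradient weighted-regression technique mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a projected dataset; positives are rare, as in HTS.
X, y = make_classification(n_samples=2000, n_features=100,
                           weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "BC (naive Gaussian)": GaussianNB(),
    "Dtree (CART)": DecisionTreeClassifier(random_state=0),
    "Linear SVM": SVC(kernel="linear", probability=True, random_state=0),
    "k-NN (k=9)": KNeighborsClassifier(n_neighbors=9),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_te, scores):.3f}")
```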

New Algorithms

new-KNN (tractable high-dimensional k-nearest neighbor): Can work on the 1,000,000-dimensional "June" data.
EFP (Explicit False Positive logistic regression): Logistic regression that accounts for the high false-positive rate.
SMod (Super Model): Automatically combines the predictions from multiple algorithms with a "meta-level" of logistic regression.
PLS-proj (Partial Least Squares projection): Uses PLS instead of PCA to project the data down.
PLS (Partial Least Squares prediction): Uses the PLS algorithm as a predictor.

Explicit False Positive Model

Example in 2 dimensions (figure slides):
- Decision boundary
- 100 true positives
- 100 true positives and 100 true negatives
- 100 TP, 100 TN, 10 FP
- Fit using regular logistic regression
- Fit using the EFP model

Example at larger scale (figure slides):
- 10,000 true positives
- 10,000 true positives and 10,000 true negatives
- 10,000 TP, 10,000 TN, 1,000 FP
- Fit using regular logistic regression
- Fit using the EFP model
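The EFP model slides above survive only as titles, so the likelihood itself is not preserved here. As a loudly flagged assumption, the sketch below implements one standard way to make logistic regression account for a known false-positive rate lam: an inactive compound still reads positive with probability lam, giving P(observed positive | x) = lam + (1 - lam) * sigmoid(w·x). This is an illustration of the idea, not the Auton Lab formulation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable logistic sigmoid

def efp_neg_log_likelihood(w, X, y, lam):
    """Negative log-likelihood with an explicit false-positive rate lam."""
    p_true = expit(X @ w)                    # P(truly active | x)
    p_obs = lam + (1.0 - lam) * p_true      # P(screen reads positive | x)
    p_obs = np.clip(p_obs, 1e-12, 1.0 - 1e-12)
    return -np.sum(y * np.log(p_obs) + (1.0 - y) * np.log(1.0 - p_obs))

# Synthetic data with a 10% false-positive rate injected among true negatives.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
w_true = rng.normal(size=10)
y = (expit(X @ w_true) > 0.5).astype(float)
flips = (y == 0) & (rng.random(500) < 0.10)
y[flips] = 1.0

# Fit by direct minimization; L-BFGS-B with numerical gradients suffices here.
result = minimize(efp_neg_log_likelihood, x0=np.zeros(10),
                  args=(X, y, 0.10), method="L-BFGS-B")
w_hat = result.x
```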

EFP Model: real-data results (K-fold cross-validation)

EFP effect: very impressive on Train1 / Test1 (shown with linear and log X-axes), but unimpressive on jun-3-1 / jun-3-2.

Super Model

1. Divide the training set into Compartment A and Compartment B.
2. Learn each of the N models on Compartment A.
3. Use each of the N models to predict on Compartment B.
4. Learn the best weighting of the models' opinions via logistic regression on their Compartment B predictions.
5. Apply the models and their weights to the test data.
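A minimal sketch of the five steps above, with scikit-learn models standing in for the N base learners; the data and model choices are illustrative, not those used in the experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training and test sets.
X, y = make_classification(n_samples=3000, n_features=100,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# 1. Divide the training set into Compartment A and Compartment B.
X_a, X_b, y_a, y_b = train_test_split(X_train, y_train, test_size=0.5,
                                      stratify=y_train, random_state=0)

# 2. Learn each of the N base models on Compartment A.
base_models = [GaussianNB(), KNeighborsClassifier(n_neighbors=9),
               LogisticRegression(max_iter=1000)]
for model in base_models:
    model.fit(X_a, y_a)

# 3. Predict with each model on Compartment B.
meta_features = np.column_stack(
    [model.predict_proba(X_b)[:, 1] for model in base_models])

# 4. Learn the best weighting of the models' opinions with logistic regression.
meta_model = LogisticRegression().fit(meta_features, y_b)

# 5. Apply the base models and their learned weights to the test data.
test_features = np.column_stack(
    [model.predict_proba(X_test)[:, 1] for model in base_models])
super_model_scores = meta_model.predict_proba(test_features)[:, 1]
```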

Algorithm comparison (figure slides):
- Comparison on the original data (linear and log X-axis)
- Comparison on the 100-dimensional projections (linear and log X-axis)
- Comparison on the 10-dimensional projections (linear and log X-axis)

NewKNN summary of results and timings
[Results and timing slides: figures not preserved.]

PLS summary of results
The PLS projections did not do well. However, PLS as a predictor performed well, especially on train100 / test100. PLS is fast: runtimes ranged from 1 to 10 minutes. However, PLS requires large amounts of memory, and it is impossible to use with a sparse representation (because of the update performed on each iteration).
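For reference, here is a hedged sketch of "PLS as a predictor": fit PLS regression against the 0/1 activity labels and rank test compounds by the continuous PLS output. scikit-learn's PLSRegression is a stand-in for the 2002 implementation; like it, PLSRegression requires dense input, which matches the memory limitation noted above.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic dense stand-in for a projected dataset with rare positives.
X, y = make_classification(n_samples=2000, n_features=100,
                           weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pls = PLSRegression(n_components=10)
pls.fit(X_tr, y_tr)                  # the 0/1 label is treated as a real target
scores = pls.predict(X_te).ravel()   # continuous scores used for ranking
print(f"PLS-as-predictor AUC: {roc_auc_score(y_te, scores):.3f}")
```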


Summary of results
- SVM was best early on for Train1; LR was better over the long haul.
- Projecting to 10 dimensions was always a disaster.
- Projecting to 100 dimensions was often indistinguishable from the behavior on the original data (and much cheaper).
- The naïve Gaussian Bayes classifier was best on jun-3-1 (k-NN was better over the long haul).
- The naïve Gaussian Bayes classifier was best on combined.
- The non-linear SVM never seemed distinguishable from the linear SVM.
- Every method won in at least one context, except Dtree.

Some AUC Results

Experiment: train on Train1, then test on Test1
Algorithm | AUC
Linear SVM | 0.876 *
Best non-linear SVM | 0.875 *
BC | 0.867 *
LR | 0.71
KNN | 0.872 *
DTree | 0.70

Experiment: Combined
Algorithm | AUC
SVM | 0.638
BC | 0.700
(label not preserved) | 0.606
(label not preserved) | 0.603

* = not statistically significantly different

Some AUC Results (continued)

Experiment: 10-fold cross-validation on Train1
Algorithm | AUC
Linear SVM | 0.919
BC | 0.885
LR | 0.933
DTree | 0.894