Slide 1: Bringing Diverse Classifiers to Common Grounds: dtransform
Devi Parikh and Tsuhan Chen, Carnegie Mellon University
ICASSP 2008, April 3
© Devi Parikh 2008

Slide 2: Outline
- Motivation
- Related work
- dtransform
- Results
- Conclusion

Slide 3: Motivation
- Consider a three-class classification problem
- Multi-layer perceptron (MLP) neural network classifier
- Normalized outputs for a test instance:
  - class 1: 0.5
  - class 2: 0.4
  - class 3: 0.1
- Which class do we pick?
- If we looked deeper…
[Figure: per-output histograms of raw outputs on − and + examples for each class, illustrating the need for adaptability]

Slide 4: Motivation
- Diversity among classifiers due to different:
  - Classifier types
  - Feature types
  - Training data subset
  - Randomness in learning algorithm
  - Etc.
- Bring to common grounds for:
  - Comparing classifiers
  - Combining classifiers
  - Cost considerations
- Goal: a transformation that
  - Estimates posterior probabilities from classifier outputs
  - Incorporates statistical properties of the trained classifier
  - Is independent of classifier type, etc.

Slide 5: Related work
- Parameter tweaking
  - In two-class problems (biometric recognition), ROC curves are prevalent
  - Straightforward multi-class generalizations are not known
- Different approaches for estimating posterior probabilities for different classifier types
  - Classifier-type dependent
  - Do not adapt to statistical properties of classifiers post-training
- Commonly used transforms (sketched below):
  - Normalization
  - Softmax
  - Do not adapt
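For reference, here is a minimal sketch of the two non-adaptive transforms named above, applied to a vector of raw per-class scores. The function names and example scores are illustrative, not taken from the paper.

```python
import numpy as np

def normalize(scores):
    """Divide by the sum so the raw scores form a probability-like vector."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum()

def softmax(scores):
    """Exponentiate and renormalize; a fixed mapping that does not adapt."""
    scores = np.asarray(scores, dtype=float)
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

raw = [0.5, 0.4, 0.1]   # raw outputs from the motivating three-class example
print(normalize(raw))   # preserves the ordering of the raw outputs
print(softmax(raw))     # same ordering, different values
```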

Slide 6: dtransform
- Set-up: "multiple classifier system"
  - Multiple classifiers
  - One classifier with multiple outputs
  - Any multi-class classification scenario where the classification system gives a score for each class

Slide 7: dtransform
- For each output c:
  - Raw output θ_c maps to transformed output 0.5
  - Raw output 0 maps to transformed output 0
  - Raw output 1 maps to transformed output 1
  - The mapping is monotonically increasing
[Figure: transform curve for an output c, shown against histograms of raw outputs on − and + examples]

Slide 8: dtransform
[Figure: transformed output D plotted against the raw output, with curves for parameter values θ = 0.1, 0.5, and 0.9]
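As a rough illustration of the curve family on this slide, the sketch below builds one monotone mapping with the fixed points listed on the previous slide (0 maps to 0, 1 maps to 1, the parameter θ maps to 0.5). The power-curve form and the name d_like_transform are assumptions chosen for simplicity; the exact dtransform expression is the one given in the paper.

```python
import numpy as np

def d_like_transform(raw, theta):
    """Monotone map on [0, 1] with fixed points 0 -> 0, 1 -> 1, theta -> 0.5.

    Illustrative power-curve construction only, not necessarily the exact
    dtransform formula from the paper.
    """
    raw = np.clip(np.asarray(raw, dtype=float), 0.0, 1.0)
    gamma = np.log(0.5) / np.log(theta)  # exponent chosen so theta maps to 0.5
    return raw ** gamma

raw = np.linspace(0.0, 1.0, 6)
for theta in (0.1, 0.5, 0.9):            # parameter values shown on the slide
    print(theta, np.round(d_like_transform(raw, theta), 3))
```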

Slide 9: dtransform
- Logistic regression
  - Two (not so intuitive) parameters to be set
- Histogram itself
  - Non-parametric: subject to overfitting
- dtransform: just one intuitive parameter
- Affine transform
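For context on the logistic-regression alternative, here is a minimal sketch of fitting a per-output sigmoid calibration on held-out scores, in the spirit of Platt scaling; its two fitted parameters (slope and intercept) are presumably the "not so intuitive" parameters the slide refers to. The data and settings below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative held-out raw scores for one output, with binary labels
# (1 if the instance truly belongs to this class, 0 otherwise).
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=200)
labels = (rng.uniform(0.0, 1.0, size=200) < scores).astype(int)

# Fit a sigmoid with two parameters (slope and intercept) that maps
# raw score -> estimated posterior probability.
calibrator = LogisticRegression()
calibrator.fit(scores.reshape(-1, 1), labels)

test_scores = np.array([[0.1], [0.5], [0.9]])
print(calibrator.predict_proba(test_scores)[:, 1])
```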

Slide 10: Experiment 1
- Comparison with other transforms
- Same ordering, different values:
  - Normalization and softmax → not adaptive
  - tsoftmax and dtransform → adaptive
- Similar values, different ordering:
  - softmax and tsoftmax

Slide 11: Experiment 1
- Synthetic data
  - True posterior probabilities known
- 3-class problem
- MLP neural network with 3 outputs
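A minimal sketch of a setup in this spirit: sample a 3-class Gaussian mixture, so the true posterior probabilities can be computed in closed form, and train a 3-output MLP on it. The class means, covariance, and MLP settings below are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # assumed class means
cov = np.eye(2)                                          # assumed shared covariance

# Sample an equal number of points per class (equal priors).
X = np.vstack([rng.multivariate_normal(m, cov, size=300) for m in means])
y = np.repeat([0, 1, 2], 300)

# True posteriors follow from Bayes' rule with equal priors.
densities = np.column_stack(
    [multivariate_normal.pdf(X, mean=m, cov=cov) for m in means])
true_posteriors = densities / densities.sum(axis=1, keepdims=True)

# MLP neural network with one output per class.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X, y)
estimated_posteriors = mlp.predict_proba(X)
print(true_posteriors[:2], estimated_posteriors[:2], sep="\n")
```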

Slide 12: Experiment 1
- Comparing classification accuracies

Slide 13: Experiment 1
- Comparing KL distance (between the true and estimated posterior probabilities)
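The KL distance can be computed per test instance between the known true posterior and a transform's estimate, then averaged over instances. A small standalone computation, with illustrative numbers:

```python
import numpy as np

def mean_kl(true_p, est_p, eps=1e-12):
    """Mean KL divergence D(true || estimated), averaged over instances (rows)."""
    true_p = np.clip(np.asarray(true_p, dtype=float), eps, 1.0)
    est_p = np.clip(np.asarray(est_p, dtype=float), eps, 1.0)
    return np.mean(np.sum(true_p * np.log(true_p / est_p), axis=1))

# Tiny illustrative example: two instances, three classes.
true_p = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
est_p = [[0.5, 0.4, 0.1], [0.2, 0.5, 0.3]]
print(mean_kl(true_p, est_p))
```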

Slide 14: Experiment 2
- Real intrusion detection dataset: KDD 1999
  - 5 classes
  - 41 features
  - ~5 million data points
- Learn++ with MLP as the base classifier
- Classifier combination rules (sketched below):
  - Weighted sum rule
  - Weighted product rule
- Cost matrix involved
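A minimal sketch of the two combination rules and a cost-sensitive decision, assuming each base classifier's outputs have already been transformed into per-class posterior estimates. The weights, scores, and cost matrix are illustrative assumptions.

```python
import numpy as np

# Posterior estimates for one test instance from three base classifiers
# (rows: classifiers, columns: classes) and their weights -- all illustrative.
posteriors = np.array([[0.6, 0.3, 0.1],
                       [0.5, 0.4, 0.1],
                       [0.2, 0.5, 0.3]])
weights = np.array([0.5, 0.3, 0.2])

# Weighted sum rule: convex combination of the posterior estimates.
sum_rule = weights @ posteriors

# Weighted product rule: weighted geometric combination, then renormalize.
prod_rule = np.prod(posteriors ** weights[:, None], axis=0)
prod_rule /= prod_rule.sum()

# Cost-sensitive decision: choose the class with minimum expected cost.
cost = np.array([[0, 1, 2],
                 [1, 0, 1],
                 [4, 2, 0]])   # cost[i, j] = cost of deciding class j when truth is i
expected_cost = sum_rule @ cost
print(sum_rule, prod_rule, int(np.argmin(expected_cost)))
```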

Slide 15: Experiment 2

Slide 16: Conclusion
- Parametric transformation to estimate posterior probabilities from classifier outputs
- Straightforward to implement and gives a significant boost in classification performance
- Independent of classifier type
- Applied post-training
- Incorporates statistical properties of the trained classifier
- Brings diverse classifiers to common grounds for meaningful comparisons and combinations

Slide 17: Thank you! Questions?