Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy and Cost. Vikas Raykar, Balaji Krishnapuram, Shipeng Yu. Siemens Healthcare. KDD 2010.

Presentation transcript:

Features incur a cost
Features are acquired on demand.
A set of features can be acquired as a group.
Each feature group incurs a certain cost.
Acquisition cost can be:
  – Computational (fast detectors)
  – Financial (expensive medical tests)
  – Human discomfort (biopsy)

Example: Survival Prediction for Lung Cancer
2-year survival prediction for lung cancer patients treated with chemo/radiotherapy.
Feature group | Number of features | Examples | Cost
1. Clinical features | 9 | gender, age | 0 (no cost)
2. Features before therapy | 8 | lung function, creatinine clearance | 1
3. Imaging/treatment features | 7 | gross tumor volume, treatment dose | 2
4. Blood bio-markers | 21 | Interleukin-8, Osteopontin | 5 (expensive)
From group 1 to group 4, both predictive power and acquisition cost increase.
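To make the cost column concrete, here is a minimal sketch of the cost an instance accumulates as it moves through the feature groups. It assumes one feature group per cascade stage, acquired in the order listed; the names `FEATURE_GROUPS`, `cumulative_costs` and `normalized_costs` are illustrative and not from the slides.

```python
# Feature groups from the table: (name, number of features, acquisition cost).
FEATURE_GROUPS = [
    ("clinical features", 9, 0),
    ("features before therapy", 8, 1),
    ("imaging/treatment features", 7, 2),
    ("blood bio-markers", 21, 5),
]


def cumulative_costs(groups):
    """Cost paid by an instance that stops after group 1, 2, ..., K.

    Assumption: groups are acquired in order, one per stage.
    """
    total, out = 0, []
    for _name, _n_features, cost in groups:
        total += cost
        out.append(total)
    return out


def normalized_costs(groups):
    """Same costs, scaled so that acquiring every group costs 1."""
    cum = cumulative_costs(groups)
    return [c / cum[-1] for c in cum]


print(cumulative_costs(FEATURE_GROUPS))
print(normalized_costs(FEATURE_GROUPS))
```

Under this assumption, an instance rejected early pays only a fraction of the full cost, which is the saving a cascade is meant to exploit.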

A cascade of linear classifiers
[Figure: Stage 1, Stage 2, Stage 3, with predictive power and acquisition cost increasing along the cascade]
Training each stage of the cascade
Choosing the thresholds for each stage

Sequential Training of cascades
[Figure: Stage 1, Stage 2, Stage 3]
Conventionally, each stage is trained using only the examples that pass through all the previous stages.
Training therefore depends on the choice of the thresholds: for each choice of thresholds we have to retrain.
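The dependence on thresholds can be seen in a small sketch. It assumes each earlier stage outputs a score and passes an instance on when the score meets that stage's threshold; the function name, the example scores and the thresholds are all hypothetical.

```python
import numpy as np


def passes_previous_stages(stage_scores, thresholds):
    """Boolean mask of instances that survive the first len(thresholds) stages.

    stage_scores: array (n_instances, n_stages) of per-stage classifier scores.
    thresholds:   one threshold per earlier stage (assumed rule: pass if score >= threshold).
    """
    stage_scores = np.asarray(stage_scores, dtype=float)
    mask = np.ones(stage_scores.shape[0], dtype=bool)
    for k, theta in enumerate(thresholds):
        mask &= stage_scores[:, k] >= theta
    return mask


# Made-up scores for four instances and two stages.
scores = np.array([[0.9, 0.8], [0.6, 0.2], [0.3, 0.9], [0.7, 0.7]])

# The training set for stage 2 changes with the stage-1 threshold.
loose = passes_previous_stages(scores, [0.5])
strict = passes_previous_stages(scores, [0.65])
print(loose, strict)
```

Because the two masks differ, the stage-2 classifier would have to be retrained whenever the stage-1 threshold is changed.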

Contributions of this paper
Joint training of all stages of the cascade.
  – Notion of probabilistic soft cascades
A knob to control the tradeoff between accuracy and cost.
  – Modeling the expected feature cost
Decoupling the classifier training and threshold selection.
  – Post-selection of thresholds

Notation
[Figure: cascade of K stages: Stage 1, Stage 2, ..., Stage K]

Soft Cascade
Probabilistic version of the hard cascade.
An instance is classified as positive only if all the K stages predict it as positive.
An instance is classified as negative if at least one of the K classifiers predicts it as negative.
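A minimal sketch of one way to read this rule probabilistically: if each stage is a logistic classifier, the probability that all K stages say "positive" is the product of the K stage probabilities. The slide's formula is not preserved in this transcript, so the product form, the per-stage feature blocks and the names `sigmoid` and `soft_cascade_prob` are assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def soft_cascade_prob(weights, feature_blocks):
    """P(y = 1 | x) as the product of the K per-stage logistic probabilities.

    weights[k]:        weight vector of stage k, shape (d_k,)
    feature_blocks[k]: features seen by stage k, shape (n_instances, d_k)
    """
    p = np.ones(feature_blocks[0].shape[0])
    for w, X in zip(weights, feature_blocks):
        p *= sigmoid(X @ w)
    return p


# Made-up data: 5 instances, 3 stages with 2, 3 and 4 features.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(5, d)) for d in (2, 3, 4)]
ws = [rng.normal(size=d) for d in (2, 3, 4)]
print(soft_cascade_prob(ws, blocks))
```

A product is unchanged when its factors are reordered, so under this reading the stage order does not affect the probability itself.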

Some properties of soft cascades
The sequential ordering of the stages is not important during training.
The order definitely matters during testing.
The soft cascade is a device to ease the training process.
We use a maximum a posteriori (MAP) estimate with a Laplace prior on the weights.

Joint cascade training
Once we have a probabilistic cascade we can write down the log-likelihood.
We impose a Laplacian prior on the weights.
The weights are obtained as the maximum a posteriori (MAP) estimate.
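The slide's equations did not survive in this transcript. One way to write the objective is sketched here, assuming each stage k is a logistic classifier with weights w_k on features x^k and that the soft cascade multiplies the stage probabilities. The symbols, and the prior scale \gamma, are notation chosen for this sketch rather than taken from the slides.

```latex
\begin{align*}
p_i &= \prod_{k=1}^{K} \sigma\!\left(w_k^{\top} x_i^{k}\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
\ell(w) &= \sum_{i=1}^{N} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big] \\
\hat{w}_{\mathrm{MAP}} &= \arg\max_{w} \; \ell(w) - \gamma \sum_{k=1}^{K} \lVert w_k \rVert_1
\end{align*}
```

The L1 penalty is the negative log of a Laplace prior up to a constant, which is why a Laplace prior tends to produce sparse weights.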

Accuracy vs Cost
We would like to find the MAP estimate subject to a constraint on the expected cost for a new instance.
The expectation is over the unknown test distribution.
Since we do not know the test distribution, we estimate this quantity from the training set.
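A hedged sketch of how such a constraint is commonly written, since the slide's formula is missing here. Let \mathcal{L}_{\mathrm{MAP}}(w) denote the log-posterior from the joint cascade training slide and T(x) the acquisition cost incurred for instance x. The budget c and the penalized form with multiplier \beta are assumptions, although a parameter called beta does appear on the methods slide.

```latex
\begin{align*}
\hat{w} &= \arg\max_{w} \; \mathcal{L}_{\mathrm{MAP}}(w)
\quad \text{subject to} \quad \mathbb{E}_{x}\!\left[ T(x) \right] \le c \\
\mathbb{E}_{x}\!\left[ T(x) \right] &\approx \frac{1}{N} \sum_{i=1}^{N} T(x_i) \\
\hat{w}_{\beta} &= \arg\max_{w} \; \mathcal{L}_{\mathrm{MAP}}(w)
- \beta \, \frac{1}{N} \sum_{i=1}^{N} T(x_i)
\end{align*}
```

In this form \beta = 0 recovers the plain MAP estimate, and larger \beta trades accuracy for lower expected cost.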

Modeling the expected cost
[Figure: Stage 1, Stage 2, Stage 3, with the cost incurred at each stage for a given instance]
We optimize using cyclic coordinate descent.
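The per-stage cost expression is lost in this transcript. A minimal sketch of one natural model: the cost of stage k is paid only if every earlier stage passes the instance on, so it is weighted by the product of the earlier stages' pass probabilities. The function name and the example numbers are hypothetical, and the model itself is an assumption.

```python
import numpy as np


def expected_cost(stage_pass_probs, stage_costs):
    """Expected acquisition cost per instance under a soft cascade.

    stage_pass_probs: (n, K) array; column k is the probability that
                      stage k predicts positive for each instance.
    stage_costs:      length-K costs of the feature groups, one per stage.
    Stage 1 is always evaluated; stage k > 1 is reached with probability
    equal to the product of the pass probabilities of stages 1..k-1.
    """
    probs = np.asarray(stage_pass_probs, dtype=float)
    costs = np.asarray(stage_costs, dtype=float)
    ones = np.ones((probs.shape[0], 1))
    # reach[:, k] = probability that the instance reaches stage k
    reach = np.cumprod(np.hstack([ones, probs[:, :-1]]), axis=1)
    return reach @ costs


# One instance that passes every stage with probability 0.5.
print(expected_cost([[0.5, 0.5, 0.5]], [1.0, 2.0, 4.0]))
```

This expression is smooth in the stage probabilities, which is what lets a cost term of this kind sit inside a training objective.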

Experiments
Medical datasets
  – Personalized medicine
      Survival prediction for lung cancer
      Tumor response prediction for rectal cancer
  – Computer aided diagnosis for lung cancer

Survival Prediction for Lung Cancer
2-year survival prediction for advanced non-small cell lung cancer (NSCLC) patients treated with chemo/radiotherapy.
82 patients treated at the MAASTRO clinic, of whom 24 survived two years.
Feature group | Number of features | Examples | Cost
1. Clinical features | 9 | gender, age | 0 (no cost)
2. Features before therapy | 8 | lung function, creatinine clearance | 1
3. Imaging/treatment features | 7 | gross tumor volume, treatment dose | 2
4. Blood bio-markers | 21 | Interleukin-8, Osteopontin | 5 (expensive)

Pathological Complete Response (pCR) Prediction for Rectal Cancer
Predict tumor response after chemo/radiotherapy for locally advanced rectal cancer.
78 patients (21 had pCR).
Feature group | Number of features | Cost
1. Clinical features | 6 | 0
2. CT/PET scan features before treatment | 2 | 1
3. CT/PET scan features after treatment | 2 | 10

Methods compared
Single stage classifier
Proposed soft cascade
  – With beta = 0
  – Varying beta
Sequential training
  – Logistic Regression
  – AdaBoost [Viola-Jones cascade]
  – LDA

Evaluation Procedure
70% for training, 30% for testing
Area under the ROC curve
Normalized average cost per patient
  – Using all the features has a cost of 1
Results averaged over 10 repetitions
Thresholds for each stage chosen using a two-level hierarchical grid search
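The two reported metrics can be sketched as follows. This is an illustration rather than the authors' evaluation code: the AUC uses the standard pairwise-comparison definition, and the cost function assumes each patient pays the cumulative cost of the stages they went through. Function names and inputs are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)


def normalized_average_cost(stages_reached, stage_costs):
    """Mean per-patient cost, scaled so that using all features costs 1.

    stages_reached[i]: number of stages patient i went through (1..K).
    stage_costs:       cost of the feature group acquired at each stage.
    """
    cum = np.cumsum(np.asarray(stage_costs, dtype=float))
    reached = np.asarray(stages_reached, dtype=int)
    return cum[reached - 1].mean() / cum[-1]


print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
print(normalized_average_cost([1, 4], [0, 1, 2, 5]))
```

A single-stage classifier that always uses every feature group has normalized cost 1 by construction, which gives the cascades a common reference point.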

Results

Computer aided diagnosis
Motivation here is to reduce the computational cost.
196 CT scans with 923 positive candidates and [number missing] negative candidates.
Feature group | Number of features | Average cost (secs)
[table values not preserved in the transcript]

Test set FROC Curves

Conclusions
Joint training of all stages of the cascade.
  – Notion of probabilistic soft cascades
A knob to control the tradeoff between accuracy and cost.
  – Modeling the expected feature cost

Related work

Some open issues
Order of the cascade
The accuracy vs cost knob is not sensitive in all problem domains