Active Cost-sensitive Learning (Intelligent Test Strategies)

Active Cost-sensitive Learning (Intelligent Test Strategies) Charles X. Ling, PhD Department of Computer Science University of Western Ontario, Ontario, Canada cling@csd.uwo.ca http://www.csd.uwo.ca/faculty/cling Joint work with Victor Sheng, Qiang Yang, …

Outline Introduction Cost-sensitive decision trees Test strategies Sequential Test Single Batch Test Sequential Batch Test Conclusions and future work

Everything has a cost/benefit!
- Materials, products, services
- Disease, working/living conditions, waiting, …
- Happiness, love, life, …
- Money, Sex and Happiness: An Empirical Study, by David G. Blanchflower & Andrew J. Oswald, The Scandinavian Journal of Economics, 106(3):393-415, 2004: a lasting/happy marriage is worth about $100,000 in happiness
- Utility-based learning: optimization; unifies many issues & is the ultimate goal

Everything has a cost/benefit! In medical diagnosis…
- Tests have costs: temperature ($1), X-ray ($30), biopsy ($900)
- Diseases have costs: flu ($100), diabetes ($100k), cancer ($10^8)
- Misdiagnosis has (different) costs: cost of a false alarm ($500) << cost of missing a cancer ($500,000)
- Doctors balance the cost of tests and misdiagnosis; our goal: to minimize the total cost
- Many other similar applications…
- Model this process: cost-sensitive learning + intelligent test strategies
(Slide table: patients 001-003 with partially known results for Test 1 … Test n, test costs $1, $30, …, $900, a Cancer? label, and FP/FN misclassification costs = 100/300k.)
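The objective on this slide can be made concrete: the total cost of a diagnosis is the money spent on tests plus the expected misclassification cost of the final prediction. A minimal sketch, using the slide's illustrative dollar figures (the function and variable names are my own, not from the talk):

```python
# Total diagnosis cost = sum of test costs + expected misclassification cost.
# All dollar figures are the illustrative numbers from the slide.
test_costs = {"temperature": 1, "x_ray": 30, "biopsy": 900}

def total_cost(tests_done, p_cancer, predict_cancer,
               fp_cost=500, fn_cost=500_000):
    """Cost of performing `tests_done` and then committing to a prediction."""
    spent = sum(test_costs[t] for t in tests_done)
    if predict_cancer:
        expected_misc = (1 - p_cancer) * fp_cost   # risk of a false alarm
    else:
        expected_misc = p_cancer * fn_cost         # risk of a missed cancer
    return spent + expected_misc

# Skipping the biopsy saves $900 but leaves a costly residual risk:
cheap = total_cost(["temperature", "x_ray"], p_cancer=0.01, predict_cancer=False)
thorough = total_cost(["temperature", "x_ray", "biopsy"], p_cancer=0.0,
                      predict_cancer=False)
```

Here the $900 biopsy is worth its price: it removes a $5,000 expected misclassification cost, which is exactly the tradeoff the doctor (and the learner) must balance.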

Review of Previous Work
- Cost-sensitive learning: a survey (Turney 2000); active research, also for the imbalanced-data problem
- CS meta-learning (wrappers): thresholding, sampling, weighting, …
- CS learning algorithms: CSNB, our CS trees, … but all consider misclassification costs only
- Some work considers test costs only
- A few previous works consider both test costs and misclassification costs (Turney 1995; Zubek and Dietterich 2002; Lizotte et al. 2003); all are computationally expensive

Review of Previous Work
- Active learning: actively seeking extra information
- Pool-based: given a pool of unlabeled examples, which ones to label?
- Membership query: is this instance positive?
- Feature-value acquisition: during training (but "missing is useful!"); during testing (our work)
- Human learning is active in many ways

Review of Previous Work
- Diagnosis: wide applications in medicine, mechanical systems, software, …
- Most previous AI-based diagnosis systems are (partially) manually built, do not incorporate costs/benefits, and cannot actively suggest the test process
- Our work: cost-sensitive and active; useful for diagnosis and policy setting

Outline Introduction Cost-sensitive decision trees Test strategies Sequential Test Single Batch Test Sequential Batch Test Conclusions and future work

Cost-sensitive Decision Tree
(Slide figure: the patient table from the earlier slide, and a decision tree with test nodes T1, T2, T3, T6 and branches such as Low/Med and <36/>=36.)
- Advantages: tree structure, comprehensibility
- Objective: minimizing the total cost of tests and misclassification

Attribute Splitting Criteria
- Previous methods: C4.5 reduces the entropy (randomness); it performs badly on cost-sensitive tasks
- New (ICML'04): we reduce the total expected cost
- Entropy criterion: choose the test T such that E - (E1 + E2 + E3) is maximal, where E is the entropy before the split and E1, E2, E3 are the entropies of the branches
- Cost criterion: choose the test T such that C - (C1 + C2 + C3 + C_test) is maximal, where C is the misclassification cost before the split, C1, C2, C3 are the costs of the branches, and C_test is the cost of the test
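The cost criterion above can be sketched in a few lines. A minimal sketch, under two stated assumptions not spelled out on the slide: classes are binary, a node's misclassification cost is that of labeling it with the cheaper class, and the test cost is charged once per example reaching the node (the function names are illustrative):

```python
def node_cost(pos, neg, fp_cost, fn_cost):
    """Misclassification cost of labeling a node with its cheaper class.

    Predicting positive costs fp_cost per negative example; predicting
    negative costs fn_cost per positive example.
    """
    return min(neg * fp_cost, pos * fn_cost)

def cost_reduction(split, pos, neg, test_cost, fp_cost, fn_cost):
    """The slide's C - (C1 + ... + Ck + C_test) for a candidate split.

    `split` is a list of (pos_i, neg_i) counts, one pair per branch.
    Assumption: C_test is the per-example test cost times the number of
    examples reaching the node.
    """
    before = node_cost(pos, neg, fp_cost, fn_cost)
    after = sum(node_cost(p, n, fp_cost, fn_cost) for p, n in split)
    return before - after - test_cost * (pos + neg)

# A perfect split of 5 positives and 5 negatives, test cost $1 per example:
perfect = cost_reduction([(5, 0), (0, 5)], pos=5, neg=5,
                         test_cost=1.0, fp_cost=100.0, fn_cost=100.0)
```

The split is chosen only when this quantity is positive, which is how cheap but weakly discriminating tests can beat expensive, highly discriminating ones.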

Case Study: Heart Disease
- Predict coronary artery disease
- Class 0: less than 50% artery narrowing; Class 1: more than 50% artery narrowing
- ~300 patients, collected from hospitals
- 13 non-invasive tests on patients

13 Tests (Heart Disease)

Test       Cost       Meaning
age        $1         age of the patient
sex
cp                    chest pain type
trestbps              resting blood pressure
chol       $7.27      cholesterol in mg/dl
fbs        $5.20      fasting blood sugar
restecg    $15.50     resting electrocardiography results
thalach    $102.90    maximum heart rate
thal                  maximum heart rate reached
exang      $87.30     exercise-induced angina
oldpeak               ST depression induced by exercise
slope                 slope of the peak exercise ST segment
ca         $100.90    number of major vessels colored by fluoroscopy

Cost-sensitive tree for Heart Disease
(Slide figure: the learned tree, with test nodes thal ($102.9), fbs ($5.2), restecg ($15.5), sex ($1), chol ($7.27), cp, slope ($87.3), and age.)
- Naturally prefers tests with small cost
- Balances cost and discriminating power
- A local heart-failure specialist thinks this tree is reasonable

Considering Group Discount
(Slide table: the same 13 tests and costs as above, with group discounts of $2.10, $101.90, and $86.30 marked.)

Different trees without/with group discount
(Slide figure: the trees learned before and after applying the group discount; without the discount, thalach's individual cost is $102.9.)

Algorithm of Cost-sensitive Decision Tree

CSDT(Examples, Attributes, TestCosts):
  If all examples are positive, return root with label +
  If all examples are negative, return root with label -
  If the maximum cost reduction < 0, return root with the label given by min(P*C_TP + N*C_FP, N*C_TN + P*C_FN)
  Let A be the attribute with maximum cost reduction
  root <- A
  Update TestCosts if a discount applies
  For each possible value v_i of the attribute A:
    Add a new branch A = v_i below root
    Segment the training examples Examples_vi into the new branch
    Call CSDT(Examples_vi, Attributes - {A}, TestCosts) to build the subtree
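The CSDT pseudocode above can be turned into a compact recursive sketch. A minimal sketch, assuming binary +/- labels, discrete attribute values, per-example test costs, and no group discounts; the data format and all names here are illustrative, not the authors' implementation:

```python
def csdt(examples, attributes, test_costs, fp_cost, fn_cost):
    """Minimal sketch of the CSDT pseudocode above.

    `examples` is a list of (features_dict, label) with label in {"+", "-"}.
    Returns a leaf label string or a nested dict {"test": ..., "branches": ...}.
    """
    pos = sum(1 for _, y in examples if y == "+")
    neg = len(examples) - pos
    if neg == 0:
        return "+"
    if pos == 0:
        return "-"

    def leaf_cost(p, n):
        # Cost of predicting the cheaper class for a node with p/n examples.
        return min(n * fp_cost, p * fn_cost)

    def reduction(attr):
        # Cost reduction of splitting on attr, minus its per-example test cost.
        branches = {}
        for x, y in examples:
            branches.setdefault(x[attr], []).append((x, y))
        after = sum(leaf_cost(sum(1 for _, y in b if y == "+"),
                              sum(1 for _, y in b if y == "-"))
                    for b in branches.values())
        return leaf_cost(pos, neg) - after - test_costs[attr] * len(examples)

    majority = "+" if pos * fn_cost > neg * fp_cost else "-"
    if not attributes:
        return majority
    best = max(attributes, key=reduction)
    if reduction(best) <= 0:        # no positive cost reduction: stop here
        return majority
    tree = {"test": best, "branches": {}}
    rest = [a for a in attributes if a != best]
    groups = {}
    for x, y in examples:
        groups.setdefault(x[best], []).append((x, y))
    for v, subset in groups.items():
        tree["branches"][v] = csdt(subset, rest, test_costs, fp_cost, fn_cost)
    return tree

# Toy data: one attribute that perfectly separates the two classes.
toy_tree = csdt([({"a": 0}, "-"), ({"a": 1}, "+")], ["a"], {"a": 1},
                fp_cost=100, fn_cost=100)
```

Because the stopping rule compares cost reduction against the test's own cost, an expensive test is only selected when its discriminating power pays for it.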

Outline Introduction Cost-sensitive decision trees Test strategies Sequential Test Single Batch Test Sequential Batch Test Conclusions and future work

Three categories of intelligent test strategies
(Slide figure: the patient table and cost-sensitive tree from earlier, with a new test case whose values are partially unknown.)
- Sequential Test: one test, wait, … then predict
- Single Batch Test: one batch of tests, then predict
- Sequential Batch Test: batch 1, batch 2, … then predict
- Goal: minimize the total cost of tests and misclassification (not trivial)
- Our methods utilize the minimum-cost tree structure

Outline Introduction Cost-sensitive decision trees Test strategies Sequential Test Single Batch Test Sequential Batch Test Conclusions and future work

Sequential Test
- Use the tree structure to guide the test sequence
- "Optimal" because the tree is (locally) optimal
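The Sequential Test strategy is simply a guided walk of the cost-sensitive tree: perform the test at the current node, wait for its result, follow the matching branch, and repeat until a leaf is reached. A minimal sketch, assuming a nested-dict tree representation; the tree format, test names, and values are illustrative:

```python
def sequential_test(tree, perform_test):
    """Sequential Test sketch: walk the cost-sensitive tree from the root,
    performing one test at a time, until a leaf label is reached.

    `tree` is a nested dict {"test": name, "branches": {outcome: subtree}}
    with leaves as label strings; `perform_test(name)` runs the (costly)
    test and returns its outcome.
    """
    tests_done = []
    while isinstance(tree, dict):
        outcome = perform_test(tree["test"])   # pay for the test, then wait...
        tests_done.append(tree["test"])
        tree = tree["branches"][outcome]       # ...and follow its branch
    return tree, tests_done                    # final label + tests performed

# A toy two-level tree: cheap test first, expensive test only when needed.
toy = {"test": "chol", "branches": {
    "low": "-",
    "high": {"test": "thal", "branches": {"norm": "-", "abn": "+"}}}}
patient = {"chol": "high", "thal": "abn"}
label, used = sequential_test(toy, lambda t: patient[t])
```

Note that only the tests on the followed path are ever paid for, which is why this strategy has the lowest test cost but the longest waiting time.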

Sequential Test
(Slide figure: the heart-disease tree, with tests performed one at a time along a path through thal ($102.9), fbs ($5.2), restecg ($15.5), sex ($1), chol ($7.27), cp, slope ($87.3), thalach, and age.)

Experimental Comparison
Using 10 datasets from UCI:

Dataset       No. of Attributes   No. of Examples   Class dist. (N/P)
Ecoli         6                   332               230/102
Breast        9                   683               444/239
Heart         8                   161               98/163
Thyroid       24                  2000              1762/238
Australia     15                  653               296/357
Tic-tac-toe                       958               332/626
Mushroom      21                  8124              4208/3916
Kr-vs-kp      36                  3196              1527/1669
Voting        16                  232               108/124
Cars                              446               328/118

Comparing Sequential Test
- Eager learning: Sequential Test (OST) (ICML'04)
- Lazy learning: Lazy Sequential Test (LazyOST) (TKDE'05)
- Cost-sensitive Naïve Bayes (CSNB) (ICDM'04)

Outline Introduction Cost-sensitive decision trees Test strategies Sequential Test Single Batch Test Sequential Batch Test Conclusions and future work

Single Batch Test
- Only one batch; not an easy task
- If too few tests: important tests are not requested, the prediction is inaccurate, and the total cost is high
- If too many tests: some tests are wasted, and the total cost is high
- The test example may not be classified by a leaf

Single Batch Test
- Expected cost reduction: if a test is done, what are the possible outcomes and the resulting cost reduction?
- R(.): all reachable unknown nodes and leaves
(Slide figure: a tree fragment showing a test node i with reachable nodes j1, j2, j3.)

Single Batch Test
- An A*-like search algorithm
- Form a candidate list (L) and a batch list (B)
- Choose the test with maximum positive expected cost reduction from L; add it to B
- Update L: add all newly reachable unknown tests
- Repeat until the expected cost reduction is 0; efficient thanks to the tree structure

Single Batch Test

L = empty   /* list of reachable and unknown attributes */
B = empty   /* the batch of tests */
u = the first unknown attribute when classifying a test case
Add u into L
Loop
  For each i in L, calculate E(i) = misc(i) - [c(i) + …]
  E(t) = max E(i)   /* t has the maximum cost reduction */
  If E(t) > 0 then add t into B, delete t from L, add r(t) into L
  else exit Loop   /* no positive cost reduction */
Until L is empty
Output B as the batch of tests
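The selection loop above can be sketched as a small function. A minimal sketch with two caveats: the slide's formula E(i) = misc(i) - [c(i) + …] is truncated in the transcript, so the expected-cost-reduction computation is supplied by the caller rather than reimplemented here, and all names (single_batch, expected_reduction, reachable) are illustrative:

```python
def single_batch(first_unknown, expected_reduction, reachable):
    """Greedy Single Batch selection, following the pseudocode above.

    `expected_reduction(i)` returns E(i) for test i, and `reachable(i)`
    returns r(i), the unknown tests that become reachable once i is done.
    """
    candidates = {first_unknown}   # L: reachable, still-unknown tests
    batch = []                     # B: the tests to request together
    while candidates:
        best = max(candidates, key=expected_reduction)
        if expected_reduction(best) <= 0:
            break                  # no positive expected cost reduction left
        batch.append(best)
        candidates.remove(best)
        candidates.update(reachable(best))
    return batch

# Toy numbers (illustrative): doing "cp" makes "thal" and "age" reachable.
E = {"cp": 50.0, "thal": 20.0, "age": -3.0}
r = {"cp": ["thal", "age"], "thal": [], "age": []}
batch = single_batch("cp", lambda i: E[i], lambda i: r[i])
```

The candidate list grows only along branches opened up by tests already chosen, which is what makes the search efficient on the tree structure.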

Single Batch Test
(Slide figure: the heart-disease tree, used to illustrate batch selection on a test case.)

Single Batch Test
(Slide figure: the heart-disease tree with cp highlighted.)
- cp is unknown and has a positive expected cost reduction, so cp is added to the batch
- cp's reachable unknown nodes are added to the candidate list

Single Batch Test
(Slide figure: the heart-disease tree with further selected tests highlighted.)
- From the candidate list, choose the test with maximum positive expected cost reduction
- Add it to the batch, update the candidate list, and repeat
- After 7 steps, the expected cost reduction is 0

Single Batch Test
(Slide figure: the heart-disease tree with the selected batch highlighted.)
- Do all tests in the batch

Single Batch Test
(Slide figure: the heart-disease tree after the batch results are known.)
- Make a prediction; the example may be predicted by an internal node
- Some tests are wasted

Comparing Single Batch Tests
- Naïve Single Batch (NSB) (ICML'04)
- Cost-sensitive Naïve Bayes Single Batch (CSNB-SB) (ICDM'04)
- Greedy Single Batch (GSB) (TKDE'05)
- Single Batch Test (OSB) (TKDE'05)

Outline Introduction Cost-sensitive decision trees Test strategies Sequential Test Single Batch Test Sequential Batch Test Conclusions and future work

Sequential Batch
- Batch 1, batch 2, …, then prediction
- Must include the cost of waiting for tests
- Wait cost of a batch: the maximum wait cost in the batch (less than the sum)
- Combines Sequential Test and Single Batch Test: if all waiting costs are 0, it becomes Sequential Test; if all waiting costs are very large, it becomes Single Batch Test
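The point that a batch's wait cost is the maximum, not the sum, is easy to see in code. A minimal sketch with illustrative wait times and an assumed conversion rate from hours waited to dollars (neither number is from the talk):

```python
# Tests in one batch run in parallel, so the batch waits only as long as
# its slowest test. Wait times (hours) and the cost-per-hour are illustrative.
wait_hours = {"chol": 0.5, "restecg": 1.0, "thal": 4.0}
wait_cost_per_hour = 10.0   # assumed conversion from wait time to cost

def batch_wait_cost(batch):
    """Wait cost of a batch: maximum wait cost among its tests."""
    return wait_cost_per_hour * max(wait_hours[t] for t in batch)

# Three sequential one-test batches pay for every wait in turn; one combined
# batch pays only for the slowest test.
sequential = sum(batch_wait_cost([t]) for t in ["chol", "restecg", "thal"])
batched = batch_wait_cost(["chol", "restecg", "thal"])
```

This gap between summed and maximum wait costs is exactly what the Sequential Batch strategy trades off against the risk of requesting wasted tests.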

Sequential Batch
- The wait cost is derived from the wait time
(Slide table: wait time in hours for each of the 13 tests (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal); the captured values include 0.001, 0.01, 0.5, 1, and 4.)

Sequential Batch
- Extends the Single Batch Test to include the batch (wait) cost
- An additional constraint: the cumulative ROI must increase; otherwise, no more batches

Sequential Batch

Loop
  L = empty   /* list of reachable and unknown attributes */
  B = empty   /* the batch of tests */
  u = the first unknown attribute when classifying the test case
  Add u into L
  Loop
    For each i in L, calculate E(i) = misc(i) - [c(i) + …]
    E(t) = max E(i)   /* t has the maximum cost reduction */
    If E(t) > 0 and ROI increases then add t into B, delete t from L, add r(t) into L
    else exit Loop   /* no positive cost reduction */
  Until L is empty
  If B is not empty then
    Output B as the current batch of tests; obtain their values at a cost
    Classify the test example further, until encountering another unknown test
  Else exit the outer Loop

Comparing Sequential Batch Test

Outline Introduction Cost-sensitive decision trees Test strategies Sequential Test Single Batch Test Sequential Batch Test Conclusions and future work

Future Work
- Deal with different test examples differently
- Consider more costs, e.g., acquiring new examples: if each new example costs $10, how many do I need? For $10, tell me if this patient has cancer
- If a test is not accurate (e.g., 90%), how to build trees and how to do tests (will I do it again)?
- From cost-sensitive trees, derive medical policy for expensive/risky or cheap/effective tests

Conclusions
- Cost-sensitive decision trees: effective for learning with minimal total cost; can be used to model learning from data with costs
- Designed and compared various test strategies:
  - Sequential Test: one test, wait, …: low cost but long wait
  - Single Batch Test: one batch of tests: quick but higher cost
  - Sequential Batch Test: batch, wait, batch, …: the best tradeoff
- Our methods perform better than previous ones
- Can be readily applied to real-world diagnoses

References
- C.X. Ling, Q. Yang, J. Wang, and S. Zhang. Decision Trees with Minimal Costs. ICML 2004.
- X. Chai, L. Deng, Q. Yang, and C.X. Ling. Test-Cost Sensitive Naive Bayes Classification. ICDM 2004.
- C.X. Ling, S. Sheng, and Q. Yang. Intelligent Test Strategies for Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
- S. Zhang, Z. Qin, C.X. Ling, and S. Sheng. "Missing is Useful": Missing Values in Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
- P.D. Turney. Types of Cost in Inductive Concept Learning. Workshop on Cost-Sensitive Learning at ICML 2000.
- V.B. Zubek and T. Dietterich. Pruning Improves Heuristic Search for Cost-sensitive Learning. ICML 2002.
- P.D. Turney. Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. JAIR, 2:369-409, 1995.
- D. Lizotte, O. Madani, and R. Greiner. Budgeted Learning of Naive-Bayes Classifiers. UAI 2003.