Treatment Learning: Implementation and Application Ying Hu Electrical & Computer Engineering University of British Columbia

Ying Hu 2 Outline
1. An example
2. Background review
3. TAR2 treatment learner (TARZAN: Tim Menzies; TAR2: Ying Hu & Tim Menzies)
4. TAR3: improved TAR2 (TAR3: Ying Hu)
5. Evaluation of treatment learning
6. Applications of treatment learning
7. Conclusion

Ying Hu 3 First Impression
- Boston Housing Dataset (506 examples, 4 classes)
- C4.5's decision tree: [large tree over many attributes; figure omitted]
- Treatment learner:
  - high: 6.7 <= rooms < 9.8 and 12.6 <= parent teacher ratio < ... (upper bound lost in transcript)
  - low: ... <= nitric oxide < 1.9 and ... <= living standard < 39 (lower bounds lost in transcript)

Ying Hu 4 Review: Background
- What is KDD?
  - KDD = Knowledge Discovery in Databases [fayyad96]
  - Data mining: one step in the KDD process
  - Machine learning: the learning algorithms
- Common data mining tasks
  - Classification: decision tree induction (C4.5) [quinlan86], nearest neighbors [cover67], neural networks [rosenblatt62], Naive Bayes classifier [duda73]
  - Association rule mining: the APRIORI algorithm [agrawal93] and its variants

Ying Hu 5 Treatment Learning: Definition
- Input: a classified dataset (assume the classes are ordered)
- Output: Rx = a conjunction of attribute-value pairs
  - Size of Rx = number of pairs in the Rx
- confidence(Rx w.r.t. Class) = P(Class | Rx)
- Goal: find an Rx whose confidence differs sharply across classes
- Rx is evaluated by its lift
- Output is presented in a visualization form
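The deck gives confidence(Rx w.r.t. Class) = P(Class | Rx) but names lift without a formula. Below is a minimal Python sketch of one plausible reading, where lift is the ratio of a weighted class score under the treatment to the baseline score over all rows; the toy data, the WEIGHTS table, and that reading of lift are assumptions, not from the deck.

    from collections import Counter

    # Toy classified dataset (assumed, not from the deck): rows are
    # (attribute->value dict, class); classes are ordered worst -> best.
    DATA = [
        ({"rooms": "high", "ptratio": "low"}, "best"),
        ({"rooms": "high", "ptratio": "low"}, "best"),
        ({"rooms": "low", "ptratio": "high"}, "worst"),
        ({"rooms": "low", "ptratio": "low"}, "worst"),
        ({"rooms": "high", "ptratio": "high"}, "worst"),
    ]
    WEIGHTS = {"worst": 1, "best": 4}   # assumed weights for the class order

    def matches(attrs, rx):
        # An Rx is a conjunction of attribute-value pairs.
        return all(attrs.get(a) == v for a, v in rx.items())

    def confidence(rx, cls):
        # confidence(Rx w.r.t. Class) = P(Class | Rx)
        hits = [c for attrs, c in DATA if matches(attrs, rx)]
        return hits.count(cls) / len(hits) if hits else 0.0

    def worth(classes):
        # Weighted class score of a set of rows.
        counts = Counter(classes)
        return sum(WEIGHTS[c] * n for c, n in counts.items()) / len(classes)

    def lift(rx):
        # Ratio of the worth under the treatment to the baseline worth.
        treated = [c for attrs, c in DATA if matches(attrs, rx)]
        return worth(treated) / worth([c for _, c in DATA]) if treated else 0.0

    rx = {"rooms": "high", "ptratio": "low"}
    print(confidence(rx, "best"), lift(rx))   # -> 1.0 and ~1.82 on this toy data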

Ying Hu 6 Motivation: Narrow Funnel Effect
- When is enough learning enough?
  - Using fewer than 50% of the attributes decreases accuracy by only 3-5% [shavlik91]
  - A 1-level decision tree is comparable to C4 [holte93]
  - Data engineering: ignoring 81% of the features resulted in a 2% increase in accuracy [kohavi97]
  - Scheduling: random sampling outperforms complete (depth-first) search [crawford94]
- Narrow funnel effect
  - Control variables vs. derived variables
  - Treatment learning: finding the funnel variables

Ying Hu 7 TAR2: The Algorithm
- Search + attribute utility estimation
  - Estimation heuristic: confidence1
  - Search: depth-first search, restricted to the space where confidence1 > threshold
- Discretization: equal-width interval binning
- Reporting: report treatments with lift(Rx) > threshold
- Software package and online distribution
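A compact sketch of the search loop as this slide describes it: prune single attribute-value pairs whose confidence1 falls below a threshold, then enumerate conjunctions of increasing size and report those whose lift clears a second threshold. All names and thresholds here are illustrative, and confidence1 is read simply as P(best class | pair), which is an assumption rather than the deck's definition.

    from itertools import combinations

    # Illustrative TAR2-style search; rows are (attribute->value dict, class).
    WEIGHTS = {"worst": 1, "best": 4}          # assumed class weights

    def score(classes):
        # Weighted mean class score; higher is better.
        return sum(WEIGHTS[c] for c in classes) / len(classes)

    def tar2(data, conf1_min=0.5, lift_min=1.2, max_size=3):
        pairs = sorted({(a, v) for attrs, _ in data for a, v in attrs.items()})
        baseline = score([c for _, c in data])

        def conf1(pair):                       # read here as P(best | pair)
            hits = [c for attrs, c in data if attrs.get(pair[0]) == pair[1]]
            return hits.count("best") / len(hits) if hits else 0.0

        def lift(rx):
            treated = [c for attrs, c in data
                       if all(attrs.get(a) == v for a, v in rx)]
            return score(treated) / baseline if treated else 0.0

        # Prune pairs by confidence1, then enumerate conjunctions of
        # increasing size -- the thresholded depth-first search of the slide.
        survivors = [p for p in pairs if conf1(p) >= conf1_min]
        for size in range(1, max_size + 1):
            for rx in combinations(survivors, size):
                # Skip conjunctions that bind one attribute to two values.
                if len({a for a, _ in rx}) == size and lift(rx) > lift_min:
                    yield rx, round(lift(rx), 2)

    data = [({"rooms": "high"}, "best"), ({"rooms": "low"}, "worst"),
            ({"rooms": "high"}, "best"), ({"rooms": "low"}, "best")]
    print(list(tar2(data)))   # -> [((('rooms', 'high'),), 1.23)]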

Ying Hu 8 The Pilot Case Study
- Requirement optimization
  - Goal: an optimal set of mitigations, chosen in a cost-effective manner
  - [Model diagram: mitigations reduce risks and incur cost; risks relate to requirements; requirements achieve benefit]
- Iterative learning cycle

Ying Hu 9 The Pilot Study (continued)
- Cost-benefit distribution (30 of 99 mitigations) [figure omitted]
- Comparison with simulated annealing [figure omitted]

Ying Hu 10 Problem of TAR2
- Runtime grows sharply with Rx size [figure omitted]
- To generate all Rx of size r from N candidate attribute-value pairs, the search examines C(N, r) = N! / (r! (N - r)!) combinations
- To generate Rx of every size in [1..N], it examines the sum over r = 1..N of C(N, r) = 2^N - 1 combinations, i.e. exponential in N

Ying Hu 11 TAR3: The Improvement
- Random sampling
  - Key idea: treat the confidence1 distribution as a probability distribution and sample Rx from it
  - Steps:
    1. Place the items (a_i) in increasing order of confidence1 value
    2. Compute the CDF of each a_i
    3. Sample a uniform value u in [0..1]
    4. The sample is the least a_i whose CDF > u
  - Repeat until an Rx of the given size is obtained
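Those four steps translate directly into inverse-CDF sampling. A minimal sketch, with placeholder attribute-value pairs and confidence1 scores (the numbers are illustrative, not from the deck):

    import random
    from bisect import bisect_right

    # Candidate attribute-value pairs with assumed confidence1 scores.
    items = [("nox=low", 0.9), ("rooms=high", 2.1), ("ptratio=low", 1.5)]

    def sample_rx(items, size):
        # 1. Place the items in increasing order of confidence1.
        items = sorted(items, key=lambda it: it[1])
        # 2. Normalize the confidence1 values into a CDF.
        total = sum(c for _, c in items)
        cdf, running = [], 0.0
        for _, c in items:
            running += c / total
            cdf.append(running)
        # 3/4. Draw u ~ U[0,1] and take the least item whose CDF exceeds u;
        # repeat until the treatment reaches the requested size.
        rx = set()
        while len(rx) < size:
            u = random.random()
            idx = min(bisect_right(cdf, u), len(items) - 1)
            rx.add(items[idx][0])
        return rx

    print(sample_rx(items, 2))   # e.g. {'rooms=high', 'ptratio=low'}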

Ying Hu 12 Comparison of Efficiency
- Runtime vs. data size [figure omitted]
- Runtime vs. Rx size [figure omitted]
- Runtime vs. TAR2 [figure omitted]

Ying Hu 13 Comparison of Results
- Mean and standard deviation in each round [figure omitted]
- Final Rx: TAR2 = 19, TAR3 = 20
- On 10 UCI domains, both find the identical best Rx
- pilot2 dataset (58 x 30k)

Ying Hu 14 External Evaluation
- FSS (feature subset selection) framework, on 10 UCI datasets:
  1. Learn from all attributes
  2. Run the feature subset selector (TAR2less) to keep some attributes, then learn again
  3. Compare the accuracy of C4.5 and Naive Bayes on both
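One way to read this framework in code (all helper names are assumed): take the attributes mentioned by the learner's best treatments as the selected subset, project the data onto it, and hand both versions to the classifiers.

    # Schematic of the FSS framework; the accuracy comparison itself
    # (C4.5 / Naive Bayes in the deck) is left to any classifier harness.

    def select_features(treatments):
        # Union of attributes appearing in the reported treatments.
        return {a for rx in treatments for a, _ in rx}

    def project(data, keep):
        # Drop every attribute outside the selected subset.
        return [({a: v for a, v in attrs.items() if a in keep}, c)
                for attrs, c in data]

    # Pseudo-usage: `treatments` would come from the treatment learner.
    treatments = [(("rooms", "high"), ("ptratio", "low"))]
    data = [({"rooms": "high", "ptratio": "low", "nox": "high"}, "best")]
    print(project(data, select_features(treatments)))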

Ying Hu 15 The Results
- Number of attributes selected [figure omitted]
- Accuracy using Naive Bayes: average increase of 0.8% [figure omitted]
- Accuracy using C4.5: average decrease of 0.9% [figure omitted]

Ying Hu 16 Comparison with Other FSS Methods
- Number of attributes selected for C4.5 [figure omitted]
- Number of attributes selected for Naive Bayes [figure omitted]
- In 17 of 20 cases, the fewest attributes selected
- Further evidence for funnels

Ying Hu 17 Applications of Treatment Learning
- Downloading site:
- Collaborators: JPL, WV, Portland, Miami
- Application examples:
  - Pair programming vs. conventional programming
  - Identifying software metrics that are superior error indicators
  - Identifying attributes that make FSMs easy to test
  - Finding the best software inspection policy for a particular software development organization
- Other publications: 1 journal, 4 conference, and 6 workshop papers

Ying Hu 18 Main Contributions
- A new learning approach
- A novel mining algorithm
- Algorithm optimization
- A complete package with online distribution
- The narrow funnel effect
- The treatment learner as a feature subset selector
- Applications in various research domains