Treatment Learning: Implementation and Application Ying Hu Electrical & Computer Engineering University of British Columbia.

Outline
1. An example
2. Background review
3. TAR2 treatment learner
–TARZAN: Tim Menzies
–TAR2: Ying Hu & Tim Menzies
4. TAR3: improved TAR2
–TAR3: Ying Hu
5. Evaluation of treatment learning
6. Application of treatment learning
7. Conclusion

First Impression
• Boston Housing Dataset (506 examples, 4 classes)
• C4.5's decision tree: [figure]
• Treatment learner: [figure; class distributions ranging from low to high]
–6.7 <= rooms < 9.8 and 12.6 <= parent teacher ratio < …
–… <= nitric oxide < 1.9 and … <= living standard < 39

Review: Background
• What is KDD?
–KDD = Knowledge Discovery in Databases [fayyad96]
–Data mining: one step in the KDD process
–Machine learning: learning algorithms
• Common data mining tasks
–Classification
  Decision tree induction (C4.5) [quinlan86]
  Nearest neighbors [cover67]
  Neural networks [rosenblatt62]
  Naive Bayes classifier [duda73]
–Association rule mining
  APRIORI algorithm [agrawal93]
  Variants of APRIORI

Treatment Learning: Definition
–Input: classified dataset
  Assume: classes are ordered
–Output: Rx = conjunction of attribute-value pairs
  Size of Rx = # of pairs in the Rx
–confidence(Rx w.r.t. Class) = P(Class|Rx)
–Goal: to find Rx that have different levels of confidence across classes
–Evaluate Rx: lift
–Visualization form of output
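The confidence definition above can be illustrated with a minimal sketch. It assumes a dataset stored as a list of dictionaries with a "class" key and an Rx stored as a dictionary of attribute-value pairs; the function names, the toy rows, and the attribute values are hypothetical and are not taken from the TAR2 package. Lift is not computed here because the slides do not give its formula.

```python
def matches(row, rx):
    """True when the row satisfies every attribute-value pair in Rx."""
    return all(row.get(attr) == value for attr, value in rx.items())

def confidence(rows, rx, cls, class_key="class"):
    """Estimate P(cls | Rx) as a relative frequency.

    Returns None when no row satisfies Rx, since the estimate is undefined.
    """
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return None
    return sum(r[class_key] == cls for r in selected) / len(selected)

# Hypothetical toy data, not the Boston Housing dataset.
rows = [
    {"rooms": "large", "ptratio": "low", "class": "high"},
    {"rooms": "large", "ptratio": "low", "class": "high"},
    {"rooms": "large", "ptratio": "high", "class": "low"},
    {"rooms": "small", "ptratio": "high", "class": "low"},
]
rx = {"rooms": "large"}  # an Rx of size 1

for cls in ("low", "high"):
    print(cls, confidence(rows, rx, cls))
```

An Rx is interesting in the sense of the slide when these per-class confidences differ noticeably from the class frequencies in the untreated data.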

Motivation: Narrow Funnel Effect
• When is enough learning enough?
–With fewer than 50% of the attributes, accuracy decreases by 3-5% [shavlik91]
–A 1-level decision tree is comparable to C4 [holte93]
–Data engineering: ignoring 81% of the features results in a 2% increase in accuracy [kohavi97]
–Scheduling: random sampling outperforms complete search (depth-first) [crawford94]
• Narrow funnel effect
–Control variables vs. derived variables
–Treatment learning: finding funnel variables

TAR2: The Algorithm
• Search + attribute utility estimation
–Estimation heuristic: Confidence1
–Search: depth-first search
  Search space: confidence1 > threshold
• Discretization: equal width interval binning
• Reporting Rx
–Lift(Rx) > threshold
• Software package and online distribution
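Equal width interval binning, the discretization step named above, can be sketched as follows. This is a generic illustration of the technique rather than the TAR2 implementation; the function name and the choice to clamp the maximum value into the last bin are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1.

    The range [min, max] is split into n_bins intervals of equal width.
    The maximum value is clamped into the last bin (an assumption of this sketch).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin is the only sensible result.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Ten values split into three bins of width 3.0.
print(equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
```

After this step every continuous attribute becomes a small set of ranges, so a treatment can be expressed as attribute-range pairs such as "6.7 <= rooms < 9.8".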

The Pilot Case Study
• Requirement optimization
–Goal: optimal set of mitigations in a cost-effective manner
[Diagram: nodes Risks, Mitigations, Requirements, Cost, Benefit; edge labels reduce, relates, incur, achieve]
• Iterative learning cycle

The Pilot Study (continued)
• Cost-benefit distribution (30/99 mitigations)
• Compared to Simulated Annealing

Problem of TAR2
• Runtime vs. Rx size
• To generate Rx of size r:
• To generate Rx from size [1..N]

TAR3: The Improvement
• Random sampling
–Key idea: Confidence1 distribution = probability distribution
  Sample Rx from the confidence1 distribution
–Steps:
  Place the items (a_i) in increasing order of confidence1 value
  Compute the CDF of each a_i
  Sample a uniform value u in [0..1]
  The sample is the least a_i whose CDF > u
–Repeat until we get an Rx of the given size
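The sampling steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the TAR3 source: items are (attribute, value) pairs, their confidence1 scores are given as positive numbers in a dictionary, the scores are normalized so the CDF ends at 1, and an Rx keeps at most one value per attribute. The names `build_cdf`, `sample_item`, `sample_rx`, and the toy scores are hypothetical.

```python
import bisect
import itertools
import random

def build_cdf(scores):
    """Sort items by increasing score and return (items, cumulative distribution)."""
    items = sorted(scores, key=scores.get)
    total = sum(scores.values())
    cdf = list(itertools.accumulate(scores[i] / total for i in items))
    cdf[-1] = 1.0  # guard against floating-point round-off
    return items, cdf

def sample_item(items, cdf, rng):
    """Draw u uniformly from [0, 1) and return the least item whose CDF > u."""
    u = rng.random()
    return items[bisect.bisect_right(cdf, u)]

def sample_rx(scores, size, rng=None, max_tries=10_000):
    """Repeat the draw until the Rx holds `size` distinct attributes.

    Assumption of this sketch: one value per attribute; repeated draws of an
    attribute already in the Rx are ignored. May return a smaller Rx if
    max_tries is exhausted.
    """
    rng = rng or random.Random()
    items, cdf = build_cdf(scores)
    rx = {}
    for _ in range(max_tries):
        if len(rx) == size:
            break
        attr, value = sample_item(items, cdf, rng)
        rx.setdefault(attr, value)
    return rx

# Hypothetical confidence1 scores for three attribute ranges.
scores = {("rooms", "large"): 6.0, ("ptratio", "low"): 3.0, ("nox", "low"): 1.0}
print(sample_rx(scores, 2, random.Random(0)))
```

Because high-scoring items occupy a wider slice of the CDF, they are drawn more often, which is how the sampler favors promising attribute ranges without enumerating every combination.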

Comparison of Efficiency
• Runtime vs. data size
• Runtime vs. Rx size
• Runtime vs. TAR2

Comparison of Results
• Mean and STD in each round
• Final Rx: TAR2=19, TAR3=20
• 10 UCI domains, identical best Rx
• pilot2 dataset (58 * 30k)

External Evaluation
• FSS framework
[Diagram: learning on all attributes (10 UCI datasets) vs. learning on some attributes chosen by a feature subset selector (TAR2less); learners: C4.5, Naive Bayes; compare accuracy]

The Results
• Accuracy using Naïve Bayes (avg increase = 0.8%)
• Number of attributes
• Accuracy using C4.5 (avg decrease = 0.9%)

Comparison to Other FSS Methods
• # of attributes selected (C4.5)
• # of attributes selected (Naive Bayes)
• 17/20, fewest attributes selected
• Further evidence for funnels

Applications of Treatment Learning
• Download site:
• Collaborators: JPL, WV, Portland, Miami
• Application examples
–pair programming vs. conventional programming
–identify software metrics that are superior error indicators
–identify attributes that make FSMs easy to test
–find the best software inspection policy for a particular software development organization
• Other applications:
–1 journal, 4 conference, 6 workshop papers

Main Contributions
• New learning approach
• A novel mining algorithm
• Algorithm optimization
• Complete package and online distribution
• Narrow funnel effect
• Treatment learner as FSS
• Application in various research domains

Some notes follow

Rx Definition Example
• Input example
–classified dataset
–Output example: Rx = conjunction of attribute-value pairs
  confidence(Rx w.r.t. C) = P(C|Rx)

TAR2 in Practice
• Domains containing narrow funnels
–A tail in the confidence1 distribution
–A small number of variables that have disproportionately large confidence1 values
–Satisfactory Rx of small size (<6)

Background: Classification
• 2-step procedure
–The learning phase
–The testing phase
• Strategies employed
–Eager learning
  Decision tree induction (e.g. C4.5)
  Neural networks (e.g. backpropagation)
–Lazy learning
  Nearest neighbor classifiers (e.g. k-nearest neighbor classifier)

Background: Association Rule

ID  Transactions
1   A, B, C, E, F
2   B, C, E
3   B, C, D, E
4   …

Possible rule: B => C,E [support = 2%, confidence = 80%]
where support(X->Y) = P(X ∪ Y), the fraction of transactions containing both X and Y, and confidence(X->Y) = P(Y|X)
• Representative algorithms
–APRIORI
  Apriori property of large itemsets
–Max-Miner
  More concise representation of the discovered rules
  Different pruning strategies
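As a worked example of the two measures, the sketch below evaluates the rule B => C,E on the three transactions that are listed in full. It assumes the common convention that the support of a rule counts transactions containing both sides of the rule; the function names are hypothetical. Because only three rows are used, the numbers differ from the 2% and 80% quoted on the slide, which presumably refer to a larger database.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_confidence(transactions, lhs, rhs):
    """Estimate P(rhs | lhs); None when the left-hand side never occurs."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return None
    return support(transactions, set(lhs) | set(rhs)) / lhs_support

# The three transactions shown in full on the slide.
transactions = [
    {"A", "B", "C", "E", "F"},
    {"B", "C", "E"},
    {"B", "C", "D", "E"},
]

print(support(transactions, {"B", "C", "E"}))             # support of B => C,E
print(rule_confidence(transactions, {"B"}, {"C", "E"}))   # confidence of B => C,E
```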

Background: Extension
• CBA classifier
–CBA = Classification Based on Association
–X => Y, Y = class label
–More accurate than C4.5 (16/26)
• JEP classifier
–JEP = Jumping Emerging Patterns
  Support(X w.r.t. D1) = 0, Support(X w.r.t. D2) > 0
  Model: collection of JEPs
  Classify: maximum collective impact
–More accurate than both C4.5 & CBA (15/25)
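The JEP condition above (zero support in D1, non-zero support in D2) can be checked directly. The sketch below covers only that test, not the JEP classifier or its collective-impact scoring; the datasets and function names are hypothetical.

```python
def support(dataset, pattern):
    """Fraction of rows (sets of items) that contain the whole pattern."""
    pattern = set(pattern)
    return sum(pattern <= row for row in dataset) / len(dataset)

def is_jep(pattern, d1, d2):
    """True when the pattern never occurs in d1 but does occur in d2."""
    return support(d1, pattern) == 0 and support(d2, pattern) > 0

# Hypothetical datasets, one per class.
d1 = [{"a", "b"}, {"b", "c"}]
d2 = [{"a", "c"}, {"a", "c", "d"}]

print(is_jep({"a", "c"}, d1, d2))  # occurs only in d2
print(is_jep({"b"}, d1, d2))       # occurs in d1, so not a JEP from d1 to d2
```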

Background: Standard FSS Methods
• Information Gain attribute ranking
• Relief
• Principal Component Analysis (PCA)
• Correlation-based feature selection
• Consistency-based subset evaluation
• Wrapper subset evaluation

Comparison
• Relation to classification
–Class boundary / class density
–Class weighting
• Relation to association rule mining
–Multiple classes / no class
–Confidence-based pruning
• Relation to change detection algorithms
–support: |P(X|y=c1)-P(X|y=c2)|
–confidence: |P(y=c1|X)-P(y=c2|X)|
–Bayes' rule
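The two difference measures listed above can be computed side by side. This is a minimal sketch that estimates each probability as a relative frequency; the helper names, the toy rows, and the age/salary-style attribute values are hypothetical.

```python
def prob(rows, event, given):
    """Relative-frequency estimate of P(event | given); None if `given` never holds."""
    pool = [r for r in rows if given(r)]
    if not pool:
        return None
    return sum(event(r) for r in pool) / len(pool)

def support_difference(rows, x, c1, c2, class_key="class"):
    """|P(X | y=c1) - P(X | y=c2)|, or None if either class is absent."""
    p1 = prob(rows, x, lambda r: r[class_key] == c1)
    p2 = prob(rows, x, lambda r: r[class_key] == c2)
    return None if p1 is None or p2 is None else abs(p1 - p2)

def confidence_difference(rows, x, c1, c2, class_key="class"):
    """|P(y=c1 | X) - P(y=c2 | X)|, or None if X never holds."""
    p1 = prob(rows, lambda r: r[class_key] == c1, x)
    p2 = prob(rows, lambda r: r[class_key] == c2, x)
    return None if p1 is None or p2 is None else abs(p1 - p2)

# Hypothetical toy data.
rows = [
    {"age": "young", "class": "low"},
    {"age": "young", "class": "low"},
    {"age": "young", "class": "high"},
    {"age": "old", "class": "high"},
]
is_young = lambda r: r["age"] == "young"

print(support_difference(rows, is_young, "low", "high"))
print(confidence_difference(rows, is_young, "low", "high"))
```

The two quantities condition in opposite directions, and Bayes' rule is what links P(X|y) to P(y|X) through the class priors.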

Confidence Property
• Universal-existential upward closure
  R1: Age.young -> Salary.low
  R2: Age.young, Gender.m -> Salary.low
  R3: Age.young, Gender.f -> Salary.low
• Long rules tend to have high confidence
• Large Rx tend to have high lift values

TAR3: Usability
• Usability: more user-friendly
–Intuitive, default setting