Predicting Unix Commands With Decision Tables and Decision Trees
Kathleen Durant
Third International Conference on Data Mining Methods and Databases
September 25, 2002, Bologna, Italy

How Predictable Are a User's Computer Interactions?
- Command sequences
- The time of day
- The type of computer you're using
- Clusters of command sequences
- Command typos

Characteristics of the Problem
- A time-sequenced problem with dependent variables
- Not a standard classification problem
- Predicting a nominal value rather than a Boolean value
- Concept shift

Dataset
- Davison and Hirsh, Rutgers University
- Collected history sessions of 77 different users for 2-6 months
- Three categories of users: professor, graduate, undergraduate
- Average number of commands per session: 2184
- Average number of distinct commands per session: 77

Rutgers Study
- 5 different algorithms implemented:
  - C4.5, a decision-tree learner
  - An omniscient predictor
  - The most recent command just issued
  - The most frequently used command of the training set
  - The longest matching prefix to the current command
- Most successful: C4.5, with a predictive accuracy of 38%

Typical History Session
  :13:31 green-486 vs100 BLANK
  :13:31 green-486 vs100 vi
  :13:31 green-486 vs100 ls
  :13:47 green-486 vs100 lpr
  :13:57 green-486 vs100 vi
  :14:10 green-486 vs100 make
  :14:33 green-486 vs100 vis
  :14:46 green-486 vs100 vi

WEKA System
Provides:
- Learning algorithms
- A simple format for importing data (the ARFF format)
- A graphical user interface

History Session in ARFF
  ct-2   ct-1   ct0
  BLANK, vi,    ls
  vi,    ls,    lpr
  ls,    lpr,   make
  lpr,   make,  vis
  make,  vis,   vi
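For concreteness, the same sliding window could be written out as a complete ARFF file roughly as follows. This is a minimal sketch: the relation name and the exact nominal value sets are illustrative assumptions, not taken from the original slides.

  @relation unix-commands

  @attribute ct-2 {BLANK,vi,ls,lpr,make,vis}
  @attribute ct-1 {BLANK,vi,ls,lpr,make,vis}
  @attribute ct0  {BLANK,vi,ls,lpr,make,vis}

  @data
  BLANK,vi,ls
  vi,ls,lpr
  ls,lpr,make
  lpr,make,vis
  make,vis,vi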

Learning Techniques
- Decision tree, using the 2 previous commands as attributes
  - Minimize the size of the tree
  - Maximize information gain
- Boosted decision trees (AdaBoost)
- Decision table
  - Match determined by k nearest neighbors
  - Verification by 10-fold cross-validation
  - Verification by splitting the data into training/test sets
  - Match determined by majority
(A sketch of how these learners might be set up in Weka follows.)
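The slides show no code, but in a modern Weka distribution the learners above could be assembled roughly like this. This is a sketch under the assumption that current Weka class and option names apply; the original 2002 experiments used an older Weka release whose package layout differed, and the file name history.arff is hypothetical.

  import java.util.Random;
  import weka.classifiers.Classifier;
  import weka.classifiers.Evaluation;
  import weka.classifiers.lazy.IBk;
  import weka.classifiers.meta.AdaBoostM1;
  import weka.classifiers.rules.DecisionTable;
  import weka.classifiers.trees.J48;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class CommandPrediction {
      public static void main(String[] args) throws Exception {
          // Load the sliding-window history (ct-2, ct-1 -> ct0) from ARFF.
          Instances data = new DataSource("history.arff").getDataSet();
          data.setClassIndex(data.numAttributes() - 1);   // ct0 is the class

          // C4.5-style decision tree (J48 is Weka's re-implementation of C4.5).
          J48 tree = new J48();

          // Boosted decision trees: AdaBoost.M1 over J48 base learners.
          AdaBoostM1 boosted = new AdaBoostM1();
          boosted.setClassifier(new J48());

          // Decision table; useIBk switches the fallback for uncovered instances
          // from the global table majority to a nearest-neighbour (IBk) match
          // (setter name assumed from current Weka documentation).
          DecisionTable table = new DecisionTable();
          table.setUseIBk(true);

          // Plain k-nearest-neighbour learner for comparison.
          IBk knn = new IBk(3);

          // 10-fold cross-validation, as in the slides.
          for (Classifier c : new Classifier[] {tree, boosted, table, knn}) {
              Evaluation eval = new Evaluation(data);
              eval.crossValidateModel(c, data, 10, new Random(1));
              System.out.printf("%s: %.2f%% correct%n",
                      c.getClass().getSimpleName(), eval.pctCorrect());
          }
      }
  }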

Learning a Decision Tree
(Diagram: a tree that branches on the command at time = -2 and then on the command at time = -1, with leaves giving the predicted command at time = 0; command values in the example include ls, make, dir, emacs, pwd, grep, more, vi, gcc, pine, and man.)

Boosting a Decision Tree
(Diagram: a single decision tree and the solution set of trees produced by boosting.)

Example: Learning a Decision Table
(Diagram: a decision table whose matches are determined by k-nearest neighbors, i.e. IBk.)

Prediction Metrics
- Macro-average: average predictive accuracy per person. What was the average predictive accuracy for the users in the study?
- Micro-average: average predictive accuracy over the commands in the study. What percentage of the commands in the study did we predict correctly?
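In symbols (a restatement of the two definitions above; U, acc_u, r_u, and c_u are notation introduced here, not in the slides, for the number of users, user u's accuracy, user u's correctly predicted commands, and user u's total commands):

  \text{macro-average} = \frac{1}{U} \sum_{u=1}^{U} \mathrm{acc}_u,
  \qquad
  \text{micro-average} = \frac{\sum_{u=1}^{U} r_u}{\sum_{u=1}^{U} c_u}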

Macro-average Results

Micro-average Results

Results: Decision Trees
- Decision trees gave the expected results
- Compute-intensive algorithm
- Predictability results are similar to those of simpler algorithms
- No interesting findings
- Duplicated the Rutgers study results

Results: AdaBoost
- AdaBoost was very disappointing
- Unfortunately, few or no boosting iterations were performed
- Only 12 decision trees were boosted
- Boosted trees' predictability increased by only 2.4% on average
- Correctly predicted 115 more commands than decision trees (out of 118,409 wrongly predicted commands)
- Very compute-intensive, with no substantial increase in predictability

Results: Decision Tables
- Decision tables gave satisfactory results
- Good predictability results
- Relatively speedy
- Validation is done incrementally
- A potential candidate for an online system

Summary of Prediction Results
- The IBk decision table produced the highest micro-average
- Boosted decision trees produced the highest macro-average
- The difference was negligible: 1.37% for the micro-average, 2.21% for the macro-average

Findings
- IBk decision tables can be used in an online system
  - Not a compute-intensive algorithm
  - Predictability is as good as or better than decision trees
- Consistent results achieved on fairly small log sessions (> 100 commands)
- No improvement in prediction for larger log sessions (> 1000 commands), due to concept shift

Summary of Benefits
- Automatic typo correction
- Savings in keystrokes average 30%, given that the average command length is 3.77 characters and a predicted command can be issued with 1 keystroke (a rough check of this figure follows)
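One way to sanity-check the 30% figure (my reading, not spelled out in the slides): if a fraction p of commands is predicted correctly, each such command costs 1 keystroke instead of 3.77, so the expected saving is

  p \cdot \frac{3.77 - 1}{3.77} \approx 0.73\, p

which reaches roughly 30% when p is about 0.41, in line with the predictive accuracies discussed elsewhere in the talk.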

Questions

AdaBoost Description
The algorithm. Let D_t(i) denote the weight of example i in round t.
Initialization: assign each example (x_i, y_i) ∈ E the weight D_1(i) := 1/n.
For t = 1 to T:
- Call the weak learning algorithm with example set E and weights given by D_t.
- Get a weak hypothesis h_t : X → Y.
- Update the weights of all examples.
Output the final hypothesis, generated from the hypotheses of rounds 1 to T.
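The slide leaves the weight update and the final combination implicit; for the binary case they are the standard AdaBoost formulas below (the experiments, run in Weka, presumably used the multi-class AdaBoost.M1 variant, whose update differs in detail but not in spirit):

  \varepsilon_t = \sum_{i : h_t(x_i) \neq y_i} D_t(i),
  \qquad
  \alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}

  D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t},
  \qquad
  H(x) = \operatorname{sign}\Big( \sum_{t=1}^{T} \alpha_t h_t(x) \Big)

where Z_t normalizes D_{t+1} to a distribution.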

Complete Set of Results
(Table: macro-average and micro-average predictive accuracy for the decision table using IBk, the decision table using majority match, the decision table using a percentage split, decision trees, and AdaBoost.)

Learning a Decision Tree
(Diagram: a tree that first tests the command at time = t-2 (e.g. ls or make), then the command at time = t-1 (e.g. make, dir, grep, ls, pwd, emacs), with leaves giving the predicted command at time = t.)