Comparing machine learning methods for a remote monitoring system. Ronit Zrahia, Final Project, Tel-Aviv University.


Overview
- The remote monitoring system
- The project database
- Machine learning methods:
  - Discovery of Association Rules
  - Inductive Logic Programming
  - Decision Tree
- Applying the methods to the project database and comparing the results

Remote Monitoring System - Description
- The Support Center has ongoing information on the customer's equipment
- The Support Center can, in some situations, know that a customer is going to be in trouble
- The Support Center initiates a call to the customer
- A specialist connects to the site remotely and tries to eliminate the problem before it has an impact

Remote Monitoring System - Description
[System diagram] Customer-site Products (AIX/NT) connect over TCP/IP [FTP] to a Gateway; the Gateway connects by modem, over TCP/IP [Mail/FTP], to the Support Server (AIX/NT/95).

Remote Monitoring System - Technique
- One of the machines on site, the Gateway, is able to initiate a PPP connection to the support server or to an ISP
- All the Products on site have a TCP/IP connection to the Gateway
- Background tasks on each Product collect relevant information
- The data collected from all Products is transferred to the Gateway via FTP
- The Gateway automatically dials to the support server or ISP, and sends the data to the subsidiary
- The received data is then imported into the database

Project Database
- 12 columns, 300 records
- Each record includes failure information of one product at a specific customer site
- The columns are: record no., date, IP address, operating system, customer ID, product, release, product ID, category of application, application, severity, type of service contract

Project Goals
- Discover valuable information from the database
- Improve the company's product marketing and customer support
- Learn different learning methods, and use them on the project database
- Compare the different methods, based on the results

The Learning Methods
- Discovery of Association Rules
- Inductive Logic Programming
- Decision Tree

Discovery of Association Rules - Goals
- Finding relations between products which are bought by the customers
  - Impacts product marketing
- Finding relations between failures in a specific product
  - Impacts customer support (failures can be predicted and handled before they have an impact)

Discovery of Association Rules - Definition
- A technique developed specifically for data mining
  - Given
    - A dataset of customer transactions
    - A transaction is a collection of items
  - Find
    - Correlations between items, expressed as rules
- Example
  - Supermarket baskets

Determining Interesting Association Rules
- Rules have confidence and support
  - IF x and y THEN z with confidence c
    - If x and y are in the basket, then so is z in c% of cases
  - IF x and y THEN z with support s
    - The rule holds in s% of all transactions

Discovery of Association Rules - Example
- Input parameters: confidence = 50%; support = 50%
- If A then C: c = 66.6%, s = 50%
- If C then A: c = 100%, s = 50%

  Transaction | Items
  12345       | A B C
  12346       | A C
  12347       | A D
  12348       | B E F

Itemsets Are the Basis of the Algorithm
- Rule A => C
- s = s(A, C) = 50%
- c = s(A, C) / s(A) = 66.6%

  Itemset | Support
  A       | 75%
  B       | 50%
  C       | 50%
  A, C    | 50%
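To make the support and confidence computations concrete, here is a minimal Python sketch (illustrative only, not part of the original project) that reproduces the numbers above from the four example transactions:

```python
# The four example transactions from the slides above.
transactions = [
    {"A", "B", "C"},   # 12345
    {"A", "C"},        # 12346
    {"A", "D"},        # 12347
    {"B", "E", "F"},   # 12348
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent => consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"A"}))            # 0.75
print(support({"A", "C"}))       # 0.5
print(confidence({"A"}, {"C"}))  # 0.666... ("If A then C", c = 66.6%)
print(confidence({"C"}, {"A"}))  # 1.0      ("If C then A", c = 100%)
```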

Algorithm Outline
- Find all large itemsets
  - Sets of items with at least minimum support
  - Apriori algorithm
- Generate rules from the large itemsets
  - For ABCD and AB in the large itemsets, the rule AB => CD holds if the ratio s(ABCD)/s(AB) is large enough
  - This ratio is the confidence of the rule

Pseudo Algorithm
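The pseudocode shown on this slide did not survive the transcript. As an illustrative stand-in (not the original slide content), the following Python sketch implements the level-wise Apriori search and rule generation outlined above, using the example transactions and the 50%/50% thresholds; for brevity it omits the subset-based candidate pruning of the full Apriori algorithm.

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
min_support, min_confidence = 0.5, 0.5

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level-wise search: keep itemsets with at least minimum support,
# then join survivors to build candidates one item larger.
large = {}                                   # frozenset -> support
level = [frozenset([i]) for i in set().union(*transactions)]
while level:
    level = [s for s in level if support(s) >= min_support]
    large.update({s: support(s) for s in level})
    level = list({a | b for a in level for b in level if len(a | b) == len(a) + 1})

# Generate rules X => Y by splitting every large itemset in two.
for itemset in large:
    for k in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, k)):
            conf = large[itemset] / support(lhs)
            if conf >= min_confidence:
                rhs = itemset - lhs
                print(f"{set(lhs)} => {set(rhs)}  (s={large[itemset]:.2f}, c={conf:.2f})")
```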

Relations Between Products

Relations Between Failures

  Association Rule | Confidence (CF) | Item Set (L)
  4 → 6            | 14 / 16 = 0.875 | …
  … → 4            | 14 / 15 = 0.93  | …
  …                | … / 18 = …      | …
  … → 5            | 15 / 15 = 1     | …

Inductive Logic Programming - Goals
- Finding the preferred customers, based on:
  - The number of products bought by the customer
  - The failure types (i.e., severity levels) that occurred in the products

Inductive Logic Programming - Definition
- Inductive construction of first-order clausal theories from examples and background knowledge
- The aim is to discover, from a given set of pre-classified examples, a set of classification rules with high predictive power
- Example:
  - IF Outlook=Sunny AND Humidity=High THEN PlayTennis=No

Horn clause induction
Given:
- P: ground facts to be entailed (positive examples)
- N: ground facts not to be entailed (negative examples)
- B: a set of predicate definitions (background theory)
- L: the hypothesis language
Find a predicate definition (hypothesis) H in L such that
1. for every p in P: B ∧ H ⊨ p (completeness)
2. for every n in N: B ∧ H ⊭ n (consistency)

Inductive Logic Programming - Example
- Learning about the relationships between people in a family circle

Algorithm Outline
- A space of candidate solutions and an acceptance criterion characterizing solutions to an ILP problem
- The search space is typically structured by means of the dual notions of generalization (induction) and specialization (deduction)
  - A deductive inference rule maps a conjunction of clauses G onto a conjunction of clauses S such that G is more general than S
  - An inductive inference rule maps a conjunction of clauses S onto a conjunction of clauses G such that G is more general than S
- Pruning principle:
  - When B ∧ H does not entail a positive example, specializations of H can be pruned from the search
  - When B ∧ H entails a negative example, generalizations of H can be pruned from the search

Pseudo Algorithm
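As with the Apriori slide, the pseudocode image is missing from the transcript. The sketch below is a deliberately simplified, propositional stand-in for the generic search loop described above: hypotheses are conjunctions of attribute=value conditions, the search starts from the most general hypothesis and specializes top-down, and branches that cover no positive example are pruned (the first pruning rule; the second is implicit, since a top-down search never generalizes). The example data and attribute names are hypothetical, not taken from the project.

```python
# Hypothetical toy data: each example is a dict of attribute values.
positives = [{"products": "many", "severity": "low"}]
negatives = [{"products": "few",  "severity": "low"},
             {"products": "many", "severity": "high"}]
conditions = [("products", "many"), ("severity", "low")]

def covers(hypothesis, example):
    """A hypothesis (tuple of attribute=value conditions) covers an example
    if every condition holds for it."""
    return all(example.get(attr) == val for attr, val in hypothesis)

def search():
    frontier = [()]                       # the empty, most general hypothesis
    while frontier:
        h = frontier.pop(0)
        pos = [e for e in positives if covers(h, e)]
        neg = [e for e in negatives if covers(h, e)]
        if not pos:                       # pruning: specializing cannot recover positives
            continue
        if len(pos) == len(positives) and not neg:
            return h                      # complete and consistent
        frontier += [h + (c,) for c in conditions if c not in h]
    return None

print(search())   # (('products', 'many'), ('severity', 'low'))
```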

The preferred customers If ( Total_Products_Types( Customer ) > 5 ) and ( All_Severity(Customer) < 3 ) then Preferred_Customer

Decision Trees - Goals
- Finding the preferred customers
- Finding relations between products which are bought by the customers
- Finding relations between failures in a specific product
- Compare the Decision Tree results to the results of the previous algorithms

Decision Trees - Definition
- Decision tree representation:
  - Each internal node tests an attribute
  - Each branch corresponds to an attribute value
  - Each leaf node assigns a classification
- Occam's razor: prefer the shortest hypothesis that fits the data
- Examples:
  - Equipment or medical diagnosis
  - Credit risk analysis

Algorithm Outline
- A ← the "best" decision attribute for the next node
- Assign A as the decision attribute for the node
- For each value of A, create a new descendant of the node
- Sort training examples to the leaf nodes
- If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes

Pseudo algorithm
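The pseudocode from this slide is also missing from the transcript. The following self-contained Python sketch (illustrative, not the original) follows the outline above: pick the attribute with the highest information gain, split on it, and recurse until the examples are perfectly classified or no attributes remain. The helper names (`entropy`, `info_gain`, `id3`) are my own.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum p_i * log2(p_i) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(examples, labels, attr):
    """Expected reduction in entropy from splitting on `attr`."""
    gain = entropy(labels)
    for value in set(e[attr] for e in examples):
        subset = [l for e, l in zip(examples, labels) if e[attr] == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    if len(set(labels)) == 1:                      # perfectly classified
        return labels[0]
    if not attributes:                             # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(e[best] for e in examples):
        idx = [i for i, e in enumerate(examples) if e[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree
```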

Information Measure
- Entropy measures the impurity of the sample of training examples S:
  Entropy(S) = - Σ_{i=1..c} p_i · log2(p_i)
  - p_i is the probability of making a particular decision (the proportion of S in class i)
  - There are c possible decisions
- The entropy is the amount of information needed to identify the class of an object in S
  - Maximized when all p_i are equal
  - Minimized (0) when all but one are 0 (the remaining one is 1)

Information Measure
- Estimate the gain in information from a particular partitioning of the dataset
- Gain(S, A) = expected reduction in entropy due to sorting on A
- The information gained by partitioning S on attribute A is:
  Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
- The gain criterion can then be used to select the partition which maximizes information gain

Decision Tree - Example

  Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis
  D1  | sunny    | hot         | high     | weak   | No
  D2  | sunny    | hot         | high     | strong | No
  D3  | overcast | hot         | high     | weak   | Yes
  D4  | rain     | mild        | high     | weak   | Yes
  D5  | rain     | cool        | normal   | weak   | Yes
  D6  | rain     | cool        | normal   | strong | No
  D7  | overcast | cool        | normal   | strong | Yes
  D8  | sunny    | mild        | high     | weak   | No
  D9  | sunny    | cool        | normal   | weak   | Yes
  D10 | rain     | mild        | normal   | weak   | Yes
  D11 | sunny    | mild        | normal   | strong | Yes
  D12 | overcast | mild        | high     | strong | Yes
  D13 | overcast | hot         | normal   | weak   | Yes
  D14 | rain     | mild        | high     | strong | No

Decision Tree - Example (Continued)
Which attribute is the best classifier?

  S: [9+, 5-], E = 0.940

  Split on Humidity:
    high:   [3+, 4-], E = 0.985
    normal: [6+, 1-], E = 0.592
    Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151

  Split on Wind:
    weak:   [6+, 2-], E = 0.811
    strong: [3+, 3-], E = 1.000
    Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.000) = 0.048

  Gain(S, Outlook) = 0.246
  Gain(S, Temperature) = 0.029
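The gains above can be checked with a short, self-contained script over the fourteen training examples (a quick numeric check, not part of the original presentation):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr):
    labels = [r["PlayTennis"] for r in rows]
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        sub = [r["PlayTennis"] for r in rows if r[attr] == v]
        g -= len(sub) / len(rows) * entropy(sub)
    return g

cols = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
data = [
    ("sunny", "hot", "high", "weak", "No"),     ("sunny", "hot", "high", "strong", "No"),
    ("overcast", "hot", "high", "weak", "Yes"), ("rain", "mild", "high", "weak", "Yes"),
    ("rain", "cool", "normal", "weak", "Yes"),  ("rain", "cool", "normal", "strong", "No"),
    ("overcast", "cool", "normal", "strong", "Yes"), ("sunny", "mild", "high", "weak", "No"),
    ("sunny", "cool", "normal", "weak", "Yes"), ("rain", "mild", "normal", "weak", "Yes"),
    ("sunny", "mild", "normal", "strong", "Yes"), ("overcast", "mild", "high", "strong", "Yes"),
    ("overcast", "hot", "normal", "weak", "Yes"), ("rain", "mild", "high", "strong", "No"),
]
rows = [dict(zip(cols, r)) for r in data]

print(round(entropy([r["PlayTennis"] for r in rows]), 3))   # 0.94
for a in cols[:-1]:
    print(a, round(gain(rows, a), 3))  # Outlook 0.246, Temperature 0.029, Humidity 0.151, Wind 0.048
```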

Decision Tree Example - (Continued)

  {D1, D2, ..., D14} [9+, 5-]
  Split on Outlook:
    sunny:    {D1, D2, D8, D9, D11}  [2+, 3-] -> ?
    overcast: {D3, D7, D12, D13}     [4+, 0-] -> Yes
    rain:     {D4, D5, D6, D10, D14} [3+, 2-] -> ?

  S_sunny = {D1, D2, D8, D9, D11}
  Gain(S_sunny, Humidity)    = 0.970 - (3/5)(0.0) - (2/5)(0.0) = 0.970
  Gain(S_sunny, Temperature) = 0.970 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = 0.570
  Gain(S_sunny, Wind)        = 0.970 - (2/5)(1.0) - (3/5)(0.918) = 0.019

Decision Tree Example - (Continued)
The resulting tree:

  Outlook?
    sunny    -> Humidity?  high -> No | normal -> Yes
    overcast -> Yes
    rain     -> Wind?  strong -> No | weak -> Yes

Overfitting
- The tree may fit the training data too closely and fail to generalize; this is called overfitting
- How can we avoid overfitting?
  - Stop growing when the data split is not statistically significant
  - Grow the full tree, then post-prune
- The post-pruning approach is more common
- How to select the "best" tree:
  - Measure performance over the training data
  - Measure performance over a separate validation data set

Reduced-Error Pruning
- Split the data into training and validation sets
- Do until further pruning is harmful:
  1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
  2. Greedily remove the one that most improves validation set accuracy
- Produces the smallest version of the most accurate subtree
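Below is a minimal sketch of this pruning loop, operating on the nested-dict trees produced by the ID3 sketch earlier; the helper functions and the use of training-set majorities at each pruned node are illustrative choices, not the project's implementation.

```python
import copy
from collections import Counter

def classify(tree, example):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example[attr])
    return tree

def accuracy(tree, examples, labels):
    return sum(classify(tree, e) == l for e, l in zip(examples, labels)) / len(labels)

def internal_paths(tree, path=()):
    """Yield the path ((attr, value), ...) to every internal node of the tree."""
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, sub in tree[attr].items():
            yield from internal_paths(sub, path + ((attr, value),))

def prune_at(tree, path, label):
    """Return a copy of `tree` with the subtree at `path` replaced by `label`."""
    tree = copy.deepcopy(tree)
    if not path:
        return label
    node = tree
    for attr, value in path[:-1]:
        node = node[attr][value]
    attr, value = path[-1]
    node[attr][value] = label
    return tree

def reduced_error_prune(tree, train_x, train_y, val_x, val_y):
    while True:
        base = accuracy(tree, val_x, val_y)
        candidates = []
        for path in internal_paths(tree):
            # Replace the node with the majority class of the training examples reaching it.
            routed = [l for e, l in zip(train_x, train_y)
                      if all(e[a] == v for a, v in path)]
            majority = Counter(routed).most_common(1)[0][0]
            pruned = prune_at(tree, path, majority)
            candidates.append((accuracy(pruned, val_x, val_y), pruned))
        if not candidates:
            return tree
        best_acc, best_tree = max(candidates, key=lambda c: c[0])
        if best_acc < base:               # further pruning is harmful: stop
            return tree
        tree = best_tree
```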

The Preferred Customer
Target attribute: TypeOfServiceContract

  Decision tree: the root splits on NoOfProducts (< 2.5 / >= 2.5), with a further split on MaxSev (< 4.5 / >= 4.5); leaf class counts: [NO: 7, YES: 0], [NO: 0, YES: 3], [NO: 3, YES: 8].

Relations Between Products
Target attribute: Product3

  Decision tree: splits on Product2 and Product6 (values 0/1); leaf class counts: [NO: 0, YES: 1], [NO: 4, YES: 0], [NO: 0, YES: 15], [NO: 0, YES: 1].

Relations Between Failures
Target attribute: Application5

  Decision tree: splits on Application8 and other application attributes (values 0/1); leaf class counts: [NO: 5, YES: 1], [NO: 1, YES: 0], [NO: 0, YES: 11], [NO: 2, YES: 2].