
Knowledge Discovery via Data Mining. Enrico Tronci, Dipartimento di Informatica, Università di Roma “La Sapienza”, Via Salaria 113, 00198 Roma, Italy. Workshop ENEA: I Sistemi di Supporto alle Decisioni, Centro Ricerche ENEA Casaccia, Roma, October 28, 2003.

2 Data Mining Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. A data miner is a computer program that sifts through data seeking regularities or patterns. Obstructions: noise and computational complexity.

3 Some Applications Decisions involving judgment, e.g. loans. Screening images, e.g. detection of oil slicks from satellite images, giving warning of ecological disasters or illegal dumping. Load forecasting in the electricity supply industry. Diagnosis, e.g. for preventive maintenance of electromechanical devices. Marketing and Sales. … On Thursday customers often purchase beer and diapers together … Stock Market Analysis. Anomaly Detection.

4 Data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lens
young           myope                   no           reduced               none
young           myope                   no           normal                soft
young           myope                   yes          reduced               none
young           myope                   yes          normal                hard
young           hypermetrope            no           reduced               none
young           hypermetrope            no           normal                soft
young           hypermetrope            yes          reduced               none
young           hypermetrope            yes          normal                hard
pre-presbyopic  myope                   no           reduced               none
pre-presbyopic  myope                   no           normal                soft

Each row is an instance; the first four columns are the attributes and the last column (recommended lens) is the goal.

5 Classification Assume instances have n attributes A1, …, An-1, An. Let attribute An be our goal. A classifier is a function f from (A1 × … × An-1) to An. That is, f looks at the values of the first (n-1) attributes and returns the (estimated) value of the goal. In other words, f classifies each instance w.r.t. the goal attribute. The problem of computing a classifier from a set of instances is called the classification problem. Note that in a classification problem the set of classes (i.e. the possible goal values) is known in advance. Note also that a classifier works on any possible instance, that is, also on instances that were not present in our data set. This is why classification is a form of machine learning.
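
As a concrete (toy) illustration of the definition, here is a minimal sketch of a classifier for the contact lens data above, written as a Python function from the first n-1 attribute values to an estimated goal value; the rule inside is only a placeholder, not a classifier learned from the data:

```python
# Minimal sketch of a classifier as a function f: (A_1 x ... x A_{n-1}) -> A_n.
# Attribute names follow the contact lens table above; the logic is a placeholder.

def classify_lens(age: str, prescription: str, astigmatism: str, tear_rate: str) -> str:
    """Return the estimated value of the goal attribute 'recommended lens'."""
    if tear_rate == "reduced":
        return "none"
    # ... further conditions on the remaining attributes would go here ...
    return "soft"

# The classifier is defined on every possible instance, including instances
# that never appeared in the data set:
print(classify_lens("presbyopic", "myope", "no", "normal"))
```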

6 Clustering Assume instances have n attributes A1, …, An. A clustering function is a function f from the set (A1 × … × An) to some small subset of the natural numbers. That is, f splits the set of instances into a small number of classes. The problem of computing a clustering function from our data set is called the clustering problem. Note that, unlike in a classification problem, in a clustering problem the set of classes is not known in advance. Note also that a clustering function works on any possible instance, that is, also on instances that were not present in our data set. This is why clustering is a form of machine learning. In the following we will focus on classification.
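
By contrast with the classifier sketch above, a clustering function assigns instances to a small set of class indices that are not fixed in advance. A minimal sketch with scikit-learn's k-means on made-up numeric data:

```python
# Sketch of a clustering function: instances -> {0, ..., k-1}.
# The two-attribute numeric data set below is invented for illustration.
from sklearn.cluster import KMeans

X = [[1.0, 2.0], [1.2, 1.9], [8.0, 8.1], [7.9, 8.3], [0.9, 2.2], [8.2, 7.8]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # e.g. [0, 0, 1, 1, 0, 1]: the discovered classes
print(labels)

# Like a classifier, the clustering function applies to unseen instances too:
print(kmeans.predict([[1.1, 2.1]]))
```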

7 Rules for Contact Lens Data (an example of classification)
if (tear production rate = reduced) then recommended lens = none;
if (age = young and astigmatism = no and tear production rate = normal) then recommended lens = soft;
if (age = young and astigmatism = yes and tear production rate = normal) then recommended lens = hard;
…
The attribute recommended lens is the attribute we would like to predict. Such an attribute is usually called the goal and is typically written in the last column. A possible way of defining a classifier is by using a set of rules as above.
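
The rule set above maps directly onto code. A small sketch, with the three rules read off the table on slide 4 (the final default is ours, added only so the function is total):

```python
# Rule-based classifier for the contact lens data: rules are tried in order,
# mirroring the slide. Everything not covered falls through to "none" here
# purely as an illustrative default.
def recommend_lens(age, prescription, astigmatism, tear_rate):
    if tear_rate == "reduced":
        return "none"
    if age == "young" and astigmatism == "no" and tear_rate == "normal":
        return "soft"
    if age == "young" and astigmatism == "yes" and tear_rate == "normal":
        return "hard"
    return "none"  # default for instances not covered by the rules above

print(recommend_lens("young", "myope", "yes", "normal"))  # -> "hard"
```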

8 Labor Negotiations Data

Attribute                  Type                    Values (one contract per column)
Duration                   years                   …
Wage increase first year   percentage              2%, 4%, 4.3%, …, 4.5%
Wage increase second year  percentage              ?, ?, ?, …, ?
Working hours per week     number of hours         …
Pension                    {none, r, c}            none, ?, ?, …, ?
Education allowance        {yes, no}               yes, ?, ?, …, ?
Statutory holidays         number of days          …
Vacation                   {below-avg, avg, gen}   avg, gen, …, avg
Acceptability of contract  {good, bad}             bad, good, …, good

(Each column of values describes one labor contract; “?” marks a missing value in the data set.)

9 Classification using Decision Trees (The Labor Negotiations Data Example (1))

Wage increase first year
  <= 2.5: bad
  > 2.5:
    Statutory holidays
      > 10: good
      <= 10:
        Wage increase first year
          <= 4: bad
          > 4: good

10 Classification using Decision Trees (The Labor Negotiations Data Example (2))

Wage increase first year
  <= 2.5:
    Working hours per week
      <= 36: bad
      > 36:
        Health plan contribution
          none: bad
          half: good
          full: bad
  > 2.5:
    Statutory holidays
      > 10: good
      <= 10:
        Wage increase first year
          <= 4: bad
          > 4: good

11 Which classifier is good for me? From the same data set we may get many classifiers with different properties. Here are some of the properties usually considered for a classifier. Note that, depending on the problem under consideration, a property may or may not be relevant. Success rate: the percentage of instances classified correctly. Ease of computation. Readability: there are cases in which the definition of the classifier must be read by a human being; in such cases the readability of the classifier definition is an important parameter in judging the goodness of a classifier. Finally, we should note that, starting from the same data set, different classification algorithms may return different classifiers. Usually deciding which one to use requires running some testing experiments.

12 A Classification Algorithm: Decision Trees. Decision trees are among the most widely used and most effective classifiers. We will show the decision tree classification algorithm with an example: the weather data.

13 Weather Data

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     cool         normal    false  yes
rainy     cool         normal    true   no
overcast  cool         normal    true   yes
sunny     mild         high      false  no
sunny     cool         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no

14 Constructing a decision tree for the weather data (1)

Class distributions for each candidate split (play = yes / play = no):
Outlook: sunny 2/3, overcast 4/0, rainy 3/2. Temperature: hot 2/2, mild 4/2, cool 3/1. Humidity: high 3/4, normal 6/1. Windy: false 6/2, true 3/3.

H([2, 3]) = -(2/5)*log(2/5) - (3/5)*log(3/5) = 0.971 bits; H([4, 0]) = 0 bits; H([3, 2]) = 0.971 bits.
H([2, 3], [4, 0], [3, 2]) = (5/14)*H([2, 3]) + (4/14)*H([4, 0]) + (5/14)*H([3, 2]) = 0.693 bits.
Info before any decision tree was created (9 yes, 5 no): H([9, 5]) = 0.940 bits.
Gain(outlook) = H([9, 5]) - H([2, 3], [4, 0], [3, 2]) = 0.940 - 0.693 = 0.247.
Likewise: Gain(temperature) = 0.029, Gain(humidity) = 0.152, Gain(windy) = 0.048.

In general: H(p1, …, pn) = -p1*log(p1) - … - pn*log(pn), and H(p, q, r) = H(p, q + r) + (q + r)*H(q/(q + r), r/(q + r)).
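
The numbers above can be reproduced with a few lines of code. A minimal sketch (log base 2; the data set is the one from slide 13, typed in by hand here):

```python
# Entropy of a list of class counts, and the information gain of splitting
# the weather data on a given attribute.
from math import log2

# Weather data from slide 13: (Outlook, Temperature, Humidity, Windy, Play).
weather = [
    ("sunny", "hot", "high", "false", "no"),    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"), ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"), ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"), ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"), ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"), ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"), ("rainy", "mild", "high", "true", "no"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]  # goal "Play" is last

def info(counts):
    """H([c1, ..., ck]) in bits; e.g. info([9, 5]) = 0.940."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def class_counts(rows):
    yes = sum(1 for r in rows if r[-1] == "yes")
    return [yes, len(rows) - yes]

def gain(rows, attr_index):
    """Information gain of splitting `rows` on the attribute at `attr_index`."""
    split_info = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r for r in rows if r[attr_index] == value]
        split_info += len(subset) / len(rows) * info(class_counts(subset))
    return info(class_counts(rows)) - split_info

for i, name in enumerate(ATTRIBUTES):
    print(name, round(gain(weather, i), 3))
# -> Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```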

15 Constructing a decision tree for the weather data (2)

[Figure: the outlook = sunny branch is expanded; the remaining attributes Temperature (hot, mild, cool), Humidity (high, normal) and Windy (true, false) are compared as candidate splits for the five sunny instances, and Humidity yields pure subsets.]

16 Constructing a decision tree for the weather data (3)

Outlook
  sunny:
    Humidity
      high: no
      normal: yes
  overcast: yes
  rainy:
    Windy
      false: yes
      true: no

Computational cost of decision tree construction for a data set with m attributes and n instances: O(m*n*log n) + O(n*(log n)^2)
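
The construction sketched in the three figures is the usual top-down, gain-driven recursion. A compact ID3-style sketch for nominal attributes (our illustration, not the exact algorithm behind the figures):

```python
# ID3-style top-down construction: pick the attribute with the highest gain,
# split the instances on its values, and recurse until a subset is pure or no
# attributes remain.
from collections import Counter
from math import log2

def entropy(rows, target):
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(r[target] for r in rows).values())

def gain(rows, attr, target):
    split = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        split += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - split

def build_tree(rows, attrs, target):
    classes = Counter(r[target] for r in rows)
    if len(classes) == 1 or not attrs:        # pure subset, or nothing left to split on
        return classes.most_common(1)[0][0]   # leaf: (majority) class label
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}

# With rows given as dicts such as {"Outlook": "sunny", ..., "Play": "no"},
# build_tree(rows, ["Outlook", "Temperature", "Humidity", "Windy"], "Play")
# returns the nested-dict form of the tree shown above.
```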

17 Naive Bayes

Counts and relative frequencies of each attribute value for play = yes and play = no:

Outlook:      sunny  yes 2, no 3 (2/9, 3/5);  overcast  yes 4, no 0 (4/9, 0/5);  rainy  yes 3, no 2 (3/9, 2/5)
Temperature:  hot    yes 2, no 2 (2/9, 2/5);  mild      yes 4, no 2 (4/9, 2/5);  cool   yes 3, no 1 (3/9, 1/5)
Humidity:     high   yes 3, no 4 (3/9, 4/5);  normal    yes 6, no 1 (6/9, 1/5)
Windy:        false  yes 6, no 2 (6/9, 2/5);  true      yes 3, no 3 (3/9, 3/5)
Play:         yes 9, no 5 (9/14, 5/14)

18 Naive Bayes (2)
A new day: Outlook = sunny, Temperature = cool, Humidity = high, Windy = true, Play = ?
E = (sunny and cool and high and true).
Bayes: P(yes | E) = (P(E | yes) P(yes)) / P(E).
Assuming the attributes are statistically independent:
P(yes | E) = (P(sunny | yes) * P(cool | yes) * P(high | yes) * P(true | yes) * P(yes)) / P(E) = (2/9)*(3/9)*(3/9)*(3/9)*(9/14) / P(E) = 0.0053 / P(E).
P(no | E) = (3/5)*(1/5)*(4/5)*(3/5)*(5/14) / P(E) = 0.0206 / P(E).
Since P(yes | E) + P(no | E) = 1 we have that P(E) = 0.0053 + 0.0206 = 0.0259.
Thus: P(yes | E) = 0.205, P(no | E) = 0.795. Thus we answer: NO.
Obstruction: usually attributes are not statistically independent. However, naive Bayes works quite well in practice.
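
The same computation in code, using the relative frequencies from slide 17 (a sketch; like the slide, it ignores the zero-count problem, e.g. P(overcast | no) = 0/5):

```python
# Naive Bayes for the new day E = (sunny, cool, high, true), with the
# relative frequencies taken from the tables on slide 17.
likelihood_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)    # ≈ 0.0053
likelihood_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)    # ≈ 0.0206

# P(E) is the same denominator in both fractions, so it cancels on normalization.
p_yes = likelihood_yes / (likelihood_yes + likelihood_no)  # ≈ 0.205
p_no  = likelihood_no  / (likelihood_yes + likelihood_no)  # ≈ 0.795

print(f"P(yes|E) = {p_yes:.3f}, P(no|E) = {p_no:.3f}")     # answer: no
```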

19 Performance Evaluation
Split the data set into two parts: a training set and a test set. Use the training set to compute the classifier. Use the test set to evaluate the classifier. Note: the test set data have not been used in the training process. This allows us to compute the following quantities (on the test set). For the sake of simplicity we refer to a two-class prediction.

                 Predicted yes        Predicted no
Actual yes       TP (true positive)   FN (false negative)
Actual no        FP (false positive)  TN (true negative)
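
A sketch of the hold-out procedure with scikit-learn; the toy data set, the 70/30 split and the choice of a decision tree learner are only assumptions made for illustration:

```python
# Hold-out evaluation: split into training and test set, train on one,
# evaluate on the other, and inspect the two-class confusion matrix.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Toy data (hypothetical): 8 instances with 2 numeric attributes, classes 0/1.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)   # trained on the training set only
y_pred = clf.predict(X_test)                           # test set never seen during training

tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print("success rate:", accuracy_score(y_test, y_pred))
print("TP", tp, "FN", fn, "FP", fp, "TN", tn)
```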

20 Lift Chart
Predicted positive subset size = (TP + FP)/(TP + FP + TN + FN), plotted on the x-axis (0 to 100%); number of true positives = TP, plotted on the y-axis. Lift charts are typically used in marketing applications.
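
The points of a lift chart can be obtained by ranking the test instances by the classifier's estimated probability of the positive class and counting the true positives in the top fraction. A sketch (function name and toy scores are hypothetical):

```python
# One lift-chart point for each predicted-positive subset size from 10% to 100%.
def lift_chart_points(scores, labels):
    """scores: predicted P(positive); labels: 1 = actual positive, 0 = negative."""
    ranked = [label for _, label in sorted(zip(scores, labels), reverse=True)]
    n = len(ranked)
    points = []
    for pct in range(10, 101, 10):
        k = n * pct // 100                      # size of the predicted-positive subset
        points.append((pct, sum(ranked[:k])))   # (subset size %, number of true positives)
    return points

print(lift_chart_points([0.9, 0.8, 0.4, 0.7, 0.2, 0.6], [1, 1, 0, 1, 0, 0]))
```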

21 Receiver Operating Characteristic (ROC) Curve
FP rate = FP/(FP + TN), plotted on the x-axis (0 to 100%); TP rate = TP/(TP + FN), plotted on the y-axis. ROC curves are typically used in communication applications.
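
Similarly, each threshold on the classifier's score yields one (FP rate, TP rate) point of the ROC curve. A sketch (again with hypothetical names and toy data):

```python
# ROC points: sweep a threshold over the predicted scores and compute
# FP rate = FP/(FP+TN) and TP rate = TP/(TP+FN) at each threshold.
def roc_points(scores, labels):
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, label in zip(scores, labels) if s >= t and label == 1)
        fp = sum(1 for s, label in zip(scores, labels) if s >= t and label == 0)
        points.append((fp / neg, tp / pos))   # (FP rate, TP rate)
    return points

print(roc_points([0.9, 0.8, 0.4, 0.7, 0.2, 0.6], [1, 1, 0, 1, 0, 0]))
```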

22 A glimpse of data mining in Safeguard We outline our use of data mining techniques in the Safeguard project.

23 On-line schema
tcpdump captures TCP packets on port 2506. Format filters preprocess the TCP payload into four representations: the sequence of payload bytes, the distribution of payload bytes, conditional probabilities of characters and words in the payload, and statistics (avg, var, dev) on the payload bytes. The preprocessed payload feeds Classifier 1 (hash table based), Classifier 2 (hidden Markov models) and a cluster analyzer, whose outputs a supervisor combines into an alarm level.
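
The payload representations named in the schema are per-packet features. As a rough illustration (this is not the actual Safeguard code), a format filter computing some of them from a raw payload might look like the sketch below; the conditional probabilities of characters and words are omitted for brevity:

```python
# Hypothetical format filter: turn a raw TCP payload (bytes) into the kinds of
# features named in the schema (byte sequence, byte distribution, statistics).
from collections import Counter
from statistics import mean, pvariance, pstdev

def payload_features(payload: bytes):
    values = list(payload)                                           # sequence of payload bytes
    counts = Counter(values)
    distribution = {b: c / len(values) for b, c in counts.items()}   # distribution of payload bytes
    return {
        "sequence": values,
        "distribution": distribution,
        "avg": mean(values),        # statistics (avg, var, dev) on payload bytes
        "var": pvariance(values),
        "dev": pstdev(values),
    }

print(payload_features(b"GET /index.html HTTP/1.0\r\n"))
```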

24 Training schema
tcpdump captures TCP packets on port 2506; format filters produce a log of preprocessed TCP payloads (sequence of payload bytes, distribution of payload bytes, conditional probabilities of characters and words in the payload, statistics (avg, var, dev) on payload bytes). WEKA (a data mining tool), an HT classifier synthesizer and an HMM synthesizer are run on this log to produce Classifier 1 (hash table based), Classifier 2 (hidden Markov models) and the cluster analyzer used in the on-line schema.