
1 Chapter 5: Evaluation
Prepared by: Mahmoud Rafeek Al-Farra
College of Science & Technology, Dept. of Computer Science & IT
BSc of Information Technology, Data Mining, 2013
www.cst.ps/staff/mfarra

2 Course Outline
• Introduction
• Data Preparation and Preprocessing
• Data Representation
• Classification Methods
• Evaluation
• Clustering Methods
• Mid Exam
• Association Rules
• Knowledge Representation
• Special Case Study: Document Clustering
• Discussion of Case Studies by Students

3 Outline
• Definition of Evaluation
• Measures of Interestingness
• Training versus Testing
• Cluster Evaluation

4 Definition of Evaluation
• After examining the data and applying automated data mining methods, we must carefully consider the quality of the end product of our effort. This step is evaluation.
• Evaluation assesses the performance of a proposed solution to the data mining task.

5 Definition of Evaluation
• A large number of patterns and rules exist in a database. Many of them are of no interest to the user.
• [Figure: the knowledge discovery process: Databases (data cleaning, data integration) → Data Warehouse (selection) → Task-relevant Data → Data Mining → Pattern Evaluation]

6 Measures of Interestingness
• Measuring interestingness has two approaches:
• Objective: interestingness is measured in terms of a pattern's structure and the underlying data used in the discovery process (see the sketch below).
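To make the objective approach concrete, here is a minimal sketch of three classic objective measures for an association rule A → B (support, confidence, and lift), computed from the transaction data alone. The toy transactions, the rule, and the function names are illustrative assumptions, not part of the slides.

```python
# Objective interestingness of an association rule A -> B,
# measured purely from the data (no user judgment involved).

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence normalized by the consequent's base rate; > 1 suggests a positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Toy transaction database (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
antecedent, consequent = {"bread"}, {"milk"}
print("support   :", support(transactions, antecedent | consequent))    # 0.5
print("confidence:", confidence(transactions, antecedent, consequent))  # ~0.667
print("lift      :", lift(transactions, antecedent, consequent))        # ~0.889
```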

7 Measures of Interestingness
• Measuring interestingness has two approaches:
• Subjective: subjective measures depend not only on the structure of a rule and the data used, but also on the user who examines the pattern. These measures recognize that a pattern of interest to one user may be of no interest to another.

8 Training versus Testing
• “Just trust me!” does not work in evaluation.
• Error on the training data is not a good indicator of performance on future data.
• A simple solution that can be used if lots of (labeled) data is available: split the data into a training set and a test set, where the test data is not the same as the training data (a holdout sketch follows below).
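A minimal sketch of such a holdout split, assuming a toy dataset of (features, label) pairs; the fraction and seed are illustrative. The point is that the held-out test examples are never seen during training.

```python
import random

def holdout_split(data, test_fraction=1/3, seed=42):
    """Shuffle the data, then hold out a fraction of it as the test set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]

# Hypothetical labeled dataset: (features, label) pairs.
data = [((x, x % 3), x % 2) for x in range(30)]
train, test = holdout_split(data)
print(len(train), "training examples,", len(test), "held-out test examples")
```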

9 Training versus Testing
• A strong and effective way to evaluate results is to hide some data and then fairly compare the training results against this unseen data.
• This prevents misleadingly optimistic results and gives developers time to extract the best performance from the application system.
• There are many ways to split data into training and testing sets; the most common are holdout and cross-validation (sketched below).
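A minimal sketch of k-fold cross-validation, the second common splitting scheme: the data is partitioned into k folds, and each fold serves as the test set exactly once. The `evaluate` function named in the final comment is a hypothetical stand-in for training a model and measuring its test accuracy.

```python
import random

def k_fold_splits(data, k=10, seed=0):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    items = list(data)
    random.Random(seed).shuffle(items)
    for i in range(k):
        test = items[i::k]  # every k-th item, starting at offset i
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test

data = list(range(25))
for fold, (train, test) in enumerate(k_fold_splits(data, k=5)):
    print(f"fold {fold}: {len(train)} train / {len(test)} test")

# Hypothetical usage with an evaluate(train, test) function returning accuracy:
# scores = [evaluate(tr, te) for tr, te in k_fold_splits(data, k=10)]
# print("mean accuracy:", sum(scores) / len(scores))
```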

10 Cluster Evaluation
• One type of measure allows us to evaluate different sets of clusters without external knowledge; it is called an internal quality measure and is used when we have no external knowledge about the clustered data.
• Overall similarity is an example of an internal quality measure and is sketched below.
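One common formulation of overall similarity, sketched under the assumption that items are feature vectors compared with cosine similarity: average the similarity of all distinct pairs inside each cluster, so tighter clusters score higher. The vectors below are toy data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def overall_similarity(cluster):
    """Average pairwise cosine similarity within one cluster (distinct pairs)."""
    n = len(cluster)
    if n < 2:
        return 1.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(cluster[i], cluster[j]) for i, j in pairs) / len(pairs)

clusters = [
    [(1.0, 0.1), (0.9, 0.2), (1.1, 0.0)],  # tight cluster -> score near 1
    [(0.0, 1.0), (1.0, 0.0)],              # spread-out cluster -> low score
]
for c in clusters:
    print(round(overall_similarity(c), 3))
```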

11 Cluster Evaluation
• The second type of measure lets us evaluate the quality of a clustering by comparing the clusters produced by a clustering technique to known classes (external knowledge).
• This type is called an external quality measure; we discuss two such measures, entropy and F-measure (sketched below).
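A minimal sketch of both external measures, assuming each cluster is represented by the list of true class labels of its members (the labels below are toy data). Entropy is 0 when every cluster is pure; the overall F-measure matches each class to its best-fitting cluster and weights by class size.

```python
import math
from collections import Counter

def cluster_entropy(clusters):
    """Size-weighted average entropy of the class mix in each cluster."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        counts = Counter(c)
        e = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        total += (len(c) / n) * e
    return total

def f_measure(clusters):
    """Overall F-measure: each class is matched to its best cluster."""
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(label for c in clusters for label in c)
    score = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for c in clusters:
            n_ij = sum(1 for label in c if label == cls)
            if n_ij:
                p, r = n_ij / len(c), n_ij / n_i
                best = max(best, 2 * p * r / (p + r))
        score += (n_i / n) * best
    return score

# Hypothetical clustering of six documents with known classes A and B.
clusters = [["A", "A", "B"], ["B", "B", "A"]]
print("entropy  :", round(cluster_entropy(clusters), 3))  # ~0.918
print("F-measure:", round(f_measure(clusters), 3))        # ~0.667
```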

12 Cluster Evaluation
• There are many different quality measures, and the performance and relative ranking of different clustering algorithms can vary substantially depending on which measure is used.

13 Thanks

