
1 Chapter 5: Evaluation
Prepared by: Mahmoud Rafeek Al-Farra
College of Science & Technology, Dept. of Computer Science & IT
BSc of Information Technology, Data Mining, 2013
www.cst.ps/staff/mfarra

2 Course Outline
• Introduction
• Data Preparation and Preprocessing
• Data Representation
• Classification Methods
• Evaluation
• Clustering Methods
• Mid Exam
• Association Rules
• Knowledge Representation
• Special Case Study: Document Clustering
• Discussion of Case Studies by Students

3 Outline
• Definition of Evaluation
• Measures of Interestingness
• Training versus Testing
• Cluster Evaluation

4 Definition of Evaluation
• After examining the data and applying automated data mining methods, we must carefully consider the quality of the end product of our effort. This step is evaluation.
• Evaluation assesses the performance of a proposed solution to the data mining task.

5 Definition of Evaluation
• A large number of patterns and rules exist in a database. Many of them are of no interest to the user.
• [Figure: the knowledge discovery process: Databases (data cleaning, data integration) → Data Warehouse (selection) → Task-relevant Data → Data Mining → Pattern Evaluation]

6 Measures of Interestingness
• Measuring interestingness has two approaches:
• Objective: interestingness is measured in terms of a pattern's structure and the underlying data used in the discovery process (see the sketch below).
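To make the objective approach concrete, here is a minimal sketch of three classic objective measures for an association rule A → B (support, confidence, and lift), computed from the transaction data alone. The toy transactions, the rule, and the function names are illustrative assumptions, not part of the slides.

```python
# Objective interestingness of an association rule A -> B,
# measured purely from the data (no user judgment involved).

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence normalized by the consequent's base rate; > 1 suggests a positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Toy transaction database (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
antecedent, consequent = {"bread"}, {"milk"}
print("support   :", support(transactions, antecedent | consequent))    # 0.5
print("confidence:", confidence(transactions, antecedent, consequent))  # ~0.667
print("lift      :", lift(transactions, antecedent, consequent))        # ~0.889
```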

7 Measures of Interestingness
• Measuring interestingness has two approaches:
• Subjective: subjective measures depend not only on the structure of a rule and the data used, but also on the user who examines the pattern. These measures recognize that a pattern of interest to one user may be of no interest to another.

8 Training versus Testing
• “Just trust me!” does not work in evaluation.
• Error on the training data is not a good indicator of performance on future data.
• A simple solution that can be used if lots of (labeled) data is available: split the data into a training set and a test set, where the test data is not the same as the training data (a holdout sketch follows below).
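A minimal sketch of such a holdout split, assuming a toy dataset of (features, label) pairs; the fraction and seed are illustrative. The point is that the held-out test examples are never seen during training.

```python
import random

def holdout_split(data, test_fraction=1/3, seed=42):
    """Shuffle the data, then hold out a fraction of it as the test set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]

# Hypothetical labeled dataset: (features, label) pairs.
data = [((x, x % 3), x % 2) for x in range(30)]
train, test = holdout_split(data)
print(len(train), "training examples,", len(test), "held-out test examples")
```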

9 Training versus Testing
• A strong and effective way to evaluate results is to hide some data and then fairly compare the training results against this unseen data.
• This prevents misleadingly optimistic results and gives developers time to extract the best performance from the application system.
• There are many ways to split data into training and testing sets; the most common are holdout and cross-validation (sketched below).
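A minimal sketch of k-fold cross-validation, the second common splitting scheme: the data is partitioned into k folds, and each fold serves as the test set exactly once. The `evaluate` function named in the final comment is a hypothetical stand-in for training a model and measuring its test accuracy.

```python
import random

def k_fold_splits(data, k=10, seed=0):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    items = list(data)
    random.Random(seed).shuffle(items)
    for i in range(k):
        test = items[i::k]  # every k-th item, starting at offset i
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test

data = list(range(25))
for fold, (train, test) in enumerate(k_fold_splits(data, k=5)):
    print(f"fold {fold}: {len(train)} train / {len(test)} test")

# Hypothetical usage with an evaluate(train, test) function returning accuracy:
# scores = [evaluate(tr, te) for tr, te in k_fold_splits(data, k=10)]
# print("mean accuracy:", sum(scores) / len(scores))
```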

10 Cluster Evaluation
• One type of measure allows us to evaluate different sets of clusters without external knowledge; it is called an internal quality measure and is used when we have no external knowledge about the clustered data.
• Overall similarity is an example of an internal quality measure and is sketched below.
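One common formulation of overall similarity, sketched under the assumption that items are feature vectors compared with cosine similarity: average the similarity of all distinct pairs inside each cluster, so tighter clusters score higher. The vectors below are toy data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def overall_similarity(cluster):
    """Average pairwise cosine similarity within one cluster (distinct pairs)."""
    n = len(cluster)
    if n < 2:
        return 1.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(cluster[i], cluster[j]) for i, j in pairs) / len(pairs)

clusters = [
    [(1.0, 0.1), (0.9, 0.2), (1.1, 0.0)],  # tight cluster -> score near 1
    [(0.0, 1.0), (1.0, 0.0)],              # spread-out cluster -> low score
]
for c in clusters:
    print(round(overall_similarity(c), 3))
```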

11 Cluster Evaluation
• The second type of measure lets us evaluate the quality of a clustering by comparing the clusters produced by a clustering technique to known classes (external knowledge).
• This type is called an external quality measure; we discuss two such measures, entropy and F-measure (sketched below).
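A minimal sketch of both external measures, assuming each cluster is represented by the list of true class labels of its members (the labels below are toy data). Entropy is 0 when every cluster is pure; the overall F-measure matches each class to its best-fitting cluster and weights by class size.

```python
import math
from collections import Counter

def cluster_entropy(clusters):
    """Size-weighted average entropy of the class mix in each cluster."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        counts = Counter(c)
        e = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        total += (len(c) / n) * e
    return total

def f_measure(clusters):
    """Overall F-measure: each class is matched to its best cluster."""
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(label for c in clusters for label in c)
    score = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for c in clusters:
            n_ij = sum(1 for label in c if label == cls)
            if n_ij:
                p, r = n_ij / len(c), n_ij / n_i
                best = max(best, 2 * p * r / (p + r))
        score += (n_i / n) * best
    return score

# Hypothetical clustering of six documents with known classes A and B.
clusters = [["A", "A", "B"], ["B", "B", "A"]]
print("entropy  :", round(cluster_entropy(clusters), 3))  # ~0.918
print("F-measure:", round(f_measure(clusters), 3))        # ~0.667
```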

12 Cluster Evaluation
• There are many different quality measures, and the performance and relative ranking of different clustering algorithms can vary substantially depending on which measure is used.

13 Thanks

