Prepared by: Mahmoud Rafeek Al-Farra College of Science & Technology Dep. Of Computer Science & IT BCs of Information Technology Data Mining 2013 www.cst.ps/staff/mfarra.


Data Mining
Chapter 5: Evaluation

Course Outline
• Introduction
• Data Preparation and Preprocessing
• Data Representation
• Classification Methods
• Evaluation
• Clustering Methods
• Mid Exam
• Association Rules
• Knowledge Representation
• Special Case Study: Document Clustering
• Discussion of Case Studies by Students

Outline
• Definition of Evaluation
• Measure of Interestingness
• Training versus Testing
• Cluster Evaluation

Definition of Evaluation
• After examining the data and applying automated data mining methods, we must carefully consider the quality of the end product of our effort. This step is evaluation.
• Evaluation assesses the performance of a proposed solution to the data mining task.

Definition of Evaluation
• A large number of patterns and rules exist in a database. Many of them are of no interest to the user.
[Figure: the knowledge discovery process, with stages labeled Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]

Measure of Interestingness
• There are two approaches to measuring interestingness:
  - Objective: the interestingness of a pattern is measured in terms of its structure and the underlying data used in the discovery process.
  - Subjective: subjective measures depend not only on the structure of a rule and the data used, but also on the user who examines the pattern. These measures recognize that a pattern of interest to one user may be of no interest to another.
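
To make the objective approach concrete, here is a minimal sketch of two widely used objective measures for association rules, support and confidence. The slides do not name specific measures, so the choice of these two, the toy transactions, and the function names `support` and `confidence` are assumptions made for illustration. Both values are computed from the data alone, with no input from the user.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0


# Toy data (hypothetical): each transaction is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule under test: bread -> milk
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

A subjective measure would need something these functions do not have: a model of what a particular user already knows or expects.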

Training versus Testing
• "Just trust me!" does not work in evaluation.
• Error on the training data is not a good indicator of performance on future data, because new data will probably not be exactly the same as the training data.
• A simple solution, usable when lots of labeled data is available: split the data into a training set and a test set.

Training versus Testing
• A strong and effective way to evaluate results is to hide some of the data during training and then compare performance on the training data against performance on the unseen data.
• This exposes poor results early and gives developers time to extract the best performance from the application system.
• There are many ways to split data into training and test sets; the most common are holdout and cross-validation.
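
The two splitting schemes named above can be sketched in a few lines. This is a rough illustration under stated assumptions, not a prescribed procedure: the helper names `holdout_split` and `k_fold_splits`, the 30% test fraction, the five folds, and the integer stand-in records are all hypothetical choices.

```python
import random


def holdout_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of data and split it once into (train, test)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def k_fold_splits(data, k=5, seed=0):
    """Yield k (train, test) pairs; each item lands in exactly one test fold."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


records = list(range(10))  # stand-in for labeled examples
train, test = holdout_split(records, test_fraction=0.3)
folds = list(k_fold_splits(records, k=5))
```

Holdout evaluates the model once on a single hidden portion. Cross-validation repeats the train-and-test cycle k times so that every record is used for testing exactly once, which is useful when labeled data is scarce.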

Cluster Evaluation
• One type of measure lets us evaluate different sets of clusters without external knowledge. It is called an internal quality measure and is used when no external knowledge about the clustered data is available.
• Overall similarity is an example of an internal quality measure and will be discussed below.
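
The slides do not define overall similarity at this point. One common formulation, assumed here, is the average pairwise cosine similarity among the members of a cluster, with self-pairs included. The sketch below follows that assumption; the function names and the two-dimensional toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def overall_similarity(cluster):
    """Average cosine similarity over all ordered pairs in the cluster,
    self-pairs included (assumed formulation)."""
    n = len(cluster)
    if n == 0:
        raise ValueError("cluster must not be empty")
    total = sum(cosine(u, v) for u in cluster for v in cluster)
    return total / (n * n)


tight = [(1.0, 0.0), (1.0, 0.0)]  # identical directions
loose = [(1.0, 0.0), (0.0, 1.0)]  # orthogonal directions
tight_score = overall_similarity(tight)
loose_score = overall_similarity(loose)
```

Higher values indicate a more cohesive cluster. No class labels are needed, which is what makes this an internal measure.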

Cluster Evaluation
• The second type of measure lets us evaluate the quality of a clustering by comparing the clusters produced by the clustering technique to known classes (external knowledge).
• This type of measure is called an external quality measure. We will discuss two external quality measures: entropy and F-measure.
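
The slides name entropy and F-measure without giving formulas. The sketch below assumes formulations that are common in the clustering literature: the entropy of each cluster's class distribution, weighted by cluster size, and, for each class, the best F value over all clusters, weighted by class size. The function names and the four-item toy labeling are hypothetical.

```python
import math
from collections import Counter


def clustering_entropy(labels_true, labels_pred):
    """Size-weighted average entropy (base 2) of the class mix in each cluster.
    Lower is better; 0 means every cluster is pure."""
    n = len(labels_true)
    total = 0.0
    for cluster in set(labels_pred):
        members = [t for t, p in zip(labels_true, labels_pred) if p == cluster]
        counts = Counter(members)
        size = len(members)
        e = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += size / n * e
    return total


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted average of the best F value each class achieves
    over all clusters. Higher is better; 1 means a perfect match."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


classes = ["a", "a", "b", "b"]  # known classes (external knowledge)
perfect = [0, 0, 1, 1]          # clustering that matches the classes
mixed = [0, 1, 0, 1]            # clustering that splits every class
```

On this toy data the two measures agree that `perfect` is the better clustering, but they point in opposite directions: entropy falls as quality improves, while F-measure rises.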

Cluster Evaluation
• There are many different quality measures, and the performance and relative ranking of different clustering algorithms can vary substantially depending on which measure is used.

Thanks