Fall 2004Data Mining1 IE 483/583 Knowledge Discovery and Data Mining Dr. Siggi Olafsson Fall 2003.

Presentation transcript:

Slide 1: IE 483/583 Knowledge Discovery and Data Mining, Dr. Siggi Olafsson, Fall 2003

Slide 2: What is Data Mining? (… and should I be here?)

Slide 3: Dilbert Replies...

Slide 4: Some Definitions
“Data mining is the extraction of implicit, previously unknown, and potentially useful information from data.”
“Data mining is the process of exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.”

Slide 5: What can Data Mining Do?
Supervised: classification, prediction
Unsupervised: association discovery, clustering

Slide 6: Applications of Data Mining
Manufacturing Process Improvement
Sales and Marketing
Mapping the Human Genome
Diagnosing Breast Cancer
Financial Crime Identification
Portfolio Management

Slide 7: Technical Background
Machine Learning
– Data mining: business-oriented use of AI
Statistics
– Regression, sampling, DOE, etc.
Decision Support
– Data warehousing, data marts, OLAP, etc.
Interdisciplinary tools put together to form the process of knowledge discovery in databases …

Slide 8: Historical Perspective
< 40  Stat  Bayes theorem, regression, etc.
40s   AI    Neural networks
50s   AI    Nearest neighbor, single link, perceptron
      Stat  Resampling, bias reduction, jackknife
60s   Stat  Linear models for classification, exploratory data analysis (EDA)
      IR    Similarity measures, clustering
      DB    Relational data model
70s   IR    Smart IR systems
      AI    Genetic algorithms
      Stat  EM algorithm, k-means clustering
80s   AI    Kohonen maps, decision trees
90s   DB    Association rule algorithms, web & search engines, data warehousing, OLAP

Slide 9: What Changed?
Very large databases
Increased computational power as enabler
Business perspective

Slide 10: Knowledge Discovery in Databases
[Figure: process stages Databases, Data warehouse, Prepared Data, Model/Structures, Knowledge; stage labels Data Warehouse, Systems Engineering, Knowledge Discovery and Data Mining]

Slide 11: Course Information
We assume data is ready for mining
Thus, we focus on:
– models and structures, and
– algorithms
More information on course homepage

Slide 13: Course Outline
Introduction
Exploratory Data Mining
Supervised Learning
Unsupervised Learning
Optimization Methods in Learning
Selected Advanced Topics
– Mining the Web
– Customer Relationship Management (CRM)
Course Review

Slide 14: Questions?

Slide 15: Data Mining
Discover patterns in data
– automatic or semi-automatic process
– meaningful or useful pattern
– large amounts of data
What does such a pattern look like?
Black box
Transparent box

Slide 16: Describing Structural Patterns
Some ways of representing knowledge:
– Decision tables
– Decision trees
– Classification rules
– Association rules
– Regression trees
– Clusters

Slide 17: The Weather Problem

Slide 18: A Decision List
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
These are classification rules
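A decision list is read top to bottom, and the first rule whose conditions all hold decides the class. The following is a minimal Python sketch of the list on this slide; the dictionary layout for instances and the `classify` helper are illustrative assumptions, not part of the course material.

```python
# Each rule is (conditions, predicted class); rules are tried in order.
# An empty condition matches every instance and plays the role of
# "if none of the above".
DECISION_LIST = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"humidity": "normal"}, "yes"),
    ({}, "yes"),
]


def classify(instance, rules=DECISION_LIST):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    raise ValueError("decision list has no default rule")


if __name__ == "__main__":
    day = {"outlook": "sunny", "temperature": "hot",
           "humidity": "high", "windy": "false"}
    print(classify(day))
```

Order matters: a rainy, windy day with normal humidity is caught by the second rule before the fourth rule is ever tried.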

Slide 19: Association Rules
Many association rules can be inferred:
if temperature = cool then humidity = normal
if humidity = normal and windy = false then play = yes
if outlook = sunny and play = no then humidity = high

Slide 20: Three Layers of the Process
Inputs
Outputs
Algorithms

Slide 21: Inputs
Three forms
– Concepts: concept description, what you want to learn
– Instances: examples, what you learn from
– Attributes: features of instances, variables you have values for

Slide 22: Concepts: Styles of Learning
Classification (supervised) learning
Association learning
Clustering
Numeric prediction

Slide 23: Instances: Learn from Examples
Set of instances to be classified, or associated, or clustered
Example of concept to be learned
Data set: flat file (single relation)
– denormalization
Family tree example
– concept: sister
– example: family tree

Slide 24: Family Tree

Slide 25: Denormalizing Relational Data

Slide 26: Denormalization Problems
Computational and storage costs
Trivial regularities
– customers / products
– product / supplier
– supplier / supplier address
Infinite relations
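To make the "trivial regularities" point concrete, here is a small Python sketch that joins three relations into one flat file and then checks whether one attribute fully determines another. All customer, product, supplier and address values are made up for illustration, and the helper names `denormalize` and `is_determined_by` are hypothetical.

```python
# Hypothetical relations; every name and value here is invented.
purchases = [("alice", "milk"), ("bob", "milk"), ("alice", "bread")]  # customer, product
product_supplier = {"milk": "DairyCo", "bread": "BakeCo"}             # product -> supplier
supplier_address = {"DairyCo": "Ames", "BakeCo": "Boone"}             # supplier -> address


def denormalize(purchases, product_supplier, supplier_address):
    """Join the three relations into one flat list of instances."""
    flat = []
    for customer, product in purchases:
        supplier = product_supplier[product]
        flat.append({
            "customer": customer,
            "product": product,
            "supplier": supplier,
            "supplier_address": supplier_address[supplier],
        })
    return flat


def is_determined_by(instances, lhs, rhs):
    """True if each value of `lhs` always appears with one value of `rhs`."""
    seen = {}
    for row in instances:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


if __name__ == "__main__":
    flat = denormalize(purchases, product_supplier, supplier_address)
    print(is_determined_by(flat, "supplier", "supplier_address"))
    print(is_determined_by(flat, "customer", "product"))
```

A learner run on the flat file would "discover" that supplier predicts supplier address with perfect accuracy, which is only an artifact of the join and carries no new knowledge.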

Slide 27: Content of Instances: Attributes
Instance characterized by values of its (predefined) set of attributes
– Numeric (“continuous”)
– Nominal (categorical)
– Ordinal (rank)
– Interval
– Ratio
Focus in this class

Slide 28: Data Preparation
Data …
– assembly: set of instances/denormalizing relational data
– integration: enterprise-wide database/data warehouse
– cleaning: missing data
– aggregation: good information

Slide 29: ARFF Format
Used by Java package (Weka)
Independent, unordered instances
No relationship between instances

Slide 30: Weather Data

Slide 31: Features
%
– Attribute types: Nominal and
– List of instances
– Missing values represented by ?
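As an illustration of the features listed here, the sketch below embeds a tiny ARFF file in a Python string and reads it with a toy parser. The `@relation`, `@attribute` and `@data` keywords, `%` comments and `?` for missing values are standard ARFF syntax; the three data rows are illustrative values only, and `parse_arff` is a hypothetical helper that covers just the nominal and numeric cases, not Weka's own reader.

```python
ARFF_TEXT = """\
% Weather data, a few rows only (values are illustrative)
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play {yes, no}

@data
sunny,85,85,false,no
overcast,83,86,false,yes
rainy,70,?,false,yes
"""


def parse_arff(text):
    """Toy ARFF reader: comments, nominal/numeric attributes and '?' only."""
    attributes, instances, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        lower = line.lower()
        if lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            row = {}
            for (name, kind), value in zip(attributes, line.split(",")):
                value = value.strip()
                if value == "?":
                    row[name] = None           # missing value
                elif kind.lower() == "numeric":
                    row[name] = float(value)   # numeric attribute
                else:
                    row[name] = value          # nominal value kept as text
            instances.append(row)
    return attributes, instances


if __name__ == "__main__":
    attrs, rows = parse_arff(ARFF_TEXT)
    print(len(attrs), len(rows))
```

Each data line is one independent instance, so the rows can be shuffled without changing what the file means.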

Slide 32: Other Issues
Missing data
Inaccurate values
Look at the data!!!

Slide 33: Recall the Three Layers of the Data Mining Process
Inputs (Done)
Outputs (structural patterns) (Next)
Algorithms

Slide 34: Describing Structural Patterns
Ways of representing knowledge:
– Decision tables
– Decision trees
– Classification rules
– Association rules
– Regression trees
– Clusters

Slide 35: The Weather Problem

Slide 36: A Decision List
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

Slide 37: A Decision Tree
[Figure: decision tree with root Outlook and branches Sunny, Overcast, Rainy; Sunny leads to Humidity (High: Play=No), Overcast leads to Play=Yes, Rainy leads to Windy (TRUE: Play=No)]
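One way to hold such a tree in code is as nested tuples and dictionaries, classifying an instance by following branches until a leaf is reached. This is only a sketch: the representation is an assumption, and the two leaves that do not appear in the transcript (humidity = normal and windy = false, both set to "yes") are assumed so that the tree agrees with the decision list shown earlier.

```python
# Internal node: (attribute, {value: subtree}); leaf: a class label string.
# The "normal" and "false" leaves are assumptions, see the text above.
WEATHER_TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})


def classify(tree, instance):
    """Follow the branch matching the instance at each node until a leaf."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


if __name__ == "__main__":
    day = {"outlook": "rainy", "humidity": "high", "windy": "true"}
    print(classify(WEATHER_TREE, day))
```

Unlike a decision list, every instance follows exactly one path, so the order in which branches are written down does not affect the result.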

Slide 38: Concepts: Styles of Learning
Classification (supervised) learning
Association learning
Clustering
Numeric prediction

Slide 39: Classification Rules
Classification rules are easily read off decision trees
How?
Other direction possible, but not as straightforward
If a and b then x
If c and d then x
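One answer to "How?" is to emit one rule per leaf, with the tests along the path from the root as the rule's conditions. The sketch below does this for a small tree; the nested-tuple tree representation, the example tree and the `tree_to_rules` name are assumptions made for illustration.

```python
# Internal node: (attribute, {value: subtree}); leaf: a class label string.
# The example tree is an assumed weather tree, used only for illustration.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})


def tree_to_rules(tree, path=()):
    """Yield one (conditions, label) pair per leaf of the tree."""
    if isinstance(tree, str):
        yield list(path), tree
        return
    attribute, branches = tree
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, path + ((attribute, value),))


if __name__ == "__main__":
    for conditions, label in tree_to_rules(TREE):
        tests = " and ".join(f"{a} = {v}" for a, v in conditions)
        print(f"If {tests} then play = {label}")
```

The resulting rules are unambiguous and can be applied in any order, but they are often more complex than necessary; going from an arbitrary rule set back to a tree is the harder direction.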

Slide 40: Corresponding Decision Tree
[Figure: decision tree over attributes a, b, c, d with y/n branches and leaves labeled x]

Slide 41: Replicated Subtree Problem
[Figure: decision tree testing x=1 and then y=1, with y/n branches and leaves a and b]
If x=1 and y=0 then a
If x=0 and y=1 then a
If x=0 and y=0 then b
If x=1 and y=1 then b

Slide 42: Replicated Subtree Problem
If x=1 and y=1 then a
If z=1 and w=1 then a
Otherwise b
x, y, z, w take values 1, 2, 3

Slide 43: Rules with exceptions
If x and y then a EXCEPT if z then b
Account for new instances
Exceptions from exceptions, etc
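A rule with exceptions can be represented as a rule that carries its own list of exception rules, each of which may carry further exceptions. The sketch below is one possible encoding; the `Rule` class is hypothetical, and returning `None` when the rule does not apply is an assumption, since the slide gives no default class.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Rule:
    condition: Callable[[dict], bool]
    label: str
    exceptions: List["Rule"] = field(default_factory=list)

    def classify(self, instance: dict) -> Optional[str]:
        """Return the label, unless the rule does not apply or an exception fires."""
        if not self.condition(instance):
            return None
        for exception in self.exceptions:
            result = exception.classify(instance)
            if result is not None:
                return result  # the exception overrides this rule
        return self.label


# "If x and y then a EXCEPT if z then b"
rule = Rule(lambda i: i["x"] and i["y"], "a",
            exceptions=[Rule(lambda i: i["z"], "b")])

if __name__ == "__main__":
    print(rule.classify({"x": True, "y": True, "z": True}))
```

A new instance that contradicts an existing rule can be accommodated by appending an exception to that rule, leaving the rest of the rule set untouched.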

Slide 44: Association Rules
Coverage (support): number of instances it predicts correctly
Accuracy (confidence): coverage divided by number of instances it applies to
If temperature = cool then humidity = normal
Coverage = 4
Accuracy = 100%
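The two measures can be computed directly from their definitions. The weather table itself is not part of this transcript, so the sketch below assumes the standard 14-instance nominal weather data set distributed with Weka; if the course used a different table, the numbers will differ. The function name `coverage_and_accuracy` is hypothetical.

```python
# ASSUMPTION: the standard 14-instance nominal weather data (as shipped
# with Weka); the table on the original slides is not in the transcript.
COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
WEATHER = [dict(zip(COLUMNS, row)) for row in ROWS]


def matches(instance, conditions):
    """True if the instance has every attribute value in `conditions`."""
    return all(instance[attr] == value for attr, value in conditions.items())


def coverage_and_accuracy(data, antecedent, consequent):
    """Coverage = instances predicted correctly; accuracy = coverage / applicable."""
    applies = [row for row in data if matches(row, antecedent)]
    correct = [row for row in applies if matches(row, consequent)]
    coverage = len(correct)
    accuracy = coverage / len(applies) if applies else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    print(coverage_and_accuracy(WEATHER,
                                {"temperature": "cool"},
                                {"humidity": "normal"}))
```

On the assumed data the rule applies to four instances and is right on all four, matching the coverage of 4 and accuracy of 100% stated on the slide.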

Slide 45: Interpretation
If windy = false and play = no then outlook = sunny and humidity = high
If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high
If humidity = high and windy = false and play = no then outlook = sunny

Slide 46: The Shapes Problem
Shaded = standing
Unshaded = lying

Slide 47: Instances

Slide 48: Classification Rules
If width ≥ 3.5 and height < 7.0 then lying
If height ≥ 3.5 then standing
Work well to classify these instances
Problems?

Slide 49: Relational Rules
Rules comparing attributes to constants are called propositional rules
Structural patterns?
If width > height then lying
If height > width then standing
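The relational rules compare two attributes of the same instance instead of comparing each attribute to a constant. A minimal Python sketch follows; the function name and the example block sizes are made up, and returning `None` for a square block is an assumption because neither rule covers that case.

```python
def classify_block(width, height):
    """Relational rules: compare the two attributes with each other."""
    if width > height:
        return "lying"
    if height > width:
        return "standing"
    return None  # width == height is not covered by either rule


if __name__ == "__main__":
    # Hypothetical blocks at very different scales.
    for width, height in [(2.0, 8.0), (7.0, 3.0), (200.0, 800.0)]:
        print(width, height, classify_block(width, height))
```

Because no fixed threshold is involved, the same rules classify a block of size 200 by 800 as readily as one of size 2 by 8, which propositional rules with constants tuned to the training instances would not.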

Slide 50: CPU Performance Example

Slide 51: Numerical Prediction: regression equation

Slide 52: Regression Tree
[Figure: regression tree with nodes CHMIN, CACH and MMAX; branch labels ≤ 7.5, > 7.5, ≤ 8.5, (8.5, 28], > 28; one leaf value shown is 64.6]
– Accuracy?
– Large and possibly awkward

Slide 53: Model Trees
[Figure: model tree with nodes CHMIN, CACH and MMAX; branch labels ≤ 7.5, > 7.5, ≤ 8.5, > 8.5, > 28000; leaves are linear models, including LM4, LM5 and LM6]

Slide 54: Instance-Based Representation
Store actual instances
New instance: algorithm finds “most similar” stored instance
Features
– What is a similar instance?
– Need to store (all?) instances
– Really a black box method
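A minimal sketch of this idea is a one-nearest-neighbor classifier: keep the labeled instances and give a new instance the label of the closest stored one. Euclidean distance is used here as one possible answer to "what is a similar instance?"; that choice, the `nearest_neighbor` name and the stored (width, height) instances are all assumptions for illustration.

```python
import math

# Hypothetical stored instances: ((width, height), label).
STORED = [
    ((2.0, 8.0), "standing"),
    ((3.0, 9.0), "standing"),
    ((7.0, 3.0), "lying"),
    ((9.0, 4.0), "lying"),
]


def nearest_neighbor(stored, query):
    """Return the label of the stored instance closest to `query`."""
    _, label = min(stored, key=lambda item: math.dist(item[0], query))
    return label


if __name__ == "__main__":
    print(nearest_neighbor(STORED, (6.0, 2.0)))
```

Nothing is learned in advance: all the work happens at prediction time, and the "model" is simply the stored data, which is why the method offers no explicit structural description.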

Slide 55: Clusters
[Figure: two cluster diagrams over the instances a, b, c, d, e, f, g, h, i, j, k]

Slide 56: Next: Algorithms