DATA MINING TECHNIQUES: Introductory and Advanced Topics. Eamonn Keogh (some slides adapted from Margaret Dunham). © Prentice Hall


Slide 1: DATA MINING TECHNIQUES
Introductory and Advanced Topics
Eamonn Keogh (some slides adapted from Margaret Dunham)
Dr. M.H. Dunham, Data Mining, Introductory and Advanced Topics, Prentice Hall,
© Prentice Hall

Slide 2: Data Mining Outline
– Introduction
– Related Concepts
– Data Mining Techniques

Slide 3: Introduction Outline
Goal: Provide an overview of data mining.
– Define data mining
– Data mining vs. databases
– Basic data mining tasks
– Data mining issues

Slide 4: Introduction
– Data is growing at a phenomenal rate (read "How Much Information Is There In the World?" by Michael Lesk)
– Users expect more sophisticated information
– How?
UNCOVER HIDDEN INFORMATION
DATA MINING

Slide 5: Data Mining Definition
– Finding hidden information in a database
– Data mining has been defined as "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data".
– Similar terms
  – Exploratory data analysis
  – Data driven discovery
  – Deductive learning
  – Discovery Science
  – Knowledge Discovery

Slide 6: Database Processing vs. Data Mining Processing
Database processing
– Query
  – Well defined
  – SQL
– Output
  – Subset of database
Data mining processing
– Query
  – Poorly defined
  – No precise query language
– Output
  – Not a subset of database

Slide 7: Query Examples
Database
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more than $10,000 in the last month.
– Find all customers who have purchased milk.
Data Mining
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (clustering)
– Find all items which are frequently purchased with milk. (association rules)
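The difference between the two milk queries can be sketched in a few lines of Python. This is only an illustration: the transaction data, the function names, and the `min_count` threshold are all hypothetical and do not come from the slides. The first function is a precise filter that returns a subset of the stored data; the second derives something that is not stored anywhere, namely which items tend to be bought together with milk.

```python
from collections import Counter

# Hypothetical toy data: each transaction is (customer, set of items bought).
transactions = [
    ("ann", {"milk", "bread", "eggs"}),
    ("bob", {"milk", "bread"}),
    ("cat", {"beer", "chips"}),
    ("dan", {"milk", "cereal", "bread"}),
]

def customers_who_bought(item, transactions):
    """Database-style query: a precise filter; the answer is a subset of the data."""
    return sorted({customer for customer, items in transactions if item in items})

def items_bought_with(item, transactions, min_count=2):
    """Mining-style query: count co-purchases and keep only the frequent ones.

    min_count is an assumed frequency threshold chosen for this toy example.
    """
    counts = Counter()
    for _, items in transactions:
        if item in items:
            counts.update(items - {item})
    return {other: n for other, n in counts.items() if n >= min_count}

print(customers_who_bought("milk", transactions))
print(items_bought_with("milk", transactions))
```

Note that the second answer depends on a threshold the analyst has to choose, which is one sense in which the data mining query is "poorly defined".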

Slide 8: Data Mining Models and Tasks

Slide 9: Basic Data Mining Tasks I
– Classification maps data into predefined groups or classes.
  – Supervised learning
  – Pattern recognition
  – Prediction
– Regression is used to map a data item to a real-valued prediction variable.
– Clustering groups similar data together into clusters.
  – Unsupervised learning
  – Segmentation
  – Partitioning
H = 1.31 (Fem + Fib)
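A small sketch may help contrast supervised classification with unsupervised clustering. The slides do not prescribe any algorithm here; 1-nearest-neighbour and k-means are chosen purely for illustration, and the points, labels, and starting centroids are made up.

```python
import math

def classify(point, labeled_examples):
    """1-nearest-neighbour classification: the classes are predefined by the labels."""
    _, label = min(labeled_examples, key=lambda ex: math.dist(point, ex[0]))
    return label

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute.

    Returns the final centroids and the clusters from the last assignment step.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if the cluster ended up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Supervised: labels are given in advance (hypothetical examples).
labeled = [((1.0, 1.0), "low"), ((9.0, 9.0), "high")]
print(classify((2.0, 2.0), labeled))

# Unsupervised: no labels, only points and an initial guess for the centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The classifier needs labeled examples to work at all, whereas k-means discovers the two groups from the points alone.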

Slide 10: Basic Data Mining Tasks II
– Summarization maps data into subsets with associated simple descriptions.
  – Characterization
  – Generalization
– Link Analysis uncovers relationships among data.
  – Affinity Analysis
  – Association Rules
  – Sequential Analysis determines sequential patterns.
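As an illustration of link analysis, the sketch below scores one association rule with support and confidence. These are the usual measures for association rules, but the slide itself does not define them, and the baskets and function names are hypothetical.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    base = support(antecedent, baskets)
    joint = support(set(antecedent) | set(consequent), baskets)
    return joint / base if base else 0.0

# Hypothetical market baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
]

# Rule: milk -> bread
print(support({"milk", "bread"}, baskets))
print(confidence({"milk"}, {"bread"}, baskets))
```

A rule is usually reported only when both values exceed thresholds chosen by the analyst.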

Slide 11: KDD Process
– Selection: Obtain data from various sources.
– Preprocessing: Cleanse data.
– Transformation: Convert to common format. Transform to new format.
– Data Mining: Obtain desired results.
– Interpretation/Evaluation: Present results to user in meaningful manner.
Modified from [FPSS96C]
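The five KDD steps can be read as a pipeline of functions, each feeding the next. The sketch below is a toy under stated assumptions: the record layout, the two sources, the cents-to-dollars conversion, and the choice of "average spend per customer" as the mining step are all invented for illustration.

```python
from statistics import mean

def select(sources, wanted):
    """Selection: gather records from the chosen sources."""
    return [rec for name in wanted for rec in sources[name]]

def preprocess(records):
    """Preprocessing: cleanse by dropping records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def transform(records):
    """Transformation: convert to a common format (all amounts in dollars)."""
    return [
        {"customer": r["customer"],
         "amount": r["amount"] / 100 if r["unit"] == "cents" else r["amount"]}
        for r in records
    ]

def mine(records):
    """Data mining: here, simply the average spend per customer."""
    by_customer = {}
    for r in records:
        by_customer.setdefault(r["customer"], []).append(r["amount"])
    return {customer: mean(amounts) for customer, amounts in by_customer.items()}

def interpret(result):
    """Interpretation/evaluation: present the result in a readable form."""
    return [f"{customer}: average spend ${avg:.2f}"
            for customer, avg in sorted(result.items())]

# Hypothetical sources with inconsistent units and one missing value.
sources = {
    "store": [{"customer": "ann", "amount": 12.0, "unit": "dollars"},
              {"customer": "bob", "amount": None, "unit": "dollars"}],
    "web": [{"customer": "ann", "amount": 800, "unit": "cents"},
            {"customer": "bob", "amount": 450, "unit": "cents"}],
}

report = interpret(mine(transform(preprocess(select(sources, ["store", "web"])))))
for line in report:
    print(line)
```

In practice the steps are iterated rather than run once, since what is found during mining often sends the analyst back to selection or preprocessing.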

Slide 12: KDD Process Ex: Shuttle Data
– Selection:
  – Select data (which missions, etc.) to use
– Preprocessing:
  – Remove spikes
– Transformation:
  – DFT, DWT, PAA, etc.
– Data Mining:
  – Look for rules…
– Interpretation/Evaluation:
  – Show rules to domain experts
– Potential User Applications:
  – Prediction of failures
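Two of these steps are easy to sketch for a time series. The slide does not say how spikes are removed, so the sliding median filter below is an assumption; PAA (Piecewise Aggregate Approximation) is implemented in its basic form, replacing each equal-length frame by its mean. The sample series is made up.

```python
from statistics import mean, median

def remove_spikes(series, window=3):
    """Sliding median filter (an assumed method): each value is replaced by the
    median of its neighbourhood, which suppresses isolated spikes."""
    half = window // 2
    return [
        median(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]

def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length frame.

    For simplicity this sketch requires len(series) to be divisible by segments.
    """
    if len(series) % segments:
        raise ValueError("series length must be divisible by the number of segments")
    size = len(series) // segments
    return [mean(series[i:i + size]) for i in range(0, len(series), size)]

# Hypothetical sensor readings with one spike at index 2.
series = [1.0, 1.0, 50.0, 1.0, 1.0, 5.0, 5.0, 5.0]
clean = remove_spikes(series)
reduced = paa(clean, 2)
print(clean)
print(reduced)
```

PAA reduces the series to a handful of values, which makes the later rule search cheaper at the cost of fine detail.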

Slide 13: Data Mining Development
Similarity Measures, Hierarchical Clustering, IR Systems, Imprecise Queries, Textual Data, Web Search Engines, Bayes Theorem, Regression Analysis, EM Algorithm, K-Means Clustering, Time Series Analysis, Neural Networks, Decision Tree Algorithms, Algorithm Design Techniques, Algorithm Analysis, Data Structures, Relational Data Model, SQL, Association Rule Algorithms, Data Warehousing, Scalability Techniques

Slide 14: KDD Issues
– Human Interaction
– Overfitting
– Outliers
– Interpretation
– Visualization
– Large Datasets
– High Dimensionality

Slide 15: KDD Issues (cont'd)
– Multimedia Data
– Missing Data
– Irrelevant Data
– Noisy Data
– Changing Data (streams)
– Integration
– Application

Slide 16: Social Implications of DM
– Privacy
– Profiling
– Unauthorized use

Slide 17: Data Mining Metrics
– Usefulness
– Return on Investment (ROI)
– Accuracy
– Space/Time Complexity
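Two of these metrics have simple arithmetic definitions, sketched below. The slide gives no formulas, so the definitions used here (accuracy as the fraction of correct predictions, ROI as net gain divided by cost) are the common textbook ones, and all figures are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def roi(gain, cost):
    """Return on investment: net gain relative to what was spent."""
    return (gain - cost) / cost

# Hypothetical credit-risk predictions versus what actually happened.
predicted = ["good", "bad", "good", "good"]
actual = ["good", "bad", "bad", "good"]
print(accuracy(predicted, actual))

# Hypothetical project: $10,000 spent on mining, $15,000 gained from its results.
print(roi(gain=15_000, cost=10_000))
```

A model can score well on accuracy and still have a poor ROI if its mistakes are the expensive ones, which is why the slide lists the two separately.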