Data Mining: Confluence of Multiple Disciplines
Data Mining sits at the confluence of Database Systems, Statistics, Machine Learning, Algorithms, Visualization, and Other Disciplines.

Data Mining Outline
– Introduction
– Classification
– Clustering
– Association Rules

Introduction
– Data is growing at a phenomenal rate.
– Users expect more sophisticated information.
– How? Uncover hidden information: DATA MINING.

Data Mining Definition
– Finding hidden information in a database.
– Fitting data to a model: descriptive or predictive.
– Similar terms:
  – Exploratory data analysis
  – Data-driven discovery
  – Deductive learning

But It Isn’t Magic
– You must know what you are looking for.
– You must know how to look for it.
– Suppose you knew that a specific cave had gold: what would you look for? How would you look for it?
– You might need an expert miner.

“If it looks like a duck, walks like a duck, and quacks like a duck, then it’s a duck.”
[Diagram: Description, Behavior, Associations; Classification, Clustering, Link Analysis]
“If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then it’s a terrorist.”

Query Examples
Database:
– Find all customers who have purchased milk.
– Find all credit applicants with the last name Smith.
– Identify customers who have purchased more than $10,000 in the last month.
Data Mining:
– Find all items that are frequently purchased with milk (association rules).
– Find all credit applicants who are poor credit risks (classification).
– Identify customers with similar buying habits (clustering).
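
To make the contrast concrete, here is a small sketch using Python's standard `sqlite3` module on a handful of hypothetical purchase records (the customers and items are invented for illustration). The database query states exactly what to return; the data-mining question ("what is bought with milk?") has to be answered by searching for a pattern, here a simple co-occurrence count.

```python
import sqlite3
from collections import Counter

# Hypothetical purchase records (customer, item); not taken from the slides.
rows = [
    ("alice", "milk"), ("alice", "bread"),
    ("bob", "milk"), ("bob", "bread"), ("bob", "jelly"),
    ("carol", "beer"),
    ("dave", "milk"), ("dave", "cereal"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

# Database query: the answer is fully specified by the query itself.
milk_buyers = [c for (c,) in conn.execute(
    "SELECT DISTINCT customer FROM purchases WHERE item = 'milk' ORDER BY customer")]

# Data-mining flavour: count which other items co-occur with milk in a basket.
baskets = {}
for customer, item in rows:
    baskets.setdefault(customer, set()).add(item)
co_occurrence = Counter(
    other
    for items in baskets.values() if "milk" in items
    for other in items if other != "milk"
)

print(milk_buyers)                   # ['alice', 'bob', 'dave']
print(co_occurrence.most_common(1))  # [('bread', 2)]
```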

KDD Process
– Selection: Obtain data from various sources.
– Preprocessing: Cleanse data.
– Transformation: Convert to a common format; transform to a new format.
– Data Mining: Obtain desired results.
– Interpretation/Evaluation: Present results to the user in a meaningful manner.
© Prentice Hall
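
The five KDD steps can be sketched as a chain of functions. This is a toy illustration under stated assumptions: the two data sources, the "missing value" cleansing rule, and the "mining" step (just picking customers above a spend threshold) are all hypothetical placeholders for the real work each stage would do.

```python
# Hypothetical raw records from two sources with inconsistent formats.
source_a = [{"name": "Ann", "spend": "120.5"}, {"name": "Bo", "spend": ""}]
source_b = [("Cy", 80), ("Di", 310)]

def select():
    # Selection: obtain data from various sources.
    return list(source_a), list(source_b)

def preprocess(a, b):
    # Preprocessing: cleanse data (here: drop records with a missing spend value).
    return [r for r in a if r["spend"].strip()], b

def transform(a, b):
    # Transformation: convert both sources to a common (name, float) format.
    return [(r["name"], float(r["spend"])) for r in a] + [(n, float(s)) for n, s in b]

def mine(records, threshold=100.0):
    # Data mining: a trivial placeholder "pattern", the high spenders.
    return sorted(name for name, spend in records if spend > threshold)

def interpret(result):
    # Interpretation/Evaluation: present the result in a readable form.
    return "High spenders: " + ", ".join(result)

print(interpret(mine(transform(*preprocess(*select())))))  # High spenders: Ann, Di
```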

Data Mining Outline
– Introduction
– Classification: assign data to a predefined class
  – Decision Trees
  – Neural Networks
  – Distance-Based
– Clustering
– Association Rules

Training data columns: Insect ID, Abdomen Length, Antennae Length, Insect Class
Insect Class of the ten training insects, in order: Grasshopper, Katydid, Grasshopper, Grasshopper, Katydid, Grasshopper, Katydid, Grasshopper, Katydid, Katydid
Previously unseen instance: Insect Class = ???????
The classification problem can now be expressed as: given a training database, predict the class label of a previously unseen instance.
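
One distance-based way to answer this (the outline lists "Distance-Based" classifiers) is the 1-nearest-neighbour rule: give the unseen insect the class of the closest training insect. The sketch below keeps the slide's order of class labels, but the (abdomen length, antennae length) values are invented for illustration, and Euclidean distance is an assumed choice of metric.

```python
import math

# Hypothetical (abdomen length, antennae length) values; labels follow the slide's order.
training = [
    ((2.0, 5.0), "Grasshopper"),
    ((8.0, 9.0), "Katydid"),
    ((1.0, 4.0), "Grasshopper"),
    ((1.5, 3.0), "Grasshopper"),
    ((6.0, 8.0), "Katydid"),
    ((3.0, 2.0), "Grasshopper"),
    ((6.5, 7.0), "Katydid"),
    ((0.5, 1.5), "Grasshopper"),
    ((8.5, 7.0), "Katydid"),
    ((7.5, 5.0), "Katydid"),
]

def classify(query):
    # 1-nearest neighbour: return the label of the closest training instance.
    _, label = min(training, key=lambda t: math.dist(t[0], query))
    return label

print(classify((6.0, 7.5)))  # Katydid
print(classify((1.2, 3.5)))  # Grasshopper
```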

Classification Process (1): Model Construction
Training Data → Classification Algorithms → Classifier (Model)
Model: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
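
The learning algorithm itself is not shown on the slide, so this sketch only represents the resulting model: the rule is written as a Python function and checked against a few hypothetical training tuples. Returning "no" when the rule does not fire is an assumption; the slide only states the THEN branch.

```python
# Hypothetical training tuples: (rank, years, tenured).
training_data = [
    ("assistant prof", 3, "no"),
    ("assistant prof", 7, "yes"),
    ("professor", 2, "yes"),
    ("associate prof", 7, "yes"),
    ("assistant prof", 6, "no"),
    ("associate prof", 3, "no"),
]

def tenure_model(rank: str, years: int) -> str:
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes' (else 'no' assumed).
    return "yes" if rank == "professor" or years > 6 else "no"

correct = sum(tenure_model(r, y) == label for r, y, label in training_data)
training_accuracy = correct / len(training_data)
print(training_accuracy)  # 1.0
```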

Classification Process (2): Use the Model in Prediction
Testing Data and Unseen Data → Classifier
Unseen data: (Jeff, Professor, 4) → Tenured?
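
A sketch of the prediction step, assuming the rule from the model-construction slide (with "no" as the assumed default) and a few hypothetical testing tuples. The test set estimates how well the model generalizes; the unseen tuple (Jeff, Professor, 4) is then classified. Because his rank is professor, the rule predicts tenured = 'yes'.

```python
def tenure_model(rank: str, years: int) -> str:
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes' (else 'no' assumed).
    return "yes" if rank == "professor" or years > 6 else "no"

# Hypothetical testing tuples: (rank, years, tenured), not used to build the model.
testing_data = [
    ("assistant prof", 2, "no"),
    ("associate prof", 7, "no"),
    ("professor", 5, "yes"),
    ("assistant prof", 7, "yes"),
]

correct = sum(tenure_model(r, y) == label for r, y, label in testing_data)
test_accuracy = correct / len(testing_data)
print(test_accuracy)                    # 0.75
print(tenure_model("professor", 4))     # Jeff -> yes
```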

Training Dataset
This follows an example from Quinlan’s ID3.
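
ID3 chooses the splitting attribute with the highest information gain, Gain(A) = Info(D) − Σⱼ |Dⱼ|/|D| · Info(Dⱼ). The sketch below assumes the training table is the widely reproduced 14-tuple buys_computer data set (age, income, student, credit_rating); the rows are that common version, so if the table used here differs, the numbers will too. On this data, age has the largest gain (about 0.247), matching the root of the tree on the next slide.

```python
import math
from collections import Counter

# Assumed training table: the commonly reproduced 14-tuple buys_computer data.
ATTRS = ["age", "income", "student", "credit_rating"]
DATA = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def entropy(labels):
    # Info(D) = -sum p_i * log2(p_i) over the class distribution.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index):
    # Gain(A) = Info(D) - weighted entropy of the partitions induced by A.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([r[-1] for r in rows]) - remainder

gains = {a: info_gain(DATA, i) for i, a in enumerate(ATTRS)}
root = max(gains, key=gains.get)
print(root)  # age
```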

Output: A Decision Tree for “buys_computer”
age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
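
Classifying a new tuple means walking from the root to a leaf. In this sketch the tree is encoded as nested dictionaries (an inner node maps one attribute name to its branches; a leaf is a class label). The encoding and the middle age branch name "31..40" are assumptions made for illustration.

```python
# Inner node: {attribute: {branch value: subtree}}; leaf: a class label string.
TREE = {
    "age": {
        "<=30": {"student": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}

def classify(tree, record):
    # Follow the branch matching the record's value until a leaf is reached.
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[record[attr]]
    return tree

print(classify(TREE, {"age": "<=30", "student": "yes", "credit_rating": "fair"}))      # yes
print(classify(TREE, {"age": ">40", "student": "no", "credit_rating": "excellent"}))   # no
```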

Neural Network Example
[Figure: a tuple is fed to the network as input, and the network produces an output.]
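
A minimal forward pass shows how a tuple's attribute values flow from the input layer to the output. This assumes a feed-forward network with one hidden layer and sigmoid activations. The weights are hypothetical and untrained; learning them (for example by backpropagation) is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
W_HIDDEN = [[0.5, -0.4], [0.3, 0.8]]  # W_HIDDEN[j][i]: weight from input i to hidden unit j
B_HIDDEN = [0.1, -0.2]
W_OUT = [1.2, -0.7]
B_OUT = 0.05

def forward(tuple_in):
    # Hidden unit j: sigmoid(weighted sum of inputs + bias).
    hidden = [sigmoid(sum(w * x for w, x in zip(row, tuple_in)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    # Output unit: sigmoid(weighted sum of hidden activations + bias).
    return sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)

print(forward([1.0, 0.5]))  # a value between 0 and 1
```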

Data Mining Outline
– Introduction
– Classification
– Clustering: place data into groups
  – Hierarchical
  – K-Means
  – Partitional
– Association Rules

Clustering Examples
– Segment a customer database based on similar buying patterns.
– Group houses in a town into neighborhoods based on similar features.
– Identify new plant species.
– Identify similar Web usage patterns.
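
Customer segmentation is a typical use of K-Means (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The customers below are hypothetical (monthly grocery spend, monthly electronics spend), and k = 2 is an assumed choice.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    # Alternate the assignment step and the centroid-update step.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (grocery spend, electronics spend).
customers = [(20, 1), (22, 2), (19, 0), (5, 40), (6, 38), (4, 42)]
centroids, clusters = k_means(customers, k=2)
print(clusters)
```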

Clustering vs. Classification
– No prior knowledge:
  – Number of clusters
  – Meaning of clusters
– Unsupervised learning
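
Hierarchical clustering illustrates both points: no class labels are used, and the number of clusters need not be fixed in advance, because the full merge sequence can be cut at any level afterwards. This is a minimal single-linkage sketch on hypothetical 2-D points; single linkage is one of several possible merge criteria.

```python
import math
from itertools import combinations

def single_linkage(points):
    # Start with singleton clusters; repeatedly merge the two clusters whose
    # closest members are nearest. Returns the list of merged clusters in order.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(math.dist(a, b)
                               for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical unlabeled points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = single_linkage(points)
for m in merges:
    print(sorted(m))
```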

Data Mining Outline
– Introduction
– Classification
– Clustering
– Association Rules: find relationships between data
  – Apriori

Association Rules Example
I = {Beer, Bread, Jelly, Milk, PeanutButter}
Support of {Bread, PeanutButter} is 60%.
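
Support is the fraction of transactions that contain every item of an itemset. The five transactions below are an assumption, chosen over the item set I so that {Bread, PeanutButter} appears in 3 of 5 baskets, reproducing the 60% support stated on the slide.

```python
# Assumed transactions over I = {Beer, Bread, Jelly, Milk, PeanutButter}.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

print(support({"Bread", "PeanutButter"}, TRANSACTIONS))  # 0.6
```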

Association Rules Example (cont’d)
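
A rule X ⇒ Y is usually judged by its support and its confidence, conf(X ⇒ Y) = support(X ∪ Y) / support(X). This sketch uses the same assumed five transactions as a stand-in for the example data; since the |T| denominators cancel, it divides raw counts. With these baskets, Bread ⇒ PeanutButter has confidence 3/4 and PeanutButter ⇒ Bread has confidence 1.

```python
# Assumed transactions (same stand-in data as the support example).
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def count(itemset, transactions):
    # Number of transactions containing every item in the itemset.
    return sum(set(itemset) <= t for t in transactions)

def confidence(lhs, rhs, transactions):
    # conf(X => Y) = support(X u Y) / support(X), computed from counts.
    return count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)

print(confidence({"Bread"}, {"PeanutButter"}, TRANSACTIONS))  # 0.75
print(confidence({"PeanutButter"}, {"Bread"}, TRANSACTIONS))  # 1.0
```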

AR & Market Baskets
– Determine items often purchased together (market basket data).
– Determine optimal placement of items on the store floor.
– Determine items for sales and/or specials.
– Increase sales of items.
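
Finding "items often purchased together" is what Apriori does: it searches level by level, building candidate k-itemsets only from frequent (k−1)-itemsets, because every subset of a frequent itemset must itself be frequent. This is a compact sketch, not an optimized implementation; the transactions are the same assumed stand-in baskets, and the 30% minimum support is an arbitrary choice.

```python
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)

    def sup(s):
        # Fraction of transactions containing itemset s.
        return sum(s <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = sup(s)
        k += 1
        # Candidates: size-k unions of frequent (k-1)-itemsets, pruned when any
        # (k-1)-subset is not frequent (the Apriori property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and sup(c) >= min_support
        ]
    return frequent

# Assumed market baskets (same stand-in data as the support example).
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]
frequent = apriori(TRANSACTIONS, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```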

Summary
– Data mining is a fast-growing area with many applications.
– Data mining algorithms are usually computationally expensive.
– Data mining tools may be difficult to use effectively.