Chapter 26: Data Mining Prepared by Assoc. Professor Bela Stantic.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Definition
Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data.
Example pattern: 62% of customers who bought milk also bought cheese.

Definition (Cont.)
The patterns discovered by data mining should be:
- Valid: the patterns hold in general.
- Novel: we did not know the pattern beforehand.
- Useful: we can devise actions from the patterns.
- Understandable: we can interpret and comprehend the patterns.

Why Use Data Mining Today?
- Human analysis skills are inadequate:
  - Volume and dimensionality of the data
  - High data growth rate
- Availability of:
  - Data
  - Storage
  - Computational power
  - Off-the-shelf software
  - Expertise

Sources of Data
- Supermarket scanners, POS data
- Credit card transactions
- Direct mail response
- Call center records
- ATMs
- Demographic data
- Sensor networks
- Cameras
- Web server logs
- Customer web site trails

Why Use Data Mining Today?
- Competitive pressure! "The secret of success is to know something that nobody else knows." (Aristotle Onassis)
- Competition on service, not only on price (banks, phone companies, hotel chains, rental car companies)
- Personalization
- The real-time enterprise

The Knowledge Discovery Process
Steps:
1. Identify business problem
2. Data mining
3. Action
4. Evaluation and measurement
5. Deployment and integration into business processes

Data Mining Step in Detail
2.1 Data preprocessing
  - Data selection: identify target datasets and relevant fields
  - Data cleaning: remove noise and outliers
  - Data transformation: create common units, generate new fields
2.2 Data mining model construction
2.3 Model evaluation: present the results to the end user in an understandable form (e.g. visually)
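
The preprocessing sub-steps in 2.1 could be sketched roughly as follows. This is only an illustration: the record layout, the field names (`amount`, `unit`, `qty`, `store`), the median-based outlier rule and the `outlier_factor` threshold are all assumptions made for the example, not something the slides prescribe.

```python
from statistics import median

# Hypothetical raw records: mixed units, one missing value, one outlier.
raw = [
    {"cust": 1, "amount": 12.50, "unit": "USD", "qty": 5, "store": "A"},
    {"cust": 2, "amount": 990, "unit": "cents", "qty": 3, "store": "B"},
    {"cust": 3, "amount": 11.00, "unit": "USD", "qty": 2, "store": "A"},
    {"cust": 4, "amount": 9999.0, "unit": "USD", "qty": 1, "store": "C"},
    {"cust": 5, "amount": None, "unit": "USD", "qty": 4, "store": "B"},
]

def preprocess(records, outlier_factor=10.0):
    # Data cleaning (noise): drop records with a missing amount.
    rows = [r for r in records if r["amount"] is not None]
    # Data selection + transformation: keep only the relevant fields and
    # express every amount in a common unit (USD).
    in_usd = [
        {
            "cust": r["cust"],
            "qty": r["qty"],
            "amount_usd": r["amount"] / 100 if r["unit"] == "cents" else r["amount"],
        }
        for r in rows
    ]
    # Data cleaning (outliers): assumed rule, drop amounts far above the median.
    cutoff = outlier_factor * median(r["amount_usd"] for r in in_usd)
    kept = [r for r in in_usd if r["amount_usd"] <= cutoff]
    # Data transformation: generate a new field from existing ones.
    for r in kept:
        r["unit_price"] = r["amount_usd"] / r["qty"]
    return kept

clean = preprocess(raw)
```

Real projects usually choose the outlier rule and the derived fields from domain knowledge; the median cutoff here is just one simple option.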

Preprocessing and Mining
[Figure: Original Data -> (Data Integration and Selection) -> Target Data -> (Preprocessing) -> Preprocessed Data -> (Model Construction) -> Patterns -> (Interpretation) -> Knowledge]

What is a Data Mining Model?
A data mining model is a description of a specific aspect of a dataset. It produces output values for an assigned set of input values.
Examples:
- Linear regression model
- Classification model
- Clustering
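
As a small illustration of "produces output values for an assigned set of input values", here is a minimal sketch of the first example, a one-variable linear regression fitted by ordinary least squares. The data points and the function names `fit_line` and `predict` are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """The model maps an input value to an output value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data lying exactly on y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = predict(model, 5)
```

The fitted pair `(slope, intercept)` is the "description of a specific aspect of the dataset"; `predict` is how it turns inputs into outputs.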

Data Mining: Types of Data
- Relational data and transactional data
- Spatial and temporal data, spatio-temporal observations
- Time-series data
- Text
- Images, video
- Mixtures of data
- Sequence data
- Features from processing other data sources

Types of Variables
- Numerical: the domain is ordered and can be represented on the real line (e.g. age, income)
- Nominal or categorical: the domain is a finite set without any natural ordering (e.g. occupation, marital status, race)
- Ordinal: the domain is ordered, but the absolute differences between values are unknown (e.g. preference scale, severity of an injury)

Applications of Frequent Itemsets
- Market basket analysis
- Association rules
- Classification (especially: text)
- Seeds for construction of Bayesian networks
- Web log analysis
- Collaborative filtering

Frequent Itemset
- Itemset: a set of items purchased, e.g. {pen}, {pen, milk}, ...
- Support of an itemset: the fraction of transactions in the database that contain all the items in the itemset.
- Frequent itemset: an itemset whose support is higher than the user-specified minimum support.
- The Apriori property: every subset of a frequent itemset is also a frequent itemset.
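
The definitions above can be sketched in a few lines. Only transaction 111 is given in the slides; the other three baskets below are assumed, chosen so that the supports agree with the figures quoted in the deck. The threshold is tested with `>=`, which is one reasonable reading of "minimum support".

```python
from itertools import combinations

# Assumed example data: only TID 111 appears in the slides.
transactions = [
    {"pen", "ink", "milk", "juice"},  # TID 111
    {"pen", "ink", "milk"},           # illustrative
    {"pen", "milk"},                  # illustrative
    {"pen", "ink", "juice"},          # illustrative
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def is_frequent(itemset, transactions, min_support):
    return support(itemset, transactions) >= min_support

def subsets_are_frequent(itemset, transactions, min_support):
    """Check the Apriori property for one itemset: all non-empty proper
    subsets must be frequent as well."""
    return all(
        is_frequent(set(sub), transactions, min_support)
        for k in range(1, len(itemset))
        for sub in combinations(itemset, k)
    )

pen_milk = support({"pen", "milk"}, transactions)
```

Because adding an item to an itemset can only shrink the set of transactions that contain it, support never increases as itemsets grow, which is why the Apriori property holds.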

Frequent Itemset: Refined Algorithm
Minimum support: 70%
- Level 1: we find that the itemsets {pen}, {milk} and {ink} are frequent.
- Level 2: by the Apriori property, a candidate two-item itemset can only be built from frequent items: {pen, milk}, {pen, ink} and {ink, milk}. We find that {pen, milk} and {pen, ink} are frequent.
- Level 3: not required. Since {ink, milk} is not frequent, the itemset {pen, ink, milk} cannot be frequent either.
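
A minimal sketch of this level-wise search follows. The function name `apriori` and the transaction data are assumptions for the example (only TID 111 is given in the slides), and the support of each candidate is recomputed by a full scan, which a real implementation would avoid.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, found level by level."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        previous = set(level)
        # Candidates of size k+1 are unions of frequent k-itemsets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # ... kept only if every k-subset is frequent (Apriori property).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k))
        }
        level = [c for c in candidates if support(c) >= min_support]
        k += 1
    return frequent

# Assumed baskets; only the first one (TID 111) comes from the slides.
transactions = [
    {"pen", "ink", "milk", "juice"},
    {"pen", "ink", "milk"},
    {"pen", "milk"},
    {"pen", "ink", "juice"},
]
result = apriori(transactions, 0.7)
```

With these baskets the candidate {pen, ink, milk} is discarded by the subset check alone, without scanning the transactions, because {ink, milk} did not survive level 2.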

Iceberg Queries
We can apply the logic of the refined frequent itemset algorithm to iceberg queries. Consider the example:

    SELECT custID, Item, SUM(qty)
    FROM Purchase
    GROUP BY custID, Item
    HAVING SUM(qty) > 5

This query would perform better if we first looked only for the customers, or the items, that satisfy the criterion:

    SELECT custID, SUM(qty)
    FROM Purchase
    GROUP BY custID
    HAVING SUM(qty) > 5

or

    SELECT Item, SUM(qty)
    FROM Purchase
    GROUP BY Item
    HAVING SUM(qty) > 5
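
One way to see the idea in action is to run both forms against a small in-memory SQLite table. The rows are invented for the example, and the rewritten query (`PRUNED`) is just one possible way of using the two helper queries as filters. The pruning is only valid when `qty` is never negative, and whether it is actually faster depends on the data and the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (custID INTEGER, Item TEXT, qty INTEGER)")
# Hypothetical purchase rows (custID, Item, qty).
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?, ?)",
    [
        (1, "pen", 4), (1, "pen", 3), (1, "ink", 1),
        (2, "pen", 2), (2, "milk", 1),
        (3, "milk", 6), (3, "ink", 2),
    ],
)

# The original iceberg query.
ICEBERG = """
    SELECT custID, Item, SUM(qty) FROM Purchase
    GROUP BY custID, Item
    HAVING SUM(qty) > 5
    ORDER BY custID, Item
"""

# Same result, but groups are formed only from customers and items that
# pass the threshold on their own (analogous to Apriori pruning).
PRUNED = """
    SELECT custID, Item, SUM(qty) FROM Purchase
    WHERE custID IN (SELECT custID FROM Purchase
                     GROUP BY custID HAVING SUM(qty) > 5)
      AND Item IN (SELECT Item FROM Purchase
                   GROUP BY Item HAVING SUM(qty) > 5)
    GROUP BY custID, Item
    HAVING SUM(qty) > 5
    ORDER BY custID, Item
"""

iceberg_rows = conn.execute(ICEBERG).fetchall()
pruned_rows = conn.execute(PRUNED).fetchall()
```

A (customer, item) group can only exceed the threshold if the customer's total and the item's total both exceed it, so filtering on the two subqueries never removes a qualifying group.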

Association Analysis
Consider a shopping cart filled with several items. Market basket analysis tries to answer the following questions:
- Who makes purchases?
- What do customers buy together?
- In what order do customers purchase items?
- When do customers purchase the most, and what?

Market Basket Analysis
Given:
- A database of customer transactions
- Each transaction is a set of items
Example: the transaction with TID 111 contains the items {Pen, Ink, Milk, Juice}.

Market Basket Analysis (Contd.)
- Co-occurrences: 80% of all customers purchase items X, Y and Z together.
- Association rules: 60% of all customers who purchase X and Y also buy Z.
- Sequential patterns: 60% of customers who first buy X also purchase Y within three weeks.

Confidence and Support
We prune the set of all possible association rules using two interestingness measures:
- Support of a rule: X => Y has support s if P(XY) = s. It is the percentage of transactions that contain all the items in X and Y.
- Confidence of a rule: X => Y has confidence c if sup(LHS U RHS) / sup(LHS) = P(Y | X) = c. It is the percentage of the transactions containing all items in X that also contain all items in Y.
We can also define the support of an itemset (a co-occurrence) XY: XY has support s if P(XY) = s.
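
The two measures can be computed directly from their definitions. In this sketch the function name `rule_metrics` is made up, and the baskets are assumed apart from TID 111, which is the only transaction listed in the slides.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence) of the rule lhs => rhs."""
    n = len(transactions)
    both = sum(1 for t in transactions if (lhs | rhs) <= t)
    lhs_only = sum(1 for t in transactions if lhs <= t)
    support = both / n
    # Confidence is undefined when no transaction contains the LHS;
    # returning 0.0 in that case is a choice made for this sketch.
    confidence = both / lhs_only if lhs_only else 0.0
    return support, confidence

# Assumed baskets; only the first one (TID 111) comes from the slides.
transactions = [
    {"pen", "ink", "milk", "juice"},
    {"pen", "ink", "milk"},
    {"pen", "milk"},
    {"pen", "ink", "juice"},
]

ink_pen = rule_metrics(transactions, {"ink"}, {"pen"})
pen_ink = rule_metrics(transactions, {"pen"}, {"ink"})
```

For these assumed baskets, {ink} => {pen} has support 0.75 and confidence 1.0, while the reversed rule {pen} => {ink} has the same support but a confidence of only 0.75: support is symmetric in the two sides, confidence is not.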

Example
- {Pen} => {Milk}: support 75%, confidence 75%
- {Ink} => {Pen}: support 75%, confidence 100%

Example
Find all itemsets with support >= 75%.

Example
Can you find all association rules with support >= 50%?

Market Basket Analysis: Applications
Sample applications:
- Direct marketing
- Fraud detection for medical insurance
- Floor/shelf planning
- Web site layout
- Cross-selling

Association Rules and ISA Hierarchies
An ISA hierarchy, or category hierarchy, can be imposed on the set of items: for example, Pen and Ink belong to Stationery, while Juice and Milk belong to Beverages. Applying association rules over such a hierarchy allows us to detect relationships between items at different levels of the hierarchy.
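
A common way to mine across hierarchy levels is to add each item's category to the transaction before counting support. The sketch below assumes that approach; the `PARENT` mapping follows the Stationery/Beverages example, and the baskets other than TID 111 are invented.

```python
# Item -> category, following the example in the text.
PARENT = {
    "pen": "stationery",
    "ink": "stationery",
    "milk": "beverages",
    "juice": "beverages",
}

def generalize(transaction):
    """Extend a transaction with the categories of its items."""
    return set(transaction) | {PARENT[i] for i in transaction if i in PARENT}

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Assumed baskets; only the first one (TID 111) comes from the slides.
transactions = [
    {"pen", "ink", "milk", "juice"},
    {"pen", "ink", "milk"},
    {"pen", "milk"},
    {"pen", "ink", "juice"},
]
extended = [generalize(t) for t in transactions]

ink_with_any_beverage = support({"ink", "beverages"}, extended)
ink_with_milk = support({"ink", "milk"}, extended)
```

With these baskets, ink co-occurs with some beverage in 75% of transactions, although it co-occurs with milk, and likewise with juice, in only 50%. The relationship becomes visible only at the category level.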

Generalised Association Rules
"On a day when a pen is purchased, it is likely that milk is also purchased."
If we group the purchases by the date field, we can consider a more general problem called calendric market basket analysis, in which the analysis is restricted to the dates of a user-specified calendar, e.g. every Thursday, the first Sunday of every month, or the first Monday of every semester.
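
Grouping by date and then restricting to a calendar might look as follows. The purchase records, the helper names and the choice of "every Thursday" as the calendar are assumptions for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (purchase date, item) records.
purchases = [
    (date(2024, 1, 4), "pen"), (date(2024, 1, 4), "milk"),    # Thursday
    (date(2024, 1, 5), "pen"),                                # Friday
    (date(2024, 1, 8), "pen"), (date(2024, 1, 8), "ink"),     # Monday
    (date(2024, 1, 11), "pen"), (date(2024, 1, 11), "milk"),  # Thursday
    (date(2024, 1, 11), "juice"),
]

def baskets_by_date(purchases):
    """Group purchases so that each date forms one 'transaction'."""
    groups = defaultdict(set)
    for day, item in purchases:
        groups[day].add(item)
    return dict(groups)

def confidence(baskets, lhs, rhs):
    with_lhs = [b for b in baskets if lhs <= b]
    if not with_lhs:
        return 0.0
    return sum(1 for b in with_lhs if rhs <= b) / len(with_lhs)

def is_thursday(day):
    return day.weekday() == 3  # Monday is 0

daily = baskets_by_date(purchases)
all_days = list(daily.values())
thursdays = [basket for day, basket in daily.items() if is_thursday(day)]

conf_all = confidence(all_days, {"pen"}, {"milk"})
conf_thu = confidence(thursdays, {"pen"}, {"milk"})
```

In this made-up data the rule {pen} => {milk} holds on only half of all days but on every Thursday, which is the kind of pattern a calendar restriction is meant to expose.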

The Use of Association Rules for Prediction
Association rules are widely used for prediction; however, such predictive use is not justified without additional analysis and domain knowledge.