Data Mining: A Brief Overview
Copyright © Curt Hill 2003-2013

The Problem
Huge volumes of data overwhelm traditional methods of data analysis such as:
  – Spreadsheets
  – Ad hoc queries
  – Multidimensional analysis tools
  – Statistical analysis packages

What is Data Mining?
Exploratory data analysis based on a data warehouse
  – Knowledge Discovery in Databases (KDD)
Data mining extracts previously unknown and potentially useful information
  – Rules, constraints, correlations, patterns, signatures and irregularities
The goal is to automate the methods for finding these in the data

Data Warehouse
A database usually separated from the operational database(s)
Used as a base for decision support systems
  – Upper and middle management
  – Not used for day-to-day management but for spotting trends and making decisions about the organization's direction
Typically very large and composed of recent copies from the operational database(s)
Data mining is one of the applications that could use it

Goals of Data Mining
Prediction of future behaviors
  – Seasonal or non-seasonal trends
  – How will consumers respond to discounts?
  – Allows the enterprise to be ready
Identification of an item, event or activity
  – Intruders may be identified by the files they access or the programs they use

Goals Again
Classification of categories of users or products
  – Shoppers may be categorized as discount seeking, in a rush, regular, or attached to certain brand names
  – The store may be made friendlier to each such group
Optimization of the use of time, space, materials and money

Knowledge Discovery
There are several types of discoverable knowledge
  – Association rules
  – Classification hierarchies
  – Sequential patterns
  – Time series patterns
  – Clustering
Each of these deserves a closer look

Association Rules
We are looking for associations that are not obvious
This has gained traction in market basket research
  – Very profitable information
If an MRI has characteristics a and b, then it often has c
  – This is an association rule

Market Basket Model
Premise: the items in a checkout transaction are not random
Thus we analyze customer transactions for patterns or association rules
These patterns may guide decisions on
  – Sale items
  – Shelf arrangement or product placement
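A minimal sketch of the first step in analyzing transactions for patterns: counting how often each pair of items appears on the same ticket. The transactions and item names are hypothetical, and counting pairs is only one simple way to start looking for association rules.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout transactions: each set is the items on one ticket.
transactions = [
    {"diapers", "magazine", "milk"},
    {"diapers", "magazine"},
    {"bread", "milk"},
    {"diapers", "milk", "magazine"},
    {"bread", "eggs"},
]

def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, so (a, b) and (b, a) match
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

if __name__ == "__main__":
    for pair, n in count_pairs(transactions).most_common(3):
        print(pair, n)
```

Pairs that co-occur far more often than others, such as diapers and magazine here, are the candidates worth turning into rules.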

Retail Example
A young father goes to the store to buy disposable diapers
On his way through the store he sees a Sports Illustrated and buys it
In general, people do not impulse-buy disposable diapers, but while buying these, they may buy something else on impulse
Can we examine retail transaction records and perceive the connection?

Association Rule
Is of the form X => Y
  – Where both X and Y could be sets of items
The support of this rule is the percentage of all transactions that contain both X and Y
The confidence of this rule is the number of transactions that contain both X and Y divided by the number of transactions that contain X
High support and high confidence indicate a rule on which business decisions may be based
  – Put the magazine rack on the route to the diapers
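The support and confidence definitions can be computed directly. This is a minimal sketch over a small hypothetical set of transactions, with X and Y represented as Python sets.

```python
def support(transactions, x, y):
    """Fraction of all transactions that contain every item of X and of Y."""
    both = x | y
    hits = sum(1 for t in transactions if both <= t)
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Of the transactions that contain X, the fraction that also contain Y."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y <= t) / len(with_x)

# Hypothetical tickets
transactions = [
    {"diapers", "magazine", "milk"},
    {"diapers", "magazine"},
    {"diapers", "bread"},
    {"bread", "milk"},
]

if __name__ == "__main__":
    x, y = {"diapers"}, {"magazine"}
    print(f"support {support(transactions, x, y):.0%}, "
          f"confidence {confidence(transactions, x, y):.0%}")
    # prints: support 50%, confidence 67%
```

Here two of four tickets contain both diapers and a magazine (support 50%), and two of the three diaper tickets also contain a magazine (confidence about 67%).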

Agriculture Example
Landsat satellites are in polar orbits
They record data on all land every 18 days
A pixel is approximately 31 yards on a side
Seven bands, from visible light to infrared, are recorded for each pixel
Each band produces a 1-byte value
Can you get this data into a spreadsheet?

Agricultural Rule
In midsummer, a near-infrared value in the range 48 to 255 and a red value in the range 0 to 31 suggest that the yield will be 128 to 255 bushels per acre
If the support and confidence are high, this suggests that the farmer should apply nitrogen to the areas where near infrared was 47 or less and red was 32 or more
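A minimal sketch of applying the agricultural rule to individual pixels. The `Pixel` record and the sample values are hypothetical, and the nitrogen test reads the thresholds as the complements of the rule's ranges (near infrared 47 or below, red 32 or above).

```python
from dataclasses import dataclass

@dataclass
class Pixel:
    row: int
    col: int
    nir: int  # near-infrared band value, 0..255 (one byte)
    red: int  # red band value, 0..255 (one byte)

def predicts_high_yield(p):
    """Rule from the slide: NIR in 48..255 and red in 0..31."""
    return 48 <= p.nir <= 255 and 0 <= p.red <= 31

def needs_nitrogen(p):
    """Both bands outside the rule's ranges: NIR at most 47 and red at least 32."""
    return p.nir <= 47 and p.red >= 32

# Hypothetical midsummer readings for a 2 x 2 patch of a field
pixels = [Pixel(0, 0, 120, 20), Pixel(0, 1, 30, 60),
          Pixel(1, 0, 40, 25), Pixel(1, 1, 200, 45)]

if __name__ == "__main__":
    print("high yield:", [(p.row, p.col) for p in pixels if predicts_high_yield(p)])
    print("apply nitrogen:", [(p.row, p.col) for p in pixels if needs_nitrogen(p)])
```

Because both conditions are joined by "and", a pixel that fails only one of them (such as (1, 0) or (1, 1) above) is neither predicted to yield well nor flagged for nitrogen.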

Computational Difficulties
Consider how many tickets a supermarket or department store might generate
In general, most of these tickets have more than two or three items
The store carries thousands of items
Discovering these association rules becomes computationally taxing
One good reason to keep this off the operational database
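Some quick arithmetic shows why. Assuming a hypothetical catalog of 5,000 items, this sketch counts the distinct item pairs and triples that could each be a candidate itemset; the number of all possible itemsets is 2 raised to the number of items.

```python
from math import comb

def candidate_counts(num_items):
    """Distinct 2-item and 3-item sets that can be formed from a catalog."""
    return comb(num_items, 2), comb(num_items, 3)

if __name__ == "__main__":
    pairs, triples = candidate_counts(5_000)  # hypothetical catalog size
    print(f"{pairs:,} pairs, {triples:,} triples")
    # prints: 12,497,500 pairs, 20,820,835,000 triples
```

Each candidate's support must be checked against every ticket, so even pairs alone mean billions of basket checks for a busy store.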

Algorithm Properties
There are a number of algorithms for finding these rules
These typically exploit two properties:
Downward closure
  – Every subset of a large itemset (one that meets the minimum support) must also have large support
  – Removing a few items does not hurt
Anti-monotonicity
  – Every superset of a small itemset must also have small support
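A minimal sketch in the style of the Apriori algorithm, one well-known algorithm that uses these properties. It grows itemsets one item at a time and discards any candidate that has an infrequent subset before counting it. The transactions and the 50% minimum support are hypothetical, and the code favors clarity over speed.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (anti-monotonicity): drop candidates with any infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result

# Hypothetical tickets
transactions = [
    {"diapers", "magazine", "milk"},
    {"diapers", "magazine"},
    {"diapers", "milk"},
    {"bread", "milk"},
]

if __name__ == "__main__":
    for itemset, s in sorted(frequent_itemsets(transactions, 0.5).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), s)
```

Here {diapers, magazine, milk} is never counted: its subset {magazine, milk} appears on only one ticket, so the prune step removes it first.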

Classification
Classifying data into predetermined groups
Then we can deal with the groups in different ways
AKA supervised learning
  – Developed in the field of artificial intelligence
Clustering, by contrast, attempts to classify data into groups that are not predetermined

Models
The two typical models are decision trees and sets of rules
We look at the data to build the model and then use the model on new data
The next slide shows a decision tree for granting a credit card to an applicant

Example: Decision Tree
[Figure: credit-card decision tree. Root: Married? Yes → Salary (branches from <25K to >75K; leaves Good, Fair, Poor). No → Balance: <5K or >5K; if >5K → Age: <25 Fair, >25 Good.]
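A minimal sketch of using a decision tree on a new applicant. Only part of the tree survives from the figure: Married splits into Salary and Balance, and a balance over 5K leads to Age (<25 Fair, >25 Good). The salary leaves, the middle salary band and the leaf for a balance under 5K are hypothetical placeholders, as is treating an age of exactly 25 as Good. This shows applying a tree, not learning one from data.

```python
def classify(applicant):
    """Walk a hand-built credit-risk tree for one applicant (a dict)."""
    if applicant["married"]:
        salary = applicant["salary"]
        if salary < 25_000:
            return "Poor"      # hypothetical leaf
        if salary <= 75_000:
            return "Fair"      # hypothetical middle band and leaf
        return "Good"          # hypothetical leaf
    if applicant["balance"] < 5_000:
        return "Poor"          # hypothetical leaf
    # From the slide: balance >5K, then age <25 is Fair, >25 is Good
    # (age exactly 25 is an assumption: treated as Good)
    return "Fair" if applicant["age"] < 25 else "Good"

if __name__ == "__main__":
    print(classify({"married": False, "balance": 8_000, "age": 22}))  # Fair
```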

Clustering
AKA unsupervised learning
Classify the data into groups that you are not aware of to begin with
A distance function must be supplied that describes the distance between two points
  – The points are often not purely numeric
  – They often have more than 2 or even 3 dimensions, which makes things interesting
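A minimal sketch of clustering with a supplied distance function. It uses single-linkage clustering with a cutoff, one simple method among many: any two points closer than the threshold end up in the same group. The shopper records, the mixed numeric and categorical distance, and the threshold are all hypothetical.

```python
def distance(a, b):
    """Hypothetical distance between shoppers: (age, visits per month, favorite brand)."""
    age_a, visits_a, brand_a = a
    age_b, visits_b, brand_b = b
    numeric = abs(age_a - age_b) / 10 + abs(visits_a - visits_b) / 4
    categorical = 0 if brand_a == brand_b else 1  # penalty for different brands
    return numeric + categorical

def cluster(points, dist, threshold):
    """Single-linkage clustering: link every pair closer than threshold."""
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())

shoppers = [(25, 12, "A"), (27, 10, "A"), (60, 2, "B"), (62, 3, "B")]

if __name__ == "__main__":
    for group in cluster(shoppers, distance, threshold=1.0):
        print(group)
```

The groups are not named in advance; they emerge from the distance function, which is why choosing that function well matters so much.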

Applications
Marketing
  – Determine advertising, store placement, segmentation of customers
Finance
  – Analysis of the performance of securities
Manufacturing
  – Optimizing resources, designing the manufacturing process
Health care
  – Discovery of items in X-ray and MRI images

Example
Certain diseases switch on genes characteristic of that disease
Drugs often switch off a gene
In 2011, a database of genes and what affected them was mined
The result was that mice with small cell lung cancer were treated with an antidepressant, imipramine
  – The tumors were reduced

Telco Example
A local telephone company mines its connection data for possible marketing opportunities
A phone that is very busy from 3 PM to 6 PM suggests a teenager
  – Pitch a teen phone
A phone that is busy from 9 AM to 5 PM suggests a home business
  – Pitch a business line
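The telco observations can be written as a small set of rules. This is a minimal sketch that assumes each call is recorded as the hour (0 to 23) it started, and that "busy" means at least half of a line's calls fall in the window; both assumptions, the `share` parameter and the function name are hypothetical.

```python
def pitch_for(call_hours, share=0.5):
    """Suggest a product from the hours (0-23) at which a line placed calls.

    Assumption: a window counts as busy when at least `share` of calls fall in it.
    """
    if not call_hours:
        return None
    total = len(call_hours)
    after_school = sum(1 for h in call_hours if 15 <= h < 18)  # 3 PM to 6 PM
    office = sum(1 for h in call_hours if 9 <= h < 17)         # 9 AM to 5 PM
    if after_school / total >= share:
        return "teen phone"
    if office / total >= share:
        return "business line"
    return None

if __name__ == "__main__":
    print(pitch_for([15, 16, 17, 19]))      # teen phone
    print(pitch_for([9, 10, 11, 14, 20]))   # business line
```

The two windows overlap between 3 PM and 5 PM, so the order of the checks decides which pitch wins for lines busy only in that overlap.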

Social Media
Publicly viewable social media presents a very large quantity of data
However, it is:
  – Noisy
  – Unstructured
  – Dynamic
It is of great interest in political campaigns, marketing and health care
  – This is where people express things first

Finally
Much of the analysis done in data mining has been done for centuries
  – What is different now is the amount and types of captured data
There are a number of commercial tools for mining
Many large companies have substantial investment in, and return on, their mining activities