1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Introduction to Data Mining
Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business
2 Outline
Introduction
–Why data mining?
–What is data mining?
–Data mining process
Types of Data Mining Tasks
Main Data Mining Tools
Reading – T2, Ch.1
3 Why Business Intelligence Systems?
Knowledge Management Problems (drowning in data, starving for knowledge)
1. Can’t access data (easily). E.g., data from different branches, years, functional areas, etc.
2. Give me only what’s important (knowledge). E.g., which products do customers tend to buy together?
3. I need to reduce data to what’s important by slicing and dicing. E.g., by branch, product, year, etc.
4 Why Business Intelligence Systems?
4. Data inconsistency and poor data quality. E.g., the 2001 PC sales amounts for SLC reported by the CFO and by the SLC Account Manager do not match.
5. Need to improve the practice of making informed decisions. E.g., did the VP for Marketing decide on the advertising budgets for branches in the SW region based on their sales performance over the last five years?
6. Querying the database is hard and slow. E.g., the VP for Marketing, the CFO and the Account Manager had to wait for the MIS Department to generate sales performance reports and analyses.
5 Why Business Intelligence Systems?
ROI Problems
7. Can I get more value out of my data? Ans: Make informed, potent decisions using knowledge extracted from integrated and consistent data over a long period of time.
8. Can I do this cost-effectively?
9. Can I easily scale up or change how I get knowledge out of my data? Options: manually versus automatically identifying knowledge.
6 Why data mining?
OLAP can only provide shallow data analysis: it answers "what" questions.
–Ex: sales distribution by product
7 Why data mining?
Shallow data analysis is not sufficient to support business decisions, which need answers to "how" questions.
–Ex: how to boost sales of other products
–Ex: when people buy product 6, what other products are they likely to buy? (cross-selling)
8 Why data mining?
OLAP can only do shallow data analysis
–OLAP is based on SQL
    SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
    FROM DBSR.PRODUCTS PRODUCTS, DBSR.SALESFACTS SALESFACTS
    WHERE PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY
    GROUP BY PRODUCTS.PNAME;
–SQL is designed for retrieving and summarizing data, so complicated algorithms are impractical to implement in SQL alone. Such algorithms need to be developed separately to support deep data analysis: data mining.
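To see what this kind of OLAP-style summary returns, the query can be run against a small in-memory database. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows are invented for illustration, the DBSR schema prefix is dropped, and the helper name sales_by_product is hypothetical.

```python
import sqlite3

# Hypothetical toy data; table and column names follow the query on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTS (PRODUCT_KEY INTEGER PRIMARY KEY, PNAME TEXT);
    CREATE TABLE SALESFACTS (PRODUCT_KEY INTEGER, SALES_AMT REAL);
    INSERT INTO PRODUCTS VALUES (1, 'Diapers'), (2, 'Beer');
    INSERT INTO SALESFACTS VALUES (1, 10.0), (1, 15.0), (2, 8.0);
""")

def sales_by_product(connection):
    """Return {product name: total sales amount}, i.e. a 'what' summary."""
    rows = connection.execute("""
        SELECT p.PNAME, SUM(s.SALES_AMT)
        FROM PRODUCTS p
        JOIN SALESFACTS s ON p.PRODUCT_KEY = s.PRODUCT_KEY
        GROUP BY p.PNAME
    """).fetchall()
    return dict(rows)

totals = sales_by_product(conn)
print(totals)
```

The result is one total per product. It describes what was sold, but says nothing about which products sell together, which is the gap the following slides address.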
9 Why Data Mining? Walmart (!?) Diaper + Beer = ?$$$
10 Market Basket (Association Rule) Analysis
A market basket is a collection of items purchased by a customer in an individual customer transaction, which is a well-defined business activity.
Ex:
–a customer’s visit to a grocery store
–an online purchase from a virtual store such as ‘Amazon.com’
11 Market Basket (Association Rule) Analysis
Market basket analysis is a common analysis run against a transaction database to find sets of items, or itemsets, that appear together in many transactions. Each pattern extracted through the analysis consists of an itemset and the number of transactions that contain it.
Applications: improve
–the placement of items in a store
–the layout of mail-order catalog pages
–the layout of Web pages
–others?
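The definition above (an itemset plus the number of transactions that contain it) can be sketched in a few lines of Python. This is a brute-force illustration rather than a production algorithm such as Apriori; the function names, the sample baskets and the min_count threshold are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each itemset has one canonical form.
        for itemset in combinations(sorted(set(basket)), size):
            counts[itemset] += 1
    return counts

def frequent_itemsets(transactions, size, min_count):
    """Keep only itemsets that appear in at least min_count transactions."""
    counts = count_itemsets(transactions, size)
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

# Hypothetical transactions, one list of items per customer visit.
baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "bread"],
]
pairs = frequent_itemsets(baskets, size=2, min_count=2)
print(pairs)
```

Enumerating every combination grows quickly with basket size, so this approach only suits small examples; it is meant to make the itemset-and-count pattern concrete.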
12 Degenerate Key: ORDER_NO
A degenerate key provides additional grouping of fact records.
It is impractical to view market baskets using OLAP tools.
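As a rough illustration of how a degenerate key groups fact records, the sketch below collects the products that share an ORDER_NO into one market basket. The row layout (order number, product name) and the sample values are assumptions; a real fact table would carry more columns.

```python
from collections import defaultdict

def baskets_by_order(fact_rows):
    """Group fact rows into market baskets keyed by the degenerate key ORDER_NO."""
    baskets = defaultdict(set)
    for order_no, product in fact_rows:
        baskets[order_no].add(product)
    return dict(baskets)

# Hypothetical fact rows: (ORDER_NO, product name).
rows = [
    (1001, "diapers"),
    (1001, "beer"),
    (1002, "milk"),
    (1001, "milk"),
    (1002, "bread"),
]
order_baskets = baskets_by_order(rows)
print(order_baskets)
```

Each resulting set is one basket, which is the input that market basket analysis expects.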
13 Why data mining?
OLAP results generated from data sets with a large number of attributes are difficult to interpret.
–Ex: cluster customers of my company for target marketing
–Pick two attributes related to a customer: income level and sales amount
14 Why data mining?
–Ex: cluster customers of my company for target marketing
–Pick three attributes related to a customer: income level, education level and sales amount
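One common way to cluster customers on several attributes is k-means. The slides do not name a specific algorithm, so the choice of k-means here is an assumption, as are the sample customers, the starting centroids, and the premise that all three attributes were already scaled to a comparable 1 to 10 range.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move each centroid."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids

# Hypothetical customers: (income level, education level, sales amount).
customers = [(1, 1, 2), (2, 1, 1), (1, 2, 1), (8, 9, 9), (9, 8, 8), (9, 9, 8)]
labels, centers = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
```

With three or more attributes the clusters can no longer be read off a simple chart, which is why an algorithm is needed rather than visual inspection of OLAP output.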
15 What is data mining?
Data mining is a process for extracting hidden and interesting patterns from data.
Data mining is a step in the process of Knowledge Discovery in Databases (KDD).
16 What is NOT Data Mining?
Not the SQL language
–SQL: extraction of detailed data
Not OLAP
–OLAP: summary, trends, forecasts
Not magic
–Data mining: based on algorithms that can discover hidden patterns. It is interactive, not fully automated.
17 Major data mining tasks
Association rule mining – e.g., to cross-sell, identify other items that a customer tends to buy if the customer has already purchased item A
Clustering – e.g., for target marketing, identify clusters of similar customers
Classification – e.g., for fraud detection, identify which customer or transaction is fraudulent
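The cross-selling example for association rule mining can be made concrete with the two standard measures for a rule "A → B": support (the share of all transactions that contain both items) and confidence (the share of transactions containing A that also contain B). These measures are not named on the slides; the sketch below, with its hypothetical rule_stats helper and invented baskets, only illustrates the idea.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    with_a = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_a if consequent in b]
    support = len(with_both) / len(baskets) if baskets else 0.0
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence

# Hypothetical transactions, one list of items per customer visit.
transactions = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "bread"],
]
diapers_beer = rule_stats(transactions, "diapers", "beer")
milk_bread = rule_stats(transactions, "milk", "bread")
print(diapers_beer, milk_bread)
```

A rule with high confidence suggests an item worth recommending to customers who have already bought item A.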
18 Steps of the KDD Process
[Figure: the KDD process as a chain of steps and intermediate results]
Data → Step 1: Selection → Target Data → Step 2: Cleaning → Preprocessed Data → Step 3: Transformation → Transformed Data → Step 4: Data Mining → Patterns → Step 5: Interpretation & Evaluation → Knowledge
19 Steps of the KDD Process
Step 1: select the columns (attributes) and rows (records) of interest to be mined.
Step 2: clean errors from the selected data.
Step 3: transform the data into a form suitable for high-performance data mining.
Step 4: mine the data.
Step 5: filter out uninteresting patterns from the data mining results.
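The five steps can be sketched as a chain of small functions, each consuming the previous step's output. Everything specific here is an assumption made for illustration: the record fields, the rule that cleaning drops records with missing values, the 100 threshold used to discretize sales, and the use of simple counting as the "mining" step.

```python
from collections import Counter

def select(records, columns, keep_row):
    """Step 1: keep only the rows and columns of interest."""
    return [{c: r.get(c) for c in columns} for r in records if keep_row(r)]

def clean(records):
    """Step 2: drop records with missing values (one simple cleaning rule)."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Step 3: discretize sales into 'high'/'low' so patterns can be counted."""
    return [(r["branch"], "high" if r["sales"] >= 100 else "low") for r in records]

def mine(pairs):
    """Step 4: count how often each (branch, sales band) pattern occurs."""
    return Counter(pairs)

def evaluate(patterns, min_count):
    """Step 5: filter out patterns that occur too rarely to be interesting."""
    return {p: n for p, n in patterns.items() if n >= min_count}

# Hypothetical raw sales records.
raw = [
    {"branch": "SLC", "sales": 120, "year": 2001},
    {"branch": "SLC", "sales": 150, "year": 2001},
    {"branch": "SLC", "sales": None, "year": 2001},
    {"branch": "PHX", "sales": 40, "year": 2001},
    {"branch": "PHX", "sales": 300, "year": 1999},
]
target = select(raw, ["branch", "sales"], lambda r: r["year"] == 2001)
knowledge = evaluate(mine(transform(clean(target))), min_count=2)
print(knowledge)
```

In practice the process is iterative: results from Step 5 often send the analyst back to reselect or re-transform the data.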
20 Data mining – on what kind of data?
Transactional database
Data warehouse
Flat file
Web data
–Web content
–Web structure
–Web log
21 [Figure: combined data warehousing (DW) and data mining (DM) process]
DW steps: Step 1: Acquisition; Step 2: Selection; Step 3: Cleaning & preprocessing; Step 4: Transformation
DW data: Raw data; Target data for DW; Preprocessed data for DW; Transformed data for DW; Data warehouse
DM steps: Step 1: Selection; Step 2: Cleaning & preprocessing; Step 3: Transformation; Step 4: Data mining; Step 5: Interpretation & evaluation
DM data: Target data for DM; Preprocessed data for DM; Transformed data for DM; Patterns; Discovered knowledge
Other labels: OLAP & reporting; Domain expert
22 Data Mining Tools
Over 100 commercial data mining tools are available, and new entries keep arriving.
Tools offer a variety of functionality and features, making evaluation and comparison difficult.
23 Evaluation Criteria
[Figure: data mining tool architecture, annotated with the evaluation criteria]
Components: Database or Flat files; Data Mover (Data Access); Data Mining Engine; Tool Manager (Often GUI); Visualization Tools; End Users; Server Side; Client Side
Criteria:
1. System Requirements
2. Data Access
3. Mining Performance
4. User Interface
5. Visualization
24 Data Mining Tools: Market Leaders Class choice
25 Web Analytics Software Providers
http://surfaid.dfw.ibm.com/web/home/index.html
http://pro.blogger.com/
http://www.clickstream.com/
http://www.deepmetrix.com/index.asp?source=google&keyword=web+analytics
http://www.eloqua.com/srch/analytics.asp
http://www.intellitracker.com/
http://www.maxamine.com/
http://www.mediahouse.com/
http://www.netiq.com/webtrends/default.asp
http://www.omniture.com/products.html
http://www.sitebrand.com/?source=jan
http://www.statsoftinc.com/
http://www.urchin.com/
http://www.webabacus.com/
http://www.websidestory.com/
http://www.databeacon.com/index_IE.html
http://www.sane.com/ads/whoiscoming.html