1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.

2 1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business

3 2 Outline Introduction –Why data mining? –What is data mining? –Data mining process Types of Data Mining Tasks Main Data Mining Tools Reading – T2, Ch.1

4 3 Why Business Intelligence Systems? Knowledge Management Problems (drowning in data, starving for knowledge) 1. Can’t access data (easily). E.g., data from different branches, years, functional areas, etc. 2. Give me only what’s important (knowledge). E.g., which products do customers tend to buy together? 3. I need to reduce data to what’s important by slicing and dicing. E.g., by branch, product, year, etc.

5 4 Why Business Intelligence Systems? 4. Data inconsistency and poor data quality. E.g., the 2001 PC sales amounts for SLC reported by the CFO and by the SLC Account Manager do not match. 5. Need to improve the practice of making informed decisions. E.g., did the VP for Marketing set the advertising budgets for branches in the SW region based on their sales performance over the last five years? 6. Querying the database is hard and slow. E.g., the VP for Marketing, CFO and Account Manager had to wait for the MIS Department to generate sales performance reports and analyses.

6 5 Why Business Intelligence Systems? ROI Problems 7. Can I get more value out of my data? Answer: make informed, potent decisions using knowledge extracted from integrated and consistent data over a long period of time. 8. Can I do this cost-effectively? 9. Can I easily scale up or change how I get knowledge out of my data? Options: identifying knowledge manually versus automatically.

7 6 Why data mining? OLAP can only provide shallow data analysis -- what –Ex: sales distribution by product

8 7 Why data mining? Shallow data analysis is not sufficient to support business decisions -- how –Ex: how to boost sales of other products –Ex: when people buy product 6, what other products are they likely to buy? – cross-selling

9 8 Why data mining? OLAP can only do shallow data analysis –OLAP is based on SQL: SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT) FROM DBSR.PRODUCTS PRODUCTS, DBSR.SALESFACTS SALESFACTS WHERE ( ( PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY ) ) GROUP BY PRODUCTS.PNAME; –Because of the nature of SQL, complicated algorithms cannot readily be implemented in it. Such algorithms need to be developed separately to support deep data analysis – data mining
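To make the "shallow analysis" point concrete, here is a minimal sketch that runs the slide's query against a tiny in-memory SQLite database from Python. The table and column names come from the slide; the `DBSR` schema prefix is dropped, the sample rows are invented for illustration, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the deck).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTS (PRODUCT_KEY INTEGER PRIMARY KEY, PNAME TEXT);
    CREATE TABLE SALESFACTS (PRODUCT_KEY INTEGER, SALES_AMT REAL);
    INSERT INTO PRODUCTS VALUES (1, 'Diapers'), (2, 'Beer');
    INSERT INTO SALESFACTS VALUES (1, 10.0), (1, 15.0), (2, 8.0);
""")

# Same join and aggregation as the slide's query: total sales per product.
query = """
    SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
    FROM PRODUCTS, SALESFACTS
    WHERE PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY
    GROUP BY PRODUCTS.PNAME
    ORDER BY PRODUCTS.PNAME;
"""
sales_by_product = conn.execute(query).fetchall()
print(sales_by_product)  # [('Beer', 8.0), ('Diapers', 25.0)]
```

The result answers "what" (sales distribution by product) but says nothing about which products sell together or why.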

10 9 Why Data Mining? Walmart (!?) Diaper + Beer = ?$$$

11 10 Market Basket (Association Rule) Analysis A market basket is a collection of items purchased by a customer in an individual customer transaction, which is a well-defined business activity. Ex: a customer’s visit to a grocery store; an online purchase from a virtual store such as ‘Amazon.com’

12 11 Market Basket (Association Rule) Analysis Market basket analysis is a common analysis run against a transaction database to find sets of items, or itemsets, that appear together in many transactions. Each pattern extracted through the analysis consists of an itemset and the number of transactions that contain it. Applications: improve the placement of items in a store; the layout of mail-order catalog pages; the layout of Web pages; others?
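The definition above (an itemset plus the number of transactions that contain it) can be sketched in a few lines of Python. This is a brute-force count over all pairs, not an optimized algorithm such as Apriori; the baskets, the function names and the `min_count` threshold are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, size=2):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix the item order
        for itemset in combinations(items, size):
            counts[itemset] += 1
    return counts

def frequent_itemsets(transactions, size=2, min_count=2):
    """Keep only itemsets that appear in at least min_count transactions."""
    counts = count_itemsets(transactions, size)
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

# Hypothetical market baskets, one list per customer transaction.
baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "bread"],
]
pairs = frequent_itemsets(baskets, size=2, min_count=2)
print(pairs)  # {('beer', 'diapers'): 3}
```

Enumerating every combination grows quickly with basket size, which is one reason dedicated mining algorithms exist.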

13 12 Degenerate Key: ORDER_NO A degenerate key provides an additional grouping of fact records. It is impractical to view market baskets using OLAP tools.

14 13 Why data mining? OLAP results generated from data sets with a large number of attributes are difficult to interpret –Ex: cluster customers of my company --- target marketing –Pick two attributes related to a customer: income level and sales amount

15 14 Why data mining? –Ex: cluster customers of my company --- target marketing –Pick three attributes related to a customer: income level, education level and sales amount
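As a rough illustration of the clustering examples above, the sketch below groups customers with a plain k-means loop. The slides do not name a clustering algorithm, so k-means is an assumption here, as are the customer values (income level and sales amount, in arbitrary units) and the naive choice of the first `k` points as starting centroids.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: returns (cluster index per point, list of centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # naive, deterministic start
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids

# Hypothetical customers: (income level, sales amount).
customers = [(20, 1), (22, 2), (21, 1.5), (80, 9), (85, 10), (82, 9.5)]
assignment, centroids = kmeans(customers, k=2)
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

The same function accepts three or more attributes per customer, for instance by adding education level as a third coordinate. In practice the attributes would usually be rescaled first so that one of them does not dominate the distance.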

16 15 What is data mining? Data mining is a process to extract hidden and interesting patterns from data. Data mining is a step in the process of Knowledge Discovery in Databases (KDD).

17 16 What is NOT Data Mining? Not the SQL language –SQL: extraction of detailed data Not OLAP –OLAP: summaries, trends, forecasts Not magic –Data mining: based on algorithms that can discover hidden patterns. It is interactive, not fully automated

18 17 Major data mining tasks Association rule mining – e.g., to cross-sell, identify other items that a customer tends to buy if the customer has already purchased item A. Clustering – e.g., for target marketing, identify clusters of similar customers. Classification – e.g., for fraud detection, identify which customer or transaction is fraudulent.
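For the association rule task, a rule such as "customers who bought item A also tend to buy item B" is commonly scored by its support and confidence. Those two measures are standard in the field but are not named on the slide, and the transactions and the function name `rule_stats` below are made up for illustration.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    n_antecedent = sum(1 for b in baskets if antecedent in b)
    n_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    # Support: share of all transactions that contain both items.
    support = n_both / len(baskets) if baskets else 0.0
    # Confidence: share of antecedent transactions that also contain the consequent.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical transactions over items A, B and C.
transactions = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "C"]]
support, confidence = rule_stats(transactions, "A", "B")
print(support)               # 0.5
print(round(confidence, 3))  # 0.667
```

A cross-selling recommendation would typically keep only rules whose support and confidence both exceed thresholds chosen by the analyst.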

19 18 Steps of the KDD Process Data -> Step 1: Selection -> Target Data -> Step 2: Cleaning -> Preprocessed Data -> Step 3: Transformation -> Transformed Data -> Step 4: Data Mining -> Patterns -> Step 5: Interpretation & Evaluation -> Knowledge

20 19 Steps of the KDD Process Step 1: select the columns (attributes) and rows (records) of interest to be mined. Step 2: clean errors from the selected data. Step 3: transform the data into a form suitable for high-performance data mining. Step 4: data mining. Step 5: filter out uninteresting patterns from the data mining results.
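The five steps above can be sketched as a chain of small Python functions. Everything concrete here is an assumption for illustration: the record layout, the sample rows, the choice of the SLC branch, the per-item basket count used as the "mining" step, and the `min_count` threshold used to decide what counts as interesting.

```python
from collections import Counter

# Hypothetical raw sales records; ORDER_NO-style keys group rows into baskets.
raw_records = [
    {"order_no": 1, "branch": "SLC", "item": "Diapers ", "amount": 12.0},
    {"order_no": 1, "branch": "SLC", "item": "beer", "amount": 8.0},
    {"order_no": 2, "branch": "SLC", "item": "BEER", "amount": 8.0},
    {"order_no": 2, "branch": "SLC", "item": None, "amount": 5.0},
    {"order_no": 3, "branch": "NYC", "item": "milk", "amount": 3.0},
    {"order_no": 4, "branch": "SLC", "item": "beer", "amount": 8.0},
]

def select(records, branch):
    """Step 1: keep the rows for one branch and only the columns to be mined."""
    return [{"order_no": r["order_no"], "item": r["item"]}
            for r in records if r["branch"] == branch]

def clean(records):
    """Step 2: drop rows with a missing item and normalize item names."""
    return [{"order_no": r["order_no"], "item": r["item"].strip().lower()}
            for r in records if r["item"]]

def transform(records):
    """Step 3: reshape rows into one set of items (a basket) per order."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["order_no"], set()).add(r["item"])
    return baskets

def mine(baskets):
    """Step 4: count how many baskets contain each item."""
    counts = Counter()
    for items in baskets.values():
        counts.update(items)
    return counts

def interpret(patterns, min_count):
    """Step 5: filter out patterns below the interest threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

knowledge = interpret(mine(transform(clean(select(raw_records, "SLC")))), min_count=2)
print(knowledge)  # {'beer': 3}
```

Keeping each step as its own function mirrors the slide's point that the intermediate products (target data, preprocessed data, transformed data, patterns) are distinct artifacts.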

21 20 Data mining – on what kind of data Transactional Database Data warehouse Flat file Web data –Web content –Web structure –Web log

22 21 [Diagram: data warehousing (DW) and data mining (DM) pipelines, with a domain expert] DW pipeline: Raw data -> Step 1: Acquisition -> Step 2: Selection -> Target data for DW -> Step 3: Cleaning & preprocessing -> Preprocessed data for DW -> Step 4: Transformation -> Transformed data for DW -> Data warehouse -> OLAP & reporting. DM pipeline: Step 1: Selection -> Target data for DM -> Step 2: Cleaning & preprocessing -> Preprocessed data for DM -> Step 3: Transformation -> Transformed data for DM -> Step 4: Data mining -> Patterns -> Step 5: Interpretation & evaluation -> Discovered knowledge

23 22 Data Mining Tools Over 100 commercial data mining tools are available, and new entries keep arriving. Tools offer a variety of functionality and features, making evaluation and comparison difficult.

24 23 Evaluation Criteria 1. System Requirements 2. Data Access 3. Mining Performance 4. User Interface 5. Visualization [Diagram: Database or flat files; Data Mover (Data Access); Data Mining Engine; Tool Manager (often GUI); Visualization Tools; End Users; Server Side; Client Side]

25 24 Data Mining Tools: Market Leaders Class choice

26 25 Web Analytics Software Providers http://surfaid.dfw.ibm.com/web/home/index.html http://pro.blogger.com/ http://www.clickstream.com/ http://www.deepmetrix.com/index.asp?source=google&keyword=web+analytics http://www.eloqua.com/srch/analytics.asp http://www.intellitracker.com/ http://www.maxamine.com/ http://www.mediahouse.com/ http://www.netiq.com/webtrends/default.asp http://www.omniture.com/products.html http://www.sitebrand.com/?source=jan http://www.statsoftinc.com/ http://www.urchin.com/ http://www.webabacus.com/ http://www.websidestory.com/ http://www.databeacon.com/index_IE.html http://www.sane.com/ads/whoiscoming.html

