MIS 451: Building Business Intelligence Systems
Introduction to Data Mining
Why data mining?
OLAP can only provide shallow data analysis: it answers "what" questions.
Ex: sales distribution by product
Why data mining?
Shallow data analysis is not sufficient to support business decisions, which also need answers to "how" questions.
Ex: how can we boost sales of other products?
Ex: when people buy product 6, what other products are they likely to buy? (cross-selling)
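The cross-selling question can be made concrete with a small sketch. Assuming each transaction is recorded as a set of product IDs (the baskets below are hypothetical), it estimates, for every other product, the fraction of baskets containing product 6 that also contain that product. This is the "confidence" measure used in association rule mining, shown here in isolation rather than as a full algorithm such as Apriori.

```python
from collections import Counter

# Hypothetical transactions: each basket is the set of product IDs bought together.
BASKETS = [
    {6, 2, 3},
    {6, 2},
    {1, 4},
    {6, 3, 5},
    {2, 6, 5},
    {1, 2},
]


def also_bought(baskets, product):
    """For every other product, estimate P(other in basket | product in basket)."""
    with_product = [b for b in baskets if product in b]
    if not with_product:
        return {}  # the product was never bought, so nothing can be estimated
    counts = Counter(item for b in with_product for item in b if item != product)
    return {item: n / len(with_product) for item, n in counts.items()}


if __name__ == "__main__":
    ranked = sorted(also_bought(BASKETS, 6).items(), key=lambda kv: -kv[1])
    for item, confidence in ranked:
        print(f"bought 6 -> also bought {item}: {confidence:.2f}")
```

With these baskets, 3 of the 4 baskets containing product 6 also contain product 2, so product 2 would be the first cross-selling candidate.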
Why data mining?
OLAP can only do shallow data analysis because OLAP is based on SQL, for example:
SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
FROM DBSR.PRODUCTS PRODUCTS, DBSR.SALESFACTS SALESFACTS
WHERE ( ( PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY ) )
GROUP BY PRODUCTS.PNAME;
SQL is designed for declarative retrieval and aggregation, so complicated iterative algorithms are difficult or impractical to implement in it.
Deep data analysis requires such complicated algorithms to be developed: this is data mining.
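To see what this kind of query returns, it can be run unchanged against a tiny in-memory SQLite database using Python's standard `sqlite3` module. The table columns beyond those the query references, and all product names and sales figures, are hypothetical; attaching a second in-memory database named `DBSR` is just a trick so the schema-qualified table names resolve as written.

```python
import sqlite3

QUERY = """
    SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
    FROM DBSR.PRODUCTS PRODUCTS, DBSR.SALESFACTS SALESFACTS
    WHERE ( ( PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY ) )
    GROUP BY PRODUCTS.PNAME
"""


def sales_by_product():
    conn = sqlite3.connect(":memory:")
    # Attach an in-memory database named DBSR so "DBSR.PRODUCTS" etc. resolve unchanged.
    conn.execute("ATTACH DATABASE ':memory:' AS DBSR")
    conn.execute("CREATE TABLE DBSR.PRODUCTS (PRODUCT_KEY INTEGER PRIMARY KEY, PNAME TEXT)")
    conn.execute("CREATE TABLE DBSR.SALESFACTS (PRODUCT_KEY INTEGER, SALES_AMT REAL)")
    # Hypothetical dimension and fact rows.
    conn.executemany("INSERT INTO DBSR.PRODUCTS VALUES (?, ?)", [(1, "Pen"), (2, "Notebook")])
    conn.executemany(
        "INSERT INTO DBSR.SALESFACTS VALUES (?, ?)", [(1, 2.5), (1, 1.5), (2, 6.0)]
    )
    # Each result row is (product name, total sales); collect them into a dict.
    totals = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return totals


if __name__ == "__main__":
    print(sales_by_product())
```

The output is only a total per product, the "what" kind of answer; it says nothing about which products sell together or how to raise sales.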
Why data mining?
OLAP results generated from data sets with a large number of attributes are difficult to interpret.
Ex: cluster my company's customers for target marketing
Pick two attributes related to a customer: income level and sales amount
Why data mining?
Ex: cluster my company's customers for target marketing
Pick three attributes related to a customer: income level, education level and sales amount
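Once customers have more than two or three attributes, the groups can no longer be read off a chart, and a clustering algorithm has to find them. The slides do not name an algorithm; the sketch below uses k-means, one common choice, on hypothetical customers described by income, education and sales amount. Two further assumptions: attributes are min-max scaled so sales amounts do not dominate the distance, and centroids are seeded deterministically with farthest-first selection instead of the usual random or k-means++ seeding. The same function works unchanged for two attributes or for many.

```python
import math

# Hypothetical customers: (income in $1000s, years of education, annual sales amount in $)
CUSTOMERS = [
    (35, 12, 400), (40, 12, 520), (38, 14, 450),        # lower income, lower spend
    (120, 18, 5200), (110, 16, 4800), (130, 20, 6100),  # higher income, higher spend
]


def min_max_scale(rows):
    """Rescale every attribute to [0, 1] so no single attribute dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]  # avoid dividing by zero
    return [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and update steps until labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each customer to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


if __name__ == "__main__":
    print(kmeans(min_max_scale(CUSTOMERS), k=2))
```

Each label marks a customer segment that target marketing could address separately.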
What is data mining?
Data mining is the process of extracting hidden, interesting patterns from data.
Data mining is one step in the process of Knowledge Discovery in Databases (KDD).
Steps of the KDD Process
[Figure: Data -> Step 1: Selection -> Target Data -> Step 2: Cleaning -> Preprocessed Data -> Step 3: Transformation -> Transformed Data -> Step 4: Data Mining -> Patterns -> Step 5: Interpretation & Evaluation -> Knowledge]
Steps of the KDD Process
Step 1: Select the columns (attributes) and rows (records) of interest to be mined.
Step 2: Clean errors from the selected data.
Step 3: Transform the data into a form suitable for high-performance data mining.
Step 4: Mine the data.
Step 5: Filter out uninteresting patterns from the data mining results.
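The five steps can be sketched as a chain of small functions. Everything here is hypothetical: the record fields, the region filter, the error types being cleaned, and the choice of counting co-purchased item pairs as the mining step. Step 5 is reduced to a minimum-support threshold, one simple way to discard uninteresting patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; field names are illustrative only.
RAW = [
    {"region": "West", "items": ["milk", "bread", "milk"], "note": "ok"},
    {"region": "West", "items": ["bread", "butter"], "note": ""},
    {"region": "East", "items": ["milk", "bread"], "note": "ok"},
    {"region": "West", "items": [], "note": "empty basket"},
    {"region": "West", "items": ["milk", "bread", None], "note": "scanner error"},
    {"region": "West", "items": ["milk", "bread", "butter"], "note": "ok"},
]


def select(records, region):
    # Step 1: keep only the rows (one region) and the column (items) to be mined.
    return [r["items"] for r in records if r["region"] == region]


def clean(baskets):
    # Step 2: strip invalid (None) entries, then drop baskets left empty.
    cleaned = [[i for i in b if i is not None] for b in baskets]
    return [b for b in cleaned if b]


def transform(baskets):
    # Step 3: deduplicate items; frozensets give fast membership tests.
    return [frozenset(b) for b in baskets]


def mine(baskets):
    # Step 4: count how often each pair of items is bought together.
    return Counter(pair for b in baskets for pair in combinations(sorted(b), 2))


def evaluate(pair_counts, n_baskets, min_support):
    # Step 5: keep only pairs whose support (fraction of baskets) reaches the threshold.
    return {p: c / n_baskets for p, c in pair_counts.items() if c / n_baskets >= min_support}


def kdd(records, region="West", min_support=0.5):
    baskets = transform(clean(select(records, region)))
    return evaluate(mine(baskets), len(baskets), min_support)


if __name__ == "__main__":
    print(kdd(RAW))
```

Keeping each step as its own function mirrors the process above: any step can be rerun or replaced (for example, a different mining algorithm in step 4) without touching the others.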
Data mining: on what kinds of data?
Transactional databases
Data warehouses
Flat files
Web data: web content, web structure, web logs
Major data mining tasks
Association rule mining: cross-selling
Clustering: target marketing
Classification: potential customer identification, fraud detection
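Association rule mining and clustering are sketched with the earlier slides; classification differs in that it learns from records whose outcome is already known. The sketch below uses k-nearest neighbours, one simple classifier among many (the slides do not name one), to flag potential customers. The attributes, the labelled history and k = 3 are all hypothetical, and the attributes are left unscaled for brevity.

```python
import math
from collections import Counter

# Hypothetical labelled history: (age, income in $1000s) -> did the prospect become a customer?
TRAINING = [
    ((25, 30), "no"), ((30, 35), "no"), ((28, 32), "no"),
    ((45, 90), "yes"), ((50, 110), "yes"), ((48, 95), "yes"),
]


def knn_predict(training, query, k=3):
    """Label a new prospect by majority vote among its k nearest labelled neighbours."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for prospect in [(47, 100), (27, 31)]:
        print(prospect, "->", knn_predict(TRAINING, prospect))
```

Fraud detection fits the same pattern, with past transactions labelled "fraud" or "legitimate" in place of customer outcomes.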
Reading: data mining book, Chapter 1