Published by David Bradley. Modified over 9 years ago.
1
1 All Powder Board and Ski
Oracle 9i Workbook
Chapter 8: Data Warehouses and Data Mining
Jerry Post
Copyright © 2003
2
2 Oracle Relational Approach
[Diagram labels: Relational Tables (Customer, Sale, SaleItem, Item); Star Design (Fact, Measure, Dimension); Meta-Data; Materialized Views (Sale + Customer)]
3
3 Desired Sales Cube Dimensions
Sales Dimensions
State (ship)
Month
Category
Style
SkillLevel
Size
Color
Manufacturer
BindingStyle
WeightMax?
ItemMaterial?
WaistWidth?
4
4 Early Data: Spreadsheets
5
5 External Tables: Attach to CSV
create or replace directory csv_dir as 'D:\students\BuildAllPowder\AllPowderSampleDataCSV';

create table OldSale_Ext (
  SaleID INTEGER,
  SaleDate DATE,
  ShipState VARCHAR2(50),
  ShipZIP VARCHAR2(50),
  PaymentMethod VARCHAR2(50),
  SKU VARCHAR2(50),
  QuantitySold INTEGER,
  SalePrice NUMBER(10,2),
  ModelID VARCHAR2(250),
  ItemSize NUMBER,
  ManufacturerID INTEGER,
  Category VARCHAR2(50),
  Color VARCHAR2(50),
  ModelYear INTEGER,
  Graphics VARCHAR2(50),
  ItemMaterial VARCHAR2(50),
  ListPrice NUMBER(10,2),
  Style VARCHAR2(50),
  SkillLevel INTEGER,
  WeightMax NUMBER,
  WaistWidth NUMBER,
  BindingStyle VARCHAR2(50)
)
Continued on next slide
Warning: currency columns cannot have $ symbols or commas
6
6 External File Definition
organization external (
  type oracle_loader
  default directory csv_dir
  access parameters (
    records delimited by newline
    fields terminated by ',' optionally enclosed by '"' lrtrim
    missing field values are null
    (
      SaleID,
      SaleDate char date_format date mask "mm/dd/yyyy",
      ShipState, ShipZIP, PaymentMethod, SKU, QuantitySold, SalePrice,
      ModelID, ItemSize, ManufacturerID, Category, Color, ModelYear,
      Graphics, ItemMaterial, ListPrice, Style, SkillLevel, WeightMax,
      WaistWidth, BindingStyle
    )
  )
  location ('Lab 08-01 Early Sales.csv')
)
reject limit unlimited;
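The warning about $ symbols and commas in currency columns can be handled before the file is attached. The following is a minimal Python sketch, not part of the workbook: the function names are hypothetical, and the column positions (7 for SalePrice, 16 for ListPrice, counting from zero) are taken from the field list on this slide and assume the CSV file uses the same order.

```python
import csv
import io

# Zero-based positions of SalePrice and ListPrice in the field list above
# (an assumption: the CSV columns follow that order).
CURRENCY_COLUMNS = (7, 16)


def clean_currency(text):
    """Strip '$' and thousands separators so the value can load as NUMBER."""
    return text.replace("$", "").replace(",", "").strip()


def clean_rows(reader, currency_columns=CURRENCY_COLUMNS):
    """Yield each CSV row with the currency columns cleaned."""
    for row in reader:
        cleaned = list(row)
        for i in currency_columns:
            if i < len(cleaned):
                cleaned[i] = clean_currency(cleaned[i])
        yield cleaned


def clean_csv_text(text):
    """Return CSV text with currency columns cleaned; other fields unchanged."""
    source = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(clean_rows(source))
    return out.getvalue()
```

Reading and writing through the csv module keeps quoted fields intact, so a comma inside a quoted price such as "$1,250.00" is removed without splitting the field.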
7
7 Create Customer and Employee
CustomerID and EmployeeID are missing from the old data.
Instead of relying on blank cell values, create a new customer called “Walk-in” and a new employee called “Employee”.
Write down the ID numbers generated for these anonymous entries.
If you use SQL, you can assign a value of zero to these entries.
INSERT INTO Customer (CustomerID, LastName) VALUES (0, 'Walk-in');
INSERT INTO Employee (EmployeeID, LastName) VALUES (0, 'Staff');
8
8 Extract Model Data
SELECT DISTINCT OldSale_ext.ModelID, OldSale_ext.ManufacturerID,
  OldSale_ext.Category, OldSale_ext.Color, OldSale_ext.ModelYear,
  OldSale_ext.Graphics, OldSale_ext.ItemMaterial, OldSale_ext.ListPrice,
  OldSale_ext.Style, OldSale_ext.SkillLevel, OldSale_ext.WeightMax,
  OldSale_ext.WaistWidth, OldSale_ext.BindingStyle
FROM OldSale_ext;
9
9 UNION Query for Models
SELECT DISTINCT ModelID, ManufacturerID, Category, …
FROM OldSale_ext
UNION
SELECT DISTINCT ModelID, ManufacturerID, Category, …
FROM OldRental_ext
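UNION returns each distinct row once across both queries, which is why a model that appears in both the old sales and the old rentals is listed a single time. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for Oracle; the two-column tables and the function name are hypothetical and hold made-up rows, not the course data.

```python
import sqlite3


def union_models(sale_rows, rental_rows):
    """Load two small tables and return the UNION of their model rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE OldSale_ext (ModelID TEXT, Category TEXT)")
    conn.execute("CREATE TABLE OldRental_ext (ModelID TEXT, Category TEXT)")
    conn.executemany("INSERT INTO OldSale_ext VALUES (?, ?)", sale_rows)
    conn.executemany("INSERT INTO OldRental_ext VALUES (?, ?)", rental_rows)
    rows = conn.execute(
        "SELECT ModelID, Category FROM OldSale_ext "
        "UNION "
        "SELECT ModelID, Category FROM OldRental_ext "
        "ORDER BY ModelID"
    ).fetchall()
    conn.close()
    return rows
```

Because UNION already removes duplicates, the DISTINCT keyword in each branch is harmless but not required; UNION ALL would keep the repeated rows.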
10
10 Insert Model Data into ItemModel
INSERT INTO ItemModel (ModelID, ManufacturerID, Category, Color, ModelYear,
  Graphics, ItemMaterial, ListPrice, Style, SkillLevel, WeightMax,
  WaistWidth, BindingStyle)
SELECT DISTINCT qryOldModels.ModelID, qryOldModels.ManufacturerID,
  qryOldModels.Category, qryOldModels.Color, qryOldModels.ModelYear,
  qryOldModels.Graphics, qryOldModels.ItemMaterial, qryOldModels.ListPrice,
  qryOldModels.Style, qryOldModels.SkillLevel, qryOldModels.WeightMax,
  qryOldModels.WaistWidth, qryOldModels.BindingStyle
FROM qryOldModels;
11
11 Insert SKU Data into Inventory
CREATE VIEW qryOldInventory AS
SELECT DISTINCT ModelID, SKU, ItemSize
FROM OldSale_ext
UNION
SELECT DISTINCT ModelID, SKU, ItemSize
FROM OldRental_ext;

INSERT INTO Inventory (ModelID, SKU, ItemSize, QuantityOnHand)
SELECT DISTINCT qryOldInventory.ModelID, qryOldInventory.SKU,
  qryOldInventory.ItemSize, 0 AS QuantityOnHand
FROM qryOldInventory;
Note the constant 0 with the column alias QuantityOnHand, which forces a zero value for QuantityOnHand in each row.
12
12 Copy Sales Data
INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod)
SELECT DISTINCT OldSale_ext.SaleID, OldSale_ext.SaleDate,
  OldSale_ext.ShipState, OldSale_ext.ShipZIP, OldSale_ext.PaymentMethod
FROM OldSale_ext;
Note that if you have added data to your Sale table, your existing SaleID values might conflict with these.
You can solve the problem by adding a number to these values so they are all larger than your highest ID.
INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod)
SELECT DISTINCT OldSale_ext.SaleID+5000, OldSale_ext.SaleDate,
  OldSale_ext.ShipState, OldSale_ext.ShipZIP, OldSale_ext.PaymentMethod
FROM OldSale_ext;
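The value 5000 in the slide is an example; the offset only has to lift every imported ID above the highest ID already in the table. A minimal Python sketch of that arithmetic follows, with hypothetical function names and made-up ID lists; rounding up to a multiple of 1000 is an assumption made for readability, not something the slide requires.

```python
def choose_offset(existing_ids, incoming_ids, step=1000):
    """Smallest multiple of step that lifts every incoming ID above max(existing)."""
    if not existing_ids or not incoming_ids:
        return 0
    # The smallest incoming ID must end up strictly above the existing maximum.
    gap = max(existing_ids) - min(incoming_ids) + 1
    if gap <= 0:
        return 0  # incoming IDs are already above every existing ID
    return -(-gap // step) * step  # round gap up to a multiple of step


def shift_ids(ids, offset):
    """Apply the same offset to every ID."""
    return [i + offset for i in ids]
```

Whatever offset is chosen for the Sale rows must be reused unchanged for the matching SaleItem rows, otherwise the foreign keys no longer line up.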
13
13 Copy SaleItem Rows
INSERT INTO SaleItem (SaleID, SKU, QuantitySold, SalePrice)
SELECT DISTINCT OldSale_ext.SaleID+5000, OldSale_ext.SKU,
  OldSale_ext.QuantitySold, OldSale_ext.SalePrice
FROM OldSale_ext;
If you transformed the SaleID in the prior step for the Sale data, you must do the exact same calculation for SaleID in the SaleItem table.
14
14 Copy Rental Data
INSERT INTO Rental (RentID, RentDate, ExpectedReturn, PaymentMethod)
SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.RentDate,
  OldRental_ext.ExpectedReturn, OldRental_ext.PaymentMethod
FROM OldRental_ext;

INSERT INTO RentItem (RentID, SKU, RentFee, ReturnDate)
SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.SKU,
  OldRental_ext.RentFee, OldRental_ext.ReturnDate
FROM OldRental_ext;
15
15 Discoverer Administrator: Load Business Area
Schema
Select tables
Tables and views
16
16 Load Wizard Options: LOV
Most options are selected by default.
Select the LOV option to have Discoverer build lookup lists.
17
17 Discoverer: Business Area
Tables are shown as folders and named so managers understand them.
Columns are shown as items.
Add a calculated item.
18
18 Create a Data Hierarchy
Select Category and Style from the SkiBoardStyle lookup table.
19
19 Discoverer Desktop: New Workbook
Select the dimensions and the fact item.
20
20 Initial Crosstab Layout
Row area
Column area
Page area
21
21 Discoverer Crosstab Browser
Select all items
Format options
Totals
22
22 Time Series Analysis: Moving Average
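The slide gives only the title, so the window length and data are not known. As a rough illustration of the technique, the following Python sketch computes a trailing moving average; the function name and the default window of 3 periods are assumptions.

```python
def moving_average(values, window=3):
    """Trailing moving average; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window:i + 1]
            result.append(sum(chunk) / window)
    return result
```

A longer window smooths the monthly sales series more strongly but reacts more slowly to a real change in trend.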
23
23 Time Series Analysis: Discoverer
24
24 Sales by State for Regression
Note that some states are missing from the list.
25
25 Regression Data Query
CREATE VIEW StateSales2004 AS
SELECT StateName, Income2001, Pop2002,
  Sum(SalePrice*QuantitySold) AS Sales2004
FROM Sale
  INNER JOIN StateDemographics ON Sale.ShipState = StateDemographics.StateCode
  INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID
WHERE ShipState IS NOT NULL
  AND SaleDate Between '01-Jan-2004' And '31-Dec-2004'
GROUP BY StateName, Income2001, Pop2002
ORDER BY StateName;
26
26 Regression Setup
Include the label row in the selection, but be sure to check the box indicating that labels are included.
27
27 Regression Results
Relatively high R-square.
Population is a significant predictor; Income is not.
28
28 Association Rules/Market Basket
Locate folders
Item to find          Possible location
Data mining samples   D:\Oracle\ora92\dm\demo\sample
ORACLE_HOME           D:\Oracle\ora92
JAVA_HOME             C:\OracleData\Ora92DS\jdk
29
29 Copy Files to Protect Original
compileSampleCode.bat
executeSampleCode.bat
Sample_AssociationRules.java
Sample_AssociationRules_Transactional.property
Sample_Global.property
30
30 Edit Sample_Global.property File
miningServer.url=jdbc:oracle:thin:@YourServerName:1521:DBName
miningServer.userName=odm
miningServer.password=password
inputDataSchemaName=powder
outputSchemaName=powder
timeout=120
If necessary, use Enterprise Manager to unlock and assign new passwords to the accounts odm and odm_mtr.
31
31 Create New Table To Hold Transaction Basket Data
CREATE TABLE MARKET_BASKET_TX_BINNED (
  SEQUENCE_ID INTEGER,
  ATTRIBUTE_NAME VARCHAR2(35),
  VALUE NUMBER
);
GRANT SELECT ON MARKET_BASKET_TX_BINNED TO odm;
commit;
If you use these names, you do not have to edit the Transactional.property file.
32
32 Copy SaleItem Data
INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE)
SELECT SaleID, ItemModel.Category || '_' || ItemModel.Style AS AName, 1 AS Value
FROM SaleItem
  INNER JOIN Inventory ON SaleItem.SKU = Inventory.SKU
  INNER JOIN ItemModel ON Inventory.ModelID = ItemModel.ModelID
GROUP BY SaleID, ItemModel.Category || '_' || ItemModel.Style;
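The query above reduces each sale to one row per Category_Style combination, with a VALUE of 1 meaning "this sale contained at least one such item". The Python sketch below mirrors that shaping step on made-up tuples; the function name is hypothetical. It treats a missing Style as an empty string, as Oracle does when concatenating NULL, which would be one explanation for attribute names such as Boots_ that end in an underscore.

```python
def basket_rows(sale_items):
    """Collapse (sale_id, category, style) tuples into distinct basket rows.

    Returns sorted (sale_id, attribute_name, 1) tuples, one per distinct
    category/style combination within each sale.
    """
    seen = set()
    for sale_id, category, style in sale_items:
        name = f"{category}_{style or ''}"  # missing style gives a trailing underscore
        seen.add((sale_id, name))
    return [(sale_id, name, 1) for sale_id, name in sorted(seen)]
```

Quantity is deliberately ignored: two half-pipe boards on one sale still yield a single row, which is what a market basket analysis expects.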
33
33 Copy Sale Data
INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE)
SELECT SaleID, 'ID', SaleID
FROM Sale;
commit;
34
34 Remove Dashes from Attribute
UPDATE MARKET_BASKET_TX_BINNED
SET ATTRIBUTE_NAME =
  substr(ATTRIBUTE_NAME, 1, instr(ATTRIBUTE_NAME,'-')-1)
  || '_' ||
  substr(ATTRIBUTE_NAME, instr(ATTRIBUTE_NAME,'-')+1)
WHERE instr(ATTRIBUTE_NAME,'-') > 0;
commit;
Run at least twice, until you get zero changes, because a row might have more than one dash.
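Each run of the UPDATE replaces only the first dash in a name, which is why it has to be repeated. The Python sketch below imitates that behavior on made-up attribute names to show how many passes are needed; the function names are hypothetical. Oracle's built-in REPLACE function, for example REPLACE(ATTRIBUTE_NAME, '-', '_'), would replace every dash in a single statement and may be a simpler alternative.

```python
def replace_first_dash(name):
    """Mimic one pass of the UPDATE: replace only the first dash."""
    pos = name.find("-")
    if pos < 0:
        return name
    return name[:pos] + "_" + name[pos + 1:]


def run_until_stable(names):
    """Repeat the single-dash pass until no name contains a dash.

    Returns the cleaned names and the number of passes that changed rows.
    """
    passes = 0
    while any("-" in n for n in names):
        names = [replace_first_dash(n) for n in names]
        passes += 1
    return names, passes
```

The number of passes equals the largest number of dashes found in any one name, and the final result matches a replace-all of '-' with '_'.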
35
35 Limit Size of Attribute_Name
UPDATE MARKET_BASKET_TX_BINNED
SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME,1,20);
commit;
This step is critical, but the need for it is probably due to a bug in Oracle's code. There is a slight chance it arises from the 30-character name limitation in Oracle.
36
36 Compile and Run the Code
SET ORACLE_HOME=D:\Oracle\ora92
SET JAVA_HOME=C:\OracleData\ora92DS\jdk
compileSampleCode.bat Sample_AssociationRules.java
executeSampleCode.bat Sample_AssociationRules Sample_AssociationRules_Transactional.property
Type each command as one line; do not press Enter until the end.
To redirect the output to a file, add at the end: >myfile.txt
37
37 Sample Results
Getting top 5 rules for model: Sample_AR_Model_tx sorted by support.
Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816]
Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967]
Rule 101: If Board_Half_Pipe=1 then Clothes_=1 [support: 0.11357143, confidence: 0.4622093]
Rule 9: If Clothes_=1 then Board_Half_Pipe=1 [support: 0.11357143, confidence: 0.23177843]
Rule 100: If Ski_Freestyle=1 then Clothes_=1 [support: 0.09785714, confidence: 0.48070174]
Get rules by support: Sample_AR_Model_tx, with minimum support of 0.16.
Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816]
Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967]
Get rules by confidence: Sample_AR_Model_tx, with confidence of 0.56 or more.
Investigate and think about the results. Do you have too many clothes targeted to half-pipe boards and freestyle skiers, or not enough?
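To read these numbers, it helps to recall the usual definitions: support is the share of all baskets that contain every item in the rule, and confidence is that support divided by the support of the "if" side alone. This is why Rule 124 and Rule 38 share one support value but differ in confidence. The Python sketch below applies those standard definitions to a few made-up baskets; it is not the Oracle Data Mining implementation, and the function names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in items."""
    if not baskets:
        return 0.0
    wanted = set(items)
    hits = sum(1 for basket in baskets if wanted <= set(basket))
    return hits / len(baskets)


def confidence(baskets, antecedent, consequent):
    """support(antecedent and consequent) divided by support(antecedent)."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / base
```

For example, with four baskets of which two contain Boots_ and one contains both Boots_ and Clothes_, the rule "Boots_ then Clothes_" has support 0.25 and confidence 0.5.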
38
38 GIS: Microsoft MapPoint
The Discoverer worksheet places the data into rows and columns.
A dynamic copy of this sheet is used to remove the top rows.
39
39 MapPoint Data Wizard
40
40 GIS Analysis of Sales