Data Warehouse [ Example ] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001, ISBN 1558604898


1 Data Warehouse [ Example ] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001, ISBN 1558604898

2 OLTP (Online Transaction Processing)
It is designed for optimal transaction processing.

3 OLAP (Online Analytical Processing)
It is designed to give an overview analysis of what happened. It is used to build reports that answer questions such as:
- Q1: Which supervisor gave the most discounts?
- Q2: In which zip code did product A sell the most?
To answer such questions, OLAP cubes are created.

4 Example
Assume we have made a record of the weather conditions during a two-week period, along with the decisions of a tennis player whether or not to play tennis on each particular day. We have values of four independent variables (outlook, temperature, humidity, windy) and one dependent variable (play). Consider our data stored in a relational table as follows:

5 Example (cont.)

Day  Outlook   Temperature  Humidity  Windy  Play
1    sunny     85           85        false  no
2    sunny     80           90        true   no
3    overcast  83           86        false  yes
4    rainy     70           96        false  yes
5    rainy     68           80        false  yes
6    rainy     65           70        true   no
7    overcast  64           65        true   yes
8    sunny     72           95        false  no
9    sunny     69           70        false  yes
10   rainy     75           80        false  yes
11   sunny     75           70        true   yes
12   overcast  72           90        true   yes
13   overcast  81           75        false  yes
14   rainy     71           91        true   no

6 Example (cont.)
By querying a DBMS containing the above table we can answer questions like:
- What was the temperature on the sunny days? {85, 80, 72, 69, 75}
- On which days was the humidity less than 75? {6, 7, 9, 11}
- On which days was the temperature greater than 70? {1, 2, 3, 8, 10, 11, 12, 13, 14}
- On which days was the temperature greater than 70 and the humidity less than 75? The intersection of the above two: {11}
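The four queries can be reproduced with a minimal Python sketch. The slides assume a DBMS; plain list and set comprehensions stand in for it here, and the names `ROWS`, `sunny_temps`, `low_humidity_days`, `warm_days` and `warm_and_dry_days` are illustrative only. Day 1's humidity is assumed to be 85, since that cell is blank in the flattened table.

```python
# (day, outlook, temperature, humidity) for the 14 recorded days.
# Assumption: day 1's humidity is 85 (the cell is blank in the flattened table).
ROWS = [
    (1, "sunny", 85, 85), (2, "sunny", 80, 90), (3, "overcast", 83, 86),
    (4, "rainy", 70, 96), (5, "rainy", 68, 80), (6, "rainy", 65, 70),
    (7, "overcast", 64, 65), (8, "sunny", 72, 95), (9, "sunny", 69, 70),
    (10, "rainy", 75, 80), (11, "sunny", 75, 70), (12, "overcast", 72, 90),
    (13, "overcast", 81, 75), (14, "rainy", 71, 91),
]

# Temperature on the sunny days (a selection followed by a projection).
sunny_temps = [t for _, o, t, _ in ROWS if o == "sunny"]

# Days with humidity below 75, and days with temperature above 70.
low_humidity_days = {d for d, _, _, h in ROWS if h < 75}
warm_days = {d for d, _, t, _ in ROWS if t > 70}

# Both conditions at once is the intersection of the two day sets.
warm_and_dry_days = warm_days & low_humidity_days

print(sunny_temps)                # [85, 80, 72, 69, 75]
print(sorted(low_humidity_days))  # [6, 7, 9, 11]
print(sorted(warm_and_dry_days))  # [11]
```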

7 Example (cont.)
OLAP: Using OLAP we can create a multidimensional model of our data (a data cube). For example, using the dimensions time, outlook and play we can create the following model.

8 Example (cont.)
Here time represents the days grouped in weeks (week 1: days 1, 2, 3, 4, 5, 6, 7; week 2: days 8, 9, 10, 11, 12, 13, 14) along the vertical axis. The outlook is shown along the horizontal axis, and the third dimension, play, is shown in each individual cell as a pair of values corresponding to the two values along this dimension: yes / no. The total over all weeks and all outlook values, which belongs in the upper-left corner of the cube, is 9 / 5.

Yes / No   sunny   rainy   overcast
Week 1     0 / 2   2 / 1   2 / 0
Week 2     2 / 1   1 / 1   2 / 0
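One way to compute these cells is sketched below in Python, assuming the 14 rows of the weather table. The helper `week_of` and the dictionary layout `(week, outlook) -> [yes, no]` are illustrative choices, not something the slides prescribe.

```python
from collections import defaultdict

# (day, outlook, play) taken from the weather table.
ROWS = [
    (1, "sunny", "no"), (2, "sunny", "no"), (3, "overcast", "yes"),
    (4, "rainy", "yes"), (5, "rainy", "yes"), (6, "rainy", "no"),
    (7, "overcast", "yes"), (8, "sunny", "no"), (9, "sunny", "yes"),
    (10, "rainy", "yes"), (11, "sunny", "yes"), (12, "overcast", "yes"),
    (13, "overcast", "yes"), (14, "rainy", "no"),
]


def week_of(day):
    """Map days 1-7 to week 1 and days 8-14 to week 2."""
    return (day - 1) // 7 + 1


# Each cell holds a [yes, no] pair of day counts.
cube = defaultdict(lambda: [0, 0])
for day, outlook, play in ROWS:
    cell = cube[(week_of(day), outlook)]
    cell[0 if play == "yes" else 1] += 1

# Grand total over all weeks and all outlook values.
total_yes = sum(cell[0] for cell in cube.values())
total_no = sum(cell[1] for cell in cube.values())

print(cube[(1, "rainy")])   # [2, 1]
print(total_yes, total_no)  # 9 5
```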

9 Example (cont.)
We can apply the drill-down operation to our data cube over the time dimension. This assumes the existence of a concept hierarchy for this attribute. We can show this as a horizontal tree as follows:

10 Example (cont.)

Time
  week 1: day 1, day 2, day 3, day 4, day 5, day 6, day 7
  week 2: day 8, day 9, day 10, day 11, day 12, day 13, day 14

11 Example (cont.)
The drill-down operation is based on climbing down the concept hierarchy, so that we get the following data cube:

Yes / No   sunny   rainy   overcast
1          0 / 1   0 / 0   0 / 0
2          0 / 1   0 / 0   0 / 0
3          0 / 0   0 / 0   1 / 0
4          0 / 0   1 / 0   0 / 0
5          0 / 0   1 / 0   0 / 0
6          0 / 0   0 / 1   0 / 0
7          0 / 0   0 / 0   1 / 0
8          0 / 1   0 / 0   0 / 0
9          1 / 0   0 / 0   0 / 0
10         0 / 0   1 / 0   0 / 0
11         1 / 0   0 / 0   0 / 0
12         0 / 0   0 / 0   1 / 0
13         0 / 0   0 / 0   1 / 0
14         0 / 0   0 / 1   0 / 0
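Drill-down can be viewed as rebuilding the same cube with a finer level of the time hierarchy. The Python sketch below illustrates that idea under the assumption that the detailed rows are still available; `build_cube` and its `time_level` parameter are hypothetical names introduced for this example.

```python
from collections import defaultdict

# (day, outlook, play) taken from the weather table.
ROWS = [
    (1, "sunny", "no"), (2, "sunny", "no"), (3, "overcast", "yes"),
    (4, "rainy", "yes"), (5, "rainy", "yes"), (6, "rainy", "no"),
    (7, "overcast", "yes"), (8, "sunny", "no"), (9, "sunny", "yes"),
    (10, "rainy", "yes"), (11, "sunny", "yes"), (12, "overcast", "yes"),
    (13, "overcast", "yes"), (14, "rainy", "no"),
]


def build_cube(time_level):
    """Aggregate [yes, no] counts per (time value, outlook).

    time_level maps a day number to the chosen level of the time hierarchy.
    """
    cube = defaultdict(lambda: [0, 0])
    for day, outlook, play in ROWS:
        cube[(time_level(day), outlook)][0 if play == "yes" else 1] += 1
    return dict(cube)


# Week level (coarse) versus day level (after drill-down).
week_cube = build_cube(lambda day: (day - 1) // 7 + 1)
day_cube = build_cube(lambda day: day)

# Cells with no matching rows are simply absent; treat them as 0 / 0.
print(day_cube[(3, "overcast")])            # [1, 0]
print(day_cube.get((4, "sunny"), [0, 0]))   # [0, 0]
```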

12 Multidimensional data model
We use the same example with two changes:
- play has just two values, yes and no, so we can replace them with 1 and 0. This allows us to add up values and thus get the total number of days when tennis was played and, at the same time, the number of days when it was not played.
- We rename the day attribute to time, which is more general and allows us to use other time units (e.g. weeks).
Thus we get the following relational table:

13 Multidimensional data model (cont.)

time  outlook   temperature  humidity  windy  play
1     sunny     85           85        false  0
2     sunny     80           90        true   0
3     overcast  83           86        false  1
4     rainy     70           96        false  1
5     rainy     68           80        false  1
6     rainy     65           70        true   0
7     overcast  64           65        true   1
8     sunny     72           95        false  0
9     sunny     69           70        false  1
10    rainy     75           80        false  1
11    sunny     75           70        true   1
12    overcast  72           90        true   1
13    overcast  81           75        false  1
14    rainy     71           91        true   0
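The benefit of the 0/1 encoding can be seen in a short Python sketch: once play is numeric, a plain sum gives the number of days played, and the remainder gives the days not played. The variable names are illustrative.

```python
# The play column for time = 1..14, as yes / no values.
PLAY = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

# Replace yes / no with 1 / 0 so the attribute can be aggregated.
play_numeric = [1 if p == "yes" else 0 for p in PLAY]

days_played = sum(play_numeric)
days_not_played = len(play_numeric) - days_played

print(days_played, days_not_played)  # 9 5
```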

14 Concept hierarchies
1- For the attributes day, temperature and humidity we can group values into subsets and name these subsets as follows:

Day:
                  all
        ___________|___________
        |                     |
     week 1                week 2
  ______|______       ________|________
  | | | | | | |       | | |  |  |  |  |
  1 2 3 4 5 6 7       8 9 10 11 12 13 14

15 Concept hierarchies (cont.)

Temperature:
                  all
    _______________|_______________
    |              |              |
   hot            mild           cool
____|_____     ____|_____     ____|_____
|  |  |  |     |  |  |  |     |  |  |  |
80 81 83 85    70 71 72 75    64 65 68 69

16 Concept hierarchies (cont.)

Humidity:
               all
       _________|_________
       |                 |
      high             normal
_______|________     ____|_____
|  |  |  |  |  |     |  |  |  |
85 86 90 91 95 96    65 70 75 80

17 Concept hierarchies (cont.)
We may also extend the sets of numbers or replace them with intervals, which will make the hierarchy complete (covering all possible values). For example, humidity may look like this:

        all
    _____|_____
    |         |
   high     normal
    |         |
 [85,96]   [65,84]
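An interval-based hierarchy can be expressed as a simple lookup function. The Python sketch below uses the two humidity intervals shown above; the function name `humidity_level` and the choice to raise `ValueError` for readings outside both intervals are assumptions made for illustration.

```python
def humidity_level(humidity):
    """Map a humidity reading to its parent in the concept hierarchy.

    Intervals follow the hierarchy above: high = [85, 96], normal = [65, 84].
    """
    if 85 <= humidity <= 96:
        return "high"
    if 65 <= humidity <= 84:
        return "normal"
    # Assumption: readings outside both intervals are rejected.
    raise ValueError(f"humidity {humidity} is outside the hierarchy")


# 88 never occurs in the table, but the interval still classifies it.
print(humidity_level(88))  # high
print(humidity_level(70))  # normal
```

Unlike the enumerated sets, the intervals also cover values such as 88 that do not appear in the recorded data.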

18 Concept hierarchies (cont.)
2- For the nominal (non-numeric) attributes outlook and windy we define one-level hierarchies, as their values cannot be ordered or grouped.

outlook:
         all
  ________|_________
  |       |        |
sunny   rainy   overcast

19 Concept hierarchies (cont.)

windy:
     all
  ____|____
  |       |
 true   false

20 Data cube
The number of dimensions determines the total number of data cubes that can be created: with N attributes there are 2^N cubes, one for each subset of the attributes.

21 Data cube (cont.)
To create a data cube we have to:
1- Select dimensions, that is, select a subset of attributes.
   - For example, select time and temperature. Thus we will create a two-dimensional data cube.
2- Select levels in the concept hierarchies.
   - For example, let us select weeks for time and degrees for temperature.
3- Select a measure to populate the cube. This is the attribute whose values will be aggregated across the dimensions (it has to be numeric).
   - For example, let us select play.

22 Data cube (cont.)
By placing the time values in the rows and the temperature values in the columns we get the following cube:

          64  65  68  69  70  71  72  75  80  81  83  85
week 1     1   0   1   0   1   0   0   0   0   0   1   0
week 2     0   0   0   1   0   0   1   2   0   1   0   0

The numbers in the internal cells are obtained by adding up the values of the play attribute over the rows where the time and temperature attributes are equal to the values in the corresponding row and column. For example, the value 2 (row 2, column 8) means that tennis was played on two days during week 2 when the temperature was 75.
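The three steps (select dimensions, select levels, select a measure) can be sketched in Python as follows. The nested-dictionary layout `cube[week][temperature]` is only one possible representation, chosen here for readability.

```python
# (time, temperature, play) with play already encoded as 1 / 0.
ROWS = [
    (1, 85, 0), (2, 80, 0), (3, 83, 1), (4, 70, 1), (5, 68, 1),
    (6, 65, 0), (7, 64, 1), (8, 72, 0), (9, 69, 1), (10, 75, 1),
    (11, 75, 1), (12, 72, 1), (13, 81, 1), (14, 71, 0),
]

# Steps 1 and 2: dimensions time (week level) and temperature (degree level).
TEMPS = sorted({temp for _, temp, _ in ROWS})
cube = {week: {temp: 0 for temp in TEMPS} for week in (1, 2)}

# Step 3: the measure is play, aggregated by summation.
for day, temp, play in ROWS:
    week = (day - 1) // 7 + 1
    cube[week][temp] += play

print([cube[1][t] for t in TEMPS])  # [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
print(cube[2][75])                  # 2
```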

23 OLAP operations
Roll-up: assume we want to change the level that we selected for the temperature hierarchy to the intermediate level (hot, mild, cool). Roll-up produces the following cube:

          cool  mild  hot
week 1       2     1    1
week 2       1     3    1
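Roll-up does not need the original rows: it can be computed from the finer cube by summing the cells that share a parent in the hierarchy. The Python sketch below assumes the degree-level cube and the temperature hierarchy given earlier; `roll_up`, `BASE_CUBE` and `TEMP_CLASS` are illustrative names.

```python
# Degree-level cube: one count per temperature value, per week.
TEMPS = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
BASE_CUBE = {
    "week 1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "week 2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}

# Intermediate level of the temperature concept hierarchy.
TEMP_CLASS = {
    "cool": {64, 65, 68, 69},
    "mild": {70, 71, 72, 75},
    "hot": {80, 81, 83, 85},
}


def roll_up(base_cube):
    """Sum degree-level cells into their parent class (cool / mild / hot)."""
    parent = {t: cls for cls, members in TEMP_CLASS.items() for t in members}
    rolled = {}
    for week, counts in base_cube.items():
        row = {cls: 0 for cls in TEMP_CLASS}
        for temp, count in zip(TEMPS, counts):
            row[parent[temp]] += count
        rolled[week] = row
    return rolled


print(roll_up(BASE_CUBE)["week 2"])  # {'cool': 1, 'mild': 3, 'hot': 1}
```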

24 OLAP operations (cont.)
Drill-down: drilling down the previous data cube over the time dimension produces the following:

25 OLAP operations (cont.)

          cool  mild  hot
day 1        0     0    0
day 2        0     0    0
day 3        0     0    1
day 4        0     1    0
day 5        1     0    0
day 6        0     0    0
day 7        1     0    0
day 8        0     0    0
day 9        1     0    0
day 10       0     1    0
day 11       0     1    0
day 12       0     1    0
day 13       0     0    1
day 14       0     0    0
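Unlike roll-up, drill-down needs the more detailed data, because week totals cannot be split back into days. The Python sketch below rebuilds the day-level cube from the rows and then checks that rolling it back up to weeks gives the earlier week-level counts. The helper `temp_class` is a hypothetical name that encodes the temperature hierarchy.

```python
# (time, temperature, play) with play encoded as 1 / 0.
ROWS = [
    (1, 85, 0), (2, 80, 0), (3, 83, 1), (4, 70, 1), (5, 68, 1),
    (6, 65, 0), (7, 64, 1), (8, 72, 0), (9, 69, 1), (10, 75, 1),
    (11, 75, 1), (12, 72, 1), (13, 81, 1), (14, 71, 0),
]
CLASSES = ("cool", "mild", "hot")


def temp_class(temp):
    """Parent of a temperature value in the concept hierarchy."""
    if temp in {64, 65, 68, 69}:
        return "cool"
    if temp in {70, 71, 72, 75}:
        return "mild"
    return "hot"  # remaining values in the table: 80, 81, 83, 85


# Drill-down: one row per day instead of one row per week.
day_cube = {day: {c: 0 for c in CLASSES} for day, _, _ in ROWS}
for day, temp, play in ROWS:
    day_cube[day][temp_class(temp)] += play

# Consistency check: rolling the days back up gives the week-level cube.
week_cube = {week: {c: 0 for c in CLASSES} for week in (1, 2)}
for day, row in day_cube.items():
    for c, count in row.items():
        week_cube[(day - 1) // 7 + 1][c] += count

print(day_cube[3])   # {'cool': 0, 'mild': 0, 'hot': 1}
print(week_cube[2])  # {'cool': 1, 'mild': 3, 'hot': 1}
```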

26 Lattice of cubes, slice and dice operations
Lattice: there are five dimensions: time, outlook, temperature, humidity and windy.

27 Lattice of cubes, slice and dice operations (cont.)
- 0-D (apex) cuboid: {all}
- 1-D cuboids: {Time}, {Outlook}, {Temperature}, {Humidity}, {Windy}
- 2-D cuboids: {Time, Outlook}, {Time, Temperature}, {Time, Humidity}, {Time, Windy}, {Outlook, Temperature}, {Outlook, Humidity}, {Outlook, Windy}, {Temperature, Humidity}, {Temperature, Windy}, {Humidity, Windy}

28 Lattice of cubes, slice and dice operations (cont.)
- 3-D cuboids: {Time, Outlook, Temperature}, {Time, Outlook, Humidity}, {Time, Outlook, Windy}, {Time, Temperature, Humidity}, {Time, Temperature, Windy}, {Time, Humidity, Windy}, {Outlook, Temperature, Humidity}, {Outlook, Temperature, Windy}, {Outlook, Humidity, Windy}, {Temperature, Humidity, Windy}

29 Lattice of cubes, slice and dice operations (cont.)
- 4-D cuboids: {Time, Outlook, Temperature, Humidity}, {Time, Outlook, Temperature, Windy}, {Time, Outlook, Humidity, Windy}, {Time, Temperature, Humidity, Windy}, {Outlook, Temperature, Humidity, Windy}
- 5-D (base) cuboid: {Time, Outlook, Temperature, Humidity, Windy}
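The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the five dimensions. The Python sketch below uses `itertools.combinations` from the standard library; the dictionary names are illustrative.

```python
from itertools import combinations

DIMENSIONS = ["Time", "Outlook", "Temperature", "Humidity", "Windy"]

# lattice[k] lists every k-dimensional cuboid as a tuple of dimension names.
lattice = {
    k: list(combinations(DIMENSIONS, k))
    for k in range(len(DIMENSIONS) + 1)
}

counts = {k: len(cuboids) for k, cuboids in lattice.items()}
total = sum(counts.values())

print(counts)  # {0: 1, 1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(total)   # 32, that is 2 ** 5
```

The empty tuple in `lattice[0]` plays the role of the apex cuboid {all}.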

30 Lattice of cubes, slice and dice operations (cont.)
There are two other OLAP operations that are related to the selection of a cube: slice and dice.
Slice: performs a selection on one dimension of the given cube, resulting in a subcube.
- For example, if we make the selection (temperature = cool) we reduce the dimensions of the cube from two to one, leaving just a single column from the previous table. The result is as follows:

31 Lattice of cubes, slice and dice operations (cont.)

          cool
day 1        0
day 2        0
day 3        0
day 4        0
day 5        1
day 6        0
day 7        1
day 8        0
day 9        1
day 10       0
day 11       0
day 12       0
day 13       0
day 14       0
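A slice can be sketched in Python as fixing one value on one dimension and keeping the remaining dimension. The example assumes the day-by-temperature-class cube from the drill-down step; `slice_cube` is a hypothetical helper name.

```python
COLUMNS = ("cool", "mild", "hot")

# Day-level cube: day -> (cool, mild, hot) counts of days played.
DAY_CUBE = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0), 5: (1, 0, 0),
    6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0), 9: (1, 0, 0), 10: (0, 1, 0),
    11: (0, 1, 0), 12: (0, 1, 0), 13: (0, 0, 1), 14: (0, 0, 0),
}


def slice_cube(cube, temperature):
    """Select one temperature value; the result has only the time dimension."""
    index = COLUMNS.index(temperature)
    return {day: row[index] for day, row in cube.items()}


cool_slice = slice_cube(DAY_CUBE, "cool")
print([day for day, count in cool_slice.items() if count])  # [5, 7, 9]
```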

32 Lattice of cubes, slice and dice operations (cont.)
The dice operation works similarly and performs a selection on two or more dimensions.
- For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the previous day-level cube we get the following subcube (still two-dimensional):

          cool  hot
day 3        0    1
day 4        0    0
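Dice differs from slice in that it restricts two or more dimensions at once and keeps all of them in the result. A minimal Python sketch, again assuming the day-by-temperature-class cube and using the hypothetical helper `dice`:

```python
COLUMNS = ("cool", "mild", "hot")

# Day-level cube: day -> (cool, mild, hot) counts of days played.
DAY_CUBE = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0), 5: (1, 0, 0),
    6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0), 9: (1, 0, 0), 10: (0, 1, 0),
    11: (0, 1, 0), 12: (0, 1, 0), 13: (0, 0, 1), 14: (0, 0, 0),
}


def dice(cube, days, temperatures):
    """Keep only the chosen days and temperature classes (still 2-D)."""
    return {
        day: {t: cube[day][COLUMNS.index(t)] for t in temperatures}
        for day in days
    }


subcube = dice(DAY_CUBE, days=(3, 4), temperatures=("cool", "hot"))
print(subcube)  # {3: {'cool': 0, 'hot': 1}, 4: {'cool': 0, 'hot': 0}}
```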

33 The End

