
1 Data Warehouse (CSC5301), Hachim Haddouti

2 About Me
- Hachim Haddouti, born in 1969, married, one baby (9 weeks old)
- Ph.D. in Computer Science (Database Management Systems) at Technical University of Munich under the supervision of Prof. Bayer (inventor of the B-tree)
- Master in Computer Science (Knowledge Management Systems) at Technical University of Berlin
- Project Manager at BMW, Munich, Germany
- Senior Consultant and Project Manager at DaimlerChrysler Services (now called T-Systems, Deutsche Telekom)
- Research Scientist with Prof. R. Bayer at Technical University of Munich
- UNESCO Consultant
- Visiting Scientist at Tsukuba University, Japan; University of California, Santa Barbara; University of Catania, Italy; Beijing Univ., China …
- Areas of interest: DBMS, digital libraries, document, content and knowledge management, XML databases and Web technologies, multilinguality, etc.
- More at www.haddouti.de

3 Do You Remember?
OLTP, DSS, MD, drill down, roll up, slice/dice, MOLAP, ROLAP, star schema, data mining, data cube, data extraction, fact table

4 Why DW?
- Mining of mobile phone calls: (Caller, Callee, Time, Duration, Geogr. Location), ~100 B/tuple
- In Germany: 10^7 users * 10 calls/(day*user) * 100 B/call = 10^10 B/day ~ 3*10^12 B/year = 3 TB/year
- Scanning this data at 10^7 B/s takes 3*10^12 / 10^7 = 3*10^5 s > 3 days
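The estimate above can be re-derived in a few lines. This is only a sketch of the slide's arithmetic: the input figures are the slide's own, the variable names are made up for illustration, and the year is taken as 365 days (the slide rounds the yearly volume down to 3*10^12 B).

```python
# Inputs taken from the slide; names are illustrative only.
USERS = 10**7
CALLS_PER_USER_PER_DAY = 10
BYTES_PER_CALL = 100
SCAN_RATE_BYTES_PER_S = 10**7
SECONDS_PER_DAY = 86_400

# Daily and yearly data volume in bytes.
bytes_per_day = USERS * CALLS_PER_USER_PER_DAY * BYTES_PER_CALL
bytes_per_year = bytes_per_day * 365

# Time for one full sequential scan of a year's data.
scan_seconds = bytes_per_year / SCAN_RATE_BYTES_PER_S
scan_days = scan_seconds / SECONDS_PER_DAY

print(f"{bytes_per_day:.2e} B/day, {bytes_per_year:.2e} B/year")
print(f"full scan: {scan_seconds:.2e} s = {scan_days:.1f} days")
```

Without rounding, the scan takes a little over four days, which is consistent with the slide's "> 3 days".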

5 Data Warehouses
- "Subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process" (Inmon); AP (analytical processing) is missing from this definition
- Used for analysis of existing data
- Resolves performance issues suffered by operational RDBMSs and OLTP systems

6 Data Warehouse Architecture (Figure 9.7)

7 Model
- Need an abstract model with the above operations
- Need suitable data structures
- Must handle very large databases
- Relational model? One-dimensional access via primary key; n:m "relationships" are 2-dimensional: (FK1, FK2)

8 The Multidimensional Data Model
Requirements: must support typical analyses and queries, such as sales of the product group "digital cameras" in Nov, Dec, Jan, Feb in the Munich area
- sorted by sales of each product in €
- sorted by sales in numbers
- sorted by shops
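To make the example query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names and all rows are invented for illustration; a real warehouse would hold this data in a dimensional schema rather than one flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_group TEXT, product TEXT, shop TEXT, "
    "region TEXT, month TEXT, units INTEGER, revenue_eur REAL)"
)
# Invented sample rows; the last three must be filtered out by the query.
rows = [
    ("digital cameras", "CamA", "Shop1", "Munich", "Nov", 5, 1500.0),
    ("digital cameras", "CamB", "Shop1", "Munich", "Dec", 10, 2000.0),
    ("digital cameras", "CamA", "Shop2", "Munich", "Jan", 2, 600.0),
    ("digital cameras", "CamB", "Shop2", "Berlin", "Dec", 7, 1400.0),
    ("skis", "SkiX", "Shop1", "Munich", "Dec", 3, 900.0),
    ("digital cameras", "CamA", "Shop1", "Munich", "Jun", 4, 1200.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Same filter every time; only the grouping key and sort order change.
QUERY = """
    SELECT {key}, SUM(units) AS total_units, SUM(revenue_eur) AS total_revenue
    FROM sales
    WHERE product_group = 'digital cameras'
      AND month IN ('Nov', 'Dec', 'Jan', 'Feb')
      AND region = 'Munich'
    GROUP BY {key}
    ORDER BY {order}
"""
by_revenue = conn.execute(
    QUERY.format(key="product", order="total_revenue DESC")).fetchall()
by_units = conn.execute(
    QUERY.format(key="product", order="total_units DESC")).fetchall()
by_shop = conn.execute(
    QUERY.format(key="shop", order="shop")).fetchall()

for label, result in [("by revenue", by_revenue),
                      ("by units", by_units),
                      ("by shop", by_shop)]:
    print(label, result)
```

The three result sets differ only in grouping and ordering, which is why a model that treats product, time, region and shop as dimensions fits such questions well.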

9 Data model
ER Model
- a disaster for querying a huge amount of data (time)
- not understandable for users, and cannot be navigated usefully by DBMS software
- hard to visualize; many possible connections between tables
- built to avoid redundancy
MD Model
- better performance
- better data organisation
- better visualization
- business queries (why, what if)

10 Typical DWH Analyses/Queries
- What are the consequences of new orders for production capacity with respect to investment, personnel, maintenance, extra hours, ...
- Seasonal adaptations, e.g. when to produce how many skis, bikinis, convertibles, ...
- Influence of external financing on profits

11 Operations
- aggregation
- slice
- dice (cube)
- rollup to coarser level
- drill down to more detailed level
- grouping
- sorting
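Aggregation, rollup, drill down, grouping and sorting can be sketched on a tiny in-memory fact list. The fact rows, the `aggregate` helper and the time hierarchy (month within year) are assumptions made for this illustration, not part of the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, product, units_sold).
facts = [
    (2023, "Nov", "camera", 5),
    (2023, "Dec", "camera", 10),
    (2023, "Dec", "ski", 3),
    (2024, "Jan", "camera", 2),
]

def aggregate(rows, key_fn):
    """Group rows by key_fn(row) and sum the measure (the last field)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[-1]
    return dict(totals)

# Most detailed level: one cell per (year, month, product).
by_month_product = aggregate(facts, lambda r: (r[0], r[1], r[2]))

# Rollup: drop the month to reach the coarser (year, product) level.
by_year_product = aggregate(facts, lambda r: (r[0], r[2]))

# Rollup again: totals per year only.
by_year = aggregate(facts, lambda r: (r[0],))

# Drill down: expand the 2023 total back into its detailed cells.
drill_2023 = {k: v for k, v in by_month_product.items() if k[0] == 2023}

# Sorting: rank the (year, product) cells by units, largest first.
ranked = sorted(by_year_product.items(), key=lambda kv: kv[1], reverse=True)

print(by_year)
print(drill_2023)
print(ranked)
```

Rollup and drill down are the same aggregation run at different levels of the time hierarchy; only the grouping key changes.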

12 Data Cube Representation

13 Slicing on Time Dimension

14 Dicing on Part Dimension
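The figures for these slides are not part of the transcript, so the following is only a sketch of the usual meaning of the two operations. It assumes a cube with dimensions (time, part, region) and invented cell values: a slice fixes one value on one dimension and yields a cube with one dimension less, while a dice keeps a subset of values on one or more dimensions.

```python
# Hypothetical cube: cells keyed by (time, part, region), value = sales.
DIMS = ("time", "part", "region")
cube = {
    ("Q1", "bolt", "north"): 10,
    ("Q1", "nut", "north"): 4,
    ("Q1", "bolt", "south"): 7,
    ("Q2", "bolt", "north"): 12,
    ("Q2", "gear", "south"): 9,
}

def slice_cube(cells, dim, value):
    """Fix one dimension to a single value and drop it from the keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def dice_cube(cells, allowed):
    """Keep cells whose coordinates lie in the allowed value sets.

    allowed maps a dimension name to the set of values to keep.
    """
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cells.items() if keep(k)}

# Slice on the time dimension: only Q1, keys shrink to (part, region).
q1 = slice_cube(cube, "time", "Q1")

# Dice on the part dimension: keep bolts and nuts, all three dimensions stay.
bolts_and_nuts = dice_cube(cube, {"part": {"bolt", "nut"}})

print(q1)
print(bolts_and_nuts)
```

Storing only the non-empty cells in a dictionary also keeps the sketch small when most combinations of dimension values never occur.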

15 Steps to build a DWH
- Acquisition of data
- Data cleansing
- Storage
- Processing: AP
- Maintenance, ...
Not possible with classical DB technology alone
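The first four steps can be illustrated with a toy pipeline. Everything here is an assumption made for the sketch: the raw call records, the cleansing rules and the use of an in-memory SQLite table as "storage". It shows the order of the steps, not the scale or technology a real warehouse needs.

```python
import sqlite3

# Acquisition: hypothetical raw records as a source system might deliver them.
raw = [
    {"caller": " A ", "duration_s": "60"},
    {"caller": "B", "duration_s": "n/a"},  # unparseable, dropped in cleansing
    {"caller": "a", "duration_s": "30"},
]

def cleanse(records):
    """Normalise caller ids and discard rows with a non-integer duration."""
    clean = []
    for rec in records:
        try:
            duration = int(rec["duration_s"])
        except ValueError:
            continue
        clean.append((rec["caller"].strip().upper(), duration))
    return clean

# Cleansing.
cleaned = cleanse(raw)

# Storage: load the cleansed rows into a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, duration_s INTEGER)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", cleaned)

# Processing (AP): aggregate total call time per caller.
totals = conn.execute(
    "SELECT caller, SUM(duration_s) FROM calls GROUP BY caller ORDER BY caller"
).fetchall()
print(totals)
```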

16 On-Line Analytical Processing
- OLTP (online transaction processing) for operational data of the enterprise, e.g. in relational DBMS, IMS, SAP/R3, ...
- DSS: Decision Support System to store data/information for strategic management decisions: aggregations, summaries, etc.
- Optimized to work with data warehouses
- Used to answer questions
- Allows users to perceive data as a multidimensional data cube
- Data mining

17 OLTP versus OLAP
Thematic focus
- OLTP: many small transactions (microscopic view of business processes, individual steps at lowest level, single order, delivery)
- OLAP: finances in general, personnel in general, ...
- OLAP requires integration and unification of many detailed data into the big picture
- Time orientation
- Durability: data extracted once, no updates

18 Technical Comparison OLTP vs OLAP
- OLTP: high rate of updates, several thousand transactions/s
- OLAP: read-only transactions, very complex; the DWH is loaded at certain time intervals, e.g. after the end of the month or quarter
  - Compute intensive
  - Special systems with new access methods, e.g. multidimensional data organization and access methods
  - Special OLAP systems necessary to offload OLTP systems

19 ROLAP and MOLAP
Solution 1: ROLAP (relational online analytical processing), built on top of a relational DBS, with additional middleware or client front end (star schema)
Solution 2: MOLAP (multidimensional online analytical processing)
- new model
- new data organizations
- new algorithms
- new query languages
- new optimization techniques
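As an illustration of the ROLAP approach, a star schema keeps one central fact table whose foreign keys point to small dimension tables, and an analytical query joins the fact table to the dimensions it needs. The sketch below uses Python's sqlite3 module; all table names, column names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT,
                          product_group TEXT);
CREATE TABLE dim_shop    (shop_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE fact_sales (
    time_id     INTEGER REFERENCES dim_time(time_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    shop_id     INTEGER REFERENCES dim_shop(shop_id),
    units       INTEGER,
    revenue_eur REAL
);
INSERT INTO dim_time VALUES (1, 'Nov', 2023), (2, 'Dec', 2023);
INSERT INTO dim_product VALUES (1, 'CamA', 'digital cameras'),
                               (2, 'SkiX', 'skis');
INSERT INTO dim_shop VALUES (1, 'Shop1', 'Munich'), (2, 'Shop2', 'Berlin');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 5, 1500.0),
    (2, 1, 1, 10, 3000.0),
    (2, 2, 1, 3, 900.0),
    (2, 1, 2, 7, 2100.0);
""")

# Star join: fact table in the middle, one join per dimension used.
result = conn.execute("""
    SELECT p.product_group, s.city, SUM(f.units), SUM(f.revenue_eur)
    FROM fact_sales f
    JOIN dim_time t    ON f.time_id = t.time_id
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_shop s    ON f.shop_id = s.shop_id
    WHERE t.year = 2023
    GROUP BY p.product_group, s.city
    ORDER BY p.product_group, s.city
""").fetchall()

for row in result:
    print(row)
```

The fact table stays narrow (keys plus measures), while descriptive attributes such as city or product group live in the dimension tables and drive the grouping.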

20 Data Warehouse Structure

21 Rules for OLAP Systems
- Multidimensional conceptual view
- Transparency
- Accessibility
- Consistent reporting performance
- Client/server architecture
- Generic dimensionality

22 Rules for OLAP Systems
- Dynamic sparse matrix handling
- Multiuser support
- Unrestricted, cross-dimensional operations
- Intuitive data manipulation
- Flexible reporting
- Unlimited dimensions and aggregation levels

