Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.


1 Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing (DW) Week 2 Data Warehouse Architecture

2 ISCG6425 Data Warehousing (by Sira Yongchareon) Department of Computing, Faculty of Creative Industries and Business
Outline
- DW System Architecture
- DW Characteristics
- DW Schema

3 DW Architecture
The design steps: (1) analyse requirements for the DW → (2) analyse the source data → (3) design the ETL

4 DW Architecture: Basic Architecture
Summary data are very valuable in data warehouses because they pre-compute long operations in advance. In Oracle, a summary is called a materialized view.
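The idea of pre-computed summary data can be sketched as follows. This is a minimal illustration, not Oracle syntax: it runs on SQLite through Python's `sqlite3` module, and because SQLite has no materialized views, a summary table built with `CREATE TABLE ... AS SELECT` stands in for one. The table names, column names, and figures are hypothetical.

```python
import sqlite3

# Hypothetical detail table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("pen", "north", 2.0), ("pen", "south", 3.0), ("ink", "north", 5.0)],
)

# Pre-compute the aggregate once and store the result as a summary table.
# SQLite has no materialized views, so CREATE TABLE ... AS SELECT stands in.
conn.execute(
    "CREATE TABLE sales_by_product AS "
    "SELECT product, SUM(amount) AS total FROM sales GROUP BY product"
)

# Later queries read the small summary instead of re-scanning the detail rows.
summary = dict(conn.execute("SELECT product, total FROM sales_by_product"))
print(summary)
```

Unlike a real materialized view, this summary table is not refreshed automatically when new detail rows arrive; it would have to be rebuilt as part of the load process.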

5 DW Architecture: + Staging Area
A staging area simplifies building summaries and general warehouse management. It is used to integrate, clean, and process operational data before putting it into the warehouse.

6 DW Architecture: Staging + Data Marts
Data marts are systems designed for a particular line of business. They let you customize your warehouse's architecture for different groups within your organization. For example, a financial analyst might want to analyse historical data for purchases and sales.

7 DW Architecture: Data Marts + Cubes
Data marts are systems designed for a particular line of business. They let you customize your warehouse's architecture for different groups within your organization. For example, a financial analyst might want to analyse historical data for purchases and sales.

8 DW Data Characteristics
A DW is an
- integrated,
- subject-oriented,
- time-variant,
- non-volatile
collection of data that provides support for decision making.

9 DW Data Characteristics: Integrated
- A DW is a centralized, consolidated database
- A DW integrates data derived from the entire organization, from multiple sources with diverse data formats. For example:
  - A customer's country can be "New Zealand" or "Australia" in one database but "NZ" or "AU" in another
  - The status of an "Order" can be "open", "received", or "cancelled" for one department but "1", "2", or "3" for another
  - Some sources store data in relational databases, others in file systems
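A minimal sketch of how such differing formats might be conformed during integration. The lookup tables reuse the slide's example values, but which numeric code corresponds to which status is an assumption, and the function and field names are hypothetical.

```python
# Hypothetical lookup tables. The values mirror the slide's examples, but the
# pairing of numeric codes with status labels is an assumption.
COUNTRY_NAMES = {"NZ": "New Zealand", "AU": "Australia"}
STATUS_NAMES = {"1": "open", "2": "received", "3": "cancelled"}


def conform(record):
    """Return a copy of record with country and status in one common format."""
    country = record["country"].strip()
    status = str(record["status"]).strip()
    return {
        **record,
        # Values that are already in the target format pass through unchanged.
        "country": COUNTRY_NAMES.get(country, country),
        "status": STATUS_NAMES.get(status, status),
    }


source_a = {"order_id": 1, "country": "New Zealand", "status": "open"}
source_b = {"order_id": 2, "country": "AU", "status": 3}
conformed = [conform(source_a), conform(source_b)]
print(conformed)
```

Unknown codes are passed through rather than rejected here; a real load would more likely route them to an error table for review.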

10 DW Data Characteristics: Subject-oriented
- DW data are organized and summarized by topic
  - Sales by Product, Sales by Customer, Sales by Region, Sales by Team
- DW data are arranged and optimized to answer questions (queries) from diverse functional areas
  - What is the profit from sales by product and region?
  - Which 5 products sold best in the US?
  - List the names of customers who live in Australia and have bought the same product every month.

11 DW Data Characteristics
A data mart is a small, single-subject DW that provides decision support to a small group of people.

12 DW Data Characteristics: Time-variant
- DW data represent the flow of data through time
- DW data are recorded with a historical perspective
  - What was the profit from sales by product and region over the last 5 years?
  - Which 5 products sold best in the US during last Christmas?
  - List customers who live in Australia and have spent more than $1,000 per month in the last two quarters.

13 DW Data Characteristics: Non-volatile
- DW data are never updated or removed!! (read-only)
- New data are continually added
  - Requires a "really huge space" to store all the data
  - Requires "high performance databases" for faster, reliable access
  - Requires "scalable systems" (as the data grow very quickly)
- Think about DW data at eBay
  - 90 PB of data (90,000 TB) just for customer transactions
  - 100 TB of new data per day (as of May 2013)

14 Disadvantages of DW
- Underestimation of resources for data loading: the time required to extract, clean, and load the data into the warehouse is often underestimated.
- Hidden problems with source systems: problems in the source systems feeding the data warehouse may be identified only after years of going undetected.
- Required data not captured: the source systems may not capture data that is very important for the purpose of the data warehouse.

15 Disadvantages of DW (cont.)
- Data homogenization: forcing data from different sources into similar formats can result in the loss of some important value of the data.
- Data ownership: sensitive data owned by one department has to be loaded into the data warehouse for decision-making purposes, but that department may hesitate to share it with others.

16 Disadvantages of DW (cont.)
- High maintenance: maintenance costs are high when business processes or source systems change.
- Long-duration projects: building a DW can take several years.
- Complexity of integration: a significant amount of time is needed to determine how well the various data warehousing tools can be integrated into the overall solution that is needed.

17 DW Schema
DW Schema => OLAP Schema
- Snowflake schema
- Fact constellation schema
- Star schema

18 Snowflake Schema
A fact table with normalized dimensions
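A minimal sketch of what "normalized dimensions" means in practice, run on SQLite through Python's `sqlite3` module. The product dimension is split so that category details live in their own table, which means a query by category has to join through two levels. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical snowflake: the product dimension is normalized into a
-- separate category table instead of repeating the category name per product.
CREATE TABLE category_dim (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES category_dim(category_id)
);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim(product_id),
    sales_price REAL
);
INSERT INTO category_dim VALUES (1, 'Stationery');
INSERT INTO product_dim VALUES (10, 'Pen', 1), (11, 'Ink', 1);
INSERT INTO sales_fact VALUES (10, 2.0), (11, 5.0), (10, 3.0);
""")

# Reaching the category name takes two joins: fact -> product -> category.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.sales_price)
    FROM sales_fact f
    JOIN product_dim p ON f.product_id = p.product_id
    JOIN category_dim c ON p.category_id = c.category_id
    GROUP BY c.category_name
""").fetchall()
print(rows)
```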

19 Fact tables vs. Dimension tables
Fact tables
- Contain related measures
- Usually the largest tables
- Usually appended to
- Can contain detail or summary data
- Measures are usually additive
Dimension tables
- Contain descriptors
- Utilize business terminology
- Textual and discrete data
- Attributes through which the table measures are analyzed

20 Fact Constellation Schema
Multiple fact tables sharing many dimensions
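A minimal sketch of two fact tables sharing one dimension, run on SQLite through Python's `sqlite3` module. The `sales_fact` and `shipping_fact` tables and their columns are hypothetical. Each fact table is aggregated on its own before the results are lined up through the shared dimension, because joining the two fact tables to each other directly would multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical constellation: two fact tables share product_dim.
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact (product_id INTEGER, units_sold INTEGER);
CREATE TABLE shipping_fact (product_id INTEGER, units_shipped INTEGER);
INSERT INTO product_dim VALUES (1, 'Pen'), (2, 'Ink');
INSERT INTO sales_fact VALUES (1, 4), (1, 6), (2, 3);
INSERT INTO shipping_fact VALUES (1, 9), (2, 3);
""")

# Aggregate each fact table separately, then combine via the shared dimension.
rows = conn.execute("""
    SELECT p.product_name, s.sold, h.shipped
    FROM product_dim p
    JOIN (SELECT product_id, SUM(units_sold) AS sold
          FROM sales_fact GROUP BY product_id) s
      ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_shipped) AS shipped
          FROM shipping_fact GROUP BY product_id) h
      ON h.product_id = p.product_id
    ORDER BY p.product_name
""").fetchall()
print(rows)
```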

21 Star Schema
- A data-modeling technique used to map multidimensional decision-support data into a relational database
- Has only one fact table
- All dimensions are de-normalized!!

22 Normalized vs. De-normalized tables
Normalized tables
- No duplicate (redundant) data
- Good for Insert, Update, Delete (no inconsistencies)
- Slow for Select (requires complex joins)
De-normalized tables
- Faster access (less complex joins)
- But has duplicate data

23 Querying in Star Schema
Query data from the fact table with just a one-level join!!!

    SELECT product_name, SUM(sales_price)
    FROM sales_fact f, date_dim d, product_dim p, store_dim s, region_dim r
    WHERE f.date_id = d.date_id
      AND f.store_id = s.store_id
      AND f.region_id = r.region_id
      AND f.product_id = p.product_id
      AND store_city = 'Auckland'
      AND region_name = 'North Island'
    GROUP BY product_name

Question: Why is an OLTP schema usually in third normal form (3NF)? Why does a star schema try to do the reverse?
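The slide's query can be tried end to end with a small in-memory database. This is a sketch under assumptions: the slide gives only table aliases and a few column names, so the column lists, the `full_date` column, and the sample rows below are hypothetical, as is the placement of `store_city` in `store_dim` and `region_name` in `region_dim`. The query uses explicit `JOIN ... ON` syntax, which is equivalent to the comma-separated joins on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical star schema: one fact table, four de-normalized dimensions.
CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, store_city TEXT);
CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE sales_fact (
    date_id INTEGER, store_id INTEGER, region_id INTEGER,
    product_id INTEGER, sales_price REAL
);
INSERT INTO date_dim VALUES (1, '2014-03-03');
INSERT INTO product_dim VALUES (1, 'Pen'), (2, 'Ink');
INSERT INTO store_dim VALUES (1, 'Auckland'), (2, 'Dunedin');
INSERT INTO region_dim VALUES (1, 'North Island'), (2, 'South Island');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 2.0), (1, 1, 1, 1, 3.0), (1, 1, 1, 2, 5.0), (1, 2, 2, 1, 7.0);
""")

# Every dimension joins directly to the fact table: a one-level join.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.sales_price)
    FROM sales_fact f
    JOIN date_dim d ON f.date_id = d.date_id
    JOIN product_dim p ON f.product_id = p.product_id
    JOIN store_dim s ON f.store_id = s.store_id
    JOIN region_dim r ON f.region_id = r.region_id
    WHERE s.store_city = 'Auckland' AND r.region_name = 'North Island'
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

The Dunedin sale is filtered out by the `WHERE` clause, so only the Auckland rows contribute to the totals.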

24 Time dimension table
The time dimension is always there in OLAP!
- Columns can be very specific
- Every OLAP schema has a time dimension table, but its structure varies
- Most queries are related to "time"
  - Recall that time-variant is one of the four characteristics of a DW
- Time is normally used in WHERE and GROUP BY clauses
  - E.g., GROUP BY week or WHERE day = 'Monday'
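A minimal sketch of a time dimension table and the two kinds of query mentioned above, run on SQLite through Python's `sqlite3` module. The column set (`year`, `week`, `day_name`) is one hypothetical structure among many, the week number follows the ISO calendar, and the sample data assumes one sale of 10.0 per day over two weeks.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical time dimension; real ones often add month, quarter, holiday flags.
conn.execute(
    "CREATE TABLE time_dim (date_id INTEGER PRIMARY KEY, full_date TEXT, "
    "year INTEGER, week INTEGER, day_name TEXT)"
)
conn.execute("CREATE TABLE sales_fact (date_id INTEGER, sales_price REAL)")

start = date(2014, 3, 3)  # a Monday
for i in range(14):
    d = start + timedelta(days=i)
    iso_year, iso_week, _ = d.isocalendar()
    conn.execute(
        "INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
        (i + 1, d.isoformat(), iso_year, iso_week, d.strftime("%A")),
    )
    # One illustrative sale per day.
    conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (i + 1, 10.0))

# GROUP BY a time attribute: total sales per week.
weekly = conn.execute("""
    SELECT t.week, SUM(f.sales_price)
    FROM sales_fact f JOIN time_dim t ON f.date_id = t.date_id
    GROUP BY t.week ORDER BY t.week
""").fetchall()

# WHERE on a time attribute: total sales on Mondays only.
mondays = conn.execute("""
    SELECT SUM(f.sales_price)
    FROM sales_fact f JOIN time_dim t ON f.date_id = t.date_id
    WHERE t.day_name = 'Monday'
""").fetchone()[0]
print(weekly, mondays)
```

Storing `week` and `day_name` as plain columns is what makes these queries simple: the date arithmetic is done once at load time rather than in every query.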

25 What we will do in DW Assignment 1:
1. Analyse the data sources (OLTP databases)
2. Design the OLAP schema
3. Write SQL to do the ETL

26 Question?

