Published by Priscilla Marlene Perry. Modified over 9 years ago.
1
ISCG 6425 Data Warehousing (DW)
Week 2: Data Warehouse Architecture
Lecturer: Dr. Sira Yongchareon
Department of Computing, Faculty of Creative Industries and Business
Copyright © 2014, Sira Yongchareon
2
ISCG6425 Data Warehousing (by Sira Yongchareon) Department of Computing, Faculty of Creative Industries and Business
Outline
  DW System Architecture
  DW Characteristics
  DW Schema
3
DW Architecture
The design steps:
  (1) Analyse requirements for the DW
  (2) Analyse source data
  (3) Design ETL
4
DW Architecture: Basic Architecture
Summary data are very valuable in data warehouses because they pre-compute long operations in advance.
A summary in Oracle is called a materialized view.
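The idea of pre-computing a summary can be sketched as follows. This is a minimal illustration, not Oracle's materialized view feature: SQLite has no materialized views, so the sketch stores the aggregate in an ordinary table with CREATE TABLE ... AS SELECT. The table and column names (`sales`, `sales_summary`, `total_amount`, `n_sales`) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detail table with a few sample rows.
    CREATE TABLE sales (product TEXT, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('Widget', 'North', 10.0),
        ('Widget', 'North', 15.0),
        ('Widget', 'South', 20.0),
        ('Gadget', 'North', 5.0);

    -- Pre-compute the aggregate once and store the result as a summary table.
    CREATE TABLE sales_summary AS
        SELECT product, region,
               SUM(amount) AS total_amount,
               COUNT(*)    AS n_sales
        FROM sales
        GROUP BY product, region;
""")

# Later queries read the small summary table instead of re-scanning the detail rows.
summary_rows = conn.execute(
    "SELECT product, region, total_amount, n_sales "
    "FROM sales_summary ORDER BY product, region"
).fetchall()
print(summary_rows)
```

Unlike a real materialized view, this summary table is not refreshed automatically when new detail rows arrive; it would have to be rebuilt or updated as part of the load.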
5
DW Architecture: + Staging Area
A staging area simplifies building summaries and general warehouse management.
Operational data are integrated, cleaned, and processed there before being put into the warehouse.
6
DW Architecture: Staging + Data Marts
Data marts are systems designed for a particular line of business.
They let you customize your warehouse's architecture for different groups within your organization.
For example, a financial analyst might want to analyse historical data for purchases and sales.
7
DW Architecture: Data Marts + Cubes
Data marts are systems designed for a particular line of business.
They let you customize your warehouse's architecture for different groups within your organization.
For example, a financial analyst might want to analyse historical data for purchases and sales.
8
DW Data Characteristics
A DW is an integrated, subject-oriented, time-variant, non-volatile collection of data that provides support for decision making.
9
DW Data Characteristics: Integrated
A DW is a centralized, consolidated database.
A DW integrates data derived from the entire organization, from multiple sources with diverse data formats. For example:
  A customer's country can be "New Zealand" or "Australia" in one database, but "NZ" or "AU" in another.
  The status of an "Order" can be "open", "received", or "cancelled" for one department, but "1", "2", or "3" for another.
  Some sources store data in relational databases, others in file systems.
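The integration examples above can be sketched as a small cleaning step. This is only an illustration under stated assumptions: the slide does not say which numeric status corresponds to which label, so the mapping 1 = open, 2 = received, 3 = cancelled is assumed here, and the function names, record layout, and sample rows are hypothetical.

```python
# Assumed target conventions: two-letter country codes and textual status labels.
COUNTRY_CODES = {"new zealand": "NZ", "nz": "NZ", "australia": "AU", "au": "AU"}
STATUS_BY_CODE = {"1": "open", "2": "received", "3": "cancelled"}  # assumed mapping


def normalise_country(value):
    """Map a country name or code from any source to a two-letter code."""
    key = str(value).strip().lower()
    if key not in COUNTRY_CODES:
        raise ValueError(f"unknown country: {value!r}")
    return COUNTRY_CODES[key]


def normalise_status(value):
    """Map a numeric or textual order status to a textual label."""
    key = str(value).strip().lower()
    if key in STATUS_BY_CODE:
        return STATUS_BY_CODE[key]
    if key in STATUS_BY_CODE.values():
        return key
    raise ValueError(f"unknown status: {value!r}")


# Two hypothetical sources that encode the same facts differently.
source_a = [{"customer": "Ann", "country": "New Zealand", "status": "open"}]
source_b = [{"customer": "Bob", "country": "AU", "status": 3}]

integrated = [
    {
        "customer": row["customer"],
        "country": normalise_country(row["country"]),
        "status": normalise_status(row["status"]),
    }
    for row in source_a + source_b
]
print(integrated)
```

Raising an error on unknown values, rather than passing them through, keeps unmapped codes from reaching the warehouse unnoticed.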
10
DW Data Characteristics: Subject-oriented
DW data are organized and summarized by topic:
  Sales by Product, Sales by Customer, Sales by Region, Sales by Team
DW data are arranged and optimized to provide answers to questions (queries) from diverse functional areas:
  How much profit did sales generate by product and region?
  Which 5 products sold best in the US?
  Which customers living in Australia have bought the same product every month?
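A question such as "which 5 products sold best in the US" maps onto a filter, an aggregate, and a ranked limit. The sketch below assumes a single hypothetical `sales` table with `product`, `country`, and `quantity` columns and made-up rows; a real warehouse would spread these over fact and dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sales rows; the NZ row must not affect the US ranking.
    CREATE TABLE sales (product TEXT, country TEXT, quantity INTEGER);
    INSERT INTO sales VALUES
        ('A', 'US', 5), ('B', 'US', 9), ('C', 'US', 2), ('D', 'US', 7),
        ('E', 'US', 4), ('F', 'US', 1), ('A', 'US', 6), ('B', 'NZ', 50);
""")


def top_products(country, limit=5):
    """Return the best-selling products in one country, highest quantity first."""
    return conn.execute(
        """
        SELECT product, SUM(quantity) AS total_quantity
        FROM sales
        WHERE country = ?
        GROUP BY product
        ORDER BY total_quantity DESC, product
        LIMIT ?
        """,
        (country, limit),
    ).fetchall()


top_us = top_products("US")
print(top_us)
```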
11
DW Data Characteristics
A data mart is a small, single-subject DW that provides decision support to a small group of people.
12
DW Data Characteristics: Time-variant
DW data represent the flow of data through time.
DW data are recorded with a historical perspective:
  How much profit did sales generate by product and region in the last 5 years?
  Which 5 products sold best in the US during last Christmas?
  Which customers living in Australia have spent more than $1,000 per month in the last two quarters?
13
DW Data Characteristics: Non-volatile
DW data are never updated or removed (read-only).
New data are continually added.
  Requires a very large amount of storage space to hold all the data
  Requires high-performance databases for fast, reliable access
  Requires scalable systems (as the data grow very quickly)
Think about DW data at eBay:
  90 PB of data (90,000 TB) just for customer transactions
  100 TB of new data per day (as of May 2013)
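One way to picture the non-volatile property is a table that accepts inserts but rejects updates and deletes. The sketch below enforces this with SQLite triggers; it is an assumption about how the rule could be enforced, not something the slides prescribe, and the table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL);

    -- Reject any attempt to change or remove rows that are already loaded.
    CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
    BEGIN
        SELECT RAISE(ABORT, 'sales_fact is append-only');
    END;

    CREATE TRIGGER sales_fact_no_delete BEFORE DELETE ON sales_fact
    BEGIN
        SELECT RAISE(ABORT, 'sales_fact is append-only');
    END;
""")

# Appending new data is still allowed.
conn.execute("INSERT INTO sales_fact (amount) VALUES (10.0)")
conn.execute("INSERT INTO sales_fact (amount) VALUES (25.0)")


def is_rejected(statement):
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(statement)
    except sqlite3.DatabaseError:
        return True
    return False


update_rejected = is_rejected("UPDATE sales_fact SET amount = 0 WHERE sale_id = 1")
delete_rejected = is_rejected("DELETE FROM sales_fact WHERE sale_id = 1")
row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(update_rejected, delete_rejected, row_count)
```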
14
Disadvantages of DW
Underestimation of resources for data loading
  The time required to extract, clean, and load the data into the warehouse is often underestimated.
Hidden problems with source systems
  Hidden problems in the source systems feeding the data warehouse may be identified only after years of going undetected.
Required data not captured
  Data that are very important for the purpose of the data warehouse may not be captured by the source systems.
15
Disadvantages of DW (cont.)
Data homogenization
  Making data formats from different data sources similar can result in the loss of some important value of the data.
Data ownership
  Sensitive data owned by one department has to be loaded into the data warehouse for decision-making purposes, but the department may hesitate to share it with others.
16
Disadvantages of DW (cont.)
High maintenance
  Maintenance costs are high when business processes or source systems change.
Long-duration projects
  Building a DW can take several years.
Complexity of integration
  A significant amount of time is needed to determine how well the various data warehousing tools can be integrated into the overall solution that is needed.
17
DW Schema
DW Schema => OLAP Schema
  Snowflake schema
  Fact constellation schema
  Star schema
18
Snowflake Schema
A fact table with normalized dimensions
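A snowflake schema might look like the following sketch, in which the product dimension is normalized into a separate category table, so a query by category needs a two-level join from the fact table. All table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product dimension is normalized: category lives in its own table.
    CREATE TABLE category_dim (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE product_dim (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES category_dim(category_id)
    );
    CREATE TABLE sales_fact (
        product_id  INTEGER REFERENCES product_dim(product_id),
        sales_price REAL
    );

    INSERT INTO category_dim VALUES (1, 'Dairy'), (2, 'Bakery');
    INSERT INTO product_dim VALUES (10, 'Milk', 1), (11, 'Cheese', 1), (12, 'Bread', 2);
    INSERT INTO sales_fact VALUES (10, 3.0), (11, 8.0), (12, 4.0), (10, 3.0);
""")

# Fact -> product -> category: two joins to reach the category name.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_price)
    FROM sales_fact f
    JOIN product_dim p  ON f.product_id = p.product_id
    JOIN category_dim c ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(sales_by_category)
```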
19
Fact tables vs. Dimension tables
Fact tables
  Contain related measures
  Usually the largest tables
  Usually appended to
  Can contain detail or summary data
  Measures are usually additive
Dimension tables
  Contain descriptors
  Utilize business terminology
  Textual and discrete data
  Attributes through which the table measures are analyzed
20
Fact Constellation Schema
Multiple fact tables sharing many dimensions
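As a rough sketch of a fact constellation, the example below has two fact tables that share one product dimension. The second fact table (`shipping_fact`), all column names, and the rows are assumptions for illustration. Each fact table is aggregated separately before joining, so rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared dimension ...
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
    -- ... referenced by two separate fact tables.
    CREATE TABLE sales_fact    (product_id INTEGER, units_sold INTEGER);
    CREATE TABLE shipping_fact (product_id INTEGER, units_shipped INTEGER);

    INSERT INTO product_dim VALUES (1, 'Milk'), (2, 'Bread');
    INSERT INTO sales_fact VALUES (1, 5), (1, 3), (2, 4);
    INSERT INTO shipping_fact VALUES (1, 6), (1, 1);
""")

sold_vs_shipped = conn.execute("""
    SELECT p.product_name,
           COALESCE(s.sold, 0)    AS sold,
           COALESCE(h.shipped, 0) AS shipped
    FROM product_dim p
    LEFT JOIN (SELECT product_id, SUM(units_sold) AS sold
               FROM sales_fact GROUP BY product_id) s
           ON s.product_id = p.product_id
    LEFT JOIN (SELECT product_id, SUM(units_shipped) AS shipped
               FROM shipping_fact GROUP BY product_id) h
           ON h.product_id = p.product_id
    ORDER BY p.product_name
""").fetchall()
print(sold_vs_shipped)
```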
21
Star Schema
A data-modeling technique used to map multidimensional support data into a relational database
  Has only one fact table
  All dimensions are de-normalized
22
Normalized vs. De-normalized tables
Normalized tables
  No duplicate (redundant) data
  Good for Insert, Update, Delete (no inconsistencies)
  Slow for Select (requires complex joins)
De-normalized tables
  Faster access (less complex joins)
  But contain duplicate data
23
Querying in Star Schema
Data can be queried from the fact table with just a one-level join:

SELECT p.product_name, SUM(f.sales_price)
FROM sales_fact f, date_dim d, product_dim p, store_dim s, region_dim r
WHERE f.date_id = d.date_id
  AND f.store_id = s.store_id
  AND f.region_id = r.region_id
  AND f.product_id = p.product_id
  AND s.store_city = 'Auckland'
  AND r.region_name = 'North Island'
GROUP BY p.product_name

Question: Why is an OLTP schema usually in third normal form (3NF)? Why does a star schema try to do the reverse?
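To try the star-schema query end to end, the sketch below builds the five tables in an in-memory SQLite database and runs an equivalent query. Only the table names and the columns used in the query come from the slide; the remaining columns (such as `full_date`), the sample rows, and the use of explicit JOIN ... ON syntax with an added ORDER BY are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, store_city TEXT);
    CREATE TABLE region_dim  (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE sales_fact (
        date_id INTEGER, store_id INTEGER, region_id INTEGER,
        product_id INTEGER, sales_price REAL
    );

    INSERT INTO date_dim VALUES (1, '2014-03-03'), (2, '2014-03-04');
    INSERT INTO product_dim VALUES (1, 'Milk'), (2, 'Bread');
    INSERT INTO store_dim VALUES (1, 'Auckland'), (2, 'Wellington'), (3, 'Christchurch');
    INSERT INTO region_dim VALUES (1, 'North Island'), (2, 'South Island');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 1, 10.0),  -- Auckland, Milk
        (1, 1, 1, 2, 4.0),   -- Auckland, Bread
        (2, 1, 1, 1, 5.0),   -- Auckland, Milk
        (2, 2, 1, 1, 7.0),   -- Wellington, Milk (filtered out by city)
        (2, 3, 2, 2, 9.0);   -- Christchurch, Bread (filtered out by city and region)
""")

# Every dimension joins directly to the fact table: a one-level join.
auckland_sales = conn.execute("""
    SELECT p.product_name, SUM(f.sales_price)
    FROM sales_fact f
    JOIN date_dim d    ON f.date_id = d.date_id
    JOIN product_dim p ON f.product_id = p.product_id
    JOIN store_dim s   ON f.store_id = s.store_id
    JOIN region_dim r  ON f.region_id = r.region_id
    WHERE s.store_city = 'Auckland'
      AND r.region_name = 'North Island'
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(auckland_sales)
```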
24
Time dimension table
The time dimension is always present in OLAP.
  Every OLAP schema has a time dimension table, but its structure varies
  Columns can be very specific
Most queries are related to time.
  Recall that time-variant is one of the four characteristics of a DW
Time is normally used in WHERE and GROUP BY clauses.
  E.g., GROUP BY week, or WHERE day = 'Monday'
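A time dimension with pre-computed attributes makes both kinds of clause straightforward. The sketch below is one possible structure, not a prescribed one: the columns `day_name` and `iso_week`, the two-week date range, and the sales amounts are all assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE date_dim ("
    "date_id INTEGER PRIMARY KEY, full_date TEXT, day_name TEXT, iso_week INTEGER)"
)
conn.execute("CREATE TABLE sales_fact (date_id INTEGER, sales_price REAL)")

# Populate two weeks of dates, starting on Monday 3 March 2014,
# with one hypothetical sale per day worth date_id dollars.
start = date(2014, 3, 3)
for offset in range(14):
    day = start + timedelta(days=offset)
    date_id = offset + 1
    conn.execute(
        "INSERT INTO date_dim VALUES (?, ?, ?, ?)",
        (date_id, day.isoformat(), DAY_NAMES[day.weekday()], day.isocalendar()[1]),
    )
    conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (date_id, float(date_id)))

# GROUP BY week: total sales per ISO week.
weekly_sales = conn.execute("""
    SELECT d.iso_week, SUM(f.sales_price)
    FROM sales_fact f JOIN date_dim d ON f.date_id = d.date_id
    GROUP BY d.iso_week
    ORDER BY d.iso_week
""").fetchall()

# WHERE day = 'Monday': total sales on Mondays only.
monday_total = conn.execute("""
    SELECT SUM(f.sales_price)
    FROM sales_fact f JOIN date_dim d ON f.date_id = d.date_id
    WHERE d.day_name = 'Monday'
""").fetchone()[0]
print(weekly_sales, monday_total)
```

Storing the day name and week number as columns means queries can filter and group on them directly instead of computing them from the date each time.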
25
What will we do in DW Assignment 1?
  1. Analyse the data sources (OLTP databases)
  2. Design the OLAP schema
  3. Write SQL to do the ETL
26
Questions?