Data Warehousing. Make the right decisions for your organization –Rapid access to all kinds of information –Research and analyze the past data –Identify.

1 Data Warehousing

2 Make the right decisions for your organization
–Rapid access to all kinds of information
–Research and analyze past data
–Identify and predict future trends
The construction of a data warehouse
–Involves data cleaning and data integration
–Provides on-line analytical processing (OLAP) tools for the interactive analysis of data
W.H. Inmon
–A data warehouse is a subject-oriented, integrated, time-variant (time-dependent) and non-volatile collection of data in support of management’s decision-making process

3 Characteristics of Data Warehouse
Subject-oriented
–A data warehouse is designed for decision support and organized around major subjects, such as customer and sales
–Not all information in the operational database is useful
Integrated
–Integrates multiple heterogeneous sources and makes them consistent
–Data from different sources may use different names for the same entities
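As a minimal sketch of the integration point above, the Python below renames fields from two source systems, which use different names for the same entity, onto one consistent schema. The source names (`crm`, `billing`), the field names, and the sample records are assumptions made for illustration; none of them come from the presentation.

```python
# Hypothetical field mappings: each source names the same attributes differently.
FIELD_MAPS = {
    "crm": {"cust_id": "customer_id", "cust_name": "customer_name"},
    "billing": {"client_no": "customer_id", "client": "customer_name"},
}


def integrate(source, record):
    """Rename a source record's fields to the warehouse's unified names."""
    mapping = FIELD_MAPS[source]
    return {mapping[field]: value for field, value in record.items()}


# Illustrative records, not real data.
crm_row = {"cust_id": 7, "cust_name": "Ada"}
billing_row = {"client_no": 7, "client": "Ada"}

unified = [integrate("crm", crm_row), integrate("billing", billing_row)]
print(unified[0] == unified[1])  # compare the two records after renaming
```

Real integration also has to reconcile units, encodings, and conflicting values; this sketch covers only the naming problem the slide mentions.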

4 Characteristics of Data Warehouse
Time-dependent
–Records the information together with the time when it was entered
–Data mining can be performed on the data from a chosen period of time
Non-volatile
–Once loaded, data in a data warehouse is not updated in place
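The two characteristics above can be sketched together as an append-only table whose rows carry the date they were entered. The class name `WarehouseTable`, its methods, and the sample rows are hypothetical and chosen only for illustration.

```python
from datetime import date


class WarehouseTable:
    """Append-only table: rows are stamped with a load date and never changed."""

    def __init__(self):
        self._rows = []

    def load(self, row, loaded_on):
        # Store a copy of the row together with the time it was entered.
        self._rows.append({**row, "loaded_on": loaded_on})

    def between(self, start, end):
        # Read-only query over a period of time; returns copies of the rows.
        return [dict(r) for r in self._rows if start <= r["loaded_on"] <= end]


table = WarehouseTable()
table.load({"customer": "Ada", "city": "Vancouver"}, date(2023, 1, 10))
# A change of address is recorded as a new row; the old row is kept as history.
table.load({"customer": "Ada", "city": "Toronto"}, date(2024, 3, 5))

rows_2023 = table.between(date(2023, 1, 1), date(2023, 12, 31))
print(rows_2023)
```

Because nothing is overwritten, a query restricted to 2023 still sees the customer's earlier city.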

5 Data Warehousing
Data warehousing
–The process of constructing and using a data warehouse
Two types of databases
–Operational database: a large database in operation, built for high speed and a large number of users
–Data warehouse: designed for decision support; contains vast amounts of historical data
Data mart
–A departmental subset of the data warehouse that focuses on selected subjects; its scope is department-wide

6 OLTP & OLAP System
OLTP (On-Line Transaction Processing) System
–The major task of an operational database is to perform on-line transaction and query processing
OLAP (On-Line Analytical Processing) System
–A data warehouse system serves users in data analysis and decision making

7 Differences ~ OLTP & OLAP
Characteristic
–OLTP: operational processing
–OLAP: informational processing
Orientation
–OLTP: transaction-oriented
–OLAP: analysis-oriented
User
–OLTP: customer, DBA
–OLAP: manager, analyst
Function
–OLTP: day-to-day operations
–OLAP: information requirement, decision support

8 Differences ~ OLTP & OLAP
DB design
–OLTP: ER based, application-oriented
–OLAP: star/snowflake, subject-oriented
Data
–OLTP: current; guaranteed up-to-date
–OLAP: historical
Unit of work
–OLTP: short, simple query
–OLAP: complex query
Access
–OLTP: read/write
–OLAP: mostly read
DB size
–OLTP: 100 MB to GB
–OLAP: 100 GB to TB
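To make the "unit of work" and "access" rows concrete, here is a small sketch using Python's built-in `sqlite3` module. For brevity, a single in-memory table stands in for both systems, which is a simplification; the table name, columns, and figures are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, year INTEGER, item TEXT, dollars REAL)"
)
conn.executemany(
    "INSERT INTO sales (year, item, dollars) VALUES (?, ?, ?)",
    [(2023, "phone", 100.0), (2023, "computer", 250.0), (2024, "phone", 120.0)],
)

# OLTP-style unit of work: a short read/write transaction touching one row.
with conn:
    conn.execute("UPDATE sales SET dollars = 110.0 WHERE id = 1")

# OLAP-style unit of work: a read-only aggregate over the historical data.
totals = conn.execute(
    "SELECT year, SUM(dollars) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print(totals)
```

The first statement changes one current record; the second scans every row to summarize it, which is why the two workloads are usually kept on separate systems.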

9 Differences ~ OLTP & OLAP

10 Data Warehousing
–Multidimensional data model: star schema or snowflake schema
–Relational data model: relational schema

11 Model & Schema for Relational Database Relational Schema Relational Data Model

12 Multidimensional Data Model
Example: AllElectronics creates a sales data warehouse in order to keep records of the store’s sales
–Fact table: sales amount in dollars and number of units sold (the measures)
–Dimension tables: time, item, branch, and location
The multidimensional data model views data in the form of a data cube
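The relationship between a fact table, its dimension tables, and the resulting data cube can be sketched in plain Python. The keys and sales figures below are invented for illustration, and the branch dimension is left out to keep the example short.

```python
from collections import defaultdict

# Dimension tables: key -> descriptive attributes (illustrative values).
time_dim = {1: {"quarter": "Q1"}, 2: {"quarter": "Q2"}}
item_dim = {10: {"item_name": "phone"}, 11: {"item_name": "computer"}}
location_dim = {100: {"city": "Vancouver"}, 101: {"city": "Toronto"}}

# Fact table rows: (time_key, item_key, location_key, dollars_sold, units_sold).
sales_fact = [
    (1, 10, 100, 500.0, 5),
    (1, 10, 100, 300.0, 3),
    (1, 11, 100, 900.0, 1),
    (2, 10, 101, 400.0, 4),
]

# The cube maps each (quarter, item, city) cell to its aggregated measures.
cube = defaultdict(lambda: {"dollars_sold": 0.0, "units_sold": 0})
for time_key, item_key, location_key, dollars, units in sales_fact:
    cell = (
        time_dim[time_key]["quarter"],
        item_dim[item_key]["item_name"],
        location_dim[location_key]["city"],
    )
    cube[cell]["dollars_sold"] += dollars
    cube[cell]["units_sold"] += units

for cell, measures in sorted(cube.items()):
    print(cell, measures)
```

Each fact row carries only keys and measures; the descriptive attributes live in the dimension tables and are looked up when the cube is built.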

13 Two Dimensions
2-D view of sales data for items sold per quarter in the city of Vancouver. The measure is dollars_sold (in thousands)
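A 2-D view like this one can be derived by fixing the location at one city and tabulating the measure by the remaining two dimensions. The sketch below uses made-up figures, not the numbers from the original slide, and the function name `two_d_view` is hypothetical.

```python
# (quarter, item, city, dollars_sold in thousands); illustrative numbers only.
facts = [
    ("Q1", "phone", "Vancouver", 14),
    ("Q1", "computer", "Vancouver", 825),
    ("Q2", "phone", "Vancouver", 31),
    ("Q2", "computer", "Vancouver", 952),
    ("Q1", "phone", "Toronto", 43),
]


def two_d_view(rows, city):
    """Keep one city's rows, then total dollars_sold by quarter and item."""
    view = {}
    for quarter, item, row_city, dollars in rows:
        if row_city == city:
            by_item = view.setdefault(quarter, {})
            by_item[item] = by_item.get(item, 0) + dollars
    return view


vancouver = two_d_view(facts, "Vancouver")
for quarter, by_item in vancouver.items():
    print(quarter, by_item)
```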

14 Three Dimensions
3-D view of sales data according to the dimensions time, item and location. The measure is dollars_sold (in thousands)

15 Three Dimensions 3-D data cube representation according to the dimensions time, item and location. The measure is dollars_sold (in thousands)

16 Four Dimensions 4-D data cube of sales data according to the dimensions supplier, time, item and location. The measure is dollars_sold (in thousands)

17 Schemas for Multidimensional Data Model
–Star Schema
–Snowflake Schema
–Fact Constellation Schema

18 Star Schema

19 Snowflake Schema

20 In a snowflake schema, some dimension tables are normalized to reduce redundancy and save storage space
–This reduces the effectiveness of browsing, since more joins are needed to execute a query
–The space saved is negligible in comparison to the size of the fact table
–The snowflake schema is not as popular as the star schema in data warehouse design
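The extra-join cost of normalization can be seen in a small sketch with Python's built-in `sqlite3` module. The table names, columns, and figures are assumptions for illustration: the same location dimension is stored once as a single star-style table and once split snowflake-style into city and country tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized location dimension.
CREATE TABLE location_star (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Snowflake: the same dimension normalized into two tables.
CREATE TABLE country (country_key INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE location_snow (location_key INTEGER PRIMARY KEY, city TEXT, country_key INTEGER);
CREATE TABLE sales_fact (location_key INTEGER, dollars_sold REAL);

INSERT INTO location_star VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada'), (3, 'Chicago', 'USA');
INSERT INTO country VALUES (1, 'Canada'), (2, 'USA');
INSERT INTO location_snow VALUES (1, 'Vancouver', 1), (2, 'Toronto', 1), (3, 'Chicago', 2);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 50.0), (3, 70.0);
""")

# Star schema: one join from the fact table reaches the country attribute.
star = conn.execute("""
    SELECT l.country, SUM(f.dollars_sold) FROM sales_fact f
    JOIN location_star l ON l.location_key = f.location_key
    GROUP BY l.country ORDER BY l.country
""").fetchall()

# Snowflake schema: the same question needs a second join.
snowflake = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold) FROM sales_fact f
    JOIN location_snow l ON l.location_key = f.location_key
    JOIN country c ON c.country_key = l.country_key
    GROUP BY c.country ORDER BY c.country
""").fetchall()

print(star)
print(snowflake)
```

Both queries return the same totals; the snowflake version stores 'Canada' once instead of twice, at the price of one more join per query.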

21 Fact Constellation Schema
–Multiple fact tables share dimension tables

22 OLAP Technologies

23 Concept Hierarchies
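The slide text gives only the title here. As a hedged illustration of the general idea, a concept hierarchy maps low-level values (such as cities) to higher-level ones (such as countries), and rolling up along it aggregates a measure to the coarser level. The hierarchy, the figures, and the function name `roll_up` below are assumptions, not content from the presentation.

```python
# Hypothetical location hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up(sales_by_city, hierarchy):
    """Aggregate a per-city measure to the parent level of the hierarchy."""
    totals = {}
    for city, dollars in sales_by_city.items():
        parent = hierarchy[city]
        totals[parent] = totals.get(parent, 0) + dollars
    return totals


sales_by_city = {"Vancouver": 100, "Toronto": 50, "Chicago": 70}
by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
print(by_country)
```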

24

25

26 Three-Tier DW Architecture

27 Case Study in Data Warehousing

28 Company Profile

29

30

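31 Background
Company A built its relational database system using the traditional E-R model
Company A found that this database system could not give senior executives timely access to useful information for analysis and decision making
–The traditional E-R data model is highly efficient at maintaining data consistency and avoiding data duplication
–Multidimensional queries with multiple constraints and multiple joins not only lengthen query time but also compete for system resources, overloading the system and creating bottlenecks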

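32 Background
Company A decided to solve these problems with a database system designed around the multidimensional data model
–Build a data warehouse (Data Warehousing)
–All constraints are satisfied in one pass without a large number of join operations, and the user interface is friendlier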

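33 Steps for Building a Multidimensional Database
–Understand the business processes and requirements as the foundation for the design; this can be learned through customer interviews, reading the transaction system documentation, and analyzing the existing processes
–Define what the fact table should contain, taking care that it satisfies the requirements defined in the first step
–Identify the users' analytical perspectives and the hierarchy of levels within each perspective; these become the dimension tables
–Define the measures of the fact table; these are the values that each dimension may draw on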

34

35

36 Cause-and-Effect Diagram

37

38 Building the Multidimensional Database

39

40

41

42

43

44

45

46

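47 Building the Multidimensional Database
–The remaining tables are built in the same way.
–In the end, a total of 20 fact tables and dozens of dimension tables were produced.
–These tables serve as the input to an OLAP system or a data mining system.
–Only with these systems can we obtain further statistics and knowledge as output.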

48 Design of Data Warehouse
How can I design a data warehouse?
–Top-down approach
–Bottom-up approach
–Combination of both
In general, the warehouse design process consists of the following steps
–Choose a business process to model
–Choose the grain of the business process
–Choose the dimensions
–Choose the measures
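The four design steps can be captured as a small record, from which a fact table's columns follow directly. This is a sketch under stated assumptions: the class name `WarehouseDesign`, the `_key` naming convention, and the grain text are hypothetical, while the dimensions and measures echo the AllElectronics example on slide 12.

```python
from dataclasses import dataclass, field


@dataclass
class WarehouseDesign:
    business_process: str                            # step 1: the process to model
    grain: str                                       # step 2: what one fact row represents
    dimensions: list = field(default_factory=list)   # step 3
    measures: list = field(default_factory=list)     # step 4

    def fact_table_columns(self):
        """One foreign key per dimension, followed by the measures."""
        return [f"{name}_key" for name in self.dimensions] + list(self.measures)


design = WarehouseDesign(
    business_process="sales",
    grain="one row per item per branch per day",  # assumed grain, for illustration
    dimensions=["time", "item", "branch", "location"],
    measures=["dollars_sold", "units_sold"],
)
print(design.fact_table_columns())
```

Fixing the grain before the dimensions and measures matters because it determines what a single fact row means and therefore which measures can be stored at that level.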

49 Other Application Examples

50

51

