Download presentation
Presentation is loading. Please wait.
Published byOwen Whitehead Modified over 9 years ago
1
13 Chapter 13 The Data Warehouse Database Systems: Design, Implementation, and Management 4th Edition Peter Rob & Carlos Coronel
2
13 The Need for Data Analysis 4Constant pressure from external and internal forces requires prompt tactical and strategic decisions. 4The decision-making cycle time is reduced, while problems are increasingly complex with a growing number of internal and external variables. 4Managers need support systems for facilitating quick decision making in a complex environment. 4Decision support systems (DSS).
3
13
4
The Data Warehouse 4The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making. u Integrated l The Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization. u Subject-Oriented l The Data Warehouse data is arranged and optimized to provide answers to questions coming from diverse functional areas within a company.
5
13 The Data Warehouse u Time Variant l The Warehouse data represent the flow of data through time. It can even contain projected data. u Non-Volatile l Once data enter the Data Warehouse, they are never removed. l The Data Warehouse is always growing.
6
13 Table 13.6A Comparison Of Data Warehouse And Operational Database Characteristics
7
13 Creating A Data Warehouse Figure 13.3
8
13 The Data Warehouse 4Data Mart u A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people. u Data Marts can serve as a test vehicle for companies exploring the potential benefits of Data Warehouses. u Data Marts address local or departmental problems, while a Data Warehouse involves a company-wide effort to support decision making at all levels in the organization.
9
13 DSS Architectural Styles Table 13.7
10
13 The Data Warehouse 4Twelve Rules That Define a Data Warehouse 1.The Data Warehouse and operational environments are separated. 2.The Data Warehouse data are integrated. 3.The Data Warehouse contains historical data over a long time horizon. 4.The Data Warehouse data are snapshot data captured at a given point in time. 5.The Data Warehouse data are subject-oriented. 6.The Data Warehouse data are mainly read-only with periodic batch updates from operational data. No online updates are allowed. 7.The Data Warehouse development life cycle differs from classical systems development. The Data Warehouse development is data driven; the classical approach is process driven.
11
13 8.The Data Warehouse contains data with several levels of detail; current detail data, old detail data, lightly summarized, and highly summarized data. 9.The Data Warehouse environment is characterized by read-only transactions to very large data sets. The operational environment is characterized by numerous update transactions to a few data entities at the time. 10.The Data Warehouse environment has a system that traces data resources, transformation, and storage. 11.The Data Warehouse’s metadata are a critical component of this environment. The metadata identify and define all data elements. The metadata provide the source, transformation, integration, storage, usage, relationships, and history of each data element. 12.The Data Warehouse contains a charge-back mechanism for resource usage that enforces optimal use of the data by end users. The Data Warehouse
12
13 On-Line Analytical Processing 4On-Line Analytical Processing (OLAP) is an advanced data analysis environment that supports decision making, business modeling, and operations research activities. 4Four Main Characteristics of OLAP u Use multidimensional data analysis techniques u Provide advanced database support u Provide easy-to-use end user interfaces u Support client/server architecture
13
13 On-Line Analytical Processing 4Multidimensional Data Analysis Techniques u The processing of data in which data are viewed as part of a multidimensional structure. u Multidimensional view allows end users to consolidate or aggregate data at different levels. u Multidimensional view allows a business analyst to easily switch business perspectives. u Figure 13.4
14
13 Figure 13.4 Operational Vs. Multidimensional View Of Sales
15
13 On-Line Analytical Processing 4Additional Functions of Multidimensional Data Analysis Techniques u Advanced data presentation functions u Advanced data aggregation, consolidation, and classification functions u Advanced computational functions u Advanced data modeling functions
16
13 Figure 13.5 Integration Of OLAP With A Spreadsheet Program
17
13 On-Line Analytical Processing 4Advanced Database Support u Access to many different kinds of DBMSs, flat files, and internal and external data sources. u Access to aggregated Data Warehouse data as well as to the detail data found in operational databases. u Advanced data navigation features such as drill-down and roll-up. u Rapid and consistent query response times. u The ability to map end user requests, expressed in either business or model terms, to the appropriate data source and then to the proper data access language. u Support for very large databases.
18
13 On-Line Analytical Processing 4Easy-to-Use End User Interface u Easy-to-use graphical user interfaces make sophisticated data extraction and analysis tools easily accepted and readily used. 4Client/Server Architecture u The client/server environment enables us to divide an OLAP system into several components that define its architecture.
19
13 On-Line Analytical Processing 4OLAP Architecture u Three Main Modules l OLAP Graphical User Interface (GUI) l OLAP Analytical Processing Logic l OLAP Data Processing Logic u OLAP systems are designed to use both operational and Data Warehouse data.
20
13 Figure 13.7 OLAP Server Arrangement
21
13 Figure 13.8 OLAP Server With Multidimensional Data Store Arrangement
22
13 Figure 13.9 OLAP Server With Local Mini Data-Marts
23
13 On-Line Analytical Processing 4Relational OLAP u Relational On-Line Analytical Processing (ROLAP) provides OLAP functionality by using relational database and familiar relational query tools. u Extensions to RDBMS l Multidimensional data schema support within the RDBMS l Data access language and query performance optimized for multidimensional data l Support for very large databases
24
13 On-Line Analytical Processing u Multidimensional Data Schema Support within the RDBMS l Normalization of tables in relational technology is seen as a stumbling block to its use in OLAP systems. l DSS data tend to be non-normalized, duplicated, and pre- aggregated. l ROLAP uses a special design technique to enable RDBMS technology to support multidimensional data representations, known as star schema. l Star schema creates the near equivalent of a multidimensional database schema from the existing relational database.
25
13 On-Line Analytical Processing u Data Access Language and Query Performance Optimized for Multidimensional Data l Most decision support data requests require the use of multiple-pass SQL queries or multiple nested SQL statements. l ROLAP extends SQL so that it can differentiate between access requirements for data warehouse data and operational data. u Support for Very Large Databases l Decision support data are normally loaded in bulk (batch) mode from the operational data. l RDBMS must have the proper tools to import, integrate, and populate the data warehouse with operational data. l The speed of the data-loading operations is important.
26
13 Figure 13.10 A Typical ROLAP Client/Server Architecture
27
13 On-Line Analytical Processing 4Multidimensional OLAP (MOLAP) u MOLAP extends OLAP functionality to multidimensional databases (MDBMS). u MDBMS end users visualize the stored data as a multidimensional cube known as a data cube. u Data cubes are created by extracting data from the operational databases or from the data warehouse. u Data cubes are static and require front-end design work. u To speed data access, data cubes are normally held in memory, called cube cache. u MOLAP is generally faster than their ROLAP counterparts. It is also more resource-intensive. u MDBMS is best suited for small and medium data sets. u Multidimensional data analysis is also affected by how the database system handles sparsity.
28
13 MOLAP Client/Server Architecture Figure 13.11
29
13 Relational Vs. Multidimensional OLAP Table 13.8
30
13 Star Schema 4The star schema is a data-modeling technique used to map multidimensional decision support into a relational database. 4Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database. 4Four Components: u Facts u Dimensions u Attributes u Attribute hierarchies
31
13 A Simple Star Schema Figure 13.12
32
13 Star Schema 4Facts u Facts are numeric measurements (values) that represent a specific business aspect or activity. u The fact table contains facts that are linked through their dimensions. u Facts can be computed or derived at run-time (metrics). 4Dimensions u Dimensions are qualifying characteristics that provide additional perspectives to a given fact. u Dimensions are stored in dimension tables.
33
13 Star Schema 4Attributes u Each dimension table contains attributes. Attributes are often used to search, filter, or classify facts. u Dimensions provide descriptive characteristics about the facts through their attributes. Table 13.9 Possible Attributes For Sales Dimensions
34
13 Three Dimensional View Of Sales Figure 13.13
35
13 Slice And Dice View Of Sales Figure 13.14
36
13 Star Schema 4Attribute Hierarchies u Attributes within dimensions can be ordered in a well-defined attribute hierarchy. u The attribute hierarchy provides a top-down data organization that is used for two main purposes: l Aggregation l Drill-down/roll-up data analysis
37
13 A Location Attribute Hierarchy Figure 13.15
38
13 Attribute Hierarchies In Multidimensional Analysis Figure 13.16
39
13 Star Schema 4Star Schema Representation u Facts and dimensions are normally represented by physical tables in the data warehouse database. u The fact table is related to each dimension table in a many-to-one (M:1) relationship. u Fact and dimension tables are related by foreign keys and are subject to the primary/foreign key constraints.
40
13 Figure 13.17 Star Schema For Sales
41
13 Figure 13.18 Orders Star Schema
42
13 Star Schema 4Performance-Improving Techniques u Normalization of dimensional tables u Multiple fact tables representing different aggregation levels u Denormalization of fact tables u Table partitioning and replication
43
13 Figure 13.19 Normalized Dimension Tables
44
13 Figure 13.20 Multiple Fact Tables
45
13 Data Warehouse Implementation 4The Data Warehouse as an Active Decision Support Network u A Data Warehouse is a dynamic support framework. u Implementation of a Data Warehouse is part of a complete database-system-development infrastructure for company-wide decision support. u Its design and implementation must be examined in the light of the entire infrastructure.
46
13 Data Warehouse Implementation 4A Company-Wide Effort that Requires User Involvement and Commitment at All Levels u For a successful design and implementation, the designer must: l Involve end users in the process. l Secure end users’ commitment from the beginning. l Create continuous end user feedback. l Manage end user expectations. l Establish procedures for conflict resolution.
47
13 Data Warehouse Implementation 4Satisfy the Trilogy: Data, Analysis, and Users u For a successful design and implementation, the designer must satisfy: l Data integration and loading criteria. l Data analysis capabilities with acceptable query performance. l End user data analysis needs.
48
13 Data Warehouse Implementation 4Apply Database Design Procedures u Data Warehouse development is a company-wide effort and requires many resources: people, financial, and technical. l The sheer and often mind-boggling quantity of decision support data is likely to require the latest hardware and software. l It is also imperative to have very detailed procedures to orchestrate the flow of data from the operational databases to the Dare Warehouse. l To implement and support the Data Warehouse architecture, we also need people with advanced database design, software integration, and management skills.
49
13 Data Warehouse Implementation Road Map Figure 13.21
50
13 A Sample Of Current Data Warehousing And Data Mining Vendors Table 13.10
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.