Published by Claire Miller. Modified over 8 years ago.
1
Building the Corporate Data Warehouse Pindaro Demertzoglou Lally School of Management Data Resource Management
2
The Dimensional Model Our goal: Develop the physical design.
3
1. Developing The High-Level Dimensional Model The high-level dimensional model is a data model at the entity level. We start straight from the bus matrix. The design process generally follows the four-step modeling process: a. Refer to the business processes in the bus matrix. Look at the rows for each process and try to identify potential entities for the data warehouse tables. You are not interested in particular attributes of the entities at this point; you are only trying to identify the entities.
4
The Data Warehouse Bus Matrix Value chain
5
1. Developing The High-Level Dimensional Model b. Declare the grain. The grain is the level of detail captured in the fact table, that is, what a single fact row represents. The grain might be one row per order item, one row per customer call, or one row per employee status change.
6
1. Developing The High-Level Dimensional Model c. Choose the dimensions. Most of them will come from your understanding of the business processes and the bus matrix. It helps to refer to your communication with the operational people and their requests to verify your choice of dimensions. It is not a bad idea to start listing attributes for each dimension at this point.
7
1. Developing The High-Level Dimensional Model d. Choose the measures in the fact table. There is usually a set of numbers in the operational system, for example product quantity, product sold price, and discounts. These numbers support the business process (the order process in this case). From them we can derive a series of facts such as the sales amount and the number of sales.
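The grain and measure steps can be sketched in a few lines. This is only an illustration, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for SQL Server; the table name fact_order_item, its columns, and the sample rows are hypothetical and not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table. Declared grain: one row per order item.
conn.execute("""
    CREATE TABLE fact_order_item (
        date_key     INTEGER NOT NULL,
        product_key  INTEGER NOT NULL,
        customer_key INTEGER NOT NULL,
        quantity     INTEGER NOT NULL,   -- measure from the operational system
        unit_price   REAL    NOT NULL,   -- measure from the operational system
        discount     REAL    NOT NULL    -- discount amount for the whole line
    )
""")

# Made-up sample rows, one per order item.
rows = [
    (20240101, 1, 10, 2, 5.0, 1.0),
    (20240101, 2, 10, 1, 20.0, 0.0),
    (20240102, 1, 11, 3, 5.0, 0.0),
]
conn.executemany("INSERT INTO fact_order_item VALUES (?, ?, ?, ?, ?, ?)", rows)


def sales_by_date(connection):
    """Derive sales amount and number of order-item rows per date."""
    return connection.execute("""
        SELECT date_key,
               SUM(quantity * unit_price - discount) AS sales_amount,
               COUNT(*)                              AS sales_number
        FROM fact_order_item
        GROUP BY date_key
        ORDER BY date_key
    """).fetchall()


summary = sales_by_date(conn)
```

Because the grain is the order item, COUNT(*) here counts order-item rows; counting whole orders would need an order number carried in the fact table.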
8
The High-Level Dimensional Model
9
2. Developing the Detailed Dimensional Model We continue to finalize the development of attributes for the fact and dimension tables. We assign attributes to the table (fact or dimension) they belong to. We keep a list of issues as they arise from the design process (with respect to ETL, purpose of attributes). One of the most important decisions: the assignment of the type 1 and type 2 dimension attributes.
10
3. Building the Physical Model Of course the best tool to build the physical model is a data modeling tool: http://en.wikipedia.org/wiki/Comparison_of_data_modeling_tools These tools can forward engineer, that is, generate the DDL statements that actually create your database. For our lab, you will be working directly with the ERD from SQL Server 2012 or with a system that you are willing to download and test.
11
Considerations for building the Physical Model Surrogate keys. The primary key for dimension tables should be a surrogate key assigned and managed by the DW/BI system. Create surrogate keys in SQL Server by enabling the IDENTITY property on the key column. Use integer (number) values for the surrogate PKs.
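A minimal sketch of a warehouse-managed surrogate key follows. It assumes SQLite through Python's sqlite3 module, where AUTOINCREMENT plays the role that the IDENTITY property plays in SQL Server; the dim_customer table, its columns, and the sample customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# customer_key is the surrogate key generated by the warehouse database.
# customer_id is the business key carried over from the source system.
# In SQL Server the key column would use the IDENTITY property instead
# of SQLite's AUTOINCREMENT (assumption: SQLite is used only for the demo).
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id   TEXT NOT NULL,
        customer_name TEXT NOT NULL
    )
""")


def add_customer(connection, customer_id, customer_name):
    """Insert a dimension row and return the surrogate key assigned to it."""
    cursor = connection.execute(
        "INSERT INTO dim_customer (customer_id, customer_name) VALUES (?, ?)",
        (customer_id, customer_name),
    )
    return cursor.lastrowid


key_a = add_customer(conn, "C-1001", "Acme")
key_b = add_customer(conn, "C-2002", "Globex")
```

The ETL code never supplies customer_key itself; it only reads back the integer the database assigned.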
12
Considerations for building the Physical Model String Columns. You need to define the text column lengths in the physical design. Expect to see most text columns in the dimension tables, because fact tables hold mostly measures. Use Unicode data where possible so that you can capture data from multiple heterogeneous data sources.
13
Considerations for building the Physical Model Null Values. We avoid null values in the DW/BI database, just as we do not want them in the TPS database. In the TPS database we avoid null values by using default values. In the DW database we set up the prevention of nulls in the ETL system!
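One way an ETL step might prevent nulls is to substitute a descriptive default before a row is loaded. The sketch below is an assumption about how that could look; the column names and the default labels ("Unknown", "Not Applicable") are hypothetical.

```python
# Hypothetical per-column defaults used when the source value is missing.
DEFAULTS = {"city": "Unknown", "segment": "Not Applicable"}


def clean_row(row, defaults=DEFAULTS):
    """Return a copy of row with missing or blank values replaced by defaults."""
    cleaned = dict(row)
    for column, default in defaults.items():
        value = cleaned.get(column)
        if value is None or str(value).strip() == "":
            cleaned[column] = default
    return cleaned


source_row = {"customer_id": "C-1001", "city": None, "segment": "  "}
loaded_row = clean_row(source_row)
```

The function returns a new dictionary, so the original source row stays available for auditing.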
14
Considerations for building the Physical Model Housekeeping columns. Every dimension table that has type 2 attributes needs additional columns that track the dates for which the dimension row is valid. For example, the RowStartDate and RowEndDate columns indicate the date range for which the dimension row is valid. Another useful attribute is RowChangeReason, which captures the reasoning behind the change in the slowly changing dimension (SCD). So, practically, we need three housekeeping columns for the SCDs.
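The three housekeeping columns can be illustrated with a small type 2 change. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module stands in for SQL Server, the open-ended date 9999-12-31 and the exclusive RowEndDate convention are choices made for the demo, and the table and sample data are hypothetical.

```python
import sqlite3

OPEN_END_DATE = "9999-12-31"  # assumed marker for the currently valid row

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key    INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id     TEXT NOT NULL,   -- business key
        city            TEXT NOT NULL,   -- treated as a type 2 attribute
        RowStartDate    TEXT NOT NULL,
        RowEndDate      TEXT NOT NULL,   -- exclusive end of validity (assumption)
        RowChangeReason TEXT NOT NULL
    )
""")


def apply_type2_change(connection, customer_id, new_city, change_date, reason):
    """Close the current row (if any) and insert a new current row."""
    connection.execute(
        "UPDATE dim_customer SET RowEndDate = ? "
        "WHERE customer_id = ? AND RowEndDate = ?",
        (change_date, customer_id, OPEN_END_DATE),
    )
    connection.execute(
        "INSERT INTO dim_customer "
        "(customer_id, city, RowStartDate, RowEndDate, RowChangeReason) "
        "VALUES (?, ?, ?, ?, ?)",
        (customer_id, new_city, change_date, OPEN_END_DATE, reason),
    )


apply_type2_change(conn, "C-1001", "Albany", "2023-01-01", "New customer")
apply_type2_change(conn, "C-1001", "Troy", "2024-06-01", "Moved")

history = conn.execute(
    "SELECT city, RowStartDate, RowEndDate, RowChangeReason "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
```

The old row is kept with a closed date range, so facts recorded while the customer lived in Albany still join to the Albany row.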
15
Constraints and Supporting Objects Entity and Referential Integrity Constraints. All tables should have a primary key, that is, the column or set of columns that identifies a single row when constrained to a single value. This is known as entity integrity. For the dimension tables, the primary key is the surrogate key. For the fact tables, the primary key is usually a combination of the foreign keys from each dimension. In practice, data warehouse DBAs often do not create referential integrity constraints. Maintaining these constraints is extremely expensive, so DBAs depend on the ETL system to do the integrity work instead, which is risky because the ETL checks might not always be accurate or feasible. If you feel the constraints are important, test the options in your environment to understand the cost.
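To see what the two kinds of constraints actually reject, here is a small sketch. It assumes SQLite through Python's sqlite3 module (where foreign key enforcement must be switched on with a PRAGMA) rather than SQL Server, and all table names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE fact_sales (
        date_key    INTEGER NOT NULL REFERENCES dim_date (date_key),
        product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
        quantity    INTEGER NOT NULL,
        PRIMARY KEY (date_key, product_key)   -- entity integrity
    )
""")
conn.execute("INSERT INTO dim_date VALUES (20240101)")
conn.execute("INSERT INTO dim_product VALUES (1)")


def try_insert(connection, date_key, product_key, quantity):
    """Attempt a fact insert; return 'ok' or the constraint error message."""
    try:
        connection.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            (date_key, product_key, quantity),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


first = try_insert(conn, 20240101, 1, 5)       # valid row
duplicate = try_insert(conn, 20240101, 1, 7)   # same composite key again
orphan = try_insert(conn, 20240101, 99, 2)     # product 99 is not in dim_product
fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Each check costs work on every insert, which is the load-time expense the slide refers to; timing a bulk load with and without the REFERENCES clauses is one way to measure it in your own environment.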
16
Constraints and Supporting Objects Indexing Strategies. Dimension Table Indexing. Dimension tables with a single-column integer surrogate primary key should have a clustered primary key index. In SQL Server, a clustered index is created for the PK by default. A clustered index determines the physical order of data in a table, so there can be only one clustered index per table.
17
Constraints and Supporting Objects Views All business user access to the relational data warehouse should be done through views. The rationale is to provide a protective layer between the users and the underlying database. This layer will be very helpful when you need to modify the DW/BI system after it is in production. The table names shouldn't even show up in a user's list of database objects. You may want to omit some columns from the view, especially some of the housekeeping columns described previously.
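A view that hides the housekeeping columns could look like the sketch below. It assumes SQLite via Python's sqlite3 module in place of SQL Server; the names dim_customer_tbl and Customer, and the sample row, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Underlying table, including housekeeping columns users do not need to see.
conn.execute("""
    CREATE TABLE dim_customer_tbl (
        customer_key    INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL,
        city            TEXT NOT NULL,
        RowStartDate    TEXT NOT NULL,
        RowEndDate      TEXT NOT NULL,
        RowChangeReason TEXT NOT NULL
    )
""")

# Business users query the view, which omits the housekeeping columns.
conn.execute("""
    CREATE VIEW Customer AS
    SELECT customer_key, customer_name, city
    FROM dim_customer_tbl
""")

conn.execute(
    "INSERT INTO dim_customer_tbl VALUES (?, ?, ?, ?, ?, ?)",
    (1, "Acme", "Troy", "2024-06-01", "9999-12-31", "Moved"),
)

cursor = conn.execute("SELECT * FROM Customer")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

If the underlying table is later renamed or split, only the view definition has to change; queries written against Customer keep working.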
18
4. The Metadata Plan Metadata: the Bermuda Triangle of data warehousing.
19
The Purpose of Metadata Technical metadata (the usual metadata everyone refers to): Defines the objects and processes that make up the warehouse itself from a technical perspective. This includes the system metadata that defines the data structures, like tables, columns, data types, dimensions, and measures. Business metadata: Tells us what data we have, where it comes from, what it means, and what its relationship is to other data in the warehouse. Business metadata often serves as documentation for the data warehouse. Process metadata: Describes the results of various operations in the warehouse. In the ETL process, each task logs key data about its execution, like start time, end time, rows processed, result, and so on. This data is initially valuable for troubleshooting the ETL or query process. After people begin using the system, this data is a critical input to the performance monitoring and improvement process.
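Process metadata of the kind described above (start time, end time, rows processed, result) might be captured with a small logging wrapper. This is a sketch only, assuming SQLite via Python's sqlite3 module; the etl_task_log table, the run_logged helper, and the two example tasks are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE etl_task_log (
        task_name      TEXT NOT NULL,
        start_time     TEXT NOT NULL,
        end_time       TEXT NOT NULL,
        rows_processed INTEGER NOT NULL,
        result         TEXT NOT NULL
    )
""")


def run_logged(connection, task_name, task):
    """Run task(), which returns a row count, and log how the run went."""
    start_time = datetime.now(timezone.utc).isoformat()
    rows_processed, result = 0, "success"
    try:
        rows_processed = task()
    except Exception as exc:  # record the failure instead of losing it
        result = f"failed: {exc}"
    end_time = datetime.now(timezone.utc).isoformat()
    connection.execute(
        "INSERT INTO etl_task_log VALUES (?, ?, ?, ?, ?)",
        (task_name, start_time, end_time, rows_processed, result),
    )
    return result


def load_customers():
    return 120  # pretend 120 rows were loaded


def load_orders():
    raise RuntimeError("source file missing")


run_logged(conn, "load_dim_customer", load_customers)
run_logged(conn, "load_fact_orders", load_orders)

log_rows = conn.execute(
    "SELECT task_name, rows_processed, result FROM etl_task_log ORDER BY rowid"
).fetchall()
```

The same table serves both uses named on the slide: failed rows help with troubleshooting, and the start and end times feed performance monitoring.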
20
The (non-existent) Metadata Repository There is a need to store all of this metadata. Ideally, each tool would keep its metadata in a shared repository where it can be easily reused by other tools and integrated for reporting and analysis purposes. For example, when you use your ETL tool to design a package to load your dimensions, the ETL tool would save that package in the repository in a set of structures that at least allow inquiry into the content and structure of the package. If you wanted to know what transforms were applied to the data in a given dimension table, you could query the repository. Unfortunately, this wonderful, integrated, shared repository is rare in the DW/BI world today, and when it does exist, it must be built and maintained with significant custom effort. Each component keeps its own metadata in its own structures and formats.
21
Creating the Metadata Strategy 1. Our primary goal is to concentrate on business metadata first. 2. Educate the DW/BI team and key business users about the importance of metadata and the metadata strategy. 3. Design and implement the delivery approach for getting business metadata out to the user community. 4. Typically, this involves creating metadata access tools, like reports and browsers.