Building the Corporate Data Warehouse Pindaro Demertzoglou Data Resource Management.

1 Building the Corporate Data Warehouse Pindaro Demertzoglou Data Resource Management

2 Assumptions
- We have a transactional/operational system in place.
- We realize the value of an additional business intelligence (BI) initiative.
- We know that a data warehouse is not an archiving system.
- We realize we are looking for specific information.
- There is no standard road to follow when building the DW.

3 Business Value
1. The primary goal of the data warehouse is to deliver business value. How do we connect value to the DW?
2. We must listen to and understand the operational people. Communication is the top priority.
3. We first need to understand the business data in the operational/transactional system. We can achieve that by using SQL.
4. The next step is to define the business requirements.

4 The Business Requirements Definition step
[Figure: Kimball Lifecycle diagram, with callouts (1), (2), (3)]

5 The Operational Data: Data Profiling (SQL)
Data profiling and auditing from the operational system.
Conclusions from the operational data:
- More than 80% of the sales volume is bikes.
- Clothing and accessories show a very high percentage increase.
- Revenue rises steadily year over year.
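The kind of profiling query behind these conclusions can be sketched as follows. This is a minimal illustration, not the queries used in the presentation: the `order_lines` table, its columns, and every figure in it are invented, and SQLite (via Python's standard library) stands in for the operational database.

```python
import sqlite3

# Hypothetical operational data: table name, columns, and values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_lines (order_year INTEGER, category TEXT, amount REAL);
INSERT INTO order_lines VALUES
    (2022, 'Bikes', 900.0), (2022, 'Clothing', 40.0), (2022, 'Accessories', 60.0),
    (2023, 'Bikes', 1000.0), (2023, 'Clothing', 90.0), (2023, 'Accessories', 110.0);
""")

# Share of total sales volume per product category, largest first.
share_by_category = conn.execute("""
    SELECT category,
           ROUND(100.0 * SUM(amount) / (SELECT SUM(amount) FROM order_lines), 1) AS pct
    FROM order_lines
    GROUP BY category
    ORDER BY pct DESC
""").fetchall()

# Revenue per year, to check whether it rises steadily.
revenue_by_year = conn.execute("""
    SELECT order_year, SUM(amount)
    FROM order_lines
    GROUP BY order_year
    ORDER BY order_year
""").fetchall()

print(share_by_category)
print(revenue_by_year)
```

Queries of this shape answer "what share of sales does each category hold?" and "is revenue growing?" directly from the transactional tables before any warehouse design starts.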

6 Processes and Business Requirements
First we conduct a set of interviews to identify the operational processes. Then we associate analytics requirements with each process.

LETTER | BUSINESS PROCESS | SUPPORTED BUSINESS ANALYSES
A | Orders | Orders reporting and analysis, orders forecasting, advertising effectiveness, customer satisfaction, production forecasting, product profitability, customer profitability
B | Orders forecast | Sales performance, business planning, production forecasting
C | Call tracking | Call center performance, customer satisfaction, product quality, call center resource planning, customer profitability, product profitability
D | Returns | Customer satisfaction, product quality, customer profitability, product profitability, net sales

7 The Data Warehouse Bus Matrix
- Each row in the matrix is a business process.
- The processes down the left side of the matrix follow the organization's value chain.
- The columns in the bus matrix are descriptive objects that participate in the various business processes, such as product and date.
- We call these objects dimensions in the dimensional model. The bus matrix is essentially the enterprise dimensional data architecture.
- Each dimension participates in one or more business processes. For each business process (row), you can see exactly which dimensions (columns) you need to implement. And for each dimension, you can see which business processes it must support.
[Figure: bus matrix, rows labeled "Value chain"]
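One way to picture the bus matrix is as a mapping from each business process (row) to the set of dimensions (columns) it uses. The sketch below assumes this representation; the process names come from the interview table in the presentation, but which dimensions each process uses is invented for illustration.

```python
# Rows are business processes, values are the dimensions (columns) they use.
# The dimension assignments are hypothetical.
bus_matrix = {
    "Orders": {"Date", "Product", "Customer", "Sales Rep"},
    "Orders forecast": {"Date", "Product"},
    "Call tracking": {"Date", "Product", "Customer"},
    "Returns": {"Date", "Product", "Customer"},
}

def dimensions_for(process):
    """Read one row: the dimensions this process needs implemented."""
    return sorted(bus_matrix[process])

def processes_for(dimension):
    """Read one column: the processes this dimension must support."""
    return sorted(p for p, dims in bus_matrix.items() if dimension in dims)

print(dimensions_for("Orders forecast"))
print(processes_for("Customer"))
```

Reading by row tells the team what to build for a given process; reading by column shows how widely a dimension such as Customer is shared and therefore must be conformed.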

8 Prioritizing Business Requirements
- The prioritization process is a planning meeting involving the DW/BI team, the DW/BI project business sponsors, and other senior managers from across the organization.
- The meeting needs to end with a clear consensus on what the organization needs.
- Prioritizing requirements is one of the most important tasks you have.

9 Designing the Business Process Dimensional Model
[Figure: Kimball Lifecycle diagram, with callouts (1), (2), (3)]

10 Our goal: The DW dimensional model

11 Designing the Business Process Dimensional Model
- Our main objective is to make sure DW users get the data they need to meet ongoing business requirements.
- The real goal is to create a usable, flexible, and extensible data model that can support the full range of analyses, both now and for the foreseeable future.
- The dimensional model:
  - is the target for the ETL system.
  - is the actual structure of the database.
  - is the model behind the user query and reporting experience.
  - presents the needed information to users as simply as possible.
  - returns query results to the users as quickly as possible.

12 The dimensional model
- The dimensional model has far fewer tables than the TPS model because of the denormalization involved in creating the dimensions. Queries against the SQL Server relational database generally perform better, often far better, against a dimensional structure than against a fully normalized structure.
- The dimensional model helps query performance. Information is grouped into business categories we call dimensions that make logical sense to users.
- In the OLAP environment, the engine is specifically designed to support dimensional models.
- A dimensional model is made up of a central fact table (or tables) and its associated dimensions. The dimensional model is also called a star schema because it looks like a star, with the fact table in the middle and the dimensions serving as the points of the star.
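A star schema can be sketched with one fact table and two dimension tables. The example below is a minimal illustration under stated assumptions: all table names, column names, and rows are invented, and SQLite stands in for the SQL Server database mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: denormalized descriptive attributes (hypothetical columns).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table in the middle of the star: foreign keys plus numeric facts.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    sale_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO dim_date VALUES (20230105, 2023, 1), (20240210, 2024, 2);
INSERT INTO fact_sales VALUES
    (20230105, 1, 1, 1200.0), (20230105, 2, 2, 80.0), (20240210, 1, 1, 1300.0);
""")

# A typical dimensional query: one join per dimension, then group by attributes.
sales_by_year_and_category = conn.execute("""
    SELECT d.year, p.category, SUM(f.sale_amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()

print(sales_by_year_and_category)
```

Every analytical query follows the same pattern: start at the fact table, join outward to the dimensions of interest, and group by their attributes.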

13 The Fact table
- A record in a fact table is a series of measurements, or numeric values, such as quantity ordered or sale amount. These numbers are called facts (or measures in Analysis Services).
- The primary key of the fact table is usually a composite key made up of a subset of its foreign keys to the dimension tables.
- The level of detail contained in the fact table is called the grain. It is recommended to build fact tables at the lowest level of detail possible, the atomic level. Atomic fact tables provide complete flexibility to roll up the data to any level of summary needed across any dimension.
- For most transaction-driven organizations, fact tables are the largest tables in the data warehouse database, often making up 95 percent or more of the total relational database size.
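The roll-up flexibility of an atomic fact table can be sketched as follows. This is an illustration only: the `fact_order_line` table, its keys, and its rows are invented, the grain is assumed to be one row per product per customer per day, and `date_key` is assumed to be an integer of the form YYYYMMDD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical atomic fact table; the composite primary key is built
-- from foreign keys to the dimension tables.
CREATE TABLE fact_order_line (
    date_key INTEGER NOT NULL,      -- YYYYMMDD
    product_key INTEGER NOT NULL,
    customer_key INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    sale_amount REAL NOT NULL,
    PRIMARY KEY (date_key, product_key, customer_key)
);
INSERT INTO fact_order_line VALUES
    (20230105, 1, 10, 2, 200.0),
    (20230105, 2, 10, 1, 25.0),
    (20230210, 1, 11, 1, 100.0);
""")

# The same atomic rows rolled up by month (integer division drops the day)...
by_month = conn.execute("""
    SELECT date_key / 100 AS month, SUM(sale_amount)
    FROM fact_order_line
    GROUP BY month
    ORDER BY month
""").fetchall()

# ...and rolled up by product, without any change to the stored data.
by_product = conn.execute("""
    SELECT product_key, SUM(sale_amount)
    FROM fact_order_line
    GROUP BY product_key
    ORDER BY product_key
""").fetchall()

print(by_month)
print(by_product)
```

Had the table been stored pre-summarized by month, the by-product view could not be recovered from it; storing the atomic grain keeps both options open.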

14 Dimensions
- Dimensions are implemented as tables in the dimensional model.
- Each table contains a list of homogeneous entities, such as products in a manufacturing company, patients in a hospital, or vehicles on auto insurance policies.
- You can spot dimensions in conversation with the operational people because they are often the "by" words in a report request. For example, a user wants to see sales by month or by product.

15 Additional Design Concepts and Techniques
1. Surrogate keys
2. Slowly changing dimensions (SCDs)
3. Dates
4. Many-to-many relationships between facts and dimensions
5. Many-to-many relationships between dimension tables

16 Additional Design Concepts and Techniques
1. Surrogate keys
It is highly recommended to use surrogate keys for the dimension tables in the DW (omitting them is a common mistake in implementations).
- They protect the DW/BI system from changes in the keys coming from the source system.
- They allow the DW/BI system to integrate the same data, such as customer, from multiple source systems where they have different keys.
- The real cost of using surrogate keys is the burden it places on the ETL system.
2. Slowly changing dimension (SCD) attribute values
- Most attribute values, like date of birth, are fixed. However, other attributes, like an employee's title, might change over time.
- Changing attributes allow us to understand the dynamics of the business and constitute one of the major reasons for the existence of the DW/BI system.
- There are type 1 and type 2 SCDs. With type 1, we just replace values. With type 2, we keep historical records.
- Whether an attribute is handled as type 1 or type 2 is a business decision.
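The difference between type 1 and type 2 changes, and the role of the surrogate key, can be sketched in a few lines. This is a simplified in-memory illustration, not an ETL implementation: the row layout (`valid_from`, `valid_to`, `is_current`), the function names, and the sample employee are all assumptions made for the example.

```python
from datetime import date

employee_dim = []  # in-memory stand-in for the employee dimension table

def add_row(source_id, title, valid_from):
    """Insert a dimension row with a new surrogate key (employee_key)."""
    row = {
        "employee_key": len(employee_dim) + 1,  # surrogate key, owned by the DW
        "source_id": source_id,                 # key from the source system
        "title": title,
        "valid_from": valid_from,
        "valid_to": None,
        "is_current": True,
    }
    employee_dim.append(row)
    return row

def current_row(source_id):
    return next(r for r in employee_dim
                if r["source_id"] == source_id and r["is_current"])

def apply_type1(source_id, new_title):
    """Type 1: overwrite the value on every row for this employee; history is lost."""
    for r in employee_dim:
        if r["source_id"] == source_id:
            r["title"] = new_title

def apply_type2(source_id, new_title, change_date):
    """Type 2: close the current row and add a new one; history is kept."""
    old = current_row(source_id)
    old["valid_to"] = change_date
    old["is_current"] = False
    return add_row(source_id, new_title, change_date)

add_row("E-17", "Analyst", date(2022, 1, 1))
apply_type2("E-17", "Manager", date(2023, 6, 1))
print(employee_dim)
```

After the type 2 change, the same source employee has two rows with different surrogate keys, so facts recorded before the promotion still point at the "Analyst" version. This is only possible because the dimension's key is independent of the source system's key.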

17 Additional Design Concepts and Techniques
3. Dates
Include a date dimension table in the data warehouse to perform analyses across periods, even though no date table exists in the transactional system.
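Because the date dimension has no source table, it is usually generated. A minimal sketch follows; the column names, the YYYYMMDD integer key, and the chosen date range are assumptions for illustration.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Generate one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20240101
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows

dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(dim_date), dim_date[0])
```

With attributes such as year, quarter, and month stored on each row, period analyses become simple joins and group-bys against the date dimension rather than date arithmetic in every query.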

18 Additional Design Concepts and Techniques
The standard relationship between a dimension table and a fact table is one-to-many. This means one row in the dimension table will join to many rows in the fact table. For example, a product has many sales.
We can also have a many-to-many relationship between the fact table and a dimension, for example, when multiple sales reps handle the same order.

19 Many-to-many relationship between dimension tables
- In a banking example, a given account can have one or more customers as signatories, and any given customer can have one or more accounts.
- The bank might choose to analyze data by customer or by account.
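A common way to model this kind of relationship is a bridge table between the two dimensions. The presentation does not say which technique it uses, so the sketch below is an assumption, and all table names, columns, and rows are invented; SQLite stands in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, account_type TEXT);

-- Bridge table: one row per (account, signatory) pair.
CREATE TABLE bridge_account_customer (
    account_key INTEGER REFERENCES dim_account(account_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    PRIMARY KEY (account_key, customer_key)
);

CREATE TABLE fact_balance (
    account_key INTEGER REFERENCES dim_account(account_key),
    balance REAL
);

INSERT INTO dim_customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO dim_account VALUES (1, 'Joint checking'), (2, 'Savings');
INSERT INTO bridge_account_customer VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO fact_balance VALUES (1, 1000.0), (2, 500.0);
""")

# Analysis by account: the fact table joins directly to the account dimension.
balance_by_account = conn.execute("""
    SELECT account_key, SUM(balance)
    FROM fact_balance
    GROUP BY account_key
    ORDER BY account_key
""").fetchall()

# Analysis by customer: go through the bridge table.
balance_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.balance)
    FROM fact_balance f
    JOIN bridge_account_customer b ON b.account_key = f.account_key
    JOIN dim_customer c ON c.customer_key = b.customer_key
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

print(balance_by_account)
print(balance_by_customer)
```

Note that the joint account's balance is counted once for each signatory, so the per-customer figures add up to more than the bank's total balance. Per-customer results from a bridge table should be read individually, not summed across customers.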

