The Data Warehouse Environment
Agenda
–The Structure of the Data Warehouse
–Subject Orientation
–Day 1 – Day n Phenomenon
–Granularity
–Partitioning as a Design Approach
–Structuring data in the Data Warehouse
–Data Warehouse: The Standard Manual
–Auditing and the Data Warehouse
–Cost Justification
The Structure of the Data Warehouse
–Older level of detail
–Current level of detail
–A level of lightly summarized data
–A level of highly summarized data
Subject Orientation
The data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model.
Typical subject areas include the following:
–Customer
–Product
–Transaction or activity
–Policy
–Claim
–Account
Day 1 – Day n Phenomenon
Granularity
The single most important aspect of the design of a data warehouse is the issue of granularity. Indeed, granularity permeates the entire architecture that surrounds the data warehouse environment.
Granularity refers to the level of detail or summarization of the units of data in the data warehouse.
–The more detail there is, the lower the level of granularity.
–The less detail there is, the higher the level of granularity.
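The relationship between detail and level of granularity can be illustrated with a minimal Python sketch. The call records, field names, and the `summarize_by_month` helper below are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical detail records (low level of granularity): one row per call.
calls = [
    {"customer": "C1", "month": "2002-01", "minutes": 12},
    {"customer": "C1", "month": "2002-01", "minutes": 3},
    {"customer": "C1", "month": "2002-02", "minutes": 7},
    {"customer": "C2", "month": "2002-01", "minutes": 20},
]

def summarize_by_month(rows):
    """Raise the level of granularity: one row per customer per month."""
    totals = defaultdict(lambda: {"calls": 0, "minutes": 0})
    for row in rows:
        key = (row["customer"], row["month"])
        totals[key]["calls"] += 1
        totals[key]["minutes"] += row["minutes"]
    return dict(totals)

# Summarized records (higher level of granularity): fewer rows, less detail.
monthly = summarize_by_month(calls)
```

Here four detail rows collapse into three summary rows. The summary is smaller, but the individual calls can no longer be recovered from it.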
Partitioning as a Design Approach
A second major design issue of data in the warehouse (after granularity) is partitioning.
Partitioning of data refers to the breakup of data into separate physical units that can be handled independently.
Proper partitioning can benefit the data warehouse in several ways:
–Loading data
–Accessing data
–Archiving data
–Deleting data
–Monitoring data
–Storing data
Partitioning data properly allows data to grow and to be managed. Without proper partitioning, data cannot be managed or grow gracefully.
Partitioning of Data
The purpose of partitioning current detail data is to break the data up into small, manageable physical units.
The following tasks cannot easily be performed when data resides in large physical units:
–Restructuring
–Indexing
–Sequential scanning, if needed
–Reorganization
–Recovery
–Monitoring
Partitioning of data (cont’d)
Data can be divided by many criteria, such as:
–By date
–By line of business
–By geography
–By organizational unit
–By all of the above
The choice of partitioning criteria is strictly up to the developer.
As an example of how a life insurance company may choose to partition its data, consider the following physical units of data:
–2000 health claims, 2001 health claims, 2002 health claims
–1999 life claims, 2000 life claims, 2001 life claims, 2002 life claims
–2000 casualty claims, 2001 casualty claims, 2002 casualty claims
The insurance company has used the criteria of date (that is, year) and type of claim to partition the data.
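As a rough sketch of the insurance example, the Python below groups claim records into physical units keyed by year and type of claim. The record layout and the `partition_name` and `partition_claims` helpers are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical claim records; only year and type drive the partitioning.
claims = [
    {"claim_id": 1, "type": "health", "year": 2000},
    {"claim_id": 2, "type": "health", "year": 2001},
    {"claim_id": 3, "type": "life", "year": 1999},
    {"claim_id": 4, "type": "casualty", "year": 2002},
    {"claim_id": 5, "type": "health", "year": 2000},
]

def partition_name(claim):
    """Build the partition key from date (year) and type of claim."""
    return f"{claim['year']} {claim['type']} claims"

def partition_claims(rows):
    """Assign every claim to exactly one partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_name(row)].append(row)
    return dict(partitions)

partitions = partition_claims(claims)
```

Each key, such as "2000 health claims", corresponds to one physical unit that could be loaded, archived, or deleted without touching the others.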
Partitioning of data (cont’d)
Partitioning can be done at two levels:
–Partition at the system level
–Partition at the application level
As a rule, it makes sense to partition data warehouse data at the application level.
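One way to picture application-level partitioning is that the application code, not the DBMS, decides which physical unit a record belongs to. The sketch below uses Python's standard `sqlite3` module with one table per year and claim type. The table naming scheme and the `table_for`, `insert_claim`, and `drop_partition` helpers are hypothetical, and a production warehouse would likely look quite different.

```python
import sqlite3

def table_for(year, claim_type):
    """Hypothetical naming scheme: one table per (claim type, year)."""
    if not claim_type.isalpha():
        raise ValueError("claim_type must be alphabetic")
    return f"claims_{claim_type}_{int(year)}"

def insert_claim(conn, claim_id, claim_type, year, amount):
    """Route the claim to its partition, creating the table on first use."""
    table = table_for(year, claim_type)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(claim_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.execute(
        f"INSERT INTO {table} (claim_id, amount) VALUES (?, ?)",
        (claim_id, amount),
    )

def drop_partition(conn, year, claim_type):
    """Delete one physical unit without touching the others."""
    conn.execute(f"DROP TABLE IF EXISTS {table_for(year, claim_type)}")

conn = sqlite3.connect(":memory:")
insert_claim(conn, 1, "health", 2000, 120.0)
insert_claim(conn, 2, "health", 2001, 80.0)
insert_claim(conn, 3, "life", 1999, 500.0)
drop_partition(conn, 1999, "life")

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
```

After the 1999 life claims partition is dropped, only the two health claims tables remain, and neither was read or rewritten in the process.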
Structuring data in the Data Warehouse
There are many ways to structure data within the data warehouse. The most common are these:
–Simple cumulative
–Rolling summary
–Simple direct
–Continuous
Structuring data in the Data Warehouse (cont’d)