Data Warehouse
Definition. Data warehouse: an integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization.
Need for Data Warehousing: An integrated, company-wide view of high-quality information. Separation of operational and informational systems and data. Table 14-1.
Examples of heterogeneous data
Factors Allowing Data Warehousing: Relational DBMS. Advances in hardware (speed and storage capacity). End-user computing interfaces and tools.
Data Warehouse Architectures: Two-level - Fig. 14-2. Three-level - Fig. 14-3. Operational data. Enterprise data warehouse (EDW) - single source of data for decision making. Data marts - limited scope; data selected from the EDW.
Generic data warehouse architecture
Three-layer architecture
Reasons for the Three-Level Architecture: The EDW and data marts have different purposes and data architectures. Data transformation is complex and is best performed in two steps. Data marts provide customized decision support for different groups.
Three-Level Data Architecture (Fig. 14-4): Operational data. Reconciled data. Derived data.
Three-layer data architecture
Data Characteristics: Status vs. event data - Fig. 14-5. Transient vs. periodic data - Figs. 14-6, 14-7.
Example of DBMS log entry
Transient operational data
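The distinctions above can be illustrated with a minimal sketch. The account number, balances, dates, and the `apply_event` helper are hypothetical and not taken from the figures. The status is the account balance before and after, the event is the deposit that changes it, the transient store overwrites the old value, and the periodic store appends a date-stamped row.

```python
from datetime import date

# Status data: the state of account 001 before any event is applied.
transient = {"001": 750}                       # current value only
periodic = [{"account": "001", "balance": 750,
             "date": date(2024, 1, 5), "action": "create"}]

# Event data: a business transaction that changes the status.
event = {"account": "001", "type": "deposit", "amount": 250,
         "date": date(2024, 1, 6)}

def apply_event(event, transient, periodic):
    """Apply one deposit event to both the transient and the periodic store."""
    account = event["account"]
    new_balance = transient[account] + event["amount"]
    # Transient: the prior status is overwritten and lost.
    transient[account] = new_balance
    # Periodic: a new date-stamped row is appended; old rows are kept.
    periodic.append({"account": account, "balance": new_balance,
                     "date": event["date"], "action": "update"})

apply_event(event, transient, periodic)

print(transient)                                # {'001': 1000}
print([row["balance"] for row in periodic])     # [750, 1000]
```

After the deposit, the transient store can only answer "what is the balance now", while the periodic store can also answer "what was the balance on Jan 5".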
Reconciled Data Characteristics: Detailed. Historical. Normalized. Enterprise-wide. Quality controlled.
The Data Reconciliation Process (Fig. 14-8): Capture - static (initial load) or incremental (ongoing update). Scrub, or data cleansing - pattern recognition and other artificial intelligence techniques.
Steps in data reconciliation
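A minimal sketch of the capture and scrub steps, under stated assumptions: the source rows, the `updated` timestamp column, and all function names are hypothetical, and the simple normalization rules in `scrub` are only a stand-in for the pattern-recognition techniques a real cleansing tool might use.

```python
from datetime import datetime

# Hypothetical source rows; "updated" is an assumed last-modified timestamp.
source = [
    {"id": 1, "name": " alice SMITH ", "state": "ca", "updated": datetime(2024, 1, 1)},
    {"id": 2, "name": "Bob Jones", "state": "N.Y.", "updated": datetime(2024, 2, 1)},
    {"id": 3, "name": "carol  lee", "state": "TX", "updated": datetime(2024, 3, 1)},
]

def capture_static(rows):
    """Static capture: take a snapshot of every source row (initial load)."""
    return [dict(r) for r in rows]

def capture_incremental(rows, last_capture):
    """Incremental capture: take only rows changed since the last capture."""
    return [dict(r) for r in rows if r["updated"] > last_capture]

def scrub(row):
    """Rule-based cleansing: normalize whitespace, name case, and state codes."""
    cleaned = dict(row)
    cleaned["name"] = " ".join(row["name"].split()).title()
    cleaned["state"] = row["state"].replace(".", "").upper()
    return cleaned

initial = [scrub(r) for r in capture_static(source)]
changed = [scrub(r) for r in capture_incremental(source, datetime(2024, 1, 15))]

print([r["name"] for r in initial])   # ['Alice Smith', 'Bob Jones', 'Carol Lee']
print([r["id"] for r in changed])     # [2, 3]
```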
The Data Reconciliation Process: Transform - convert the data format from the source to the target system. Record-level functions - selection, joining, aggregation (for data marts). Field-level functions - single-field transformation (Fig. 14-9), multi-field transformation (Fig. 14-10).
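The record-level and field-level functions can be sketched as follows. This is an illustration only: the `orders` and `customers` data, the field names, and `transform_fields` are hypothetical and do not reproduce Figs. 14-9 or 14-10.

```python
from collections import defaultdict

# Hypothetical source rows.
orders = [
    {"order_id": 1, "cust_id": "C1", "region": "West", "amount_cents": 12500},
    {"order_id": 2, "cust_id": "C2", "region": "East", "amount_cents": 4000},
    {"order_id": 3, "cust_id": "C1", "region": "West", "amount_cents": 7500},
]
customers = {
    "C1": {"first": "Ann", "last": "Lee"},
    "C2": {"first": "Raj", "last": "Patel"},
}

# Record-level selection: keep only the rows that satisfy a predicate.
west = [o for o in orders if o["region"] == "West"]

# Record-level joining: combine each order with its customer via the shared key.
joined = [{**o, **customers[o["cust_id"]]} for o in west]

def transform_fields(row):
    """Apply field-level transformations to one joined row."""
    out = dict(row)
    # Single-field: one source field becomes one target field (cents to dollars).
    out["amount"] = out.pop("amount_cents") / 100
    # Multi-field: two source fields become one target field.
    out["cust_name"] = f'{out.pop("first")} {out.pop("last")}'
    return out

transformed = [transform_fields(r) for r in joined]

# Record-level aggregation (for a data mart): total dollars per customer.
totals = defaultdict(float)
for r in transformed:
    totals[r["cust_name"]] += r["amount"]

print(dict(totals))   # {'Ann Lee': 200.0}
```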
The Data Reconciliation Process: Load and index. Refresh mode - used when the warehouse is first created; static data capture. Update mode - ongoing update of the warehouse; incremental data capture.
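A minimal sketch contrasting the two load modes, followed by a simple index build. The function names and row layout are hypothetical. For brevity, update mode here replaces a row that has the same key; a warehouse holding periodic data would more likely append a new date-stamped row instead.

```python
def refresh_load(source_rows):
    """Refresh mode: rebuild the target in bulk from a static capture."""
    return {row["id"]: dict(row) for row in source_rows}

def update_load(warehouse, changed_rows):
    """Update mode: apply only the incrementally captured changes."""
    for row in changed_rows:
        warehouse[row["id"]] = dict(row)   # insert a new key or replace an existing one
    return warehouse

def build_index(warehouse, field):
    """Build a simple secondary index: field value -> sorted list of row ids."""
    index = {}
    for row_id, row in warehouse.items():
        index.setdefault(row[field], []).append(row_id)
    return {value: sorted(ids) for value, ids in index.items()}

# Initial creation of the warehouse from a full snapshot.
warehouse = refresh_load([
    {"id": 1, "region": "West"},
    {"id": 2, "region": "East"},
])

# Later, only the changed and new rows are written.
update_load(warehouse, [
    {"id": 2, "region": "West"},
    {"id": 3, "region": "East"},
])

index = build_index(warehouse, "region")
print(index)   # {'West': [1, 2], 'East': [3]}
```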
Derived Data Characteristics: Type of data - detailed, possibly periodic; aggregated. Distributed to departmental servers. Implemented in star schema.
Star Schema: Also called the dimensional model. Fact and dimension tables - Figs. 14-11, 14-12, 14-13. Grain of a fact table - the level of detail of each record, e.g. the time period it covers. Multiple fact tables - Fig. 14-14. Snowflake schema - Fig. 14-15.
Components of a star schema
Star schema example
Star schema with sample data
Star schema with two fact tables
Example of snowflake schema
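A minimal sketch of a star schema, using Python's built-in `sqlite3` module. The table names, columns, and sample rows are hypothetical and are not copied from the figures. One fact table (`sales`) references three dimension tables, and its grain is one row per product, store, and period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one surrogate key each.
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: numeric measures plus a foreign key to every dimension.
CREATE TABLE sales (
    period_key   INTEGER REFERENCES period(period_key),
    product_key  INTEGER REFERENCES product(product_key),
    store_key    INTEGER REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (period_key, product_key, store_key)
);

INSERT INTO period  VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO product VALUES (10, 'Shoes'), (20, 'Hats');
INSERT INTO store   VALUES (100, 'Tampa'), (200, 'Miami');
INSERT INTO sales VALUES
    (1, 10, 100, 30, 1500.0),
    (1, 20, 100, 10,  200.0),
    (2, 10, 100, 25, 1250.0),
    (2, 10, 200, 40, 2000.0);
""")

# A typical dimensional query: join the fact table to its dimensions,
# then group by dimension attributes.
rows = conn.execute("""
    SELECT s.city, p.quarter, SUM(f.units_sold)
    FROM sales f
    JOIN store  s ON s.store_key  = f.store_key
    JOIN period p ON p.period_key = f.period_key
    GROUP BY s.city, p.quarter
    ORDER BY s.city, p.quarter
""").fetchall()

print(rows)   # [('Miami', 2, 40), ('Tampa', 1, 40), ('Tampa', 2, 25)]
```

A snowflake variant would normalize a dimension further, for example by moving `city` out of `store` into its own table that `store` references.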
Types of Data Marts: Dependent - populated from the EDW. Independent - data taken directly from the operational databases.
The User Interface: The role of metadata. Traditional query and reporting tools. On-line analytical processing (OLAP) - the use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
The User Interface: Slicing a cube - Fig. 14-16. Pivot - rotate the view for a particular data point to obtain another perspective, e.g. take a value from the units column and obtain by-store values. Drill-down - Fig. 14-17.
Slicing a data cube
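The slice, pivot, and drill-down operations can be sketched on a tiny in-memory cube. The cube contents, the dimension names in `DIMS`, and the helper functions are hypothetical; real OLAP tools perform these operations through a graphical interface rather than code.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, store, month) -> units sold.
cube = {
    ("Shoes", "Tampa", "Jan"): 30,
    ("Shoes", "Tampa", "Feb"): 25,
    ("Shoes", "Miami", "Jan"): 40,
    ("Hats",  "Tampa", "Jan"): 10,
    ("Hats",  "Miami", "Feb"): 15,
}
DIMS = ("product", "store", "month")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension at a single value and keep the matching cells."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def rollup(cells, dims):
    """Total the units over every dimension not listed in dims."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for key, units in cells.items():
        totals[tuple(key[i] for i in idx)] += units
    return dict(totals)

shoes = slice_cube(cube, "product", "Shoes")   # slice on product = Shoes
by_store = rollup(shoes, ["store"])            # units viewed by store
by_month = rollup(shoes, ["month"])            # pivot: same slice, viewed by month
drill = rollup(shoes, ["store", "month"])      # drill-down: store totals split by month

print(by_store)   # {('Tampa',): 55, ('Miami',): 40}
print(by_month)   # {('Jan',): 70, ('Feb',): 25}
```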
The User Interface: Data mining - knowledge discovery; search for patterns in the data. Tables 14-3, 14-4. Data visualization.