12/6/05 The Data Warehouse, from William H. Inmon, Building the Data Warehouse (4th ed.)
Data Warehouse
- = architecture (not a technology)
- example of a Decision Support System
Data Placement
- DSS (Decision Support Systems): analytical function
- OLTP (Online Transaction Processing): operational function
- Archival data: cheaper, slower storage
OLTP                 DSS
primitive data       derived data
operational          analytical
day-to-day           historical
clerical function    managerial function
non-redundant        redundant data
non-integrated       integrated
run repetitively     run heuristically
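The primitive/derived distinction can be illustrated with a minimal sketch. The record layout and the `monthly_totals` helper are assumptions made for illustration and do not come from the book: primitive rows hold one transaction each, and the derived data is a summary computed from them.

```python
from collections import defaultdict

# Hypothetical primitive (OLTP-style) records: one row per transaction.
transactions = [
    {"account": "A1", "date": "2005-11-01", "amount": 120.0},
    {"account": "A1", "date": "2005-11-15", "amount": -20.0},
    {"account": "B7", "date": "2005-11-03", "amount": 300.0},
]

def monthly_totals(rows):
    """Derive per-account monthly totals (DSS-style summary) from primitive rows."""
    totals = defaultdict(float)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM"
        totals[(row["account"], month)] += row["amount"]
    return dict(totals)

derived = monthly_totals(transactions)
```

The summary is redundant with respect to the primitive rows by design: it can always be recomputed from them, but it is stored separately so that analysis does not have to touch the operational records.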
A Definition
“A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management’s decisions.”
(a sophisticated series of snapshots…)
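One way to picture "non-volatile" and "time-variant" together is an append-only store of dated snapshots. The sketch below is an assumption-laden toy: the `SnapshotStore` class and its method names are hypothetical and are not taken from Inmon.

```python
from datetime import date

class SnapshotStore:
    """Append-only collection of dated snapshots (illustrative only)."""

    def __init__(self):
        self._rows = []  # list of (key, as_of_date, value)

    def load(self, key, as_of, value):
        # Non-volatile: new snapshots are appended; nothing is updated in place.
        self._rows.append((key, as_of, value))

    def as_of(self, key, when):
        # Time-variant: return the latest snapshot taken on or before `when`.
        candidates = [(d, v) for k, d, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

store = SnapshotStore()
store.load("cust-1", date(2005, 1, 1), {"city": "Austin"})
store.load("cust-1", date(2005, 6, 1), {"city": "Denver"})
```

Asking for `store.as_of("cust-1", date(2005, 3, 1))` returns the January snapshot, because the later load did not overwrite it.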
Design Decisions
- Granularity: level of detail or summarization of the units of data in the data warehouse (more detail = lower level of granularity)
- Partitioning: breakup of data into separate physical units that can be handled independently
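Both decisions can be sketched on a small data set. The call records, the choice of year as the partitioning criterion, and the function names are assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical detail rows: (customer, call_date "YYYY-MM-DD", minutes).
calls = [
    ("c1", "2004-12-30", 5),
    ("c1", "2005-01-02", 12),
    ("c1", "2005-01-09", 3),
    ("c2", "2005-01-02", 7),
]

def partition_by_year(rows):
    """Partitioning: split rows into separate units keyed by year."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[1][:4]].append(row)
    return dict(parts)

def summarize_by_month(rows):
    """Higher level of granularity: one row per customer per month."""
    summary = defaultdict(lambda: {"calls": 0, "minutes": 0})
    for customer, call_date, minutes in rows:
        bucket = summary[(customer, call_date[:7])]
        bucket["calls"] += 1
        bucket["minutes"] += minutes
    return dict(summary)

partitions = partition_by_year(calls)
monthly = summarize_by_month(calls)
```

The detail rows are the low level of granularity (more detail, more volume). The monthly summary holds fewer rows but can no longer answer questions about individual calls.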
Major Components
- Design of the Data Warehouse itself
- Interface from operational systems
  - role of extract (ETL) software [Extract/Transform/Load]
  - element of time (compound keys)
  - data purging
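The interface items above can be sketched as a tiny extract/transform/load pipeline. Everything here is a hypothetical illustration: the field names, the dictionary standing in for the warehouse, and the purge rule are all assumptions. The point is that the key combines the business key with a date, so each load adds a snapshot instead of overwriting one.

```python
from datetime import date

def extract(operational_rows):
    """Extract: read current-value records from a (hypothetical) operational source."""
    return list(operational_rows)

def transform(row, snapshot_date):
    """Transform: normalize fields and build a compound key that includes time."""
    return {
        "key": (row["cust_id"], snapshot_date),  # compound key: (business key, date)
        "name": row["name"].strip().title(),
        "balance": round(float(row["balance"]), 2),
    }

def load(warehouse, records):
    """Load: insert records; an existing (key, date) pair is left untouched."""
    for rec in records:
        warehouse.setdefault(rec["key"], rec)

def purge(warehouse, cutoff):
    """Purge: drop snapshots older than the cutoff date."""
    for key in [k for k in warehouse if k[1] < cutoff]:
        del warehouse[key]

warehouse = {}
source = [{"cust_id": 7, "name": " ada lovelace ", "balance": "10.5"}]
load(warehouse, [transform(r, date(2005, 11, 1)) for r in extract(source)])
source[0]["balance"] = "42"  # the operational system updates in place
load(warehouse, [transform(r, date(2005, 12, 1)) for r in extract(source)])
```

After the two loads the warehouse holds both the November and the December balance for customer 7, while the operational source holds only the current one.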
Indirect Use of Data Warehouse Data
An analysis program periodically spins off a file to the operational environment that includes specific summarized data
- Airline commission example
- Retail personalization example
- Credit scoring example
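A rough sketch of the credit scoring case follows, under stated assumptions: the payment history, the toy scoring formula, and the CSV layout are invented for illustration, and an in-memory buffer stands in for the file that would be handed to the operational environment.

```python
import csv
import io

# Hypothetical warehouse history: (customer_id, payment_was_late).
history = [("c1", False), ("c1", False), ("c2", True), ("c2", False)]

def spin_off_scores(rows, out):
    """Analysis side: write one summarized score per customer to `out`."""
    late, total = {}, {}
    for cust, was_late in rows:
        total[cust] = total.get(cust, 0) + 1
        late[cust] = late.get(cust, 0) + int(was_late)
    writer = csv.writer(out)
    writer.writerow(["customer_id", "score"])
    for cust in sorted(total):
        # Toy score: share of on-time payments, scaled to 0-100.
        writer.writerow([cust, round(100 * (1 - late[cust] / total[cust]))])

def load_scores(src):
    """Operational side: read the small file into a fast lookup table."""
    return {row["customer_id"]: int(row["score"]) for row in csv.DictReader(src)}

buffer = io.StringIO()  # stands in for the periodically produced file
spin_off_scores(history, buffer)
buffer.seek(0)
scores = load_scores(buffer)
```

The operational system never queries the warehouse directly. It only looks up a precomputed value, which keeps the online transaction fast.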
Data Warehouse Requirements
- Manage large amounts of data
- Manage data on diverse media
- Easily index and monitor
- Interface with varying technologies
- Store and access data in parallel
- Metadata control (by “user”)
- Contextual information (vs content)
- Efficiently use indexes
- Support compound keys