Published by Frank Little. Modified over 9 years ago.
1
Data Warehouses BUAD/American University
2
Definition
Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization.
3
Need for Data Warehousing
Integrated, company-wide view of high-quality information.
Separation of operational and informational systems and data
–operational system: a system that is used to run a business in real time, based on current data
–informational system: a system designed to support decision making based on stable point-in-time or historical data
4
Factors Allowing Data Warehousing
Relational DBMS.
Advances in hardware: speed and storage capacity.
End-user computing interfaces and tools.
5
Data Warehouse Architectures
Two-level
–Source system files containing operational data.
–Transformed and integrated data warehouse.
Three-level
–Operational data.
–Enterprise data warehouse (EDW): single source of data for decision making.
–Data marts: limited scope; data selected from EDW; customized decision support for individual user groups.
6
[Figure: Generic data warehouse architecture]
7
[Figure: Three-layer architecture]
8
Reasons for the Three-Level Architecture
EDW and data marts have different purposes and data architectures.
Data transformation is complex and is best performed in two steps.
Data marts customize decision support for different groups.
Data architecture
–Operational data, reconciled data, derived data.
9
[Figure: Three-layer data architecture]
10
Data Characteristics
Status vs. event data.
–A transaction is a business activity that triggers one or more business events; event data capture those events.
Transient vs. periodic data.
–Transient data: data in which changes to existing records are written over previous records, thus destroying the previous data content.
–Periodic data: data that are never physically altered or deleted once added.
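The difference between transient and periodic data can be sketched in a few lines of Python. This is only an illustration: the account key, balances, and dates are hypothetical, and a real system would store these rows in database tables rather than in-memory structures.

```python
from datetime import date

def apply_transient(table, key, new_value):
    """Transient: overwrite the existing record; the previous value is lost."""
    table[key] = new_value

def apply_periodic(history, key, new_value, as_of):
    """Periodic: append a dated record; earlier records are never altered."""
    history.append({"key": key, "value": new_value, "as_of": as_of})

# Hypothetical account balances, used only for illustration.
transient_table = {"acct-001": 750}
periodic_history = [{"key": "acct-001", "value": 750, "as_of": date(2024, 1, 1)}]

apply_transient(transient_table, "acct-001", 1000)
apply_periodic(periodic_history, "acct-001", 1000, date(2024, 1, 2))

print(transient_table)        # only the latest balance survives
print(len(periodic_history))  # both versions are retained
```

The periodic form is what lets an informational system answer point-in-time questions, since the earlier balance is still available.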
11
[Figure: Example of DBMS log entry]
12
[Figure: Transient operational data]
13
Reconciled Data Characteristics
Detailed
Historical
Normalized
Enterprise-wide
Quality controlled
14
The Data Reconciliation Process
Capture: extract the relevant data from source files to fill the EDW
–Static: initial load.
–Incremental: ongoing update.
Scrub (data cleansing)
–Missing data, name reconciliation.
–Pattern recognition and other artificial intelligence techniques.
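The capture and scrub steps above could look roughly like the following sketch. The row layout, the `updated` field, the alias table, and the `"UNKNOWN"` placeholder are all assumptions made for illustration; real cleansing rules are usually far richer.

```python
def capture_static(source_rows):
    """Static capture: copy every source row (initial load)."""
    return list(source_rows)

def capture_incremental(source_rows, last_capture):
    """Incremental capture: keep only rows changed since the last capture."""
    return [r for r in source_rows if r["updated"] > last_capture]

def scrub(row, name_aliases):
    """Cleanse one row: fill in a missing name and reconcile name variants."""
    cleaned = dict(row)
    name = (cleaned.get("customer") or "UNKNOWN").strip()
    cleaned["customer"] = name_aliases.get(name.lower(), name)
    return cleaned

# Hypothetical alias table and source rows (ISO dates compare correctly as strings).
ALIASES = {"ibm corp.": "IBM", "i.b.m.": "IBM"}
source = [
    {"id": 1, "customer": "IBM Corp.", "updated": "2024-01-05"},
    {"id": 2, "customer": None, "updated": "2024-02-10"},
    {"id": 3, "customer": " I.B.M. ", "updated": "2024-02-12"},
]

initial = [scrub(r, ALIASES) for r in capture_static(source)]
delta = [scrub(r, ALIASES) for r in capture_incremental(source, "2024-02-01")]

print([r["customer"] for r in initial])
print([r["id"] for r in delta])
```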
15
[Figure: Steps in data reconciliation]
16
The Data Reconciliation Process
Transform
–Convert the data format from the source to the target system.
–Record-level functions: selection, joining, aggregation (for data marts).
–Field-level functions: single-field transformation, multi-field transformation.
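As a rough illustration of the record-level and field-level functions listed above, the sketch below applies each one in turn to a small, hypothetical order data set. The field names and the choice of upper-casing as the single-field transformation are assumptions, not part of the slides.

```python
from collections import defaultdict

# Hypothetical source data.
orders = [
    {"order_id": 1, "cust_id": "C1", "qty": 2, "unit_price": 5.0, "status": "shipped"},
    {"order_id": 2, "cust_id": "C2", "qty": 1, "unit_price": 20.0, "status": "cancelled"},
    {"order_id": 3, "cust_id": "C1", "qty": 4, "unit_price": 2.5, "status": "shipped"},
]
customers = {"C1": {"name": "acme ltd"}, "C2": {"name": "globex"}}

# Record-level: selection (keep only rows that meet a criterion).
selected = [o for o in orders if o["status"] == "shipped"]

# Record-level: joining (combine order rows with customer rows).
joined = [{**o, "name": customers[o["cust_id"]]["name"]} for o in selected]

# Field-level: single-field (name -> upper case) and
# multi-field (qty and unit_price -> amount).
transformed = [
    {**r, "name": r["name"].upper(), "amount": r["qty"] * r["unit_price"]}
    for r in joined
]

# Record-level: aggregation (summarize detail rows, e.g. for a data mart).
totals = defaultdict(float)
for r in transformed:
    totals[r["name"]] += r["amount"]

print(dict(totals))
```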
17
The Data Reconciliation Process
Load and Index
–Refresh mode: used when the warehouse is first created; static data capture.
–Update mode: ongoing update of the warehouse; incremental data capture.
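A minimal sketch of the two load modes, followed by a simple index build, is shown below. It assumes the warehouse is an in-memory list of rows and that update mode appends changed rows rather than overwriting them; the `load_date` field and all function names are hypothetical.

```python
from collections import defaultdict

def load_refresh(warehouse, snapshot, load_date):
    """Refresh mode: rewrite the target in bulk from a full (static) capture."""
    warehouse.clear()
    warehouse.extend({**row, "load_date": load_date} for row in snapshot)

def load_update(warehouse, changes, load_date):
    """Update mode: append only the changed rows from an incremental capture."""
    warehouse.extend({**row, "load_date": load_date} for row in changes)

def build_index(warehouse):
    """Index step: map each id to the positions of its rows."""
    index = defaultdict(list)
    for pos, row in enumerate(warehouse):
        index[row["id"]].append(pos)
    return dict(index)

warehouse = []
load_refresh(warehouse, [{"id": 1, "qty": 5}, {"id": 2, "qty": 8}], "2024-01-01")
load_update(warehouse, [{"id": 1, "qty": 6}], "2024-01-02")
index = build_index(warehouse)

print(len(warehouse))
print(index)
```

Appending in update mode keeps the earlier version of row 1, which matches the historical character of reconciled data.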
18
Derived Data Characteristics
Type of data
–Detailed, possibly periodic.
–Aggregated.
Distributed to departmental servers.
Implemented in star schema.
19
Star Schema
Also called the dimensional model.
Fact and dimension tables.
–Fact table: consists of factual or quantitative data about the business.
–Dimension table: holds descriptive data.
Grain of a fact table: the level of detail of each record (here, the time period that each record covers).
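To make the fact/dimension split concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, columns, and sample rows are hypothetical and are not taken from the slides' figures; the grain assumed here is one row per product, store, and period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive data.
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, month TEXT);

-- The fact table holds quantitative data, keyed by the dimensions.
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    store_key    INTEGER REFERENCES store(store_key),
    period_key   INTEGER REFERENCES period(period_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, store_key, period_key)
);

INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO store   VALUES (1, 'Boston'), (2, 'Denver');
INSERT INTO period  VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO sales   VALUES (1, 1, 1, 10, 100.0),
                           (1, 2, 1, 5, 50.0),
                           (2, 1, 2, 3, 90.0);
""")

# A typical star query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM sales AS f
    JOIN store AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()

print(rows)
```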
20
[Figure: Components of a star schema]
21
[Figure: Star schema example]
22
[Figure: Star schema with sample data]
23
[Figure: Example of snowflake schema]
24
Size of the fact table
Total number of stores: 1,000
Total number of products: 10,000
Total number of periods: 24
Maximum total rows: 1,000 × 10,000 × 24 = 240,000,000
If on average 50% of items record sales in a given store and period:
–number of rows = 120,000,000
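The same estimate can be written as a small helper so that other store, product, and period counts are easy to try. The function name and the `sales_fraction` parameter are illustrative choices, not terms from the slides.

```python
def fact_table_rows(stores, products, periods, sales_fraction=1.0):
    """Estimate fact-table rows: one row per store/product/period that records sales."""
    max_rows = stores * products * periods
    return round(max_rows * sales_fraction)

max_rows = fact_table_rows(1_000, 10_000, 24)
expected_rows = fact_table_rows(1_000, 10_000, 24, sales_fraction=0.5)

print(f"{max_rows:,}")       # 240,000,000
print(f"{expected_rows:,}")  # 120,000,000
```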
25
Types of Data Marts
Dependent: populated from the EDW.
Independent: data taken directly from the operational databases.
26
The User Interface
The role of metadata.
Traditional query and reporting tools.
On-line analytical processing (OLAP): the use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
27
The User Interface
–Slicing a cube.
–Pivot: rotate the view for a particular data point to obtain another perspective, e.g. take a value from the units column and obtain by-store values.
–Drill-down.
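The three operations above can be sketched on a tiny in-memory cube. Everything here is an assumption made for illustration: the cube is a plain dictionary keyed by (product, store, month), the sample values are invented, and the pivot is simplified to swapping the two axes of a slice. OLAP tools do this interactively over much larger data.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, store, month) -> units sold.
cube = {
    ("Shoes", "Boston", "Jan"): 10,
    ("Shoes", "Boston", "Feb"): 12,
    ("Shoes", "Denver", "Jan"): 7,
    ("Hats", "Boston", "Jan"): 4,
    ("Hats", "Denver", "Feb"): 6,
}

def slice_cube(cube, product):
    """Slice: fix one dimension value, leaving a two-dimensional table."""
    return {(s, m): u for (p, s, m), u in cube.items() if p == product}

def pivot(table):
    """Pivot (simplified): swap the row and column axes of a 2-D table."""
    return {(col, row): u for (row, col), u in table.items()}

def rollup_by_product(cube):
    """Summary view: total units per product."""
    totals = defaultdict(int)
    for (p, _store, _month), u in cube.items():
        totals[p] += u
    return dict(totals)

def drill_down(cube, product):
    """Drill-down: break one product's total into by-store values."""
    totals = defaultdict(int)
    for (p, store, _month), u in cube.items():
        if p == product:
            totals[store] += u
    return dict(totals)

shoes = slice_cube(cube, "Shoes")
print(shoes)
print(pivot(shoes))
print(rollup_by_product(cube))
print(drill_down(cube, "Shoes"))
```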
28
[Figure: Slicing a data cube]