1
Data Warehouse
2
Introduction
A data warehouse (DW) stores large volumes of data used by decision support systems (DSS).
A DW is maintained separately from the organization's operational databases.
Transaction database (OLTP system): write-optimized, recent data.
A system meant to support decision making is called an OLAP system: read-optimized, historic data.
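The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns and the sample rows are assumptions and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, region TEXT, amount REAL)"
)

# OLTP-style work: small writes, each touching a single recent row.
conn.execute("INSERT INTO orders VALUES (1, '2023-01-05', 'north', 120.0)")
conn.execute("INSERT INTO orders VALUES (2, '2023-01-06', 'south', 80.0)")
conn.execute("INSERT INTO orders VALUES (3, '2024-02-10', 'north', 200.0)")
conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")
conn.commit()


def yearly_totals(connection):
    """OLAP-style work: a read-only query that scans history and aggregates."""
    return connection.execute(
        "SELECT substr(order_date, 1, 4) AS year, region, SUM(amount) "
        "FROM orders GROUP BY year, region ORDER BY year, region"
    ).fetchall()
```

In a real deployment the two workloads would run on separate systems, which is the reason the DW is kept apart from the operational database.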
3
DWs are relatively static, with only infrequent updates.
A DW is a stand-alone repository of information, integrated from several, possibly heterogeneous, operational databases. It is the enabling technique that facilitates improved business decision-making.
[Diagram labels: CUSTOMER DB, SALES DB, DW, MARKETING, FINANCIAL ANALYSIS]
4
DEFINITION
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
Subject-oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. Example subjects: customer, sales, production.
5
Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
[Diagram: the operational databases hold Saving account, Loan account and Current account; the data warehouse holds a single Account]
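A minimal sketch of this integration step, assuming three hypothetical source systems that each describe an account with different field names and units. All field names (`sav_no`, `loan_id`, `balance_cents` and so on) and the sign convention for loans are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Single warehouse representation shared by all account types."""
    account_id: str
    account_type: str
    balance: float


def integrate(savings, loans, current):
    """Merge three differently shaped source record sets into one Account list."""
    unified = []
    for rec in savings:
        unified.append(Account(rec["sav_no"], "saving", rec["bal"]))
    for rec in loans:
        # Assumption: the loan system stores the amount owed as a positive
        # number; the warehouse records it as a negative balance.
        unified.append(Account(rec["loan_id"], "loan", -rec["outstanding"]))
    for rec in current:
        # Assumption: the current-account system stores cents, not units.
        unified.append(Account(rec["acct"], "current", rec["balance_cents"] / 100))
    return unified
```

The point of the sketch is that naming, units and sign conventions are reconciled once, on the way into the warehouse, so every later query sees one coherent `Account` shape.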
6
Time-variant: All data in the data warehouse is identified with a particular time period.
Operational system: view of the business today; key need not have a date.
Data warehouse: designated time frame (3 - 10 years); key includes a date.
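One way to picture a key that includes a date is a store whose key is the pair (account, snapshot date). The class and method names below are hypothetical, and the sketch keeps everything in memory.

```python
from datetime import date


class BalanceHistory:
    """Warehouse-style store: every row is keyed by (account_id, snapshot_date)."""

    def __init__(self):
        self._rows = {}

    def load(self, account_id, snapshot_date, balance):
        self._rows[(account_id, snapshot_date)] = balance

    def as_of(self, account_id, when):
        """Return the balance from the latest snapshot on or before `when`."""
        candidates = [
            snap for (acct, snap) in self._rows
            if acct == account_id and snap <= when
        ]
        if not candidates:
            return None
        return self._rows[(account_id, max(candidates))]
```

An operational system would typically keep only the current balance per account, so a question such as "what was the balance in mid-2023?" can only be answered from the dated rows.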
7
Non-volatile: Data in the data warehouse are never overwritten or deleted. Once committed, the data are static, read-only, and retained for future reporting.
“CRUD” actions in an operational system: Create, Read, Insert, Update, Replace, Delete.
Actions in a data warehouse: Load, Read. No data update.
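The load-and-read-only contract can be sketched as a small class that simply offers no update or delete operation. The name `NonVolatileStore` and its interface are assumptions made for this illustration.

```python
class NonVolatileStore:
    """Exposes only load and read; committed rows are never changed."""

    def __init__(self):
        self._rows = {}

    def load(self, key, row):
        if key in self._rows:
            # Overwriting would violate the non-volatile property.
            raise ValueError(f"row {key!r} is already committed")
        self._rows[key] = dict(row)

    def read(self, key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows[key])
```

A correction in such a design arrives as a new row with its own key (for example a later date), not as an update in place.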
8
Data Warehouse Concepts
Data Warehouse Environment Architecture
[Diagram. Sources: A/P, O/P, Pay, Mktg, HR, A/R. ODS: contains integrated data from multiple legacy applications; all or part of system of record data; insert, update, replace, delete, load, read. Data warehouse (D/W): best system of record data; load, read. Data marts. Flows: integration criteria, load criteria, D/W load, data mart loads.]
9
NEED FOR DW
Difficulty in obtaining integrated data and information.
The existing information structure cannot provide full and dynamic analysis of the information available.
Inconsistent results are obtained from queries and reports arising from heterogeneous data stores.
Increased difficulty in delivering consistent, comprehensive information in a timely fashion.
Holding historical data in the transaction systems for long periods of time could interfere with their performance, so the DW holds it instead.
Better query response time in the DW.
10
Data Warehouse Architecture
11
Single-Layer Architecture
There is no physical data warehouse or data mart between the operational data and the analytic tools. The middleware in this type of system should be considered a virtual data warehouse, which consists of a software layer and not a data layer. The single-layer model is lightweight, as it minimizes redundancy and thereby the amount of data stored. The analyses are based directly on the operational data.
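A rough sketch of such a middleware layer, using in-memory sqlite3 databases to stand in for the operational systems. The `sales` table and the function names are hypothetical; the point is only that the middleware stores nothing itself.

```python
import sqlite3


def make_source(rows):
    """Create one stand-in operational database with a small sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def virtual_total_by_region(sources):
    """Middleware: query every operational DB at request time, keep no copy."""
    totals = {}
    for conn in sources:
        query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
        for region, amount in conn.execute(query):
            totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Every analytical request in this model puts load on the operational databases, which is the trade-off for not storing a redundant copy.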
12
Single-Layer Architecture
[Diagram: the operational system's databases (DB) connect directly to the analytical tools]
13
Two-layer architecture
The two-layer model consists of operational (and external) data in the source layer and a data warehouse layer on top of it. Between the source layer and the data warehouse layer is an ETL system. The analytical part of this architecture bases its analysis on the data loaded into the data warehouse or possibly into data marts. The data warehouse layer also makes it possible to structure data in a way that fits the multidimensional model of the analytical tools, which in turn makes them faster. Such an architecture is, however, more resource-intensive to build and maintain.
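The ETL system between the two layers can be sketched as three small functions. The record layout (`customer`, `amount`) and the cleaning rules are assumptions chosen for illustration, and the warehouse is just a Python list here.

```python
def extract(source_rows):
    """Extract: take a snapshot of the rows from a source system."""
    return list(source_rows)


def transform(rows):
    """Transform: drop incomplete rows and normalize names and types."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # skip rows that cannot be analyzed
        cleaned.append({
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


def load(warehouse, rows):
    """Load: append the cleaned rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse
```

A typical run chains the three steps: `load(warehouse, transform(extract(source)))`.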
14
Two-Layer Architecture
[Diagram: operational data and external data in the source layer pass through ETL into the data warehouse, which feeds the analytical tools]
15
Three-layer architecture
The three-layer architecture consists of the source layer (containing multiple source systems), the reconciled layer and the data warehouse layer (containing both data warehouses and data marts). The reconciled layer sits between the source data and the data warehouse. It is populated with data from the source systems through an ETL process, and the data stored in it is published further through another ETL process. In the reconciled layer the data has been cleaned once and integrated from multiple different source systems into a common standardized form. The ETL process that feeds the data warehouse then only receives already integrated data, which needs less transformation. This architecture is especially useful for very large, enterprise-wide systems.
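The two ETL stages can be sketched as two functions: one that standardizes the sources into a reconciled form, and one that publishes from that form into the warehouse. The two source layouts (`cust`/`amt_usd` and `customer_id`/`amount_cents`) are hypothetical.

```python
def reconcile(source_a, source_b):
    """First ETL stage: clean both sources into one standardized form."""
    reconciled = []
    for row in source_a:
        reconciled.append({
            "customer_id": row["cust"].upper(),
            "amount": float(row["amt_usd"]),
        })
    for row in source_b:
        reconciled.append({
            "customer_id": row["customer_id"].upper(),
            "amount": row["amount_cents"] / 100,
        })
    return reconciled


def publish(reconciled):
    """Second ETL stage: only aggregates, since the data is already integrated."""
    totals = {}
    for row in reconciled:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals
```

Note how `publish` contains no source-specific logic at all; that simplification is what the reconciled layer buys.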
16
Three-layer architecture
17
Operational Data Store (ODS)
An ODS is an integrated, subject-oriented, volatile (including update), current-valued, enterprise-wide, detailed database structure designed to serve operational users as they do high-performance integrated processing. It serves as a staging area for loading data into the enterprise DW.
18
Design of a Data Warehouse
To design an effective data warehouse, one needs to understand and analyze business needs and construct a business analysis framework. Four different views regarding the design of a data warehouse must be considered: the top-down view, the data source view, the data warehouse view and the business query view.
20
Bottom Tier: The bottom tier is a warehouse database server, which is almost always a relational database system. Back-end tools and utilities are used to feed data into the bottom tier from operational databases or external sources. These tools and utilities perform data extraction, cleaning and transformation.
21
Middle Tier: It is an OLAP server, which is typically implemented using either
1. A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that maps operations on multidimensional data to standard relational operations;
2. A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose server that directly implements multidimensional data and operations.
Top Tier: The top tier is a client, which contains query and reporting tools, analysis tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
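How a ROLAP server maps a multidimensional operation onto a relational one can be sketched with a roll-up that becomes a SQL GROUP BY. The `sales_fact` table, its dimensions and the sample rows are assumptions; sqlite3 stands in for the warehouse database server.

```python
import sqlite3

ALLOWED_DIMENSIONS = ("year", "region", "item")


def make_fact_table():
    """Build a tiny stand-in fact table with three dimensions and one measure."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_fact (year INTEGER, region TEXT, item TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        (2023, "north", "pen", 10.0),
        (2023, "north", "ink", 5.0),
        (2023, "south", "pen", 7.0),
        (2024, "north", "pen", 3.0),
    ])
    conn.commit()
    return conn


def rollup(conn, dimensions):
    """Map a roll-up over the given dimensions to a relational GROUP BY."""
    if not dimensions or any(d not in ALLOWED_DIMENSIONS for d in dimensions):
        raise ValueError("dimensions must be a non-empty subset of the known ones")
    cols = ", ".join(dimensions)  # safe: every name was checked against the list
    sql = (f"SELECT {cols}, SUM(amount) FROM sales_fact "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()
```

Rolling up from (year, region) to (year) is then just a call with fewer dimensions; the relational engine does the aggregation.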
22
Data Warehouse Models
Enterprise warehouse: An enterprise warehouse collects all of the information about subjects spanning the entire organization. It provides corporate-wide data integration, usually from one or more operational systems or external information providers, and is cross-functional in scope.
Data mart: A data mart contains a subset of corporate-wide data that is of value to a specific group of users. The scope is confined to specific, selected subjects. For example, a marketing data mart may confine its subjects to customer, item, and sales.
23
Depending on the source of data, data marts can be categorized into the following two classes:
Independent data marts are sourced from data captured from one or more operational systems or external information providers, or from data generated locally within a particular department or geographic area.
Dependent data marts are sourced directly from enterprise data warehouses.
Virtual warehouse: A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized.
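The difference between a plain view and a materialized summary view can be sketched with sqlite3. SQLite has no native materialized views, so the sketch emulates one with a table that is rebuilt on demand; the relation names `v_region_totals` and `m_region_totals` are hypothetical.

```python
import sqlite3


def refresh_summary(conn):
    """Rebuild the emulated materialized summary from the view."""
    conn.execute("DROP TABLE IF EXISTS m_region_totals")
    conn.execute(
        "CREATE TABLE m_region_totals AS SELECT region, total FROM v_region_totals"
    )


def build_virtual_warehouse():
    """Operational table, a view over it, and one materialized summary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("north", 10.0), ("south", 5.0)])
    conn.execute(
        "CREATE VIEW v_region_totals AS "
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
    )
    refresh_summary(conn)
    return conn


def totals(conn, relation):
    """Read region totals from either the view or the materialized table."""
    if relation not in ("v_region_totals", "m_region_totals"):
        raise ValueError("unknown relation")
    return dict(conn.execute(f"SELECT region, total FROM {relation}").fetchall())
```

The view is recomputed from the operational table on every query, while the materialized copy answers without touching `orders` but goes stale until `refresh_summary` is called again.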
24
OLAP server architectures
Relational OLAP (ROLAP) servers: These are intermediate servers that stand between a relational back-end server and client front-end tools. They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces. ROLAP servers have greater scalability than MOLAP servers.
Multidimensional OLAP (MOLAP) servers: These servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. For example, Essbase of Arbor is a MOLAP server. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data.
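A toy version of array-based storage with precomputed summaries, using plain nested lists as a two-dimensional cube. The `Cube` class, its two dimensions (region and year) and its method names are assumptions for illustration; real MOLAP engines are far more elaborate.

```python
class Cube:
    """Two-dimensional data cube stored as a dense array (list of lists)."""

    def __init__(self, regions, years):
        # Map each dimension member to its array index.
        self.regions = {name: i for i, name in enumerate(regions)}
        self.years = {name: i for i, name in enumerate(years)}
        self.cells = [[0.0] * len(years) for _ in regions]
        self.region_totals = [0.0] * len(regions)
        self.year_totals = [0.0] * len(years)

    def add(self, region, year, amount):
        self.cells[self.regions[region]][self.years[year]] += amount

    def precompute(self):
        """Summarize along each dimension once, after loading."""
        self.region_totals = [sum(row) for row in self.cells]
        self.year_totals = [sum(col) for col in zip(*self.cells)]

    def cell(self, region, year):
        return self.cells[self.regions[region]][self.years[year]]

    def total_for_region(self, region):
        # Answered by an index lookup into the precomputed summary.
        return self.region_totals[self.regions[region]]

    def total_for_year(self, year):
        return self.year_totals[self.years[year]]
```

After `precompute`, a summary query is a direct array lookup with no scan of the detail cells, which is the fast-indexing advantage described above.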
25
Hybrid OLAP (HOLAP) servers:
The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. For example, a HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store. Microsoft SQL Server 7.0 OLAP Services supports a hybrid OLAP server.
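The split between a relational detail store and a separate aggregate store can be sketched as follows. This is not how any particular product implements HOLAP; the `HybridStore` class, the `sales` table and the use of a dictionary as the aggregate store are assumptions.

```python
import sqlite3


class HybridStore:
    """Detail rows live in a relational table; aggregates in a separate store."""

    def __init__(self):
        self.detail = sqlite3.connect(":memory:")
        self.detail.execute(
            "CREATE TABLE sales (region TEXT, item TEXT, amount REAL)"
        )
        self.aggregates = {}  # region -> running total (stand-in MOLAP store)

    def load(self, region, item, amount):
        self.detail.execute(
            "INSERT INTO sales VALUES (?, ?, ?)", (region, item, amount)
        )
        self.aggregates[region] = self.aggregates.get(region, 0.0) + amount

    def region_total(self, region):
        """Summary queries are answered from the aggregate store."""
        return self.aggregates.get(region, 0.0)

    def detail_rows(self, region):
        """Drill-down queries go to the relational detail store."""
        return self.detail.execute(
            "SELECT item, amount FROM sales WHERE region = ? ORDER BY item",
            (region,),
        ).fetchall()
```

Summary questions never touch the detail table, and drill-down questions never need the aggregates, so each store can be sized and tuned for its own job.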