Data Warehousing Data warehousing provides architectures & tools for business executives to systematically organize, understand & use their data to make strategic decisions. A data warehouse is a collection of integrated databases designed to support a DSS.
Key Features: Subject Oriented: Data warehouse are designed to help you analyze data. For e.g. To learn more about your company’s sales data, you can build a warehouse that concentrate on sales. Integrated: Integration is closely related to subject orientation. A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as database, flat files &online transaction action records. They must resolve problems like naming conflicts & inconsistencies.
Time Variant: Data are stored to provide information from historical perspective(e.g. The past 5-10 years). It varies from time to time. Non Volatile: This means, once entered into warehouse, data should not change. Summarized : Operational data are mapped into a decision-usable format. Metadata: Data about data are stored. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use.
Need for Data Warehousing Industry has huge amount of operational data. Knowledge worker wants to turn this data into useful information. This information is used by them to support strategic decision making.
Comparison between OLTP & OLAP systems. S. No. OLTP OLAP 1. Characteristics Operational Processing Informational Processing 2. Orientation Transaction Analysis 3. User Clerk , DBA, Database proff. Knowledge worker(e.g. Manager, etc) 4. Function Day to Day operations. Decision support 5. Data Current, up-to-date Historical, maintained over time. 6. DB Size 100 MB to GB 100 GB to TB
6. Priority High performance, High availability High flexibilty 7. Unit of Work Short, Simple transaction Complex query 8. Access Read/Write Mostly Read 9. Focus Data in Info out 10. Example Purchasing, banking,etc Budgeting, sales forcasting,etc
ADVANTAGES & DISADVANTAGES OF DATA WAREHOUSES: High investment: Implementation of DW by an organisation requires a huge investment from Rs.10 lac to Rs.50 lac. Increased productivity of corporate decision-makers: DW improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data. By transforming data into meaningful information a DW allows business managers to perform more accurate & consistent analysis. Most Cost-effective decision making: DW helps to reduce overall cost of the product by reducing the number of channels. DW provides retrieval of data without slowing down operational system. DW facilitates DSS applications such as trend reports,exception reports & reports that show actual performance versus goals.
DISADVANTAGES: Underestimation of resources of data loading: Sometimes we underestimate the time reqd. To extract, clean & load the data into DW. It may take significant proportion of the total development time, although some tools are there which are used to reduce the time & effort spent on this process. Increased end user demands: After satisfying the demands, the user request’s increases. This is b’coz of Increasing awareness of the users on capability & value of DW. Data Homogenization: The concept of DW deals with similarity of data formats b/w different data sources. Thus, results in to lose of some imp. Value of the data. High Maintenance: DW is usually not static and hav high cost. : DW are high maintenance systems. Any reorganisation of the business processes & the source systems may affect the DW & result in high maintenance cost.
DW Architecture: The three fundamental components that are supported by DW are: Load Manager. Warehouse Manager. Data Access Manager. LOAD MANAGER: The components of DW is responsible for collection of data from operational systems & convert them into usable form for the user. This component is responsible for importing & exporting data from operational systems. It performs the following task: Identification of Data. Validation of Data about accuracy. Extraction of Data from original source Cleansing of data by eliminating meaningless values & making iit usable. Data formatting. Data standardisation by getting them into consistent from.
Warehouse Manager: The warehouse manager is the centre of DW system & is the DW itself. It is a large, physical database that holds a vast amount of information from a wide variety of sources. The data within DW is organised such that it becomes easy to find, use & update frequently from its sources. Query Manager: Query Manager component provides the end-users with access to the stored warehouse info. Through the use of specialised end-user tools. Tools like Query & reporting, OLAP, graphical & geographical info. Systems.
3-TIER DATAWAREHOUSE ARCHITECTURE:- Data warehouse adopt a three tier architecture, these are:- 1.Bottom Tier(Data warehouse server) 2.Middle Tier(OLAP server) 3. Top Tier(Front end tools).
BOTTOM TIER: (how data is extracted from sources) It is a warehouse database server Data is fed using Back end tools and utilities. Data extracted using programs called gateways(ODBC,JDBC) It also contains Meta data repository. MIDDLE TIER: The middle tier is an OLAP server that is typically implemented using either A relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations on multidimensional data to standard relational operations; A multidimensional OLAP (MOLAP) model, that is, a special-purpose server that directly implements multidimensional data and operations.
TOP TIER The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or data mining tools.
3-Tier Architecture Diagram
DATAWAREHOUSE BACK-ENDTOOLS AND UTILITIES Data warehouse systems use back-end tools and utilities to populate and refresh their data . These tools and utilities include the following functions: Data extraction which typically gathers data from multiple, heterogeneous, and external sources. Data cleaning which detects errors in the data and rectifies them when possible. Data transformation which converts data from legacy or host format to warehouse format Load which sorts, summarizes, consolidates, computes views, checks integrity, and builds indices and partitions Refresh which propagates the updates from the data sources to the warehouse
Online Analysis Processing(OLAP): It enables analysts, managers and executives to gain insight information data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. Product Data Warehouse Region Time
Multidimensional Data Sales volume as a function of product, month, and region Dimensions: Product, Location, Time Hierarchical summarization paths Region Industry Region Year Category Country Quarter Product City Month Week Office Day Product Month
OLAP Operations Drill down: It navigates from less detailed data to more detail data. It is reverse of roll up. This adds more detail to given data. Product Category e.g Electrical Appliance Sub Category e.g Kitchen Product e.g Toaster Region Time
OLAP Operations: Roll Up: Performs Aggregation on data cube,either by climbing up or by dimension reduction. Product Sub Category e.g Kitchen Product e.g Toaster Region Time
OLAP Operations: Slice and Dice: The slice operation performs a selection on one dimension of a given cube, resulting in sub cube. The dice defines a sub cube by performing a selection on two or more dimension. Product Product=Toaster Region Region Time Time
OLAP Operations Pivot: is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of data. Product Product Region Time Region Time
DATA MART DATA MART is generalised term used to describe DW environments that are somehow smaller than others. Data Mart term often used to describe small, single purpose mini DW. “A subset of DW that support the requirements of a particular department or business function” is known as DATA MART.” It is normally in the form of summary data relating to a particular department or business function. A DW may be constructed as a collection of a subset of DATA MART’s. Usually implemented on low cost departmental servers that are UNIX, Windows/NT based. Depending on sources of data, DM’s can be categorised as: Independent DM’s: sourced from data captured from one or more operational systems or external information providers. Dependent DM’s: sourced directly from enterprise Data warehouse.
Issues associated with development and management of data marts: Data Mart Functionality: successfully provide analysis using OLAP and other data mining tools. Data mart size: User expects faster response time from data marts than from data warehouses. Data mart load performances: A data mart has to balance 2 critical components ie. End user response time And data loading performance. Users access to data in multiple data marts. Data mart internet/intranet access: this technology offers users low cost access to data marts and DW’s using web browsers. Data mart installation: ‘Data mart in a box’ & many products are available that can provide a low cost data marts for an organisation.
Data Warehousing Tools: Data Warehouse SQL Server 2000 DTS Oracle 8i Warehouse Builder OLAP tools SQL Server Analysis Services Oracle Express Server Reporting tools MS Excel Pivot Chart VB Applications