Download presentation
Presentation is loading. Please wait.
Published byFrank Wiggins Modified over 8 years ago
1
Introduction to Data Warehousing
2
amira.idrees@yahoo.com Subject: Data Warehousing
3
Why Data Warehouse? Necessity is the mother of invention
4
Scenario 1 ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Banglore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system.
5
Scenario 1 : ABC Pvt Ltd. Mumbai Delhi Chennai Banglore Sales Manager Sales per item type per branch for first quarter.
6
Solution 1:ABC Pvt Ltd. Extract sales information from each database. Store the information in a common repository at a single site.
7
Solution 1:ABC Pvt Ltd. Mumbai Delhi Chennai Banglore Data Warehouse Sales Manager Query & Analysis tools Report
8
Scenario 2 One Stop Shopping Super Market has huge operational database.Whenever Executives wants some report the OLTP system becomes slow and data entry operators have to wait for some time.
9
Scenario 2 : One Stop Shopping Operational Database Data Entry Operator Management Wait Report
10
Solution 2 Extract data needed for analysis from operational database. Store it in warehouse. Refresh warehouse at regular interval so that it contains up to date information for analysis. Warehouse will contain data with historical perspective.
11
Solution 2 Operational database Data Warehouse Extract data Data Entry Operator Data Entry Operator Manager Report Transaction
12
Scenario 3 Cakes & Cookies is a small,new company.President of the company wants his company should grow.He needs information so that he can make correct decisions.
13
Solution 3 Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query analysis tools to support adhoc queries.
14
Solution 3 Query and Analysis tool President Expansio n Improvemen t sales time Data Warehouse
15
What is Data Warehouse??
16
Inmons’s definition A data warehouse is -subject-oriented, -integrated, -time-variant, -nonvolatile collection of data in support of management’s decision making process.
17
Subject-oriented Data warehouse is organized around subjects such as sales,product,customer. It focuses on modeling and analysis of data for decision makers. Excludes data not useful in decision support process.
18
Integration Data Warehouse is constructed by integrating multiple heterogeneous sources. Data Preprocessing are applied to ensure consistency. RDBMS Legacy System Data Warehouse Flat File Data Processing Data Transformation
19
Integration In terms of data. – encoding structures. – Measurement of attributes. – physical attribute. of data – naming conventions. – Data type format remarks
20
Time-variant Provides information from historical perspective e.g. past 5-10 years Every key structure contains either implicitly or explicitly an element of time
21
Nonvolatile Data once recorded cannot be updated. Data warehouse requires two operations in data accessing – Initial loading of data – Access of data load access
22
Operational v/s Information System FeaturesOperationalInformation CharacteristicsOperational processingInformational processing OrientationTransactionAnalysis UserClerk,DBA,database professional Knowledge workers FunctionDay to day operationDecision support DataCurrentHistorical ViewDetailed,flat relationalSummarized, multidimensional DB designApplication orientedSubject oriented Unit of workShort,simple transactionComplex query AccessRead/writeMostly read
23
Operational v/s Information System FeaturesOperationalInformation FocusData inInformation out Number of records accessed tensmillions Number of usersthousandshundreds DB size100MB to GB100 GB to TB PriorityHigh performance,high availability High flexibility,end- user autonomy MetricTransaction throughputQuery througput
24
Data Warehousing Architecture Extract Transform Load Refresh Serve External Sources Operational Dbs Analysis Query/Reporting Data Mining Monitoring & Administration Metadata Repository DATA SOURCES TOOLS DATA MARTS OLAP Servers Reconciled data
25
Data Warehouse Architecture Data Warehouse server – almost always a relational DBMS,rarely flat files OLAP servers – to support and operate on multi-dimensional data structures Clients – Query and reporting tools – Analysis tools – Data mining tools
26
Data Warehouse Schema Star Schema Fact Constellation Schema Snowflake Schema
27
Star Schema A single,large and central fact table and one table for each dimension. Every fact points to one tuple in each of the dimensions and has additional attributes. Does not capture hierarchies directly.
28
Star Schema (contd..) Store Key Product Key Period Key Units Price Store Dimension Time Dimension Product Dimension Fact Table Benefits: Easy to understand, easy to define hierarchies, reduces no. of physical joins. Store Key Store Name City State Region Period Key Year Quarter Month Product Key Product Desc
29
SnowFlake Schema Variant of star schema model. A single,large and central fact table and one or more tables for each dimension. Dimension tables are normalized i.e. split dimension table data into additional tables
30
SnowFlake Schema (contd..) Store Key Product Key Period Key Units Price Time Dimension Product Dimension Fact Table Store Key Store Name City Key Period Key Year Quarter Month Product Key Product Desc City Key City State Region City Dimension Store Dimension Drawbacks: Time consuming joins,report generation slow
31
Fact Constellation Multiple fact tables share dimension tables. This schema is viewed as collection of stars hence called galaxy schema or fact constellation. Sophisticated application requires such schema.
32
Fact Constellation (contd..) Store Key Product Key Period Key Units Price Store Dimension Product Dimension Sales Fact Table Store Key Store Name City State Region Product Key Product Desc Shipper Key Store Key Product Key Period Key Units Price Shipping Fact Table
33
Building Data Warehouse Data Selection Data Preprocessing – Fill missing values – Remove inconsistency Data Transformation & Integration Data Loading Data in warehouse is stored in form of fact tables and dimension tables.
34
Case Study Afco Foods & Beverages is a new company which produces dairy,bread and meat products with production unit located at Baroda. There products are sold in North,North West and Western region of India. They have sales units at Mumbai, Pune, Ahemdabad,Delhi and Baroda. The President of the company wants sales information.
35
Sales Information Report: The number of units sold. 113 Report: The number of units sold over time JanuaryFebruaryMarchApril 14413325
36
Sales Information Report : The number of items sold for each product with time JanFebMarApr Wheat Bread617 Cheese61668 Swiss Rolls82521 Product Time
37
Sales Information Report: The number of items sold in each City for each product with time JanFebMarApr MumbaiWheat Bread310 Cheese3166 Swiss Rolls4166 PuneWheat Bread37 Cheese38 Swiss Rolls4915 Product Time City
38
Sales Information Report: The number of items sold and income in each region for each product with time. JanFebMarApr RsU U U U MumbaiWheat Bread7.44324.8010 Cheese7.95342.401615.906 Swiss Rolls7.32429.981610.986 PuneWheat Bread7.44317.367 Cheese7.95321.208 Swiss Rolls7.32416.47927.4515
39
Sales Measures & Dimensions Measure – Units sold, Amount. Dimensions – Product,Time,Region.
40
Sales Data Warehouse Model CityProductMonthUnitsRupees MumbaiWheat BreadJanuary37.95 MumbaiCheeseJanuary47.32 PuneWheat BreadJanuary37.95 PuneCheeseJanuary47.32 MumbaiSwiss RollsFebruary1642.40 Fact Table
41
Sales Data Warehouse Model City_IDProd_IDMonthUnitsRupees 15891/1/199837.95 112181/1/199847.32 25891/1/199837.95 212181/1/199847.32 15892/1/19981642.40
42
Sales Data Warehouse Model Product Dimension Tables Prod_IDProduct_NameProduct_Category_ID 589Wheat Bread1 590White Bread1 288Coconut Cookies2 Product_Category_IdProduct_Category 1Bread 2Cookies
43
Sales Data Warehouse Model Region Dimension Table City_IDCityRegionCountry 1MumbaiWestIndia 2PuneNorthWestIndia
44
Sales Data Warehouse Model Sales Fact Region Product Category Time
45
Online Analysis Processing(OLAP) It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. Data Warehouse Time Product Region
46
OLAP Cube CityProductTimeUnitsDollars All 113251.26 MumbaiAll 64146.07 MumbaiWhite BreadAll3898.49 MumbaiWheat BreadAll1332.24 MumbaiWheat BreadQtr137.44 MumbaiWheat BreadMarch37.44
47
OLAP Operations Drill Down Time Region Product Category e.g Electrical Appliance Sub Category e.g Kitchen Product e.g Toaster
48
OLAP Operations Drill Up Time Region Product Category e.g Electrical Appliance Sub Category e.g Kitchen Product e.g Toaster
49
OLAP Operations Slice and Dice Time Region Product Product=Toaster Time Region
50
Presentation Time Region Product Report Reporting Tool
51
Data Warehousing includes Build Data Warehouse Online analysis processing(OLAP). Presentation. RDBMS Flat File Presentation Cleaning,Selection & Integration Warehouse & OLAP server Client
52
Need for Data Warehousing Industry has huge amount of operational data Knowledge worker wants to turn this data into useful information. This information is used by them to support strategic decision making.
53
Need for Data Warehousing (contd..) It is a platform for consolidated historical data for analysis. It stores data of good quality so that knowledge worker can make correct decisions.
54
Need for Data Warehousing (contd..) From business perspective -it is latest marketing weapon -helps to keep customers by learning more about their needs. -valuable tool in today’s competitive fast evolving world.
55
Data Warehousing Tools Data Warehouse – SQL Server 2010 BI – Oracle 8i Warehouse Builder OLAP tools – SQL Server Analysis Services – Data Stage Reporting tools – MS Excel Pivot Chart – SQL Server Reporting Services
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.