Introduction to Data Warehousing. Subject: Data Warehousing.


1 Introduction to Data Warehousing

2 amira.idrees@yahoo.com Subject: Data Warehousing

3 Why Data Warehouse? Necessity is the mother of invention

4 Scenario 1 ABC Pvt Ltd is a company with branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

5 Scenario 1: ABC Pvt Ltd. [Diagram: the Mumbai, Delhi, Chennai and Bangalore branches, and the Sales Manager, who asks for sales per item type per branch for the first quarter.]

6 Solution 1: ABC Pvt Ltd. Extract sales information from each database. Store the information in a common repository at a single site.

7 Solution 1: ABC Pvt Ltd. [Diagram: data from Mumbai, Delhi, Chennai and Bangalore flows into the Data Warehouse; the Sales Manager uses query and analysis tools to produce the report.]
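Solution 1 can be sketched in a few lines of Python. This is only an illustration: the branch extracts, item types and amounts are made up, and the function names `load_warehouse` and `quarterly_report` are hypothetical, not part of any tool mentioned in the slides.

```python
from collections import defaultdict

# Hypothetical per-branch extracts: (item_type, quarter, amount) rows.
branch_sales = {
    "Mumbai": [("Bread", "Q1", 120.0), ("Dairy", "Q1", 80.0)],
    "Delhi": [("Bread", "Q1", 95.0)],
    "Chennai": [("Dairy", "Q1", 60.0), ("Dairy", "Q2", 70.0)],
    "Bangalore": [("Bread", "Q1", 110.0)],
}

def load_warehouse(sources):
    """Copy rows from every branch into one repository, tagging each with its branch."""
    warehouse = []
    for branch, rows in sources.items():
        for item_type, quarter, amount in rows:
            warehouse.append({"branch": branch, "item_type": item_type,
                              "quarter": quarter, "amount": amount})
    return warehouse

def quarterly_report(warehouse, quarter):
    """Total sales per (branch, item_type) for one quarter."""
    totals = defaultdict(float)
    for row in warehouse:
        if row["quarter"] == quarter:
            totals[(row["branch"], row["item_type"])] += row["amount"]
    return dict(totals)

warehouse = load_warehouse(branch_sales)
report = quarterly_report(warehouse, "Q1")
```

Once the rows sit in one place, the Sales Manager's report is a single aggregation rather than four separate queries against four operational systems.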

8 Scenario 2 One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system slows down and data entry operators have to wait.

9 Scenario 2: One Stop Shopping. [Diagram: the data entry operator and management share the operational database; the operator waits while management runs a report.]

10 Solution 2 Extract the data needed for analysis from the operational database. Store it in a warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will contain data with a historical perspective.

11 Solution 2 [Diagram: data entry operators run transactions against the operational database; data is extracted into the Data Warehouse, from which the manager gets reports.]

12 Scenario 3 Cakes & Cookies is a small, new company. The President wants the company to grow. He needs information so that he can make correct decisions.

13 Solution 3 Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query and analysis tools to support ad hoc queries.

14 Solution 3 [Diagram: the President uses a query and analysis tool on the Data Warehouse; charts show expansion and improvement in sales over time.]

15 What is a Data Warehouse?

16 Inmon’s definition A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.

17 Subject-oriented A data warehouse is organized around subjects such as sales, product and customer. It focuses on modeling and analysis of data for decision makers. It excludes data that is not useful in the decision support process.

18 Integration A data warehouse is constructed by integrating multiple heterogeneous sources. Data preprocessing is applied to ensure consistency. [Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]

19 Integration Sources are reconciled in terms of: – encoding structures – measurement of attributes – physical attributes of data – naming conventions – data type formats
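A minimal sketch of what reconciling two sources might look like, assuming two hypothetical systems that disagree on field names, gender encoding and the unit of a height attribute. The record layouts and the code table are assumptions made for illustration only.

```python
# Hypothetical records from two source systems with different conventions.
source_a = [{"cust_name": "Asha", "gender": "F", "height_cm": 162.0}]
source_b = [{"CustomerName": "Ravi", "sex": 1, "height_in": 70.0}]

# Assumed code table: system A uses letters, system B uses 0/1.
GENDER_CODES = {"F": "female", "M": "male", 0: "female", 1: "male"}

def from_a(rec):
    """Map a system A record onto the warehouse naming convention."""
    return {"customer_name": rec["cust_name"],
            "gender": GENDER_CODES[rec["gender"]],
            "height_cm": rec["height_cm"]}

def from_b(rec):
    """Map a system B record: rename fields, decode gender, convert inches to cm."""
    return {"customer_name": rec["CustomerName"],
            "gender": GENDER_CODES[rec["sex"]],
            "height_cm": round(rec["height_in"] * 2.54, 1)}

integrated = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
```

After this step every record uses one naming convention, one encoding and one unit, which is the consistency the integration slides ask for.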

20 Time-variant Provides information from a historical perspective, e.g. the past 5-10 years. Every key structure contains, either implicitly or explicitly, an element of time.

21 Nonvolatile Data once recorded is not updated. A data warehouse requires only two data access operations: – initial loading of data – access of data. [Diagram: load and access arrows on the warehouse.]

22 Operational v/s Information System
Features | Operational | Information
Characteristics | Operational processing | Informational processing
Orientation | Transaction | Analysis
User | Clerk, DBA, database professional | Knowledge workers
Function | Day-to-day operation | Decision support
Data | Current | Historical
View | Detailed, flat relational | Summarized, multidimensional
DB design | Application oriented | Subject oriented
Unit of work | Short, simple transaction | Complex query
Access | Read/write | Mostly read

23 Operational v/s Information System
Features | Operational | Information
Focus | Data in | Information out
Number of records accessed | tens | millions
Number of users | thousands | hundreds
DB size | 100 MB to GB | 100 GB to TB
Priority | High performance, high availability | High flexibility, end-user autonomy
Metric | Transaction throughput | Query throughput

24 Data Warehousing Architecture [Diagram: DATA SOURCES (external sources, operational DBs) are extracted, transformed, loaded and refreshed into reconciled data, with monitoring & administration and a metadata repository; DATA MARTS and OLAP servers serve the TOOLS (analysis, query/reporting, data mining).]

25 Data Warehouse Architecture Data Warehouse server – almost always a relational DBMS, rarely flat files. OLAP servers – to support and operate on multi-dimensional data structures. Clients – query and reporting tools, analysis tools, data mining tools.

26 Data Warehouse Schema: Star Schema, Fact Constellation Schema, Snowflake Schema.

27 Star Schema A single, large, central fact table and one table for each dimension. Every fact points to one tuple in each of the dimensions and has additional attributes. Does not capture hierarchies directly.

28 Star Schema (contd..)
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
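The star schema on this slide can be tried out with Python's built-in `sqlite3` module. The table and column names follow the slide; the store names, rows and figures are invented for illustration, and SQLite is only a stand-in for whatever DBMS hosts the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter TEXT, month TEXT);
-- The central fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact  (store_key   INTEGER REFERENCES store_dim(store_key),
                          product_key INTEGER REFERENCES product_dim(product_key),
                          period_key  INTEGER REFERENCES time_dim(period_key),
                          units INTEGER, price REAL);

-- Illustrative rows only.
INSERT INTO store_dim VALUES (1, 'Store A', 'Mumbai', 'Maharashtra', 'West'),
                             (2, 'Store B', 'Delhi', 'Delhi', 'North');
INSERT INTO product_dim VALUES (1, 'Wheat Bread'), (2, 'Cheese');
INSERT INTO time_dim VALUES (1, 1998, 'Q1', 'January'), (2, 1998, 'Q1', 'February');
INSERT INTO sales_fact VALUES (1, 1, 1, 3, 7.44), (1, 2, 1, 3, 7.95),
                              (2, 1, 2, 7, 17.36), (1, 2, 2, 16, 42.40);
""")

# Units per city and product: one join from the fact table to each dimension used.
rows = conn.execute("""
    SELECT s.city, p.product_desc, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s   ON s.store_key = f.store_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY s.city, p.product_desc
    ORDER BY s.city, p.product_desc
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which is where the "reduces the number of physical joins" benefit comes from.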

29 Snowflake Schema A variant of the star schema model. A single, large, central fact table and one or more tables for each dimension. Dimension tables are normalized, i.e. dimension table data is split into additional tables.

30 Snowflake Schema (contd..)
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City Key
City Dimension: City Key, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
Drawbacks: Time-consuming joins, slow report generation.
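The cost of normalizing the store dimension shows up as an extra join. The sketch below uses Python's `sqlite3` with the Store and City tables from this slide; the rows are made up, and the Time and Product dimensions are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The store dimension is normalized: city attributes move to their own table.
CREATE TABLE city_dim   (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, store_name TEXT,
                         city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         units INTEGER, price REAL);

-- Illustrative rows only.
INSERT INTO city_dim VALUES (1, 'Mumbai', 'Maharashtra', 'West'),
                            (2, 'Delhi', 'Delhi', 'North');
INSERT INTO store_dim VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
INSERT INTO sales_fact VALUES (1, 3, 7.44), (2, 10, 24.80), (3, 7, 17.36);
""")

# Reaching Region now takes two joins (fact -> store -> city) instead of one.
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim c  ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
```

The city attributes are stored once per city rather than once per store, at the price of the longer join path named in the drawbacks.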

31 Fact Constellation Multiple fact tables share dimension tables. This schema can be viewed as a collection of stars, hence it is called a galaxy schema or fact constellation. Sophisticated applications require such a schema.

32 Fact Constellation (contd..)
Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Product Dimension: Product Key, Product Desc

33 Building a Data Warehouse Data Selection. Data Preprocessing – fill missing values – remove inconsistency. Data Transformation & Integration. Data Loading. Data in the warehouse is stored in the form of fact tables and dimension tables.
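The preprocessing step could look roughly like this in Python. The raw rows are invented, and replacing a missing unit count with 0 is just one possible policy chosen for the sketch; a real load might use an average, a flag value, or reject the row.

```python
# Hypothetical raw rows with inconsistent city spellings and a missing value.
raw_rows = [
    {"city": "mumbai", "product": "Wheat Bread", "units": 3},
    {"city": "Mumbai ", "product": "Cheese", "units": None},
    {"city": "PUNE", "product": "Cheese", "units": 3},
]

def clean(rows, default_units=0):
    """Normalize city names and fill missing unit counts with a default."""
    cleaned = []
    for row in rows:
        cleaned.append({
            # Remove inconsistency: trim whitespace and use one capitalization.
            "city": row["city"].strip().title(),
            "product": row["product"].strip(),
            # Fill missing values (assumed policy: use default_units).
            "units": default_units if row["units"] is None else row["units"],
        })
    return cleaned

cleaned_rows = clean(raw_rows)
```

Only after this step are the rows transformed, integrated and loaded into the fact and dimension tables.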

34 Case Study Afco Foods & Beverages is a new company which produces dairy, bread and meat products, with its production unit located at Baroda. Their products are sold in the North, North West and Western regions of India. They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda. The President of the company wants sales information.

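35 Sales Information Report: The number of units sold: 113. Report: The number of units sold over time:
January | February | March | April
14 | 41 | 33 | 25

36 Sales Information Report: The number of items sold for each product with time
Product | Jan | Feb | Mar | Apr
Wheat Bread | - | - | 6 | 17
Cheese | 6 | 16 | 6 | 8
Swiss Rolls | 8 | 25 | 21 | -

37 Sales Information Report: The number of items sold in each City for each product with time
City | Product | Jan | Feb | Mar | Apr
Mumbai | Wheat Bread | - | - | 3 | 10
Mumbai | Cheese | 3 | 16 | 6 | -
Mumbai | Swiss Rolls | 4 | 16 | 6 | -
Pune | Wheat Bread | - | - | 3 | 7
Pune | Cheese | 3 | - | - | 8
Pune | Swiss Rolls | 4 | 9 | 15 | -

38 Sales Information Report: The number of items sold and income in each region for each product with time (Rs = rupees, U = units).
City | Product | Jan Rs | Jan U | Feb Rs | Feb U | Mar Rs | Mar U | Apr Rs | Apr U
Mumbai | Wheat Bread | - | - | - | - | 7.44 | 3 | 24.80 | 10
Mumbai | Cheese | 7.95 | 3 | 42.40 | 16 | 15.90 | 6 | - | -
Mumbai | Swiss Rolls | 7.32 | 4 | 29.98 | 16 | 10.98 | 6 | - | -
Pune | Wheat Bread | - | - | - | - | 7.44 | 3 | 17.36 | 7
Pune | Cheese | 7.95 | 3 | - | - | - | - | 21.20 | 8
Pune | Swiss Rolls | 7.32 | 4 | 16.47 | 9 | 27.45 | 15 | - | -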

39 Sales Measures & Dimensions Measures – Units sold, Amount. Dimensions – Product, Time, Region.

40 Sales Data Warehouse Model Fact Table
City | Product | Month | Units | Rupees
Mumbai | Wheat Bread | January | 3 | 7.95
Mumbai | Cheese | January | 4 | 7.32
Pune | Wheat Bread | January | 3 | 7.95
Pune | Cheese | January | 4 | 7.32
Mumbai | Swiss Rolls | February | 16 | 42.40

41 Sales Data Warehouse Model
City_ID | Prod_ID | Month | Units | Rupees
1 | 589 | 1/1/1998 | 3 | 7.95
1 | 1218 | 1/1/1998 | 4 | 7.32
2 | 589 | 1/1/1998 | 3 | 7.95
2 | 1218 | 1/1/1998 | 4 | 7.32
1 | 589 | 2/1/1998 | 16 | 42.40

42 Sales Data Warehouse Model Product Dimension Tables
Prod_ID | Product_Name | Product_Category_ID
589 | Wheat Bread | 1
590 | White Bread | 1
288 | Coconut Cookies | 2

Product_Category_Id | Product_Category
1 | Bread
2 | Cookies

43 Sales Data Warehouse Model Region Dimension Table
City_ID | City | Region | Country
1 | Mumbai | West | India
2 | Pune | NorthWest | India
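How the ID columns in the fact table resolve against the dimension tables can be sketched with plain Python dictionaries. The rows are the ones listed on the slides, restricted to the fact rows whose Prod_ID (589) appears in the product dimension; the helper names `resolve` and `units_by` are hypothetical.

```python
from collections import defaultdict

# Dimension rows as listed on the Sales Data Warehouse Model slides.
products = {589: ("Wheat Bread", 1), 590: ("White Bread", 1), 288: ("Coconut Cookies", 2)}
categories = {1: "Bread", 2: "Cookies"}
cities = {1: ("Mumbai", "West", "India"), 2: ("Pune", "NorthWest", "India")}

# Fact rows (City_ID, Prod_ID, Month, Units, Rupees) for Prod_ID 589 only.
fact_rows = [
    (1, 589, "1/1/1998", 3, 7.95),
    (2, 589, "1/1/1998", 3, 7.95),
    (1, 589, "2/1/1998", 16, 42.40),
]

def resolve(row):
    """Replace the IDs in one fact row with the attributes they point to."""
    city_id, prod_id, month, units, rupees = row
    city, region, _country = cities[city_id]
    product_name, category_id = products[prod_id]
    return {"city": city, "region": region, "product": product_name,
            "category": categories[category_id], "month": month,
            "units": units, "rupees": rupees}

def units_by(rows, attribute):
    """Sum the Units measure grouped by any resolved dimension attribute."""
    totals = defaultdict(int)
    for row in rows:
        totals[resolve(row)[attribute]] += row[3]
    return dict(totals)

units_per_region = units_by(fact_rows, "region")
units_per_category = units_by(fact_rows, "category")
```

The fact table stores only keys and measures; descriptive attributes such as Region or Product_Category are looked up in the dimension tables when a report needs them.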

44 Sales Data Warehouse Model [Diagram: Sales Fact at the centre, linked to the Region, Product, Category and Time tables.]

45 Online Analytical Processing (OLAP) It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. [Diagram: the Data Warehouse as a cube with Time, Product and Region axes.]

46 OLAP Cube
City | Product | Time | Units | Dollars
All | All | All | 113 | 251.26
Mumbai | All | All | 64 | 146.07
Mumbai | White Bread | All | 38 | 98.49
Mumbai | Wheat Bread | All | 13 | 32.24
Mumbai | Wheat Bread | Qtr1 | 3 | 7.44
Mumbai | Wheat Bread | March | 3 | 7.44
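One way to picture the "All" rows in the cube is to pre-aggregate every fact row under each combination of its own value and "All" per dimension. The sketch below does that in Python for a handful of illustrative rows; it ignores the month-to-quarter level and is not how any particular OLAP server stores its cube.

```python
from collections import defaultdict
from itertools import product

# Illustrative (city, product, month, units) fact rows.
facts = [
    ("Mumbai", "Wheat Bread", "March", 3),
    ("Mumbai", "Wheat Bread", "April", 10),
    ("Mumbai", "Cheese", "January", 3),
    ("Pune", "Cheese", "January", 3),
]

def build_cube(rows):
    """Aggregate units for every (city, product, month) cell, including 'All' roll-ups."""
    cube = defaultdict(int)
    for city, prod, month, units in rows:
        # Each dimension contributes either its own value or the "All" roll-up,
        # so one fact row feeds 2 * 2 * 2 = 8 cells.
        for key in product((city, "All"), (prod, "All"), (month, "All")):
            cube[key] += units
    return dict(cube)

cube = build_cube(facts)
```

With the cube built, a lookup such as `cube[("Mumbai", "Wheat Bread", "All")]` answers a summary question without rescanning the fact rows.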

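47 OLAP Operations Drill Down [Diagram: cube with Time, Region and Product axes; the Product hierarchy is traversed downwards from Category (e.g. Electrical Appliance) to Sub Category (e.g. Kitchen) to Product (e.g. Toaster).]

48 OLAP Operations Drill Up [Diagram: cube with Time, Region and Product axes; the same hierarchy is traversed upwards from Product (e.g. Toaster) to Sub Category (e.g. Kitchen) to Category (e.g. Electrical Appliance).]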
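Drill up and drill down are aggregation and de-aggregation along a hierarchy. The slides use the product hierarchy; the sketch below uses the time hierarchy (month to quarter) instead, with the monthly unit counts from the case study report. The month-to-quarter mapping and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Assumed calendar hierarchy: each month belongs to one quarter.
MONTH_TO_QUARTER = {"January": "Q1", "February": "Q1", "March": "Q1", "April": "Q2"}

# Units sold per month, as reported in the case study.
monthly_units = {"January": 14, "February": 41, "March": 33, "April": 25}

def roll_up(monthly):
    """Drill up: aggregate month-level units to quarter level."""
    quarterly = defaultdict(int)
    for month, units in monthly.items():
        quarterly[MONTH_TO_QUARTER[month]] += units
    return dict(quarterly)

def drill_down(monthly, quarter):
    """Drill down: show the month-level detail behind one quarter."""
    return {m: u for m, u in monthly.items() if MONTH_TO_QUARTER[m] == quarter}

quarterly_units = roll_up(monthly_units)
q1_detail = drill_down(monthly_units, "Q1")
```

The same pattern applies to Category, Sub Category and Product: drill up sums over the lower level, drill down exposes it again.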

49 OLAP Operations Slice and Dice [Diagram: cube with Time, Region and Product axes; fixing Product = Toaster leaves a slice over Time and Region.]
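A rough Python sketch of the two operations, using invented rows: a slice fixes one dimension to a single value (Product = Toaster, as on the slide), while a dice restricts several dimensions to sets of values. The function names `slice_cube` and `dice_cube` are hypothetical.

```python
# Illustrative fact rows; every value here is made up.
facts = [
    {"product": "Toaster", "region": "West", "month": "January", "units": 5},
    {"product": "Toaster", "region": "North", "month": "February", "units": 2},
    {"product": "Kettle", "region": "West", "month": "January", "units": 4},
    {"product": "Kettle", "region": "North", "month": "March", "units": 1},
]

def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dimension] == value]

def dice_cube(rows, **allowed):
    """Dice: keep rows whose value lies in the allowed set for every named dimension."""
    return [r for r in rows if all(r[d] in values for d, values in allowed.items())]

toaster_slice = slice_cube(facts, "product", "Toaster")
west_early = dice_cube(facts, region={"West"}, month={"January", "February"})
```

Here `toaster_slice` still varies over Time and Region, matching the diagram, while `west_early` is a smaller sub-cube cut along two dimensions at once.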

50 Presentation [Diagram: a reporting tool turns the Time, Region and Product cube into a report.]

51 Data Warehousing includes: building the data warehouse, online analytical processing (OLAP), and presentation. [Diagram: RDBMS and flat file sources pass through cleaning, selection & integration into the warehouse & OLAP server, which serves the client for presentation.]

52 Need for Data Warehousing Industry has a huge amount of operational data. Knowledge workers want to turn this data into useful information. They use this information to support strategic decision making.

53 Need for Data Warehousing (contd..) It is a platform for consolidated historical data for analysis. It stores data of good quality so that knowledge workers can make correct decisions.

54 Need for Data Warehousing (contd..) From a business perspective: – it is the latest marketing weapon – it helps to keep customers by learning more about their needs – it is a valuable tool in today’s competitive, fast-evolving world.

55 Data Warehousing Tools Data Warehouse: SQL Server 2010 BI, Oracle 8i Warehouse Builder. OLAP tools: SQL Server Analysis Services, Data Stage. Reporting tools: MS Excel Pivot Chart, SQL Server Reporting Services.

