Presentation is loading. Please wait.

Presentation is loading. Please wait.

An overview of Data Warehousing and OLAP Technology

Similar presentations


Presentation on theme: "An overview of Data Warehousing and OLAP Technology"— Presentation transcript:

1 An overview of Data Warehousing and OLAP Technology
Original slides by: Manish Desai Modified and presented by Alice Leung

2 Introduction Essential elements of decision support
Enables The Knowledge Worker to make better and faster decisions Used in many industries like: Manufacturing (for order shipment) Retail (for inventory management) Financial Services (claims and risk analysis) Every major database vendor offers product in this area

3 What is Data Warehouse ? A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making” Typically maintained separately from operational databases

4 Explanation of definition
Subject-Oriented: Designed around subject such as customer, vendor, product and activity Does not includes data that are not needed for Decision support system (DSS) Integrated: Most important feature Consistent naming convention, measurement of variables and so forth The data should be stored in single globally acceptable fashion

5 Explanation (continues…)
Time Varying: All data in the warehouse should be accurate as of some moment in time Data stored over a long time horizon (5 –10 years) Key structure contains element of time (implicitly or explicitly) Data once correctly recorded cant be updated Non Volatile: No Update of data allowed only loading and access of data operations

6 Why Separate Data Warehouse?
High performance for both systems DBMS— tuned for OLTP: access methods, indexing, concurrency control, recovery Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view, consolidation Different functions and different data: missing data: Decision support requires historical data which operational DBs do not typically maintain data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled April 20, 2017 Data Mining: Concepts and Techniques 6

7 Data Warehouse Vs. Operational Database
User Knowledge worker (Executive, manager, analyst) Clerk, IT professional Function Decision support Day to day operations Data Historical,summarized, multidimensional, integrated Current, up-to-date, detailed Unit of work Complex query Short, simple transaction DB Design Subject-oriented Application-oriented Metric Query throughout, response Transaction throughput

8 Tiered Architecture Data Sources OLAP Engine Data Warehouse
Extract Transform Load Refresh Data Sources Operational Databases External Sources Serve OLAP Engine OLAP Server Tier2: OLAP Server Tier3: Clients Tier1: Data Warehouse Server Data Warehouse Analysis Query/Reports Data mining Data Marts Data Storage Front-End Tools

9 Architecture (continues…)
Distributed Data warehouse Load balancing, scalability,higher availability Meta data replicated and centrally administrated Too expansive Data marts Departmental subset focused on selected subjects example: marketing department includes customer, sales and product tables Has own repository and administration May lead to complex integration problems if not designed properly

10 Back end tools and Utilities
Data cleaning, loading, refreshing tools Cleaning Multiple source, possibility of errors Example: replace string sex by gender Loading Building indices, sorting and making access paths Large amount of data Incremental loading Only updated tuples are inserted ,Process hard to manage Refresh Propagating updates When to refresh ? Set by administrator depending on user needs and traffic

11 Conceptual Model and front end tools
Multi dimensional view Dimensions together uniquely determine the measure Example: Sales can be represented as city,product, data Each dimension is described by set of attribute Example: product consist of Category of product Industry of product Year of introduction Front end tools Multi dimensional spreadsheet Supports Pivoting-reorientation Roll_up - summarized data Drill_down - go from high level to low level summary

12 Conceptual Model Date Product Country Total annual sales
of TV in U.S.A. 1 2 3 4 sum TV Product PC U.S.A PVR sum Canada Country Mexico sum ALL

13 Database design Two ways to represent Multi dimensional model
Star schema Database consist of single fact table and single table for each dimension Each tuples in fact table consist of pointer to each of dimension Snowflake schema Refinement over star schema Dimensional hierarchy is explicitly represented by normalizing dimension tables

14 Star Schema Time Sales Fact Table item Time_key Item_key Branch
location_key street city province country location Time_key Item_key Branch_key Location_key Units_sold Dollars_sold Avg_sales Measures B_key B_name B_type Branch I_key I_name I_brand I_type I_supplier_type item T_key T_day T_day_week T_month T_quarter T_year Time Sales Fact Table

15 Snowflake Schema Time Sales Fact Table Item time_key item_key Branch
T_key T_day T_day_week T_month T_quarter T_year Sales Fact Table I_key I_name I_brand I_type I_supplier_type Item time_key item_key branch_key location_key units_sold dollars_sold avg_sales S_key S_type Supplier C_key C_city C_province C_country City B_key B_name B_type Branch Measures location_key street city Location Snowflake Schema

16 Warehouse Servers Specialized SQL servers
Provides advanced query language and query processing support for SQL queries over star and snowflake schemas Example: Redbrick ROLAP Between relational back end and client front end tools Extend traditional relational servers to support multidimensional queries Example: Microstratergy MOLAP Multidimensional storage engine Direct mapping Example: Essbase from Arbor Inc.

17 Index structures Bit map indices Join indices
Use single bit to indicate specific value of attribute Example: instead of storing eight characters to record “engineer” as skill of employee use single bit id# Name Skill John Join indices Maintains the relationship between foreign key with its matching primary keys

18 Meta data and warehouse management
Its data about data Used for building, maintain, managing and using data warehouse Administrative meta data Information about setting up and using warehouse Business meta data Business terms and definition Operational meta data Information collected during operation of warehouse


Download ppt "An overview of Data Warehousing and OLAP Technology"

Similar presentations


Ads by Google