An overview of Data Warehousing and OLAP Technology


An overview of Data Warehousing and OLAP Technology
Original slides by: Manish Desai
Modified and presented by: Alice Leung

Introduction
- Data warehousing and OLAP are essential elements of decision support
- They enable the knowledge worker to make better and faster decisions
- Used in many industries, such as:
  - Manufacturing (order shipment)
  - Retail (inventory management)
  - Financial services (claims and risk analysis)
- Every major database vendor offers products in this area

What is a Data Warehouse?
- A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making”
- Typically maintained separately from operational databases

Explanation of the definition
- Subject-oriented:
  - Designed around subjects such as customer, vendor, product and activity
  - Does not include data that is not needed for the decision support system (DSS)
- Integrated:
  - The most important feature
  - Consistent naming conventions, measurement of variables and so forth
  - Data should be stored in a single, globally acceptable fashion

Explanation (continued)
- Time-varying:
  - All data in the warehouse should be accurate as of some moment in time
  - Data is stored over a long time horizon (5 to 10 years)
  - The key structure contains an element of time (implicitly or explicitly)
  - Data, once correctly recorded, cannot be updated
- Non-volatile:
  - No updates of data are allowed; only loading and access operations

Why a Separate Data Warehouse?
- High performance for both systems
  - DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
  - Warehouse, tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
- Different functions and different data:
  - Missing data: decision support requires historical data, which operational DBs do not typically maintain
  - Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
  - Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled
(Slide footer: April 20, 2017, Data Mining: Concepts and Techniques)

Data Warehouse vs. Operational Database

| Aspect       | Data Warehouse                                       | Operational Database      |
|--------------|------------------------------------------------------|---------------------------|
| User         | Knowledge worker (executive, manager, analyst)       | Clerk, IT professional    |
| Function     | Decision support                                     | Day-to-day operations     |
| Data         | Historical, summarized, multidimensional, integrated | Current, up-to-date, detailed |
| Unit of work | Complex query                                        | Short, simple transaction |
| DB design    | Subject-oriented                                     | Application-oriented      |
| Metric       | Query throughput, response                           | Transaction throughput    |

Tiered Architecture
[Figure: three-tier data warehouse architecture]
- Data sources: operational databases and external sources, brought into the warehouse by extract, transform, load and refresh
- Tier 1, data warehouse server (data storage): data warehouse and data marts
- Tier 2, OLAP server: OLAP engine, served from the data warehouse
- Tier 3, clients (front-end tools): analysis, query/reports, data mining

Architecture (continued)
- Distributed data warehouse
  - Load balancing, scalability, higher availability
  - Meta data replicated and centrally administered
  - Too expensive
- Data marts
  - Departmental subsets focused on selected subjects
  - Example: a marketing data mart includes customer, sales and product tables
  - Each has its own repository and administration
  - May lead to complex integration problems if not designed properly

Back-End Tools and Utilities
- Data cleaning, loading and refreshing tools
- Cleaning
  - Multiple sources, so errors are possible
  - Example: replace the string "sex" by "gender"
- Loading
  - Building indices, sorting and making access paths
  - Large amounts of data
  - Incremental loading: only updated tuples are inserted; the process is hard to manage
- Refresh
  - Propagating updates
  - When to refresh? Set by the administrator depending on user needs and traffic
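
The cleaning and incremental-loading steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout, the `updated_at` field, the code table and both function names are hypothetical and do not come from the slides or from any particular tool.

```python
# Hypothetical sketch of back-end cleaning and incremental loading.
# Field names ("sex", "gender", "updated_at") and the code table are assumptions.

def clean_row(row):
    """Reconcile naming and coding differences between sources."""
    cleaned = dict(row)
    # Rename the attribute "sex" to the warehouse-wide name "gender".
    if "sex" in cleaned:
        cleaned["gender"] = cleaned.pop("sex")
    # Map inconsistent source codes onto one representation.
    codes = {"m": "M", "male": "M", "f": "F", "female": "F"}
    value = str(cleaned.get("gender", "")).strip().lower()
    cleaned["gender"] = codes.get(value, "UNKNOWN")
    return cleaned


def incremental_load(warehouse, source_rows, last_load_time):
    """Append only rows changed since the previous load; return the new watermark."""
    newest = last_load_time
    for row in source_rows:
        if row["updated_at"] > last_load_time:
            warehouse.append(clean_row(row))
            newest = max(newest, row["updated_at"])
    return newest


warehouse = []
source = [
    {"id": 1, "sex": "male", "updated_at": 5},
    {"id": 2, "gender": "F", "updated_at": 12},
    {"id": 3, "sex": "f", "updated_at": 15},
]
watermark = incremental_load(warehouse, source, last_load_time=10)
print([r["id"] for r in warehouse], watermark)
```

Only rows 2 and 3 are newer than the previous load, so only they are cleaned and appended. A real loader would also have to handle deletions and late-arriving rows, which is part of why the slides call the process hard to manage.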

Conceptual Model and Front-End Tools
- Multidimensional view
  - Dimensions together uniquely determine the measure
  - Example: sales can be represented by city, product and date
  - Each dimension is described by a set of attributes
  - Example: product consists of category of product, industry of product and year of introduction
- Front-end tools
  - Multidimensional spreadsheet
  - Supports:
    - Pivoting: reorientation
    - Roll_up: summarized data
    - Drill_down: go from high-level to low-level summary
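
As a rough illustration of the front-end operations, the sketch below treats roll-up as grouping by fewer dimensions, drill-down as grouping by more, and pivoting as reordering the grouping dimensions. The sales figures, city names and the `aggregate` helper are made up for the example; real OLAP tools also roll up along attribute hierarchies, which this sketch does not model.

```python
from collections import defaultdict

# Hypothetical facts: (city, product, date, sales amount).
sales = [
    ("Seattle", "TV", "2017-01", 100),
    ("Seattle", "PC", "2017-01", 250),
    ("Boston", "TV", "2017-01", 80),
    ("Boston", "TV", "2017-02", 120),
]

DIMENSIONS = ("city", "product", "date")


def aggregate(facts, keep):
    """Sum the measure, grouped by the dimensions named in `keep` (in that order)."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in positions)
        totals[key] += fact[-1]
    return dict(totals)


# Roll-up: summarize by product only.
by_product = aggregate(sales, ["product"])
# Drill-down: go back to a more detailed level, product by city.
by_product_city = aggregate(sales, ["product", "city"])
# Pivot: same cells, reoriented as city by product.
by_city_product = aggregate(sales, ["city", "product"])

print(by_product)
print(by_product_city)
print(by_city_product)
```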

Conceptual Model
[Figure: data cube with three dimensions]
- Date: 1, 2, 3, 4, sum
- Product: TV, PC, PVR, sum
- Country: U.S.A, Canada, Mexico, sum
- Labels shown on the cube: "Total annual sales of TV in U.S.A." and "ALL"
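
One way to read the cube is that every dimension gains an extra "sum" position, so a cell such as the total annual sales of TV in U.S.A. is addressed by putting ALL in the Date position. The sketch below computes every such cell from a handful of made-up facts; the `ALL` marker, the numbers and `build_cube` are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict
from itertools import product as cartesian

ALL = "ALL"

# Hypothetical facts: (date, product, country, sales).
facts = [
    (1, "TV", "U.S.A", 10),
    (2, "TV", "U.S.A", 20),
    (3, "TV", "U.S.A", 30),
    (4, "TV", "U.S.A", 40),
    (1, "PC", "Canada", 5),
    (2, "PVR", "Mexico", 7),
]


def build_cube(rows):
    """Add each fact to every cell it contributes to, including the ALL cells."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        # Each dimension is either kept at its own value or summed out as ALL.
        for cell in cartesian(*[(value, ALL) for value in dims]):
            cube[cell] += measure
    return dict(cube)


cube = build_cube(facts)
print(cube[(ALL, "TV", "U.S.A")])  # total annual sales of TV in U.S.A.: 100
print(cube[(ALL, ALL, ALL)])       # grand total: 112
```

Materializing every cell like this grows quickly with the number of dimensions, so it is only meant to show what the "sum" and ALL positions in the figure stand for.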

Database Design
- Two ways to represent the multidimensional model
- Star schema
  - The database consists of a single fact table and a single table for each dimension
  - Each tuple in the fact table consists of a pointer to each of the dimensions
- Snowflake schema
  - Refinement of the star schema
  - The dimensional hierarchy is explicitly represented by normalizing the dimension tables

Star Schema
[Figure: star schema with a central Sales fact table and four dimension tables]
- Sales Fact Table: Time_key, Item_key, Branch_key, Location_key; measures: Units_sold, Dollars_sold, Avg_sales
- Time: T_key, T_day, T_day_week, T_month, T_quarter, T_year
- Item: I_key, I_name, I_brand, I_type, I_supplier_type
- Branch: B_key, B_name, B_type
- Location: location_key, street, city, province, country
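
A cut-down version of this star schema can be tried out with Python's built-in `sqlite3` module. The sketch keeps the column names from the slide where possible, but the table names (`time_dim`, `item_dim`, `location_dim`, `sales_fact`), the omission of the Branch dimension and some columns, and all the data rows are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    T_key INTEGER PRIMARY KEY, T_day INTEGER, T_month INTEGER,
    T_quarter INTEGER, T_year INTEGER);
CREATE TABLE item_dim (
    I_key INTEGER PRIMARY KEY, I_name TEXT, I_brand TEXT, I_type TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Fact table: one foreign key per dimension, plus the measures.
CREATE TABLE sales_fact (
    Time_key INTEGER REFERENCES time_dim(T_key),
    Item_key INTEGER REFERENCES item_dim(I_key),
    Location_key INTEGER REFERENCES location_dim(location_key),
    Units_sold INTEGER,
    Dollars_sold REAL);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, 15, 1, 1, 2016), (2, 20, 7, 3, 2017)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)",
                 [(1, "TV", "Acme", "electronics"),
                  (2, "PC", "Acme", "electronics")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(1, "Seattle", "U.S.A"), (2, "Toronto", "Canada")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 3, 1500.0), (2, 1, 1, 2, 1000.0),
                  (2, 2, 2, 1, 800.0)])

# A typical star join: the fact table joined to two dimensions, then aggregated.
result = conn.execute("""
    SELECT l.country, t.T_year, SUM(s.Dollars_sold)
    FROM sales_fact AS s
    JOIN time_dim AS t ON s.Time_key = t.T_key
    JOIN location_dim AS l ON s.Location_key = l.location_key
    GROUP BY l.country, t.T_year
    ORDER BY l.country, t.T_year
""").fetchall()
print(result)
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.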

Snowflake Schema
[Figure: snowflake schema with a central Sales fact table and normalized dimension tables]
- Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
- Time: T_key, T_day, T_day_week, T_month, T_quarter, T_year
- Item: I_key, I_name, I_brand, I_type, I_supplier_type
- Supplier: S_key, S_type
- Branch: B_key, B_name, B_type
- Location: location_key, street, city
- City: C_key, C_city, C_province, C_country
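
The practical effect of the normalization can be seen in a small `sqlite3` sketch: because the city attributes live in their own table, a query by country needs one more join than it would in a star schema. The table names, the `city_key` column (assumed here to be how Location refers to City) and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location hierarchy is normalized: Location points to City.
CREATE TABLE city_dim (
    C_key INTEGER PRIMARY KEY, C_city TEXT, C_province TEXT, C_country TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT,
    city_key INTEGER REFERENCES city_dim(C_key));
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO city_dim VALUES (?, ?, ?, ?)",
                 [(1, "Vancouver", "British Columbia", "Canada"),
                  (2, "Chicago", "Illinois", "U.S.A")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(10, "1 Main St", 1), (11, "2 Oak Ave", 1),
                  (12, "3 Lake Rd", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(10, 100.0), (11, 50.0), (12, 70.0)])

# Reaching the country attribute takes two joins: fact -> location -> city.
by_country = conn.execute("""
    SELECT c.C_country, SUM(s.dollars_sold)
    FROM sales_fact AS s
    JOIN location_dim AS l ON s.location_key = l.location_key
    JOIN city_dim AS c ON l.city_key = c.C_key
    GROUP BY c.C_country
    ORDER BY c.C_country
""").fetchall()
print(by_country)
```

The trade-off is less redundancy in the dimension tables (each city is stored once) in exchange for more joins at query time.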

Warehouse Servers
- Specialized SQL servers
  - Provide advanced query language and query processing support for SQL queries over star and snowflake schemas
  - Example: Redbrick
- ROLAP
  - Sits between the relational back end and client front-end tools
  - Extends traditional relational servers to support multidimensional queries
  - Example: MicroStrategy
- MOLAP
  - Multidimensional storage engine
  - Direct mapping
  - Example: Essbase from Arbor Inc.

Index Structures
- Bit map indices
  - Use a single bit to indicate a specific value of an attribute
  - Example: instead of storing eight characters to record “engineer” as the skill of an employee, use a single bit

    id#   Name  Skill
    1000  John  1

- Join indices
  - Maintain the relationship between a foreign key and its matching primary keys
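
Both index types can be sketched in plain Python. Here a bit map index keeps one bit vector per distinct attribute value (stored as a Python integer, with bit i standing for row i), and a join index maps each dimension key to the positions of the fact rows that reference it. The employee and sales rows and the function names are hypothetical, and real systems store these structures in compressed, disk-oriented forms.

```python
def build_bitmap_index(rows, attribute):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for position, row in enumerate(rows):
        value = row[attribute]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(rows, bitmap):
    """Return the rows whose bit is set in `bitmap`."""
    return [row for position, row in enumerate(rows)
            if (bitmap >> position) & 1]


def build_join_index(fact_rows, foreign_key):
    """Map each dimension key to the positions of the fact rows that reference it."""
    index = {}
    for position, row in enumerate(fact_rows):
        index.setdefault(row[foreign_key], []).append(position)
    return index


employees = [
    {"id": 1000, "name": "John", "skill": "engineer", "dept": "R&D"},
    {"id": 1001, "name": "Mary", "skill": "analyst", "dept": "R&D"},
    {"id": 1002, "name": "Ravi", "skill": "engineer", "dept": "Sales"},
]
skill_index = build_bitmap_index(employees, "skill")
dept_index = build_bitmap_index(employees, "dept")

# Combining two predicates is a single bitwise AND of their bit vectors.
both = skill_index["engineer"] & dept_index["R&D"]
print([row["name"] for row in matching_rows(employees, both)])

sales = [{"item_key": 1, "units": 3}, {"item_key": 2, "units": 1},
         {"item_key": 1, "units": 2}]
item_join_index = build_join_index(sales, "item_key")
print(item_join_index)
```

The bitwise AND is where bit map indices pay off: selections on several attributes are combined without touching the rows themselves until the final lookup.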

Meta Data and Warehouse Management
- It is data about data
- Used for building, maintaining, managing and using the data warehouse
- Administrative meta data: information about setting up and using the warehouse
- Business meta data: business terms and definitions
- Operational meta data: information collected during operation of the warehouse