Decision Support, Data Warehousing, and OLAP By Prof. Sham Navathe Georgia Institute of Technology (Courtesy : Prof. Anindya Datta) Extensions by Svetlana.

Slides:



Advertisements
Similar presentations
An overview of Data Warehousing and OLAP Technology Presented By Manish Desai.
Advertisements

C6 Databases.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
April 30, Data Warehousing and OLAP Technology: An Overview  What is a data warehouse?  Data warehouse architecture  From data warehousing to.
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
Data Warehousing M R BRAHMAM.
ICS 421 Spring 2010 Data Warehousing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/18/20101Lipyeow.
Database Systems: Design, Implementation, and Management Tenth Edition
CSE6011 Data Warehouse and OLAP  Why data warehouse  What’s data warehouse  What’s multi-dimensional data model  What’s difference between OLAP and.
Distributed DBMSs A distributed database is a single logical database that is physically distributed to computers on a network. Homogeneous DDBMS has the.
Data Warehousing Min Song IS 465. Decision Support and OLAP Information technology to help the knowledge worker (executive, manager, analyst) make faster.
13 Chapter 13 The Data Warehouse Hachim Haddouti.
1 9 Adv. DBMS Data Warehouse CSC5301 Review Hachim Haddouti.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Chapter 13 The Data Warehouse
Ch3 Data Warehouse part2 Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
DATA WAREHOUSE (Muscat, Oman).
An Overview of Data Warehousing and OLTP Technology Presenter: Parminder Jeet Kaur Discussion Lead: Kailang.
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.
Chapter 13 – Data Warehousing. Databases  Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age  Information,
M ODULE 5 Metadata, Tools, and Data Warehousing Section 4 Data Warehouse Administration 1 ITEC 450.
Dr. Bernard Chen Ph.D. University of Central Arkansas
8/20/ Data Warehousing and OLAP. 2 Data Warehousing & OLAP Defined in many different ways, but not rigorously. Defined in many different ways, but.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
An overview of Data Warehousing and OLAP Technology
Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor
Data Warehouse & Data Mining
Data Warehouse Overview September 28, 2012 presented by Terry Bilskie.
Datawarehouse Objectives
Data warehousing and online analytical processing- Ref Chap 4) By Asst Prof. Muhammad Amir Alam.
1 Data Warehouses BUAD/American University Data Warehouses.
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
Decision Support, Data Warehousing, and OLAP Anindya Datta Director, iXL Center for E-Commerce Georgia Institute of Technology
Data Warehousing.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
October 28, Data Warehouse Architecture Data Sources Operational DBs other sources Analysis Query Reports Data mining Front-End Tools OLAP Engine.
Data Warehousing and OLAP. Warehousing ► Growing industry: $8 billion in 1998 ► Range from desktop to huge:  Walmart: 900-CPU, 2,700 disk, 23TB Teradata.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Ayyat IT Group Murad Faridi Roll NO#2492 Muhammad Waqas Roll NO#2803 Salman Raza Roll NO#2473 Junaid Pervaiz Roll NO#2468 Instructor :- “ Madam Sana Saeed”
Data Warehousing Multidimensional Analysis
7 Strategies for Extracting, Transforming, and Loading.
Data Warehousing.
Advanced Database Concepts
1 Database Systems, 8 th Edition 1 Chapter 13 Business Intelligence and Data Warehouses Objectives In this chapter, you will learn: –How business intelligence.
Data Resource Management Agenda What types of data are stored by organizations? How are different types of data stored? What are the potential problems.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
An Overview of Data Warehousing and OLAP Technology
Data Warehousing COMP3017 Advanced Databases Dr Nicholas Gibbins –
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
11/20/ :11 AMData Mining 1 Data Mining – CSE 9033 Chapter – 1; Data Warehousing Dr. Goutam Sarker, B.E., M.E., Ph.D.(Engineering), Fellow: IE(I),
Data warehouse.
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Chapter 13 Business Intelligence and Data Warehouses
Data warehouse and OLAP
Chapter 13 The Data Warehouse
Data Warehouse.
Data Warehouse and OLAP
Data Warehousing: Data Models and OLAP operations
Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
Data Warehouse and OLAP
Data Warehouse and OLAP Technology
Presentation transcript:

Decision Support, Data Warehousing, and OLAP By Prof. Sham Navathe Georgia Institute of Technology (Courtesy : Prof. Anindya Datta) Extensions by Svetlana Mansmann University of Konstanz

Outline  Terminology: OLAP vs. OLTP  Data Warehousing Architecture  Technologies  Products  References

Decision Support and OLAP  Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions What were the sales volumes by region and product category for the last year?What were the sales volumes by region and product category for the last year? How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years? Which orders should we fill to maximize revenues?Which orders should we fill to maximize revenues? Will a 10% discount increase sales volume sufficiently?Will a 10% discount increase sales volume sufficiently? Which of two new medications will result in the best outcome: higher recovery rate & shorter hospital stay?Which of two new medications will result in the best outcome: higher recovery rate & shorter hospital stay?  On-Line Analytical Processing (OLAP) is an element of decision support systems (DSS)

Business Intelligence

Evolution  60’s: Batch reports  hard to find and analyze information  inflexible and expensive, reprogram every new request  70’s: Terminal-based DSS and EIS (executive information systems)  still inflexible, not integrated with desktop tools  80’s: Desktop data access and analysis tools  query tools, spreadsheets, GUIs  easier to use, but only access operational databases  90’s: Data warehousing with integrated OLAP engines and tools  2000’s: Personalization engines and e-commerce

OLTP vs. OLAP  Clerk, IT Professional  Day to day operations  Application-oriented (E-R based)  Current, Isolated  Detailed, Flat relational  Structured, Repetitive  Short, Simple transaction  Read/write  Index/hash on prim. Key  Tens  Thousands  100 MB-GB  Trans. throughput  Knowledge worker  Decision support  Subject-oriented (Star, snowflake)  Historical, Consolidated  Summarized, Multidimensional  Ad hoc  Complex query  Read Mostly  Lots of Scans  Millions  Hundreds  100 GB-TB  Query throughput, response User Function DB Design Data View Usage Unit of work Access Operations # Records accessed #Users Db size Metric OLTPOLAP

Data Warehouse  A decision support database that is maintained separately from the organization’s operational databases.  A data warehouse is a  subject-oriented,  integrated,  time-varying,  non-volatile  A collection of data that is used primarily in organizational decision making

Why Separate Data Warehouse?  Performance  Operational databases designed & tuned for known taxes & workloads  Complex OLAP queries would degrade performance, taxing operations  Special data organization, access & implementation methods needed for multidimensional views & queries

Why Separate Data Warehouse?  Function  Missing data: Decision support requires historical data, which operational databases do not typically maintain  Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.  Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.

Data Warehousing / OLAP Market

Data Warehousing Market

Data Warehousing Architecture

Three-Tier Architecture  Warehouse database server  Almost always a relational DBMS; rarely flat files  OLAP servers  Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operations.  Multidimensional OLAP (MOLAP): special purpose server that directly implements multidimensional data and operations.  Clients  Query and reporting tools  Analysis tools  Data mining tools (e.g., trend analysis, prediction)

Data Warehouse vs. Data Marts  Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization.  Requires extensive business modeling  May take years to design and build  Data Marts: Departmental subsets that focus on selected subjects: Marketing data mart: customer, products, sales.  Faster roll out, but complex integration in the long run  Virtual warehouse: views over operational DBs  Materialize some summary views for efficient query processing  Easier to build  Requisite excess capacity on operational DB servers

Design & Operational Process  Define architecture. Do capacity planning.  Integrate DB and OLAP servers, storage and client tools.  Design warehouse schema, views.  Design physical warehouse organization: data placement, partitioning, access methods.  Connect sources: gateways, ODBC drivers, wrappers.  Design & implement scripts for data extract, load refresh.  Define metadata and populate repository.  Design & implement end-user applications.  Roll out warehouse and applications.  Monitor the warehouse.

OLAP for Decision Support  Goal of OLAP is to support ad-hoc querying for the business analyst  Business analysts are familiar with spreadsheets  Extend spreadsheet analysis model to work with warehouse data  Large data set  Semantically enriched to understand business terms (e.g., time, geography)  Combined with reporting features  Multidimensional view of data is the foundation of OLAP

OLAP for Decision Support  Pivot table - a multidimensional spreadsheet

Multidimensional Data Model  Database is a set of facts (points) in a multidimensional space  A fact has a measure dimension  quantity that is analyzed, e.g., sale, budget  A set of dimensions on which data is analyzed  e.g., store, product, date associated with a sale amount  Dimensions form a sparsely populated coordinate system  Each dimension has a set of attributes  e.g., owner city and county of store  Attributes of a dimension may be related by partial order  Hierarchy: e.g., street > county >city  Lattice: e.g., date> month>year, date>week>year

Multidimensional Data JuiceColaMilkCream NYLASF Sales volume as a function of date, city and product 3/1 3/2 3/3 3/4 Date Product City

Sample Data Cube ∑ Degree Diploma B.Sc. M.Sc. Term 1st2nd3rd4th Country Germany Switzerland U.S.A. German students in the 4th term pursuing a diploma Country Germany Switzerland U.S.A. ∑ ∑ ∑ ∑ ∑ ∑

Operations in Multidimensional Data Model  Aggregation (roll-up)  dimension reduction: e.g., total sales by city  summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year  Navigation to detailed data (drill-down)  e.g., (sales - expense) by city, top 3% of cities by average income  Selection (slice) defines a subcube  e.g., sales where city = Palo Alto and date = 1/15/96  Visualization Operations (e.g., Pivot)

A Visual Operation: Pivot (Rotate) JuiceColaMilkCream NYLASF 3/1 3/2 3/3 3/4 Month Region Product

Approaches to OLAP Servers  Relational OLAP (ROLAP)  Relational and Specialized Relational DBMS to store and manage warehouse data  OLAP middleware to support missing pieces  Optimize for each DBMS backend  Aggregation Navigation Logic  Additional tools and services  Multidimensional OLAP (MOLAP)  Array-based storage structures  Direct access to array data structures  Domain-specific enrichment

Relational DBMS as Warehouse Server  Schema design  Specialized scan, indexing and join techniques  Handling of aggregate views (querying and materialization)  Supporting query language extensions beyond SQL  Complex query processing and optimization  Data partitioning and parallelism

Warehouse Database Schema  ER design techniques not appropriate  Design should reflect multidimensional view  Star Schema  Snowflake Schema  Fact Constellation Schema

Example of a Star Schema Order No Order Date Customer No Customer Name Customer Address City SalespersonIDSalespersonNameCityQuota OrderNOSalespersonIDCustomerNOProdNoDateKeyCityNameQuantity Total Price ProductNOProdNameProdDescrCategoryCategoryDescriptionUnitPrice DateKeyDate CityNameStateCountry Order Customer Salesperson City Date Product Fact Table

Star Schema  A single fact table and a single table for each dimension  Every fact points to one tuple in each of the dimensions and has additional attributes  Does not capture hierarchies directly  Generated keys are used for performance and maintenance reasons  Fact constellation: Multiple Fact tables that share many dimension tables  Example: Projected expense and the actual expense may share dimensional tables

Example of a Snowflake Schema Order No Order Date Customer No Customer Name Customer Address City SalespersonIDSalespersonNameCityQuota OrderNOSalespersonIDCustomerNOProdNoDateKeyCityNameQuantity Total Price ProductNOProdNameProdDescrCategoryCategoryUnitPrice DateKeyDateMonth CityNameStateCountry Order Customer Salesperson City Date Product Fact Table CategoryNameCategoryDescr MonthYear Year StateNameCountry Category State Month Year

Snowflake Schema  Represent dimensional hierarchy directly by normalizing the dimension tables  Easy to maintain  Saves storage, but is alleged that it reduces effectiveness of browsing (Kimball)  Galaxy schema: multiple fact tables with shared dimension categories

Population & Refreshing the Warehouse  Data extraction  Data cleaning  Data transformation  Convert from legacy/host format to warehouse format  Load  Sort, summarize, consolidate, compute views, check integrity, build indexes, partition  Refresh  Propagate updates from sources to the warehouse

Metadata Repository  Administrative metadata  source databases and their contents  gateway descriptions  warehouse schema, view & derived data definitions  dimensions, hierarchies  pre-defined queries and reports  data mart locations and contents  data partitions  data extraction, cleansing, transformation rules, defaults  data refresh and purging rules  user profiles, user groups  security: user authorization, access control

Metadata Repository.. 2  Business data  business terms and definitions  ownership of data  charging policies  operational metadata  data lineage: history of migrated data and sequence of transformations applied  currency of data: active, archived, purged  monitoring information: warehouse usage statistics, error reports, audit trails.

Warehouse Design Tools  Creating and managing a warehouse is hard  Development tools  defining & editing metadata repository contents (schemas, scripts, rules)  Queries and reports  Shipping metadata to and from RDBMS catalogue (e.g., Prism Warehouse Manager)  Planning & analysis tools  impact of schema changes  capacity planning  refresh performance: changing refresh rates or time windows

Warehouse Management Tools  Monitoring and reporting tools (e.g., HP Intelligent Warehouse Advisor)  which partitions, summary tables, columns are used  query execution times  for summary tables, types & frequencies of roll downs  warehouse usage over time (detect peak periods)  Systems and network management tools (e.g., HP OpenView, IBM NetView, Tivoli): traffic, utilization  Exception reporting/alerting tools 9e.g., DB2 Event Alerters, Information Advantage InfoAgents & InfoAlert)  runaway queries  Analysis/Visualization tools: OLAP on metadata

OLAP Tools < Existing Tools: Seagate, Brio, Cognos Functionality: - Choice of tables - Allowing user to specify interrelation relationships - Use of filtering conditions - Construction of “cubes on the fly” < Main Problems: Cost per license, poor semantics of aggregations across tables, performance for multiple dimension cubes  Visual OLAP Tool Tableau: