Successful Dimensional Modeling of Very Large Data Warehouses By Bert Scalzo, Ph.D.


Learning Objectives
• Application Nature versus Data Modeling Approach
• Important DW/DM Concepts for “Star Schema” Design
• Transforming a simple data model into a “Star Schema”
• Why Hierarchies are better than Snowflakes
• Common Aggregation/Summarization Themes
• Recommendations for Implementing Facts
• Recommendations for Indexes and Keys
• Oracle Issues (not a modeling topic, but always asked for)
  – Partitioning Options
  – Indexing Options
  – Tuning Star Queries
  – Materialized Views

Speaker’s Qualifications
• Oracle Solutions Product Architect for Quest Software
• Chief architect for Quest’s popular “TOAD” product
• Oracle DBA for 20+ years, versions 4 through 10g
• Worked for Oracle Education & Consulting
• Holds several Oracle Masters (DBA & CASE)
• BS, MS, PhD in Computer Science and also an MBA
• LOMA insurance industry designations: FLMI and ACS
• Books
  – The TOAD Handbook (Feb 2003)
  – Oracle DBA Guide to Data Warehousing and Star Schemas (Mar 2003)
  – The TOAD Pocket Reference 2nd edition (June 2005)
• Articles
  – Oracle Magazine
  – Oracle Technology Network (OTN)
  – Oracle Informant
  – PC Week (now E-Magazine)
  – Linux Journal

New 2nd Edition – June 2005

About Quest Software
• Quest Software (NASDAQ: QSFT)
• Founded: 1987
• More than 2000 employees in 40 offices: North America, South America, Europe, Asia, Australia
• Application management leader: 75% of Fortune 500
• Develop, deploy, manage and maintain enterprise applications without downtime or business interruption
• Best known in the Oracle community for TOAD, Spotlight, Quest Central, Shareplex, etc.

Why do we model?
The Architect will create the first high-level drawings to validate the concept with the client and then make a more detailed plan (i.e. the blueprint) for the Contractor … The Contractor will take this blueprint and optimise it based on technical constraints. The Contractor will then create the actual office.
Would you build an office without a blueprint?

Where in Development Lifecycle
Phases: Analysis, Design (Conceptual, Physical), Develop, Deploy, Monitor & Maintain, Reengineer
• Some shops just treat this as one big “Design” task
• Not uncommon for a Star Schema data model to concentrate more on physical design characteristics

World of Modeling …
• Conceptual Data Modeling (CDM – E/R)
  – Identify all data & relationships
  – E/R (Entity/Rel’ship) diagrams
  – DB independent view
  – Business Rules?
• Physical Data Modeling (PDM)
  – DB-specific model
  – Reverse engineer existing DB
  – Create/Update DB from model
  – Data Warehouse Modeling
• Object-Oriented Modeling (OOM – UML)
  – Support for all UML diagrams: analyze requirements, design application
  – Reverse/forward engineer code
• Business Process Modeling (BPM)
  – Improve process efficiency
  – Define/document Bus. Processes: create correct and complete application requirements
Roles shown: DBA, DB Developer, DB Architect; Bus. Analyst, Data Architect, Data Analyst; System Architect, System Analyst, App Developer; End-user, IT Partner/Liaison, Business Analyst
Quest’s “QDesigner” synchronizes models from all levels in a single tool

Know Your Application …
What type of application are you building:
• On Line Transaction Processing (OLTP)
• Operational Data Store (ODS)
• On Line Analytical Processing (OLAP)
• Data Mart / Data Warehouse (DM/DW)

Warehouse Architecture

Application Natures…
Application types compared, left to right: OLTP, ODS, OLAP, DM/DW (some rows list fewer than four values)
• Business Focus: Operational; Operational, Tactical; Tactical; Tactical, Strategic
• End User Tools: Client Server, Web; Client Server; Client Server, Web
• DB Technology: Relational; Cubic; Relational
• Trans Count: Large; Medium; Small
• Trans Size: Small; Medium; Large
• Trans Time: Short; Medium; Long
• Size in Gigs: 10 – –
• Normalization: 3NF; N/A; 0NF
• Data Modeling: Traditional ER; N/A; Dimensional

Embrace New Concepts
• “Teach Old Dog New Tricks”
• Throw out any OLTP baggage
• Forget OLTP “Golden Rules”

Star Schema Design
The “Star schema” approach to dimensional data modeling was pioneered by Ralph Kimball.
• Dimensions: smaller, de-normalized tables containing business descriptive columns that end-users query on
• Facts: very large tables with primary keys formed from the concatenation of related dimension table foreign key columns, and possessing numerically additive, non-key columns used for calculations during end-user queries
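As a rough illustration of the dimension/fact split just described, the following Oracle DDL sketches two dimensions and one fact table. Only dw_product and its levelx column appear elsewhere in this presentation; every other table and column name here (dw_location, pos_day, sales_unit, and so on) is a hypothetical choice for illustration, not the author's actual schema. The time dimension is left out of this sketch.

```sql
-- Hypothetical product dimension: small, de-normalized, descriptive columns.
CREATE TABLE dw_product (
  product_id    NUMBER        NOT NULL,  -- surrogate (meaningless) key
  levelx        VARCHAR2(20)  NOT NULL,  -- hierarchy level, e.g. 'ITEM'
  category      VARCHAR2(30),
  sub_category  VARCHAR2(30),
  item_name     VARCHAR2(60)
);

-- Hypothetical location dimension.
CREATE TABLE dw_location (
  location_id   NUMBER        NOT NULL,  -- surrogate key
  city          VARCHAR2(30),
  state         VARCHAR2(2),
  region        VARCHAR2(30)
);

-- Hypothetical daily sales fact: the key is the concatenation of the
-- dimension keys; the remaining columns are numerically additive measures.
CREATE TABLE pos_day (
  period_id     NUMBER        NOT NULL,  -- key of the time dimension
  location_id   NUMBER        NOT NULL,
  product_id    NUMBER        NOT NULL,
  sales_unit    NUMBER        NOT NULL,
  sales_retail  NUMBER(12,2)  NOT NULL
);
```

The fact table carries nothing but keys and measures; everything an end-user filters or groups on lives in the dimensions.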

Dimensions Facts

[Figure labels: orders of magnitude, 10^3 to 10^5 and 10^8 upward]

Transform OLTP Model
Fold the OLTP model into itself to form a Star:
• De-Normalize parent/child relationships
• De-Normalize lookup relationships
• Use surrogate or meaningless keys
• Create and populate a time dimension
• Create hierarchies of data in dimensions
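The "create and populate a time dimension" step can be sketched as below. The table name dw_period and the levelx column come from this presentation; the other columns, the 1998 date range, and the use of a simple running number as the surrogate key are assumptions made for illustration. The row generator relies on CONNECT BY LEVEL against dual, which is assumed to be available (Oracle 10g or later).

```sql
-- Hypothetical layout for the time dimension.
CREATE TABLE dw_period (
  period_id     NUMBER        NOT NULL,  -- surrogate key (running number)
  levelx        VARCHAR2(20)  NOT NULL,  -- hierarchy level of the row
  period_day    DATE,
  period_month  NUMBER(2),
  period_year   NUMBER(4)
);

-- One DAY-level row per calendar day of 1998 (365 days).
INSERT INTO dw_period (period_id, levelx, period_day, period_month, period_year)
SELECT n,
       'DAY',
       d,
       EXTRACT(MONTH FROM d),
       EXTRACT(YEAR FROM d)
FROM  (SELECT LEVEL AS n,
              DATE '1998-01-01' + LEVEL - 1 AS d
       FROM   dual
       CONNECT BY LEVEL <= 365);

COMMIT;
```

Rows for the coarser levels (week, month, quarter, year) would be added in the same way if the dimension stores one row per hierarchy member.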

OLTP Model

Dimensional Model

Dimension Hierarchies

SQL> select distinct levelx from dw_period;

LEVELX
DAY
MONTH
QUARTER
WEEK
YEAR

SQL> select distinct levelx from dw_product;

LEVELX
ALL PRODUCTS
CATEGORY
ITEM
PSA
SUB_CATEGORY

Avoid Snowflakes
Avoid the natural desire to normalize the model:
• Complicates end-user query construction
• Adds an additional level of “JOIN” complexity
• Database optimizers do not handle them very well
• Saves some space at the cost of longer queries

Snowflake Model

Common Aggregations
Build end-user driven aggregate tables:
• By time (e.g. week, month, quarter, year)
• By geographic regions (e.g. time zones)
• By end-user reporting interests (e.g. beer)
• By dimension hierarchy (e.g. product category)
• Aggregates should be 5 to 10 times smaller
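A time-based aggregate of the kind listed above might be built with a CREATE TABLE ... AS SELECT, as in this sketch. It assumes a hypothetical daily fact table pos_day (period_id, location_id, product_id, sales_unit, sales_retail) and a dw_period dimension with hypothetical period_year and period_month columns; none of these column names come from the presentation.

```sql
-- Hypothetical monthly aggregate rolled up from the daily fact table.
CREATE TABLE pos_month NOLOGGING AS
SELECT p.period_year,
       p.period_month,
       f.location_id,
       f.product_id,
       SUM(f.sales_unit)   AS sales_unit,
       SUM(f.sales_retail) AS sales_retail
FROM   pos_day f, dw_period p
WHERE  p.period_id = f.period_id
GROUP  BY p.period_year, p.period_month, f.location_id, f.product_id;

-- Check the aggregate against the "5 to 10 times smaller" guideline.
SELECT d.cnt / NULLIF(m.cnt, 0) AS shrink_factor
FROM  (SELECT COUNT(*) AS cnt FROM pos_day)   d,
      (SELECT COUNT(*) AS cnt FROM pos_month) m;
```

If the shrink factor comes out well below 5, the aggregate may not be worth the storage and load time it costs.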

Time Aggregates

Non-Time Aggregates

Index Design
One Very Simple Rule:
• Every foreign key column of a fact table must have its own individual bitmap index
• Every dimension table column should have its own individual bitmap index
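Applied to a hypothetical schema, the rule above amounts to one single-column bitmap index per column. The table and column names (pos_day, dw_product, and so on) and the index names are assumptions for illustration; CREATE BITMAP INDEX itself is standard Oracle syntax.

```sql
-- One bitmap index per foreign key column of the (hypothetical) fact table.
CREATE BITMAP INDEX pos_day_b1 ON pos_day (period_id)   NOLOGGING;
CREATE BITMAP INDEX pos_day_b2 ON pos_day (location_id) NOLOGGING;
CREATE BITMAP INDEX pos_day_b3 ON pos_day (product_id)  NOLOGGING;

-- One bitmap index per descriptive column of a (hypothetical) dimension.
CREATE BITMAP INDEX dw_product_b1 ON dw_product (levelx);
CREATE BITMAP INDEX dw_product_b2 ON dw_product (category);
CREATE BITMAP INDEX dw_product_b3 ON dw_product (sub_category);
CREATE BITMAP INDEX dw_product_b4 ON dw_product (item_name);
```

Keeping the indexes single-column lets the optimizer combine whichever ones an ad-hoc query happens to filter on.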

Nighttime - 10 B-Tree Indexes

Daytime - 48 Bitmap Indexes!!!

Bit-map indexes
  – Contrary to widespread belief, can be effective when there are many distinct column values
  – Not suitable for OLTP, however

Key Fact Table Issues
Fact tables should:
• NOT create or enable foreign key constraints (exception: MVs need FKs for query rewrites)
• NOT create or enable table check constraints
• NOT create or enable primary/unique constraints (use unique indexes, which offer parallel creation)
• NOT create or enable column check constraints (other than simple NOT NULL check constraints)
• NOT create or enable “row” level triggers
• NOT enable logging on tables or their indexes
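Two of these recommendations translate directly into DDL, sketched here against a hypothetical fact table pos_day keyed by period_id, location_id and product_id. The names and the degree of parallelism are assumptions, not values from the presentation.

```sql
-- Turn off redo logging for the (hypothetical) fact table.
ALTER TABLE pos_day NOLOGGING;

-- Enforce uniqueness with a unique index rather than a primary key
-- constraint, so the index can be built in parallel and without logging.
CREATE UNIQUE INDEX pos_day_pk
  ON pos_day (period_id, location_id, product_id)
  NOLOGGING
  PARALLEL 4;
```

The only declared constraints left on the table are the NOT NULL column constraints, which the list above explicitly allows.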

No PK/UK/FK Constraints

Key Oracle Issues …
• Trust me – no way to build a large DW/DM in Oracle 7.X (don’t recommend 8.X either)
• Very brief overview in next few slides of:
  – Partitioning options
  – Indexing options
  – Comparative timings
  – Tuning ad-hoc Star queries
  – Serial versus Parallel queries
  – Materialized Views …

Oracle Partitioning
Way beyond the scope of dimensional modeling, but:
• Use Range or List Partitioning using the time dimension
• Fact unique index = local, prefixed b-tree index
• Fact time index = local, prefixed bitmap index
• Fact non-time index = local, non-prefixed bitmap index
• If any non-time dimension provides a good locality of reference for typical user queries, then sub-partition on that dimension (i.e. composite partitioning) – but note that under non-ideal data distributions, things could be worse or sometimes even much worse…
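The partitioning and index recommendations above could look roughly like this for a range-partitioned fact table. The table, columns, partition names and boundary values are all hypothetical; in particular, the sketch assumes that the time dimension's surrogate key values increase with calendar time, so that a range of period_id values corresponds to a range of dates.

```sql
-- Hypothetical fact table, range partitioned on the time dimension key.
CREATE TABLE pos_day (
  period_id     NUMBER        NOT NULL,
  location_id   NUMBER        NOT NULL,
  product_id    NUMBER        NOT NULL,
  sales_unit    NUMBER        NOT NULL,
  sales_retail  NUMBER(12,2)  NOT NULL
)
NOLOGGING
PARTITION BY RANGE (period_id) (
  PARTITION pos_day_1998 VALUES LESS THAN (366),   -- made-up boundary
  PARTITION pos_day_1999 VALUES LESS THAN (731),   -- made-up boundary
  PARTITION pos_day_max  VALUES LESS THAN (MAXVALUE)
);

-- Fact unique index: local, prefixed b-tree (leads with the partition key).
CREATE UNIQUE INDEX pos_day_pk
  ON pos_day (period_id, location_id, product_id) LOCAL NOLOGGING;

-- Fact time index: local, prefixed bitmap.
CREATE BITMAP INDEX pos_day_b1 ON pos_day (period_id) LOCAL NOLOGGING;

-- Fact non-time indexes: local, non-prefixed bitmap.
CREATE BITMAP INDEX pos_day_b2 ON pos_day (location_id) LOCAL NOLOGGING;
CREATE BITMAP INDEX pos_day_b3 ON pos_day (product_id)  LOCAL NOLOGGING;
```

An index is "prefixed" when its leading column is the partitioning key, which is why only the indexes that start with period_id qualify.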

Indexing Options …

Query Time vs. Table Design
NOTE: specific to my data and user queries

Fact Implementation          Timing
Regular “Heap” Table          9,293
Single Column Partition       4,747
Multi Column Partition        4,987
Composite Partition           6,319
Index Organized Table        12,508
Partition Index Organized    14,902

Tuning Star Queries …
Way beyond the scope of dimensional modeling, but:
• Use Range Partitioning based upon your time dimension (do not try to force use of hash or composite partitioning)
• Fact unique index uses local, prefixed b-tree index
• Fact time index uses local, prefixed bitmap index
• Fact non-time index uses local, non-prefixed bitmap index

Example BI Generated Query
Query: beer and coffee sales for November of ’98 in Dallas

Star Transformation Star Transformation Explain
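The query and plan shown on these slides are not preserved in the transcript. As a stand-in, here is a sketch of what a "beer and coffee sales for November of 98 in Dallas" query might look like against a hypothetical star schema, together with the session setting that permits star transformation. The parameter star_transformation_enabled and DBMS_XPLAN.DISPLAY are real Oracle features; all table names, column names and literal values are assumptions.

```sql
-- Allow the optimizer to consider star transformation for this session.
-- It generally needs bitmap indexes on the fact table's foreign key columns.
ALTER SESSION SET star_transformation_enabled = TRUE;

EXPLAIN PLAN FOR
SELECT pr.category,
       SUM(f.sales_unit)   AS units,
       SUM(f.sales_retail) AS retail
FROM   pos_day f, dw_period pe, dw_location lo, dw_product pr
WHERE  f.period_id   = pe.period_id
AND    f.location_id = lo.location_id
AND    f.product_id  = pr.product_id
AND    pe.period_year  = 1998
AND    pe.period_month = 11
AND    lo.city         = 'DALLAS'
AND    pr.category IN ('BEER', 'COFFEE')
GROUP  BY pr.category;

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Whether the optimizer actually chooses a star transformation depends on statistics, indexes and version, so the plan output should be checked rather than assumed.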

Star join performance
3 orders of magnitude difference between best and worst plan

Query Time vs. Serial/Parallel
NOTE: specific to my data and user queries

Explain Plan                 UNIX       NT
Serial, No Partition         9,688      22,344
Serial, with Partition       5,578      11,625
Parallel, No Partition       ORA-600
Parallel, with Partition     11,140     25,454

Oracle Materialized Views
Way beyond the scope of dimensional modeling, but:
• Special form of snapshots (i.e. replication)
• End-users direct all queries against detail table
• Optimizer rewrites queries to use best aggregate
• Optimizer suggests new aggregates based on load
• Eliminates need for numerous aggregation programs

Exercise caution when creating materialized views
Conclusion: Better to rebuild MV after load – not concurrent with load
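A minimal sketch of a materialized-view aggregate that is refreshed on demand, after the load has finished, rather than concurrently with it. The view name, base tables and columns are hypothetical; CREATE MATERIALIZED VIEW, query_rewrite_enabled and DBMS_MVIEW.REFRESH are standard Oracle features. Whether a particular query is actually rewritten to use the view depends on constraints and settings not shown here.

```sql
-- Hypothetical monthly aggregate as a materialized view.
CREATE MATERIALIZED VIEW pos_month_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT p.period_year,
       p.period_month,
       f.location_id,
       f.product_id,
       SUM(f.sales_unit)   AS sales_unit,
       SUM(f.sales_retail) AS sales_retail
FROM   pos_day f, dw_period p
WHERE  p.period_id = f.period_id
GROUP  BY p.period_year, p.period_month, f.location_id, f.product_id;

-- Let the optimizer redirect detail-table queries to the aggregate.
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- Run once the nightly load is complete: 'C' requests a complete refresh.
BEGIN
  DBMS_MVIEW.REFRESH('POS_MONTH_MV', 'C');
END;
/
```

ON DEMAND keeps the refresh out of the load window, which matches the conclusion on the slide above.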

Parting Thoughts …
• To be successful, every modeler’s mindset must change from an OLTP to a DW/DM paradigm
• There are many other key/core data modeling issues – this was but one of them …
  – Breaking models into sub-models
  – Repository-based collaborative modeling
  – Modeling the relationships between OLTP and DW models
  – Documenting the meta-data for OLTP ETL transformations
  – Modeling the Business Requirements
  – Object-Relational Mapping
  – etc, etc, etc …