Denormalization


Denormalization
- Causes redundancy and gives up referential integrity, but gives fast performance.
- Denormalize when specific queries occur frequently, strict performance is required, and the data is not heavily updated.
- So denormalize only when there is a very clear advantage to doing so, and carefully document the reason for doing so.

Typical denormalization techniques
(1) Flatten a repeating group into one table
Instead of: EMP (E#, Ename), SKILL (E#, Skill)
Use: EMP (E#, Skill, Ename) when EMP has a small number of attributes.
- This means using Method 2 of the 1NF algorithm. But be aware of the danger of this method, as discussed under multivalued dependencies (MVD).
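A minimal sketch of technique (1), using Python's built-in sqlite3 module. The column names (e_num instead of E#) and the sample rows are assumptions made for illustration. It shows that the flattened table answers a skill query without a join, and that Ename is then stored once per skill.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: one row per employee, one row per (employee, skill).
    CREATE TABLE emp   (e_num INTEGER PRIMARY KEY, ename TEXT);
    CREATE TABLE skill (e_num INTEGER REFERENCES emp(e_num), skill TEXT,
                        PRIMARY KEY (e_num, skill));
    INSERT INTO emp   VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO skill VALUES (1, 'SQL'), (1, 'Python'), (2, 'SQL');

    -- Flattened design: Ename is repeated once per skill.
    CREATE TABLE emp_flat (e_num INTEGER, skill TEXT, ename TEXT,
                           PRIMARY KEY (e_num, skill));
    INSERT INTO emp_flat
        SELECT e.e_num, s.skill, e.ename
        FROM emp e JOIN skill s ON s.e_num = e.e_num;
""")

# The same question, asked with and without a join.
joined = conn.execute(
    "SELECT e.ename FROM emp e JOIN skill s ON s.e_num = e.e_num "
    "WHERE s.skill = 'SQL' ORDER BY e.ename").fetchall()
flat = conn.execute(
    "SELECT ename FROM emp_flat WHERE skill = 'SQL' ORDER BY ename").fetchall()

# The redundancy: count how many rows carry Ann's name.
ann_copies = conn.execute(
    "SELECT COUNT(*) FROM emp_flat WHERE ename = 'Ann'").fetchone()[0]
```

Both queries return the same employees, but ann_copies grows with the number of skills per employee, which is why the slide restricts this technique to tables with few attributes.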

Typical denormalization techniques (cont.)
(2) Embed a stable code-interpretation (reference) table
Instead of: FLIGHT (F#, Departs, From_Code, To_Code), CODE (Code, Airport_Name)
Use: FLIGHT (F#, Departs, From_AP, From_Code, To_AP, To_Code)
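Technique (2) can be sketched the same way with sqlite3. The airport codes, names, and the identifier f_num (standing in for F#) are made-up placeholders. The normalized query needs two joins against CODE, while the denormalized FLIGHT table already carries both airport names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: airport names live only in the reference table.
    CREATE TABLE code   (code TEXT PRIMARY KEY, airport_name TEXT);
    CREATE TABLE flight (f_num TEXT PRIMARY KEY, departs TEXT,
                         from_code TEXT, to_code TEXT);
    INSERT INTO code   VALUES ('AAA', 'Airport A'), ('BBB', 'Airport B');
    INSERT INTO flight VALUES ('F1', '08:00', 'AAA', 'BBB');

    -- Denormalized: the reference data is embedded in FLIGHT.
    CREATE TABLE flight_denorm (f_num TEXT PRIMARY KEY, departs TEXT,
                                from_ap TEXT, from_code TEXT,
                                to_ap TEXT, to_code TEXT);
    INSERT INTO flight_denorm
        SELECT f.f_num, f.departs, c1.airport_name, f.from_code,
               c2.airport_name, f.to_code
        FROM flight f
        JOIN code c1 ON c1.code = f.from_code
        JOIN code c2 ON c2.code = f.to_code;
""")

# Normalized lookup: two joins against the reference table.
normalized_row = conn.execute("""
    SELECT f.f_num, c1.airport_name, c2.airport_name
    FROM flight f
    JOIN code c1 ON c1.code = f.from_code
    JOIN code c2 ON c2.code = f.to_code
""").fetchone()

# Denormalized lookup: a single-table read.
denorm_row = conn.execute(
    "SELECT f_num, from_ap, to_ap FROM flight_denorm").fetchone()
```

This only pays off while the reference table stays stable: renaming an airport would mean rewriting every flight row that mentions it.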

Typical denormalization techniques (cont.)
(3) Combine 1:1 or 1:N relationships when (a) N is small and (b) the record on the "one" side is small (so the amount of redundancy will be small)
Instead of: SALE (S#, SPName, SaleDate), SALE_ITEMS (S#, Line#, Code, Qty)
Use: SALE (S#, Line#, SPName, SaleDate, Code, Qty)
- "How many T179's did we sell yesterday?" can be answered without a join.
Another example: Order_Item (O#, I#, C#, Cname, I_Desc, Qty, I_Price)
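As an illustration of the combined SALE table, the question "How many T179's did we sell yesterday?" becomes a single-table aggregate. This is a sketch with invented rows; the helper units_sold and the fixed date standing in for "yesterday" are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Combined SALE table: header columns are repeated on every line item.
    CREATE TABLE sale (s_num INTEGER, line_num INTEGER, sp_name TEXT,
                       sale_date TEXT, code TEXT, qty INTEGER,
                       PRIMARY KEY (s_num, line_num));
    INSERT INTO sale VALUES
        (1, 1, 'Lee', '2024-01-02', 'T179', 3),
        (1, 2, 'Lee', '2024-01-02', 'B200', 1),
        (2, 1, 'Kim', '2024-01-02', 'T179', 2),
        (3, 1, 'Kim', '2024-01-01', 'T179', 5);
""")

def units_sold(conn, code, sale_date):
    """Total quantity of one product code sold on one date (0 if none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM sale "
        "WHERE code = ? AND sale_date = ?",
        (code, sale_date)).fetchone()
    return row[0]

# '2024-01-02' plays the role of "yesterday" in this sketch.
t179_yesterday = units_sold(conn, "T179", "2024-01-02")
```

In the normalized design the same question would need SALE joined to SALE_ITEMS, because the date and the product code sit in different tables.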

Typical denormalization techniques (cont.)
(4) Embed the other entity in the relationship when it is not interesting by itself
Order (O#, ODate, OShipTerms, PmtTerms, Cname, CAddr)
(5) Replicate infrequently updated attributes to avoid a JOIN
WORK_ON (ESSN, P_NUM, PName, Hours)

Problems of denormalization
- Makes rows longer
- Makes data transfer take longer
- Needs more memory for in-memory processing
- Causes redundancy and expensive updates
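The "expensive update" problem can be made concrete with the WORK_ON example. The sketch below assumes a hypothetical normalized counterpart (a project table) and invented rows, and compares how many rows one project rename touches in each design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the project name is stored exactly once.
    CREATE TABLE project (p_num INTEGER PRIMARY KEY, pname TEXT);
    INSERT INTO project VALUES (10, 'Alpha');

    -- Denormalized: PName is replicated on every WORK_ON row.
    CREATE TABLE work_on (essn TEXT, p_num INTEGER, pname TEXT, hours REAL,
                          PRIMARY KEY (essn, p_num));
    INSERT INTO work_on VALUES
        ('111', 10, 'Alpha', 5.0),
        ('222', 10, 'Alpha', 8.0),
        ('333', 10, 'Alpha', 2.5);
""")

# Renaming the project: rowcount reports how many rows each UPDATE changed.
norm_rows = conn.execute(
    "UPDATE project SET pname = 'Beta' WHERE p_num = 10").rowcount
denorm_rows = conn.execute(
    "UPDATE work_on SET pname = 'Beta' WHERE p_num = 10").rowcount
```

The denormalized update scales with the number of employees on the project, and missing any one row would leave two different names for the same project.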

Adding redundant data
- Add summary attributes or derived attributes.
- Redundant relationships can improve performance at the cost of update overhead.
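One possible way to keep a summary attribute in step with its detail rows is a trigger. The sketch below is an assumption, not something the slides prescribe: the orders.total column, the trigger name, and the data are all illustrative, and only inserts are handled (updates and deletes would need their own triggers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- orders.total is a redundant summary of the order's line items.
    CREATE TABLE orders (o_num INTEGER PRIMARY KEY,
                         total REAL NOT NULL DEFAULT 0);
    CREATE TABLE order_item (o_num INTEGER REFERENCES orders(o_num),
                             i_num INTEGER, qty INTEGER, price REAL,
                             PRIMARY KEY (o_num, i_num));

    -- Update overhead: every inserted item also updates its order row.
    CREATE TRIGGER order_item_ai AFTER INSERT ON order_item
    BEGIN
        UPDATE orders SET total = total + NEW.qty * NEW.price
        WHERE o_num = NEW.o_num;
    END;

    INSERT INTO orders (o_num) VALUES (1);
    INSERT INTO order_item VALUES (1, 1, 2, 10.0), (1, 2, 1, 5.5);
""")

# Reading the stored summary needs no aggregation over order_item.
stored_total = conn.execute(
    "SELECT total FROM orders WHERE o_num = 1").fetchone()[0]

# Recomputing from the detail rows gives the value it must agree with.
derived_total = conn.execute(
    "SELECT SUM(qty * price) FROM order_item WHERE o_num = 1").fetchone()[0]
```

The read becomes cheaper while each write does extra work, which is the trade-off the slide describes.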

Schema translation
- Reduce the number of relations to JOIN by using mapped translation
- Handling null values
- Combine 1:1 relationships
- Relax participation constraints
- Divide a big table into two if A and B are distinct in R(A, B)
- Ignore FDs based on co-occurring attributes that are not updated, e.g. ZIP --> CITY

Primary key
- Choose the most frequently used attributes.
- Prefer small attributes (they are used in indexes and referential integrity).

Index
- Create a set of appropriate indexes to optimize queries. (This will be discussed more in the physical DB chapters.)
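As a rough illustration of index creation, the sketch below builds an index on a hypothetical sale.code column and asks SQLite how it would run an equality lookup. The table, the index name idx_sale_code, and the generated data are assumptions; the wording of the plan is specific to SQLite and varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (s_num INTEGER, line_num INTEGER, code TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [(i, 1, f"T{i % 50}", 1) for i in range(1000)])

# Index the column that the frequent query filters on.
conn.execute("CREATE INDEX idx_sale_code ON sale (code)")

# EXPLAIN QUERY PLAN describes the access path; the last column is the detail text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT qty FROM sale WHERE code = ?", ("T7",)).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

If the index is chosen, plan_text mentions idx_sale_code; without the index the same query would scan the whole table.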

Denormalization
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read only" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications.

Denormalization (cont.)
Specifically, dimension tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. Denormalization is also used to improve performance on smaller computers, such as computerized cash registers. Because these use the data for lookup only (e.g. price lookups), no changes are made to the data, and a swift response is crucial.