Presentation is loading. Please wait.

Presentation is loading. Please wait.

COMP 430 Intro. to Database Systems Denormalization & Dimensional Modeling.

Similar presentations


Presentation on theme: "COMP 430 Intro. to Database Systems Denormalization & Dimensional Modeling."— Presentation transcript:

1 COMP 430 Intro. to Database Systems Denormalization & Dimensional Modeling

2 Some consequences of normalization Data redundancy is reduced or eliminated. Relations are broken into smaller, related tables. Using all the attributes from the original relation requires joining these smaller tables.

3 Denormalization Deliberately reintroducing some redundancy, so that we can access data faster. DENORMALIZED DATA AHEAD

4 Exampl e Technique: Add duplicate fields Technique: Add computed fields

5 Example Technique: Join tables

6 Less common denormalization techniques Duplicating a commonly-used subset of table fields Splitting some table rows into different tables Frequently- vs. rarely-used data Data for different regions Common subclasses of data

7 When to denormalize? Typically used when some or all of the following apply: Many queries need to join the data Joining the data is expensive – uses scans, rather than indices Computing derived data is expensive – complex queries or complex functions

8 Dealing with redundant data Still want data consistency, but now it requires work. Is full consistency required at all times? Techniques: Stored procedures act as API for updating DB. They add the redundant data. Triggers check (and fix?) consistency. Application code carefully maintains consistency during updates. Reconcile data as background process. Reconcile data during system maintenance.

9 Dimensional modeling In a tiny, brief nutshell

10 An alternative to ER modeling Only a brief overview. So, we’ll view it through our lens of ER modeling + denormalization. Emphasizes decision making & use of historical data Fast retrieval & aggregation of data Less concern with updating data & maintaining consistency while updating

11 DB design often resembles multiple starflakes Course(crn, …) Student(sid, …, collegeid, …, home_stateid) Enrollment(crn, sid, …) College(collegeid, …) State(stateid, …, country) Instructor(instr_id, …) Teach(crn, instr_id, …) Starflake = tree of junction table & child tables in 1-to-many relationships Typical: Few cycles. Super/sub classes implemented only with superclass table.

12 Student(sid, …, college, …, home_state) Starflakes can be compressed to stars Course(crn, …) Enrollment(crn, sid, …) Instructor(instr_id, …) Teach(crn, instr_id, …) Denormalize by joining each child’s tree. Star = junction table & one level of child tables in 1-to-many relationships

13 Fact & dimension tables Facts: The junction tables are the most important data – the facts E.g.: store purchases, class enrollments, click data Generally the largest tables Key data often numeric & additive – e.g., quantity bought, cost per unit, advertisement views Dimensions: The child tables in 1-to-many relationship with facts E.g., stores, customers, sales people, sales period

14 Dimensional modeling process Centered on identifying the business model Identifying the potential queries Identifying the facts – the data used in such queries Each query should use only one fact table & its dimension tables. Possibly start with an ER model Identify which junction tables serve as fact tables Use only surrogate keys Often add time dimension Denormalize into starflake or star schema

15 Order order_id customer_id store_id clerk_id date Store store_id store_name address district floor_type Clerk clerk_id clerk_name clerk_grade Product sku description brand category Customer customer_ID customer_name purchase_profile credit_profile address Promotion promotion_id promotion_name price_type ad_type OrderLine order_id sku promotion_id dollars_sold units_sold dollars_cost ER model

16 Time time_key SQL_date day_of_week month Store store_key store_id store_name address district floor_type Clerk clerk_key clerk_id clerk_name clerk_grade Product product_key sku description brand category Customer customer_key customer_name purchase_profile credit_profile address Promotion promotion_key promotion_name price_type ad_type Order time_key store_key clerk_key product_key customer_key promotion_key dollars_sold units_sold dollars_cost Dimensional model

17 Each dimension table represents a dimension of the cube. Facts are the data in the cube. Facts might be pre- aggregated along each dimension or combination of dimensions.

18 Sometimes things are still messy Not all data fit nicely into facts + 1-to-many dimensions. Leads to exceptions from this simple presentation.


Download ppt "COMP 430 Intro. to Database Systems Denormalization & Dimensional Modeling."

Similar presentations


Ads by Google