© 2007 Robert T. Monroe, Carnegie Mellon University
BI Tools and Techniques
Data Warehousing II: Extract, Transform, and Load (ETL)
Robert Monroe
March 27, 2008

Goals
Provide a quick review of fundamental relational database design principles
Understand the key stages and challenges of ETL processing
– Data reconciliation and cleansing
– Data derivation
Understand how to create dimensional models (star schemas) and why they are useful in data warehousing

Quick Review: Relational Database Principles

The Relational Data Model
The relational model has become the de-facto standard for managing operational business data
Core concepts in a relational model:
– Tables (relations)
– Records (rows)
– Data fields (columns)
– Primary keys
– Foreign keys
Example: a Products table with columns Product ID, Description, Color, Size, and Qty Available
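These core concepts map directly onto SQL data-definition statements. The following is a minimal sketch using Python's built-in sqlite3 module and the slide's Products/Purchases example; the exact column names and sample values are illustrative assumptions, not the original dataset.

```python
import sqlite3

# In-memory database; a real operational system would use a server RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign-key constraints

# Each table (relation) holds records (rows) made up of data fields (columns).
conn.execute("""
    CREATE TABLE Products (
        ProductID    INTEGER PRIMARY KEY,   -- primary key
        Description  TEXT,
        Color        TEXT,
        Size         TEXT,
        QtyAvailable INTEGER
    )""")
conn.execute("""
    CREATE TABLE Purchases (
        OrderID      INTEGER PRIMARY KEY,   -- primary key
        CustomerName TEXT,
        ProductID    INTEGER REFERENCES Products(ProductID),  -- foreign key
        Quantity     INTEGER,
        OrderDate    TEXT
    )""")

# A couple of illustrative rows (values assumed for the example).
conn.execute("INSERT INTO Products VALUES (52, 'Shoes (pair)', 'Blue', '10', 200)")
conn.execute("INSERT INTO Purchases VALUES (5623, 'Jimmy Hwang', 52, 3, '2004-12-15')")

# Joining the two tables turns raw data into information:
# "Jimmy Hwang purchased 3 pairs of blue size-10 shoes on 2004-12-15."
row = conn.execute("""
    SELECT p.CustomerName, pr.Description, pr.Color, pr.Size, p.Quantity, p.OrderDate
    FROM Purchases p JOIN Products pr ON p.ProductID = pr.ProductID
""").fetchone()
print(row)
```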

Data, Information, Database Example
Example: a Purchases table (Order ID, Customer Name, Product ID, Quantity, Date) alongside the Products table (Product ID, Description, Color, Size, Qty Available)
From the data in these tables we can derive information, e.g.: Jimmy Hwang purchased 3 pairs of size 10 shoes on 12/15/2004
What other information can we derive from these data tables?

Relational Data, Tables, Records, and Metadata Example
Metadata for the example tables:
Table Name: Products
– ProductID: Int (pkey)
– Description: Text(50)
– Color: Text(50)
– Size: Text(20)
– QtyAvailable: Int
Table Name: Purchases
– OrderID: Int (pkey)
– CustomerName: Text(75)
– ProductID: Int (fkey)
– Quantity: Decimal
– Date: DateTime
The records in the Products and Purchases tables are the data; the table definitions above are the metadata

Normalization And Denormalization
Data normalization is the process of decomposing relations with anomalies to produce smaller, well-structured relations
– Basic idea: each table holds data about only one 'thing'
Goals of normalization include:
– Minimize data redundancy
– Simplify the enforcement of referential integrity constraints
– Simplify data maintenance (inserts, updates, deletes)
– Improve the representational model's match to "the real world"
Normalization sometimes hurts query performance

Example: Denormalized Table
Insertion anomaly: when an employee takes a new class we need to add duplicate data (Name, Dept_Name, and Salary)
Deletion anomaly: if we remove employee 140, we lose information about the existence of a Tax Acc class
Modification anomaly: a salary increase for employee 100 forces an update of multiple records
These anomalies exist because two themes (entity types) – course and employee – are combined into one relation, resulting in duplication and an unnecessary dependency between the entities
Example Employee table (columns: Emp_ID, Name, Dept_Name, Salary, Course_Title, Date_Completed), with one row per employee/course combination – e.g., Margaret Simpson (Marketing, 48000) appears once for SPSS and again for Surveys
Example derived from Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Normalizing the Previous Employee/Class Table
The single denormalized table is decomposed into three well-structured relations:
– Employee (Emp_ID, Name, Dept_Name, Salary)
– Course (Course_ID, Course_Title) – e.g., 1 SPSS, 2 Surveys, 3 Tax Acc, 4 C++, 5 Java
– Course_Completion (Emp_ID, Course_ID, Date_Completed)
This seems more complicated. Why might this approach be superior to the previous one?
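A compact way to see why the decomposed design is superior is to express it in SQL and note where the duplication disappears. The sketch below (sqlite3, table names from the slide, sample values assumed) shows that after normalization a salary change touches exactly one row, eliminating the modification anomaly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Normalized design: each table describes one 'thing'.
conn.executescript("""
    CREATE TABLE Employee (
        Emp_ID    INTEGER PRIMARY KEY,
        Name      TEXT,
        Dept_Name TEXT,
        Salary    INTEGER
    );
    CREATE TABLE Course (
        Course_ID    INTEGER PRIMARY KEY,
        Course_Title TEXT
    );
    CREATE TABLE Course_Completion (       -- associative table linking the two
        Emp_ID         INTEGER REFERENCES Employee(Emp_ID),
        Course_ID      INTEGER REFERENCES Course(Course_ID),
        Date_Completed TEXT,
        PRIMARY KEY (Emp_ID, Course_ID)
    );
""")

# Sample (assumed) data: one employee who has completed two courses.
conn.execute("INSERT INTO Employee VALUES (100, 'Margaret Simpson', 'Marketing', 48000)")
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [(1, 'SPSS'), (2, 'Surveys')])
conn.executemany("INSERT INTO Course_Completion VALUES (?, ?, ?)",
                 [(100, 1, '2004-06-19'), (100, 2, '2004-10-07')])

# No modification anomaly: a raise updates a single row, no matter how many
# courses the employee has completed.
conn.execute("UPDATE Employee SET Salary = 50000 WHERE Emp_ID = 100")
print(conn.execute("SELECT * FROM Employee").fetchall())
```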

Indexing
An index is a table or other data structure used to determine the location of rows in a file that satisfy some condition
Indices reduce the time needed to retrieve records…
…but increase the time and cost to insert, update, or delete records
Indexing is critical for high performance in large, complex databases
– Especially data warehouses and data marts
Example: a Products table with a Product_Index that maps each Product ID to its row location
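In SQL the index described here is created with a CREATE INDEX statement. A minimal sketch with sqlite3 and assumed table/column names; the comments note the read/write trade-off from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Products (
        ProductID   INTEGER,
        Description TEXT,
        Color       TEXT,
        Size        TEXT
    )""")

# The index is a separate structure mapping ProductID values to row locations,
# so lookups by ProductID no longer require a full table scan.
conn.execute("CREATE INDEX Product_Index ON Products (ProductID)")

# The query planner confirms the index is used for this lookup.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Products WHERE ProductID = 52").fetchall())

# ...but every insert/update/delete now has to maintain the index as well,
# which is the extra write cost the slide warns about.
conn.execute("INSERT INTO Products VALUES (52, 'Shoes (pair)', 'Blue', '10')")
```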

Alternative Data Models
The relational data model is the current de-facto standard for storing and managing corporate data
There are other data storage models, usually associated with legacy systems
– The data you need for your analysis may be stored in them!
Four common alternative data models:
– Flat file
– Hierarchical
– Network
– Object

Extract, Transform, and Load (ETL)

Quick Review: Operational Systems Feed Analytic Systems
Informational systems get their data from operational databases
This process generally requires significant processing (transformation) of the data stored in the operational databases
This process is commonly known as ETL – Extract, Transform, and Load

The ETL Process
The process of creating analytic data stores from operational data stores is commonly described as the Extract, Transform, and Load (ETL) process
There are four basic steps to ETL:
– Capture/Extract source data
– Cleanse (scrub)
– Transform
– Load and Index
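The four steps can be laid out as a small pipeline. The sketch below is a minimal, illustrative Python skeleton, not the course's prescribed tooling; the function names, the in-memory source and target databases, and the cleansing rule are all assumptions.

```python
import sqlite3

def extract(source_conn):
    """Capture/Extract: pull a chosen subset of records from the source system."""
    return source_conn.execute(
        "SELECT OrderID, CustomerName, Quantity FROM Purchases").fetchall()

def cleanse(rows):
    """Cleanse/Scrub: drop obviously bad records and standardize values."""
    return [(oid, name.strip().title(), qty)
            for oid, name, qty in rows
            if name and qty is not None and qty > 0]

def transform(rows):
    """Transform: reshape records into the warehouse's target format."""
    return [{"order_key": oid, "customer": name, "qty": qty}
            for oid, name, qty in rows]

def load_and_index(target_conn, records):
    """Load and Index: write the records and build indexes for query speed."""
    target_conn.executemany(
        "INSERT INTO fact_orders (order_key, customer, qty) "
        "VALUES (:order_key, :customer, :qty)", records)
    target_conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_customer ON fact_orders (customer)")

# Wire the steps together with tiny in-memory stand-ins for real systems.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Purchases (OrderID INTEGER, CustomerName TEXT, Quantity INTEGER)")
source.execute("INSERT INTO Purchases VALUES (5623, ' jimmy hwang ', 3)")

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_key INTEGER, customer TEXT, qty INTEGER)")

load_and_index(warehouse, transform(cleanse(extract(source))))
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```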

The Three-Layer Data Architecture
Data goes through three common stages during ETL:
Operational Data – transactional data stored in individual systems of record throughout the organization
Reconciled Data – detailed, current data intended to be the single, authoritative source for all decision support applications
Derived Data – data that have been selected, formatted, and aggregated for end-user decision support applications
(Operational Data → Reconciled Data → Derived Data)

Reconciling and Deriving Data
(Diagram: operational data is first reconciled, then derived. Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.)

In-Class Exercise: ETL
Form teams of 2-3 people
Complete exercise 1 on the handout

Data Profiling
First step: understand your source data
– What is available? What is missing?
– What is 'good' quality data? What is of questionable quality?
– Data volumes, frequency, sparseness
– Embedded business rules
– Obvious (and subtle) data conflicts: ranges and formats, cardinality and uniqueness, key collisions
This is a long and often painful process that can require a lot of meticulous effort – budget and plan accordingly!
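Much of the first pass at profiling can be automated with simple per-column summary statistics. The sketch below is a bare-bones, assumed example over a small list of records; real profiling work goes much further (value patterns, embedded business rules, cross-table checks).

```python
from collections import Counter

# A tiny assumed sample of source records; in practice this would come from a
# query or file extract out of the system of record.
rows = [
    {"customer_id": "C01", "zip": "15213", "amount": 120.0},
    {"customer_id": "C02", "zip": None,    "amount": 95.5},
    {"customer_id": "C02", "zip": "15217", "amount": None},
]

def profile(records):
    """Report, per column: missing values, distinct values, range, common values."""
    for col in records[0].keys():
        values = [r[col] for r in records]
        present = [v for v in values if v is not None]
        print(f"{col}:")
        print(f"  missing: {len(values) - len(present)} of {len(values)}")
        print(f"  distinct: {len(set(present))}")
        if present:
            print(f"  min/max: {min(present)} / {max(present)}")
        print(f"  most common: {Counter(present).most_common(2)}")

profile(rows)
```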

Reconciling and Deriving Data
(Diagram: operational data is first reconciled, then derived. Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.)

Data Characteristics: Status vs. Event Data
Status data: the state of the data before and after a change (a snapshot at a point in time)
Event data: a database action (create/update/delete) that results from a transaction
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Data Characteristics: Transient vs. Periodic Data
Transient data:
– Changes to existing records are written over previous records, thus destroying the previous data content
Periodic data:
– Never physically altered or deleted once they have been added to the store

Data Reconciliation
Typical operational data is:
– Transient – not historical
– Not always normalized (perhaps due to denormalization for performance)
– Restricted in scope – not comprehensive
– Sometimes of poor quality – inconsistencies and errors
After reconciliation, data should be:
– Detailed – not summarized yet
– Historical – periodic
– Normalized – 3rd normal form or higher
– Comprehensive – enterprise-wide perspective
– Timely – current enough to assist decision-making
– Quality controlled – accurate, with full integrity
(Operational Data → Reconciled Data → Derived Data)

Data Reconciliation: Capture/Extract
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
Static extract: capturing a snapshot of the source data at a point in time
Incremental extract: capturing only the changes that have occurred since the last static extract
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
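One common way to implement an incremental extract is to remember the timestamp of the previous run and pull only rows changed since then; a static extract simply omits the filter. A minimal sketch, assuming the source table carries a last_updated change-tracking column (an assumption, since not every operational system provides one).

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("""
    CREATE TABLE Orders (
        OrderID      INTEGER PRIMARY KEY,
        Amount       REAL,
        last_updated TEXT          -- assumed change-tracking column
    )""")
source.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 100.0, "2008-03-20 09:00:00"),
    (2, 250.0, "2008-03-26 14:30:00"),
])

def static_extract(conn):
    """Snapshot of the chosen subset at a point in time."""
    return conn.execute("SELECT OrderID, Amount FROM Orders").fetchall()

def incremental_extract(conn, last_extract_time):
    """Only rows that changed since the previous extract."""
    return conn.execute(
        "SELECT OrderID, Amount FROM Orders WHERE last_updated > ?",
        (last_extract_time,)).fetchall()

print(static_extract(source))                              # both rows
print(incremental_extract(source, "2008-03-25 00:00:00"))  # only the recent change
```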

Extract Challenges / Issues
What data should be extracted, and from where?
How should it be extracted?
How frequently should it be extracted?

Data Reconciliation: Scrub/Cleanse
Scrub/Cleanse: use pattern recognition and AI techniques to upgrade data quality
Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
Rule of thumb: automate where possible!
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Common Data Cleansing Tasks
Suppliers table (Supplier_ID, Supplier Name, Contact Name):
– 5623, International Business Machines, Joe Smith
– 14534, IBM, Jim Hwang
– qwq77dfs, Intl. Business Machines, Susan Chen
Quick exercise: how many suppliers are listed in this table?
Supplier_Orders_US (Order_ID, Item, Quantity_Tons) records salt orders in tons, while Supplier_Orders_Europe (Order_ID, Item, Quantity) records, e.g., order 44253: RoadSalt, 25 Truckloads and order 14534: TableSalt, 500 Cases
Quick exercise: how many pounds of salt were purchased?

Common Data Cleansing Tasks
Reconciling mismatched data fields across source databases
– E.g., the CompanyName field in db1 = the Comp_Name field in db2
Finding or fixing missing data or data fields
– Database 1 records "region" as part of the address, database 2 does not
Mismatched data types
– Zip code stored as a string in one source database and as an integer in another
Converting between different units of measure
– Kilograms in the European division's database, pounds in the US database
Resolving primary key collisions
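Several of these tasks reduce to small, mechanical transformations once the rules are known. The sketch below illustrates three of them (field-name reconciliation, supplier-name standardization, and unit conversion) with assumed mappings and data; in practice the rules come out of the profiling and business-rule work described earlier.

```python
# Assumed mapping from each source system's field names to the warehouse's names.
FIELD_MAP = {"CompanyName": "supplier_name", "Comp_Name": "supplier_name"}

# Assumed standardization rules for supplier names (collapse the IBM variants).
NAME_ALIASES = {
    "international business machines": "IBM",
    "intl. business machines": "IBM",
    "ibm": "IBM",
}

KG_PER_POUND = 0.453592  # unit conversion between the US and European databases

def cleanse_supplier(record):
    """Rename fields, standardize the supplier name, and convert pounds to kg."""
    out = {}
    for src_field, value in record.items():
        out[FIELD_MAP.get(src_field, src_field)] = value
    name = (out.get("supplier_name") or "").strip().lower()
    out["supplier_name"] = NAME_ALIASES.get(name, out.get("supplier_name"))
    if "weight_lbs" in out:                      # US source reports pounds
        out["weight_kg"] = round(out.pop("weight_lbs") * KG_PER_POUND, 2)
    return out

print(cleanse_supplier({"Comp_Name": "Intl. Business Machines", "weight_lbs": 500}))
# -> {'supplier_name': 'IBM', 'weight_kg': 226.8}
```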

Data Quality
The goal of the cleansing stage is to improve data quality
Common dimensions for measuring data quality:
– Accuracy
– Completeness
– Consistency
– Currency/Timeliness [Los03]
Why is it so hard to achieve (and maintain) a high level of data quality in a data warehouse?

Data Reconciliation: Transform
Transform: convert data from the format of the operational system to the format of the data warehouse
Record-level transformations:
– Selection – data partitioning
– Joining – data combining
– Aggregation – data summarization
Field-level transformations:
– Single-field – from one field to one field
– Multi-field – from many fields to one, or from one field to many
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Transform Examples: Single-Field Transform
General transformation: directly maps and transforms individual fields in the source record to individual fields in the target record
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Transform Examples: Single-Field Transform
Algorithmic transformation: uses a formula or logical expression to map and transform individual fields in the source record to individual fields in the target record
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Transform Examples: Single-Field Transform
Table look-up transformation: uses a separate table, keyed by the source record's code value, to map and transform individual fields in the source record to individual fields in the target record
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Transform Examples: Multi-Field Transform
M:1 transformation: maps many source fields to one target field
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Transform Examples: Multi-Field Transform
1:M transformation: maps and transforms one source field into many target fields
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
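The field-level cases above are easy to see in code. The sketch below gives one assumed example of each: an algorithmic single-field transform (a unit conversion), a table look-up transform (code to description), an M:1 multi-field transform (concatenating name parts), and a 1:M transform (splitting a date into warehouse dimension fields). The codes and formulas are illustrative assumptions.

```python
from datetime import date

# Algorithmic single-field transform: apply a formula to one source field.
def celsius_to_fahrenheit(temp_c):
    return temp_c * 9 / 5 + 32

# Table look-up single-field transform: a separate table keyed by the source code.
PRODUCT_CODE_LOOKUP = {"SH": "Shoes", "SK": "Socks", "BL": "Blouse"}  # assumed codes
def decode_product(code):
    return PRODUCT_CODE_LOOKUP.get(code, "Unknown")

# M:1 multi-field transform: many source fields become one target field.
def full_name(first, last):
    return f"{first} {last}"

# 1:M multi-field transform: one source field becomes many target fields.
def explode_date(d: date):
    return {"year": d.year, "quarter": (d.month - 1) // 3 + 1, "month": d.month}

print(celsius_to_fahrenheit(20))          # 68.0
print(decode_product("SH"))               # Shoes
print(full_name("Jimmy", "Hwang"))        # Jimmy Hwang
print(explode_date(date(2004, 12, 15)))   # {'year': 2004, 'quarter': 4, 'month': 12}
```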

Surrogate Keys
Reconciled data tables should use surrogate keys
– Surrogate keys are not business related
– Surrogate keys are independent of the operational store's primary keys
Surrogate keys are important because:
– They avoid primary key collisions
– Primary keys may change over time in the source system
– They make it possible to properly track changes over time
– They keep key length/format/type consistent
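A simple way to assign surrogate keys is a key-map table in the reconciled store: each (source system, source key) pair receives a warehouse-generated integer the first time it is seen. A minimal sketch with sqlite3 and an autoincrement key; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_key_map (
        surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- warehouse-assigned
        source_system TEXT,
        source_key    TEXT,                               -- operational primary key
        UNIQUE (source_system, source_key)
    )""")

def surrogate_key(source_system, source_key):
    """Return the existing surrogate key for this source row, or assign a new one."""
    conn.execute(
        "INSERT OR IGNORE INTO customer_key_map (source_system, source_key) VALUES (?, ?)",
        (source_system, source_key))
    return conn.execute(
        "SELECT surrogate_key FROM customer_key_map "
        "WHERE source_system = ? AND source_key = ?",
        (source_system, source_key)).fetchone()[0]

# The same business key '1001' in two different source systems no longer collides,
# and a key that changes in the source can be remapped without rewriting history.
print(surrogate_key("crm", "1001"))      # 1
print(surrogate_key("billing", "1001"))  # 2
print(surrogate_key("crm", "1001"))      # 1 (stable across re-loads)
```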

Data Reconciliation: Load and Index
Load/Index: place transformed data into the warehouse and create indexes
Refresh mode: bulk rewriting of the target data at periodic intervals
Update mode: only changes in the source data are written to the data warehouse
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
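The two load modes correspond roughly to "wipe and rewrite" versus "apply only the changes". A minimal sqlite3 sketch under assumed table names; INSERT OR REPLACE stands in for whatever change-apply (upsert) mechanism the warehouse platform actually provides.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, description TEXT)")

def refresh_load(rows):
    """Refresh mode: bulk rewrite of the target table at periodic intervals."""
    wh.execute("DELETE FROM dim_product")
    wh.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)

def update_load(changed_rows):
    """Update mode: write only the rows that changed in the source."""
    wh.executemany("INSERT OR REPLACE INTO dim_product VALUES (?, ?)", changed_rows)

refresh_load([(52, "Shoes (pair)"), (145, "Socks (pair)")])
update_load([(52, "Shoes (pair) - Blue")])   # only the changed record is written
print(wh.execute("SELECT * FROM dim_product ORDER BY product_key").fetchall())
# [(52, 'Shoes (pair) - Blue'), (145, 'Socks (pair)')]
```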

Data Reconciliation Recap
After load/index, data reconciliation should be complete
After reconciliation, data should be:
– Detailed – not summarized yet
– Historical – periodic
– Comprehensive – enterprise-wide perspective
– Timely – current enough to assist decision-making
– Quality controlled – accurate, with full integrity
(Operational Data → Reconciled Data → Derived Data)

ETL Issue: Frequency Of Data Updates
How should an organization decide the frequency of updates from operational databases to data warehouses/marts?
What are the benefits and costs of frequent loads?
What are the benefits and costs of infrequent loads?

Derived Data

Quick Review: Typical Data Warehouse Structure
(Diagram: operational data is reconciled and then derived for analysis. Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.)

Derived Data
Although reconciled data provides a consistent, high-quality collection of enterprise data, it is not necessarily in an efficient form for use by BI tools
Derived data objectives:
– Ease of use for decision support applications
– Fast response to predefined user queries
– Customized data for particular target audiences
– Ad-hoc query support
– Data mining capabilities
Characteristics:
– Detailed (mostly periodic) data
– Aggregated (for summary)
– Processed
– Distributed (to data marts)
(Operational Data → Reconciled Data → Derived Data)

Dimensional Modeling: Facts and Dimensions
Dimensional modeling: a simple database design in which dimensional data are separated from fact or event data
– Dimensional models are also sometimes called star schemas
Dimensional models are a common way to represent derived data for informational data stores
– Well suited to ad-hoc queries and OLAP
– Poorly suited to transaction processing
– Commonly used as the storage model for data warehouses and data marts

Star Schema Structure
Fact tables contain factual or quantitative data
Dimension tables contain descriptions of the subjects of the business
There is a 1:N relationship between dimension tables and fact tables
Dimension tables are denormalized to maximize performance
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Star Schema Example
The fact table provides statistics for sales broken down by the product, period, and store dimensions
The dimension tables provide details on stores, products, and time periods
Diagram source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
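Expressed as SQL, a star schema like this one is a central fact table whose foreign keys point at denormalized dimension tables. The sketch below uses sqlite3; the specific columns are illustrative assumptions rather than the slide's exact diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Dimension tables: descriptive attributes of business subjects (denormalized).
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, description TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name  TEXT, region   TEXT);
    CREATE TABLE dim_period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);

    -- Fact table: quantitative measures, with one foreign key per dimension (1:N).
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        period_key   INTEGER REFERENCES dim_period(period_key),
        units_sold   INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (product_key, store_key, period_key)
    );
""")

# A typical analytical query slices the facts by dimension attributes.
query = """
    SELECT s.region, p.category, SUM(f.dollars_sold) AS sales
    FROM fact_sales f
    JOIN dim_store   s ON f.store_key   = s.store_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY s.region, p.category
"""
print(conn.execute(query).fetchall())   # empty until facts are loaded
```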

Star Schema Example With Data
(Diagram: Product, Period, and Store dimension tables joined to a Sales fact table, shown with sample rows. Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.)

Dimensional Model Benefits
Simple and predictable framework
– Well suited to ad-hoc analytical queries
– Relatively straightforward mapping from most transactional systems
Dimensional independence
– Query performance is somewhat independent of the dimensions used in the query
Straightforward model extensions support evolution

ETL Issue: Fact Table Granularity
One of the biggest challenges in designing an effective star schema is deciding on the granularity of the fact data
– Transactional grain – the finest level
– Aggregated grain – more summarized
Finer grains provide:
– More detailed analysis capability
– More dimension tables and more rows in the fact table (much larger storage)
– Better "drill-down" capabilities
Rule of thumb: use the smallest granularity of fact data possible given your technical, storage, and computational constraints
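The grain decision determines which questions the warehouse can ever answer. The sketch below (assumed tables and values) keeps the fact table at transactional grain and shows that an aggregated grain can always be derived from it with a GROUP BY, while the reverse is impossible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_sales_txn (        -- transactional grain: one row per line item
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER,
        store_key   INTEGER,
        sale_date   TEXT,
        dollars     REAL
    )""")
conn.executemany("INSERT INTO fact_sales_txn VALUES (?, ?, ?, ?, ?)", [
    (1, 52, 7, "2004-12-15", 120.0),
    (2, 52, 7, "2004-12-15", 80.0),
    (3, 62, 7, "2004-12-16", 45.0),
])

# Rolling up to an aggregated grain (daily store/product totals) is just a query...
rollup = conn.execute("""
    SELECT product_key, store_key, sale_date, SUM(dollars)
    FROM fact_sales_txn
    GROUP BY product_key, store_key, sale_date
""").fetchall()
print(rollup)

# ...but if only the daily totals had been stored, the individual transactions
# (and any finer drill-down) could never be recovered.
```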

In-Class Exercise: Dimensional Modeling
Form teams of 2-3 people
Complete exercise 2, question #1 on the handout
– Build a star schema to store grades at Millenium College