Class Agenda: 02/13/2014 Review Goals of assignments.

Presentation transcript:

Class Agenda: 02/13/2014 Review Goals of assignments. Technology: SQL Server, Tableau Internal Data Project Questions about assignments Discuss process of data warehouse design Discuss issues in data warehouse design Contrast different approaches to data warehouse design Design a data warehouse

Goals for data warehouse design Make complete and accurate information easily accessible. Present information consistently. Be adaptive and flexible to change. Provide reasonable and expected performance for information to support decision making. Protect/secure information.

How do we achieve those goals? Use systems analysis and design techniques. Have domain knowledge of required decision support systems. Model the data in a variety of different forms. Appropriate use (or non-use) of normalization. Use an appropriate DBMS for implementation.

Three different “general” data models
Transaction (operational) data model: Contains current data required by separate and/or integrated operational systems. Supports the transactional processing of the organization. Is frequently used to support day-to-day decision making. 3rd normal form. Does not usually contain external data.
Reconciled (enterprise data warehouse) data model: Contains detailed, current data intended to be the single, authoritative source for all decision support applications. Usually in 3rd normal form. May contain data generated externally from the organization.
Derived (data mart) data model: Contains data that are selected, formatted and aggregated for end-user decision support applications. Star or snowflake schema. May not be normalized. May contain data generated externally from the organization.

Reconciled and Derived Data Models
Reconciled (EDW): Independent of specific decisions. Centralized control; usually owned by IT. Historical. Not usually summarized. Normalized. Flexible. Many data sources. Long life. Starts large, becomes larger.
Derived (Data Mart): Specific decisions. One central subject. Usually accessed directly by users; usually decentralized into user area. Closely defined subject area. Detailed and/or summarized. Usually denormalized. Restrictive: few sources. Short life span. Starts small, becomes large.

Two general approaches to design
Enterprise Data Warehouse (Bill Inmon): Focus is on enterprise subjects that will be needed to support comprehensive decision making. Emphasis on creating design that is consistent among subject areas. Implementation is of a data mart. Uses ERD for modeling. Relies on comprehensive blueprint for interrelation of data.
Interrelated Data Marts (Ralph Kimball): Focus is on business subject area for data warehouse. Emphasis on creating simple design that can be implemented quickly. Implementation is of a data mart. Uses “dimensional model” for modeling. Kind of like an ERD with UML-type aspects. Relies on consistent interrelation of data by integration of existing data models.

Compare/Contrast Approaches Similarities: Both focus on subject areas for development of data model. Both require extensive input from data warehouse stakeholders. Both produce a subject-oriented, non-volatile, time-related data warehouse. Both try to quickly implement a prototype data mart. Differences: Inmon creates a more integrated and consistent data warehouse by attempting to design an enterprise-wide warehouse at the beginning of the first data warehouse project. This is called a “reconciled” DW design. Kimball relies on future project teams referencing existing data warehouse models for new projects.

What do both approaches yield? A design for a data mart. The design for a data mart is based on the concept of a data warehouse “cube.” A cube is a logical construct containing a “fact” table that is accessed on multiple “dimension” tables. A fact table contains values that a manager uses to make decisions. A dimension table is used as a reference for the values in the fact table.
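
The fact/dimension idea above can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows (dim_product, dim_place, fact_registration) are hypothetical and do not come from the class project.

```python
import sqlite3

# Hypothetical star schema: one fact table that references two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE dim_place   (place_key INTEGER PRIMARY KEY, outlet TEXT);
CREATE TABLE fact_registration (
    product_key INTEGER REFERENCES dim_product(product_key),
    place_key   INTEGER REFERENCES dim_place(place_key),
    price_paid  REAL
);
INSERT INTO dim_product VALUES (1, 'Roadster'), (2, 'Biplane');
INSERT INTO dim_place VALUES (1, 'Web store'), (2, 'Retail');
INSERT INTO fact_registration VALUES (1, 1, 20.0), (1, 2, 22.0), (2, 2, 35.0);
""")

def total_by(dimension_table, key, label):
    """Sum the fact measure, grouped by one attribute of one dimension."""
    sql = f"""
        SELECT d.{label}, SUM(f.price_paid)
        FROM fact_registration AS f
        JOIN {dimension_table} AS d ON f.{key} = d.{key}
        GROUP BY d.{label}
        ORDER BY d.{label}
    """
    return conn.execute(sql).fetchall()

# The same facts viewed along two different dimensions of the "cube".
by_model = total_by("dim_product", "product_key", "model")
by_outlet = total_by("dim_place", "place_key", "outlet")
```

The fact table holds only keys and the measure; every descriptive label a manager would filter or group by lives in a dimension table.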

Process of data warehouse design Identify the stakeholders that need data to support their decisions. Define and describe the data needs of those stakeholders. Define the subject area. Choose (EDW and data mart) or just data mart, or some combination thereof. Select the data of interest. May be internal, external. May be purchased. May be stored in a transaction database – may not. May be generated just for the data warehouse. Identify the dimensions (master data/strong entities). Add element of time. Determine granularity level. Identify the fact data. Add derived data if necessary or desired.

How do you identify those people within an organization who require data to support their decision making processes?

Define and describe the data needs Usually termed “stakeholder analysis”. Differing levels of decision making require differing sets of data. Internal vs. external data. Integrated vs. non-integrated data. Detailed vs. summarized data. Different stakeholders require different access mechanisms. Online vs. reports. Pre-formatted vs. ad-hoc availability of data. Different stakeholders require different timing. Online, real time vs. delay. Relative size of delay/timeliness is always an issue.

Stakeholder Analysis Table Example – Replica Toys
Columns: Decision Making Responsibilities; Existing Information?; Additional Information?; Availability of Additional Information?
Marketing Analyst: Decide what features are most valuable to which customers. Determine trends in toy purchases. No data related to features currently available. Customer order data by distribution outlet. Features selected by customers. Purchases by toy by customer. Not in existing system and cannot be compiled manually. Maybe telephone survey? Maybe registration system?
Distribution Manager: Determine trends in use of distribution outlets. Determine distribution outlet profitability. Purchases by toy by customer by distribution outlet. Purchase price by toy by customer by distribution outlet. Need customer order data with more specific parameters. See if available in customer order system.
Quality control specialist: Evaluate comparative defects of toys within and across product lines. Support call data. Product return data. Detailed problem reports including date, toy, problem, extent of damage. Not available in current support call and product return systems. Could be added.
Development engineer: Evaluate relative safety issues with existing product line. Determine potential safety issues with new product development. Safety test data. Detailed problem reports including date, toy, problem, injury, relative impact of injury, potential responsibility. Engineering safety test data is available.

Define the subject area Potential subject areas common to many businesses: Customers: People and organizations who acquire and/or use the company’s products. Equipment: Machinery, devices, tools and their components. Facilities: Real estate and its components. Sales: Transactions that move a product from the company to a customer. Suppliers: Entities that provide a company with goods and services. Products: Goods and services that the company, or its competitors, provide to customers. Materials: Goods and services that the company uses to produce its products. Financials: Information about money that is received, retained, expended, invested or in any way tracked by the company. Human resources: Individuals who perform work for the company; may be employees, contractors, or simply positions.

Select the data of interest Use the existing transaction database model. Identify and understand the necessary business decisions. Identify external data that could help support decisions. Use tables to help sort available attributes.

Data Attributes Required to Inform the Decision
Decision: Which toys will sell best next year? In three years?
Table columns: Sample Informational Questions that Might Help Answer the Decision Question; Data Attributes Required to Inform the Decision; Additional Systems/Processes that Could Be Used to Create and/or Access Data

Additional Data Attributes Required Potential Data Sources Existing or New? Potential Data Sources Data Costs Data Ownership

Transform operational data to DW Transient vs. Periodic Data Transient: Data in which changes to existing records are written over previous records, thus destroying the previous data content. (Type 1 change) Most transaction systems are based on transient data. Most data warehouses avoid transient data. Periodic: Data that are never physically altered or deleted once they have been added to the data store. (Type 2 change) Most data warehouses are based on periodic data.
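
A minimal sketch of the difference, assuming a single customer attribute (city) and made-up dates; the store names and helper functions are hypothetical.

```python
from datetime import date

# Transient store: one record per key; an update overwrites the old content.
transient = {}

def write_transient(customer_id, city):
    transient[customer_id] = city

# Periodic store: records are only appended, each stamped with a date.
periodic = []

def write_periodic(customer_id, city, as_of):
    periodic.append((customer_id, city, as_of))

# Apply the same two changes to both stores.
for city, as_of in [("Reno", date(2013, 1, 5)), ("Sparks", date(2014, 2, 13))]:
    write_transient(42, city)
    write_periodic(42, city, as_of)

def city_as_of(customer_id, when):
    """Latest periodic value recorded on or before `when`, else None."""
    rows = [r for r in periodic if r[0] == customer_id and r[2] <= when]
    return max(rows, key=lambda r: r[2])[1] if rows else None
```

After both writes the transient store only knows the latest city, while the periodic store can still answer "where was this customer in mid-2013?" through city_as_of.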

Data warehouse Periodic Data Fact vs. dimension A “fact” is a numeric measure. Replica example: A registration is a “fact” along with the price that was paid for the purchase that spawned the registration. Facts are “weak entities” Facts are usually transactions A “dimension” is reference information that relates to the fact. Replica examples: customer, product model, feature, place of purchase. Dimensions are “strong entities” Dimensions are also considered the “master data” of an organization

Dimensions are different in DW-land Slowly changing dimension: A dimension whose values change over time. The question is how to maintain knowledge of the past. Approaches: Type 1: just replace old data with new (lose historical data). Type 2: create a new dimension table row each time the dimension object changes, with all dimension characteristics at the time of change. Most common approach. Type 3: for each changing attribute, create a current value field and several old-valued fields (multivalued).
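
Two of these approaches, overwriting in place and adding a new row per change, can be contrasted in a short sketch. The dimension layout (customer_key as a surrogate key, valid_from/valid_to dates) is an assumption for illustration, not a prescribed design.

```python
from datetime import date

# Hypothetical customer dimension; each row is one version of a customer.
dim_customer = [
    {"customer_key": 1, "customer_id": "C42", "city": "Reno",
     "valid_from": date(2013, 1, 5), "valid_to": None},
]

def overwrite(rows, customer_id, city):
    """Replace the value on the current row in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["city"] = city

def add_version(rows, customer_id, city, changed_on):
    """Close the current row and append a new row with a new surrogate key."""
    for row in rows:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = changed_on
    rows.append({"customer_key": max(r["customer_key"] for r in rows) + 1,
                 "customer_id": customer_id, "city": city,
                 "valid_from": changed_on, "valid_to": None})

# A move recorded as a new row keeps the earlier city available.
add_version(dim_customer, "C42", "Sparks", date(2014, 2, 13))
history = [(r["customer_key"], r["city"]) for r in dim_customer]
```

Because each version gets its own surrogate key, fact rows loaded before the change keep pointing at the old version of the customer.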

Other dimensional issues Degenerate dimension: A dimension that has no interesting dimension attributes (e.g. serial number). Multi-valued dimension: A dimension that needs to be qualified by a set of values (e.g. feature). May have a related hierarchy. Example: group -> category -> family -> product
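
As a rough illustration of the group -> category -> family -> product hierarchy, the sketch below rolls sales quantities up to any level. The product names, hierarchy values, and quantities are invented for the example.

```python
from collections import defaultdict

# Hypothetical product dimension rows carrying every level of the hierarchy.
products = {
    "Roadster": {"family": "Cars", "category": "Vehicles", "group": "Toys"},
    "Pickup": {"family": "Cars", "category": "Vehicles", "group": "Toys"},
    "Biplane": {"family": "Planes", "category": "Vehicles", "group": "Toys"},
}

# Hypothetical facts: (product, quantity sold).
sales = [("Roadster", 3), ("Pickup", 2), ("Biplane", 4)]

def roll_up(level):
    """Total quantity at the requested hierarchy level."""
    totals = defaultdict(int)
    for product, quantity in sales:
        totals[products[product][level]] += quantity
    return dict(totals)

by_family = roll_up("family")
by_category = roll_up("category")
```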

Dimensions can be hierarchical

Dimensions are usually denormalized

Conformed Dimensions for growth Conformed dimension: One or more dimension tables associated with two or more fact tables. Dimensions must have the same meaning for all related fact tables. Very hard to achieve without good planning. Goal of any data warehouse is to plan the dimensions so that they span business processes/decision areas. Enhances consistency of facts. Allows integration of diverse systems. Helps a designer to create data warehouse systems incrementally.

A Bus Matrix to help plan for Conformed Dimensions
Columns (dimensions): Date, Customer, Product Model, Purchase Place, Employee
Rows (business process or decision issue): Registering a toy (marked X); Accepting a return; Accepting a complaint; Marketing toys to distributors; Accepting an order from a distributor
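
One way to use a bus matrix is to treat each row as the set of dimensions a process needs and then look for dimensions shared across processes. In the sketch below, the assignment of dimensions to processes is hypothetical and is not taken from the slide.

```python
# Hypothetical bus matrix: which dimensions each business process uses.
bus_matrix = {
    "Registering a toy": {"Date", "Customer", "Product Model", "Purchase Place"},
    "Accepting a return": {"Date", "Customer", "Product Model", "Employee"},
    "Accepting a complaint": {"Date", "Customer", "Product Model", "Employee"},
}

def conformed_dimensions(matrix):
    """Dimensions used by two or more processes (that is, fact tables)."""
    counts = {}
    for dims in matrix.values():
        for dim in dims:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(d for d, n in counts.items() if n >= 2)

shared = conformed_dimensions(bus_matrix)
```

Dimensions that show up in `shared` are the ones worth defining once, with one agreed meaning, before any single data mart is built.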

Time is a dimension Data warehouse is a historical model rather than a current “point in time” model. Must have a way to incorporate changes that occur over time. Important issues: Fact table must include a time component. Ranges of time vs. effective period in time Time also relates to dimension tables May have to deal with differing time periods. Examples are fiscal years, “holiday rush,” billing cycle, etc.
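
A date dimension is often generated rather than loaded from a source system. The sketch below builds one row per day; the July fiscal-year start and the November/December "holiday rush" flag are assumptions chosen only to illustrate differing time periods.

```python
from datetime import date, timedelta

FISCAL_YEAR_START_MONTH = 7  # assumption for illustration only

def date_dimension_row(day):
    """One row of a hypothetical date dimension."""
    if day.month >= FISCAL_YEAR_START_MONTH:
        fiscal_year = day.year + 1
    else:
        fiscal_year = day.year
    return {
        "date_key": int(day.strftime("%Y%m%d")),
        "calendar_year": day.year,
        "calendar_month": day.month,
        "fiscal_year": fiscal_year,
        "holiday_rush": day.month in (11, 12),  # hypothetical business period
    }

def build_date_dimension(start, end):
    """Rows for every day from start to end, inclusive."""
    days = (end - start).days + 1
    return [date_dimension_row(start + timedelta(days=i)) for i in range(days)]

rows = build_date_dimension(date(2014, 6, 29), date(2014, 7, 2))
```

Fact rows then carry only the date_key, and questions about fiscal years or seasonal periods are answered by joining to this table.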

Time is complex

Fact tables Measures: Sale Flag Quantity Can have a “factless” fact table
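
A "factless" fact table stores only dimension keys; the row itself is the record that an event happened. A small sketch, with hypothetical complaint events and key values:

```python
from collections import Counter

# Hypothetical factless fact table: no numeric measure, only dimension keys.
fact_complaint = [
    {"date_key": 20140210, "customer_key": 1, "product_key": 7},
    {"date_key": 20140211, "customer_key": 2, "product_key": 7},
    {"date_key": 20140211, "customer_key": 3, "product_key": 9},
]

# Analysis is done by counting rows along a dimension.
complaints_per_product = Counter(row["product_key"] for row in fact_complaint)
complaints_per_day = Counter(row["date_key"] for row in fact_complaint)
```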

Determine granularity level What are the benefits and drawbacks of a low level of granularity? What are the benefits and drawbacks of a high level of granularity? What factors should be considered when determining the level of granularity in the data warehouse?
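
To make the trade-off concrete, the sketch below summarizes detailed facts to a monthly grain, assuming "low level of granularity" means fine-grained detail. The sample facts are invented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detailed facts: (sale date, product, quantity).
detailed = [
    (date(2014, 1, 3), "Roadster", 2),
    (date(2014, 1, 17), "Roadster", 1),
    (date(2014, 2, 1), "Roadster", 4),
    (date(2014, 2, 1), "Biplane", 1),
]

def summarize_by_month(facts):
    """Coarser grain: one row per (year, month, product)."""
    totals = defaultdict(int)
    for day, product, quantity in facts:
        totals[(day.year, day.month, product)] += quantity
    return dict(totals)

monthly = summarize_by_month(detailed)
```

The monthly table has fewer rows and answers monthly questions directly, but it can no longer say what sold on 17 January; the detailed table can answer both at the cost of storing every row.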

Might have to “derive” facts Derived data includes any kind of calculated field. Usually derive facts when there will be an overwhelming amount of data if not derived. Examples: total sales; net sales amount; total funds raised; total cost of products. Issues: Must be identified, defined and agreed upon by data warehouse stakeholders. Must be documented in metadata. Must be consistent.
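
A short sketch of deriving facts from detail rows. The definition used here (net sales amount = quantity × unit price − discount) is an assumption; in practice the stakeholders would agree on the definition and record it in the metadata.

```python
# Hypothetical order lines at the detail level.
order_lines = [
    {"quantity": 2, "unit_price": 10.0, "discount": 1.5},
    {"quantity": 1, "unit_price": 35.0, "discount": 0.0},
]

def net_sales_amount(line):
    """Assumed definition: gross amount minus discount."""
    return line["quantity"] * line["unit_price"] - line["discount"]

def derive_facts(lines):
    """Derived facts computed once, so every report uses the same numbers."""
    total_sales = sum(l["quantity"] * l["unit_price"] for l in lines)
    net_sales = sum(net_sales_amount(l) for l in lines)
    return {"total_sales": total_sales, "net_sales_amount": net_sales}

derived = derive_facts(order_lines)
```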