Data Warehousing DSCI 4103 Dr. Mennecke Introduction and Chapter 1.


Introduction:
• Definitions
  – Legacy Systems
  – Dimensions
  – Data Dependencies Model
  – Dimensional Model

An ER Model
(Diagram labels: Ship Type Shipper District Credit Order Item Ship To Product Contact Locat. Product Line Sales Order Cust. Locat. Product Group Contract Contract Type Customer Sales Rep Sales District Sales Region Sales Division Contact)

A Dimensional Model
(Diagram labels: Product, Market, Time)

Why Data Warehouses?
• To meet the long-sought-after goal of providing the user with more flexible databases containing data that can be accessed “every which way.”

OLTP vs. OLAP
• OLTP (online transaction processing) has been the standard reason for IS and DP for the last thirty years. Most legacy systems are quite good at capturing data but do not facilitate data access.
• OLAP (online analytical processing) is a set of procedures for defining and using a dimensional framework for decision support.

The Goals for and Characteristics of a DW
• Make organizational data accessible
• Facilitate consistency
• Adaptable and yet resilient to change
• Secure and reliable
• Designed with a focus on supporting decision making

The Goals for and Characteristics of a DW
• Generate an environment in which data can be sliced and diced in multiple ways
• It is more than data; it is a set of tools to query, analyze, and present information
• The DW is the place where operational data is published (cleaned up, assembled, etc.)

Basic elements of the data warehouse
(Diagram: Operational Source Systems → Extract → Data Staging Area → Load → Data Presentation Area → Access → Data Access Tools)
• Operational Source Systems
• Data Staging Area
  – Services: Clean, combine, and standardize; conform dimensions; no user query services
  – Data Store: Flat files and relational tables
  – Processing: Sorting and sequential processing
• Data Presentation Area
  – Data Mart #1: Dimensional; atomic and summary data; based on a single business process
  – Data Mart #2: Similar design
  – DW Bus: Conformed facts and dimensions
• Data Access Tools
  – Ad hoc query tools
  – Report writers
  – Analytical applications
  – Modeling: Forecasting, scoring, data mining

Data Staging Area
• Extract-Transformation-Load
  – Extract: Reading the source data and copying the data to the staging area
  – Transformation: Cleaning, combining, deduplicating, and assigning keys
  – Load: Presenting the data to the bulk loading facilities of the data mart
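The three ETL steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the source rows, the field names (`cust`, `region`, `amount`), and the in-memory list standing in for a data mart are all assumptions, not part of the slides. A real staging area would also need a business rule for deciding when two rows are duplicates, since two identical-looking rows can be two genuine transactions.

```python
# Hypothetical operational source rows; the field names are illustrative only.
source_rows = [
    {"cust": " Acme Corp ", "region": "west", "amount": "120.50"},
    {"cust": "Acme Corp", "region": "West", "amount": "120.50"},  # duplicate once cleaned
    {"cust": "Binford", "region": "EAST", "amount": "75.00"},
]


def extract(rows):
    """Extract: read the source data and copy it into the staging area."""
    return [dict(row) for row in rows]


def transform(staged):
    """Transform: clean, deduplicate, and assign surrogate keys."""
    seen = set()
    cleaned = []
    for row in staged:
        # Cleaning: trim whitespace, standardize case, convert text to numbers.
        record = (row["cust"].strip(), row["region"].strip().title(), float(row["amount"]))
        # Deduplicating: skip records already seen (assumed rule: exact match).
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(record)

    # Assigning keys: give each distinct customer a surrogate integer key.
    customer_keys = {}
    output = []
    for cust, region, amount in cleaned:
        key = customer_keys.setdefault(cust, len(customer_keys) + 1)
        output.append({"customer_key": key, "customer": cust, "region": region, "amount": amount})
    return output


def load(rows, data_mart):
    """Load: hand the transformed rows to the data mart in one bulk step."""
    data_mart.extend(rows)
    return len(rows)


data_mart = []  # stand-in for a data mart table
loaded = load(transform(extract(source_rows)), data_mart)
```

Keeping each step as its own function mirrors the slide: the staging area does all the cleanup, and the data mart receives only finished rows.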

Organization of data in the presentation area of the data warehouse
• Data in the warehouse are dimensional, not normalized relations
  – However, data that are ultimately presented in the data warehouse will often be derived directly from relational DBs
• Data should be atomic someplace in the warehouse, even if the presentation is aggregate
• Uses the bus architecture to support a decentralized set of data marts

Updates to a data warehouse
• For many years, the dogma stated that data warehouses are never updated.
• This is unrealistic since labels, titles, etc. change.
• Some components will, therefore, be changed, albeit via a managed load (as opposed to transactional updates).

Dimensional Modeling Terms and Concepts
• Fact table
• Dimension tables

Fact Tables
• Fact table: a table in the data warehouse that contains
  – Numerical performance measures
  – Foreign keys that tie the fact table to the dimension tables

Fact Tables
• Each row records a measurement describing a transaction
  – Where?
  – When?
  – Who?
  – How much?
  – How many?
• The level of detail represented by this data is referred to as the grain of the data warehouse
  – Questions can only be asked down to a level corresponding with the grain of the data warehouse
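A small Python sketch can make the point about grain concrete. The fact rows, product names, and the helper `roll_up_to_month` are hypothetical. The sketch shows that daily-grain facts can always be rolled up to a coarser monthly grain, while the reverse is impossible: once only monthly totals are stored, the daily detail cannot be recovered.

```python
from collections import defaultdict

# Hypothetical fact rows stored at a daily grain: (date, product, units_sold).
daily_facts = [
    ("2024-01-05", "widget", 3),
    ("2024-01-05", "gadget", 1),
    ("2024-01-17", "widget", 2),
    ("2024-02-02", "widget", 4),
]


def roll_up_to_month(facts):
    """Aggregate daily-grain facts to a monthly grain (finer -> coarser is always possible)."""
    totals = defaultdict(int)
    for date, product, units in facts:
        month = date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(month, product)] += units
    return dict(totals)


monthly_facts = roll_up_to_month(daily_facts)

# A question at the stored grain can be answered directly from daily_facts:
widgets_on_jan_17 = sum(u for d, p, u in daily_facts if d == "2024-01-17" and p == "widget")

# monthly_facts, by contrast, holds no date finer than the month, so the same
# question could not be answered if only the monthly table had been kept.
```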

Fact Tables
• Fact tables contain numeric data that can be one of three types
  – Additive
  – Semi-additive
  – Non-additive
• Fact tables contain foreign keys
  – A group of foreign keys will be used to create a concatenated primary key
• Fact tables generally don’t contain textual data
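The slide names the three types without defining them. As commonly used, additive facts can be summed across every dimension, semi-additive facts can be summed across some dimensions but not others (typically not time), and non-additive facts such as ratios cannot be meaningfully summed at all. The Python sketch below illustrates the difference using made-up rows; the column names `revenue`, `cost`, and `inventory_on_hand` are assumptions chosen for the example.

```python
# Hypothetical fact rows: one row per store per day.
rows = [
    {"day": 1, "store": "A", "revenue": 100.0, "cost": 60.0, "inventory_on_hand": 50},
    {"day": 1, "store": "B", "revenue": 200.0, "cost": 150.0, "inventory_on_hand": 30},
    {"day": 2, "store": "A", "revenue": 100.0, "cost": 80.0, "inventory_on_hand": 45},
    {"day": 2, "store": "B", "revenue": 100.0, "cost": 50.0, "inventory_on_hand": 25},
]

# Additive: revenue can be summed across days and stores alike.
total_revenue = sum(r["revenue"] for r in rows)
total_cost = sum(r["cost"] for r in rows)

# Semi-additive: inventory can be summed across stores for ONE day...
inventory_day_2 = sum(r["inventory_on_hand"] for r in rows if r["day"] == 2)
# ...but summing it across days double-counts stock, so average over time instead.
store_a_levels = [r["inventory_on_hand"] for r in rows if r["store"] == "A"]
avg_inventory_store_a = sum(store_a_levels) / len(store_a_levels)

# Non-additive: a margin percentage cannot be summed or naively averaged.
# Recompute it from the additive components instead.
overall_margin = (total_revenue - total_cost) / total_revenue
naive_avg_margin = sum((r["revenue"] - r["cost"]) / r["revenue"] for r in rows) / len(rows)
```

Here `overall_margin` and `naive_avg_margin` disagree, which is why ratios are usually stored as their additive numerator and denominator and computed at query time.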

Dimension tables
• Tables containing textual descriptors of the business
  – Dimension tables are usually wide (e.g., 100 columns)
  – Dimension tables are usually shallow (hundreds of thousands or a few million rows)
  – Values in the dimensions usually provide
    · Constraints on queries (e.g., view customer by region)
    · Report headings

Dimension tables
• The quality of the dimensions will determine the quality of the data warehouse; that is, the DW is only as good as its dimension attributes
• Dimensions are often split into hierarchical branches (i.e., snowflakes) because of the hierarchical nature of organizations
  – Product part → Product → Brand
• Dimensions are usually highly denormalized

Dimension tables
• The dimension attributes define the constraints for the DW. Without good dimensions, it becomes difficult to narrow in on a solution when the DW is used for decision support.

Bringing together facts and dimensions – Building the dimensional model
• Start with the normalized ER Model
• Group the ER diagram components into segments based on common business processes and model each as a unit
• Find M:M relationships in the model with numeric and additive non-key facts and include them in a fact table
• Denormalize the other tables as needed and designate one field as a primary key

A Dimensional Model
• Time Dimension: time_key, day_of_Week, month, quarter, year, holiday_flag
• Sales Fact: time_key, product_key, store_key, dollars_sold, units_sold, dollars_cost
• Product Dimension: product_key, description, brand, category
• Store Dimension: store_key, store_name, address, floor_plan_type
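The star schema on this slide can be tried out with Python's built-in `sqlite3` module. The sketch below is an assumption-laden illustration, not the course's implementation: the table names (`time_dim`, `product_dim`, `store_dim`, `sales_fact`), the column types, and all sample rows are invented, while the column names come from the slide. The query shows the typical pattern: dimension attributes supply the constraint (`quarter = 'Q1'`) and the report heading (`brand`), and the fact table supplies the additive measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Dimension tables have a single-column primary key; the fact table's primary
# key is the concatenation of its foreign keys.
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day_of_Week TEXT, month TEXT,
    quarter TEXT, year INTEGER, holiday_flag INTEGER);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT, category TEXT);
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY, store_name TEXT, address TEXT, floor_plan_type TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    dollars_sold REAL, units_sold INTEGER, dollars_cost REAL,
    PRIMARY KEY (time_key, product_key, store_key));
""")

# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Mon", "Jan", "Q1", 2024, 0),
    (2, "Tue", "Apr", "Q2", 2024, 0),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Cola 12oz", "FizzCo", "Beverage"),
    (2, "Potato chips", "Crunchy", "Snack"),
])
conn.execute("INSERT INTO store_dim VALUES (1, 'Downtown', '1 Main St', 'compact')")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 20.0, 10, 12.0),
    (1, 2, 1, 15.0, 5, 9.0),
    (2, 1, 1, 30.0, 15, 18.0),
])

# Constrain on a time attribute, group by a product attribute, sum the facts.
q1_sales_by_brand = conn.execute("""
    SELECT p.brand, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    WHERE t.quarter = 'Q1'
    GROUP BY p.brand
    ORDER BY p.brand
""").fetchall()
```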

So, What is a DW?
• A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management’s decisions
  – W.H. Inmon (the father of DW)

Subject Oriented
• Data in a data warehouse are organized around the major subjects of the organization

Integrated
• Data from multiple sources are standardized (scrubbed, cleansed, etc.) and brought into one environment

Non-Volatile
• Once added to the DW, data are not changed (barring the existence of major errors)

Time Variant
• The DW captures data at a specific moment; thus, it is a snapshot view of the organization at that moment in time. As these snapshots accumulate, the analyst is able to examine the organization over time (a time series!)
• The snapshot is called a production data extract
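The accumulation of snapshots into a time series can be sketched as follows. Everything here is hypothetical: the `load_snapshot` and `time_series` helpers, the customer names, and the balances are invented for illustration. The key idea is that each extract is appended with its snapshot date and earlier rows are never overwritten, which is also what makes the warehouse non-volatile.

```python
warehouse = []  # append-only stand-in for a warehouse table


def load_snapshot(warehouse, snapshot_date, operational_state):
    """Append one production data extract, stamping every row with its snapshot date."""
    for customer, balance in operational_state.items():
        warehouse.append(
            {"snapshot_date": snapshot_date, "customer": customer, "balance": balance}
        )


# The operational system only knows the current state; the warehouse keeps
# each month-end picture of it (values are made up).
load_snapshot(warehouse, "2024-01-31", {"acme": 100, "binford": 40})
load_snapshot(warehouse, "2024-02-29", {"acme": 130, "binford": 40})
load_snapshot(warehouse, "2024-03-31", {"acme": 90, "binford": 55})


def time_series(warehouse, customer):
    """Return one customer's balance at each snapshot, in load order."""
    return [
        (row["snapshot_date"], row["balance"])
        for row in warehouse
        if row["customer"] == customer
    ]


acme_history = time_series(warehouse, "acme")
```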