Lab3 CPIT 440 Data Mining and Warehouse


Lab3 CPIT 440 Data Mining and Warehouse

Lab3: Outline
Introduction to Data Warehouse
–What is a Data Warehouse?
–Difference between Data Warehouse and Database
Introduction to OLAP operations
–Introduction to cubes
–Cube structure
–OLAP Operations
Exercises

Data Warehouse
What is a Data Warehouse?
–A data warehouse is a repository of an organization's stored data that is designed for query, reporting, and analysis rather than for transaction processing.
–It usually contains historical data derived from transaction data, but it can include data from other sources.
–It separates the analysis workload from the transaction workload and enables an organization to consolidate data from several sources.

[Figure: Data Warehouse]

Difference between Data Warehouse and Database
A question we are often asked in the field is: I already have a database, so why do I need a data warehouse? What is the difference between a database and a data warehouse?
Database:
–It is designed to handle transactions.
–It is not designed to do analytics well.
Data Warehouse:
–It is structured to make analytics fast and easy.
–It exists as a layer on top of another database or databases; it takes the data from all of these databases and creates a layer optimized for and dedicated to analytics.

Introduction to OLAP Operations
Introduction to cubes:
–A cube is a set of data that is usually constructed from a subset of a data warehouse and is organized and summarized into a multidimensional structure defined by a set of dimensions and measures.
–Cubes are the main objects in online analytical processing (OLAP), a technology that provides fast access to data in a data warehouse.

[Figure: Introduction to OLAP Operations]

Introduction to OLAP Operations
Cube Structure:
–Every cube has a schema, which is the set of joined tables in the data warehouse from which the cube draws its source data.
–The central table in the schema is the fact table, the source of the cube's measures.
–The other tables are dimension tables, the sources of the cube's dimensions.

Introduction to OLAP Operations
Cube Structure:
–A cube's structure is defined by its measures and dimensions.
–They are derived from tables in the cube's data source.
–The set of tables from which a cube's measures and dimensions are derived is called the cube's schema.
–Every cube schema consists of a single fact table and one or more dimension tables.
–The cube's measures are derived from columns in the fact table.
–The cube's dimensions are derived from columns in the dimension tables.
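As a rough illustration of the points above, the sketch below builds a tiny cube from one fact table and two dimension tables. The sales example, the table contents, and names such as product_dim, store_dim, and build_cube are hypothetical and are not part of the lab material; plain Python dictionaries stand in for database tables.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
product_dim = {
    1: {"name": "pen", "category": "stationery"},
    2: {"name": "notebook", "category": "stationery"},
    3: {"name": "lamp", "category": "furniture"},
}
store_dim = {
    10: {"city": "Jeddah"},
    20: {"city": "Riyadh"},
}

# Hypothetical fact table: foreign keys into the dimensions plus one measure column.
sales_fact = [
    {"product_id": 1, "store_id": 10, "amount": 5.0},
    {"product_id": 2, "store_id": 10, "amount": 7.5},
    {"product_id": 3, "store_id": 20, "amount": 40.0},
    {"product_id": 1, "store_id": 20, "amount": 2.5},
]


def build_cube(fact_rows):
    """Sum the 'amount' measure into cells keyed by (category, city)."""
    cube = defaultdict(float)
    for row in fact_rows:
        # Dimension values come from the dimension tables...
        category = product_dim[row["product_id"]]["category"]
        city = store_dim[row["store_id"]]["city"]
        # ...while the measure comes from a column of the fact table.
        cube[(category, city)] += row["amount"]
    return dict(cube)


cube = build_cube(sales_fact)
```

Each key of cube is one cell of a two-dimensional cube, and the stored number is the aggregated measure for that cell.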

Introduction to OLAP Operations
Cube Structure:
–Star schema: a fact table in the middle connected to a set of dimension tables.
–Snowflake schema: a refinement of the star schema in which some dimensional hierarchies are normalized into sets of smaller dimension tables, forming a shape similar to a snowflake.
–Fact constellation: multiple fact tables share dimension tables; this can be viewed as a collection of stars and is therefore also called a galaxy schema.

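Introduction to OLAP Operations
OLAP Operations:
–Roll up: summarize data by climbing up a dimension hierarchy or by dimension reduction.
–Roll down (drill down): the reverse of roll-up; move from summarized data to more detailed data, or introduce new dimensions.
–Slice and dice: select a subset of the cube by fixing a value on one dimension (slice) or on two or more dimensions (dice).
–Pivot (rotate): reorient the cube to view the same data from a different perspective.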
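The operations listed above can be sketched on a small in-memory cube. This is only an illustration under assumptions: the base cuboid, its (month, city, item) dimensions, and the helper names roll_up, slice_, dice, and pivot are hypothetical, and a real OLAP engine would work on stored, pre-aggregated data rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical base cuboid: (month, city, item) -> units sold.
base = {
    ("2004-01", "Jeddah", "pen"): 4,
    ("2004-02", "Jeddah", "pen"): 6,
    ("2004-01", "Riyadh", "pen"): 3,
    ("2004-01", "Jeddah", "lamp"): 1,
    ("2005-01", "Riyadh", "lamp"): 2,
}


def roll_up(cube, axis, to_coarser):
    """Replace the value on one axis by a coarser level and sum the merged cells."""
    out = defaultdict(int)
    for key, value in cube.items():
        new_key = key[:axis] + (to_coarser(key[axis]),) + key[axis + 1:]
        out[new_key] += value
    return dict(out)


def slice_(cube, axis, value):
    """Keep only the cells with a fixed value on a single dimension."""
    return {k: v for k, v in cube.items() if k[axis] == value}


def dice(cube, allowed):
    """Keep cells whose values fall in the allowed sets; allowed maps axis -> set."""
    return {
        k: v for k, v in cube.items()
        if all(k[axis] in values for axis, values in allowed.items())
    }


def pivot(cube, order):
    """Reorder the dimensions of every cell key; the measures are unchanged."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}


by_year = roll_up(base, 0, lambda month: month[:4])    # month -> year
only_2004 = slice_(by_year, 0, "2004")                 # slice on time
jeddah_pens = dice(by_year, {1: {"Jeddah"}, 2: {"pen"}})
city_first = pivot(by_year, (1, 0, 2))                 # (city, year, item)
```

Drill-down has no helper here because it cannot be computed from the summarized cells alone: going from by_year back to monthly detail means returning to base, the more detailed data.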

[Figure: Roll up and Roll down]

[Figure: Slice and Dice]

[Figure: Pivot (Rotate)]

Exercise 1
Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.

Exercise 1
(a) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
Three classes of schemas popularly used for modeling data warehouses are:
–The star schema
–The snowflake schema
–The fact constellation schema

Exercise 1
(b) Draw a schema diagram for the above data warehouse using one of the schema classes listed in part (a).
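One possible star schema for this exercise can be written down as table definitions. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names and every column other than the two measures (for example specialty, or the day/month/year split) are assumptions made for illustration and are not taken from the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (attribute columns are illustrative assumptions).
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,
    day      INTEGER,
    month    INTEGER,
    year     INTEGER
);
CREATE TABLE doctor_dim (
    doctor_key INTEGER PRIMARY KEY,
    name       TEXT,
    specialty  TEXT
);
CREATE TABLE patient_dim (
    patient_key INTEGER PRIMARY KEY,
    name        TEXT
);
-- Fact table in the middle: one foreign key per dimension plus the two measures.
CREATE TABLE visit_fact (
    time_key    INTEGER REFERENCES time_dim(time_key),
    doctor_key  INTEGER REFERENCES doctor_dim(doctor_key),
    patient_key INTEGER REFERENCES patient_dim(patient_key),
    visit_count INTEGER,
    charge      REAL
);
""")

# List the tables that were created.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
```

The count measure is stored as visit_count here only to avoid confusion with the SQL COUNT function.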

Exercise 1
(c) Starting with the base cuboid [day; doctor; patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?
The operations to be performed are:
–Roll-up on time from day to year.
–Slice for time=2004.
–Roll-up on patient from individual patient to all.
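The three operations for part (c) can be traced on made-up data. In this sketch the visit records, doctor names, and the roll_up helper are hypothetical; only the sequence of operations follows the answer given above.

```python
from collections import defaultdict

# Hypothetical base cuboid: (day, doctor, patient) -> charge.
base = {
    ("2004-03-01", "Dr. A", "P1"): 100.0,
    ("2004-03-01", "Dr. A", "P2"): 50.0,
    ("2004-07-15", "Dr. B", "P1"): 80.0,
    ("2005-01-10", "Dr. A", "P3"): 70.0,
}


def roll_up(cube, axis, to_coarser):
    """Replace the value on one axis by a coarser level and sum the merged cells."""
    out = defaultdict(float)
    for key, value in cube.items():
        new_key = key[:axis] + (to_coarser(key[axis]),) + key[axis + 1:]
        out[new_key] += value
    return dict(out)


# 1. Roll-up on time from day to year.
by_year = roll_up(base, 0, lambda day: day[:4])
# 2. Slice for time = 2004.
in_2004 = {k: v for k, v in by_year.items() if k[0] == "2004"}
# 3. Roll-up on patient from individual patient to all.
per_doctor = roll_up(in_2004, 2, lambda _patient: "all")
```

After the last step each remaining cell is keyed by (year, doctor, "all") and holds the total charge collected by that doctor in 2004.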

Exercise 2
Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and the two measures count and avg. grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg. grade measure stores the actual course grade of the student. At higher conceptual levels, avg. grade stores the average grade for the given combination.

Exercise 2
(a) Draw a snowflake schema diagram for the data warehouse.

Exercise 2
(b) Starting with the base cuboid [student; course; semester; instructor], what specific OLAP operations should be performed in order to list the average grade of CS courses for each Big University student?
The specific OLAP operations to be performed are:
–Roll-up on course from course id to department.
–Roll-up on student from student id to university.
–Dice on course, student with department="CS" and university="Big University".
–Drill-down on student from university to student name.
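An average cannot be rolled up by averaging averages, so a common approach is to keep a (sum, count) pair in every cell and divide only at the end. The sketch below applies that idea to part (b) under several assumptions: the grades, course ids, and the department_of mapping are hypothetical, every student is taken to belong to Big University (so the roll-up to university followed by the drill-down back to student is left implicit), and semester and instructor are additionally rolled up to "all", a step the listed answer does not mention.

```python
from collections import defaultdict

# Hypothetical hierarchy: course id -> department.
department_of = {"CS101": "CS", "CS202": "CS", "MA101": "Math"}

# Hypothetical base cuboid: (student, course, semester, instructor) -> grade.
base = {
    ("s1", "CS101", "F2004", "i1"): 90.0,
    ("s1", "CS202", "S2005", "i2"): 80.0,
    ("s1", "MA101", "F2004", "i3"): 60.0,
    ("s2", "CS101", "F2004", "i1"): 70.0,
}

# Store each cell as (sum of grades, count) so averages stay correct after roll-up.
cells = {key: (grade, 1) for key, grade in base.items()}


def roll_up(cube, axis, to_coarser):
    """Merge cells along one axis, adding sums and counts separately."""
    out = defaultdict(lambda: (0.0, 0))
    for key, (total, n) in cube.items():
        new_key = key[:axis] + (to_coarser(key[axis]),) + key[axis + 1:]
        old_total, old_n = out[new_key]
        out[new_key] = (old_total + total, old_n + n)
    return dict(out)


# Roll-up on course from course id to department.
by_dept = roll_up(cells, 1, department_of.get)
# Dice with department = "CS".
cs_only = {k: v for k, v in by_dept.items() if k[1] == "CS"}
# Assumed extra step: roll up semester and instructor to "all".
per_student = roll_up(roll_up(cs_only, 2, lambda _s: "all"), 3, lambda _i: "all")
# Divide at the end to obtain the avg. grade measure per student.
avg_grade = {key[0]: total / n for key, (total, n) in per_student.items()}
```

Keeping the count alongside the sum also yields the count measure of the exercise for free.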

Exercise 3
Suppose that a data warehouse consists of the four dimensions date, spectator, location, and game, and the two measures count and charge, where charge is the fee that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate.

Exercise 3
(a) Draw a star schema diagram for the data warehouse.

Exercise 3
(b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should be performed in order to list the total charge paid by student spectators at GM Place in 2004?
The specific OLAP operations to be performed are:
–Roll-up on date from date id to year.
–Roll-up on spectator from spectator id to status.
–Roll-up on location from location id to location name.
–Roll-up on game from game id to all.
–Dice with status="students", location name="GM Place", and year=2004.