Data Warehousing. Presenter: Xinyuan Niu.



Outline
◉ Decision Support Systems
◉ Data Warehousing
  - Concepts of Data Warehouse
  - Components of Data Warehouse
  - Warehouse Schema
  - Storage Features: Column-Oriented

1. Decision Support Systems

Information Explosion Age

Decision Support Systems: Database Application Classification
Transaction Processing Systems
◉ Record information about transactions.
◉ Examples: product sales information for companies; course registration and grade information for universities.
◉ Organizations have accumulated a vast amount of information generated by these systems.
Decision Support Systems
◉ Get high-level information out of the detailed information stored in transaction-processing systems.
◉ Use the high-level information to make a variety of decisions.

Decision Support Systems: Transaction Information Example
Transaction information kept by a retailer:
◉ Purchase: customer, item purchased, price paid, date on which the purchase was made
◉ Item: item name, manufacturer, color, size, etc.
◉ Customer: credit history, annual income, age
Such information can range up to hundreds of gigabytes or even terabytes.

Decision Support Systems: Decision Making Based on Transaction Information
Transaction information feeds decision making, e.g. what items to stock and what discounts to offer.

Decision Support Systems: Decision Making Example (Precision Marketing)
◉ Input: customer age, gender, job, purchase pattern, etc.
◉ Decision-making system: analyzes the input transaction information.
◉ Decision: following the system's analysis, show specific ads to specific customer groups.

Decision Support Systems: Decision Making Issues
◉ General queries written in SQL cannot express all decision-making analyses, so several SQL extensions have been proposed.
◉ Database query languages are not suited to detailed statistical analysis of data; specialized software such as SAS is used instead.
◉ Data used for decision making come from different sources.
◉ Knowledge-discovery techniques discover rules and patterns from data automatically (data mining).

2. Data Warehousing

Data Warehousing: Concepts
◉ A data warehouse is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site.
◉ Data warehouses give the user a single consolidated interface to data, making decision-support queries easier to write.

Data Warehousing: Architecture

Data Warehousing: When and How to Gather Data
Panel screen industry example: one company owns three factories, each with its own database.
◉ ARRAY factory: A_DB
◉ OLED factory: C_DB
◉ Module factory: B_DB
Analysis is cumbersome when the data must be extracted from three different sources.

Data Warehousing: When and How to Gather Data
◉ Source data (transaction data) are updated in real time, e.g. when a customer buys an item, the database is updated at the same time.
◉ The warehouse will never be quite up to date with the sources.
◉ The warehouse periodically sends a request for new data to the sources, e.g. it is updated every night.
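The periodic pull described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the row layout `(sale_id, item, updated_at)`, the function `nightly_load`, and the sample dates are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical source rows: (sale_id, item, updated_at).
source_rows = [
    (1, "shirt", datetime(2024, 1, 1, 9, 0)),
    (2, "dress", datetime(2024, 1, 1, 15, 30)),
    (3, "skirt", datetime(2024, 1, 2, 11, 45)),
]


def nightly_load(source, warehouse, last_load, now):
    """Copy rows changed since the previous load into the warehouse."""
    new_rows = [row for row in source if last_load < row[2] <= now]
    warehouse.extend(new_rows)
    return now  # becomes the next load's starting point


warehouse = []
last_load = datetime.min

# Load at midnight after day 1: sale 3 has not happened yet.
last_load = nightly_load(source_rows, warehouse, last_load, datetime(2024, 1, 2, 0, 0))
ids_after_first_load = [row[0] for row in warehouse]

# Load at midnight after day 2: sale 3 is picked up one night late.
last_load = nightly_load(source_rows, warehouse, last_load, datetime(2024, 1, 3, 0, 0))
ids_after_second_load = [row[0] for row in warehouse]
```

Between the two loads the warehouse holds only sales 1 and 2, which is the sense in which it is never quite up to date with the sources.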

Data Warehousing: What Schema to Use
◉ Data sources have different schemas or even use different data models.
◉ Before storing data, the data warehouse performs schema integration and converts the data into the integrated schema.
◉ In effect, the data stored in the data warehouse are a materialized view of the data at the sources.

Data Warehousing: What Schema to Use
[Figure: ARRAY, OLED, and MODULE sheets divided into cells labeled with grid coordinates such as (0, 0), (0, 1), (1, 0), (1, 1); cuts labeled CUT 1 and CUT 2]

Data Warehousing: What Schema to Use
TABLE: ASHEET
sheet_id  X_axis  Y_axis
A001      0       0
TABLE: CSHEET
Cut_id  sheet_id  X_axis  Y_axis
C001    A001      0       0
C002    A001      0       1
TABLE: BSHEET
panel_id  Cut_id
B001      C001
B002      C001

Data Warehousing: What Schema to Use
Schema integration:
SHEET_ID  CUT_ID  PANEL_ID  X_AXIS  Y_AXIS  FAB_ID
A001      NULL    NULL      0       0       A       (ARRAY data)
A001      C001    NULL      0       0       C       (OLED data)
A001      C001    B001      NULL    NULL    B       (Module data)
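The integration step in this example can be sketched in a few lines. This is only an illustration under assumptions: the source tables are modeled as lists of dictionaries, the row values are read from the example tables (the single ASHEET row is assumed to be A001 at position 0, 0), and the function name `integrate` is hypothetical.

```python
# Source tables, one per factory database (values taken from the example).
asheet = [{"sheet_id": "A001", "x_axis": 0, "y_axis": 0}]
csheet = [
    {"cut_id": "C001", "sheet_id": "A001", "x_axis": 0, "y_axis": 0},
    {"cut_id": "C002", "sheet_id": "A001", "x_axis": 0, "y_axis": 1},
]
bsheet = [
    {"panel_id": "B001", "cut_id": "C001"},
    {"panel_id": "B002", "cut_id": "C001"},
]

# Columns of the integrated warehouse schema.
COLUMNS = ("sheet_id", "cut_id", "panel_id", "x_axis", "y_axis", "fab_id")


def integrate(asheet, csheet, bsheet):
    """Convert rows from all three sources into the integrated schema."""
    cut_to_sheet = {c["cut_id"]: c["sheet_id"] for c in csheet}
    rows = []
    for a in asheet:
        rows.append({**a, "fab_id": "A"})
    for c in csheet:
        rows.append({**c, "fab_id": "C"})
    for b in bsheet:
        # Panels only know their cut; look up the sheet through the cut.
        rows.append({**b, "sheet_id": cut_to_sheet.get(b["cut_id"]), "fab_id": "B"})
    # Attributes a source does not have become None (NULL in the warehouse).
    return [tuple(row.get(col) for col in COLUMNS) for row in rows]


integrated = integrate(asheet, csheet, bsheet)
```

Attributes missing from a source come out as `None`, which corresponds to the NULL entries in the integrated table.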

Data Warehousing: Data Transformation
◉ Sometimes data must be transformed before being stored in the warehouse.
◉ For example, changing the units of measure, or converting the data to a different schema by joining data from multiple source relations (see the previous example).
◉ Data warehouses typically have graphical tools to support data transformation.

Data Warehousing: Data Cleansing
◉ Correcting and preprocessing data.
◉ Fuzzy lookup: e.g. correcting misspelled names and addresses or incorrect postal codes to a reasonable extent by consulting a database of street names and postal codes in each city.
◉ Merge-purge operation (deduplication): e.g. address lists collected from multiple sources may contain duplicates that need to be eliminated.
◉ Householding: e.g. records for multiple individuals in a house may be grouped together so that only one mailing is sent to each house.
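The three cleansing operations can be sketched on a toy address list. The reference list `KNOWN_STREETS`, the sample records, and the similarity cutoff are invented for illustration, and string similarity from Python's standard `difflib` stands in for whatever matching a real cleansing tool would use.

```python
import difflib

# Hypothetical reference database of valid street names.
KNOWN_STREETS = ["Main Street", "Oak Avenue", "Pine Road"]


def fuzzy_street(name, cutoff=0.8):
    """Fuzzy lookup: replace a misspelled street with its closest known match."""
    matches = difflib.get_close_matches(name, KNOWN_STREETS, n=1, cutoff=cutoff)
    return matches[0] if matches else name


records = [
    {"name": "Ann Lee", "street": "Main Stret", "number": 12},   # misspelled
    {"name": "Ann Lee", "street": "Main Street", "number": 12},  # duplicate
    {"name": "Bob Lee", "street": "Main Street", "number": 12},  # same house
    {"name": "Cy Wu", "street": "Oak Avenue", "number": 3},
]

# 1. Fuzzy lookup: correct street names against the reference list.
cleaned = [{**r, "street": fuzzy_street(r["street"])} for r in records]

# 2. Merge-purge: keep one record per (name, street, number).
deduped = list({(r["name"], r["street"], r["number"]): r for r in cleaned}.values())

# 3. Householding: group individuals by address, one mailing per house.
households = {}
for r in deduped:
    households.setdefault((r["street"], r["number"]), []).append(r["name"])
```

Note that deduplication only works here because the fuzzy lookup ran first; the misspelled and correct records would otherwise look like two different addresses.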

Data Warehousing: How to Propagate Updates
◉ Updates on relations at the data source must be propagated to the data warehouse.
◉ Case 1: the relations at the data warehouse are exactly the same as those at the data source, so updates can be copied directly.
◉ Case 2: the relations differ between the data warehouse and the data source, so propagating updates is a view-maintenance problem.
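For Case 2, one common way to handle view maintenance is to apply only the changes (deltas) to the stored view instead of recomputing it. The sketch below assumes the warehouse keeps a per-item sales total as its view; the function names and sample data are hypothetical, and a count is stored next to each total so that an item can be dropped when its last source row is deleted.

```python
def recompute(sales):
    """Build the view from scratch: item -> (total amount, row count)."""
    view = {}
    for item, amount in sales:
        total, count = view.get(item, (0, 0))
        view[item] = (total + amount, count + 1)
    return view


def apply_delta(view, inserted=(), deleted=()):
    """Incrementally maintain the view from source inserts and deletes."""
    for item, amount in inserted:
        total, count = view.get(item, (0, 0))
        view[item] = (total + amount, count + 1)
    for item, amount in deleted:
        total, count = view[item]
        if count == 1:
            del view[item]  # last source row for this item is gone
        else:
            view[item] = (total - amount, count - 1)
    return view


source = [("shirt", 20), ("shirt", 30), ("dress", 50)]
view = recompute(source)

# Updates arriving from the source since the last load.
inserted = [("skirt", 40), ("shirt", 10)]
deleted = [("dress", 50)]
source = [s for s in source if s not in deleted] + inserted

apply_delta(view, inserted, deleted)
```

After the delta is applied, the view matches what a full recomputation over the updated source would produce, without rereading the unchanged rows.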

Data Warehousing: What Data to Summarize
◉ Raw data generated by a transaction-processing system may be too large to store online.
◉ It is important to maintain summary data obtained by aggregation on a relation, e.g. instead of storing data about every sale of clothing, we can store only the total sales of clothing by item_name and category.
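The clothing example can be sketched with Python's built-in `sqlite3` module. Only `item_name` and `category` come from the slide; the table names, the extra columns `color`, `size`, and `quantity`, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detailed data: one row per sale (hypothetical columns and values).
conn.execute(
    "CREATE TABLE sales (item_name TEXT, category TEXT, color TEXT, "
    "size TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("skirt", "womenswear", "dark", "small", 2),
        ("skirt", "womenswear", "pastel", "medium", 3),
        ("shirt", "menswear", "white", "large", 4),
        ("shirt", "menswear", "dark", "large", 1),
    ],
)

# Summary data: keep only totals by item_name and category.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT item_name, category, SUM(quantity) AS total_quantity "
    "FROM sales GROUP BY item_name, category"
)

summary = conn.execute(
    "SELECT item_name, category, total_quantity "
    "FROM sales_summary ORDER BY item_name"
).fetchall()
```

Four detail rows collapse into two summary rows; queries that only need totals by item and category no longer have to touch the detailed data.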

Data Warehousing: Warehouse Schemas
◉ Data warehouses typically have schemas designed for data analysis.
◉ Data are usually multidimensional, with dimension attributes and measure attributes.
◉ Tables containing multidimensional data are called fact tables and are usually very large.
◉ To minimize storage requirements, dimension attributes are usually short identifiers that are foreign keys into other tables called dimension tables.

Data Warehousing: Warehouse Schemas
◉ A schema with one fact table and multiple dimension tables is called a star schema.
◉ More complex designs with multiple levels of dimension tables are called snowflake schemas.
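A small star schema can be sketched with Python's built-in `sqlite3` module. All table names, column names, and rows below are made up for illustration: `sales` plays the role of the fact table, with `quantity` as the measure attribute and short identifiers as foreign keys into the dimension tables `item_info` and `store`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per identifier.
    CREATE TABLE item_info (item_id TEXT PRIMARY KEY, item_name TEXT, category TEXT);
    CREATE TABLE store (store_id TEXT PRIMARY KEY, region TEXT);

    -- Fact table: short foreign keys plus the measure attribute.
    CREATE TABLE sales (
        item_id  TEXT REFERENCES item_info(item_id),
        store_id TEXT REFERENCES store(store_id),
        quantity INTEGER
    );

    INSERT INTO item_info VALUES ('I1', 'skirt', 'womenswear'), ('I2', 'shirt', 'menswear');
    INSERT INTO store VALUES ('S1', 'North'), ('S2', 'South');
    INSERT INTO sales VALUES ('I1', 'S1', 2), ('I1', 'S2', 3), ('I2', 'S1', 4);
    """
)

# Analysis query: join the fact table to its dimensions, then aggregate.
rows = conn.execute(
    """
    SELECT i.category, s.region, SUM(f.quantity)
    FROM sales AS f
    JOIN item_info AS i ON f.item_id = i.item_id
    JOIN store AS s ON f.store_id = s.store_id
    GROUP BY i.category, s.region
    ORDER BY i.category, s.region
    """
).fetchall()
```

A snowflake variant would add a further level, for example a separate table that `store` references for its region details.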

Data Warehousing: Column-Oriented Storage
◉ Row-oriented storage: all attributes of a tuple are stored together, and tuples are stored sequentially in a file; this is the layout of traditional databases.
◉ Column-oriented storage: each attribute of a relation is stored in a separate file, with values from successive tuples stored at successive positions in the file.

Data Warehousing: Column-Oriented Storage
Row-oriented storage (one file):
SHEET_ID  CUT_ID  PANEL_ID  X_AXIS  Y_AXIS  FAB_ID
A001      NULL    NULL      0       0       A
A001      C001    NULL      0       0       C
A001      C001    B001      NULL    NULL    B
Column-oriented storage (one file per attribute; three shown):
File 1, SHEET_ID: A001, A001, A001
File 2, CUT_ID: NULL, C001, C001
File 3, FAB_ID: A, C, B
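The two layouts can be sketched in memory, with Python lists standing in for files. The rows follow the integrated sheet example; the names `COLUMNS`, `columns`, and `get_tuple` are assumptions for illustration.

```python
COLUMNS = ("sheet_id", "cut_id", "panel_id", "x_axis", "y_axis", "fab_id")

# Row-oriented layout: each tuple is stored whole, one after another.
rows = [
    ("A001", None, None, 0, 0, "A"),
    ("A001", "C001", None, 0, 0, "C"),
    ("A001", "C001", "B001", None, None, "B"),
]

# Column-oriented layout: one "file" (list) per attribute, with the value
# of tuple i stored at position i of every file.
columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def get_tuple(columns, position):
    """Rebuild one tuple; note that it must read from every column file."""
    return tuple(columns[name][position] for name in COLUMNS)


fab_ids = columns["fab_id"]          # a scan of one attribute touches one file
second_tuple = get_tuple(columns, 1)  # a single tuple touches all six files
```

Reading one attribute for all tuples touches a single list, while rebuilding one tuple touches all six, which previews the trade-off between the two layouts.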

Data Warehousing: Column-Oriented Storage, Benefits
◉ When a query accesses only a few attributes of a relation with a large number of attributes, the remaining attributes need not be fetched from disk into memory.
◉ Storing values of the same type together increases the effectiveness of compression, which can greatly reduce both the disk storage cost and the time to retrieve data from disk.
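The compression benefit can be illustrated with run-length encoding. The slide does not name a compression scheme, so run-length encoding is chosen here only as a simple example, and the column contents are made up.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column as (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# A hypothetical FAB_ID column file: same-typed values with long runs.
fab_id_column = ["A"] * 4 + ["C"] * 3 + ["B"] * 5

runs = rle_encode(fab_id_column)
restored = rle_decode(runs)
```

Twelve stored values shrink to three pairs. In a row-oriented file the same values would be interleaved with the other attributes of each tuple, so such runs would not be adjacent.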

Data Warehousing: Column-Oriented Storage, Drawbacks
◉ Storing or fetching a single tuple requires multiple I/O operations.
◉ Transaction-processing systems typically manipulate whole tuples, so they use row-oriented storage.
◉ Warehouses rarely access individual tuples and instead scan and aggregate many tuples, so they use column-oriented storage.

Any questions? Thanks!