Designing a Data Warehouse: Issues in DW Design

Three Fundamental Processes: Data Acquisition, Data Storage, and Data Access

Data Acquisition
This process handles the acquisition of data from legacy systems and outside sources. Data is identified, copied, formatted, and prepared for loading into the warehouse.

Acquisition Steps
- Catalog the data: develop an inventory of where it is and what it means.
- Clean and prepare the data: extract it from legacy files and reformat it to make it usable.
- Transport the data from one location to another.
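The three acquisition steps can be sketched in a few lines of Python. This is a minimal illustration under invented assumptions: the legacy source is taken to be a fixed-width file, and the file name, column positions, and field meanings in `CATALOG` are hypothetical, as are `extract_and_clean` and `transport`. The "warehouse" is just a Python list standing in for the load target.

```python
"""Sketch of the three acquisition steps on a hypothetical fixed-width legacy file.

The file name, column positions and field meanings in CATALOG are invented for
illustration; a real catalog would come from inventorying the actual sources.
"""
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

# Step 1, catalog: where each field lives and what it means (hypothetical layout).
CATALOG = {
    "cust_id": {"source": "legacy_orders.dat", "cols": (0, 6), "meaning": "customer number"},
    "order_dt": {"source": "legacy_orders.dat", "cols": (6, 14), "meaning": "order date, YYYYMMDD"},
    "amount": {"source": "legacy_orders.dat", "cols": (14, 24), "meaning": "order total in cents"},
}


@dataclass
class OrderRow:
    cust_id: int
    order_date: date
    amount: float  # converted from cents to dollars


def extract_and_clean(line: str) -> Optional[OrderRow]:
    """Step 2, clean and prepare: slice out the catalogued fields and reformat them.

    Returns None for records that cannot be made usable.
    """
    raw = {name: line[spec["cols"][0]:spec["cols"][1]].strip() for name, spec in CATALOG.items()}
    try:
        d = raw["order_dt"]
        return OrderRow(
            cust_id=int(raw["cust_id"]),
            order_date=date(int(d[0:4]), int(d[4:6]), int(d[6:8])),
            amount=int(raw["amount"]) / 100,
        )
    except ValueError:
        return None


def transport(rows: Iterable[Optional[OrderRow]], target: List[OrderRow]) -> int:
    """Step 3, transport: move the usable rows to the target store (a list here)."""
    usable = [r for r in rows if r is not None]
    target.extend(usable)
    return len(usable)


legacy_lines = [
    "00004220240115     12999",  # valid record
    "000043BADDATE00000000500",  # unparseable date, rejected in step 2
]
warehouse: List[OrderRow] = []
loaded = transport((extract_and_clean(line) for line in legacy_lines), warehouse)
```

Keeping the catalog as data rather than hard-coding the slices means the inventory from step 1 drives the extraction in step 2, which is the point of cataloguing first.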

Storage
The storage component holds the data so that the many different data mining, executive information, and decision support systems can use it effectively.

The Storage Area
Managed by relational databases such as those from Oracle Corp. or Informix Software Inc.
Specialized hardware: symmetric multiprocessor (SMP) or massively parallel processor (MPP) machines.

Storage
The majority of warehouse storage is managed by relational databases running on Unix platforms. Oracle, Sybase Inc., IBM Corp., and Informix control 65 percent of the warehouse storage market (Meta Group Inc., 1996).

Access
End-user PCs and workstations draw data from the warehouse with the help of multidimensional analysis products, neural networks, data discovery tools, or other analysis tools. These powerful, "smart" software products are the real driving force behind the viability of data warehousing.

Access Tools
- Intelligent agents and agencies
- Query facilities and managed query environments
- Statistical analysis
- Data discovery (decision support, artificial intelligence, and expert systems)
- OLAP
- Data visualization

Hardware Budget
A typical startup warehouse project allocates more than 60 percent of its hardware and software budget to creating a powerful storage component, spending just 30 percent on data mining and user access technologies.

Systems Analysis Budget
Budgeting for systems analysis and development, however, follows a very different pattern. More than 50 percent of development dollars are spent on building acquisition capabilities, 30 percent funds the development of user solutions, and 20 percent is dedicated to creating the databases in the storage component.

Design Issues: Relational and Multidimensional Models
- Denormalized and indexed relational models are more flexible.
- Multidimensional models are simpler to use and more efficient.
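To make the trade-off concrete, the sketch below holds the same small set of sales figures in both forms: as relational rows and as a dense multidimensional array (a "cube"). The product names, months, and quantities are made up, and plain Python lists stand in for a real relational engine or MOLAP server. The relational form answers any predicate by scanning; the cube answers questions along its predefined dimensions by direct cell addressing.

```python
"""Same data held relationally (rows) and multidimensionally (an array cube).

Product names, months and quantities are invented for illustration.
"""
products = ["bolts", "nuts"]
months = ["Jan", "Feb", "Mar"]

# Relational form: one row per (product, month) fact.
rows = [
    ("bolts", "Jan", 10), ("bolts", "Feb", 12), ("bolts", "Mar", 9),
    ("nuts", "Jan", 4), ("nuts", "Feb", 7), ("nuts", "Mar", 5),
]

# Multidimensional form: a dense 2-D array indexed by position on each dimension.
p_idx = {p: i for i, p in enumerate(products)}
m_idx = {m: j for j, m in enumerate(months)}
cube = [[0] * len(months) for _ in products]
for product, month, qty in rows:
    cube[p_idx[product]][m_idx[month]] = qty

# Relational lookup: filter the rows; any predicate can be expressed (flexibility).
feb_nuts_rel = sum(q for p, m, q in rows if p == "nuts" and m == "Feb")

# Multidimensional lookup: address the cell directly, no scan needed.
feb_nuts_cube = cube[p_idx["nuts"]][m_idx["Feb"]]

# Rolling up over the month dimension is simply a row sum in the cube.
total_by_product = {p: sum(cube[i]) for p, i in p_idx.items()}
```

The cube is only this convenient for questions phrased in terms of its dimensions; a query on an attribute the cube was not built around has to go back to the row form.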

Star Schemas in an RDBMS
In most companies doing ROLAP, the DBAs have created countless indexes and summary tables to avoid I/O-intensive table scans against large fact tables. As the indexes and summary tables proliferate to optimize performance for the known queries and aggregations that users perform, the build times and disk space needed to create them have grown enormously, often requiring more time than is allotted and more space than the original data!
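A small sketch of the summary-table technique, using Python's built-in `sqlite3` module as a stand-in RDBMS. The table names `sales_fact` and `sales_by_product` and the synthetic data are hypothetical. The summary table pre-answers one known aggregation so that query no longer touches the fact table, but it is one more object to build and store, and it goes stale whenever new facts are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    date_key    INTEGER,
    product_key INTEGER,
    store_key   INTEGER,
    amount      REAL
);
-- An index built for one known access path (filtering by product).
CREATE INDEX ix_sales_fact_product ON sales_fact (product_key);
""")

# Synthetic facts: 30 days x 3 products x 2 stores = 180 rows.
facts = [(d, p, s, float(d * p)) for d in range(1, 31) for p in range(1, 4) for s in range(1, 3)]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", facts)

# A summary table that pre-answers one known aggregation. It must be rebuilt
# (or incrementally maintained) every time new facts are loaded.
conn.executescript("""
CREATE TABLE sales_by_product AS
SELECT product_key, SUM(amount) AS total_amount
FROM sales_fact
GROUP BY product_key;
""")


def total_from_summary(product_key: int) -> float:
    """Answer the known query from the small pre-aggregated table."""
    row = conn.execute(
        "SELECT total_amount FROM sales_by_product WHERE product_key = ?", (product_key,)
    ).fetchone()
    return row[0]


def total_from_fact(product_key: int) -> float:
    """Answer the same query by aggregating the fact table itself."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales_fact WHERE product_key = ?", (product_key,)
    ).fetchone()
    return row[0]


summary_rows = conn.execute("SELECT COUNT(*) FROM sales_by_product").fetchone()[0]
```

Each additional "known query" tends to get its own index or summary table like these, which is how the build time and disk space described above accumulate.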

Building a Data Warehouse from a Normalized Database
The steps:
1. Develop a normalized entity-relationship business model of the data warehouse.
2. Translate this into a dimensional model. This step reflects the information and analytical characteristics of the data warehouse.
3. Translate this into the physical model. This reflects the changes necessary to reach the stated performance objectives.

The Business Model
Identify the data structure, attributes, and constraints for the client's data warehousing environment. The business model is:
- Stable
- Optimized for update
- Flexible

Business Model
As always in life, there are some disadvantages to 3NF:
- Performance can be truly awful. Most of the work performed in denormalizing a data model is an attempt to reach performance objectives.
- The structure can be overwhelmingly complex. We may wind up creating many small relations that the user thinks of as a single relation or group of data.
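The complexity point can be illustrated with a three-level product hierarchy, again using `sqlite3` as a stand-in. In the normalized (3NF) form, `product`, `category`, and `department` are separate relations that a user would naturally think of as one "product" group; every query that needs the department pays for two joins. The denormalized `product_dim` flattens them into one wide table. All table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF): three small relations for what users see as one "product".
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE category   (cat_id  INTEGER PRIMARY KEY, cat_name  TEXT,
                         dept_id INTEGER REFERENCES department(dept_id));
CREATE TABLE product    (prod_id INTEGER PRIMARY KEY, prod_name TEXT,
                         cat_id  INTEGER REFERENCES category(cat_id));
INSERT INTO department VALUES (1, 'Hardware');
INSERT INTO category   VALUES (10, 'Fasteners', 1);
INSERT INTO product    VALUES (100, 'Bolt', 10), (101, 'Nut', 10);

-- Denormalized: one wide table; the joins are paid once, at build time.
CREATE TABLE product_dim AS
SELECT p.prod_id, p.prod_name, c.cat_name, d.dept_name
FROM product p
JOIN category   c ON c.cat_id  = p.cat_id
JOIN department d ON d.dept_id = c.dept_id;
""")

# Against the normalized model, every such query repeats the two joins.
normalized = conn.execute("""
SELECT p.prod_name, d.dept_name
FROM product p
JOIN category   c ON c.cat_id  = p.cat_id
JOIN department d ON d.dept_id = c.dept_id
ORDER BY p.prod_id
""").fetchall()

# Against the denormalized dimension, it is a single-table read.
denormalized = conn.execute(
    "SELECT prod_name, dept_name FROM product_dim ORDER BY prod_id"
).fetchall()
```

The price of the flattened table is redundancy: "Hardware" is now stored once per product, which is why the normalized form remains the better fit for the update-oriented business model.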

Structural Dimensions
The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database. The star architecture developed here depends on taking the central intersection entities as the fact tables and turning their foreign key => primary key relations into dimensions.
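A minimal star schema built this way, sketched with `sqlite3`. The assumption is a normalized model in which a sale is the intersection of customer, product, and date; that intersection entity becomes `sales_fact`, and each of its foreign key => primary key relations becomes a dimension table. The names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Dimensions: the entities the intersection entity points at.
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE date_dim     (date_key     INTEGER PRIMARY KEY, month    TEXT);

-- Fact table: the central intersection entity. Each foreign key references
-- a dimension's primary key; together they form the fact's key.
CREATE TABLE sales_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    date_key     INTEGER REFERENCES date_dim(date_key),
    quantity     INTEGER,
    amount       REAL,
    PRIMARY KEY (customer_key, product_key, date_key)
);

INSERT INTO customer_dim VALUES (1, 'East'), (2, 'West');
INSERT INTO product_dim  VALUES (1, 'Fasteners'), (2, 'Tools');
INSERT INTO date_dim     VALUES (20240101, '2024-01'), (20240201, '2024-02');
INSERT INTO sales_fact VALUES
    (1, 1, 20240101, 10, 5.0),
    (1, 2, 20240101,  1, 25.0),
    (2, 1, 20240201,  4, 2.0),
    (2, 2, 20240201,  2, 50.0);
""")

# A typical star join: group by dimension attributes, aggregate the facts.
result = conn.execute("""
SELECT c.region, p.category, SUM(f.amount)
FROM sales_fact f
JOIN customer_dim c ON c.customer_key = f.customer_key
JOIN product_dim  p ON p.product_key  = f.product_key
GROUP BY c.region, p.category
ORDER BY c.region, p.category
""").fetchall()
```

Every analytical query has the same shape, one fact table joined directly to the dimensions it needs, which is what makes the star easier to navigate than the original normalized model.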

Simple DW pattern.

Other Dimensions
- Categorical dimensions: generated groups (additional key components).
- Partitioning dimensions: subtypes (planned vs. actual).
- Informational dimensions: generate different types of data (messy).
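The first two kinds of dimension can be sketched in plain Python; informational dimensions are too open-ended to show usefully here. Two assumptions are made: a "scenario" attribute with the values `planned` and `actual` serves as the partitioning dimension, and a hypothetical `size_band` function generates a categorical group that is appended to each fact's key. The threshold of 100 and all figures are invented.

```python
from collections import defaultdict

# Facts keyed by month and by a partitioning dimension, "scenario" (planned vs. actual).
facts = [
    ("2024-01", "planned", 100.0),
    ("2024-01", "actual", 90.0),
    ("2024-02", "planned", 120.0),
    ("2024-02", "actual", 130.0),
]


def size_band(amount: float) -> str:
    """Hypothetical categorical dimension: a generated group (threshold is arbitrary)."""
    return "large" if amount >= 100 else "small"


# Categorical dimension: the generated band becomes an additional key component.
keyed_facts = [(month, scenario, size_band(amount), amount) for month, scenario, amount in facts]


def variance_by_month(rows):
    """Actual minus planned per month, pivoting on the partitioning dimension."""
    by_month = defaultdict(dict)
    for month, scenario, amount in rows:
        by_month[month][scenario] = amount
    return {m: s["actual"] - s["planned"] for m, s in sorted(by_month.items())}


variance = variance_by_month(facts)  # {'2024-01': -10.0, '2024-02': 10.0}
```

Because planned and actual figures share one fact structure and differ only in the scenario key, comparing them is a pivot on that key rather than a join across two separately designed tables.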