Components and Architecture CS 543 – Data Warehousing.

CS Data Warehousing (Sp ) - Asim LUMS

Architecture
- What are the key components of a data warehouse?
- Architecture is the structure that binds the components into an integrated whole
  - DW architecture provides the overall framework for developing and deploying DW solutions

Architectural Components

Distinguishing Characteristics
- Different objectives and scope
- Data content
- Complex analysis and quick response
- Flexible and dynamic
- Metadata driven

Architecture Supporting Flow of Data

Technical Architecture
- The technical architecture of a DW is the complete set of functions and services provided within its components
  - Functions
  - Services
  - Rules and procedures
  - Data stores
- Tools are the means to implement an architecture
  - Architecture comes first, then the tools; select the appropriate tools based on the architecture

Data Acquisition (1)
- This component includes
  - Extraction
  - Transfer into staging area
  - Preparation for loading (transformation, cleansing, and integration)

Data Acquisition (2)

Data Acquisition – Functions and Services (1)
- Data extraction
  - Select data sources and determine the types of filters to apply to individual sources
  - Generate automatic extract files from operational systems using replication and other techniques
  - Create intermediary files to store selected data to be merged later
  - Transport extracted files from multiple platforms
  - Provide automated job control services for creating extract files
  - Reformat input from outside sources, departmental files, databases, and spreadsheets
  - Resolve inconsistencies for common data elements from multiple sources
  - Generate common application code for data extraction
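
The first extraction service above (select sources, then apply a filter per source) might be sketched as follows. This is only an illustration under stated assumptions: the function `extract`, the source names, and the sample rows are all hypothetical and do not come from the slides.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def extract(sources: Dict[str, Iterable[Row]],
            filters: Dict[str, Callable[[Row], bool]]) -> List[Row]:
    """Pull rows from each source, keep those that pass that source's filter,
    and tag each kept row with its origin so later merge steps can trace it."""
    extracted: List[Row] = []
    for name, rows in sources.items():
        keep = filters.get(name, lambda row: True)  # no filter given: keep all rows
        for row in rows:
            if keep(row):
                extracted.append({**row, "source": name})
    return extracted


# Hypothetical operational sources (stand-ins for real systems)
orders_system = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "cancelled"},
]
billing_system = [{"order_id": 1, "amount": 40.0}]

# Intermediary "extract file": cancelled orders are filtered out at the source
extract_file = extract(
    {"orders": orders_system, "billing": billing_system},
    {"orders": lambda row: row["status"] != "cancelled"},
)
```

Tagging each row with its source is one possible way to support the later step of resolving inconsistencies for common data elements across sources.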

Data Acquisition – Functions and Services (2)
- Data transformation
  - Map input data to data for DW repository
  - Clean data, remove duplicates, merge/purge
  - De-normalize extracted data structures as required by the dimensional model of the DW
  - Convert data types
  - Calculate and derive attribute values
  - Check for referential integrity
  - Aggregate data as needed
  - Resolve missing values
  - Consolidate and integrate data
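
A few of the transformation functions above (duplicate removal, type conversion, derived attributes, a referential integrity check, and missing-value handling) can be sketched in a single pass. The field names, the default price of 0.0, and the choice to drop rows that fail the integrity check are assumptions made for illustration; a real pipeline might route such rows to a reject file instead.

```python
from typing import Dict, List, Set


def transform(rows: List[Dict[str, str]],
              known_customers: Set[str]) -> List[Dict[str, object]]:
    """Apply a handful of transformation steps to staged string-valued rows."""
    seen_orders = set()
    cleaned: List[Dict[str, object]] = []
    for row in rows:
        order_id = row["order_id"]
        if order_id in seen_orders:          # remove duplicates
            continue
        seen_orders.add(order_id)
        if row["customer_id"] not in known_customers:
            continue                         # referential integrity check (assumed policy: drop)
        quantity = int(row["quantity"])      # convert data types
        unit_price = float(row.get("unit_price") or 0.0)  # resolve missing value (assumed default)
        cleaned.append({
            "order_id": int(order_id),
            "customer_id": row["customer_id"],
            "quantity": quantity,
            "unit_price": unit_price,
            "total": quantity * unit_price,  # calculated, derived attribute
        })
    return cleaned


# Hypothetical staged rows
staged = [
    {"order_id": "1", "customer_id": "C1", "quantity": "2", "unit_price": "5.0"},
    {"order_id": "1", "customer_id": "C1", "quantity": "2", "unit_price": "5.0"},  # duplicate
    {"order_id": "2", "customer_id": "C9", "quantity": "1", "unit_price": "3.0"},  # unknown customer
    {"order_id": "3", "customer_id": "C2", "quantity": "4", "unit_price": ""},     # missing price
]
clean = transform(staged, {"C1", "C2"})
```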

Data Acquisition – Functions and Services (3)
- Data staging
  - Provide backup and recovery for staging area repository
  - Sort and merge files
  - Create files as input to make changes to dimension tables
  - If staging area storage is a relational database, create and populate database
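
The "sort and merge files" service can be illustrated with a small sketch that sorts each extract and then merges the sorted runs by key. In-memory lists stand in for extract files here, and the function name `sort_and_merge` and the record layout are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (key, payload); an assumed layout for illustration


def sort_and_merge(*extracts: Iterable[Record]) -> Iterator[Record]:
    """Sort each extract on its own, then lazily merge the sorted runs."""
    sorted_runs: List[List[Record]] = [sorted(run) for run in extracts]
    return heapq.merge(*sorted_runs)


merged = list(sort_and_merge([(3, "c"), (1, "a")], [(2, "b"), (4, "d")]))
```

`heapq.merge` only requires that each input is already sorted, which is why each run is sorted first; for files too large for memory, the same pattern would apply to sorted files read line by line.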

Data Storage
- This architectural component covers the process of loading the prepared data from the data staging area into the data warehouse repository

Data Storage – Functions and Services
- Load data for full refreshes of DW tables
- Perform incremental loads at regular prescribed intervals
- Support loading into multiple tables at the detailed and summarized levels
- Optimize the loading process
- Provide automated job control services for loading the data warehouse
- Provide backup and recovery for the DW database
- Provide security
- Monitor and fine-tune the database
- Periodically archive data from the database according to preset conditions
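
The difference between a full refresh and an incremental load can be sketched with Python's built-in `sqlite3` module. The table `sales_fact`, its columns, and the upsert-by-primary-key policy are assumptions for illustration; production warehouses typically use bulk loaders and their own change-capture rules.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[int, float]  # (sale_id, amount); hypothetical fact row


def full_refresh(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Replace the whole table contents in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales_fact")
        conn.executemany(
            "INSERT INTO sales_fact (sale_id, amount) VALUES (?, ?)", rows)


def incremental_load(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Apply only new or changed rows, keyed on sale_id."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO sales_fact (sale_id, amount) VALUES (?, ?)",
            rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)")

full_refresh(conn, [(1, 10.0), (2, 20.0)])      # initial load
incremental_load(conn, [(2, 25.0), (3, 30.0)])  # one changed row, one new row

loaded = conn.execute(
    "SELECT sale_id, amount FROM sales_fact ORDER BY sale_id").fetchall()
```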

Information Delivery (1)
- This architectural component spans a broad spectrum of many different methods of making information available to the users of the DW
- To the users, information delivery is the DW; it is the front-end through which the users retrieve information from the DW
- Information
  - Online queries and interactive analyses
  - Regular and ad-hoc reports
  - Specialized applications (e.g. executive information system)
  - Data mining

Information Delivery (2)

Information Delivery – Functions and Services
- Provide security to control information access
- Monitor user access to improve service and for future enhancements
- Allow users to browse data warehouse content
- Simplify access by hiding internal complexities of data storage from users
- Automatically reformat queries for optimal execution
- Enable queries to be aware of aggregate tables for faster results
- Govern queries and control runaway queries
- Provide self-service report generation for users
- Store result sets for queries and reports for future use
- Provide multiple levels of data granularity
- Provide event triggers to monitor data loading
- Make provision for the users to perform complex analysis
- Enable data feeds to downstream, specialized data support systems such as EIS and data mining
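
One way to picture "aggregate-aware" queries is a small navigator that routes a request to the smallest table able to answer it. The sketch below assumes a hypothetical catalog (`AGGREGATES`) listing, for each table, the dimensions it retains and an approximate row count; none of these names or figures come from the slides.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical catalog: table name -> (dimensions kept, approximate row count)
AGGREGATES: Dict[str, Tuple[FrozenSet[str], int]] = {
    "sales_detail": (frozenset({"month", "store", "product"}), 1_000_000),
    "sales_by_month_store": (frozenset({"month", "store"}), 12_000),
    "sales_by_month": (frozenset({"month"}), 120),
}


def choose_table(requested_dims: Iterable[str]) -> str:
    """Return the smallest table that still contains every requested dimension."""
    needed = frozenset(requested_dims)
    candidates = [
        (row_count, name)
        for name, (dims, row_count) in AGGREGATES.items()
        if needed <= dims  # the table can answer the query only if it kept these dimensions
    ]
    if not candidates:
        raise ValueError(f"no table covers dimensions {sorted(needed)}")
    return min(candidates)[1]


by_month = choose_table({"month"})
by_store = choose_table({"store"})
by_product = choose_table({"month", "product"})
```

The user asks the same question either way; the delivery layer hides which physical table is used, which matches the service of simplifying access by hiding storage complexity.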

Infrastructure Supporting Architecture
- The architecture defines the functions and services; the infrastructure defines the elements to support the architecture
- Infrastructure is the foundation supporting the architecture
  - Hardware servers
  - OSs
  - Data management systems
  - Networking elements
  - Supporting tools and applications
  - People
  - Procedures

Operational Infrastructure
- Operational infrastructure includes
  - People
  - Procedures
  - Training
  - Management software
- Operational infrastructure consists of the people and procedures that keep the DW functioning, not those that develop the DW

Physical Infrastructure (1)

Physical Infrastructure (2)
- Physical infrastructure includes
  - Computing hardware (e.g. server)
  - OS and utilities
  - Networking hardware and software
  - Software tools
- Decisions about the physical infrastructure are critical for a DW. Two principles
  - Leverage as much of the existing physical infrastructure as possible
  - Keep the infrastructure as modular as possible

Hardware and Operating System
- Hardware
  - Scalability
  - Support
  - Vendor reference
  - Vendor stability
- Operating system
  - Compatibility
  - Scalability
  - Security
  - Reliability
  - Availability
  - Preemptive multitasking
  - Multi-threaded approach
  - Memory protection

Single Platform Option
- Simplest option, where all functions and services are performed by a single computing platform
- Typically used by small to medium-sized companies that already have mainframes or large Unix servers in use with capacity to spare
- Some shortcomings of using mainframes
  - Stretched to capacity
  - Non-availability of tools
  - Multiple legacy platforms
  - Company's migration policy

Hybrid Option
- Most companies opt for the hybrid option where multiple platforms are used for data warehousing (data acquisition, data storage, information delivery)

Data Extraction
- Data extraction
  - Best performed on each source system's own computing platform
- Initial reformatting and merging
  - Best performed on each source system's own computing platform
  - Extract files are reformatted and merged into a smaller number of files, with verification against the source system
- Initial data cleansing
  - Also performed on source system platform
- Transformation and consolidation
  - Performed on the staging area platform
- Validation and final quality check
  - Performed on the staging area platform
- Creation of load images
  - Performed on the staging area platform
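
The step-to-platform assignment above can be written down as a small lookup structure, for example to drive job scheduling. The step and platform labels are taken from the slide; the names `STEP_PLATFORM` and `steps_by_platform` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (processing step, platform where it is performed), in pipeline order
STEP_PLATFORM: List[Tuple[str, str]] = [
    ("data extraction", "source"),
    ("initial reformatting and merging", "source"),
    ("initial data cleansing", "source"),
    ("transformation and consolidation", "staging"),
    ("validation and final quality check", "staging"),
    ("creation of load images", "staging"),
]


def steps_by_platform(assignments: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group steps by platform while preserving pipeline order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for step, platform in assignments:
        grouped[platform].append(step)
    return dict(grouped)


plan = steps_by_platform(STEP_PLATFORM)
```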

Options for the Data Staging Area
- In one of the legacy platforms
- On the data storage platform
- On a separate optional platform
  - You can optimize the platform for complex transformations and cleaning
  - Install specialized tools for transformations and cleaning
  - Keep track of entire data content in the staging area

Data Movement

Client/Server Architecture (1)

Client/Server Architecture (2)
- Application server (middle tier)
  - To run middleware and establish connectivity
  - To execute management and control software
  - To handle data access from the Web
  - To manage metadata
  - For authentication
  - As front end
  - For managing and running standard reports
  - For sophisticated query management
  - For OLAP applications

Maturing of the Infrastructure