Building a Data Warehouse: Understanding Why & How

Overview
- Slides 1-3: Benefits
- Slides 4-5: SDLC
- Slides 6-9: Reasons why
- Slides 10-12: Environment
- Slides 13-16: Building
- Slides 17-20: Architecture
- Slides 21-22: Iterative development
- Slides 23-25: Meta data
- Slides 26-30: Staffing and requirements

Why Companies Build Automated Data Warehouses
- Improved access to integrated data
- Consistent reporting structure
- Speed of development
- Improved productivity
- Better data integrity and quality
- Reduction in maintenance cost
[Graphic labels: $1,872.95; $1,472.95; ???]

Development Cycle With an Automated Tool
- 12 weeks for full production use
- 6-8 weeks for first production-level implementation
- 3 weeks to learn automated data warehouse development process
[Timeline chart; axis: weeks 1 to 12]

Number of Years of Data in a Typical Data Warehouse

Customers Receive Operational and Strategic Benefits
Operational goals:
- Improve profitability
- Increase sales
- Leverage procurement activity
- Maintain competitive advantage
Strategic business goals:
- Identify new market opportunities
- Strengthen customer relationships
- Expand market share
- Manage risk over time

The New Business Paradigm
Integrating data to produce useful business information is a major challenge in business today!
[Diagram labels: Operational Processing; Informational Processing; Legacy Environment; Islands of Data; Consolidated Information with a Historical Perspective]

The Business Problem: Mixed Operational Environments
- Tuned for transactions, not decision support
- Data spans short time periods
- No standards for data definition and naming
- Queries never get same answer twice
- Dynamic data
- No “what ifs” possible
- No summarized data
- No integrated data

Typical Customer Environments
Data stored in many forms:
- Relational databases
- Hierarchical databases
- Flat files
Heterogeneous mainframe and UNIX-based environments:
- DEC (VMS, UNIX)
- HP
- IBM (MVS, UNIX)
- Sun
- Tandem
- Other UNIX environments

Transforming Data Into Business Intelligence
[Diagram labels: End User; Data Warehouse; Operational Systems; Informational Processing]
Transform Operational Data to Data Warehouse Data:
- Extract
- Integrate
- Summarize
- Filter
- Convert
- Set default values
- Restructure
- Reformat
- Establish time variance
- Create consistency
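A minimal sketch of a few of the transformation steps listed above (filter, convert, set default values, establish time variance, summarize), written in plain Python. The field names, the status code "X", and the sample rows are hypothetical and do not come from the slides; a real job would more likely run inside an ETL tool or in SQL.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational rows; names and codes are illustrative only.
orders = [
    {"order_id": 1, "cust": "A17", "amount": "120.50", "status": "S", "region": None},
    {"order_id": 2, "cust": "B02", "amount": "80.00", "status": "X", "region": "west"},
    {"order_id": 3, "cust": "A17", "amount": "39.50", "status": "S", "region": "west"},
]


def transform(rows, snapshot_date):
    """Filter, convert, default, and time-stamp operational rows."""
    out = []
    for row in rows:
        if row["status"] == "X":  # filter: drop cancelled orders (assumed code)
            continue
        out.append({
            "order_id": row["order_id"],
            "customer": row["cust"],
            "amount": float(row["amount"]),        # convert text to a number
            "region": row["region"] or "unknown",  # set default value
            "snapshot_date": snapshot_date,        # establish time variance
        })
    return out


def summarize(rows):
    """Total amount per (customer, snapshot date)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["customer"], row["snapshot_date"])] += row["amount"]
    return dict(totals)


detail = transform(orders, date(2024, 1, 31))
summary = summarize(detail)
```

Because every row carries its snapshot date, a later load adds new rows rather than replacing these, which is one simple way to obtain the historical perspective the slide describes.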

Comparing the Environments
Operational Systems (Operational Decisions):
- No summary data
- No drill down
- No historical data
- Not integrated applications
Data Warehouse (Business Decisions):
- Rich supply of summary data
- Structure for drill down analysis
- Historical data for trend analysis
- Integrated data for corporate analysis

Building and Using the Data Warehouse
- OPERATIONAL PROCESSING: Individual Transactions
- INFORMATION PROCESSING: Consolidated Analysis
[Diagram labels: IBM; Ascential; SAP; Oracle]

Requirements to Build a Data Warehouse
- Information architecture to understand the movement of data
- Extraction of data from legacy systems, operational applications and external sources
- Transformation of data to integrate, condense and summarize into a historical format
- Documentation of the development process (meta data) to understand sources of data, transformations and changes over time
- Ongoing maintenance to capture changes to source data and perform updates for iterative processing

Building the Data Warehouse
1. CREATE DATA WAREHOUSE DATA MODEL: Generic Data Models™; Consulting
2. DEFINE SYSTEM OF RECORD: Consulting
3. DESIGN DATA WAREHOUSE: Consulting Development Methodology Methodology Readiness Assessment Database Design Project Management DataStage Developer
4a. CAPTURE META DATA DEFINITIONS: MetaStage™; File Definitions; Data Dictionaries; CASE Tools; Business Descriptions
4b. CAPTURE LOG CHANGES
5. CREATE TRANSFORMATION PROGRAMS: Extract; Filter; Integrate; Condense; Convert; Derive Data; Create Time Variance; Generate Code

Building the Data Warehouse
5. CREATE TRANSFORMATION PROGRAMS
6. EXTRACT, INTEGRATE & CONSOLIDATE SOURCE FILES: DB Sources; Extract; Filter; Integrate; Condense; Convert; Derive Data; Create Time Variance; Generate Code
7. POPULATE & MAINTAIN DATA WAREHOUSE: DB Targets; DataStage
8. CREATE INFORMATION DIRECTORY & POPULATE ACCESS TOOLS: MetaStage Directory™

Distributed Data Warehouse Solution
- DataStage
- MetaStage
- Iterations
- Methodology

A New Kind of Information Architecture
- An architected approach to information management and delivery
- Improves data integrity, performance, manageability, and access
[Diagram labels: Individually Structured; Departmentally Structured; Data Warehouse; Organizationally Structured; Acquisition, Transformation & Integration Programs; Archived Detail; Operational Systems (System of Record); m/d = meta data]

Architecture Affects Data Movement, System Impact and Development Effort
- Virtual Data Warehouse
- Conversion Technology
- Our Architected Solution

Structured Information Architecture
[Diagram labels: EXTERNAL DATA; DATA ACCESS & MULTIDIMENSIONAL ANALYSIS; DATA WAREHOUSE; OPERATIONAL DATA; INFORMATION DIRECTORY; Target Modules; Transfer Technical Meta Data; Source Modules; DEVELOPER WORKSTATION]

More Rapid Development Process
- DEFINE & CAPTURE PHYSICAL META DATA (CASE, Dictionaries, Catalogs, COBOL FDs)
- IMPORT BUSINESS META DATA (CASE Tools, Repositories, Machine Readable Files)
- SELECT TRANSFORMATIONS (Mapping, Conversion, Selection & Summarization)
- CREATE INFORMATION DIRECTORY & POPULATE ACCESS TOOLS
- CAPTURE LOG CHANGES
- UPDATE WAREHOUSE
[Diagram labels: Transfer Technical Meta Data; Developer Workstations]
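The last two steps above (capture log changes, update warehouse) could be sketched as follows. This is an illustration under stated assumptions, not the method of any tool named in the slides: the change-log layout, the keys, and both function names are hypothetical. Each captured change is appended as a new dated version so that earlier values stay queryable.

```python
from datetime import date

# Hypothetical change log captured from a source system.
change_log = [
    {"key": "P1", "price": 10.0, "changed_on": date(2024, 1, 5)},
    {"key": "P2", "price": 4.0, "changed_on": date(2024, 1, 9)},
    {"key": "P1", "price": 12.0, "changed_on": date(2024, 2, 1)},
]


def apply_changes(warehouse, changes):
    """Append each change as a new dated version; never overwrite history."""
    for change in changes:
        warehouse.setdefault(change["key"], []).append(
            (change["changed_on"], change["price"])
        )
    return warehouse


def price_as_of(warehouse, key, as_of):
    """Return the latest price recorded on or before as_of, or None."""
    versions = [v for v in warehouse.get(key, []) if v[0] <= as_of]
    if not versions:
        return None
    return max(versions, key=lambda v: v[0])[1]


warehouse = apply_changes({}, change_log)
january_price = price_as_of(warehouse, "P1", date(2024, 1, 20))
february_price = price_as_of(warehouse, "P1", date(2024, 2, 1))
```

Asking for the price as of a fixed date returns the same answer no matter how many later changes have been loaded.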

Iterative Processing
January: All domestic raw goods
February: All domestic raw goods; Foreign raw goods; Domestic wip
March: All domestic raw goods; * Foreign raw goods; * Domestic wip; Foreign wip; Large customers
April: All domestic raw goods; Foreign raw goods; * Domestic wip; * Foreign wip; * Large customers; Domestic finished goods; Foreign customers
May: All domestic raw goods; * Foreign raw goods; Domestic wip; * Foreign wip; Large customers; * Domestic finished goods; * Foreign customers; Foreign finished goods; New prospects/customers
(* being reworked)
Successful data warehouses are built in small, fast, iterative development efforts that produce measurable results.

Meta Data Exists Throughout the Structure of the Data Warehouse
[Diagram labels: META DATA; Highly summarized; Lightly summarized; Current detail; Older detail; appl (repeated); integration/transformation; 5-10 years]
A data warehouse contains:
- Integrated data
- Subject oriented
- Historical data with time variance
- Both detailed and summary data
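To make the levels named on this slide concrete, here is a small sketch that rolls current detail up into a lightly summarized and then a highly summarized level. The choice of monthly and yearly totals, the sample rows, and the function names are assumptions made for illustration; the slides do not say how each level is defined.

```python
from collections import defaultdict

# Hypothetical current-detail rows: (date "YYYY-MM-DD", product, units sold).
current_detail = [
    ("2024-01-03", "widget", 5),
    ("2024-01-17", "widget", 7),
    ("2024-02-02", "widget", 4),
    ("2024-02-10", "gadget", 9),
]


def lightly_summarized(rows):
    """Roll detail up to one total per (month, product)."""
    totals = defaultdict(int)
    for day, product, units in rows:
        totals[(day[:7], product)] += units  # "YYYY-MM" prefix is the month
    return dict(totals)


def highly_summarized(monthly):
    """Roll monthly totals up to one total per (year, product)."""
    totals = defaultdict(int)
    for (month, product), units in monthly.items():
        totals[(month[:4], product)] += units  # "YYYY" prefix is the year
    return dict(totals)


monthly = lightly_summarized(current_detail)
yearly = highly_summarized(monthly)
```

Keeping the detail rows alongside both summaries is what lets a user drill down from a yearly figure to the months and transactions behind it.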

Turning Meta Data Into Information
[Graphic labels: Public Library; Corporation]
Where’s your card catalog for your corporate information?

Meta Data Answers Questions for Users of the Data Warehouse
- How do I find the data I need?
- What is the original source of the data?
- How was this summarization created?
- What queries are available to access the data?
- How have business definitions and terms changed?
- How do product lines vary across organizations?
- What business assumptions have been made?
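Two of the questions above (original source, how a summarization was created) can be answered from a very small meta data catalog. The sketch below is hypothetical throughout: the `ColumnMetadata` fields, the entry names, and the source name `ORDERS.AMT` are invented for illustration and are not the format of any directory product mentioned in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnMetadata:
    name: str
    business_definition: str
    source: Optional[str] = None  # original source field, if loaded directly
    derived_from: List[str] = field(default_factory=list)
    transformation: str = ""      # how a derived item was created


# Hypothetical catalog entries.
catalog = {
    "order_amount": ColumnMetadata(
        name="order_amount",
        business_definition="Value of one order line",
        source="ORDERS.AMT",
    ),
    "monthly_sales": ColumnMetadata(
        name="monthly_sales",
        business_definition="Total order value per customer per month",
        derived_from=["order_amount"],
        transformation="sum of order_amount grouped by customer and month",
    ),
}


def original_sources(catalog, name):
    """Trace an item back through derived_from to its source fields."""
    entry = catalog[name]
    if entry.source is not None:
        return [entry.source]
    sources = []
    for parent in entry.derived_from:
        sources.extend(original_sources(catalog, parent))
    return sources


sources = original_sources(catalog, "monthly_sales")
how_created = catalog["monthly_sales"].transformation
```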

Be Ready to Adopt the Rapidly Advancing Data Warehouse Tool Sets
- Monitoring
- Security
- Design automation
- Transport automation
- Log tape as a source
- Meta Data
- Transformation
- Extraction, multiple DBMS
- Platform, DBMS
[Cartoon dialogue: “Let’s build a data warehouse” / “That’s a good idea”]

Skills and Experience To Build a Data Warehouse
Uncommon Specialized Skill Set:
- Can build on vision instead of hard requirements
- Can perform data warehouse information modeling
- Can implement a leveled information architecture
- Can manage a parallel, iterative, time-boxed project
- Can use/manage data warehouse specific technologies
- Can tightly coordinate a broad spectrum of resources
- Can set and continuously manage realistic expectations
- Can manage huge amounts of data
- Has “been there, done that”

Staff Requirements
Roles:
- Project management
- Warehouse modeling
- Database design
- Data administration
- Component implementation
- Data access and analysis
- Data stewardship
Team size:
- 5-7 people involved in a typical project
- 3-4 people using automated tool
- 1-2 people for ongoing maintenance

Be Ready for the Technological Impact
The data warehouse introduces new technologies and taxes old ones:
- Information modeling
- A new type of information architecture
- Interdependence with legacy systems
- Incredible growth in data volume
- Rapidly advancing data warehousing tool sets

Ready for Information Modeling?
DATA WAREHOUSE           TRADITIONAL DATABASE
Integrated Data          Application-specific Data
Historical Data          Current Data
Organized by Subject     Organized for Performance
Non-Volatile Data        Updated Data
Redundant Data           Normalized Data
Descriptive Data         Encoded Data
Summarized Data          Raw Data
Meta Data                (Just Data)
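Two of the contrasts above (normalized vs. redundant, encoded vs. descriptive) can be illustrated with a short sketch. The tables, the segment codes, and the `denormalize` function are hypothetical examples, not part of the original presentation.

```python
# Hypothetical normalized, encoded operational tables.
customers = {
    101: {"name": "Acme", "segment_code": "L"},
    102: {"name": "Birch", "segment_code": "S"},
}
segments = {"L": "Large account", "S": "Small account"}
orders = [
    {"order_id": 1, "customer_id": 101, "amount": 250.0},
    {"order_id": 2, "customer_id": 102, "amount": 40.0},
    {"order_id": 3, "customer_id": 101, "amount": 75.0},
]


def denormalize(orders, customers, segments):
    """Copy descriptive customer attributes onto every order row."""
    rows = []
    for order in orders:
        customer = customers[order["customer_id"]]
        rows.append({
            "order_id": order["order_id"],
            "customer_name": customer["name"],                       # redundant copy
            "customer_segment": segments[customer["segment_code"]],  # decoded
            "amount": order["amount"],
        })
    return rows


warehouse_rows = denormalize(orders, customers, segments)
```

The customer name is repeated on every order row, which costs storage but lets an analyst read and group the data without joining to code tables.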

Be Ready to Manage Incredible Data Volume Growth
- Extend history
- Expand subject areas
- Cultivate data marts for departmental use
- Increase data volume due to business growth