Decision Support Systems 1201311 Data Warehousing Chattrakul Sombattheera.

Slides:



Advertisements
Similar presentations
C6 Databases.
Advertisements

Data Warehouse Architecture Sakthi Angappamudali Data Architect, The Oregon State University, Corvallis 16 th May, 2005.
Database – Part 3 Dr. V.T. Raja Oregon State University External References/Sources: Data Warehousing – Mr. Sakthi Angappamudali.
ICS 421 Spring 2010 Data Warehousing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/18/20101Lipyeow.
Chapter 3 Database Management
Database – Part 2b Dr. V.T. Raja Oregon State University External References/Sources: Data Warehousing – Sakthi Angappamudali at Standard Insurance; BI.
Chapter 5 DATA WAREHOUSING.
Business Intelligence: A Managerial Approach (2nd Edition)
1 © Prentice Hall, 2002 Chapter 11: Data Warehousing.
Chapter 8: Data Warehousing
Data Warehousing: Defined and Its Applications Pete Johnson April 2002.
Chapter 2 Data Warehousing
© 2003, Prentice-Hall Chapter Chapter 2: The Data Warehouse Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas.
2nd semester 2010 Dr. Qusai Abuein
Basic Concepts of Datawarehousing An Overview Prasanth Gurram.
Chapter 2 Data Warehousing
Data Warehouse & Data Mining
Understanding Data Warehousing
Database Systems – Data Warehousing
Data Warehouse Concepts Transparencies
@ ?!.
Data Warehouse Overview September 28, 2012 presented by Terry Bilskie.
Data warehousing and online analytical processing- Ref Chap 4) By Asst Prof. Muhammad Amir Alam.
CISB594 – Business Intelligence Data Warehousing Part II.
1 Data Warehouses BUAD/American University Data Warehouses.
2 Copyright © Oracle Corporation, All rights reserved. Defining Data Warehouse Concepts and Terminology.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 8: Data Warehousing.
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
BUSINESS DRIVEN TECHNOLOGY
1.file. 2.database. 3.entity. 4.record. 5.attribute. When working with a database, a group of related fields comprises a(n)…
1 Reviewing Data Warehouse Basics. Lessons 1.Reviewing Data Warehouse Basics 2.Defining the Business and Logical Models 3.Creating the Dimensional Model.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
CISB594 – Business Intelligence
MANAGING DATA RESOURCES ~ pertemuan 7 ~ Oleh: Ir. Abdul Hayat, MTI.
CISB594 – Business Intelligence Data Warehousing Part I.
CISB594 – Business Intelligence Data Warehousing Part I.
 Understand the basic definitions and concepts of data warehouses  Describe data warehouse architectures (high level).  Describe the processes used.
CISB594 – Business Intelligence Data Warehousing Part I.
CISB594 – Business Intelligence Data Warehousing Part II.
Decision Support Systems Data Warehousing. Modified from Decision Support Systems and Business Intelligence Systems 9E. 1-2 Learning Objectives Understand.
1 ISQS 3358, Business Intelligence Data Warehousing Zhangxi Lin Texas Tech University 1.
Chapter 2 Data Warehousing. Learning Objectives  Understand the basic definitions and concepts of data warehouses  Describe data warehouse architectures.
Chapter 2 Data Warehousing. Learning Objectives  Understand the basic definitions and concepts of data warehouses  Understand data warehousing architectures.
CISB594 – Business Intelligence Data Warehousing Part I.
CISB594 – Business Intelligence Data Warehousing Part II.
 Understand the basic definitions and concepts of data warehouses  Describe data warehouse architectures (high level).  Describe the processes used.
1 Copyright © Oracle Corporation, All rights reserved. Business Intelligence and Data Warehousing.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
Chapter 8: Data Warehousing. Data Warehouse Defined A physical repository where relational data are specially organized to provide enterprise- wide, cleansed.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 5: Data Warehousing.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 8: Data Warehousing.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 8: Data Warehousing.
DATA WAREHOUSING. Learning Objectives  Understand the basic definitions and concepts of data warehouses  Understand data warehousing architectures 
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 8: Data Warehousing.
2 Copyright © 2006, Oracle. All rights reserved. Defining Data Warehouse Concepts and Terminology.
BUSINESS INTELLIGENCE. The new technology for understanding the past & predicting the future … BI is broad category of technologies that allows for gathering,
Managing Data Resources File Organization and databases for business information systems.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
Business Intelligence Overview
Business Intelligence: A Managerial Approach (2nd Edition)
Chapter 2 Data Warehousing
Advanced Applied IT for Business 2
Data Warehousing and Data Mining By N.Gopinath AP/CSE
Data Warehouse.
Chapter 8: Data Warehousing
MANAGING DATA RESOURCES
Data Warehouse Overview September 28, 2012 presented by Terry Bilskie
Chapter 3 DATA WAREHOUSING.
Data Warehousing Concepts
Data Warehouse and OLAP Technology
Presentation transcript:

Decision Support Systems Data Warehousing Chattrakul Sombattheera

Agenda Definitions and Concepts Process Overview Architecture Data Integration and the ETL Processes Development Real-Time Data Warehousing Administration and Security Issues

Definitions and Concepts A data warehouse (DW) is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization. Data are usually structured to be available in form ready for analytical processing activities (e.g., online analytical processing [OLAP], data mining, querying, reporting, other decision support applications). A data warehouse is a subject-oriented, integrated, time- variant, nonvolatile collection of data in support of management’s decision-making process.

Characteristics of DW Subject-oriented. Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support that allows decision makers to both know how their business performs and why. –A DW differs from typical DBs that they are more product-oriented and are tuned to handle transactions of DBs. Subject orientation allows a more comprehensive view of organization Integrated. Data are from different sources and must be in a consistent form. DWs must deal with naming conflicts and discrepancies among units of measure. Time variant (time series). A DW usually contains historical data (containing multiple time points, e.g. daily, weekly, monthly), except in real-time systems. They detect trends, deviations, long-term relationships for forecasting and comparisons, leading to decision making. Nonvolatile. Users cannot change data in DWs. Obsolete data are discarded, and changes are recorded as new data.

Additional DW Characteristics Web-based. DWs are typically web-based application. Relational/Multidimensional. DWs use either a relational structure or multidimensional structure. Client/server. DWs use client/server architecture to provide easy access to end users. Real-time. Newer DWs provide real-time or active, data access and analysis capabilities. Include metadata. DWs contain metadata (data about data) about how data organized and how to effectively use them.

Parts of DWs. Data Marts. A data mart is a subset of a DW, typically consisting of a single subject area (e.g. marketing, operations). Operational Data Stores (ODS). An ODS provides a fairly recent form of customer information file (CIF) and is used for short-term decision involving mission-critical applications. The contents of an ODS are updated through the course of business operations. Enterprise Data Warehouses (EDW). An EDW is a large scale DW that is used across the enterprise for decision support. EDW are used to provide data for many types of DSS, including customer relation management (CRM), supply chain management (SCM), business performance management (BPM), business activity monitoring (BAM), product lifecycle management (PLM), revenue management, knowledge management systems (KMS), etc.

DW Process Overview Data sources. Data come from various sources including legacy systems, external data providers, online transaction processing (OLTP), enterprise resource planning (ERP) system, Web logs, etc. Data extractions. Data are extracted using custom-written or commercial software (ETL). Data loading. Data are loaded into staging area, where they are transformed and cleansed. Comprehensive database. EDWs support all decision analysis by providing relevant summarised and detailed information originating from many different sources. Metadata. Medata are maintained so that they can be accessed by IT personnel and users. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search. Middleware tools. Middleware tools enable access to the data warehouse. Middleware tools include SQL, Business Object, applications (data mining, OLAP, reporting tools, data visualize tools), etc.

DW Process Overview

DW Architectures

DW Architecture

DW Architectures

10 Factors for choosing DW architecture Information interdependence between organizational units Upper management’s information needs Urgency of need for a data warehouse Nature of end-user tasks Constraints on resources Strategic view of the data warehouse prior to implementation Compatibility with existing systems Perceived ability of the in-house IT staff Technical issues Social/Political factors

Data Integration Data integration comprises three major processes that, when correctly implemented, permit data to be accessed and made accessible to an array of ETL and analysis tools and data warehousing environment: –Data access (i.e., the ability to access and extract data from any data source), –Data federation (i.e., the integration of business views across multiple data stores), and –Change capture (i.e., based on the indentification, cpature, and delivery of the changes made to enterprise data sources).

Data Integration Techniques Enterprise application integration (EAI) provides a vehicle for pushing data from source systems into the data warehouse. EAI involves integrating application functionality and is focused on sharing functionality (rather than data) across systems, thereby enabling flexibility and reuse. Traditional EAP focuses on enabling application reuse at the programming level, whereas modern EAP uses services-oriented architecture (SOA). Enterprise information integration (EII) proposes real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases. EII tools use predefined metadata to populate views that make integrated data appear relational to end users. XML seems to be the most appropriate tool to define metadata.

Extraction, Transformation, and Load (ETL) ETL is the heart of DW. ETL is composed of –Extraction: reading data from one or more databases, –Transformation: converting the extracted data from its previous form into the form in which it needs to be so that it can be placed into a data warehouse or simply another database, and –Load: putting the data into datawarehouse. Transformation occurs by using rules or lookup tables or by combining the data with other data. The three database functions are integrated into one tool to pull data out of one or more databases and place them into another, consolidated database or a data warehouse. ETL tools transport data between sources and targets, document how data elements (e.g. metadata) change as the move between source and target, exchange metadata with other application as needed, and administer all runtime processes and operations (e.g., scheduling, error management, audit logs, statistics).

ETL Process

DW Development Approaches The inmon Model: The EDW Approach The Kimbell Model: The Data Mart Approach

DW Structure: Star Schema

DW Implementation Issues The following is a list of major tasks that could be performed: Establishment of service-level agreements and data-refresh requirements Identification of data sources and their governance policies Data quality planning Data model design ETL tool selection Relational database software and platform selection Data transport Data conversion Reconciliation process Purge and archive planning End-user support

DW Implementation Issues Some of the best practices for implementing a data warehouse: The project must fit with corporate strategy and business objectives. There must be complete buy-in to the project by executives, managers, and users. It is important to manage user expectation about the completed project. The DW must be built incrementally. Build in adaptability. The project must be managed by both IT and business professionals. Develop a business/supplier relationship. Only load data that have been cleansed and are of a quality understood by the organization. Do not overlook training requirements. Be politically aware.

DW Implementation issues Risks in implementing DW: No mission or objective Quality of source data unknown Skills not in place Inadequate budget Lack of supporting software Source data not understood Weak sponsor Users not computer literate Political problems or turf wars Unrealistic user expectations Architectural and design risks Scope creep and changing requirements Vendors out of control Multiple platforms Key people leaving the project Loss of the sponsor Too much new technology Having to fix an operating system Geography distributed environment Team geography and language culture

DW Implementation Issues When developing a DW, the followings are “not to” things: Starting with the wrong sponsorship chain Setting expectations that you cannot meet and frustrating executives at the moment of truth Engaging in politically naïve behavior Loading the warehouse with information just because it is available Believing that data warehousing database design is the same as transactional database design Choosing a data warehouse manager who is technology oriented rather than user oriented. Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, and, perhaps, sound and video. Delivering data with overlapping and confusing definitions. Believing promises of performance, capacity, and scalability. Believing that your problems are over when the data warehouse is up and running. Focusing on ad hoc data mining and periodic reporting instead of alerts.

Massive DW and Scalability Scalability is the ability of a system to maintain its level of performance when the size of input grows. When the size of input in DWs grows, DWs must be able to handle the number of users and the complexity of their queries. Wal-Mart use hundreds of terabytes of data to study sales trends and other tasks. US Defence use 5 petabyte DW to maintain the medical records of its 9 million personnel. Good scalability of DW means that queries and other data access functions will grow (ideally) linearly with the size of the DW.

Real-Time Data Warehousing Traditionally, DWs work mainly on historical data to support strategic and tactical decision making. For many business, making fast and consistent decisions across the enterprise requires real-time data warehousing. Decision support has become operational. Real-time data warehouse (RDW), also known as active data warehouse (ADW), is the process of loading and providing data via the data warehouse as they become available. At a basic level, ADW simply report what happened. At the next level, some analysis occurs. As the system evolves, it provides prediction capabilities, which lead to the next level of operationalization. At its highest evolution, ADW is capable of making events happen (e.g. activities such as creating sales and marketing compaigns or idnetifying and exploiting opportunities).

Traditional vs Active DW Strategic, tactic and Operational decisions Results measured with operations Only comprehensive detailed data available within minutes is acceptable High number (1000 or more) of users accessing and querying the system simultaneously Flexible ad hoc reporting, as well as machine-assisted modeling (e.g., data mining) to discover new hypotheses and relationships Operational staffs, call centers, external users Strategic decision only Results sometimes hard to measure Daily, weekly, monthly data currency acceptable; summaries often appropriate Moderate user concurrency Highly restrictive reporting used to confirm or check existing processes and patterns; often uses predeveloped summary tables or data marts Power users, knowledge workers, internal users

DW Administration and Security Issues Establishing effective corporate security policies and procedures. An effective security policy should start at the top, with executive management, and should be communicated to all individuals within the organization. Implementing logical security procedures and techniques to restrict access. This includes user authentication, access controls, and encryption technology. Limiting physical access to the data center environment. Establishing an effective internal control review process with an emphasis on security and privacy.