Data Management for Decision Support Session-3 Prof. Bharat Bhasker.

Slides:



Advertisements
Similar presentations
Chapter 1: The Database Environment
Advertisements

Database Architectures and the Web
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Data Warehouse Design Enrico Franconi CS 636. CS 3362 Implementing a Warehouse  Monitoring: Sending data from sources  Integrating: Loading, cleansing,...
Data - Information - Knowledge
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #15.
ICS 421 Spring 2010 Data Warehousing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/18/20101Lipyeow.
Managing Data Resources
CSE6011 Data Warehouse and OLAP  Why data warehouse  What’s data warehouse  What’s multi-dimensional data model  What’s difference between OLAP and.
Information Integration. Modes of Information Integration Applications involved more than one database source Three different modes –Federated Databases.
Distributed DBMSs A distributed database is a single logical database that is physically distributed to computers on a network. Homogeneous DDBMS has the.
SLIDE 1IS 257 – Spring 2004 Data Warehousing University of California, Berkeley School of Information Management and Systems SIMS 257: Database.
SLIDE 1IS 257 – Fall 2011 Data Warehousing University of California, Berkeley School of Information IS 257: Database Management.
1 Introduction The Database Environment. 2 Web Links Google General Database Search Database News Access Forums Google Database Books O’Reilly Books Oracle.
SLIDE 1IS Fall 2002 Data Warehousing University of California, Berkeley School of Information Management and Systems SIMS 257: Database.
Components and Architecture CS 543 – Data Warehousing.
Ch1: File Systems and Databases Hachim Haddouti
© 2007 by Prentice Hall 1 Chapter 1: The Database Environment Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden.
10/30/2001Database Management -- R. Larson Data Warehousing University of California, Berkeley School of Information Management and Systems SIMS 257: Database.
Introduction to Data Warehousing Enrico Franconi CS 636.
1 © Prentice Hall, 2002 Chapter 11: Data Warehousing.
Chapter 1: The Database Environment
Data Warehousing: Defined and Its Applications Pete Johnson April 2002.
Chapter 3 Database Architectures and the Web Pearson Education © 2009.
Joachim Hammer 1 Data Warehousing Overview, Terminology, and Research Issues Joachim Hammer.
Database Systems: Design, Implementation, and Management Ninth Edition
Chapter 1 Database Systems. Good decisions require good information derived from raw facts Data is managed most efficiently when stored in a database.
L/O/G/O Metadata Business Intelligence Erwin Moeyaert.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Database Systems – Data Warehousing
Database Design - Lecture 1
1 Chapter 3 Database Architecture and the Web Pearson Education © 2009.
Data Warehouse Chapter 11. Multiple Files Problem Added complexity of multiple source files Start simple Multiple Source files Extracted data Logic to.
Database Architectures and the Web Session 5
Data Warehousing Seminar Chapter 5. Data Warehouse Design Methodology Data Warehousing Lab. HyeYoung Cho.
Chapter 1 In-lab Quiz Next week
1 Adapted from Pearson Prentice Hall Adapted form James A. Senn’s Information Technology, 3 rd Edition Chapter 7 Enterprise Databases and Data Warehouses.
© 2007 by Prentice Hall 1 Introduction to databases.
Massively Distributed Database Systems - Distributed DBS Spring 2014 Ki-Joune Li Pusan National University.
I Information Systems Technology Ross Malaga 4 "Part I Understanding Information Systems Technology" Copyright © 2005 Prentice Hall, Inc. 4-1 DATABASE.
© 2009 Pearson Education, Inc. Publishing as Prentice Hall 1 Chapter 1: The Database Environment Modern Database Management 9 th Edition Jeffrey A. Hoffer,
Data warehousing and online analytical processing- Ref Chap 4) By Asst Prof. Muhammad Amir Alam.
1 Data Warehouses BUAD/American University Data Warehouses.
SLIDE 1IS 257 – Fall 2014 Data Warehousing University of California, Berkeley School of Information IS 257: Database Management.
1 Database Administration (CG168) – Lecture 10b: Fundamentals of Data Warehousing Fundamentals of Data Warehousing Dr. Akhtar Ali School of Computing,
Data Management for Decision Support Session-4 Prof. Bharat Bhasker.
7 Strategies for Extracting, Transforming, and Loading.
1 Carga de DW como Wrkf Presentación sobre el artículo: Modeling Data Warehouse Refreshment Process as a Workflow Application M. Bouzeghoub, F. Fabret,
Two-Tier DW Architecture. Three-Tier DW Architecture.
Object storage and object interoperability
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 340 Introduction to Database Systems.
CSE6011 Implementing a Warehouse  Monitoring: Sending data from sources  Integrating: Loading, cleansing,...  Processing: Query processing, indexing,...
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
Plan for Populating a DW
Databases and DBMSs Todd S. Bacastow January 2005.
Data Warehousing University of California, Berkeley
Data Warehouse—Subject‐Oriented
Data Warehouse.
Data Warehouse Design Enrico Franconi CS 636.
Basic Concepts in Data Management
Data, Databases, and DBMSs
Instructor: Dan Hebert
Introduction to Data Warehousing
Chapter 1: The Database Environment
The Database Environment
Metadata The metadata contains
Best Practices in Higher Education Student Data Warehousing Forum
Presentation transcript:

Data Management for Decision Support Session-3 Prof. Bharat Bhasker

Types of Data Business Data - represents meaning –Real-time data (ultimate source of all business data) –Reconciled data (cleaned) –Derived data (extracted + summaries) Metadata - describes meaning –Build-time metadata –Control metadata –Usage metadata Data as a product* - intrinsic meaning –Produced and stored for its own intrinsic value –e.g., the contents of a text-book

Data Warehouse Architectures: Conceptual View Single-layer –Every data element is stored once only –Virtual warehouse Two-layer –Real-time + derived data –Most commonly used approach in industry today “Real-time data” Operational systems Informational systems Derived Data Real-time data Operational systems Informational systems

Three-layer Architecture: Conceptual View Transformation of real-time data to derived data really requires two steps Derived Data Real-time data Operational systems Informational systems Reconciled Data Physical Implementation of the Data Warehouse View level “Particular informational needs”

Data Warehousing: Two Distinct Issues (1 ) How to get information into warehouse “Data warehousing” (2) What to do with data once it’s in warehouse “Warehouse DBMS” Both rich research areas

Issues in Data Warehousing Warehouse Design Extraction –Wrappers, monitors (change detectors) Integration –Cleansing & merging Warehousing specification & Maintenance Optimizations Miscellaneous (e.g., evolution)

Data Extraction Source types –Relational, flat file, WWW, etc. How to get data out? –Replication tool –Dump file –Create report –ODBC or third-party “wrappers”

Wrapper pConverts data and queries from one data model to another pExtends query capabilities for sources with limited capabilities Data Model B Data Model A Queries Data Queries Source Wrapper

Wrapper Generation Solution 1: Hard code for each source Solution 2: Automatic wrapper generation Wrapper Generator Definition

Data Transformations Convert data to uniform format –Byte ordering, string termination –Internal layout Remove, add & reorder attributes –Add key –Add data to get history Data Conflicts Sort tuples

Monitors Goal: Detect changes of interest and propagate to integrator How? –Triggers –Replication server –Log sniffer –query results –snapshots/dumps

Data Integration Receive data (changes) from multiple wrappers/monitors and integrate into warehouse Rule-based Actions –Resolve inconsistencies –Eliminate duplicates –Integrate into warehouse –Summarize data –Fetch more data from sources (wh updates)

Data Cleansing Find (& remove) duplicate tuples –e.g., Jane Doe vs. Jane Q. Doe Detect inconsistent, wrong data –Attribute values that don’t match Patch missing, unreadable data Notify sources of errors found

Warehouse Maintenance Warehouse data  materialized view –Initial loading –View maintenance

Differs from Conventional View Maintenance... Warehouses may be highly aggregated and summarized Warehouse views may be over history of base data Process large batch updates Schema may evolve

Differs from Conventional View Maintenance... Base data doesn’t participate in view maintenance –Simply reports changes –Loosely coupled –Absence of locking, global transactions –May not be queriable

Warehouse Maintenance Anomalies Materialized view maintenance in loosely coupled, non-transactional environment Simple example SalesEMP. Integrator Data Warehouse Sale(item,clerk)Emp(clerk,age) Sold (item,clerk,age) Sold = Sale Emp

Warehouse Maintenance Anomalies 1. Insert into Emp(Mary,25), notify integrator 2. Insert into Sale (Computer,Mary), notify integrator 3. (1)  integrator adds Sale (Mary,25) 4. (2)  integrator adds (Computer,Mary) Emp 5. View incorrect (duplicate tuple) SalesEmp. Integrator Data Warehouse Sale(item,clerk)Emp(clerk,age) Sold (item,clerk,age)

Maintenance Anomaly - Solutions Incremental update algorithms (ECA, Strobe, etc.) Research issues: Self-maintainable views –What views are self-maintainable –Store auxiliary views so original + auxiliary views are self-maintainable

Self-Maintainability: Examples Sold(item,clerk,age) = Sale(item,clerk) Emp(clerk,age) Inserts into Emp If Emp.clerk is key and Sale.clerk is foreign key (with ref. int.) then no effect Inserts into Sale Maintain auxiliary view: Emp-  clerk,age (Sold) Deletes from Emp Delete from Sold based on clerk

Warehouse Specification (ideally) Extractor/ Monitor Extractor/ Monitor Extractor/ Monitor Integrator Warehouse... Metadata Warehouse Configuration Module View Definitions Integration rules Change Detection Requirements

Client Server Computing Cooperative processing- Request Response Model RPC Vs. Protocols Client Server architecture can be based on software only, meaning the Client and Server components can be residing and sharing the same processors. Client Server Client

Client Server Computing Three Tier Architecture Client Data Servers Server Unix Server Netware Server Win/NT Application Servers Client

Server Requirements Multi-user/ Concurrent Access Scalability- Given increase in the size of problem, no of users etc. can be handled by increasing proportionate resources without deterioration in performance. If problem P is solved by current resources in time T. If problem size is N*P and resources are augmented to N times the time taken is T1. Scalability is T/T1. ( Ideally 1) Speedup- Problem P in T. Increase resources to M times, Time taken is T1. Speed up is T/T1 (Ideal value M) Capacity Availability MM, Networking and Communications