
Database Administration (CG168) – Lecture 10a: Introduction to Data Warehousing
Data Warehousing: An Introduction
Dr. Akhtar Ali
School of Computing, Engineering and Information Sciences

Lecture Outline

- New trends for data/information management
  - Background
  - Two approaches
- Data Warehousing (DW)
  - Definitions and history
- DW architectures
  - Strategies for building data warehouses
- Problems and issues
  - Maintenance and performance
- DW support in database management systems

1: New Trends for Data/Information Management

- Secondary storage is becoming more and more affordable.
  - So enterprises keep more and more data.
  - Data replication is becoming widespread to avoid a single point of failure.
- What should be done with such large volumes of data?
  - Decision makers want to get more out of the data.
  - Decision support systems (DSSs):
    - have long execution times;
    - are CPU-intensive;
    - involve statistical analysis and analytical queries.
- Transaction-oriented databases are not suitable for DSSs.
  - Transactional data usually changes rapidly.
  - Database and application servers are already at peak loads.
  - Transactional data is usually normalized, while DSSs require summarized, highly aggregated and possibly denormalized data.
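
To make the last point concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The tables product and sale and their contents are hypothetical. The query shows how a typical decision-support question (revenue per category per month) has to join away a normalized OLTP schema and aggregate many rows into a small, denormalized summary.

```python
import sqlite3

# Hypothetical normalized OLTP schema: product details and sales live in
# separate tables and are linked by product_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                   product_id INTEGER REFERENCES product(product_id),
                   sale_date TEXT, quantity INTEGER, unit_price REAL);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "Kettle", "Kitchen"), (2, "Toaster", "Kitchen"), (3, "Lamp", "Lighting"),
])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "2023-12-01", 2, 20.0),
    (2, 2, "2023-12-03", 1, 35.0),
    (3, 3, "2023-12-03", 4, 15.0),
    (4, 1, "2024-01-05", 1, 20.0),
])

# Decision-support question: revenue per category per month. The join
# undoes the normalization; GROUP BY collapses many rows into a few.
summary = conn.execute("""
SELECT p.category, substr(s.sale_date, 1, 7) AS month,
       SUM(s.quantity * s.unit_price) AS revenue
FROM sale AS s JOIN product AS p ON p.product_id = s.product_id
GROUP BY p.category, month
ORDER BY p.category, month
""").fetchall()

for row in summary:
    print(row)
```

Running the same aggregation repeatedly against a busy transactional database is exactly the kind of load the slide argues should be moved elsewhere.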

Data Management: Past, Present and Future

- Past
  - File processing (e.g. COBOL)
  - Network and hierarchical databases
- Present
  - Relational, object-relational and object-oriented databases
  - Fragmentation of information systems
    - Subject/user/application-driven transaction processing systems
    - Stand-alone systems, e.g.
      - Manufacturing (inventory control)
      - Finance (payroll, stock management)
      - Sales administration (planning, suppliers, daily sales)
- Future
  - Integration of data and applications
  - Data exchange, interoperability and homogeneity in the presence of heterogeneity

Surviving in the Information Jungle

- Different interfaces and protocols
- Different data models and representations
- Duplicate and inconsistent information

Solution: Integrated Information Store

- Integration systems
  - Collect and combine information from multiple sources
  - Provide an integrated view and a uniform user interface
  - Support sharing of data and processing capabilities

Two Approaches (1): On-Demand/Query-Driven

- On-demand (lazy) data integration is a kind of virtual data warehouse: nothing is integrated in advance; each query is sent to the data sources when it is asked, and the results are filtered and integrated at query time.
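
A minimal sketch of query-driven (lazy) integration in Python, assuming two hypothetical in-memory sources, each hidden behind a wrapper that translates its native format into a common record shape. The Mediator class is illustrative only, not the design of any system named in these slides; the point is that nothing is stored, so every query contacts every source.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical wrappers: each hides one source's native data model and
# returns records in a common format. Real wrappers would call a remote
# database, file or web service; these read local lists for illustration.
def payroll_wrapper(dept: str) -> List[Record]:
    rows = [("E1", "Sales", 30000), ("E2", "Finance", 42000)]  # native tuples
    return [{"emp": e, "dept": d, "salary": s} for e, d, s in rows if d == dept]

def hr_wrapper(dept: str) -> List[Record]:
    rows = [{"id": "E1", "department": "Sales", "name": "Ann"},
            {"id": "E2", "department": "Finance", "name": "Bob"}]  # native dicts
    return [{"emp": r["id"], "dept": r["department"], "name": r["name"]}
            for r in rows if r["department"] == dept]

class Mediator:
    """Lazy integration: no data is kept; each query is forwarded to the
    sources at query time and their answers are merged on a shared key."""

    def __init__(self, wrappers: List[Callable[[str], List[Record]]]):
        self.wrappers = wrappers

    def employees_in(self, dept: str) -> List[Record]:
        merged: Dict[object, Record] = {}
        for wrapper in self.wrappers:  # one round trip per source, per query
            for rec in wrapper(dept):
                merged.setdefault(rec["emp"], {}).update(rec)
        return list(merged.values())

mediator = Mediator([payroll_wrapper, hr_wrapper])
result = mediator.employees_in("Sales")
print(result)
```

If a source is slow or unavailable, employees_in is slow or fails too, which is the response-time problem listed under the disadvantages of this approach.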

Disadvantages of the On-Demand Approach

- Poor response time due to delays in query processing
  - Slow or unavailable data sources
  - Time-consuming and complex filtering and integration
- Inefficient and potentially expensive for frequent queries
- Wrappers compete for resources with local applications at the data sources
- There are only a few notable systems based on this approach, e.g.
  - TAMBIS: Transparent Access to Multiple Bio-informatics Information Systems
  - SRS: Sequence Retrieval System
  - OPM (Object Protocol Model) based multi-database tools and query language (OPM-QL)

Two Approaches (2): Data Warehousing

- In-advance (eager) data integration
- Integrated data is stored persistently in a database, the data warehouse, where it can be queried and analyzed directly.
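
By contrast, a minimal sketch of eager integration, again with hypothetical sources: the data is transformed into one common schema and loaded into a warehouse (an in-memory SQLite database standing in for a real DW) ahead of time, so later queries never touch the sources.

```python
import sqlite3

# Hypothetical source extracts with different native formats.
store_a_sales = [("2023-12-01", "Kettle", 2), ("2023-12-02", "Lamp", 1)]
store_b_sales = [{"day": "2023-12-01", "item": "Kettle", "qty": 3}]

# Eager integration: map each source onto one schema and load it into the
# warehouse before any query arrives.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (store TEXT, sale_date TEXT, product TEXT, qty INTEGER)")
warehouse.executemany("INSERT INTO sales VALUES ('A', ?, ?, ?)", store_a_sales)
warehouse.executemany(
    "INSERT INTO sales VALUES ('B', :day, :item, :qty)", store_b_sales)
warehouse.commit()

# Queries now run directly against the warehouse; the sources are not contacted.
kettles = warehouse.execute(
    "SELECT SUM(qty) FROM sales WHERE product = 'Kettle'").fetchone()[0]
print(kettles)
```

The price of this speed is freshness: a sale recorded at a source after the load is invisible until the next load.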

Advantages of the Data Warehousing Approach

- High-performance query processing
  - Though the information returned may not be the most up-to-date
- Does not interfere with local data processing at the sources
  - Analytical querying/statistical analysis or On-Line Analytical Processing (OLAP) at the warehouse
  - On-Line Transaction Processing (OLTP) at the data sources
- Data stored persistently at the warehouse
  - Data at the warehouse can be further restructured, aggregated, summarized and modified if necessary.
  - A DW may store historical/archive data.
- The data warehousing approach has been widely used, e.g.
  - The Maryland ADMS Project
  - Supporting Data Integration and Warehousing Using H2O
  - The Stanford Data Warehousing Project
  - GIMS: Genome Information Management System
  - Marks & Spencer Data Warehouse

Trade-off between the Query-Driven and Data Warehousing Approaches

- The query-driven approach is still better for:
  - rapidly changing information/data sources;
  - accessing very large amounts of data from many sources;
  - clients with unpredictable and dynamic requirements.
- Data warehousing is more suitable when:
  - the data sources on which the data warehouse is based do not change frequently;
  - up-to-date data is not crucially important;
  - querying and analysis are complex;
  - data needs to be highly summarized and aggregated;
  - fast access to integrated and derived data is vital; and
  - keeping the data warehouse consistent with the underlying data sources is efficient and does not compromise the expected performance.

What is a Data Warehouse? (a practitioner’s viewpoint)

- “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context” – Barry Devlin, IBM Consultant
- “A data warehouse is a database of data gathered from many systems and intended to support management reporting and decision making” – Michael Corey et al., CTO of OneWarranty.com

What is a Data Warehouse? (classical viewpoint)

[Figure: the four defining properties of a data warehouse: subject-oriented, integrated, time-variant, non-volatile]

According to W. H. Inmon (Building a Data Warehouse, 1992): “A DW is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.”
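
The time-variant and non-volatile properties can be sketched as an append-only table of dated snapshots. This is a minimal illustration assuming a hypothetical customer_balance table loaded from an operational system; rows are only ever inserted, never updated or deleted, so the warehouse can still answer questions about earlier states after the source values have changed.

```python
import sqlite3

# Hypothetical warehouse table: one row per customer per snapshot date.
dw = sqlite3.connect(":memory:")
dw.execute("""CREATE TABLE customer_balance (
    snapshot_date TEXT, customer_id TEXT, balance REAL)""")

def load_snapshot(snapshot_date, balances):
    """Append (never overwrite) one snapshot taken from the source system."""
    dw.executemany(
        "INSERT INTO customer_balance VALUES (?, ?, ?)",
        [(snapshot_date, cid, bal) for cid, bal in balances.items()])

load_snapshot("2023-11-30", {"C1": 100.0, "C2": 250.0})
load_snapshot("2023-12-31", {"C1": 80.0, "C2": 300.0})  # source values changed

def balance_as_of(customer_id, date):
    """Most recent snapshot of a customer's balance on or before `date`."""
    row = dw.execute("""
        SELECT balance FROM customer_balance
        WHERE customer_id = ? AND snapshot_date <= ?
        ORDER BY snapshot_date DESC LIMIT 1""", (customer_id, date)).fetchone()
    return row[0] if row else None

print(balance_as_of("C1", "2023-12-15"))  # from the November snapshot
print(balance_as_of("C1", "2024-01-01"))  # from the December snapshot
```

An OLTP system would typically hold only the current balance; keeping every snapshot is what makes the warehouse time-variant.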

In a Nutshell, a DW is

- A persistent collection of diverse data
  - Generally speaking, an efficient solution to data integration
  - A single repository of information
- Subject-oriented
  - Organized by subject (not by application)
  - Used for analysis, reporting, data mining, etc.
- Structured and optimized differently from transaction-oriented databases
- Equipped with a user interface aimed at executives and decision makers

Data Warehouse History

Standard DB v. DW

Standard Database | Data Warehouse
Mix of updates and querying | Mostly reads (infrequent, append-only updates; data is very rarely deleted)
Many small-to-medium transactions | Complex, long-running queries
MBs to GBs in size | GBs to TBs in size
Most current snapshot | Not the most current snapshot; historical
Heavily indexed | Lots of scans (as data is readily accessible)
Raw data | Summarized/aggregated data
Thousands of users (e.g. clerical staff to mid-level managers) | Hundreds of users (e.g. decision makers, analysts)

Architectures (I): Simple

The metadata and raw data of a traditional OLTP system are present, as is an additional type of data: summary data. Summaries are very valuable in data warehouses because they precompute long-running operations. For example, a typical data warehouse query retrieves something like December sales; a precomputed summary can answer it without aggregating every raw sales row at query time.
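
A minimal sketch of the summary idea using sqlite3, with hypothetical sales data. SQLite has no materialized views, so here the summary is built with CREATE TABLE ... AS SELECT at load time; in a production DW the DBMS's own summary mechanism (for example, materialized views in Oracle) would normally play this role.

```python
import sqlite3

# Hypothetical raw sales data in the warehouse.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
dw.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2023-11-28", "Kettle", 20.0),
    ("2023-12-01", "Kettle", 20.0),
    ("2023-12-15", "Lamp", 15.0),
    ("2023-12-20", "Kettle", 40.0),
])

# Summary data: aggregate the raw rows once, at load time, into a small
# monthly table (a hand-built stand-in for a materialized view).
dw.execute("""
CREATE TABLE sales_by_month AS
SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
FROM sales GROUP BY month
""")

# "December sales" is now a one-row lookup instead of a scan of all sales.
december = dw.execute(
    "SELECT total FROM sales_by_month WHERE month = '2023-12'").fetchone()[0]
print(december)
```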

Architectures (II): With a Staging Area

We need to clean and process operational data before putting it into the warehouse. We can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management.
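
A minimal sketch of a staging area, assuming hypothetical tables stg_orders and dw_orders and made-up cleaning rules (trimming and upper-casing country codes, removing dots, dropping rows without an amount, removing exact duplicates). Raw extracts land untyped in the staging table, are cleaned there with SQL, and only then are loaded into the typed warehouse table.

```python
import sqlite3

# Hypothetical raw operational extract with typical quality problems.
raw_rows = [
    ("1001", " uk ", "120.50"),
    ("1002", "UK", "80"),
    ("1002", "UK", "80"),    # exact duplicate of the previous row
    ("1003", "U.K.", "15"),  # same country, different spelling
    ("1004", "fr", "42.00"),
    ("1005", "FR", ""),      # missing amount: rejected
]

db = sqlite3.connect(":memory:")
# Staging table: accepts the data as-is (everything stored as text).
db.execute("CREATE TABLE stg_orders (order_id TEXT, country TEXT, amount TEXT)")
# Warehouse table: typed and cleaned.
db.execute(
    "CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)")

db.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", raw_rows)

# Clean inside the staging area, then load only valid, de-duplicated rows.
db.execute("""
INSERT INTO dw_orders (order_id, country, amount)
SELECT DISTINCT CAST(order_id AS INTEGER),
       REPLACE(UPPER(TRIM(country)), '.', ''),
       CAST(amount AS REAL)
FROM stg_orders
WHERE TRIM(amount) <> ''
""")
db.execute("DELETE FROM stg_orders")  # staging is transient; empty it after loading

loaded = db.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping this work in a staging area means the warehouse tables only ever see data that has already passed the cleaning rules.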

Architectures (III): With a Staging Area and Data Marts

This architecture customizes the warehouse for different groups within an organization. By adding data marts, which are systems designed for a particular line of business, we can build a more customized DW.
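
A minimal sketch of deriving a data mart from the warehouse, assuming a hypothetical sales table and one mart table per line of business named mart_<line>. Each mart holds only the subset and level of detail one group needs; here the retail group is assumed to want monthly totals per region.

```python
import sqlite3

# Hypothetical warehouse table covering every line of business.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE sales (region TEXT, line_of_business TEXT, month TEXT, amount REAL)")
dw.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("North", "Retail", "2023-12", 500.0),
    ("South", "Retail", "2023-12", 300.0),
    ("North", "Wholesale", "2023-12", 900.0),
    ("North", "Retail", "2024-01", 450.0),
])

def build_mart(line_of_business: str) -> str:
    """(Re)build a small data mart for one line of business."""
    if not line_of_business.isalnum():  # table names cannot be bound as parameters
        raise ValueError("unexpected line of business name")
    table = "mart_" + line_of_business.lower()
    dw.execute(f"DROP TABLE IF EXISTS {table}")
    dw.execute(f"CREATE TABLE {table} (region TEXT, month TEXT, total REAL)")
    dw.execute(f"""
        INSERT INTO {table}
        SELECT region, month, SUM(amount) FROM sales
        WHERE line_of_business = ?
        GROUP BY region, month""", (line_of_business,))
    return table

retail = build_mart("Retail")
rows = dw.execute(f"SELECT * FROM {retail} ORDER BY region, month").fetchall()
print(rows)
```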

Problems and Issues

- Warehouse maintenance
  - Data sources (DSs) on which a DW is based may change over time.
  - Changes at DSs may require changes at the DW.
  - How often should changes be propagated to a DW?
    - Nightly, weekly/fortnightly/monthly, immediately, etc.
  - How should changes be propagated to a DW?
    - Completely rebuild all affected tables at the DW (easy but inefficient)
    - Apply changes to the affected tables incrementally (efficient but difficult)
- Performance
  - How can we assess whether a DW is performing well?
  - How can performance be improved?
- Miscellaneous issues
  - Data quality assurance (how good is the data in a DW?)
  - How to cope with data warehouse evolution?
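
The two propagation strategies can be contrasted on a single summary table. This is a minimal sketch with hypothetical sales and sales_summary tables: full_rebuild recomputes everything from the base data, while apply_incrementally adjusts only the affected summary rows. The incremental version handles inserted source rows only; it is meant to show why incremental maintenance is efficient but harder to get right, not to be a complete algorithm.

```python
import sqlite3

# Hypothetical base table and the summary derived from it.
dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product TEXT, amount REAL);
CREATE TABLE sales_summary (product TEXT PRIMARY KEY, total REAL);
""")

def full_rebuild():
    """Easy but inefficient: recompute the whole summary from every base row."""
    dw.execute("DELETE FROM sales_summary")
    dw.execute("""INSERT INTO sales_summary
                  SELECT product, SUM(amount) FROM sales GROUP BY product""")

def apply_incrementally(new_rows):
    """Efficient but harder: touch only the summary rows the new data affects.
    Handles inserts only; updates and deletes at the sources would need
    their own compensating logic."""
    dw.executemany("INSERT INTO sales (product, amount) VALUES (?, ?)", new_rows)
    for product, amount in new_rows:
        cur = dw.execute(
            "UPDATE sales_summary SET total = total + ? WHERE product = ?",
            (amount, product))
        if cur.rowcount == 0:  # first sale of this product: add a summary row
            dw.execute("INSERT INTO sales_summary VALUES (?, ?)", (product, amount))

dw.executemany("INSERT INTO sales (product, amount) VALUES (?, ?)",
               [("Kettle", 20.0), ("Lamp", 15.0)])
full_rebuild()                                              # initial load
apply_incrementally([("Kettle", 40.0), ("Toaster", 35.0)])  # later source changes

incremental = dw.execute("SELECT * FROM sales_summary ORDER BY product").fetchall()
full_rebuild()
rebuilt = dw.execute("SELECT * FROM sales_summary ORDER BY product").fetchall()
print(incremental == rebuilt)  # both strategies reach the same summary
```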

Data Systems Supporting DW

- Oracle 8i, 9i
- IBM DB2
- Sybase
- RedBrick Data Warehouse/Informix
- MS SQL Server
- Tandem (HP)
- Teradata
- MicroStrategy

Bibliography

- Sharma Chakravarthy, Advanced Topics in Database Systems, University of Texas at Arlington, USA, 2001.
- Oracle9i Data Warehousing Guide, Release 2 (9.2).
- Michael Corey, Michael Abbey, Ian Abramson and Ben Taub, Oracle 8i Data Warehousing, Oracle Press, 2001.