Database Administration (CG168) – Lecture 10b: Fundamentals of Data Warehousing


Fundamentals of Data Warehousing
Dr. Akhtar Ali
School of Computing, Engineering and Information Sciences

Lecture Outline
1. Inmon’s Four Characteristics of a DW
   - Subject-Oriented, Integrated, Time Variant and Non-Volatile
2. Some Useful Definitions
3. Issues in Data Warehousing
   - DW Design
   - Extraction
   - Integration
   - Creation and Loading
   - DW Maintenance
   - Querying and Optimization

1.1 Subject-Oriented
Data is categorized and stored by business subject rather than by application.
[Figure: Operational Systems (Savings, Shares, Loans, Insurance, Equity Plans) and Data Warehouse Subject Area (Customer Financial Information).]

1.2 Integrated
Data on a given subject is defined and stored once.
[Figure: Operational Environment (Savings Application, Current Accounts Application, Loans Application) and Data Warehouse (Subject = Customer, no application flavor).]

1.3 Time Variant
Data is stored as a series of snapshots, each representing a period of time.
[Figure: Data Warehouse holding data over time: 01/03 Data for January, 02/03 Data for February, 03/03 Data for March.]
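The snapshot idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `warehouse` dictionary, the `load_snapshot` and `history` helpers, and the customer ID and balances are all hypothetical and do not come from the lecture. Only the period labels follow the slide.

```python
# Hypothetical warehouse store: period label -> list of (customer, balance) rows.
warehouse = {}

def load_snapshot(period, rows):
    """Append one period's snapshot; earlier periods are never overwritten."""
    if period in warehouse:
        raise ValueError(f"snapshot for {period} is already loaded")
    warehouse[period] = list(rows)

def history(customer):
    """Return the (period, balance) series for one customer across all snapshots."""
    return [
        (period, balance)
        for period, rows in warehouse.items()
        for cust, balance in rows
        if cust == customer
    ]

# Illustrative data only: one customer, three monthly snapshots.
load_snapshot("01/03", [("C1", 100.0)])
load_snapshot("02/03", [("C1", 120.0)])
load_snapshot("03/03", [("C1", 90.0)])

print(history("C1"))
```

Because each load adds a new period rather than replacing the previous one, a query can compare January, February and March side by side, which an operational system holding only the current balance could not do.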

1.4 Non-Volatile
Typically data in the data warehouse is not directly updated or deleted.
[Figure: Operational Databases (INSERT, UPDATE, DELETE, Read) and Warehouse Database (Load, Read e.g. SELECT).]

2.1 What is an Operational Data Store?
1. An operational data store (ODS) is the point of integration for operational/transaction-oriented systems.
   - For example, banks typically have several independent systems set up to support different financial products, e.g. loans, checking accounts, savings accounts etc.
   - The advent of ATMs helped push many banks to create an ODS to integrate current balances and recent transactional data from these separate accounts under one customer number.
   - Such ODSs are normally kept separate from a DW.
2. An ODS may be seen as the lowest layer of a DW, giving lower management access to detailed as well as integrated data.
   - This means that an ODS may be seen as the front edge of a DW.
   - Such ODSs are normally kept as part of a DW.
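The bank example can be sketched as follows. This is a minimal illustration, assuming each product system exposes current balances keyed by a shared customer number. The system names, customer numbers, balances and the `build_ods` helper are hypothetical.

```python
# Hypothetical per-product systems: customer number -> current balance.
savings = {"C1": 500.0, "C2": 300.0}
current_accounts = {"C1": 120.0}
loans = {"C2": -1000.0}

def build_ods(**systems):
    """Integrate balances from independent systems under one customer number."""
    ods = {}
    for system_name, balances in systems.items():
        for customer, balance in balances.items():
            ods.setdefault(customer, {})[system_name] = balance
    return ods

ods = build_ods(savings=savings, current_accounts=current_accounts, loans=loans)

# One lookup now returns every product a customer holds.
print(ods["C1"])
print(ods["C2"])
```

An ATM-style lookup then needs a single read by customer number instead of one query per product system.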

2.2 What is a Data Mart?
1. A data mart (DMT) is a logical subset of a complete DW.
   - A DMT is a complete “pie-wedge” of the overall DW pie.
   - A DW is made up of the union of all its DMTs.
   - Some people take the definition literally. They create several independent DMTs to meet the needs of several departments.
   - Will everyone be happy? Well, maybe. There may be serious issues in integrating these DMTs.
2. A DMT is an extension of a DW.
   - Data is integrated as it enters the DW. DMTs then derive data from the central source, the DW.
   - Each department gets its own DMT.
   - Each department determines which of the data warehouse contents are of interest.
   - These subject areas are then replicated into the smaller, local DMT so that users can get to the data they want with less interference from other departments.

2.3 What is On-Line Analytical Processing (OLAP)?
- OLAP is complementary to data warehousing.
- OLAP embodies general activities of querying and presenting text and number data from DWs.
- OLAP is based on dimensional modelling as opposed to entity-relationship (ER) modelling.
- A dimensional model may contain the same information as an ER model but packages data in a symmetric form.
- A dimensional model is geared towards user understandability and high performance query processing.
- ROLAP (Relational OLAP)
  - A set of user interfaces and applications that give a relational database (RDB) a dimensional flavour.
- MOLAP (Multi-dimensional OLAP)
  - A set of user interfaces, applications and proprietary database technologies that have a strong dimensional flavour.

2.4 What is Data Mining (DM)?
- DM is often defined as “finding hidden information” in a database.
- Alternatively, DM is “exploratory data analysis, data driven discovery, and deductive learning”.
- Data mining software is a class of tools that apply artificial intelligence techniques to the analysis of data.
- Given access to data, DM tools dig through the data looking for patterns and discovering relationships that the user might never have suspected.
- DM tools work against an operational database or a DW.
- Since data in a DW is usually integrated and summarized, it may be more efficient to use it for DM.
- But a DM tool may find more useful information in an operational database than in a DW, because a DW usually hosts data to support anticipated DSS and may leave out data useful for DM.

3.0 Generic Data Warehouse Architecture
[Figure: generic data warehouse architecture.]

3.1 Warehouse Design
- Influenced by both maintenance and querying
- Many trade-offs
  - Space vs. update time vs. query performance
- Logical model of data
  - ER vs. Dimensional, Relational vs. OR vs. OO, ROLAP vs. MOLAP
- Identify sources of data
- Identify warehouse data: what to materialize?
  - Which summary tables?
  - Which fact/dimensional tables?
  - Which indices?
- Choose software and hardware

3.2 Data Extraction
- Selecting relevant data from data sources (DSs) and moving it into the DW.
- DS types
  - Database (e.g. relational), flat file, WWW, XML, COBOL, etc.
- How to obtain the data?
  - Using data replication servers/tools
  - Dump file or Export tools
  - ODBC/JDBC/CORBA/RMI/COM and DCOM
  - Third party Wrappers/Middleware/Agents
- Other activities: data transformation, change detection (monitoring), cleansing etc.

3.2.1 Monitors
- Detect changes (of interest to a DW) in data sources and propagate them to the DW.
- How?
  - Triggers
  - Replication servers/tools
  - Log Sniffer
  - Compare query results
  - Compare snapshots/dumps/exported data
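The last option, comparing snapshots, is the simplest to sketch. The example below assumes each snapshot is available as a dictionary keyed by primary key. The `diff_snapshots` helper and the sample rows are hypothetical and stand in for whatever dump or export format a real source provides.

```python
def diff_snapshots(old, new):
    """Compare two snapshots keyed by primary key.

    Returns three dicts: rows inserted, rows updated (new values),
    and rows deleted (old values) since the previous snapshot.
    """
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return inserted, updated, deleted

# Illustrative snapshots of a source table: account id -> (owner, balance).
yesterday = {1: ("Smith", 100), 2: ("Jones", 250), 3: ("Patel", 75)}
today = {1: ("Smith", 100), 2: ("Jones", 300), 4: ("Khan", 40)}

inserted, updated, deleted = diff_snapshots(yesterday, today)
print(inserted, updated, deleted)
```

Snapshot comparison needs no cooperation from the source system, but it reads both snapshots in full, so triggers or log sniffing may be preferable when the source is large and supports them.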

3.2.2 Data Cleansing
- Find and remove duplicate tuples
  - For example, Judie Harris Morris vs. Judie H. Morris
- Detect inconsistent or wrong data
  - Attribute values that do not match (because of wrong data types or violated constraints, e.g. for a Gender attribute a value of ‘N’ meaning neither may be rejected).
- Unreadable or incomplete data
- Notify DSs of errors found during the cleansing process
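A minimal sketch of two of these checks follows. It assumes a deliberately naive matching rule (first name, middle initial, last name) and an allowed Gender domain of `M` and `F`. The rule, the `name_key` and `cleanse` helpers, and the sample rows are assumptions for illustration. Real cleansing tools use far richer matching.

```python
ALLOWED_GENDERS = {"M", "F"}  # assumed domain constraint for the Gender attribute

def name_key(full_name):
    """Reduce a name to (first, middle initial, last) for duplicate matching."""
    parts = [p.strip(".").lower() for p in full_name.split()]
    first, last = parts[0], parts[-1]
    middle = parts[1][0] if len(parts) > 2 else ""
    return (first, middle, last)

def cleanse(rows):
    """Split rows into clean rows and rejected rows with a reason.

    The rejected list is what would be reported back to the data sources.
    """
    seen = set()
    clean, rejected = [], []
    for row in rows:
        if row["gender"] not in ALLOWED_GENDERS:
            rejected.append((row, "invalid gender"))
            continue
        key = name_key(row["name"])
        if key in seen:
            rejected.append((row, "duplicate"))
            continue
        seen.add(key)
        clean.append(row)
    return clean, rejected

rows = [
    {"name": "Judie Harris Morris", "gender": "F"},
    {"name": "Judie H. Morris", "gender": "F"},
    {"name": "Sam Lee", "gender": "N"},
]
clean, rejected = cleanse(rows)
print(clean)
print(rejected)
```

The key-based rule treats the two spellings from the slide as the same person, at the cost of possible false matches between different people who share a first name, middle initial and last name.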

3.3 Data Integration
- Receive data (changes) from multiple wrappers/monitors/data cleansers and integrate into DW.
- Often rule-based
- Actions
  - Resolve inconsistencies
  - Eliminate duplicates
  - Integrate into DW
  - Summarize data
  - Fetch more data from DSs
  - Notify users that DW is now up-to-date

3.4.1 Warehouse Loading
- Includes all of the previous processes
- Similar to loading/populating a database but more complex, because the data is heterogeneous and comes from multiple DSs, possibly remote and external.
  - Building indices
  - Checking integrity constraints, etc.
- Issue: huge volumes of data but small time window to complete the process.
- Computation of additional data
  - Auxiliary data to facilitate DW maintenance and support/speed up querying and analysis.

3.4.2 Warehouse Creation
- A DW can be seen as a collection of materialized views (MVs) over DSs.
  - Contains a copy of data (collected from DSs) tailored to end-users.
- Steps
  - Create DW schema (e.g. creating fact and dimensional tables, defining MVs)
  - Load warehouse
  - Start monitoring for changes at DSs
  - Update/Maintain DW as needed.
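The first two steps can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions: the table names, columns and rows are hypothetical, and because SQLite has no materialized views, a summary table created with `CREATE TABLE ... AS SELECT` stands in for an MV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: create the DW schema (two dimension tables and one fact table).
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE fact_balance (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    period      TEXT,
    balance     REAL
);
""")

# Step 2: load the warehouse (illustrative rows only).
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "A. Smith"), (2, "B. Jones")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Savings"), (2, "Loans")])
conn.executemany("INSERT INTO fact_balance VALUES (?, ?, ?, ?)",
                 [(1, 1, "01/03", 500.0),
                  (1, 2, "01/03", -200.0),
                  (2, 1, "01/03", 300.0)])

# Summary table standing in for a materialized view over the fact table.
conn.execute("""
CREATE TABLE mv_total_by_customer AS
SELECT c.name AS name, SUM(f.balance) AS total
FROM fact_balance f JOIN dim_customer c USING (customer_id)
GROUP BY c.name
""")

totals = dict(conn.execute("SELECT name, total FROM mv_total_by_customer"))
print(totals)
```

The remaining steps, monitoring the sources and maintaining the warehouse, would then keep `mv_total_by_customer` consistent with `fact_balance` as new data arrives.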

3.5 DW Maintenance
- DSs on which a DW is based may change over time.
- Changes at DSs may require changes at a DW.
- How often to propagate changes to a DW?
  - At night, weekly/fortnightly/monthly, immediately, etc.
- Off-line or on-line
  - Most current vendor products take a warehouse off-line during maintenance
- How to propagate changes to a DW?
  - Completely re-build all affected tables at the DW (easy but inefficient)
  - Apply changes to affected tables incrementally (efficient but difficult)
- Read my paper about MOVIE or wait until we discuss this topic in detail.
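The two propagation strategies can be contrasted on a small summary table that keeps a sum and a row count per customer. This is a generic sketch, not the MOVIE approach mentioned on the slide. The `rebuild` and `apply_delta` helpers and the sample facts are hypothetical.

```python
def rebuild(facts):
    """Recompute the whole summary from the base facts (easy but inefficient)."""
    summary = {}
    for customer, amount in facts:
        total, count = summary.get(customer, (0, 0))
        summary[customer] = (total + amount, count + 1)
    return summary

def apply_delta(summary, inserted=(), deleted=()):
    """Apply only the changed rows to the summary (efficient but more delicate).

    The row count per group is kept so that a group can be dropped
    once its last contributing row is deleted.
    """
    for customer, amount in inserted:
        total, count = summary.get(customer, (0, 0))
        summary[customer] = (total + amount, count + 1)
    for customer, amount in deleted:
        total, count = summary[customer]
        if count == 1:
            del summary[customer]
        else:
            summary[customer] = (total - amount, count - 1)
    return summary

facts = [("C1", 100), ("C1", 50), ("C2", 70)]
summary = rebuild(facts)

# Changes detected at the source: one new row, one deleted row.
summary = apply_delta(summary, inserted=[("C3", 10)], deleted=[("C2", 70)])
print(summary)
```

The incremental path touches two rows instead of re-reading every fact, but it only works because the count is stored alongside the sum. That extra bookkeeping is one example of why incremental maintenance is harder to get right.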

3.6 Querying and Optimization
- Queries are long-running and complex
  - Multiple/Nested joins and aggregation
  - Usually “touch all tuples” kind of queries
- Query language or analysis tools must support multi-dimensional operations
  - Pivot, Slice/Dice, Rollup, Percentile, etc.
  - Standard SQL does not provide adequate operations
- Solution: pre-compute partial answers and reuse
  - Drawback: it may increase DW maintenance
- Emergence of warehouse management systems (WHMS e.g. ADMS, WHIPS)
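The "pre-compute and reuse" idea can be sketched as follows, assuming a small list of sales facts with region and month dimensions. The data and the `aggregate` helper are hypothetical. A roll-up and a slice are both answered from the pre-computed aggregate rather than from the base facts.

```python
from collections import defaultdict

# Illustrative base facts: (region, month, amount).
sales = [
    ("North", "2003-01", 10),
    ("North", "2003-01", 5),
    ("North", "2003-02", 20),
    ("South", "2003-01", 5),
    ("South", "2003-02", 15),
]

def aggregate(rows, key_fn):
    """Sum the last field of each row, grouped by key_fn(row)."""
    out = defaultdict(int)
    for row in rows:
        out[key_fn(row)] += row[-1]
    return dict(out)

# Pre-computed partial answer: totals per (region, month).
by_region_month = aggregate(sales, lambda r: (r[0], r[1]))

# Roll-up to region level, reusing the partial answer instead of the base facts.
partial_rows = [(region, month, total)
                for (region, month), total in by_region_month.items()]
by_region = aggregate(partial_rows, lambda r: r[0])

# Slice: fix the month dimension to January 2003.
january = {region: total
           for (region, month), total in by_region_month.items()
           if month == "2003-01"}

print(by_region_month)
print(by_region)
print(january)
```

The roll-up is correct because sums can be combined from partial sums. The price is the drawback noted on the slide: every pre-computed aggregate is one more table to keep up to date when the base facts change.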

