1
Designing the Data Warehouse / Data Marts: Methodologies and Techniques
2
Basic principles
3
Life Cycle of the DW
(Diagram: operational databases feed the warehouse database through a first-time load, followed by repeated refreshes; old warehouse data is eventually purged or archived.)
4
Oracle Warehouse Components
(Diagram labels: Any Source, Any Data, Any Access; operational data; external data; relational / multidimensional; text, image; spatial; audio, video; Web; Oracle Media; relational tools; OLAP tools; applications / Web.)
5
Oracle Intelligence Tools
– Oracle Reports: IS develops users' views; current view
– Oracle Discoverer: business users; tactical view
– Oracle Express: analysts; strategic view
6
Oracle Data Mart Suite
Warehousing engines:
– Data modeling: Oracle Data Mart Designer
– Data management: Oracle Enterprise Manager
– Data extraction: Oracle Data Mart Builder
– Data access & analysis: Discoverer & Oracle Reports
(Diagram: OLTP engines and OLTP databases feed the data mart database, Oracle8, which is also accessible through SQL*Plus.)
7
“Big Bang” Approach: Advantages and Disadvantages
Advantages:
– The warehouse is built as part of a major project (e.g., business process reengineering, BPR)
– A “big picture” of the data warehouse exists before the data warehousing project starts
Disadvantages:
– Involves high risk and takes longer
– Runs the risk of requirements changing during the project
– Costly, and harder to win support for from users
8
Incremental Approach to Warehouse Development
– Multiple iterations
– Shorter implementations
– Validation of each phase
Phases: Strategy, Definition, Analysis, Design, Build, Production
9
Benefits of an Incremental Approach
– Delivers a strategic data warehouse solution through incremental development efforts
– Provides an extensible, scalable architecture
– Quickly provides business benefits and ensures a much earlier return on investment
– Allows a data warehouse to be built one subject or application area at a time
– Allows the construction of an integrated data mart environment
10
Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function. Characteristics include:
– Unlike data warehouses, data marts do not normally contain detailed operational data.
– May contain certain levels of aggregation.
11
Dependent Data Mart
(Diagram: operational systems, flat files, and external data feed the data warehouse, which in turn feeds the Marketing, Sales, Finance, and Human Resources data marts.)
12
Independent Data Mart
(Diagram: operational systems, flat files, and external data feed a Sales or Marketing data mart directly, without a central data warehouse.)
13
Reasons for Creating a Data Mart
– To give users more flexible access to the data they need to analyse most often
– To provide data in a form that matches the collective view of a group of users
– To improve end-user response time
– Potential users of a data mart are clearly defined and can be targeted for support
14
Reasons for Creating a Data Mart (continued)
– To provide appropriately structured data as dictated by the requirements of the end-user access tools
– Building a data mart is simpler than establishing a corporate data warehouse
– The cost of implementing data marts is far less than that of establishing a data warehouse
15
Data Mart Issues
– Data mart functionality
– Data mart size
– Data mart load performance
– User access to data in multiple data marts
– Data mart Internet / intranet access
– Data mart administration
– Data mart installation
16
Example of a DW Tool: OLAP
– Rotate and drill down to successive levels of detail
– Create and examine calculated data interactively on large volumes of data
– Determine comparative or relative differences
– Perform exception and trend analysis
– Perform advanced analytical functions, such as forecasting, modeling, and regression analysis
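The rotate and drill-down operations above can be sketched in a few lines of Python. This is a toy illustration, not how any OLAP product works internally: the sales rows, the region/quarter/month levels, and the helper names `aggregate` and `pivot` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, quarter, month, units). Illustrative data only.
SALES = [
    ("North", "Q1", "Jan", 120),
    ("North", "Q1", "Feb", 90),
    ("North", "Q2", "Apr", 150),
    ("South", "Q1", "Jan", 60),
    ("South", "Q2", "May", 80),
]

# Position of each dimension level within a row.
LEVELS = {"region": 0, "quarter": 1, "month": 2}


def aggregate(rows, *levels):
    """Sum units grouped by the given levels; fewer levels means a higher summary."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[LEVELS[level]] for level in levels)
        totals[key] += row[3]
    return dict(totals)


def pivot(cells):
    """Rotate a two-dimensional result by swapping the row and column dimensions."""
    return {(col, row): value for (row, col), value in cells.items()}


by_region = aggregate(SALES, "region")                     # summary level
by_region_quarter = aggregate(SALES, "region", "quarter")  # drill down one level
rotated = pivot(by_region_quarter)                         # quarters now lead
```

Drilling down is simply regrouping by one more level of the hierarchy; rotating changes which dimension is read first, not the underlying numbers.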
17
Original OLAP Rules
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
18
Original OLAP Rules
6. Multiuser support
7. Unrestricted cross-dimensional operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and aggregation levels
19
Relational Database Model
Emp No. | Name | Age | Gender
1001 | Anderson | 31 | F
1007 | Green | 42 | M
1010 | Lee | 22 | M
1020 | Ramos | 32 | F
Each column is an attribute (Attribute 1 Name, Attribute 2 Age, Attribute 3 Gender, Attribute 4 Emp No.) and each employee occupies one row (Row 1 to Row 4). The table above illustrates the employee relation.
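As a minimal sketch, the employee relation can be held in an in-memory SQLite table. The row values follow the table as reconstructed from the slide (the gender column in particular was garbled in the source), and the table and column names are assumptions.

```python
import sqlite3

# The employee relation: each row is a tuple, each column an attribute.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1001, "Anderson", 31, "F"),
        (1007, "Green", 42, "M"),
        (1010, "Lee", 22, "M"),
        (1020, "Ramos", 32, "F"),
    ],
)

# Rows are retrieved by conditions on attribute values, not by their position.
over_30 = conn.execute(
    "SELECT name FROM employee WHERE age > 30 ORDER BY emp_no"
).fetchall()
```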
20
Multidimensional Database Model
The data is found at the intersection of dimensions.
(Diagram: a FINANCE cube with the dimensions Store, GL_Line, and Time; a SALES cube with the dimensions Store, Product, Time, and Customer.)
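To contrast with the relational rows on the previous slide, here is a toy Python sketch of a SALES cube in which each value is addressed by its coordinates on the Store, Product, and Time dimensions. The member names and numbers are hypothetical, and a dictionary only stands in for a real multidimensional store.

```python
# Hypothetical SALES cube: each measure sits at the intersection of Store, Product, and Time.
sales_cube = {
    ("Store1", "Widget", "Jan"): 100,
    ("Store1", "Gadget", "Jan"): 40,
    ("Store2", "Widget", "Jan"): 70,
    ("Store2", "Widget", "Feb"): 90,
}


def cell(store, product, month):
    """Look up one cell; empty intersections count as 0 because real cubes are sparse."""
    return sales_cube.get((store, product, month), 0)


def slice_time(month):
    """Fix the Time dimension to one member, leaving a two-dimensional Store x Product plane."""
    return {(s, p): v for (s, p, t), v in sales_cube.items() if t == month}


january = slice_time("Jan")
```

Slicing a three-dimensional cube on one member yields a two-dimensional view, which is the relationship between the next two slides.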
21
Two dimensions
22
Three dimensions
23
Specialised Multidimensional Tool
Benefits:
– Quick access to very large volumes of data
– Extensive and comprehensive libraries of complex analysis functions
– Strong modeling and forecasting capabilities
– Can access both multidimensional and relational database structures
– Caters for calculated fields
Disadvantages:
– Difficulty of changing the model
– Lack of support for very large volumes of data
– May require significant processing power
24
MOLAP Server
– The application layer stores data in a multidimensional structure.
– The presentation layer provides the multidimensional view.
(Diagram: warehouse → application layer (MOLAP engine) → DSS client.)
– Efficient storage and processing
– Complexity hidden from the user
– Analysis using preaggregated summaries and precalculated measures
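The idea of analysis over preaggregated summaries can be sketched by precomputing totals for every combination of dimensions, so that each later query becomes a lookup. This is a simplified illustration under stated assumptions: the facts, the `"*"` marker for an aggregated-away dimension, and the names `precompute` and `lookup` are hypothetical, and real MOLAP engines store and compress cubes very differently.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("store", "product", "month")
# Hypothetical atomic facts: (store, product, month, units).
FACTS = [("S1", "P1", "Jan", 10), ("S1", "P2", "Jan", 5), ("S2", "P1", "Feb", 7)]


def precompute(facts):
    """Total the measure for every subset of dimensions; '*' marks a summed-out dimension."""
    n = len(DIMENSIONS)
    summaries = defaultdict(int)
    for r in range(n + 1):
        for kept in combinations(range(n), r):
            for fact in facts:
                key = tuple(fact[i] if i in kept else "*" for i in range(n))
                summaries[key] += fact[n]  # fact[n] is the units measure
    return dict(summaries)


CUBE = precompute(FACTS)


def lookup(store="*", product="*", month="*"):
    """Answer a query from the precalculated cube instead of scanning facts."""
    return CUBE.get((store, product, month), 0)
```

The trade-off is visible even here: three facts produce 18 stored summaries, which is why MOLAP loads cost more space and time in exchange for fast queries.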
25
ROLAP Server
– The warehouse stores atomic data.
– The application layer generates SQL for the three-dimensional view.
– The presentation layer provides the multidimensional view.
(Diagram: warehouse server → multiple SQL statements → application layer (ROLAP engine) → DSS client.)
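By contrast, a ROLAP engine keeps only atomic data and translates each multidimensional request into SQL. The sketch below shows that translation in miniature against SQLite; the `sales` table, its columns, and the `rolap_query` helper are hypothetical, and a real engine would also generate the joins to dimension tables.

```python
import sqlite3

# Hypothetical atomic sales table held in the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("S1", "P1", "Jan", 10), ("S1", "P2", "Jan", 5), ("S2", "P1", "Feb", 7)],
)

ALLOWED = {"store", "product", "month"}  # whitelist, since column names go into the SQL text


def rolap_query(dimensions):
    """Translate a request for some dimensions into one GROUP BY over the atomic data."""
    unknown = set(dimensions) - ALLOWED
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not dimensions:
        sql = "SELECT SUM(units) FROM sales"
    else:
        cols = ", ".join(dimensions)
        sql = f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return sql, conn.execute(sql).fetchall()


sql, by_store = rolap_query(["store"])
```

Nothing is precalculated here, so storage stays small, but every query pays for the aggregation at run time.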
26
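MOLAP (Express)
(Diagram: data is loaded periodically from the warehouse into a multidimensional database (MDDB) managed by Express Server; the Express user's queries are answered from the MDDB.)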
27
ROLAP (Express)
(Diagram: Express Server answers the Express user's queries by fetching data live from the warehouse and holding it in a data cache.)
Also: Hybrid OLAP (HOLAP)
28
Choosing a Reporting Architecture
– Business needs
– Potential for growth
– Interface
– Enterprise architecture
– Network architecture
– Speed of access
– Openness
(Chart comparing MOLAP and ROLAP on analysis, from simple to complex, and on query performance, from OK to good.)
29
Data Acquisition
– Identify, extract, transform, and transport source data
– Consider internal and external data
– Perform gap analysis between source data and target database objects
– Plan the movement of data between sources and target
– Define the first-time load and refresh strategy
– Define tool requirements
– Build, test, and execute data acquisition modules
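The extract, transform, and transport steps, together with a first-time load and refresh strategy, can be sketched with two in-memory SQLite databases. The schema, the cents-to-currency transformation, and the choice of a high-water mark on `order_id` as the refresh strategy are all assumptions for illustration; production refreshes often rely on timestamps or change data capture instead.

```python
import sqlite3

# Hypothetical operational source and warehouse target, both in memory.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount_cents INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "north", 1250), (2, "south", 800)])

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_fact (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")


def high_water_mark():
    """Largest key already in the warehouse (0 before the first-time load)."""
    return dw.execute("SELECT COALESCE(MAX(order_id), 0) FROM sales_fact").fetchone()[0]


def extract(since_id):
    """Extract only source rows newer than the high-water mark."""
    return src.execute(
        "SELECT order_id, region, amount_cents FROM orders WHERE order_id > ? ORDER BY order_id",
        (since_id,),
    ).fetchall()


def transform(rows):
    """Standardize region names and convert cents to currency units."""
    return [(oid, region.title(), cents / 100) for oid, region, cents in rows]


def load(rows):
    """Transport rows into the warehouse in one transaction; return how many were loaded."""
    with dw:
        dw.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    return len(rows)


def run_load():
    """Acts as the first-time load when the target is empty, otherwise as a refresh."""
    return load(transform(extract(high_water_mark())))


first = run_load()                                        # first-time load: both orders
src.execute("INSERT INTO orders VALUES (3, 'west', 1500)")
refresh = run_load()                                      # refresh: only the new order
```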
30
Modeling
Warehouses differ from operational structures:
– Analytical requirements
– Subject orientation
Data must map to subject-oriented information:
– Identify business subjects
– Define relationships between subjects
– Name the attributes of each subject
Modeling is iterative.
Modeling tools are available.
31
Modeling the Data Warehouse
Select a business process, then:
1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model
32
Identifying Business Rules
Product:
– Type: PC, Server
– Monitor: 15 inch, 17 inch, 19 inch, None
– Status: New, Rebuilt, Custom
Location:
– Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles
Store:
– Store > District > Region
Time:
– Month > Quarter > Year
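Business rules like these usually end up as lookup or classification logic in the load process. The sketch below encodes the Store > District > Region and Month > Quarter hierarchies and the proximity bands. The store and district names are hypothetical, and because the slide's bands share their end points, assigning a distance of exactly 1 or 5 miles to the lower band is an assumption.

```python
# Hypothetical store hierarchy lookups (names invented for illustration).
STORE_TO_DISTRICT = {"Store 12": "District 3", "Store 40": "District 7"}
DISTRICT_TO_REGION = {"District 3": "East", "District 7": "West"}


def rollup_store(store):
    """Walk Store > District > Region and return (district, region)."""
    district = STORE_TO_DISTRICT[store]
    return district, DISTRICT_TO_REGION[district]


def month_to_quarter(month):
    """Roll a month number (1-12) up to its quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (month - 1) // 3 + 1


def proximity_band(miles):
    """Assign a distance to a band; boundary values (1, 5) go to the lower band (assumption)."""
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= 1:
        return "0 - 1 miles"
    if miles <= 5:
        return "1 - 5 miles"
    return "> 5 miles"
```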
33
Creating the Dimensional Model
– Identify fact tables:
  – Translate business measures into fact tables
  – Analyze source system information for additional measures
  – Identify base and derived measures
  – Document additivity of measures
– Identify dimension tables
– Link fact tables to the dimension tables
– Create views for users
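Base versus derived measures and their additivity can be made concrete with a small example. The fact rows and the additivity labels below are hypothetical; the point is that `units` and the derived `sales_dollars` can be summed across any dimension, while `unit_price` cannot and must instead be recomputed (here as a weighted average).

```python
# Hypothetical fact rows with base measures units and unit_price, per (store, day).
FACTS = [
    {"store": "S1", "day": 1, "units": 10, "unit_price": 2.0},
    {"store": "S1", "day": 2, "units": 30, "unit_price": 3.0},
]

# Documented additivity of each measure (assumed classification for this example).
ADDITIVITY = {"units": "additive", "sales_dollars": "additive", "unit_price": "non-additive"}


def with_derived(fact):
    """Add the derived measure sales_dollars = units * unit_price."""
    return {**fact, "sales_dollars": fact["units"] * fact["unit_price"]}


rows = [with_derived(f) for f in FACTS]
total_units = sum(r["units"] for r in rows)          # additive: summing is meaningful
total_dollars = sum(r["sales_dollars"] for r in rows)  # additive
# unit_price is non-additive: its sum (5.0) means nothing, so derive a weighted average.
avg_price = total_dollars / total_units
```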
34
Dimension Tables
Dimension tables have the following characteristics:
– Contain textual information that represents the attributes of the business
– Contain relatively static data
– Are joined to a fact table through a foreign key reference
(Diagram: a Facts table (units, price) joined to the Product, Channel, Customer, and Time dimension tables.)
35
Fact Tables
Fact tables have the following characteristics:
– Contain numeric measures (metrics) of the business
– May contain summarized (aggregated) data
– May contain date-stamped data
– Are typically additive
– Have a key that is typically a concatenated (composite) key made up of the primary keys of the dimensions
– Are joined to dimension tables through foreign keys that reference the primary keys in the dimension tables
36
Dimensional Model (Star Schema)
(Diagram: the fact table, Facts (units, price), at the center, surrounded by the dimension tables Product, Channel, Customer, and Time.)
37
Star Schema Model
– Central fact table
– Radiating dimensions
– Denormalized model
Example tables:
– Store Table: Store_id, District_id, ...
– Item Table: Item_id, Item_desc, ...
– Time Table: Day_id, Month_id, Period_id, Year_id
– Product Table: Product_id, Product_desc, ...
– Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
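A minimal SQLite sketch of the example star schema follows. Column names come from the slide, while the `_dim` table names, data types, and sample rows are assumptions. The fact table's primary key concatenates the dimension keys, and a typical query needs just one join per dimension it uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE item_dim    (item_id INTEGER PRIMARY KEY, item_desc TEXT);
CREATE TABLE time_dim    (day_id INTEGER PRIMARY KEY, month_id INTEGER,
                          period_id INTEGER, year_id INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
-- The fact key concatenates the primary keys of the dimensions.
CREATE TABLE sales_fact (
    product_id    INTEGER REFERENCES product_dim,
    store_id      INTEGER REFERENCES store_dim,
    item_id       INTEGER REFERENCES item_dim,
    day_id        INTEGER REFERENCES time_dim,
    sales_dollars REAL,
    sales_units   INTEGER,
    PRIMARY KEY (product_id, store_id, item_id, day_id)
);
INSERT INTO store_dim   VALUES (1, 10), (2, 20);
INSERT INTO item_dim    VALUES (100, 'Mouse');
INSERT INTO time_dim    VALUES (1, 1, 1, 1999), (32, 2, 1, 1999);
INSERT INTO product_dim VALUES (7, 'Peripherals');
INSERT INTO sales_fact  VALUES (7, 1, 100, 1, 50.0, 5), (7, 2, 100, 1, 20.0, 2),
                               (7, 1, 100, 32, 30.0, 3);
""")

# A typical star query: join the fact table directly to each dimension used, then aggregate.
rows = conn.execute("""
    SELECT t.year_id, s.district_id, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN time_dim t  ON f.day_id = t.day_id
    JOIN store_dim s ON f.store_id = s.store_id
    GROUP BY t.year_id, s.district_id
    ORDER BY s.district_id
""").fetchall()
```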
38
Star Schema Model
– Easy for users to understand
– Fast response to queries
– Simple metadata
– Supported by many front-end tools
– Less robust to change
– Slower to build
– Does not support history
39
Snowflake Schema Model
Example tables:
– Time Table: Week_id, Period_id, Year_id
– Dept Table: Dept_id, Dept_desc, Mgr_id
– Mgr Table: Dept_id, Mgr_id, Mgr_name
– Product Table: Product_id, Product_desc
– Item Table: Item_id, Item_desc, Dept_id
– Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
– Store Table: Store_id, Store_desc, District_id
– District Table: District_id, District_desc
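The snowflake tables normalize the dimensions into chains (Item → Dept → Mgr, Store → District). The following SQLite sketch shows the practical consequence: reaching a manager name or a district description takes extra joins compared with a star schema. Column names follow the slide; table names, types, and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE district_dim (district_id INTEGER PRIMARY KEY, district_desc TEXT);
CREATE TABLE store_dim    (store_id INTEGER PRIMARY KEY, store_desc TEXT,
                           district_id INTEGER REFERENCES district_dim);
CREATE TABLE mgr_dim      (mgr_id INTEGER PRIMARY KEY, dept_id INTEGER, mgr_name TEXT);
CREATE TABLE dept_dim     (dept_id INTEGER PRIMARY KEY, dept_desc TEXT,
                           mgr_id INTEGER REFERENCES mgr_dim);
CREATE TABLE item_dim     (item_id INTEGER PRIMARY KEY, item_desc TEXT,
                           dept_id INTEGER REFERENCES dept_dim);
CREATE TABLE sales_fact   (item_id INTEGER REFERENCES item_dim,
                           store_id INTEGER REFERENCES store_dim,
                           sales_dollars REAL, sales_units INTEGER);
INSERT INTO district_dim VALUES (1, 'Downtown'), (2, 'Suburbs');
INSERT INTO store_dim    VALUES (10, 'Main St', 1), (20, 'Mall', 2);
INSERT INTO mgr_dim      VALUES (5, 300, 'Kim');
INSERT INTO dept_dim     VALUES (300, 'Electronics', 5);
INSERT INTO item_dim     VALUES (1000, 'Radio', 300), (1001, 'TV', 300);
INSERT INTO sales_fact   VALUES (1000, 10, 40.0, 2), (1001, 10, 300.0, 1), (1001, 20, 600.0, 2);
""")

# Store -> District: one extra join compared with a denormalized store table.
by_district = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN store_dim s    ON f.store_id = s.store_id
    JOIN district_dim d ON s.district_id = d.district_id
    GROUP BY d.district_desc ORDER BY d.district_desc
""").fetchall()

# Item -> Dept -> Mgr: two extra joins to reach the manager's name.
by_manager = conn.execute("""
    SELECT m.mgr_name, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN item_dim i  ON f.item_id = i.item_id
    JOIN dept_dim dp ON i.dept_id = dp.dept_id
    JOIN mgr_dim m   ON dp.mgr_id = m.mgr_id
    GROUP BY m.mgr_name
""").fetchall()
```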
40
Snowflake Schema Model
– Direct use by some tools
– More flexible to change
– Provides for speedier data loading
– May become large and unmanageable
– Degrades query performance
– More complex metadata
41
Using Summary Data
Phase 3: Modeling summaries
– Provides fast access to precomputed data
– Reduces use of I/O, CPU, and memory
– Is distilled from source systems and precalculated summaries
– Usually exists in summary fact tables
42
Designing Summary Tables
(Example layout: units and sales (€) by store for Product A, Product B, and Product C, each with a product total, plus summary rows for average, maximum, total, and percentage.)
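The kinds of summary values named in that layout (total, average, maximum, percentage) can be computed as follows. The unit figures, store names, and the `product_summary` helper are hypothetical. Percentage is taken as each product's share of all units, which is one plausible reading of the slide rather than a confirmed definition.

```python
# Hypothetical units sold per (product, store).
UNITS = {
    "Product A": {"Store 1": 30, "Store 2": 10},
    "Product B": {"Store 1": 20, "Store 2": 20},
    "Product C": {"Store 1": 15, "Store 2": 5},
}


def product_summary(units):
    """Per product: total, average per store, maximum store value, and share of all units."""
    grand_total = sum(sum(stores.values()) for stores in units.values())
    summary = {}
    for product, stores in units.items():
        total = sum(stores.values())
        summary[product] = {
            "total": total,
            "average": total / len(stores),
            "maximum": max(stores.values()),
            "percentage": 100 * total / grand_total,
        }
    return summary


summary = product_summary(UNITS)
```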
43
Summary Tables Example
SALES FACTS
Sales | Region | Month
10,000 | North | Jan 99
12,000 | South | Feb 99
11,000 | North | Jan 99
15,000 | West | Mar 99
18,000 | South | Feb 99
20,000 | North | Jan 99
10,000 | East | Jan 99
2,000 | West | Mar 99
SALES BY MONTH/REGION
Month | Region | Tot_Sales $
Jan 99 | North | 41,000
Jan 99 | East | 10,000
Feb 99 | South | 30,000
Mar 99 | West | 17,000
SALES BY MONTH
Month | Tot_Sales
Jan 99 | 51,000
Feb 99 | 30,000
Mar 99 | 17,000
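The two summary tables can be materialized from the sales facts with `CREATE TABLE ... AS SELECT` in SQLite. In this sketch the coarser SALES BY MONTH table is rolled up from the finer month/region summary rather than from the base facts, illustrating a summary distilled from precalculated summaries. Table and column names are assumptions; the rows are the slide's facts. Computed from these facts, February totals 30,000 (12,000 + 18,000).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_facts (sales INTEGER, region TEXT, month TEXT)")
conn.executemany(
    "INSERT INTO sales_facts VALUES (?, ?, ?)",
    [
        (10000, "North", "Jan 99"), (12000, "South", "Feb 99"),
        (11000, "North", "Jan 99"), (15000, "West", "Mar 99"),
        (18000, "South", "Feb 99"), (20000, "North", "Jan 99"),
        (10000, "East", "Jan 99"), (2000, "West", "Mar 99"),
    ],
)

# Materialize the summaries once; later queries read them instead of the base facts.
conn.executescript("""
CREATE TABLE sales_by_month_region AS
    SELECT month, region, SUM(sales) AS tot_sales
    FROM sales_facts GROUP BY month, region;
-- Roll up from the finer summary instead of rescanning the facts.
CREATE TABLE sales_by_month AS
    SELECT month, SUM(tot_sales) AS tot_sales
    FROM sales_by_month_region GROUP BY month;
""")

by_month = dict(conn.execute("SELECT month, tot_sales FROM sales_by_month"))
```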
44
Summary Management in Oracle8i
(Diagram labels: Product, Region, and Time dimensions; Sales; City Sales summary; State summary; summary usage; summary advisor; space requirements; summary recommendations.)
45
The Time Dimension
How and where should it be stored?
(Diagram: a Time dimension joined to the Sales fact table.)
Time is critical to the data warehouse. A consistent representation of time is required for extensibility.
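One common answer to where time should be stored is a pregenerated time dimension table with one row per day, so every fact references the same consistent representation of time. The sketch below generates such rows. The column set and the YYYYMMDD integer key are assumed conventions, not something the slides prescribe.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """One row per day, with a consistent value for each level of the time hierarchy."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),  # assumed key convention, e.g. 19990101
            "date": day.isoformat(),
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "iso_weekday": day.isoweekday(),        # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows


TIME_DIM = build_time_dimension(date(1999, 1, 1), date(1999, 12, 31))
```

Generating the table once, rather than deriving dates ad hoc in each query, keeps month, quarter, and year boundaries identical for every fact table that joins to it.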