1
Designing the data warehouse / data mart
Methodologies and Techniques
2
Basic principles
3
Life cycle of the DW
[Diagram: operational databases -> first time load -> warehouse database, followed by repeated refreshes, then purge or archive]
4
Data transfers into a database
First time system implementation
  – From a manual system
Data warehousing projects
Database version upgrade
ERP projects
Migration
  – From old to new system
5
Data transfers between systems
Dynamic data (e.g. sales orders)
  – Interface required?
Static data (e.g. customers)
  – Conversion required?
6
What can go wrong
Data not available
  – Feature activated from implementation onwards
  – Massive data entry
  – E.g. different account structure
Data incomplete
Data inconsistent (e.g. engineering vs accounts)
Wrong level of granularity
Data not clean
New system requires changes – new product codes
7
Data cleaning must address
Different departments recording the same information under different codes
Multiple records for the same company (under different names)
Fields missing in input tables (e.g. c/o)
Different departments recording different addresses for the same customer
Use of different units for time periods
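The "multiple records of same company under different names" problem can be illustrated with a minimal Python sketch. The suffix list, the record layout and the function names are assumptions made for illustration; real cleaning rules would come from the business.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes to ignore when comparing company names.
SUFFIXES = {"ltd", "limited", "inc", "plc", "co"}

def normalise_name(name: str) -> str:
    """Lower-case, replace punctuation with spaces, drop legal-form suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def find_duplicates(records):
    """Group (code, name) records whose normalised names coincide."""
    groups = defaultdict(list)
    for code, name in records:
        groups[normalise_name(name)].append(code)
    # Keep only names that appear under more than one code.
    return {key: codes for key, codes in groups.items() if len(codes) > 1}

# Hypothetical input: the same firm recorded by two departments.
records = [
    ("A001", "Acme Ltd."),
    ("S417", "ACME Limited"),
    ("A002", "Bolton & Sons"),
]
duplicates = find_duplicates(records)
```

Candidate duplicates found this way still need a manual check before codes are merged.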
8
Labour intensive tasks
Data entry
Data checks
Working on solving conflicts
Allocating new codes
Solution = introduce as much automation as possible
  – SQL / SQL*Loader (Oracle)
  – Custom conversion programmes to extract, modify and upload data
  – Filtering
  – Parsing (e.g. Excel)
  – Staging areas for conversion in progress
9
Data utilities
Oracle ships dedicated utilities for data handling
Export: to transfer data between DBs
  – Extracts both table structure and data content into a dump file
Import: the corresponding facility
SQL*Loader: automatic import from a variety of file formats into DB tables
  – Needs a control file
10
Control files: using SQL*Loader
Data transfers in and out of the DB can be automated using the loader
  – Create a data file with the data(!)
  – Create a control file to guide the operation
The load creates two files
  – Log file
  – "Bad transactions" file
Also a discard file if the control file has selection criteria in it
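The three outcomes of a load (loaded, bad, discarded) can be mimicked in a few lines of Python. This is only an illustration of the idea, not SQL*Loader itself and not its control-file syntax; the data layout, the `parse` function and the selection criterion are assumptions.

```python
import csv
import io

def load(rows, accept, parse):
    """Split input rows into loaded, bad and discarded sets.

    accept: selection criterion (plays the role of criteria in the control file)
    parse:  converts a raw row into a record; raises ValueError on malformed data
    """
    loaded, bad, discarded = [], [], []
    for row in rows:
        try:
            record = parse(row)
        except (ValueError, IndexError):
            bad.append(row)            # would go to the "bad transactions" file
            continue
        if accept(record):
            loaded.append(record)      # would be inserted into the table
        else:
            discarded.append(row)      # would go to the discard file
    return loaded, bad, discarded

# Hypothetical data file: supplier code, name, city.
data = io.StringIO("1001,Acme,Cork\nabcd,Broken,Cork\n1002,Bolton,Dublin\n")

def parse(row):
    return {"code": int(row[0]), "name": row[1], "city": row[2]}

loaded, bad, discarded = load(
    csv.reader(data), lambda record: record["city"] == "Cork", parse
)
```

Here the second row is rejected because its code is not numeric, and the third is discarded because it fails the selection criterion.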
11
Example 1 – the supplier file
OLD: Sup code (4 digits) | Sup name | Sup address | City | Phone
New supplier code to include city where firm is based
Assignment of category based on amounts purchased
12
Example 1 – the supplier file
OLD: Sup code (4 digits) | Sup name | Sup address | City | Phone
NEW: Sup code (3 letters + 4 digits) | Sup name | Sup address | … | Phone | Cat (1, 2, 3 depending on total purchases last year)
New supplier code to include city where firm is based
Assignment of category based on amounts purchased
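A conversion programme for this example might look like the sketch below. The slide only says the new code is "3 letters + 4 digits" and that the category is 1, 2 or 3 depending on last year's purchases; taking the first three letters of the city, and the purchase thresholds used here, are assumptions for illustration.

```python
def new_supplier_code(old_code: str, city: str) -> str:
    """Prefix the old 4-digit code with three letters derived from the city.

    Assumption: the three letters are the first three letters of the city name.
    """
    if not (len(old_code) == 4 and old_code.isdigit()):
        raise ValueError(f"expected a 4-digit code, got {old_code!r}")
    letters = "".join(ch for ch in city if ch.isalpha())[:3].upper()
    if len(letters) < 3:
        raise ValueError(f"city too short to derive a prefix: {city!r}")
    return letters + old_code

def category(total_purchases_last_year: float) -> int:
    """Assign category 1, 2 or 3 (thresholds are hypothetical)."""
    if total_purchases_last_year >= 100_000:
        return 1
    if total_purchases_last_year >= 10_000:
        return 2
    return 3

converted = (new_supplier_code("0042", "Cork"), category(50_000))
```

Records that fail validation raise an error instead of being converted silently, so they can be routed to a manual check.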
13
Example 2 – New Cost Accounting Structure
Maintenance department expenditure: 1 account => separate accounts for different production activities
OLD: Intervention code | Desc. | Date | Labour | Parts | Total
14
Example 2 – New Cost Accounting Structure
Maintenance department expenditure: 1 account => separate accounts for different production activities
OLD: Intervention code | Desc. | Date | Labour | Parts | Total
NEW: Intervention code | Desc. | Date | Labour | Parts | Total | Account
15
Example 3: merging files
Complete customer file based on Accounts and Sales and Shipping
OLD (finance): CustID | name | address | city | account number | credit limit | balance
OLD (sales): CustID* | name | address | city | discount rates | sales_to_date | rep_name
OLD (shipping): CustID** | name | address | city | Preferred haulier
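A merge of the three files could be sketched as follows, assuming the customer IDs have already been reconciled across departments (the asterisks on the slide suggest they may not be). Field names and sample values are hypothetical; when two sources disagree, the first value seen is kept and the conflict is logged for review.

```python
def merge_customers(finance, sales, shipping):
    """Merge customer dicts from three sources on 'CustID'.

    Returns the merged records and a list of conflicts of the form
    (CustID, field, kept value, rejected value, source of rejected value).
    """
    merged = {}
    conflicts = []
    sources = (("finance", finance), ("sales", sales), ("shipping", shipping))
    for source_name, rows in sources:
        for row in rows:
            target = merged.setdefault(row["CustID"], {})
            for field, value in row.items():
                if field in target and target[field] != value:
                    conflicts.append(
                        (row["CustID"], field, target[field], value, source_name)
                    )
                else:
                    target[field] = value
    return merged, conflicts

# Hypothetical sample rows, one per department.
finance = [{"CustID": 1, "name": "Acme", "city": "Cork", "credit_limit": 5000}]
sales = [{"CustID": 1, "name": "Acme", "city": "Cork", "rep_name": "Jones"}]
shipping = [{"CustID": 1, "name": "Acme", "city": "Dublin",
             "preferred_haulier": "FastFreight"}]

merged, conflicts = merge_customers(finance, sales, shipping)
```

The conflict list is the input to the "working on solving conflicts" task: someone still has to decide which address is right.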
16
Example 4: change of business practices
Payment by bank draft for international customers
Automatic payment into account for national customers
=> Payment direct into account for all customers
17
Data Staging Area
The construction site for the warehouse
Required by most scenarios
Connected to wide variety of sources
Clean / aggregate / compute / validate data
[Diagram: operational system -> extract -> data staging area (transform) -> transport (load) -> warehouse]
18
Remote Staging Model
Data staging area within the warehouse environment
[Diagram: operational system (operational environment) -> extract, transform, transport -> data staging area and warehouse (both in the warehouse environment)]
Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Diagram: operational system (operational environment) -> extract, transform, transport -> data staging area (staging environment) -> warehouse (warehouse environment)]
19
Onsite Staging Model
Data staging area within the operational environment, possibly affecting the operational system
[Diagram: operational system -> extract -> data staging area (transform), both in the operational environment -> transport (load) -> warehouse (warehouse environment)]
20
Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function.
Characteristics include:
  – Unlike data warehouses, data marts do not normally contain detailed operational data.
  – May contain certain levels of aggregation.
21
Dependent Data Mart
[Diagram: operational systems, flat files and external data feed the data warehouse, which in turn feeds the data marts (Marketing, Sales, Finance, Human Resources)]
22
Independent Data Mart
[Diagram: operational systems, flat files and external data feed a Sales or Marketing data mart directly]
23
Reasons for Creating a Data Mart
To give users more flexible access to the data they need to analyse most often.
To provide data in a form that matches the specific needs of a group of users.
To improve end-user response time.
Potential users of a data mart are clearly defined and can be targeted for support.
24
Reasons for Creating a Data Mart
To provide appropriately structured data as dictated by the requirements of the end-user access tools.
Building a data mart is simpler (and much quicker) compared with establishing a corporate data warehouse.
The cost of implementing data marts is far less than that required to establish a data warehouse.
25
Exploiting the DW data
DW is a platform for creating a wide array of reports
It solves data feed problems, but does not lead to specific decision support
Need a model for organising data into meaningful reports
Need specific interfaces for users
26
Exploiting the DW data
[Diagram: source systems -> data staging area (extraction, cleaning, transformation, loading) -> data warehouse (relational database on a dedicated server; denormalised, static data) -> reporting; scrutinising (multidimensional data cubes, OLAP tools); discovering (data mining); …]
27
Multidimensional Models
The data is found at the intersection of dimensions.
[Diagram: two cubes, FINANCE and SALES, with dimension labels Product, P/L_Line, Time, Market and Customer]
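"Data at the intersection of dimensions" can be pictured as a lookup keyed by one member from each dimension. The sketch below uses a hypothetical three-dimensional sales cube (product, market, time) with made-up figures; it is not how an OLAP engine stores data, only an illustration of the addressing scheme.

```python
# Hypothetical sales cube: each cell is keyed by (product, market, time).
DIMENSIONS = ("product", "market", "time")
cube = {
    ("Widget", "North", "Q1"): 120,
    ("Widget", "South", "Q1"): 80,
    ("Gadget", "North", "Q1"): 50,
    ("Widget", "North", "Q2"): 140,
}

def cell(product, market, time):
    """Value at the intersection of one member per dimension (0 if empty)."""
    return cube.get((product, market, time), 0)

def slice_total(**fixed):
    """Sum every cell whose coordinates match the fixed dimension members."""
    total = 0
    for key, value in cube.items():
        coords = dict(zip(DIMENSIONS, key))
        if all(coords[dim] == member for dim, member in fixed.items()):
            total += value
    return total

widget_total = slice_total(product="Widget")
north_q1 = slice_total(market="North", time="Q1")
```

Fixing one member per dimension returns a single cell; fixing fewer dimensions returns a slice that is summed over the remaining ones.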
28
Representing multidimensional data
29
MOLAP Server
The application layer stores data in a multidimensional structure
The presentation layer provides the multidimensional view
Efficient storage and processing
Complexity hidden from the user (but NOT from the developer)
Analysis using preaggregated summaries and precalculated measures
[Diagram: DSS client -> MOLAP engine (application layer) -> warehouse]
30
ROLAP Server
The warehouse stores atomic data.
The application layer generates SQL for the three-dimensional view.
The presentation layer provides the multidimensional view.
[Diagram: DSS client -> ROLAP engine (application layer) -> multiple SQL -> warehouse server]
31
MOLAP Server
[Diagram: the user sends queries to and receives data from the MDDB, which is filled from the warehouse by a periodic load]
32
ROLAP Server
[Diagram: the user sends queries to and receives data from the ROLAP server, which answers from a data cache or by a live fetch from the warehouse]
Also Hybrid (HOLAP)
33
Choosing a Reporting Architecture
Business needs
Potential for growth
Interface
Enterprise architecture
Network architecture
Speed of access
Openness
[Chart: query performance (OK to Good) against analysis (Simple to Complex), positioning MOLAP and ROLAP]
34
Modeling
Warehouses differ from operational structures:
  – Analytical requirements
  – Subject orientation
Data must map to subject oriented information:
  – Identify business subjects
  – Define relationships between subjects
  – Name the attributes of each subject
Modeling is iterative
Modeling tools are available
35
Modeling the Data Warehouse
1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model
[Diagram: select a business process; steps 1, 2, 3 and 4 lead to the physical model]
36
Identifying Business Rules
Product
  Type | Monitor | Status
  PC | 15 inch | New
  Server | 17 inch | Rebuilt
   | 19 inch | Custom
   | None |
Location: geographic proximity
  – 0 - 1 miles
  – 1 - 5 miles
  – > 5 miles
Store: Store > District > Region
Time: Month > Quarter > Year
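A hierarchy rule such as Month > Quarter > Year defines how detailed values roll up to coarser levels. The sketch below assumes calendar quarters and made-up monthly figures; a business with a different fiscal calendar would need a different `quarter_of`.

```python
def quarter_of(month: int) -> int:
    """Calendar quarter (1-4) for a month number (1-12); an assumption."""
    return (month - 1) // 3 + 1

def roll_up(monthly, level):
    """Aggregate {(year, month): value} to the 'quarter' or 'year' level."""
    if level not in ("quarter", "year"):
        raise ValueError(f"unknown level: {level!r}")
    totals = {}
    for (year, month), value in monthly.items():
        key = (year, quarter_of(month)) if level == "quarter" else year
        totals[key] = totals.get(key, 0) + value
    return totals

# Hypothetical monthly figures.
monthly = {(1999, 1): 10, (1999, 2): 20, (1999, 4): 5, (2000, 1): 7}
by_quarter = roll_up(monthly, "quarter")
by_year = roll_up(monthly, "year")
```

The same pattern applies to Store > District > Region, with a lookup table in place of `quarter_of`.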
37
Creating the Dimensional Model
Identify fact tables
  – Translate business measures into fact tables
  – Analyze source system information for additional measures
  – Identify base and derived measures
  – Document additivity of measures
Identify dimension tables
Link fact tables to the dimension tables
Create views for users
38
Dimension Tables
Dimension tables have the following characteristics:
  – Contain textual information that represents the attributes of the business
  – Contain relatively static data
  – Are joined to a fact table through a foreign key reference
[Diagram: a central Facts table (units, price) linked to the Product, Channel, Customer and Time dimensions]
39
Fact Tables
Fact tables have the following characteristics:
  – Contain numeric measures (metrics) of the business
  – May contain summarized (aggregated) data
  – May contain date-stamped data
  – Are typically additive
  – Have a key value that is typically a concatenated key composed of the primary keys of the dimensions
  – Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
40
Dimensional Model (Star Schema)
[Diagram: the fact table Facts (units, price) at the centre, surrounded by the dimension tables Product, Channel, Customer and Time]
41
Star Schema Model
Central fact table
Radiating dimensions
Denormalized model
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Product Table: Product_id, Product_desc, …
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
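A cut-down version of this star schema can be tried out with Python's built-in `sqlite3` module. The sketch keeps three of the four dimensions, renames the tables with a `_dim` suffix, and uses made-up rows; those choices are assumptions for illustration, not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

-- Fact table: numeric measures plus a concatenated key of dimension keys.
CREATE TABLE sales_fact (
    product_id    INTEGER REFERENCES product_dim(product_id),
    store_id      INTEGER REFERENCES store_dim(store_id),
    day_id        INTEGER REFERENCES time_dim(day_id),
    sales_dollars REAL,
    sales_units   INTEGER,
    PRIMARY KEY (product_id, store_id, day_id)
);

INSERT INTO store_dim   VALUES (1, 10), (2, 10);
INSERT INTO product_dim VALUES (100, 'PC'), (200, 'Server');
INSERT INTO time_dim    VALUES (1, 1, 1999), (2, 1, 1999), (3, 2, 1999);
INSERT INTO sales_fact  VALUES
    (100, 1, 1, 1500.0, 1),
    (100, 2, 2, 3000.0, 2),
    (200, 1, 3, 9000.0, 1);
""")

# A typical star query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT p.product_desc, t.month_id,
           SUM(f.sales_dollars), SUM(f.sales_units)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim t    ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.month_id
    ORDER BY p.product_desc, t.month_id
""").fetchall()
```

Every query follows the same shape (fact table joined to a few dimensions), which is part of why star schemas are easy for users and front-end tools to work with.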
42
Star Schema Model
Easy for users to understand
Fast response to simple queries
Simple metadata
Supported by many front end tools
Less robust to change
Does not support history
43
Using Summary Data
Phase 3: Modeling summaries
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
Is distilled from source systems and precalculated summaries
Usually exists in summary fact tables
44
Designing Summary Tables
[Table: columns Units, Sales (€), Store; rows Product A Total, Product B Total, Product C Total; summary measures Average, Maximum, Total, Percentage]
45
Summary Tables Example

SALES FACTS
Sales | Region | Month
10,000 | North | Jan 99
12,000 | South | Feb 99
11,000 | North | Jan 99
15,000 | West | Mar 99
18,000 | South | Feb 99
20,000 | North | Jan 99
10,000 | East | Jan 99
2,000 | West | Mar 99

SALES BY MONTH/REGION
Month | Region | Tot_Sales$
Jan 99 | North | 41,000
Jan 99 | East | 10,000
Feb 99 | South | 30,000
Mar 99 | West | 17,000

SALES BY MONTH
Month | Tot_Sales
Jan 99 | 51,000
Feb 99 | 30,000
Mar 99 | 17,000
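The two summary tables can be derived from the detail rows with a simple group-and-sum, sketched here in Python using the eight SALES FACTS rows from the slide. The function name and tuple layout are assumptions. Summing the listed detail rows gives 30,000 for South in Feb 99.

```python
from collections import defaultdict

# The eight detail rows from the slide: (sales, region, month).
sales_facts = [
    (10_000, "North", "Jan 99"),
    (12_000, "South", "Feb 99"),
    (11_000, "North", "Jan 99"),
    (15_000, "West", "Mar 99"),
    (18_000, "South", "Feb 99"),
    (20_000, "North", "Jan 99"),
    (10_000, "East", "Jan 99"),
    (2_000, "West", "Mar 99"),
]

def summarise(facts, key):
    """Sum sales per group, where key(region, month) names the group."""
    totals = defaultdict(int)
    for sales, region, month in facts:
        totals[key(region, month)] += sales
    return dict(totals)

# Summary at month/region level, then at the coarser month level.
by_month_region = summarise(sales_facts, lambda region, month: (month, region))
by_month = summarise(sales_facts, lambda region, month: month)
```

In a warehouse the same result would normally be stored in summary fact tables and refreshed at load time, so that queries read a handful of precomputed rows instead of scanning the detail.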