Data Warehouse Systems


1 Data Warehouse Systems
Dimensional Model
Gabriel David

2 Building a data warehouse
- Build the whole DW at once
  - A huge task, requiring knowledge of
    - all the legacy systems
    - the meaning of all columns
    - all the management goals
- Build one fraction at a time, independently
  - Easier, but leads to isolated data marts
- Dimensional Bus Architecture
  - Step-by-step method
  - Global initial design
  - Implementation of one connectable data mart at a time

3 Data Mart
- No longer: a highly aggregated subset of a DW that is too large to be queried
- But now: a natural (area, subject, process) and complete (atomic data) subset of the global DW
- Must not be isolated
  - Non-connectable data marts are the curse of the DW
  - Worse than losing the opportunity for a deep analysis of the organization
  - Perpetuates incompatible views of the organization

4 Method
- Short initial phase
  - Global planning of the overall architecture
  - Conformed dimensions
  - Normalized facts
- Supervision of data mart building
  - Only conformed dimensions and normalized facts are used
  - Extracting data from operational sources
  - Transforming data
  - Loading the data mart
- Result
  - A puzzle which will become an integrated DW
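The global planning step can be recorded as a matrix of data marts against the conformed dimensions each one uses (Kimball calls this a bus matrix; the slides do not use that name). The sketch below is illustrative only: the mart names and dimension assignments are hypothetical. It shows how such a matrix tells which dimensions two marts share, and therefore along which dimensions their facts can be compared.

```python
# Hypothetical bus matrix for the global initial design: each data mart
# (row) lists the conformed dimensions (columns) it plugs into.
BUS_MATRIX: dict[str, set[str]] = {
    "orders": {"time", "client", "product"},
    "shipments": {"time", "client", "product", "location"},
    "payments": {"time", "client"},
}


def shared_dimensions(mart_a: str, mart_b: str) -> set[str]:
    """Conformed dimensions two marts have in common.

    Facts from the two marts can only be compared consistently along
    these shared dimensions."""
    return BUS_MATRIX[mart_a] & BUS_MATRIX[mart_b]


def marts_using(dimension: str) -> list[str]:
    """Data marts that must agree on the definition of `dimension`."""
    return sorted(m for m, dims in BUS_MATRIX.items() if dimension in dims)


if __name__ == "__main__":
    print(sorted(shared_dimensions("orders", "payments")))  # ['client', 'time']
    print(marts_using("product"))  # ['orders', 'shipments']
```

Reading the matrix column-wise (`marts_using`) lists the marts that must be built against the same definition of a dimension, which is what keeps each new mart connectable.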

5 Conformed dimension
- Means the same in all fact tables, across data marts
  - Ex.: client, product, location, time
- Well-defined user key
- Managed data (cleaned, consolidated)
- Consistent interfaces and contents
- Consistent interpretation of attributes and aggregations
- Anonymous DW key
  - Different from the production key
  - Avoids key collisions
  - Allows the creation of new records
- Establish a dictionary of conformed dimensions
  - Approved by the manager and by the information manager
  - Process reengineering is a possibility
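A minimal sketch of the anonymous DW key, assuming keys are looked up by the pair (source system, production key). The class and method names are hypothetical. It shows the three properties listed above: the DW key differs from the production key, the same production key coming from two systems does not collide, and new keys can be created for records with no production counterpart.

```python
from itertools import count


class SurrogateKeyRegistry:
    """Hands out anonymous DW keys for (source system, production key) pairs.

    Sketch only: a real DW would persist this mapping in a lookup table."""

    def __init__(self) -> None:
        self._counter = count(1)  # surrogate keys 1, 2, 3, ...
        self._keys: dict[tuple[str, str], int] = {}

    def key_for(self, source: str, production_key: str) -> int:
        """Return the surrogate key already assigned, or allocate a new one."""
        natural = (source, production_key)
        if natural not in self._keys:
            self._keys[natural] = next(self._counter)
        return self._keys[natural]

    def new_key(self) -> int:
        """Allocate a key with no production counterpart (e.g. a new record)."""
        return next(self._counter)


if __name__ == "__main__":
    reg = SurrogateKeyRegistry()
    # The same production key in two source systems does not collide.
    print(reg.key_for("billing", "C001"), reg.key_for("crm", "C001"))  # 1 2
    print(reg.key_for("billing", "C001"))  # 1 (stable on later loads)
```

Consolidating two source records that describe the same client into one conformed row would require a matching step before the lookup; that cleaning work is outside this sketch.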

6 Ex1: Accounting operational system
- Category (category_number, category_desc)
- Department (acronym, name, budget)
- Cost_center (number, description, owner)
- Person (number, name, category, department)
- Record (ref, date, cost_center, person, classification, amount)
- Classification (account, current_budget)

7 Ex1: Accounting data warehouse
- Fact table Record: person_id, classification_id, time_id, cost_center_id (dimension keys), ref, amount (fact)
- Dimension Person: person_id, number, name, category_id, category_number, category_desc, department_id, acronym, department_name, budget (includes the Category and Department hierarchy)
- Dimension Cost_center: cost_center_id, number, description, owner_id, owner_number, owner_name
- Dimension Classification: classification_id, account, current_budget
- Dimension Time: time_id, date, day, week_day, month, month_name, year
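The Ex1 star can be tried out in SQLite. This is a sketch under stated assumptions: table and column names follow the slide, but the SQL types and all sample rows are hypothetical. The query at the end is a star join: the fact table is joined directly to each dimension it needs, with no chains of intermediate tables.

```python
import sqlite3

# Ex1 star schema. Table and column names follow the slide; the SQL types
# are assumptions, since the slide lists names only.
SCHEMA = """
CREATE TABLE time (time_id INTEGER PRIMARY KEY, date TEXT, day INTEGER,
                   week_day TEXT, month INTEGER, month_name TEXT, year INTEGER);
CREATE TABLE person (person_id INTEGER PRIMARY KEY, number TEXT, name TEXT,
                     category_id INTEGER, category_number TEXT,
                     category_desc TEXT, department_id INTEGER, acronym TEXT,
                     department_name TEXT, budget REAL);
CREATE TABLE cost_center (cost_center_id INTEGER PRIMARY KEY, number TEXT,
                          description TEXT, owner_id INTEGER,
                          owner_number TEXT, owner_name TEXT);
CREATE TABLE classification (classification_id INTEGER PRIMARY KEY,
                             account TEXT, current_budget REAL);
CREATE TABLE record (person_id INTEGER REFERENCES person,
                     classification_id INTEGER REFERENCES classification,
                     time_id INTEGER REFERENCES time,
                     cost_center_id INTEGER REFERENCES cost_center,
                     ref TEXT, amount REAL);
"""


def build_star() -> sqlite3.Connection:
    """Create the star in memory and load a few hypothetical rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO time VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, "2024-01-15", 15, "Monday", 1, "January", 2024),
        (2, "2024-02-03", 3, "Saturday", 2, "February", 2024),
    ])
    con.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", [
        (1, "P1", "Ana", 10, "C1", "Lecturer", 100, "DEI", "Informatics", 50000.0),
        (2, "P2", "Rui", 10, "C1", "Lecturer", 200, "DEC", "Civil", 40000.0),
    ])
    con.execute("INSERT INTO cost_center VALUES (1, 'CC1', 'Lab', 1, 'P1', 'Ana')")
    con.execute("INSERT INTO classification VALUES (1, '6221', 1000.0)")
    con.executemany("INSERT INTO record VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, "R1", 120.0),
        (1, 1, 2, 1, "R2", 80.0),
        (2, 1, 2, 1, "R3", 50.0),
    ])
    return con


def amount_by_department_and_month(con: sqlite3.Connection) -> list[tuple]:
    """Star join: the fact table joined directly to each dimension it needs."""
    return con.execute("""
        SELECT p.acronym, t.month_name, SUM(r.amount)
        FROM record AS r
        JOIN person AS p ON p.person_id = r.person_id
        JOIN time AS t ON t.time_id = r.time_id
        GROUP BY p.acronym, t.month, t.month_name
        ORDER BY p.acronym, t.month
    """).fetchall()


if __name__ == "__main__":
    for row in amount_by_department_and_month(build_star()):
        print(row)
```

Because department attributes live inside the Person dimension, grouping by department needs only one join instead of Record → Person → Department.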

8 Ex2: Pedagogical inquiry DW
See PDF for the operational model

9 Ex2: Pedagogical inquiry
Conclusions
- Process analysis
  - Relevant facts: the student answers to each question on each subject/lecturer
  - Relevant dimensions: inquiry, question (dimension, scope), subject (year, program), lecturer (department), quiz
- Several tables discarded
  - Visual quiz configuration
- Non-relevant attributes of lecturers and subject occurrences
- Dereferencing of answer values
- Data filtering to keep just the relevant rows

10 ER vs dimensional model
Entity relationship                                  | Dimensional model
Operational systems                                  | Decision support systems
Transaction oriented                                 | Analysis oriented
Recording single facts                               | Pre-computed aggregations
Highly normalized                                    | Denormalized
Consistently updatable                               | No update after load
Tables represent entities or associations; user keys | Tables represent numeric facts and complex dimensions; anonymous keys
Non-queriable                                        | Systematic efficient query
Many tables; arbitrary links                         | Star schema, star join

11 Equivalent?
- One ER model may correspond to many stars
  - There is no loss of information (include as many stars as necessary)
- Simplicity in entities and complexity in relationships is traded for complexity in dimensions and simplicity in the star schema
- However, it is common to discard certain operational details
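The trade described above can be sketched by flattening the normalized Person, Category, and Department tables of Ex1 into one wide Person dimension row. This is an illustration, not the author's ETL: the dataclasses, the function name, and the way anonymous ids are assigned (simply by input order) are all assumptions.

```python
from dataclasses import dataclass


# Normalized operational rows (Ex1 operational model).
@dataclass(frozen=True)
class Category:
    category_number: str
    category_desc: str


@dataclass(frozen=True)
class Department:
    acronym: str
    name: str
    budget: float


@dataclass(frozen=True)
class Person:
    number: str
    name: str
    category: str    # references Category.category_number
    department: str  # references Department.acronym


def person_dimension(persons, categories, departments) -> list[dict]:
    """Flatten Person -> Category and Person -> Department into one
    denormalized dimension row per person, with anonymous integer keys
    (assigned here simply by input order)."""
    cats = {c.category_number: c for c in categories}
    deps = {d.acronym: d for d in departments}
    cat_ids = {number: i for i, number in enumerate(cats, start=1)}
    dep_ids = {acronym: i for i, acronym in enumerate(deps, start=1)}
    rows = []
    for person_id, p in enumerate(persons, start=1):
        c, d = cats[p.category], deps[p.department]
        rows.append({
            "person_id": person_id,
            "number": p.number,
            "name": p.name,
            "category_id": cat_ids[p.category],
            "category_number": c.category_number,
            "category_desc": c.category_desc,
            "department_id": dep_ids[p.department],
            "acronym": d.acronym,
            "department_name": d.name,
            "budget": d.budget,
        })
    return rows


if __name__ == "__main__":
    print(person_dimension([Person("P1", "Ana", "C1", "DEI")],
                           [Category("C1", "Lecturer")],
                           [Department("DEI", "Informatics", 50000.0)]))
```

Three tables and two relationships become a single table: the dimension is more complex, but the fact table needs only one key to reach all of it.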

12 Fact tables in 4 steps
Method to design a dimensional model:
1. The data mart
2. Fact table granularity
3. The dimensions
4. The facts

13 1. The data mart
- A data mart is a subset of a DW
  - It is not a mini-DW which, together with other isolated mini-DWs, "by chance" makes up an integrated DW
- To choose a data mart is to choose a data source
  - Single source: orders, shipments, payments
  - Multiple sources: client revenue (profits + costs)
- One should start with a single source
  - The idea is to reduce the data cleaning and consolidation tasks
  - In the context of the conformed dimensions
- Data marts are combined in a second phase

14 2. Fact table granularity
- Clearly define the meaning of a fact record
- Rule: granularity should be as fine as possible
  - Not to lose information
  - To get a more robust design
    - w.r.t. future, unanticipated queries
    - w.r.t. the addition of new data elements
- Choosing month as the granularity of a data mart for product sales in a store implies that it is not possible to accurately analyze the impact of a 15-day promotion
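The promotion example can be made concrete. The daily figures below are hypothetical (100 per day, 150 per day during a promotion from January 10 to 24). At day grain the promotion window can be summed directly; the month-grain totals, which are all a monthly fact table would keep, mix promotion and non-promotion days and cannot be split back.

```python
from collections import defaultdict
from datetime import date, timedelta

PROMO_FIRST, PROMO_LAST = date(2024, 1, 10), date(2024, 1, 24)  # 15 days


def daily_sales() -> dict[date, float]:
    """Hypothetical daily sales of one product in one store:
    100 per day, 150 per day while the promotion runs."""
    sales = {}
    for i in range(60):  # 2024-01-01 .. 2024-02-29 (leap year)
        d = date(2024, 1, 1) + timedelta(days=i)
        sales[d] = 150.0 if PROMO_FIRST <= d <= PROMO_LAST else 100.0
    return sales


def monthly_totals(sales: dict[date, float]) -> dict[tuple[int, int], float]:
    """What a month-grain fact table would store."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for d, amount in sales.items():
        totals[(d.year, d.month)] += amount
    return dict(totals)


def window_total(sales: dict[date, float], first: date, last: date) -> float:
    """Only answerable at day grain: sales inside an arbitrary date window."""
    return sum(v for d, v in sales.items() if first <= d <= last)


if __name__ == "__main__":
    s = daily_sales()
    print(monthly_totals(s))  # {(2024, 1): 3850.0, (2024, 2): 2900.0}
    print(window_total(s, PROMO_FIRST, PROMO_LAST))  # 2250.0
```

Monthly totals can always be derived from the daily rows, but not the other way round, which is why the rule favors the finest granularity available.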

15 A fact record is …
- Each sales transaction
- Each compensation claim submitted to the insurance company
- Each ATM transaction
- Each daily total of product sales
- Each monthly account balance
- Each order line
- Each delivery note line
- Each risk covered by an individual insurance policy

16 Granularity levels
- Individual transactions (first three)
  - Atomic facts, simple structure
  - Arbitrary number (possibly zero)
  - The measure is a single amount
- Summary, balance, snapshot (next two)
  - Wait for the end of the period (day, month, …)
  - Several measures: total sales, number of transactions (additive), final balance (semi-additive)
  - In the daily case, the snapshot may coincide with an aggregation (this redundancy is kept for performance reasons)
  - In the monthly balance there may be information that is meaningful only for the whole month and thus is not dispensable
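The semi-additive case can be sketched with hypothetical month-end balances for two accounts. Summing balances across accounts at one month end is meaningful; summing the same account's balances across months is not, so a period summary uses the last snapshot or the average instead.

```python
# Hypothetical month-end balance snapshots: (account, month) -> balance.
BALANCES = {
    ("A", 1): 100.0, ("A", 2): 150.0, ("A", 3): 120.0,
    ("B", 1): 200.0, ("B", 2): 200.0, ("B", 3): 260.0,
}
QUARTER = (1, 2, 3)


def total_across_accounts(month: int) -> float:
    """Balances ARE additive across accounts: total held at one month end."""
    return sum(v for (_, m), v in BALANCES.items() if m == month)


def closing_balance(account: str, months=QUARTER) -> float:
    """Balances are NOT additive over time; take the last snapshot instead."""
    return BALANCES[(account, max(months))]


def average_balance(account: str, months=QUARTER) -> float:
    """Another valid summary over time: the average of the snapshots."""
    return sum(BALANCES[(account, m)] for m in months) / len(months)


if __name__ == "__main__":
    print(total_across_accounts(3))  # 380.0
    print(closing_balance("A"))  # 120.0
    print(average_balance("A"))
    # Summing account A over the quarter would give 370.0, which is meaningless.
```

The number of transactions and total sales in the same snapshot row would, by contrast, be fully additive over time.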

17 Granularity levels (cont.)
- Control document items (last three)
  - A fact table record follows the whole life of an item
  - Several temporal keys for the several item phases
  - "Status" dimension tracking the evolution
  - Due to the duration of the represented processes, these records are more subject to change than other kinds of facts
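A minimal sketch of an item-tracking fact row, assuming an order item passes through four hypothetical phases (ordered, shipped, delivered, paid). Each phase has its own date column, the status field stands in for the status dimension, and the same row is updated as the item progresses, which is why these facts change more than others.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical phases of an order item, in the order they happen.
PHASES = ("ordered", "shipped", "delivered", "paid")


@dataclass
class OrderItemTracking:
    """One fact row per order item, updated as the item moves through phases.

    Each phase has its own date column (the several temporal keys);
    `status` plays the role of the status dimension."""
    order_number: str
    product_id: int
    quantity: int
    ordered: date
    shipped: Optional[date] = None
    delivered: Optional[date] = None
    paid: Optional[date] = None
    status: str = "ordered"

    def advance(self, when: date) -> None:
        """Stamp the next phase with `when` and move the status forward."""
        i = PHASES.index(self.status)
        if i == len(PHASES) - 1:
            raise ValueError("item already completed")
        nxt = PHASES[i + 1]
        setattr(self, nxt, when)
        self.status = nxt

    def lag_days(self, start: str, end: str) -> Optional[int]:
        """Days between two phases, or None if either has not happened yet."""
        if start not in PHASES or end not in PHASES:
            raise ValueError("unknown phase")
        a, b = getattr(self, start), getattr(self, end)
        return (b - a).days if a is not None and b is not None else None


if __name__ == "__main__":
    item = OrderItemTracking("O-1", product_id=7, quantity=3, ordered=date(2024, 3, 1))
    item.advance(date(2024, 3, 4))  # shipped
    item.advance(date(2024, 3, 6))  # delivered
    print(item.status, item.lag_days("ordered", "delivered"))  # delivered 5
```

Storing every phase date in one row makes lags between phases a simple subtraction, at the cost of revisiting the row on each load.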

18 3. Dimensions
- Choice determined by the choice of granularity
- Usually there is a minimum set of dimensions for a fact table to be understood
  - Ex. order item: order date, client, product, and order number (degenerate dimension)
- Many other dimensions may be added
  - Each extra dimension takes just one value in the context of the primary dimensions
  - They do not affect granularity
  - Ex. shipping date, terms of contract, promotions, weather

19 Characteristics of dimensions
- Best dimension to associate with a set of measures: the one with the coarsest granularity that still takes a single value
  - For daily facts, choose the day as the dimension, not the year, to which many daily values would correspond
  - Do not choose the hour, which would be too fine, repeating the value
- Multi-valued dimensions
  - Possible, but they complicate questions and reports
  - Require the definition of a way to make them additive, weighting each hypothesis (ex. several possible diagnoses for a single treatment)
- A new dimension just adds a key to the fact table; applications remain untouched
  - Ex.: add a weather status dimension
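The weighting of a multi-valued dimension can be sketched with the diagnosis example. The structure used here (each treatment points to a group of diagnoses whose weights sum to 1, often called a bridge table) and all data are assumptions for illustration. Allocating each cost by weight keeps the per-diagnosis totals additive: they add up to the true total cost, whereas counting each treatment fully under every diagnosis would give 480 instead of 280.

```python
from collections import defaultdict

# Hypothetical weighting (bridge) table: each treatment fact points to a
# group of possible diagnoses; the weights of a group must sum to 1.
DIAGNOSIS_GROUPS = {
    "G1": {"flu": 0.5, "bronchitis": 0.5},
    "G2": {"flu": 1.0},
}
# Hypothetical treatment facts: (treatment id, diagnosis group, cost).
TREATMENTS = [("T1", "G1", 200.0), ("T2", "G2", 80.0)]


def cost_by_diagnosis() -> dict[str, float]:
    """Allocate each treatment's cost to its diagnoses by weight, so the
    per-diagnosis totals still add up to the overall total."""
    totals: dict[str, float] = defaultdict(float)
    for _, group, cost in TREATMENTS:
        weights = DIAGNOSIS_GROUPS[group]
        if abs(sum(weights.values()) - 1.0) > 1e-9:
            raise ValueError(f"weights of {group} do not sum to 1")
        for diagnosis, weight in weights.items():
            totals[diagnosis] += cost * weight
    return dict(totals)


if __name__ == "__main__":
    by_diag = cost_by_diagnosis()
    print(by_diag)  # {'flu': 180.0, 'bronchitis': 100.0}
    print(sum(by_diag.values()))  # 280.0, same as the total treatment cost
```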

20 Granularity of dimensions
- Dimension granularity cannot be finer than fact granularity
  - If facts are monthly, the time dimension cannot be the day
- It may be coarser, with no contradiction
  - Ex. use the 'brand' for the product dimension, instead of the specific reference
  - Loses information, but without logical incoherence
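Both rules can be sketched in a few lines. The ordered list of time levels and the reference-to-brand mapping are hypothetical. The first function encodes "never finer than the fact grain"; the second shows that rolling product references up to brand is always possible, while the split by reference cannot be recovered from the brand totals.

```python
# Time levels ordered from finest to coarsest (illustrative).
TIME_LEVELS = ("day", "month", "quarter", "year")


def valid_time_level(fact_grain: str, level: str) -> bool:
    """A dimension level may be equal to or coarser than the fact grain,
    never finer."""
    return TIME_LEVELS.index(level) >= TIME_LEVELS.index(fact_grain)


# Hypothetical product dimension: specific reference -> brand.
PRODUCT_BRAND = {"REF-001": "Acme", "REF-002": "Acme", "REF-003": "Zenith"}


def sales_by_brand(sales: list[tuple[str, float]]) -> dict[str, float]:
    """Roll reference-level sales up to brand: allowed, but the split by
    reference cannot be recovered from the result."""
    totals: dict[str, float] = {}
    for ref, amount in sales:
        brand = PRODUCT_BRAND[ref]
        totals[brand] = totals.get(brand, 0.0) + amount
    return totals


if __name__ == "__main__":
    print(valid_time_level("month", "day"))  # False
    print(valid_time_level("month", "year"))  # True
    print(sales_by_brand([("REF-001", 10.0), ("REF-002", 5.0), ("REF-003", 7.0)]))
    # {'Acme': 15.0, 'Zenith': 7.0}
```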

21 4. The facts
- The facts must be specific to the chosen granularity, which determines their scope
  - Individual transaction tables: one fact (one column, besides keys), the amount
  - Snapshot tables: several facts, several measures, extendable to new summaries
  - Item tracking tables: several facts (ex. quantities, gross and net amounts)
- Do not mix aggregate facts or facts with other granularities
  - Aggregations are kept in separate records and tables
  - Avoid misleading analysis tools
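The reason for keeping aggregates out of the atomic fact table can be shown with hypothetical sales rows. If a daily total is stored next to the transactions it summarizes, any tool that simply sums the amount column counts those sales twice; the aggregate belongs in its own table at its own grain.

```python
# Hypothetical sales facts at transaction grain.
TRANSACTIONS = [
    {"day": "2024-05-01", "store": "S1", "amount": 30.0},
    {"day": "2024-05-01", "store": "S1", "amount": 20.0},
    {"day": "2024-05-02", "store": "S1", "amount": 25.0},
]


def total(rows: list[dict]) -> float:
    """What an analysis tool does when asked for total sales."""
    return sum(r["amount"] for r in rows)


def daily_aggregate(rows: list[dict]) -> dict[tuple[str, str], float]:
    """Aggregate to (day, store) grain: this belongs in its own table."""
    out: dict[tuple[str, str], float] = {}
    for r in rows:
        key = (r["day"], r["store"])
        out[key] = out.get(key, 0.0) + r["amount"]
    return out


# Anti-pattern: a daily total stored in the same table as the transactions.
MIXED = TRANSACTIONS + [{"day": "2024-05-01", "store": "S1", "amount": 50.0}]

if __name__ == "__main__":
    print(total(TRANSACTIONS))  # 75.0 (correct)
    print(total(MIXED))  # 125.0 (May 1 counted twice)
    print(daily_aggregate(TRANSACTIONS))
```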

22 Completing the selection
- Fact table: a set of simultaneous measures at a certain granularity
  - Numeric measures are more useful, but they may be textual
- Define the measures, sometimes imposed by the operational system, and the respective dimensions
- Add all the available dimensions
  - Especially if they take a single value in the context of the measure
- Do not take the "user needs" as the starting point
  - Instead, study the "reality" of the organization (physical perspective) to become less dependent on subjectivity

23 Evolution of the dimensions
- Situation: a person may change name but does not change ID card number (user key)
- Three answers
  - Type 1: overwrite the name attribute in the dimension
    - History is lost
    - Used for error correction
  - Type 2: create a new record in the dimension with a new anonymous key
    - "Uniqueness" of the natural key is lost
    - Detailed tracking of evolution (start and end dates)
  - Type 3: create an "old name" attribute in the dimension, keeping the previous value
    - Limited history
    - Partition on time is fuzzy
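The three answers can be sketched on an in-memory Person dimension keyed by an anonymous key, with the ID card number as user key. The class, field, and method names are hypothetical, and type 2 validity is assumed to use half-open intervals [valid_from, valid_to), with an empty valid_to marking the current version.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PersonRow:
    person_key: int                    # anonymous DW key
    id_card: str                       # user (natural) key
    name: str
    old_name: Optional[str] = None     # filled by type 3 only
    valid_from: Optional[date] = None  # type 2 validity interval [from, to)
    valid_to: Optional[date] = None    # None marks the current version


class PersonDimension:
    """In-memory Person dimension illustrating the three kinds of change."""

    def __init__(self) -> None:
        self.rows: list[PersonRow] = []
        self._next_key = 1

    def insert(self, id_card: str, name: str,
               valid_from: Optional[date] = None) -> PersonRow:
        row = PersonRow(self._next_key, id_card, name, valid_from=valid_from)
        self._next_key += 1
        self.rows.append(row)
        return row

    def current(self, id_card: str) -> PersonRow:
        return next(r for r in self.rows
                    if r.id_card == id_card and r.valid_to is None)

    def type1(self, id_card: str, new_name: str) -> None:
        """Overwrite in place: history is lost (suits error correction)."""
        self.current(id_card).name = new_name

    def type2(self, id_card: str, new_name: str, when: date) -> PersonRow:
        """Close the current version and add a new row with a new key;
        the id_card is then no longer unique in the dimension."""
        self.current(id_card).valid_to = when
        return self.insert(id_card, new_name, valid_from=when)

    def type3(self, id_card: str, new_name: str) -> None:
        """Keep only the previous value in old_name: limited history."""
        row = self.current(id_card)
        row.old_name, row.name = row.name, new_name


if __name__ == "__main__":
    dim = PersonDimension()
    dim.insert("12345678", "Ana Silva", valid_from=date(2020, 1, 1))
    dim.type2("12345678", "Ana Costa", when=date(2023, 6, 1))
    for r in dim.rows:
        print(r.person_key, r.name, r.valid_from, r.valid_to)
```

In type 2, facts loaded before the change keep pointing to key 1 and later facts point to key 2, so each fact is analyzed with the name that was valid when it happened.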

