
Training Program for Information and Informatics Personnel: Health Information Systems. Ari Cahyono.


1 Training Program for Information and Informatics Personnel: Health Information Systems. Ari Cahyono

2  The issues associated with designing a data warehouse database.  A technique for designing a data warehouse database called dimensionality modeling.  How a dimensional model (DM) differs from an Entity-Relationship (ER) model.  A step-by-step methodology for designing a data warehouse database.  Criteria for assessing the degree of dimensionality provided by a data warehouse.

3  Designing a data warehouse database is highly complex.  It begins with answering questions such as:  Which user requirements are most important, and which data should be considered first?  Should the project be scaled down into something more manageable, yet at the same time provide an infrastructure capable of ultimately delivering a full-scale enterprise-wide data warehouse?  Common solution: data marts.

4  Dimensionality modeling is a logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.  It uses the concepts of ER modeling with some important restrictions:  Every dimensional model (DM) is composed of: ▪ A fact table: one table with a composite primary key. ▪ Dimension tables: each has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table.  This structure is called a star schema.
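
The fact/dimension structure described on this slide can be sketched with Python's built-in sqlite3 module. The table and column names (sales_fact, product_dim, store_dim, units_sold) and the sample rows are assumptions made for illustration; they do not come from the presentation.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    units_sold  INTEGER,
    -- Composite primary key: each component matches one dimension's simple key.
    PRIMARY KEY (product_key, store_key)
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Soap"), (2, "Tea")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?)", [(10, "Jakarta"), (20, "Bandung")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(1, 10, 5), (2, 10, 3), (1, 20, 7)],
)

# A star join: the fact table is joined to a dimension to give the facts context.
units_by_city = dict(conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN store_dim AS s ON f.store_key = s.store_key
    GROUP BY s.city
""").fetchall())
```

A real fact table would normally also reference a time dimension; it is left out here to keep the sketch short.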

5  The star schema exploits the characteristics of factual data: facts are generated by events that occurred in the past and are unlikely to change, regardless of how they are analyzed.  Also known as a star join: “A logical structure that has a fact table containing factual data in the centre, surrounded by dimension tables containing reference data (which can be denormalized)”

6

7 Snowflake schema: a variant of the star schema where dimension tables do not contain denormalized data.  Starflake schema: a hybrid structure that contains a mixture of star and snowflake schemas.
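
As a rough illustration of the difference between the two layouts, the sketch below takes a denormalized (star-style) product dimension and splits it into a product table and a separate category table (snowflake-style). The column names and rows are hypothetical.

```python
# Star-style dimension: the category name is repeated on every product row.
star_product_dim = [
    {"product_key": 1, "product_name": "Soap", "category_name": "Toiletries"},
    {"product_key": 2, "product_name": "Shampoo", "category_name": "Toiletries"},
    {"product_key": 3, "product_name": "Tea", "category_name": "Beverages"},
]


def snowflake(rows):
    """Split a denormalized dimension into a product table and a category table."""
    category_keys = {}  # category_name -> generated category_key
    products = []
    for row in rows:
        # Reuse the key if the category was seen before, otherwise assign the next one.
        key = category_keys.setdefault(row["category_name"], len(category_keys) + 1)
        products.append({
            "product_key": row["product_key"],
            "product_name": row["product_name"],
            "category_key": key,
        })
    categories = [
        {"category_key": key, "category_name": name}
        for name, key in category_keys.items()
    ]
    return products, categories


products, categories = snowflake(star_product_dim)
```

In a starflake schema, some dimensions would stay in the first form and others would be split as in the second.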

8
1. Choosing the process
2. Choosing the grain
3. Identifying and conforming the dimensions
4. Choosing the facts
5. Storing pre-calculations in the fact table
6. Rounding out the dimension tables
7. Choosing the duration of the database
8. Tracking slowly changing dimensions
9. Deciding the query priorities and the query modes

9  ER modeling is a technique for identifying relationships among entities.  Goal: to remove redundancy in the data.  Inefficient for ad-hoc end-user queries.  Traditional ER modeling does not support the main attraction of data warehousing, namely intuitive and high-performance retrieval of data.  A single ER model normally decomposes into multiple DMs.  The multiple DMs are then associated through ‘shared’ dimension tables.

10  Step 1: Choosing the process.  The process (function) refers to the subject of a particular data mart.  Choose the main entities and relationships.

11  Step 2: Choosing the grain.  This means deciding exactly what a fact table record represents,  e.g. in a ProductSales fact table, each record represents an individual product sale.  Only when the grain for the fact table is chosen can we identify the dimensions of the fact table.

12  Step 3: Identifying and conforming the dimensions.  Dimensions set the context for asking questions about the facts in the table.  A well-built set of dimensions makes the data mart understandable and easy to use.  Identify dimensions in sufficient detail to describe things.  A poorly presented or incomplete set of dimensions will reduce the usefulness of a data mart to an enterprise.  If any dimension occurs in two data marts, it must be exactly the same dimension, or one must be a mathematical subset of the other.
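
The conformance rule on this slide (same dimension, or one a subset of the other) can be expressed as a small check. This is only a sketch: it assumes each dimension is represented as a dict mapping a dimension key to a tuple of attributes, and the function name is_conformed and the sample data are hypothetical.

```python
def is_conformed(dim_a, dim_b):
    """Return True if the dimensions are identical or one is a subset of the other.

    Assumption: a dimension is a dict of {dimension_key: attribute_tuple}.
    """
    small, large = sorted((dim_a, dim_b), key=len)
    # Every row of the smaller dimension must appear unchanged in the larger one.
    return all(key in large and large[key] == attrs for key, attrs in small.items())


product_dim_sales = {1: ("Soap", "Toiletries"), 2: ("Tea", "Beverages")}
product_dim_inventory = {1: ("Soap", "Toiletries")}   # subset: conformed
product_dim_marketing = {1: ("Soap", "Household")}    # same key, different attributes
```

Here the inventory dimension conforms to the sales dimension, while the marketing dimension does not, because it describes product 1 differently.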

13  Step 4: Choosing the facts.  The grain of the fact table determines which facts can be used in the data mart.  All the facts must be expressed at the level implied by the grain.  Additional facts can be added to a fact table at any time, provided they are consistent with the grain of the table.

14  Step 5: Storing pre-calculations in the fact table.  Add valuable derived information that can be calculated from the other facts.
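
A minimal sketch of a pre-calculation, assuming hypothetical fact rows with revenue and cost columns: the derived profit is computed once and stored with each row, so queries do not have to recompute it.

```python
def add_precalculated_profit(fact_rows):
    """Return copies of the fact rows with a stored, derived 'profit' column."""
    return [{**row, "profit": row["revenue"] - row["cost"]} for row in fact_rows]


# Hypothetical fact rows; amounts are whole currency units to keep arithmetic exact.
sales_facts = [
    {"product_key": 1, "revenue": 1200, "cost": 900},
    {"product_key": 2, "revenue": 500, "cost": 650},
]
sales_facts_with_profit = add_precalculated_profit(sales_facts)
```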

15  Step 6: Rounding out the dimension tables.  Add as many text descriptions to the dimensions as possible.  The text descriptions should be as intuitive and understandable to users as possible.  The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.

16

17  Step 7: Choosing the duration of the database.  The duration measures how far back in time the fact table goes.

18  Step 8: Tracking slowly changing dimensions (SCD).  There are three types of SCD:  Type 1, where a changed dimension attribute is overwritten.  Type 2, where a changed dimension attribute causes a new dimension record to be created.  Type 3, where a changed dimension attribute causes an alternate attribute to be created, so that both the old and new values of the attribute are simultaneously accessible in the same dimension record.
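
The three SCD types can be sketched on a small customer dimension held as a list of dicts. The column names (customer_key as surrogate key, customer_id as natural key, is_current as a current-row flag) and the "previous_" prefix are assumptions for this example, not part of the presentation.

```python
from copy import deepcopy


def scd_type1(dim_rows, natural_key, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    rows = deepcopy(dim_rows)
    for row in rows:
        if row["customer_id"] == natural_key:
            row[attr] = new_value
    return rows


def scd_type2(dim_rows, natural_key, attr, new_value):
    """Type 2: keep the old record and add a new record with a new surrogate key."""
    rows = deepcopy(dim_rows)
    current = [r for r in rows if r["customer_id"] == natural_key and r["is_current"]]
    next_key = max(r["customer_key"] for r in rows) + 1
    for old in current:
        old["is_current"] = False
        rows.append({**old, "customer_key": next_key, attr: new_value, "is_current": True})
        next_key += 1
    return rows


def scd_type3(dim_rows, natural_key, attr, new_value):
    """Type 3: keep old and new values side by side in the same record."""
    rows = deepcopy(dim_rows)
    for row in rows:
        if row["customer_id"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return rows


customers = [{"customer_key": 1, "customer_id": "C7", "city": "Solo", "is_current": True}]
type1 = scd_type1(customers, "C7", "city", "Medan")
type2 = scd_type2(customers, "C7", "city", "Medan")
type3 = scd_type3(customers, "C7", "city", "Medan")
```

Type 2 is the only variant here that preserves full history, at the cost of more dimension rows; Type 3 keeps just one previous value.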

19  Step 9: Deciding the query priorities and the query modes.  Consider physical design issues:  the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations;  administration, backup, indexing performance, and security.

20  Fact tables are where we keep the measurements.  We may keep the details at the lowest possible level. ▪ In the department store fact table for sales analysis, we may keep the units sold by individual transactions at the cashier’s checkout. ▪ Some fact tables may contain only summary data; these are called aggregate fact tables.
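
To illustrate the relationship between a detailed fact table and an aggregate fact table, the sketch below summarizes hypothetical checkout transactions into units sold per day and product. The field names and values are made up for the example.

```python
from collections import defaultdict

# Lowest-level facts: one row per product per checkout transaction (hypothetical data).
transaction_facts = [
    {"transaction_id": "T1", "date": "2024-03-01", "product_key": 1, "units_sold": 2},
    {"transaction_id": "T2", "date": "2024-03-01", "product_key": 1, "units_sold": 1},
    {"transaction_id": "T2", "date": "2024-03-01", "product_key": 2, "units_sold": 4},
    {"transaction_id": "T3", "date": "2024-03-02", "product_key": 1, "units_sold": 5},
]


def build_daily_aggregate(rows):
    """Summarize transaction-level facts to a coarser grain: (date, product)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["date"], row["product_key"])] += row["units_sold"]
    return dict(totals)


daily_product_sales = build_daily_aggregate(transaction_facts)
```

The aggregate is smaller and faster to query, but the individual transactions can no longer be recovered from it.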

21  Concatenated fact table key.  Grain or level of data identified.  Data grain is the level of detail for the measurements or metrics.  Fully additive measures.  Semi-additive measures.  Large number of records.  Table deep, not wide.  Only a few attributes.  Sparsity of data.  Degenerate dimensions.  A degenerate dimension doesn’t have a dimension key.
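
The slide only names fully additive and semi-additive measures. As a commonly used illustration (an assumption here, not taken from the presentation), an account balance is semi-additive: it can be summed across accounts on one date, but summing it across dates is meaningless, so the latest value is taken instead.

```python
# Hypothetical daily balance snapshots for two accounts.
balance_facts = [
    {"date": "2024-01-01", "account": "A", "balance": 100},
    {"date": "2024-01-01", "account": "B", "balance": 50},
    {"date": "2024-01-02", "account": "A", "balance": 120},
    {"date": "2024-01-02", "account": "B", "balance": 50},
]


def total_balance_on(rows, date):
    """Summing across the account dimension is valid for a balance."""
    return sum(r["balance"] for r in rows if r["date"] == date)


def closing_balance(rows, account):
    """Across time, take the most recent balance instead of summing."""
    account_rows = [r for r in rows if r["account"] == account]
    # ISO-formatted date strings sort chronologically.
    return max(account_rows, key=lambda r: r["date"])["balance"]
```

A fully additive measure such as units sold, by contrast, can be summed across every dimension, including time.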

22  Look closely at the attributes order_number and order_line.  These are not measures, metrics, or facts.  They are attributes that are neither facts nor strictly dimension attributes, e.g. reference numbers such as order numbers, invoice numbers, and order line numbers.  Example usage: finding the average number of products per order.
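
The example usage mentioned on this slide can be sketched as follows. The fact rows are hypothetical; order_number and order_line sit directly in the fact table as degenerate dimensions, with no dimension table behind them.

```python
from collections import defaultdict

# Hypothetical order-line facts; order_number and order_line are degenerate dimensions.
order_line_facts = [
    {"order_number": "SO-1", "order_line": 1, "product_key": 1, "quantity": 2},
    {"order_number": "SO-1", "order_line": 2, "product_key": 2, "quantity": 1},
    {"order_number": "SO-1", "order_line": 3, "product_key": 3, "quantity": 6},
    {"order_number": "SO-2", "order_line": 1, "product_key": 1, "quantity": 4},
]


def average_products_per_order(rows):
    """Group fact rows by the degenerate dimension and average the distinct products."""
    products_per_order = defaultdict(set)
    for row in rows:
        products_per_order[row["order_number"]].add(row["product_key"])
    return sum(len(p) for p in products_per_order.values()) / len(products_per_order)


avg_products = average_products_per_order(order_line_facts)
```

Without order_number in the fact table there would be no way to tell which lines belong to the same order.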

23  Some fact tables do not need to contain any facts. They are “factless” fact tables.  e.g. analyzing student attendance:
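
A minimal sketch of a factless fact table for the attendance example, using made-up keys: each row records only that a student attended a course on a date, and analysis is done by counting rows.

```python
from collections import Counter

# Hypothetical factless fact rows: (date_key, student_key, course_key), no measures.
attendance_fact = [
    (20240301, "S1", "DB101"),
    (20240301, "S2", "DB101"),
    (20240301, "S1", "NET201"),
    (20240302, "S1", "DB101"),
]

# The existence of a row is the event; counting rows answers the questions.
attendance_per_course = Counter(course for _, _, course in attendance_fact)
attendance_per_student = Counter(student for _, student, _ in attendance_fact)
```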

24  Moving a rapidly changing dimension attribute to the fact table as a degenerate dimension column

25

26

27  Dimension hierarchies

28

29  Hierarchies of the store, customer, and product dimensions

30

