Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Warehouse Fundamentals

Similar presentations


Presentation on theme: "Data Warehouse Fundamentals"— Presentation transcript:

1 Data Warehouse Fundamentals
Chapter 6: Relational Data Modeling CONCEPTS, PRINCIPLES & APPROACHES-- Paul Chen (Please reference white papers on Data Modeling at Seattle U teaching materials website)

2 Topics Levels of Modeling
Relational Data Modeling—What is it? Types of Models and Pre-Modeling Activities Understand Terms and Terminology –Tool Demonstration Conceptual Data Modeling: What, Why, When, Who? Activity Description Logical Data Modeling: What, Why, When, Who? Activity Description Physical Data Modeling- An Overview Prototyping and RAD

3 Databases & Modeling Databases & Modeling Type of Database New Trend
Constructs Characteristics Relational Database ERD & EER Row/ Column Multi-Dimensional Database Dimensional Modeling OLAP DW Cube Distributed Database Distributed Component Object Model Client Object (DCOM) XML Object-Oriented Database UML Class Diagram Object Object = Data + Operations(Services); Entity = Data only

4 Topic 1: Level of Modeling
Descriptive: The dealer sold 200 cars last month. Operational (OLTP) Primarily Two Dimensional Database System Explanatory: For every increase in 1 % in the interest, auto sales decrease by 5 %. Star Schema Cube Traditional DW (OLAP) Predictive: Predictions about future buyer behavior. Data Mining Cube + sophisticated analytical tools

5 Level of Analytical Processing
Explanatory “WHAT IF” PROCESSING ANALYZE WHAT HAS PREVIOUSLY OCCURRED TO BRING ABOUT THE CURRENT STATE OF THE DATA Predictive Descriptive SIMPLE QUERIES & REPORTS DETERMINE IF ANY PATTERNS EXIST BY REVIEWING DATA RELATIONSHIPS Statistical Analysis/Expert System/ Artificial Intelligence Normalized Tables Dimensional Tables + Query Roll-up; Drill Down Classification & Value Prediction

6 DESCRIPTIVE MODELING Relational Data Modeling using ER Diagram
Conceptual Data Model (Analysis - Requirements Gathering; What’s it?) Logical Data Model (Design-How is it?) Physical Data Model (Implementation)

7 EXPLANATORY MODELING Also called Dimensional Modelling (to be discussed in chapter 7) Ways to derive the database component of a data warehouse Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

8 PREDICTIVE MODELING (to be discussed in chapter 10)
Similar to the human learning experience Uses observations to form a model of the important characteristics of some phenomenon. Uses generalizations of ‘real world’ and ability to fit new data into a general framework. Can analyze a database to determine essential characteristics (model) about the data set.

9 Topic 2: Relational Data Modeling-What’s it?
A data model is a collection of constructs, business rules and sample data which together supports a dynamic representation of real world objects and events. Constructs: entity-relationship diagrams (conceptual & logical or functional) and tables Business rules: constraints such as referential integrity rules and operators (add, update, delete) Sample Data for verification and prototyping: Verifying the accuracy of the model.

10 Objectives Testing the real system before building it.
Assisting in understanding an organization’s data requirements. Facilitating physical data base design.

11 Types of Models Conceptual Data Model (Analysis - Requirements Gathering; What’s it?) Logical Data Model (Design-How is it?) Physical Data Model (Implementation)

12 Pre-modeling Activities
Data collection phase Facilitation techniques (for ex. JAD session) Roles and responsibilities Tools and repository Naming standards Modeling convention (What methods to use) Data protection/backup and recovery procedures

13 Data Collection Phase *
Sampling and Investigating Hard Data The Needs for Sampling: Containing costs; Speeding up the data gathering; Improving effectiveness; Reducing bias Sampling Design -Four steps: 1. Determine the data to be collected or described 2. Determine the population to be sampled 3. Choose the type of sample 4. Decide on the sample size * Please review chapter 5

14 Kinds of Information Sought in Investigation
Types of hard Data (other than interviewing and Observation) Memos Signs on bulletins boards or in work areas Corporate Web sites Manuals Policy handbooks Record layout

15 Retina Scan “That recent Tom Cruise movie, Minority Report, shows advertising that targets each individual consumer as they pass by the signage. That’s the extreme, but I can see it going that way,” said St. Denis.

16 Five Steps in Interview Preparation
Reading background material Establishing interview objectives Deciding when to interview Preparing the interviewee Deciding on question type and structure

17 Two Types of Questions – Open-End Questions vs. Closed Questions
Closed interview questions Such as “ How many subordinates do you have? Benefits: Getting to relevant data Keeping control over the interview Drawbacks: Failing to obtain rich detail Intimidating the interviewee

18 Three Basic ways of Structuring Interviews
Pyramid Structure: Starting from closed questions, then gradually expand into open territory. Funnel Structure: The reverse of pyramid structure approach. Diamond-Shaped: A combination of the two above structures.

19 JAD (Joint Application Development)
JAD sessions (also called facilitated session) are used to gather information and feedback and confirm the results of requirements gathering. JAD sessions replace the traditional way of conducting a series of interviews on a one-to-one basis with the users. Advantages: Achieving consensus during the session when multiple sources of information exist, raising and addressing issues or assigning them for resolution, and immediately confirming information.

20 Topic 3: Understand Terms and Terminology
Independent entity Dependent entity Associative Entity Identifying relationship Non-identifying relationship

21 Understand Terms and Terminology
Identifier An attribute distinctly identifies each occurrence of an entity. For ex., bank account Id. , and student Id. Association (relationships) An association is a relationship between two or more entities. Employee works for company Part has item

22 Understand Terms and Terminology
Cardinality(the form of relationship) Associations occur in there forms: one-to-one; one-to-many; many-to-many Tables A table is a two-dimensional representation of data consisting of columns and rows. Primary Key Used to identify entities. Unique identification for a row in a table. Allow no nulls and no duplicates. May be system assigned.

23 Understand Terms and Terminology
Foreign key A foreign key is one or more data elements whose value is based on the primary identifier of another entity, thus allowing the system to ‘join’ and get related information from other entities. The ‘joining’ of different entities in this manner eliminates the need of data repetition and redundancy. Normalization A technique to make sure that the data in a logical model is defined once and only once. Normalization helps minimum data redundancy, and minimize update abnormalities.

24 Topic 4: Conceptual Data Modeling: What?
What is it? 1.  It is a conceptual representation of data without concern for its logical (functional) or physical aspects. It is a set of high-level business data models which provides a framework for the data modeling activities at the next level.

25 Conceptual Data Modeling: When?
When should it be done? In support of the data requirements of a process model under development at the corresponding level. or Outside the system application lifecycle on a department, division, or company wide basis.

26 Conceptual Data Modeling: Who?
Who should do it? 1. The group responsible for assuring that data structure reflects business policies and rules. 2. It should be a joint effort between the owners and custodians of the data, the users of the data, and the analysts.

27 Conceptual Data Modeling: Why?
Documents the type of data (information) which must be represented in a system independent of specific application, organizations, or technology. Maximizes data sharing; minimizing redundancy. Provides foundations for physical database design. Describes the unique business enterprise specifically.

28 Conceptual Data Modeling: Why?
Outside of application life cycle on a company-wide basis. Data modeling expresses inherent associations which are the most part, independent of anyone one application. Data entities change very little even through the way they are used can change for each application. A complete maintained conceptual data model should shorten the requirements definition phases of system development life cycle.

29 Data Modeling: Approach
Data partitioning Use a top-down approach to define the data requirements of a system. The purpose is to divide and conquer (from subject to entity), and to evolve from the conceptual level to logical level until physical database is derived. Standard deliverables For each of the levels, there is a set of standard deliverables that must be produced. The documentation items must be well defined so that the data at each level is well understood.

30 Data Partitioning Via Modeling
(How) (What, Why, Who, Where) Conceptual Level Subjects Technical considerations Entities Relationships Logical Level Data Elements Frequencies Physical Level Data Definition Language -DDL(create, Alter, drop tables) Data Manipulation Language (select, insert, delete, update)

31 Conceptual Data Modeling --Activity Description
1. Define the system boundary – Data Context Diagram. 2. Partition a subject into entities. 3. Discover entities -super-type and subtype; part and whole. 4. Define associations between entities. 5. Define major attributes. 6. Define unique identifier for the entity. 7. Assign cardinalities and set up relationships between the entities. 8. Validate the model with the users.

32 1. Define the System Boundary by Subject Context Diagram
A subject (a group of entities with strong affinity) is a class of data objects representing the mission and resources of the organization. “Subjects” provides mechanisms for controlling how much of a model a reader (user, analyst, manager) is able to consider and comprehend at a time. For a small system, go directly to define entities. For ex: Library (subject) decomposed into book, member, and account. (entities).

33 ATM ERD –Subject Context Level
Customer ATM uses Has Owns Bank Consortium Account Affiliated Bank Holds Consists of

34 Data (Subject) Context Diagram
A Data context diagram is a special case of ERD in which a single diagram represents the problem domain in terms of data requirements. For example: The ATM diagram illustrates and highlights Several important characteristics of the system: The people and organization with which the system communicates. It documents the significant connections (relationships) between the data objects within as well as outside the problem domain.

35 2. Partition a Subject into Entities
Criteria for partitioning a subject into entities are: 1. The entities included in a subject all tend to describe the subject and have a strong affinity with the subject. The entities of a subject should be of equal importance, as measured by the range, complexities or importance of their data. Each entity should belong to only one subject.

36 3. Discover Entities Identifying data objects via business event analysis An individual stimulus from one data object to another is an event. Events are discovered by investigating the external influences that act upon a system, and the data transformation that occurs within a system that converts inputs into outputs. Thus, source and target data objects that interact with an event can be identified. For example, an event “customer buys a product” as depicted below is initiated by the data object “customer” who requires a response from the data object “sale”, “product”, and “payment”.

37 Customer buys A Product
Sale Customer buys A Product Payment Product

38 Types of Entities To find potential entities (entities are nouns), look for: Objects can be generalized or specialized The entities of a subject should be of equal importance, as measured by the range, complexities or importance of their data. Each entity should belong to only one subject

39 Subtype and Super-type
Faculty Full Time Faculty Part Time Faculty Part (day) Time Faculty Part (Night)Time Faculty

40 Generalization & Specialization
Aircraft Specialization Generalization Commercial Military 747 777 B52 B-1B

41 4. Define Associations Between Entities
Use a verb to describe associations between two or more entities that the user wants to keep track of. Each relationship must be specific as possible so that its meaning is clear. An individual owns a building. Owns: possession, rental; or management?

42 5. Define Major Attributes
All the attributes of an entity must have meaning for each and every one of the occurrence of the entity. Only elementary data are included in the model. Attributes resulting from process algorithms should not be included in the model. For ex. Entity ‘individual’ has attributes such as name, address; sex (male or female); no. of dependents, etc. But Derived attributes (such as percentage) are not attributes.

43 6. Define Unique Identifier for the Entity
This is an attribute that unambiguously identifies each occurrence of each entity. Some entities do no have their own identifier. These entities are qualified as dependent or week entities. The identifier of the entity to which the dependent entities are associated with must be used to uniquely identify their occurrences. For ex., a ‘child’ entity must have his or her parent identifier to be uniquely identified.

44 7. Assign Cardinalities & Set Up Relationships Between the Entities
Cardinality is the minimum and maximum number of times an occurrence of an entity occurs in relationship to another entity. The minimum number: 0 or 1 The maximum number: 1 or M There are three types of associations: One-to-one; one-to-many; many to many.

45 Types of Relationships By Degree (# of Attributes the Relation Contains)
The relationships (representing business rules) could be either as binary, recursive, or ternary. Doctor Patient Binary Part Order Ternary Part/Order Recursive Organization

46 Recursive Association
A recursive association is one in which there is a relationship between An entity and itself. Information Dev Accounting Engineering Payable Product Receivable Facility Nuclear Coal

47 8. Validate the Model with the Users
The approach requires that the users be involved at the outset of the data modeling activity until the end of its implementation. They work side-by-side with system developers, providing input and validating the accuracy of the data requirements. This will ensure that the delivered end-product meets the users’ need.

48 Validate the Model with the Users
To identify and document the integrity constraints given in the user’s view of the enterprise. This includes identifying: Required data Referential integrity Attribute domain constraints Enterprise constraints Entity integrity

49 Validate the Model With These Check Points
Attribute allocation Rules followed Cardinality necessity Relationship necessity Achievement of organization goals Contributions to expected benefits

50 Topic 5: Logical Data Modeling: What, Why, When, Who
Topic 5: Logical Data Modeling: What, Why, When, Who? And Activity Description What is it? It is a representation of data required to support the complete business needs for a particular business area, system or project. It is a set of data models that provides a framework for the physical database construction activities. It is a graphical representation of data objects that shows the relationship between the tables, views and functional core services used by modules in the application system.

51 Logical Data Modeling: What, Why, When, Who?
Why do it? Document the type of data which must be represented in a system with regard to specific system applications, organizations, or technologies. Assist in the orderly creation of a physical database design. Specifically describe the unique business enterprise. Accelerate and clarify communications between the functional analysis and DBA’s.

52 Logical Data Modeling: What, Why, When, Who?
When should it be done? Part of the system application lifecycle. In parallel with process modeling activities. Upon the completion of the conceptual data model to produce a first-cut database design that includes definitions of tables, columns, and constraints.

53 Logical Data Modeling: What, Why, When, Who?
Who should do it? The group responsible for ensuring that data structure reflects business data requirements. It should be a joint effort between the functional analysts and data administrators.

54 Logical Data Modeling: What, Why, When, Who?
Benefits Provide a definition of the data architecture of how the target system will be implemented. Model parts of the database schema that show how data structures are related to the processes. Provide program designers with the detail for the part of the database design that their modules use.

55 Logical Data Modeling: Activity Description
1. Define Data Architectural Standards 2. Position the conceptual data modeling and revise the definitions of the entities. 3. Define integrity rules for entities and relationships and Apply normalization rules to each entity. 4. Complete and standardize the data elements. Package the model for physical data modeling and system construction. 6. Evaluate quality of data for conversion. 7. Validate and Verify the Data Model

56 1. Define Data Architectural Standards
Data Access/retrieval guidelines Naming convention; SQL coding standards; Data integrity (package; trigger;commit; rollback) Error handling; record locking rule; update collision. Data Security Data access rule; data separation rule; Data recovery/backup Data base refresh/performance tuning

57 2. Position the Conceptual Data Modeling and Revise the Definitions of the Entities
Position the conceptual data modeling and revise the definitions of the entities. By taking architectural standards into consideration, the complete CDM is reexamined. As a result, new entities and relationships my be discovered. Partition data into entities at the table level.

58 Continued Map Local Conceptual Data Model to Local Logical Data Model
To refine the local conceptual data model to remove undesirable features and to map this model to a local logical data model. This involves: (1) Remove M:N relationships. (2) Remove complex relationships. (3) Remove recursive relationships. (4) Remove relationships with attributes. (5) Remove multi-valued attributes. (6) Re-examine 1:1 relationships. (7) Remove redundant relationships.

59 Removing M:N Relationship

60 Removing Complex Relationship

61 Removing Recursive Relationship

62 Removing Relationship with Attribute

63 Removing Multi-valued Attribute

64 Re-examine 1:1 relationships. Remove redundant relationships

65 3. Define Integrity Rules and Apply Normalization Rules
Integrity rules for entities indicate the context in which an entity occurrence may be created, modified, or deleted. They also ensure that the entity is consistent with other entities. This is accomplished by placing referential attributes in each appropriate entity on the model. For example, a Client (entity) holds an Account (entity). A client cannot be deleted if at least one of his accounts has a balance greater than 0. Apply normalization rules to each entity.

66 Formalizing a One-to-one Relationship with Referential Attribute
Husband Wife *Husband name Other attributes *Wife name Other attributes Husband name Married to Referential Attribute

67 Formalizing a One-to-Many Relationship with Referential Attribute
Dog Dog Owner * Dog Id Other attributes Dog Owner Id (1:M) (1:1) * Dog Owner Dog Owner Id Other attributes Referential Attribute

68 Formalizing a Many-to-Many Relationship with Referential Attribute
Part Order *Part Id Other attributes *Order No Other attributes Order/Part *Order No *Part Id Other attributes An associative entity may Participate in relationship With other entity. Referential Attributes

69 A Many-to-Many Relationship
A many-to-many relationships will result in the creation of a new entity. Order Order # Part Part # 1:M 1:M Part/Order Part #/Order #

70 Referential Integrity
Three options: Restrict: A primary key can not be deleted if there are any dependent foreign key rows. Cascade: Deleting a primary key row causes the deletion of all dependent foreign key rows. Set Null: Deleting a primary key row causes all dependent foreign keys values to be set null.

71 Apply Normalization Rules
A technique to make sure the data in a logical data models is defined once and only once. Normalization helps minimum data redundancy, and minimize update abnormalities. Three forms: First Normal Form Second Normal Form Third Normal Form

72 Normalization First Normal Form: Relationships between primary key and each attribute must be one-to-one; ie., remove repeating group. Second Normal Form: All non-key elements are dependent upon the entire primary key rather than any part thereof. Third Normal Form: Elimination of the dependence of non-key field upon any other field excepts the primary keys.

73 PK: Primary Key FK: Foreign Key NN: No Null ND: No duplicate
Part Order Relationship Order/Part OrderNo (PK) part-no Qty part-name PK PK NN ND FK 1 123 Nut 3 5 Bolt

74 First Normal Form Item Table
Qty-Store-1 Qty-Store-2 Qty-Store-3 Item No PK 3000 101 4000 5000 The above is an violation of first normal form because there exists a repeated group.

75 Rule Number 1 For each occurrence of an entity, there is only one and only one value for each its attributes. Attributes with repeating values form at least one new entity. N other words, relationship between primary key and each attribute must be one-to-one.

76 Possible Solution + Store Store/Item Store ID Store ID Item- No Qty
Sold + PK PK FK FK S1 S1 101 3000 S2 S2 102 4000

77 Second Normal Form Student/Course Course Name Course No Student No
Grade Teacher code PK + FK FK FK 3.0 ST01 100 Math T2 Lee ST02 200 4.0 T1 CS Doe Both course name and student name should be removed because They are not related to the entire student/course primary key.

78 Possible Solution Student Student/Course Student No Course Student
Name Student Student Name Course No Student/Course

79 Rule Number 2 Each attribute must be related to the entire primary key.

80 Second Normal Process Order Part Order/Part Part Name Order No
Order-Dt PartNo Pt-price PK PK 1/2/01 1 1 Nut 1.5 1/3/01 5 3 Bolts 2.0 Order/Part Order No Partno QTY How about Putting PartName In Order/part Table? PK + 1 123 1 1 5 3 123

81 Third Normal Form COURSE Course Id Course Name Teacher Code Dept Name
Dept -Id PK T1 MH400 Math DOE A1 Math CS CS401 DB T2 CS Lee The relationship between any two non-primary key components must not be one-to-one. What’s wrong with the above?

82 Rule Number 3 The relationship between any two non-primary key components must not be one-t-one; ie., remove tables within tables.

83 The Normal Process Order Customer
Cust-Name Order ID Order DT Cust-Id Cust-Id PK PK FK 1 Lee 1/2/ 01 1 1 3 Sato 1/5/21 3 5 It would be a violation of third normal form to place cust-name in the order table.

84 Why Reasons: One-to-one relationship between two non-primary key columns (Cus-Id and Cust-name). Redundancy An update anomaly (when a customer name was changed) Worse yet when a new name was added (the name could not be stored until the customer placed at least one order)

85 Identify Integrity Constraints
To identify and document the integrity constraints given in the user’s view of the enterprise. This includes identifying: Required data Referential integrity Attribute domain constraints Enterprise constraints Entity integrity

86 4. Complete and Standardize the Data Elements
As a result of the modeling process via the preceding procedure, an information model will emerge. The information model can be used as input (for ex., via Erwin Tool) to generate a data definition language (DDL) which in turn is used as input for physical data model. The information model (also called logical data model) fulfills the data requirements of the system.

87 Complete and Standardize the Data Elements (Continued)
For each entity, identify the associated list of attributes. For each element, specify the following: Permitted value Coding and editing rules Dimensions Length Value Frequency

88 5. Package the model for physical data modeling and system construction.
To do this, one must have: A normalized entity relationship diagram. A description of table and column definitions. A description of data architecture standards.

89 6. Evaluate Quality of Data For Conversion
If the data modeling is part of re-engineering efforts, we must also document: Condition of the data of the existing system Impacts on the new and enhanced system. Conversion rules

90 7. Validate and Verify the Data Model
Validation (dynamic): Prototyping is used to validate and refine the model. Verification (static): Inspection or walk-through Inspections for entity necessity, relationship necessity, and attribute allocation.

91 7 Validate and Verify the Data Model (continued)
Validate Model against User Transactions To ensure that the logical data model supports the transactions that are required by the user view. (Prototyping is a good tool) Draw Entity-Relationship Diagram To draw an Entity-Relationship (ER) diagram that is a logical representation of the data given in the user’s view of the enterprise.

92 Validate Model Against User Transaction
Example transactions (a) Insert details for new members of staff. (b) Delete details of a member of staff, given the staff number.

93 Topic 6: Physical Data Modeling-An Overview
Step 1 Translate global logical data model for target DBMS Step 2 Design physical representation Step 3 Design security mechanisms

94 Step 1: Translate global logical data model for target DBMS
To produce a basic working relational database schema from the global logical data model Design base relations for target DBMS To decide how to represent the base relations we have identified in the global logical data model in the target DBMS. Design enterprise constraints for target DBMS To design the enterprise constraints for the target DBMS.

95 Step 2 : Design physical representation
To determine the file organizations and access methods that will be used to store the base relations; that is, the way in which relations and tuples will be held on secondary storage. 2.1 Analyze transactions 2.2 Choose file organizations 2.3 Choose secondary indexes 2.4 Consider the introduction of controlled redundancy 2.4 Estimate disk space requirements

96 Step 2 : Design physical representation (Continued)
2.1 Analyze transactions To understand the functionality of the transactions that will run on the database and to analyze the important transactions. 2.2 Choose file organizations To determine an efficient file organization for each base relation.

97 Typical Disk Configuration

98 Analyze Transactions For each Transaction associated with the components of the data model (usually predefined queries including view, trigger, procedure, function and package), it needs to be broken down into further smaller units of work:

99 Transactions Analysis (continued)
A.    Transformation Rules: Describe the rules (R,U,I, D) or algorithms used to transform data received into data generated. B.    Edit and Error Rules: Define the rules validating data received and the method of processing erroneous data. C.    Sequence Analysis: Describe under what conditions this transaction is performed and what rules determine which transaction will be performed next.

100 Cross-referencing Transactions and Relations

101 Transactions Analysis (continued)
D.    Audit Rules: Describe the rules required to audit the activity performed within this transaction. E.    Security Rules: Define the security required to invoke the transaction or various facets of the transaction.

102 Transactions Analysis (continued)
F.    Frequency of execution: Define the number of times this transaction is performed in a fixed period of time. G.    Type of transaction mode: Describe whether the transaction is batch, on demand, or interactive.

103 Example - Sample Transactions
(A) Insert details for a new member of staff, given the branch address. (B) List rental properties handled by each staff member at a given branch address. (C) Assign a rental property to a member of staff, checking that a staff member does not manage more than 10 properties already. (D) List rental properties handled by each branch office.

104 ER Model for Sample Transactions showing Expected Occurrences

105 Analysis of Selected Transaction C

106 Step 2 Design Physical Representation (continued)
2.3 Choose secondary indexes To determine whether adding secondary indexes will improve the performance of the system. 2.4 Consider the introduction of controlled redundancy To determine whether introducing redundancy in a controlled manner by relaxing the normalization rules will improve the performance of the system.

107 Step 2.3 Choose secondary indexes
Data File: The file contains the logical record. Index File: The file contains the index file. The values in the index file are ordered per the indexing field which is usually based on a single attribute.

108 Indexes Primary index: The indexing field is guaranteed to have a unique value. Secondary Index: An index that is defined on a non-ordering field of of the data. Clustering index: If the index field is not a key field of the file, so that there can be more than one record corresponding to a value of the indexing field.

109 Step 2.4 Consider the introduction of controlled redundancy
Simplified Relation with Derived Attribute Duplicating Attribute Setting up Lookup Table Duplicating Foreign Key

110 Step 2 Design Physical Representation (Continued)
2.5 Estimate disk space requirements To estimate the amount of disk space that will be required by the database.

111 Step 3 Design Security Mechanisms
3.1 Design user views To design the user views that were identified in Step 1 of the conceptual database design methodology. 3.2 Design access rules To design the access rules to the base relations and user views.

112 Use Hotel Case for illustration
Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_no Guest_name Guest_address Registration Hotel Room Hotel_No Hotel_name City Room_no Hotel_no Type Price 1:1 1:M Identifying Relationship Dependent Entity (Attribute Entity)

113 Data Partitioning –using Hotel as a case study
(How) (What, Why, Who, Where) (Hotel) Conceptual Level Subjects Technical considerations Entities Relationships Logical Level Data Elements Frequencies Data Definition Language -DDL(create, Alter, drop tables) Data Manipulation Language (select, insert, delete, update)

114 Conceptual Data Modeling (Breaking the Subject Hotel into several entities.
Guest Guest_no What is XML and why would you want to use it? 1:M Books Hotel 1:M Room Hotel_No Has Room_no 1:1 1:M Identifying Relationship Dependent Entity (Attribute Entity)

115 Logical Data Modeling Booking Guest Hotel_no Guest_no Date_from
Date_to Room_no Guest_no Guest_name Guest_address Now that we have XML data, what can we do with it? Hotel Room Hotel_No Hotel_name City Room_no Hotel_no Type Price 1:1 1:M Identifying Relationship Dependent Entity (Attribute Entity)

116 Resolving Referential Attributes & Normalization
(such as TV, Bed) Item Qty_Hotel _no-1 Qty_ Hotel _no -2 Qty_ Hotel _no -3 Item No Buy a Book or cd: website looks like a single application, but is actually made up of several distinct and interacting web services. The online store example Authentication – for user access and authorization Personalization – to adapt web pages to each user’s preference PK 6 101 9 14 The above is an violation of first normal form because there exists a repeated group. Relationships between primary key and each attribute must be one-to-one.

117 Possible Solution + Hotel Hotel/Item Hotel ID Hotel name Hotel ID
No Qty + PK PK Min-nan FK FK H1 H1 101(TV) 6 H2 Xiamen H2 102 5

118 Second Normal Form Room/Hotel Hotel No Room No Price Hotel name Type
100 101 4 Xiamen double Hotel Name should be removed because it is not related to the entire room/hotel primary key. What happens if one of the hotel names is being changed?

119 Third Normal Form The relationship between any two non-primary key components must not be one-t-one; ie., remove tables within tables.

120 Third Normal Form City Hotel/City City ID City name Hotel ID Hotel -
PK PK Las Vegas H0 C1 Circus C1 H1 Flemingo C1 C2 Seattle H2 Holiday C2 What happens if a new City name is added to the Hotel/City Table?

121 What’s the Referential Integrity Hotel vs Room Tables?
Restrict Cascade Set Null

122 Physical Data Modeling
Data Definition Language -DDL(create, Alter, drop tables) Data Manipulation Language (select, insert, delete, update)

123 Data Manipulation Language
SELECT DISTINCT COUNT(Guest_No) FROM Booking WHERE Date_From > 08/01/2000 AND < 08/31/2000);

124 Data Definition Language -DDL
CREATE TABLE hotel ( hotel_no char(18) NOT NULL, hotel_name char(18) NULL ) Note: Primary key is not null.

125 Topic 7: Prototyping Purposes: To validate and refine the model of a system. Characteristics: Prototyping is a discipline of interactive experimentation to stimulate user feedback. Approach: A prototype is a materialization of modeling process by building rapidly and simulating the essential aspects of the system.

126 Software Modeling/Prototyping With Built-in Testing Validation/Project Evaluation & Control
Target System Existing System Physical Model Physical Model Built-In Testing Validation Prototyping Project Evaluation/Control Logical Model Logical Model Approach: Structured or Spiral or Iterative Approach Rapid Application Development CASE tools; Deliverable Templates

127 Types of Prototyping Nonworking Scale Prototyping (Mock-up)
Straw man (exploratory) prototyping (scale model approach) First Full-Scale Prototyping (Real world model)

128 Types of Prototyping Evaluative (mock-up) prototyping (Blue print approach) Use: Visualization; demonstration; inspection Advantages. Reduce risk; low cost; resolve uncertainty Early; fast; flexible How: Focuses on a relatively small number of questions to avoid becoming too complex. Generally thought of a throw-away.

129 Types of Prototyping Straw man (exploratory) prototyping (scale model approach) Use: Experiment; validating and refining the conceptual as well as logical data model. Advantages. Used to improve design; discover things about a proposed system that would otherwise not be revealed. How: Evaluate the impact on work flows; validate ease of use.

130 Types of Prototyping Evolution prototyping (real world model)
Use: Production use. Advantages. Part of production exercise. How: Explore functionality of subsystem and system; probe technical feasibility the performance and the integrity of the system.

131 Stages of Prototyping Planning Initial analysis and design
Construction Tryout Evaluation Disposal

132 Dimensions of Prototyping
Relationships of any prototyping to its eventual system are characterized along four dimensions: Focus – for example, a prototype may focus on the functionality, user interface, system integration, reliability, and performance. Scope– Is a measure of how much of the eventual system the prototype represents.

133 Dimensions of Prototyping
Depth – is a measure of how deeply it represents the behavior of the eventual system. For ex., a shallow prototype of a message system might display ‘canned’ messages, where a deeper prototype might actually perform communications to provide a more realistic surrogate for the eventual system. Scale– Is a measure of how its size or performance compares with that of the eventual system.

134 Use Hotel case for illustration
Guest Hotel_no Guest_no Date_from Date_to Room_no Guest_no Guest_name Guest_address Hotel Room Hotel_No Hotel_name City Room_no Hotel_no Type Price 1:1 1:M Identifying Relationship Dependent Entity (Attribute Entity)

135 Guest Registration Guest Registration

136 Prototyping for Hotel Case
Straw man (exploratory) prototyping (scale model approach)-use Microsoft Access to build a form to simulate the registration of guests. Straw man (exploratory) prototyping (scale model approach)- build a small set of tables to accept the data. Evolution prototyping (real world model) –actually building a mini-system with real tables to simulate the system.

137 Rapid Application Development (RAD)
RAD, or rapid application development, is an object-oriented approach to systems development that includes a method of development as well as software tools

138 RAD Phases There are three broad phases to RAD: Requirements planning
RAD design workshop Implementation

139 Requirements Planning Phase
Users and analysts meet to identify objectives of the application or system Oriented toward solving business problems

140 RAD Design Workshop Design and refine phase
Use group decision support systems to help users agree on designs Programmers and analysts can build and show visual representations of the designs and workflow to users Users respond to actual working prototypes Analysts refine designed modules based on user responses

141 Implementation Phase As the systems are built and refined, the new systems or partial systems are tested and introduced to the organization When creating new systems, there is no need to run old systems in parallel

142 RAD and the SDLC RAD tools are used to generate screens and exhibit the overall flow of the application Users approve the design and sign off on the visual model Implementation is less stressful since users helped to design the business aspects of the system

143 When to Use RAD RAD is used when
The team includes programmers and analysts who are experienced with it There are pressing reasons for speeding up application development The project involves a novel ecommerce application and needs quick results Users are sophisticated and highly engaged with the goals of the company

144 Using RAD Within the SDLC
RAD is very powerful when used within the SDLC It can be used as a tool to update, improve, or innovate selected portions of the system

145 Disadvantages of RAD May try and hurry the project too much
Loosely documented May not address pressing business problems Potentially steep learning curve for programmers inexperienced with RAD tools

146 Final Words Transform data into information by understanding the process Transform information into decisions with knowledge Transform decisions into results with actions


Download ppt "Data Warehouse Fundamentals"

Similar presentations


Ads by Google