Lecture 6 Data Model Design (continued)


1 Lecture 6 Data Model Design (continued)
Jeffery S. Horsburgh
Hydroinformatics, Fall 2013
This work was funded by National Science Foundation Grants EPS and EPS

2 Objectives
Identify and describe important entities and relationships to model data
Develop data models to represent, organize, and store data
Design and use relational databases to organize, store, and manipulate data

3 Present and discuss your preliminary designs

4 Naming Database Objects
Names should be:
  Unique
  Meaningful to the user
  Short
  Free of spaces and reserved characters
Entity and attribute names = nouns
Relationship names = verbs
Example: Many Observations are made at a Site.

5 More on Attributes
Attribute values should be atomic: each presents a single fact
Allows for:
  Simpler programming
  Greater reusability of data
  Easier implementation of changes

6 Atomic Attribute Example
Instead of one overloaded attribute:
  VariableName = “Dissolved Oxygen, mg/L, surface water”
You might use three:
  VariableName = “Dissolved Oxygen”
  Units = “mg/L”
  SampleMedium = “surface water”
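One way to see the benefit of atomic attributes is to query on a single fact. The following is a minimal sketch using Python's built-in sqlite3 module; the table name Variables, the VariableID column, and the second row are assumptions made for illustration and do not come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Variables ("
    " VariableID INTEGER PRIMARY KEY,"
    " VariableName TEXT, Units TEXT, SampleMedium TEXT)"
)
conn.executemany(
    "INSERT INTO Variables (VariableName, Units, SampleMedium) VALUES (?, ?, ?)",
    [
        ("Dissolved Oxygen", "mg/L", "surface water"),
        ("Temperature", "degC", "surface water"),  # hypothetical second row
    ],
)

# With atomic columns, filtering on one fact needs no string parsing.
mg_per_l = [
    row[0]
    for row in conn.execute(
        "SELECT VariableName FROM Variables WHERE Units = ?", ("mg/L",)
    )
]
print(mg_per_l)
```

With the overloaded form, the same question would require pattern matching inside the VariableName string.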

7 Common Attribute Atomicity Violations
Simple aggregation: Address = “8200 Old Main Hill, Logan, UT, 84322”
Complex codes: VariableCode = “DO_mgL_Avg”
Text fields: free-form text. Overreliance on them may mean that the model does not meet the data requirements.
Mixed domains: the value of an attribute can have a different meaning under different conditions.

8 Primary Keys
Attribute or set of attributes that uniquely identifies a specific instance of an entity (a row in the table)
Primary keys must:
  Have a non-null value for each instance of an entity
  Have a unique value for each instance of an entity
  Have values that do not change or become null
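The uniqueness rule can be sketched with SQLite through Python's sqlite3 module. The table name Sites and the helper try_insert are assumptions for illustration; other RDBMS products report the violation differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT NOT NULL)"
)
conn.execute("INSERT INTO Sites VALUES (1, 'Logan River')")


def try_insert(site_id, name):
    """Attempt an insert and report whether the primary key allowed it."""
    try:
        conn.execute("INSERT INTO Sites VALUES (?, ?)", (site_id, name))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


duplicate = try_insert(1, "Spring Creek")  # SiteID 1 already identifies a row
fresh = try_insert(2, "Spring Creek")      # a new key value is accepted
print(duplicate)
print(fresh)
```

The database itself refuses the second row with SiteID 1, so application code does not have to check for duplicates.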

9 Normalization
Organizing the fields and tables in a relational database to minimize redundancy and dependency
Dividing large tables into smaller tables (with relationships)
Isolate data so that additions, deletions, and modifications of a field or record can be made in one place
Reduce the need for restructuring the database as new types of data are introduced

10 Unnormalized Data Example
[Table: columns SiteID, SiteName, VariableID, VariableName, DateTime, Value; Temperature and pH measurements at Logan River and Spring Creek, dated 1/1/2012 and 1/2/2012]

11 Issues with Unnormalized Data
[Table: columns SiteID, SiteName, VariableID, VariableName, DateTime, Value; Logan River rows for Temperature and pH]
INSERT: The fact that a site or variable exists cannot be asserted until a measurement has been made.
DELETE: If a row is deleted, information may be lost about not only the measurement, but also the variable and the site.
UPDATE: If a SiteName or VariableName changes, multiple records have to be updated with the new information.

12 Normalization Example
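Sites (one site has many data values):
  SiteID  SiteName
  1       Logan River
  2       Spring Creek
Data values (the “many” side of both relationships):
  Columns: SiteID, VariableID, DateTime, Value
Variables (one variable has many data values):
  VariableID  VariableName
  1           Temperature
  2           pH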
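The three-table design can be sketched in SQLite through Python's sqlite3 module. The table names Sites, Variables, and DataValues, the ISO date strings, the measurement rows, and the third site are assumptions for illustration, not the slide's exact data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableName TEXT);
CREATE TABLE DataValues (
    SiteID INTEGER REFERENCES Sites(SiteID),
    VariableID INTEGER REFERENCES Variables(VariableID),
    DateTime TEXT,
    Value REAL
);
INSERT INTO Sites VALUES (1, 'Logan River'), (2, 'Spring Creek');
INSERT INTO Variables VALUES (1, 'Temperature'), (2, 'pH');
-- Illustrative measurements only.
INSERT INTO DataValues VALUES
    (1, 1, '2012-01-01', 5.0),
    (1, 2, '2012-01-01', 8.0),
    (2, 2, '2012-01-01', 7.5);
""")

# A site can be asserted before any measurement exists (the INSERT issue).
conn.execute("INSERT INTO Sites VALUES (3, 'Hypothetical Creek')")

# Renaming a site touches exactly one row (the UPDATE issue).
renamed = conn.execute(
    "UPDATE Sites SET SiteName = 'Logan River (renamed)' WHERE SiteID = 1"
).rowcount

# A join rebuilds the flat, human-readable form on demand.
flat = conn.execute("""
    SELECT s.SiteName, v.VariableName, d.DateTime, d.Value
    FROM DataValues d
    JOIN Sites s ON s.SiteID = d.SiteID
    JOIN Variables v ON v.VariableID = d.VariableID
    ORDER BY d.SiteID, d.VariableID
""").fetchall()
print(renamed, flat)
```

The join is the extra query complexity that normalization trades for single-place inserts and updates.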

13 Normalization Tradeoffs
Pros:
  Eliminates redundant data
  Saves space and can improve storage efficiency
  Inserts and updates are done in one place
  Can improve efficiency
Cons:
  May complicate the code of common queries
  Abstracts tables using keys, which can make it harder for a human to “see” the data

14 Data Integrity Rules: Entity Integrity
Primary key must exist, be unique, and not null
[Tables: data values (columns ValueID, SiteID, VariableID, DateTime, Value; ValueID 101 to 108), sites (1 Logan River, 2 Spring Creek), variables (1 Temperature, 2 pH)]

15 Data Integrity Rules: Referential Integrity
Every foreign key value must match a primary key value in an associated table
Ensures that we can navigate relationships
[Tables: data values (columns ValueID, SiteID, VariableID, DateTime, Value; ValueID 101 to 108), sites (1 Logan River, 2 Spring Creek), variables (1 Temperature, 2 pH)]
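A minimal sketch of referential integrity in SQLite, via Python's sqlite3 module. The table names Sites and DataValues are assumptions, and the PRAGMA line is SQLite-specific: SQLite leaves foreign key enforcement off unless it is enabled on the connection, while other RDBMS products behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
CREATE TABLE DataValues (
    ValueID INTEGER PRIMARY KEY,
    SiteID INTEGER NOT NULL REFERENCES Sites(SiteID),
    Value REAL
);
INSERT INTO Sites VALUES (1, 'Logan River'), (2, 'Spring Creek');
INSERT INTO DataValues VALUES (101, 1, 5.0);
""")

try:
    # SiteID 99 matches no primary key in Sites, so the row would be an orphan.
    conn.execute("INSERT INTO DataValues VALUES (102, 99, 7.0)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

row_count = conn.execute("SELECT COUNT(*) FROM DataValues").fetchone()[0]
print(orphan_allowed, row_count)
```

Because the orphan row is refused, every SiteID in DataValues can always be followed back to a row in Sites.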

16 Data Integrity Rules: Insert and Delete Rules
What happens to a parent entity when child entities are deleted?
What happens to child entities when a parent is deleted?
[Tables: data values (columns ValueID, SiteID, VariableID, DateTime, Value; ValueID 101 to 108), sites (1 Logan River, 2 Spring Creek), variables (1 Temperature, 2 pH)]
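One common way to answer the second question is an ON DELETE rule on the foreign key. The sketch below compares three such rules in SQLite through Python's sqlite3 module; the helper delete_parent and the table names are assumptions for illustration, and the available rules and their defaults vary by RDBMS.

```python
import sqlite3


def delete_parent(rule):
    """Delete site 1 under the given ON DELETE rule; return the child rows left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript(f"""
    CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
    CREATE TABLE DataValues (
        ValueID INTEGER PRIMARY KEY,
        SiteID INTEGER REFERENCES Sites(SiteID) ON DELETE {rule},
        Value REAL
    );
    INSERT INTO Sites VALUES (1, 'Logan River');
    INSERT INTO DataValues VALUES (101, 1, 5.0), (102, 1, 8.0);
    """)
    try:
        conn.execute("DELETE FROM Sites WHERE SiteID = 1")
    except sqlite3.IntegrityError:
        pass  # RESTRICT refuses to delete a parent that still has children
    return conn.execute(
        "SELECT ValueID, SiteID FROM DataValues ORDER BY ValueID"
    ).fetchall()


cascade = delete_parent("CASCADE")    # children are deleted with the parent
restrict = delete_parent("RESTRICT")  # the delete is refused, children remain
set_null = delete_parent("SET NULL")  # children remain with SiteID set to NULL
print(cascade, restrict, set_null)
```

Deleting child rows, by contrast, leaves the parent row untouched under all three rules.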

17 Data Integrity Rules: Value Domains
Valid set of values for an attribute
Controlled vocabulary, data type, length, range, constraints, default value
[Tables: data values (columns ValueID, SiteID, VariableID, DateTime, Value; ValueID 101 to 104; values 5.5, 5.678, 8.0, 8.9) and variables (1 Temperature, 2 pH), annotated with the domains: integer fields, date field, double, controlled domain]
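Several of these domain rules can be declared in the table definition. The sketch below uses SQLite through Python's sqlite3 module; the QualityCode column, its vocabulary, the numeric range on Value, and the helper accepts are all hypothetical and chosen only to show a default, a small vocabulary, and a range check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableName TEXT);
INSERT INTO Variables VALUES (1, 'Temperature'), (2, 'pH');
CREATE TABLE DataValues (
    ValueID INTEGER PRIMARY KEY,
    -- controlled domain: must be a VariableID listed in Variables
    VariableID INTEGER NOT NULL REFERENCES Variables(VariableID),
    DateTime TEXT NOT NULL,
    -- hypothetical range
    Value REAL NOT NULL CHECK (Value BETWEEN -1000 AND 1000),
    -- hypothetical column with a default and a small vocabulary
    QualityCode TEXT NOT NULL DEFAULT 'raw'
        CHECK (QualityCode IN ('raw', 'checked'))
);
""")


def accepts(sql):
    """Return True if the statement satisfies every declared domain rule."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


ok = accepts(
    "INSERT INTO DataValues (VariableID, DateTime, Value) "
    "VALUES (1, '2012-01-01', 5.5)")
bad_range = accepts(
    "INSERT INTO DataValues (VariableID, DateTime, Value) "
    "VALUES (1, '2012-01-02', 1e9)")
bad_vocab = accepts(
    "INSERT INTO DataValues (VariableID, DateTime, Value, QualityCode) "
    "VALUES (1, '2012-01-02', 5.6, 'guess')")
bad_variable = accepts(
    "INSERT INTO DataValues (VariableID, DateTime, Value) "
    "VALUES (9, '2012-01-02', 5.6)")
default_code = conn.execute("SELECT QualityCode FROM DataValues").fetchone()[0]
print(ok, bad_range, bad_vocab, bad_variable, default_code)
```

Only the first insert is stored, and it picks up the default QualityCode.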

18 Specialization
Designating entity subgroups within a higher-level entity
Entity subgroups have attributes or relationships that do not apply to the higher-level entity
Attributes are inherited: a lower-level entity inherits all of the attributes and relationship participation of the higher-level entity to which it is linked

19 Specialization Example
A car is a vehicle
A truck is a vehicle

20 Generalization
Combine a number of entities that share features into a higher-level entity
Specialization and generalization are inversions of each other
[Figure labels: Specialization, Generalization]

21 Constraints on Specialization/Generalization
Constraints on which entities can be members of a given lower-level entity set
  Condition-defined: “all vehicles with a towing capacity of more than 10,000 lbs are trucks”
Constraints on whether entities can belong to more than one lower-level entity set
  Disjoint: an entity can belong to only one
  Overlapping: an entity can belong to more than one
Completeness constraint: must every higher-level entity belong to at least one lower-level entity?

22 Mapping Specialization to Tables
Option 1: Put everything in one table
There will be NULL values where attributes don’t apply

23 Mapping Specialization to Tables
Option 2: Form tables for the higher-level entity and the lower-level entities
Each lower-level entity includes the primary key of the higher-level entity set
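Option 2 can be sketched for the vehicle example in SQLite through Python's sqlite3 module. The table names and the Make, Seats, and TowingCapacityLbs columns are assumptions for illustration; only the car/truck/vehicle hierarchy and the idea of a towing capacity come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE Vehicles (
    VehicleID INTEGER PRIMARY KEY,
    Make TEXT NOT NULL            -- attribute shared by every vehicle
);
CREATE TABLE Cars (
    VehicleID INTEGER PRIMARY KEY REFERENCES Vehicles(VehicleID),
    Seats INTEGER                 -- applies only to cars
);
CREATE TABLE Trucks (
    VehicleID INTEGER PRIMARY KEY REFERENCES Vehicles(VehicleID),
    TowingCapacityLbs INTEGER     -- applies only to trucks
);
INSERT INTO Vehicles VALUES (1, 'MakeA'), (2, 'MakeB');
INSERT INTO Cars VALUES (1, 5);
INSERT INTO Trucks VALUES (2, 12000);
""")

# Inherited attributes come from the higher-level table through the shared key.
trucks = conn.execute("""
    SELECT v.VehicleID, v.Make, t.TowingCapacityLbs
    FROM Trucks t
    JOIN Vehicles v ON v.VehicleID = t.VehicleID
""").fetchall()
print(trucks)
```

Each lower-level table reuses VehicleID as both its primary key and its foreign key, so no column holds NULLs for attributes that do not apply.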

24 Mapping Specialization to Tables
Option 3: Model only the lower-level entities
Repeats attributes

25 Steps in Data Model Design
  1. Identify entities
  2. Identify relationships among entities
  3. Determine the cardinality and participation of relationships
  4. Designate keys / identifiers for entities
  5. List attributes of entities
  6. Identify constraints and business rules
  7. Map 1-6 to a physical implementation

26 Physical Data Model
The “physical” means a specific implementation of the data model
  Choice of hardware and operating system
  Choice of relational database management system
  Implementation of tables, relationships, constraints, triggers, indices, data types
  Database access and security
  Performance
  Storage

27 Relational Database Management Systems (RDBMS)
File vs. server based
Free vs. commercial
Different data types
Potentially different syntax for SQL queries
Security models and concurrent users

28 Reduction of an ER Diagram to Tables
Converting an ER diagram to table format is the basis for deriving a relational database
Primary keys allow entities to be expressed as tables that contain data
A database is a collection of tables
Tables are assigned the same name as the entity
Each table has columns that correspond to attributes; each column has a unique name
Each column must have a single data type

29 Advanced Database Objects
Views
Stored procedures
Triggers
Constraints
Implementation of these objects may depend on your choice of RDBMS software

30 Database Views
A view behaves like a table, but is defined by a SQL query
Used to present a set of desired information, independent of the underlying database structure
Can be used to hide complexities of the underlying data model from the user
One way to address the cons of normalization
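A view over a normalized design might look like the following sketch, written for SQLite through Python's sqlite3 module. The view name DataValuesFlat, the table names, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableName TEXT);
CREATE TABLE DataValues (
    ValueID INTEGER PRIMARY KEY,
    SiteID INTEGER REFERENCES Sites(SiteID),
    VariableID INTEGER REFERENCES Variables(VariableID),
    DateTime TEXT,
    Value REAL
);
INSERT INTO Sites VALUES (1, 'Logan River'), (2, 'Spring Creek');
INSERT INTO Variables VALUES (1, 'Temperature'), (2, 'pH');
-- Illustrative rows only.
INSERT INTO DataValues VALUES
    (101, 1, 1, '2012-01-01', 5.0),
    (103, 1, 2, '2012-01-01', 8.0),
    (107, 2, 2, '2012-01-01', 7.5);

-- The view stores the join once so that users can query names directly.
CREATE VIEW DataValuesFlat AS
    SELECT d.ValueID, s.SiteName, v.VariableName, d.DateTime, d.Value
    FROM DataValues d
    JOIN Sites s ON s.SiteID = d.SiteID
    JOIN Variables v ON v.VariableID = d.VariableID;
""")

ph_rows = conn.execute(
    "SELECT SiteName, VariableName, Value FROM DataValuesFlat "
    "WHERE VariableName = 'pH' ORDER BY ValueID"
).fetchall()
print(ph_rows)
```

Users of DataValuesFlat never write the joins or see the key columns, yet the data stay normalized underneath.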

31 Stored Procedures
A set of structured query language (SQL) statements that are stored and executed on the server
Useful for repetitive tasks
Encapsulate functionality and isolate users from data tables
Can provide a security layer: software applications have no direct access to the database, but can execute stored procedures

32 Triggers
Special kind of stored procedure
Automatically executes on a table or view when an event occurs in the database
Events include: CREATE, ALTER, INSERT, UPDATE, DELETE
Mostly used to maintain the integrity of information in the database
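As a rough illustration, the sketch below defines a trigger in SQLite through Python's sqlite3 module. The ValueHistory table and the trigger name LogValueChange are hypothetical. SQLite triggers respond to INSERT, UPDATE, and DELETE only; triggers on events such as CREATE or ALTER are a feature of some other RDBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataValues (ValueID INTEGER PRIMARY KEY, Value REAL);
CREATE TABLE ValueHistory (ValueID INTEGER, OldValue REAL, NewValue REAL);

-- Runs automatically after any UPDATE that targets the Value column.
CREATE TRIGGER LogValueChange AFTER UPDATE OF Value ON DataValues
BEGIN
    INSERT INTO ValueHistory VALUES (OLD.ValueID, OLD.Value, NEW.Value);
END;

INSERT INTO DataValues VALUES (101, 5.0);
""")

# The application only issues the UPDATE; the history row is written by the trigger.
conn.execute("UPDATE DataValues SET Value = 5.5 WHERE ValueID = 101")
history = conn.execute("SELECT * FROM ValueHistory").fetchall()
print(history)
```

Keeping the audit logic in the database means every client that edits a value leaves the same trail.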

33 Constraints
Common way to enforce data integrity
Examples:
  Not NULL: value in a column must not be NULL
  Unique: value(s) in specified column(s) must be unique for each row in a table
  Primary Key: value(s) in the specified column(s) must be unique for each row in the table and not be NULL
  Foreign Key: value(s) in the specified column(s) must reference an existing record in another table via its primary key
  Check: an expression that validates data and must not be FALSE
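The Not NULL and Unique constraints can be sketched in SQLite through Python's sqlite3 module. The composite unique key on (SiteID, VariableID, DateTime) and the helper violation are assumptions for illustration, not a rule stated in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataValues ("
    " ValueID INTEGER PRIMARY KEY,"
    " SiteID INTEGER NOT NULL,"
    " VariableID INTEGER NOT NULL,"
    " DateTime TEXT NOT NULL,"
    " Value REAL NOT NULL,"
    " UNIQUE (SiteID, VariableID, DateTime))"  # hypothetical: one value per site, variable, and time
)


def violation(row):
    """Return the constraint error message for a row, or None if it is accepted."""
    try:
        conn.execute(
            "INSERT INTO DataValues (SiteID, VariableID, DateTime, Value) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


first = violation((1, 1, "2012-01-01", 5.0))
repeat = violation((1, 1, "2012-01-01", 5.2))    # same site, variable, and time
missing = violation((1, 2, "2012-01-01", None))  # Value is required
print(first, repeat, missing, sep="\n")
```

The error text names the constraint that failed, which helps when several constraints guard the same table.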

34 Data Types
Each attribute of an entity (column in a database table) must have a single data type
Data types are enforced by RDBMS software
Table: DataValues
  Attribute   Data Type  Sample Data
  ValueID     Integer    1
  SiteID      Integer    5
  VariableID  Integer
  DateTime    Date/Time  8/15/2013 4:30 PM
  DataValue   Double     4.567

35 Data Types
Data types can be specific to RDBMS software
MS SQL Server
  Integer: TINYINT, SMALLINT, INT, BIGINT
  Floating point: FLOAT, REAL
  Decimal: NUMERIC, DECIMAL, SMALLMONEY, MONEY
  String: CHAR, VARCHAR, TEXT, NCHAR, NVARCHAR, NTEXT
  Date/Time: DATE, DATETIMEOFFSET, DATETIME2, SMALLDATETIME, DATETIME, TIME
MySQL
  Integer: TINYINT (8-bit), SMALLINT (16-bit), MEDIUMINT (24-bit), INT (32-bit), BIGINT (64-bit)
  Floating point: FLOAT (32-bit), DOUBLE (aka REAL) (64-bit)
  Decimal: DECIMAL
  String: CHAR, BINARY, VARCHAR, VARBINARY, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT
  Date/Time: DATETIME, DATE, TIMESTAMP, YEAR
PostgreSQL
  Integer: SMALLINT (16-bit), INTEGER (32-bit), BIGINT (64-bit)
  Floating point: REAL (32-bit), DOUBLE PRECISION (64-bit)
  Decimal: DECIMAL, NUMERIC
  String: CHAR, VARCHAR, TEXT
  Date/Time: DATE, TIME (with/without TIMEZONE), TIMESTAMP (with/without TIMEZONE), INTERVAL
Quick summary from:

36 Summary of 3 Levels of Data Model Design
[Table: columns Feature, Conceptual, Logical, Physical; an X marks each level that includes the feature]
Features (rows): Entity Names, Entity Relationships, Attributes, Primary Keys, Foreign Keys, Table Names, Column Names, Column Data Types, Views, Stored Procedures, Triggers, Constraints

37 Summary
Simple rules for naming objects and specifying domains can help protect the integrity of data
Normalization can help reduce redundancy, increase storage efficiency, and protect data integrity, but there are tradeoffs
Data integrity rules include relationships and domains and protect the integrity of data in the database
Specialization and generalization require special consideration in implementation
A physical database implementation requires choices about hardware, software, security, formats and storage, and other factors

