1
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
2
Course logistics © 2002 by Prentice Hall
3
Administrivia Instructor: Dragomir R. Radev 3080 West Hall Connector, (734) Office hours: TBD Course page: Class time: Fridays, 1-4PM, 311 WH © 2002 by Prentice Hall
4
Book information
Database Processing by David Kroenke (8th Edition, Prentice Hall, ISBN ):
Managing and Using MySQL by Reese, Yarger, and King (O'Reilly, ISBN ):
Optional reading: Database Management Systems by Ramakrishnan and Gehrke (McGraw-Hill, ISBN ):
Optional reading: Data Mining by Han and Kamber (Morgan Kaufmann, ISBN ):
5
Assignments Assignment 1: Entity-Relationship Model, Relational Model, SQL Assignment 2: Database design using ERWin and Oracle Assignment 3: Database design using MySQL Assignment 4: XML, Data Mining, and other advanced topics © 2002 by Prentice Hall
6
Final project Proposal Database design Progress report Project
Final presentation © 2002 by Prentice Hall
7
Grading Four assignments: 40% (10% each) Project + presentation: 30%
Final exam: 25% Class participation: 5% © 2002 by Prentice Hall
8
Policies Class participation counts as 5% of the grade
Timely submission of assignments is important Syllabus can be amended during the semester Honors Code © 2002 by Prentice Hall
9
Notes on programming All students will do some programming as part of the assignments. For the final project, teams will be formed in ways to include students with diverse backgrounds. © 2002 by Prentice Hall
10
Syllabus - I DK Ch. 1. Introduction to Database Processing
DK Ch. 2. Introduction to Database Development DK Ch. 3. The Entity-Relationship Model DK Ch. 5. The Relational Model and Normalization DK Ch. 6. Database Design Using Entity-Relationship Models READING The ERWin System DK Ch. 8. Foundations of Relational Implementation DK Ch. 9. Structured Query Language RYK Ch. 1 MySQL DK Ch. 16. JDBC, Java Server Pages, and MySQL © 2002 by Prentice Hall
11
Syllabus - II RYK Ch. 3 SQL according to MySQL
DK Ch. 10. Database Application Design DK Ch. 11. Managing Multi-User Databases RYK Ch. 7 Database Design DK Ch. 12. Managing Databases with Oracle (DK Ch. 14). Networks, Multi-Tier Architectures, and XML READING XML and query languages for XML READING Data Mining DK App. A. Data Structures for Database Processing © 2002 by Prentice Hall
12
Introduction to Database Processing
Eighth Edition Chapter 1 David M. Kroenke Introduction to Database Processing © 2002 by Prentice Hall
13
Art or Engineering Database design and development involves both art and engineering Gathering and organizing user requirements is an art Transforming the resulting designs into physical applications involves engineering © 2002 by Prentice Hall
14
Types of Data Stored Today, most newer databases are able to store a large variety of data, including… Scalar data Names, dates, phone numbers Pictures Audio Video © 2002 by Prentice Hall
15
Database Example 1 Mary Richards Housepainting
Self Employed Entrepreneur Single User Database 3 Tables (Customers, Jobs, Source) Data Needs: Track how customers, jobs, and referrals relate Record bid estimates Track referral sources Produce mailing labels © 2002 by Prentice Hall
16
Mary Richards’ Tables SOURCE CUSTOMER JOB © 2002 by Prentice Hall
17
Database Example 2 Treble Clef Music Multi-User database on LAN
3 Tables (Customers, Instruments, Rentals) Data Needs: Track instrument rentals Handle multi-user issues © 2002 by Prentice Hall
18
Treble Clef Form 1 © 2002 by Prentice Hall
19
Treble Clef Form 2 © 2002 by Prentice Hall
20
Treble Clef Form 3 © 2002 by Prentice Hall
21
Database Example 3 State Licensing & Vehicle Registration Bureau
52 Centers, 37 Offices, Hundreds of Users 40 Tables Data Needs: Track drivers licensing issues traffic violations, accidents, arrests, limitations Track auto registration issues revenue, law enforcement Integrate the needs of many departments © 2002 by Prentice Hall
22
Database Example 4 Calvert Island Reservations Centre
Chamber of Commerce Promotional database provides access to data Customer and reservation database processes Data Needs: Store multimedia data (photos, video clips, sound clips) Must be Web / browser accessible Uses Web technologies including HTTP, DHTML, and XML © 2002 by Prentice Hall
23
Comparison Among Database Examples
© 2002 by Prentice Hall
24
Reading assignments 1/16 - Chapters 1 & 2 1/23 - Chapters 3 & 5
2/6 - Chapter 9 + ERWIN docs © 2002 by Prentice Hall
25
Applications versus Database Management Systems (DBMS)
The Database Management System (DBMS) provides functionality above and beyond the storage of information. Users want to see reports, forms, and query results – not simply data As such, application development is crucial to the design and development of the DBMS © 2002 by Prentice Hall
26
In the Beginning, There Were File-Processing Systems
The first business information systems stored information by grouping similar data into separate files. © 2002 by Prentice Hall
27
A File-Processing System
© 2002 by Prentice Hall
28
Problems with File-Processing Systems
Data separated and isolated Data often duplicated Application program dependent Incompatible data files Difficult to understand © 2002 by Prentice Hall
29
Duplication of Data When storing the same data in multiple locations, the likelihood of inconsistency is very high. What is my real name? Table 1: my name is Dan Table 2: my name is Danielle Table 3: my name is Daniel Table 4: my name is Don © 2002 by Prentice Hall
30
The Data in a DBMS Data is integrated Data duplication is reduced
Data is program independent Data is easy to understand © 2002 by Prentice Hall
31
A DBMS
32
Database is Self-Describing
A database contains a data dictionary A data dictionary is data about the data (metadata) It describes the structure and format of the information contained within the database © 2002 by Prentice Hall
33
The Hierarchy of Data File-Processing DBMS © 2002 by Prentice Hall
34
DBMS –the Past 1970, E.F. Codd Normalization Process Compute Intensive
© 2002 by Prentice Hall
35
DBMS – the Present
Ashton-Tate: dBase II, now Borland
Oracle, Focus, Ingres: ported down
Paradox, Revelation, MDBS, Helix, FoxPro, Access: built specifically for microcomputers
36
DBMS –the Future Trends
Client-Server Applications Integration of Internet Technology Distributed Processing Object-Oriented DBMS © 2002 by Prentice Hall
37
Introduction to Database Development
Database Processing Eighth Edition Chapter 2 David M. Kroenke Introduction to Database Development © 2002 by Prentice Hall
38
The Components of the Database System
The Database Contents The DBMS The Application Programs The Developers The Users The Database © 2002 by Prentice Hall
39
Database System Components
© 2002 by Prentice Hall
40
Database Contents User Data Metadata Indexes Application Metadata
© 2002 by Prentice Hall
41
User Data A table of data is called a relation
Columns are fields or attributes Rows are entities Relations must be structured properly © 2002 by Prentice Hall
42
Metadata Metadata describes the structure and format of the data and the overall database System tables store metadata number of tables and table names number of fields and field names primary key fields field names, data types, and length © 2002 by Prentice Hall
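As a rough illustration of system tables, the sketch below asks SQLite (through Python's standard sqlite3 module) to describe a table it stores. The customer table is hypothetical, and sqlite_master and PRAGMA table_info are SQLite's own catalog interfaces; other DBMS products expose their metadata under different names.

```python
import sqlite3

# Hypothetical one-table database, used only to show where metadata lives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# sqlite_master is SQLite's system table: one row per table, index, or view.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
columns = [
    (row[1], row[2], row[5])
    for row in conn.execute("PRAGMA table_info(customer)")
]

print(tables)   # table names
print(columns)  # (field name, data type, primary-key flag) per field
```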
43
Indexes Improve performance Improve accessibility (Overhead data)
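One way to see an index at work is to compare the query plan before and after creating it. This is a minimal sketch on SQLite; the customer table and the index name idx_customer_name are assumptions made for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (name) VALUES (?)",
                 [("Ann",), ("Bob",), ("Cy",)])

def plan(sql):
    """Return SQLite's query-plan description for sql as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM customer WHERE name = 'Bob'"
plan_before = plan(query)   # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
plan_after = plan(query)    # the planner can now search through the index

print(plan_before)
print(plan_after)
```

The index is extra stored data that must be maintained on every insert, update, and delete, which is the overhead the slide refers to.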
© 2002 by Prentice Hall
44
Application Metadata Stores the structure and format of forms reports
queries other application components © 2002 by Prentice Hall
45
The DBMS Design Tools Subsystem Run-Time Subsystem DBMS Engine
© 2002 by Prentice Hall
46
Design Tools Subsystem
Tools to design and develop tables forms queries reports Programming Languages macros languages © 2002 by Prentice Hall
47
Run-Time Subsystem Processes database components created by design tools © 2002 by Prentice Hall
48
DBMS Engine Intermediary between the design tools and run-time subsystems and the data Also handles . . . transaction management locking backup and recovery © 2002 by Prentice Hall
49
Creating the Database Defining the database schema Creating the tables
Defining the relationships among the tables © 2002 by Prentice Hall
50
The Database Schema Defines a database’s structure
Tables - subjects within the database Relationships - one-to-many or 1:N Domains - set of values a column may have Business rules - restrictions on data values © 2002 by Prentice Hall
51
Defining Tables using Microsoft Access
© 2002 by Prentice Hall
52
Defining Relationships Among the Tables using Microsoft Access
© 2002 by Prentice Hall
53
Components of Applications
Forms Queries Reports Menus Application Programs © 2002 by Prentice Hall
54
A Browser Data Entry Form
© 2002 by Prentice Hall
55
A Query in Microsoft Access
© 2002 by Prentice Hall
56
A Report in Microsoft Access
© 2002 by Prentice Hall
57
A Menu in Microsoft Access
© 2002 by Prentice Hall
58
Database Development Approaches
Prototype Top-down development Bottom-up development © 2002 by Prentice Hall
59
Prototype Development
Develop portions of the database and submit to users for feedback, refinement, and enhancement © 2002 by Prentice Hall
60
Top-down Development General requirements to specific requirements
A global perspective © 2002 by Prentice Hall
61
Bottom-up Development
Specific requirements to general requirements Typically faster and less risky © 2002 by Prentice Hall
62
The Data Model A data model defines and graphically depicts the data structure and relationships among the data © 2002 by Prentice Hall
63
Data Modeling Creation
Interviewing users Documenting requirements Building a data model Building a database prototype A process of inference Working backwards © 2002 by Prentice Hall
64
Common Data Models Entity-Relationship Model Semantic Object Model
© 2002 by Prentice Hall
65
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
66
The Entity-Relationship Model
Database Processing Eighth Edition Chapter 3 David M. Kroenke The Entity-Relationship Model © 2002 by Prentice Hall
67
Data Modeling Process of creating a logical representation of the structure of the database The most important task in database development © 2002 by Prentice Hall
68
Entity-Relationship Model (E-R Model)
An Entity-Relationship Model (E-R Model) consists of: Entities Attributes Identifiers Relationships © 2002 by Prentice Hall
69
An Entity An entity is an object that can be identified in the users’ work environment & that users want to track. Entities of a given type are grouped into entity classes. © 2002 by Prentice Hall
70
An Entity Example © 2002 by Prentice Hall
71
Attributes An attribute describes a characteristic of an entity
For example An entity: Employee Has attributes: EmployeeName Extension DateOfHire © 2002 by Prentice Hall
72
Identifier An identifier uniquely identifies a row in a table.
For an Employee, the SocialSecurityNumber may serve as the Identifier.
73
Relationships A relationship describes how one or more entities are related with each other. © 2002 by Prentice Hall
74
Relationship Cardinality
Entity-Instance Participation in relationships is shown by maximum cardinality minimum cardinality © 2002 by Prentice Hall
75
Maximum Cardinality The maximum cardinality indicates/depicts the maximum number of instances involved in a relationship. Alternatives include 1:1 (one-to-one) 1:N (one-to-many) N:M (many-to-many) © 2002 by Prentice Hall
76
Relationship Examples Showing Maximum Cardinality Alternatives
© 2002 by Prentice Hall
77
Minimum Cardinality The minimum cardinality indicates/depicts whether participation in the relationship is mandatory or optional. Alternatives include 0 (optional) 1 (mandatory) © 2002 by Prentice Hall
78
A Relationship Example Showing Minimum and Maximum Cardinality
© 2002 by Prentice Hall
79
A Recursive Relationship
A recursive relationship is when an entity has a relationship with itself. © 2002 by Prentice Hall
80
Entity-Relationship Diagram (E-R Diagram)
An entity-relationship diagram (E-R Diagram) is a graphical representation of the E-R model using a set of ‘somewhat’ standardized conventions © 2002 by Prentice Hall
81
An Entity-Relationship Diagram (E-R Diagram) Example
© 2002 by Prentice Hall
82
Weak Entity A weak entity is an entity whose instance survival depends (logically) on an associated instance in another entity © 2002 by Prentice Hall
83
Subtype Entities Some entities may have many common attributes and a few unique attributes. The common attributes may be grouped together in a supertype entity and the unique attributes may be grouped together in a subtype entity. © 2002 by Prentice Hall
84
CLIENT with Subtype Entities
© 2002 by Prentice Hall
85
E-R Diagram Computer Assisted Software Engineering (CASE) Tools
Several Computer Assisted Software Engineering (CASE) Tools exist to help create E-R Diagrams and the resulting physical database elements. Products include: IEW IEF DEFT ER-WIN Visio © 2002 by Prentice Hall
86
E-R Diagram Example: Jefferson Dance Club
© 2002 by Prentice Hall
87
E-R Diagram Example: San Juan Charters
© 2002 by Prentice Hall
88
The Relational Model and Normalization
Database Processing Eighth Edition Chapter 5 David M. Kroenke The Relational Model and Normalization © 2002 by Prentice Hall
89
The Relational Model Broad, flexible model
Basis for almost all DBMS products E.F. Codd defined well-structured “normal forms” of relations, “normalization” © 2002 by Prentice Hall
90
Components of the Relational Model
A two-dimensional table consisting of rows and columns Tuples The rows (or records) in a relation Attributes The columns (or fields) in a relation © 2002 by Prentice Hall
91
Terminology © 2002 by Prentice Hall
92
Functional Dependency
Functional dependencies are the relationships among the attributes within a relation. If attribute A functionally depends on attribute B, then for every instance of B you will know the respective value of A.
93
Functional Dependency Notation
Major is functionally dependent on SID: SID → Major
Grade is functionally dependent on the combination of SID and ClassID: (SID, ClassID) → Grade
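A functional dependency can be checked against sample data: the determinant must never map to two different dependent values. The helper fd_holds and the grade rows below are hypothetical. Note that sample data can only refute a dependency; whether it truly holds is a matter of the business rules.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always give the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows for illustration only.
grades = [
    {"SID": 100, "ClassID": "MIS445", "Major": "History", "Grade": "A"},
    {"SID": 100, "ClassID": "BA490", "Major": "History", "Grade": "B"},
    {"SID": 200, "ClassID": "MIS445", "Major": "Math", "Grade": "B"},
]

sid_to_major = fd_holds(grades, ["SID"], ["Major"])
sid_to_grade = fd_holds(grades, ["SID"], ["Grade"])  # SID 100 has two grades
pair_to_grade = fd_holds(grades, ["SID", "ClassID"], ["Grade"])
print(sid_to_major, sid_to_grade, pair_to_grade)
```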
94
Functional Dependency – an Example
EmployeeNumber → Name
EmployeeNumber → Age
EmployeeNumber → Sex
95
A Key A key is a group of one or more attributes that uniquely identifies a tuple © 2002 by Prentice Hall
96
A Combination Key Sometimes more than one attribute will be required to uniquely identify a tuple. If a key consists of more than one attribute, it is called a combination (or composite) key. © 2002 by Prentice Hall
97
Example of a Combination Key
© 2002 by Prentice Hall
98
Normalization Normalization is a process of evaluating and converting a relation to reduce modification anomalies Essentially, normalization detects and eliminates data redundancy © 2002 by Prentice Hall
99
An Anomaly An anomaly is an undesirable consequence of a data modification. © 2002 by Prentice Hall
100
Normal Forms Normal forms are state-classes of relations which identify the level of anomaly-avoidance © 2002 by Prentice Hall
101
Normal Forms Levels 1NF –First Normal Form 2NF –Second Normal Form
3NF –Third Normal Form BCNF –Boyce-Codd Normal Form 4NF –Fourth Normal Form 5NF –Fifth Normal Form DK/NF –Domain/Key Normal Form © 2002 by Prentice Hall
102
First Normal Form (1NF) To be in First Normal Form (1NF) a relation must have only single-valued attributes -- neither repeating groups nor arrays are permitted © 2002 by Prentice Hall
103
Second Normal Form (2NF)
To be in Second Normal Form (2NF) the relation must be in 1NF and each nonkey attribute must be dependent on the whole key (not a subset of the key) © 2002 by Prentice Hall
104
Third Normal Form (3NF) To be in Third Normal Form (3NF) the relation must be in 2NF and no transitive dependencies may exist within the relation. A transitive dependency is when an attribute is indirectly functionally dependent on the key (that is, the dependency is through another nonkey attribute) © 2002 by Prentice Hall
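To make the transitive dependency concrete, the sketch below uses a hypothetical housing relation in which sid determines building and building determines fee. Splitting it into two tables removes the repeated fee, and a join recovers the original rows. Table names and data are assumptions for illustration, run on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Not in 3NF: fee depends on building, which depends on the key sid.
CREATE TABLE housing (sid INTEGER PRIMARY KEY, building TEXT, fee INTEGER);
INSERT INTO housing VALUES
    (100, 'Randolph', 1200), (150, 'Ingersoll', 1100), (200, 'Randolph', 1200);

-- 3NF decomposition: each fact is stored once.
CREATE TABLE stu_housing AS SELECT sid, building FROM housing;
CREATE TABLE bldg_fee AS SELECT DISTINCT building, fee FROM housing;
""")

original = conn.execute(
    "SELECT sid, building, fee FROM housing ORDER BY sid").fetchall()
rejoined = conn.execute("""
    SELECT s.sid, s.building, f.fee
    FROM stu_housing s JOIN bldg_fee f ON f.building = s.building
    ORDER BY s.sid""").fetchall()
fee_rows = conn.execute("SELECT COUNT(*) FROM bldg_fee").fetchone()[0]

print(rejoined == original, fee_rows)
```

After the split, changing a building's fee touches one row in bldg_fee instead of one row per resident.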
105
Violation of 3NF © 2002 by Prentice Hall
106
Boyce-Codd Normal Form (BCNF)
To be in Boyce-Codd Normal Form (BCNF) the relation must be in 3NF and every determinant must be a candidate key. © 2002 by Prentice Hall
107
Fourth Normal Form (4NF)
To be in Fourth Normal Form (4NF) the relation must be in BCNF and the relation may not contain multi-valued dependencies. © 2002 by Prentice Hall
108
Fifth Normal Form (5NF) The Fifth Normal Form concerns dependencies that are obscure and beyond the scope of this text. © 2002 by Prentice Hall
109
Domain/Key Normal Form (DK/NF)
To be in Domain/Key Normal Form (DK/NF) every constraint on the relation must be a logical consequence of the definition of keys and domains. © 2002 by Prentice Hall
110
DK/NF Terminology Constraint
A rule governing static values of attributes Key A unique identifier of a tuple Domain A description of an attribute’s allowable values © 2002 by Prentice Hall
111
Domain/Key Definition of Example Above
DK/NF Example Domain/Key Definition of Example Above © 2002 by Prentice Hall
112
DK/NF Example © 2002 by Prentice Hall
113
DK/NF Example © 2002 by Prentice Hall
114
Summary of Normal Forms
© 2002 by Prentice Hall
115
Synthesis of Relations
A → B and B → A: one-to-one
A → B but B not → A: many-to-one
A not → B and B not → A: many-to-many
116
Summary of Attribute Relationships
© 2002 by Prentice Hall
117
Optimization De-Normalization (a.k.a., Controlled Redundancy)
© 2002 by Prentice Hall
118
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
119
Database Design Using Entity-Relationship Models
Database Processing Eighth Edition Chapter 6 David M. Kroenke Database Design Using Entity-Relationship Models © 2002 by Prentice Hall
120
Entities & Relationships
Entities are those things that users want to track Relationships define how entities are associated with each other © 2002 by Prentice Hall
121
Representing an Entity
© 2002 by Prentice Hall
122
Representing a ‘Has-a’ Relationship
1:1 and 1:N relationships are saved by creating foreign keys A foreign key is when you take the primary key from one table (on the one-side) and place it into another table (on the many-side or into the other table for a 1:1 relationship) © 2002 by Prentice Hall
123
Representing a 1:1 Relationship
The foreign key can go on either side BUT it is on one side only © 2002 by Prentice Hall
124
Representing 1:N Relationships
In each of the following examples, the foreign key goes to the right (into the many-side) © 2002 by Prentice Hall
125
Foreign Key Placement ProfessorName goes into the Student table as a Foreign Key © 2002 by Prentice Hall
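The placement rule for a 1:N relationship can be sketched as follows, with the professor's key copied into the student table (the many side). Column names beyond ProfessorName and the sample rows are assumptions, and the example runs on SQLite, which enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when asked
conn.execute("CREATE TABLE professor (professor_name TEXT PRIMARY KEY, office TEXT)")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT,
        -- Foreign key on the many side, pointing at the one side.
        professor_name TEXT REFERENCES professor(professor_name)
    )""")

conn.execute("INSERT INTO professor VALUES ('Levy', 'Room 12')")
conn.execute("INSERT INTO student VALUES (1, 'Jones', 'Levy')")
conn.execute("INSERT INTO student VALUES (2, 'Parks', 'Levy')")

try:
    # No such professor, so the DBMS refuses the row.
    conn.execute("INSERT INTO student VALUES (3, 'Smith', 'Nobody')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

advisees = conn.execute(
    "SELECT name FROM student WHERE professor_name = 'Levy' ORDER BY name"
).fetchall()
print(rejected, advisees)
```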
126
Representing a N:M Relationship
N:M relationships are saved by creating a new table. The primary key of the new table is a combination key composed of the primary keys from each of the tables involved in the relationship. © 2002 by Prentice Hall
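A minimal sketch of this rule, assuming hypothetical student and class tables: the new enrollment table carries both primary keys, and together they form its combination key. It runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE class (class_name TEXT PRIMARY KEY, room TEXT);

-- The new table for the N:M relationship.
CREATE TABLE enrollment (
    sid INTEGER REFERENCES student(sid),
    class_name TEXT REFERENCES class(class_name),
    PRIMARY KEY (sid, class_name)
);

INSERT INTO student VALUES (100, 'Jones'), (200, 'Parks');
INSERT INTO class VALUES ('MIS445', 'A1'), ('BA490', 'B2');
INSERT INTO enrollment VALUES (100, 'MIS445'), (100, 'BA490'), (200, 'MIS445');
""")

try:
    # The combination key rejects a repeated (student, class) pair.
    conn.execute("INSERT INTO enrollment VALUES (100, 'MIS445')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

roster = conn.execute("""
    SELECT s.name FROM student s
    JOIN enrollment e ON e.sid = s.sid
    WHERE e.class_name = 'MIS445' ORDER BY s.name""").fetchall()
print(duplicate_rejected, roster)
```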
127
Representing a M:N Relationship
© 2002 by Prentice Hall
128
An E-R Diagram Example © 2002 by Prentice Hall
129
The Representation of the E-R Diagram Example on previous slide
© 2002 by Prentice Hall
130
Common Relationship Patterns
Tree Simple Networks Complex Networks Bills of Materials © 2002 by Prentice Hall
131
Tree Relationship Pattern
A tree relationship pattern is a form of hierarchy The data structure elements have only one-to-many relationships © 2002 by Prentice Hall
132
A Tree Relationship Example
© 2002 by Prentice Hall
133
Simple Network Relationship Pattern
A simple network relationship pattern data structure has only 1:N relationships. The elements may have more than one parent as long as the parents are of different types © 2002 by Prentice Hall
134
Example of a Simple Network Relationship Pattern
© 2002 by Prentice Hall
135
A Complex Network Relationship Pattern
A complex network relationship pattern is where the data structure has at least one N:M relationship © 2002 by Prentice Hall
136
Example of a Complex Network Relationship Pattern
© 2002 by Prentice Hall
137
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
138
Foundations of Relational Implementation
Database Processing Eighth Edition Chapter 8 David M. Kroenke Foundations of Relational Implementation © 2002 by Prentice Hall
139
Review Relational Model Terminology
Relation is a two-dimensional table
Attributes are single valued
Each attribute belongs to a domain
A domain is a physical and logical description of permitted values
No two rows are identical
Order is unimportant
The row is called a tuple
140
Data Definition Language (DDL)
In order to create the tables and structures within a database, the DBMS must provide (often using SQL) a data definition language (DDL). The DDL is used to define (i.e., Create, Drop, and Alter) everything in the database… Tables Columns Indexes Users, etc. © 2002 by Prentice Hall
141
Data Manipulation Language (DML)
When thinking about SQL, most people think of the Data Manipulation Language (DML) aspects of the language The DML allows users to insert, delete, modify, and retrieve information Select Delete Insert Update © 2002 by Prentice Hall
142
DML Alternatives A DBMS must provide at least one DML. Several options exist… Query/Update language (e.g., SQL) Query-by-Example Query-by-Form © 2002 by Prentice Hall
143
Query/Update Language
SELECT Name, Age FROM PATIENT WHERE Physician = 'Levy'
144
Query by Form © 2002 by Prentice Hall
145
Application Program Interface (API)
Some applications provide an Application Program Interface (API) The API is a DML typically used by programmers To retrieve or update data contained within the application, a programmer submits requests to the application’s API. © 2002 by Prentice Hall
146
Relational Algebra Relational algebra defines a set of operators that may work on relations. Recall that relations are simply data sets. As such, relational algebra deals with set theory. The operators in relational algebra are very similar to traditional algebra except that they apply to sets. © 2002 by Prentice Hall
147
Relational Algebra Operators
Relational algebra provides several operators: Union Difference Intersection Product Projection Selection Join © 2002 by Prentice Hall
148
Union Operator
The union operator adds the tuples from one relation to those of another relation; duplicate tuples appear only once
A union operation results in a single combined relation
This is similar to the logical operator 'OR'
149
Union Operator JUNIOR and HONOR-STUDENT relations and their union:
Example of JUNIOR relation Example HONOR-STUDENT relation Union of JUNIOR and HONOR-STUDENT relations © 2002 by Prentice Hall
150
Difference Operator The difference operator produces a third relation that contains the tuples that appear in the first relation, but not the second This is similar to a subtraction © 2002 by Prentice Hall
151
Difference Operator JUNIOR relation HONOR-STUDENT relation
JUNIOR minus HONOR-STUDENT relation © 2002 by Prentice Hall
152
Intersection Operator
An intersection operation will produce a third relation that contains the tuples that are common to the relations involved. This is similar to the logical operator ‘AND’ © 2002 by Prentice Hall
153
Intersection Operator
JUNIOR relation HONOR-STUDENT relation Intersection of JUNIOR and HONOR-STUDENT relations © 2002 by Prentice Hall
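The union, difference, and intersection operators map onto SQL's UNION, EXCEPT, and INTERSECT. The sketch below runs them on SQLite; the rows in junior and honor_student are invented stand-ins for the relations pictured on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE junior (snum INTEGER, name TEXT, major TEXT);
CREATE TABLE honor_student (number INTEGER, name TEXT, interest TEXT);
INSERT INTO junior VALUES
    (123, 'Jones', 'History'), (158, 'Parks', 'Math'), (271, 'Smith', 'History');
INSERT INTO honor_student VALUES
    (105, 'Anderson', 'Management'), (123, 'Jones', 'History');
""")

def ids(sql):
    """Run sql and return the first column of each row."""
    return [row[0] for row in conn.execute(sql)]

# Union: tuples in either relation, duplicates removed.
union = ids("SELECT * FROM junior UNION SELECT * FROM honor_student ORDER BY 1")
# Difference: juniors who are not honor students.
difference = ids("SELECT * FROM junior EXCEPT SELECT * FROM honor_student ORDER BY 1")
# Intersection: tuples common to both relations.
intersection = ids("SELECT * FROM junior INTERSECT SELECT * FROM honor_student ORDER BY 1")

print(union, difference, intersection)
```

All three operators require the two relations to have the same number of attributes with matching domains.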
154
Product Operator A product operator is a concatenation of every tuple in one relation with every tuple in a second relation The resulting relation will have n x m tuples, where… n = the number of tuples in the first relation and m = the number of tuples in the second relation This is similar to multiplication © 2002 by Prentice Hall
155
Projection Operator A projection operation produces a second relation that is a subset of the first. The subset is in terms of columns, not tuples The resulting relation will contain a limited number of columns. However, every tuple will be listed. © 2002 by Prentice Hall
156
Selection Operator The selection operator is similar to the projection operator. It produces a second relation that is a subset of the first. However, the selection operator produces a subset of tuples, not columns. The resulting relation contains all columns, but only contains a portion of the tuples. © 2002 by Prentice Hall
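In SQL terms, product corresponds to a CROSS JOIN, projection to the column list after SELECT, and selection to the WHERE clause. A small sketch on SQLite, with made-up student and enrollment rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER, name TEXT, major TEXT);
CREATE TABLE enrollment (student_number INTEGER, class_name TEXT);
INSERT INTO student VALUES
    (123, 'Jones', 'History'), (158, 'Parks', 'Math'), (105, 'Anderson', 'Management');
INSERT INTO enrollment VALUES (123, 'H350'), (105, 'BA490');
""")

# Product: every student tuple concatenated with every enrollment tuple.
product = conn.execute("SELECT * FROM student CROSS JOIN enrollment").fetchall()

# Projection: a subset of the columns.
projection = conn.execute(
    "SELECT name, major FROM student ORDER BY name").fetchall()

# Selection: a subset of the tuples, all columns kept.
selection = conn.execute(
    "SELECT * FROM student WHERE major = 'Math'").fetchall()

print(len(product), projection, selection)
```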
157
Join Operator The join operator is a combination of the product, selection, and projection operators. There are several variations of the join operator… Equijoin Natural join Outer join Left outer join Right outer join © 2002 by Prentice Hall
158
Data for Join Examples
SID Name Major GradeLevel
123 Jones History JR
158 Parks Math GR
271 Smith
105 Anderson Management SN

StudentNumber ClassName PositionNumber
123 H350 1
105 BA490 3
B490 7
159
Join Examples Equijoin Natural Join Left Outer Join
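The difference between an equijoin and a left outer join shows up in which students survive the join. The rows below are illustrative only and are not the slide's exact data; the example runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER, name TEXT);
CREATE TABLE enrollment (student_number INTEGER, class_name TEXT);
INSERT INTO student VALUES
    (123, 'Jones'), (158, 'Parks'), (271, 'Smith'), (105, 'Anderson');
INSERT INTO enrollment VALUES (123, 'H350'), (105, 'BA490'), (123, 'BA490');
""")

# Equijoin: only students with a matching enrollment row appear.
equijoin = conn.execute("""
    SELECT s.sid, s.name, e.class_name
    FROM student s JOIN enrollment e ON s.sid = e.student_number
    ORDER BY s.sid, e.class_name""").fetchall()

# Left outer join: every student appears; missing matches become NULL (None).
left_outer = conn.execute("""
    SELECT s.sid, s.name, e.class_name
    FROM student s LEFT OUTER JOIN enrollment e ON s.sid = e.student_number
    ORDER BY s.sid, e.class_name""").fetchall()

unmatched = [name for (_, name, class_name) in left_outer if class_name is None]
print(len(equijoin), len(left_outer), unmatched)
```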
© 2002 by Prentice Hall
160
Expressing Queries in Relational Algebra
1. What are the names of all students? STUDENT [Name] 2. What are the student numbers of all students enrolled in a class? ENROLLMENT [StudentNumber] © 2002 by Prentice Hall
161
Expressing Queries in Relational Algebra
3. What are the student numbers of all students not enrolled in a class? STUDENT [SID] – ENROLLMENT [StudentNumber] 4. What are the numbers of students enrolled in the class ‘BD445’? ENROLLMENT WHERE ClassName = ‘BD445’[StudentNumber] © 2002 by Prentice Hall
162
Expressing Queries in Relational Algebra
5. What are the names of the students enrolled in class ‘BD445’? STUDENT JOIN (SID = StudentNumber) ENROLLMENT WHERE ClassName = ‘BD445’[STUDENT.Name] © 2002 by Prentice Hall
163
Expressing Queries in Relational Algebra
6. What are the names and meeting times of ‘PARKS’ classes? STUDENT WHERE Name = ‘PARKS’ JOIN (SID=StudentNumber) ENROLLMENT JOIN (ClassName = Name) CLASS [CLASS.Name, Time] © 2002 by Prentice Hall
164
Expressing Queries in Relational Algebra
7. What are the grade levels and meeting rooms of all students, including students not enrolled in a class? STUDENT LEFT OUTER JOIN (SID = StudentNumber) ENROLLMENT JOIN (ClassName = Name) CLASS [GradeLevel, Room] © 2002 by Prentice Hall
165
Summary of Relational Algebra Operators
© 2002 by Prentice Hall
166
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
167
Structured Query Language
Database Processing Eighth Edition Chapter 9 David M. Kroenke Structured Query Language © 2002 by Prentice Hall
168
Structured Query Language
Structured Query Language is known either by its acronym, SQL, or as SEQUEL, the name of the original version of SQL. SEQUEL was developed by IBM in the mid-1970s.
169
SQL, not a Procedural Programming Language
SQL is not a programming language in itself; it is a data access language. SQL may be embedded in traditional procedural programming languages (like COBOL).
170
SQL Syntax SQL keywords are not case sensitive.
SELECT field(s) -- which columns will be retrieved
FROM table(s); -- which table contains the column data
e.g., SELECT Name, Phone FROM Student;
171
The DISTINCT Qualifier
Eliminating duplicate rows on the output… SELECT DISTINCT StateAddress FROM Employee; © 2002 by Prentice Hall
172
The WHERE Clause Reducing the output based on specified criteria…
SELECT StudentName FROM Students WHERE GradePointAverage >= 3.0; © 2002 by Prentice Hall
173
Comparison Operators
Equals: =
Not equals: <>
Greater than: >
Less than: <
Greater than or equal to: >=
Less than or equal to: <=
Within a list of values: IN
A logical NOT: NOT
Within a range: BETWEEN
174
IN a List of Values
SELECT StudentName FROM Student
WHERE State IN ('PA', 'MA', 'CA');
175
The Logical NOT
SELECT StudentName FROM Students
WHERE State NOT IN ('NJ', 'NM', 'NY');
SELECT StudentName FROM Students
WHERE NOT GradePointAverage >= 3.0;
176
Within a Range of Values
SELECT StudentName FROM Student WHERE StudentID BETWEEN 250 and 300; © 2002 by Prentice Hall
177
Using Wildcard Character Substitutions
The LIKE keyword is used in place of the = sign when you use wildcard characters. The underscore character (_) is a single character substitution The percent character (%) is a multi-character substitution © 2002 by Prentice Hall
178
Using LIKE
SELECT StudentID FROM Student WHERE StudentName LIKE 'K%';
SELECT PartName FROM Part WHERE PartNumber LIKE '_ABC%';
179
NULL Means Nothing
A NULL value means that nothing has been entered. This is different from a space or a zero.
SELECT Name FROM Student WHERE Major IS NULL;
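The point that NULL differs from an empty value can be tested directly: a comparison with = never matches NULL, so IS NULL is required. A small sketch on SQLite with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Jones", "History"), ("Smith", None), ("Parks", "")],  # None is stored as NULL
)

is_null = conn.execute(
    "SELECT name FROM student WHERE major IS NULL").fetchall()
# '= NULL' evaluates to unknown for every row, so nothing is returned.
equals_null = conn.execute(
    "SELECT name FROM student WHERE major = NULL").fetchall()
# An empty string is a real value, not NULL.
empty_string = conn.execute(
    "SELECT name FROM student WHERE major = ''").fetchall()

print(is_null, equals_null, empty_string)
```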
180
ORDER BY… Sorting Outputs
Sorting in descending order… SELECT StudentID, Name FROM Student ORDER BY Name DESC; Sorting in ascending order… ORDER BY Name ASC; © 2002 by Prentice Hall
181
Built-in Functions Counting number of rows COUNT
Adding the values in a column SUM Averaging the values in a column AVG Finding the maximum value in a column MAX Finding the minimum value in a column MIN © 2002 by Prentice Hall
182
Built-in Functions
SELECT COUNT(*) FROM Student WHERE State = 'WI';
SELECT SUM(Amount) FROM SalesReceipt;
SELECT MAX(Score) FROM Assignments;
183
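Grouping the Output
SELECT State, COUNT(*) FROM Student GROUP BY State;
Each selected column must either appear in the GROUP BY clause or be wrapped in a built-in function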
© 2002 by Prentice Hall
184
Reducing the Groups Displayed
SELECT State, COUNT(*) FROM Student GROUP BY State HAVING COUNT(*) > 4;
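GROUP BY produces one row per group, and HAVING then filters those groups. Standard SQL expects each selected column to be either grouped or aggregated, so this sketch selects State together with COUNT(*). The student rows are hypothetical and the queries run on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [("A", "OH"), ("B", "OH"), ("C", "OH"), ("D", "OH"), ("E", "OH"),
     ("F", "WI"), ("G", "WI")],
)

# One output row per state, with the number of students in it.
by_state = conn.execute(
    "SELECT State, COUNT(*) FROM Student GROUP BY State ORDER BY State"
).fetchall()

# HAVING keeps only the groups with more than four students.
large_states = conn.execute(
    "SELECT State, COUNT(*) FROM Student GROUP BY State HAVING COUNT(*) > 4"
).fetchall()

print(by_state, large_states)
```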
185
Sub-Queries
SELECT Name FROM Student WHERE SID IN
(SELECT StudentNumber FROM Enrollment WHERE ClassName = 'MIS445');
186
Joining Tables
SELECT Student.SID, Student.Name, Enrollment.ClassName
FROM Student, Enrollment
WHERE Student.SID = Enrollment.StudentNumber AND Student.State = 'OH';
187
EXISTS
SELECT DISTINCT StudentNumber FROM Enrollment A
WHERE EXISTS
(SELECT * FROM Enrollment B WHERE A.StudentNumber = B.StudentNumber AND A.ClassName <> B.ClassName);
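The EXISTS query finds students enrolled in more than one class: for each enrollment row A, the subquery looks for another row B with the same student and a different class. A sketch on SQLite with invented enrollment rows (lowercase column names are an assumption of the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student_number INTEGER, class_name TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(100, "MIS445"), (100, "BA490"), (200, "MIS445")],
)

# The subquery is correlated: it is evaluated for each row of alias a.
multi_class = [row[0] for row in conn.execute("""
    SELECT DISTINCT student_number
    FROM enrollment a
    WHERE EXISTS (
        SELECT * FROM enrollment b
        WHERE a.student_number = b.student_number
          AND a.class_name <> b.class_name
    )""")]

print(multi_class)
```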
188
Entering Data INSERT INTO Enrollment VALUES (400, 'MIS445', 44);
© 2002 by Prentice Hall
189
Deleting Data DELETE FROM Student WHERE Student.SID = 100;
© 2002 by Prentice Hall
190
Modifying Data UPDATE Enrollment SET SeatNumber = 44 WHERE SID = 400;
© 2002 by Prentice Hall
191
MySQL Ch1 & Ch3 © 2002 by Prentice Hall
192
Overview TcX - Michael Widenius (MySQL) Hughes - David Hughes (mSQL)
Features: Mostly ANSI SQL2 compliant Transactions Stored procedures Auto_increment fields © 2002 by Prentice Hall
193
More features Cross-database joins Outer joins
API: C/C++, Eiffel, Java, PHP, Perl, Python, TCL Runs on Windows, UNIX, and Mac High performance © 2002 by Prentice Hall
194
SQL syntax
CREATE TABLE people (name CHAR(10));
INSERT INTO people VALUES ('Joe');
SELECT name FROM people WHERE name LIKE 'J%';
195
SQL commands SHOW DATABASES SHOW TABLES
Data types: INT, REAL, CHAR(l), VARCHAR(l), TEXT(l), DATE, TIME
ALTER TABLE mytable MODIFY mycolumn TEXT(100)
ENUM('cat','dog','rabbit','pig')
196
SQL commands CREATE DATABASE dbname
CREATE TABLE tname (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT)
CREATE INDEX part_of_name ON customer (name(10))
INSERT INTO tname (c1, …, cn) VALUES (v1, …, vn)
197
JOINs and ALIASing
SELECT book.title, author.name FROM author, book
WHERE book.author = author.id
SELECT very_long_column_name AS col FROM tname WHERE very_long_column_name = '5'
(a column alias cannot be referenced in a WHERE clause; in MySQL, HAVING col = '5' also works)
198
Loading text files Comma-separated files (*.csv)
LOAD DATA LOCAL INFILE "whatever.csv" INTO TABLE tname FIELDS TERMINATED BY ','
(without the FIELDS clause, MySQL expects tab-separated fields)
199
Aggregate queries SELECT position FROM people GROUP by position
SELECT position, AVG (salary) FROM people GROUP BY position HAVING AVG (salary) > © 2002 by Prentice Hall
200
Full text search CREATE TABLE WebCache (
url VARCHAR (255) NOT NULL PRIMARY KEY, ptext TEXT NOT NULL, FULLTEXT (ptext));
INSERT INTO WebCache (url, ptext) VALUES ('index.html', 'Welcome to the University of Michigan');
SELECT url FROM WebCache WHERE MATCH (ptext) AGAINST ('Michigan');
201
Advanced features Transactions Table locking Functions Unions
Outer joins © 2002 by Prentice Hall
202
Installing MySQL on Windows
203
Useful pointers Small example: MySQL documentation: (official) MySQL tutorial: Online, interactive tutorials: © 2002 by Prentice Hall
204
use test;
CREATE TABLE STATION (ID INTEGER PRIMARY KEY, CITY CHAR(20), STATE CHAR(2), LAT_N REAL, LONG_W REAL);
DESCRIBE STATION;
INSERT INTO STATION VALUES (13, 'Phoenix', 'AZ', 33, 112);
INSERT INTO STATION VALUES (44, 'Denver', 'CO', 40, 105);
INSERT INTO STATION VALUES (66, 'Caribou', 'ME', 47, 68);
SELECT * FROM STATION;
SELECT * FROM STATION WHERE LAT_N > 39.7;
205
SELECT ID, CITY, STATE FROM STATION;
SELECT ID, CITY, STATE FROM STATION WHERE LAT_N > 39.7;
CREATE TABLE STATS (
  ID INTEGER REFERENCES STATION(ID),
  MONTH INTEGER CHECK (MONTH BETWEEN 1 AND 12),
  TEMP_F REAL CHECK (TEMP_F BETWEEN -80 AND 150),
  RAIN_I REAL CHECK (RAIN_I BETWEEN 0 AND 100),
  PRIMARY KEY (ID, MONTH));
INSERT INTO STATS VALUES (13, 1, 57.4, 0.31);
INSERT INTO STATS VALUES (13, 7, 91.7, 5.15);
INSERT INTO STATS VALUES (44, 1, 27.3, 0.18);
INSERT INTO STATS VALUES (44, 7, 74.8, 2.11);
INSERT INTO STATS VALUES (66, 1, 6.7, 2.10);
INSERT INTO STATS VALUES (66, 7, 65.8, 4.52);
SELECT * FROM STATS;
206
SELECT * FROM STATION, STATS WHERE STATION.ID = STATS.ID;
SELECT MONTH, ID, RAIN_I, TEMP_F FROM STATS ORDER BY MONTH, RAIN_I DESC;
SELECT LAT_N, CITY, TEMP_F FROM STATS, STATION WHERE MONTH = 7 AND STATS.ID = STATION.ID ORDER BY TEMP_F;
SELECT MAX(TEMP_F), MIN(TEMP_F), AVG(RAIN_I), ID FROM STATS GROUP BY ID;
SELECT * FROM STATION WHERE 50 < (SELECT AVG(TEMP_F) FROM STATS WHERE STATION.ID = STATS.ID);
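To see what the grouped query and the correlated subquery on this slide return, the sketch below loads the STATION and STATS rows from the preceding slides into an in-memory SQLite database. SQLite stands in for MySQL here, the CHECK and REFERENCES clauses are omitted for brevity, and the ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STATION (ID INTEGER PRIMARY KEY, CITY TEXT, "
             "STATE TEXT, LAT_N REAL, LONG_W REAL)")
conn.execute("CREATE TABLE STATS (ID INTEGER, MONTH INTEGER, TEMP_F REAL, "
             "RAIN_I REAL, PRIMARY KEY (ID, MONTH))")
conn.executemany("INSERT INTO STATION VALUES (?, ?, ?, ?, ?)", [
    (13, "Phoenix", "AZ", 33, 112),
    (44, "Denver", "CO", 40, 105),
    (66, "Caribou", "ME", 47, 68),
])
conn.executemany("INSERT INTO STATS VALUES (?, ?, ?, ?)", [
    (13, 1, 57.4, 0.31), (13, 7, 91.7, 5.15),
    (44, 1, 27.3, 0.18), (44, 7, 74.8, 2.11),
    (66, 1, 6.7, 2.10), (66, 7, 65.8, 4.52),
])

# One output row per station: the aggregate is computed within each group.
per_station = conn.execute(
    "SELECT ID, MAX(TEMP_F), MIN(TEMP_F) FROM STATS GROUP BY ID ORDER BY ID"
).fetchall()

# Correlated subquery: the inner AVG is re-evaluated for each STATION row.
warm = conn.execute(
    "SELECT CITY FROM STATION WHERE 50 < "
    "(SELECT AVG(TEMP_F) FROM STATS WHERE STATION.ID = STATS.ID) ORDER BY ID"
).fetchall()
print(per_station, warm)
```

Caribou is excluded from the second result because its average of the two recorded temperatures is below 50.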
207
CREATE VIEW METRIC_STATS (ID, MONTH, TEMP_C, RAIN_C) AS
SELECT ID, MONTH, (TEMP_F - 32) * 5 /9, RAIN_I * FROM STATS; SELECT * FROM METRIC_STATS; SELECT * FROM METRIC_STATS WHERE TEMP_C < 0 AND MONTH = 1 ORDER BY RAIN_C; UPDATE STATS SET RAIN_I = RAIN_I ; SELECT * FROM STATS; UPDATE STATS SET TEMP_F = 74.9 WHERE ID = 44 AND MONTH = 7; © 2002 by Prentice Hall
208
© 2002 by Prentice Hall SELECT * FROM STATS; COMMIT WORK;
UPDATE STATS SET RAIN_I = 4.50 WHERE ID = 44; ROLLBACK WORK; WHERE ID = 44 AND MONTH = 7; © 2002 by Prentice Hall
209
© 2002 by Prentice Hall DELETE FROM STATS WHERE MONTH = 7
OR ID IN (SELECT ID FROM STATION WHERE LONG_W < 90); DELETE FROM STATION WHERE LONG_W < 90; COMMIT WORK; SELECT * FROM STATION; SELECT * FROM STATS; SELECT * FROM METRIC_STATS; © 2002 by Prentice Hall
210
© 2002 by Prentice Hall http://www.mysql.com/doc/en/Tutorial.html
CREATE TABLE animals ( id MEDIUMINT NOT NULL AUTO_INCREMENT, name CHAR(30) NOT NULL, PRIMARY KEY (id) ); INSERT INTO animals (name) VALUES ("dog"),("cat"),("penguin"), ("lax"),("whale"),("ostrich"); SELECT * FROM animals; CREATE TABLE shop ( article INT(4) UNSIGNED ZEROFILL DEFAULT '0000' NOT NULL, dealer CHAR(20) DEFAULT '' NOT NULL, price DOUBLE(16,2) DEFAULT '0.00' NOT NULL, PRIMARY KEY(article, dealer)); INSERT INTO shop VALUES (1,'A',3.45),(1,'B',3.99),(2,'A',10.99),(3,'B',1.45),(3,'C',1.69), (3,'D',1.25),(4,'D',19.95); SELECT * FROM shop; © 2002 by Prentice Hall
211
© 2002 by Prentice Hall CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), body TEXT, FULLTEXT (title,body) ); INSERT INTO articles VALUES (NULL,'MySQL Tutorial', 'DBMS stands for DataBase ...'), (NULL,'How To Use MySQL Efficiently', 'After you went through a ...'), (NULL,'Optimizing MySQL','In this tutorial we will show ...'), (NULL,'1001 MySQL Tricks','1. Never run mysqld as root '), (NULL,'MySQL vs. YourSQL', 'In the following database comparison ...'), (NULL,'MySQL Security', 'When configured properly, MySQL ...'); SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('database'); © 2002 by Prentice Hall
212
# What's the highest item number?
SELECT MAX(article) AS article FROM shop;
# Find number, dealer, and price of the most expensive article.
SELECT MAX(price) FROM shop;
SELECT article, dealer, price FROM shop WHERE price=19.95;
SELECT article, dealer, price FROM shop ORDER BY price DESC LIMIT 1;
# What's the highest price per article?
SELECT article, MAX(price) AS price FROM shop GROUP BY article;
213
© 2002 by Prentice Hall CREATE TEMPORARY TABLE tmp (
article INT(4) UNSIGNED ZEROFILL DEFAULT '0000' NOT NULL, price DOUBLE(16,2) DEFAULT '0.00' NOT NULL); LOCK TABLES shop READ; INSERT INTO tmp SELECT article, MAX(price) FROM shop GROUP BY article; SELECT shop.article, dealer, shop.price FROM shop, tmp WHERE shop.article=tmp.article AND shop.price=tmp.price; UNLOCK TABLES; DROP TABLE tmp; SELECT article, SUBSTRING( MAX( CONCAT(LPAD(price,6,'0'),dealer) ), 7) AS dealer, 0.00+LEFT( MAX( CONCAT(LPAD(price,6,'0'),dealer) ), 6) AS price FROM shop GROUP BY article; © 2002 by Prentice Hall
214
# find the articles with the highest and lowest price
SELECT FROM shop; SELECT * FROM shop WHERE OR # foreign keys CREATE TABLE person ( id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT, name CHAR(60) NOT NULL, PRIMARY KEY (id) ); CREATE TABLE shirt ( style ENUM('t-shirt', 'polo', 'dress') NOT NULL, color ENUM('red', 'blue', 'orange', 'white', 'black') NOT NULL, owner SMALLINT UNSIGNED NOT NULL REFERENCES person(id), © 2002 by Prentice Hall
215
INSERT INTO person VALUES (NULL, 'Antonio Paz');
INSERT INTO shirt VALUES (NULL, 'polo', 'blue', LAST_INSERT_ID()), (NULL, 'dress', 'white', LAST_INSERT_ID()), (NULL, 't-shirt', 'blue', LAST_INSERT_ID());
INSERT INTO person VALUES (NULL, 'Lilliana Angelovska');
INSERT INTO shirt VALUES (NULL, 'dress', 'orange', LAST_INSERT_ID()), (NULL, 'polo', 'red', LAST_INSERT_ID()), (NULL, 'dress', 'blue', LAST_INSERT_ID()), (NULL, 't-shirt', 'white', LAST_INSERT_ID());
SELECT * FROM person;
SELECT * FROM shirt;
216
© 2002 by Prentice Hall SELECT s.* FROM person p, shirt s
WHERE p.name LIKE 'Lilliana%' AND s.owner = p.id AND s.color <> 'white'; # unions select id, style from shirt where color = 'blue' union select id, style from shirt where color = 'orange' # visits per day CREATE TABLE t1 (year YEAR(4), month INT(2) UNSIGNED ZEROFILL, day INT(2) UNSIGNED ZEROFILL); INSERT INTO t1 VALUES(2000,1,1),(2000,1,20),(2000,1,30),(2000,2,2), (2000,2,23),(2000,2,23); SELECT year,month,BIT_COUNT(BIT_OR(1<<day)) AS days FROM t1 GROUP BY year,month; © 2002 by Prentice Hall
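The "visits per day" query counts distinct days per month with a bit trick: each day sets one bit, BIT_OR merges the bits so duplicate days collapse, and BIT_COUNT counts the bits that remain. The plain-Python sketch below mirrors that logic on the same six rows; it illustrates the idea and says nothing about how MySQL implements these functions.

```python
from collections import defaultdict

# The rows inserted into t1 on the slide: (year, month, day).
visits = [(2000, 1, 1), (2000, 1, 20), (2000, 1, 30),
          (2000, 2, 2), (2000, 2, 23), (2000, 2, 23)]

# BIT_OR(1 << day): OR one bit per visited day into a per-month mask.
masks = defaultdict(int)
for year, month, day in visits:
    masks[(year, month)] |= 1 << day

# BIT_COUNT(mask): count the set bits, i.e. the distinct days.
days = {key: bin(mask).count("1") for key, mask in masks.items()}
print(days)
```

February has two rows for day 23, but they set the same bit, so the month counts two distinct days rather than three.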
217
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
218
Database Application Design
Database Processing Eighth Edition Chapter 10 David M. Kroenke Database Application Design © 2002 by Prentice Hall
219
Functions of a Database Application
220
Four Basic Functions of Database Applications
The four basic functions are common to all database applications These basic functions are Create Read Update Delete The (unfortunate) acronym for these functions is CRUD © 2002 by Prentice Hall
221
Format/Materialize Function of a Database Application
The format/materialize function of a database application involves designing the appearance of the database application © 2002 by Prentice Hall
222
Enforce Constraints Function of a Database Application
Database application constraints typically involve validating the format, structure, and/or values of data. © 2002 by Prentice Hall
223
Provide Security and Control Function of a Database Application
Because database applications provide access to many people for many purposes, the application must provide security and control functions. This helps protect the data from being seen and/or modified by unauthorized persons.
224
Execute Application Logic Function of a Database Application
Database applications satisfy one or more business functions. As such, the business logic must be embedded into the database application. These logic rules and procedures constitute the execute application logic function of a database application.
225
A View A view is a structured list of data attributes from the entities or semantic objects defined in the data model A view can be materialized or formatted as an on-line form or a hard-copy report © 2002 by Prentice Hall
226
A View CRUD Functions –Create
INSERT INTO CUSTOMER (CUSTOMER.Name, CUSTOMER.City) VALUES (NewCust.CUSTOMER.Name, NewCust.CUSTOMER.City) © 2002 by Prentice Hall
227
A View CRUD Functions –Read
SELECT CUSTOMER.CustomerID, CUSTOMER.Name FROM CUSTOMER, WORK WHERE CUSTOMER.CustomerID = WORK.CustomerID © 2002 by Prentice Hall
228
A View CRUD Functions –Update
UPDATE CUSTOMER SET CUSTOMER.City = NewCust.CUSTOMER.City WHERE CUSTOMER.Name = NewCust.CUSTOMER.Name
229
A View CRUD Functions –Delete
Cascading deletions depend on relationship cardinality © 2002 by Prentice Hall
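A cascading deletion can be sketched as follows, using SQLite through Python's sqlite3 module as a stand-in DBMS. The CUSTOMER and WORK names come from the Read slide; the WorkID column, the sample rows, and the choice of ON DELETE CASCADE are assumptions for illustration. Here every WORK row requires a CUSTOMER parent, so deleting the parent removes its children as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE WORK ("
    " WorkID INTEGER PRIMARY KEY,"  # hypothetical column
    " CustomerID INTEGER NOT NULL"
    " REFERENCES CUSTOMER(CustomerID) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Jones')")
conn.executemany("INSERT INTO WORK VALUES (?, 1)", [(10,), (11,)])

# Deleting the parent row also deletes its dependent WORK rows.
conn.execute("DELETE FROM CUSTOMER WHERE CustomerID = 1")
remaining = conn.execute("SELECT COUNT(*) FROM WORK").fetchone()[0]
print(remaining)
```

If the child's participation were optional instead, ON DELETE SET NULL on a nullable foreign key would be the analogous choice.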
230
Form Design A form should...
Reflect the underlying structure of the view Make data associations graphically evident Encourage/Guide appropriate user action/response © 2002 by Prentice Hall
231
Graphical User Interface (GUI) Controls
Drop-down list box A drop-down list box provides a list of items from which the user may choose Option (or radio) button A set of option buttons allow the user to select one of a set of alternatives © 2002 by Prentice Hall
232
Graphical User Interface (GUI) Controls
Check box A check box allows the user to select or deselect the option. Cursor movement/Pervasive Keys Cursor movement defines the behavior of the cursor. The cursor should move naturally through the form. © 2002 by Prentice Hall
233
GUI Example © 2002 by Prentice Hall
234
Report Design The report should...
Reflect the underlying structure of the view Handle implied objects The implied objects are those real-world objects that provide meaning and purpose to the report and to the database application © 2002 by Prentice Hall
235
Enforcing Constraints within a Database Application
Domain constraints Uniqueness Referential integrity constraints Relationship cardinality Business rule Triggers © 2002 by Prentice Hall
236
Uniqueness Constraint
The uniqueness constraint determines if the value within the attribute must be unique for every tuple in the relation. Uniqueness is referred to as “no duplicates” within Microsoft Access © 2002 by Prentice Hall
237
Referential Integrity Constraint
Referential integrity defines the role and treatment of the foreign keys. For a foreign key to exist, the value of the foreign key must appear as a value in the primary key of the associated relation. © 2002 by Prentice Hall
238
Relationship Cardinality Constraint
Minimum relationship cardinality constraint Maximum relationship cardinality constraint © 2002 by Prentice Hall
239
Minimum Relationship Cardinality Constraint
The minimum relationship cardinality constraint defines whether participation in a relationship is mandatory or optional
0 = optional
1 = mandatory
A fragment is a parent that does not have a required child
An orphan is a child that does not have a required parent
240
Maximum Relationship Cardinality Constraint
The maximum relationship cardinality constraint defines the maximum level of participation in a relationship 1 = at most one N = zero or more © 2002 by Prentice Hall
241
The Relationship Between the Minimum and Maximum Relationship Cardinality Constraints
If the minimum cardinality constraint is optional (0), the maximum relationship cardinality constraint would mean: 1 = zero or one N = zero, one, or more © 2002 by Prentice Hall
242
The Relationship between the Minimum and Maximum Relationship Cardinality Constraints
If the minimum cardinality constraint is mandatory (1), the maximum relationship cardinality constraint would mean: 1 = one N = one or more © 2002 by Prentice Hall
243
Business Rule Constraints
Business rule constraints are those conditions that must be satisfied based on the rules, practices, and operating procedures of the organization. © 2002 by Prentice Hall
244
Triggers Triggers are stored procedures that are invoked based on an action. For instance, a stored procedure may be invoked every time a record is added to the system.
245
Security Functions within a Database Application
Typically, security exists on several levels within a database application
To log into the system, the user needs an operating system account (e.g., a Windows username/password)
To log into the database, the user must supply a username and password
To execute the database application, the user must be granted access to the appropriate application files
246
Horizontal versus Vertical Security Schemes
Horizontal security refers to the practice of restricting access to certain tuples in the database. E.g., you may only see sales data in the NorthEast Vertical security refers to the practice of restricting access to certain columns in the database. E.g., you may only see the name and address fields © 2002 by Prentice Hall
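One common way to realize both schemes is with views, sketched below with SQLite via Python's sqlite3 module. The sales table, its columns, and the view names are hypothetical. SQLite has no user accounts, so the step of granting users access to the views rather than to the base table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, address TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Ann", "1 Main St", "NorthEast", 120.0),
    ("Bob", "2 Oak Ave", "SouthWest", 80.0),
    ("Cy", "3 Elm Rd", "NorthEast", 45.5),
])

# Horizontal security: restrict which rows (tuples) are visible.
conn.execute("CREATE VIEW northeast_sales AS "
             "SELECT * FROM sales WHERE region = 'NorthEast'")
# Vertical security: restrict which columns are visible.
conn.execute("CREATE VIEW contact_list AS SELECT name, address FROM sales")

horizontal = conn.execute(
    "SELECT name FROM northeast_sales ORDER BY name").fetchall()
cursor = conn.execute("SELECT * FROM contact_list")
vertical_columns = [col[0] for col in cursor.description]
print(horizontal, vertical_columns)
```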
247
Control Functions within a Database Application
Typically control functions are introduced into database applications through menus and by defining transaction boundaries. Using menus, the developer may control the access for a particular user. This access may change throughout a user’s session. Transaction boundaries are defined to coordinate user actions in a multi-user environment. © 2002 by Prentice Hall
248
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
249
Managing Multi-User Databases
Database Processing Eighth Edition Chapter 11 David M. Kroenke Managing Multi-User Databases © 2002 by Prentice Hall
250
Multi-User Databases Serving the needs of multiple users and multiple applications adds complexity in… design, development, and migration (future updates) © 2002 by Prentice Hall
251
Multi-User Database Issues include…
Interdependency Changes required by one user may impact others Concurrency People or applications may try to update the same information at the same time © 2002 by Prentice Hall
252
Multi-User Database Issues include… (continued)
Record Retention When information should be discarded Backup/Recovery How to protect yourself from losing critical information © 2002 by Prentice Hall
253
Common Multi-User DBMS
Windows 2000 Access 2000 SQL Server ORACLE UNIX ORACLE Sybase Informix © 2002 by Prentice Hall
254
Role of the Database Administrator
Organizations typically hire a database administrator (DBA) to handle the issues and complexities associated with multi-user databases. A DBA facilitates the development and use of one or more databases. © 2002 by Prentice Hall
255
Data Administrator versus Database Administrator
Data Administrator
Handle the database functions and responsibilities for the entire organization. Data administrator responsibilities are discussed in Chapter 17.
Database Administrator (DBA)
Handle the functions associated with a specific database, including those applications served by the database. This chapter describes the responsibilities of the DBA.
256
The Characteristics of a DBA
Technical The DBA is responsible for the performance and maintenance of one or more databases. Diplomatic The DBA must coordinate the efforts, requirements, and sometimes conflicting goals of various user groups to develop community-wide solutions. © 2002 by Prentice Hall
257
Technical Skills of the DBA
Managing the database structure Controlling concurrent processing Managing processing rights and responsibilities Developing database security Providing database recovery Managing the database management system (DBMS) Maintaining the data repository © 2002 by Prentice Hall
258
Managing the Database Structure
Managing the database structure includes configuration control and documentation regarding:
The allocation of space
Table creation
Index creation
Stored procedure creation
Trigger creation
259
Configuration Control
The database configuration must reflect changes in organizational and user requirements
Structural changes to the database often affect most, if not all, applications and users
Sometimes configuration changes have unanticipated consequences
Consequently, broad perspectives, careful analysis, and effective communication are essential. The DBA must also be prepared to debug and repair unforeseen issues.
260
The Need for Documentation
When altering a database's structure, unanticipated issues are inevitable
Recording the specific changes, dates, and times makes it easier to determine the root cause of issues and to resolve them
When historical data is restored, it must be reformatted with all the changes made to the database structure since the data was originally saved.
261
Documentation All structural changes must be carefully documented with the following: Reason for change Who made the changes Specifically what was changed How and when the changes were implemented How were the changes tested and what were the results © 2002 by Prentice Hall
262
Documentation Aids Version Control and Computer Assisted Software Engineering (CASE) tools automate and/or manage many tedious documentation tasks. Printing the data dictionaries after structural changes also helps eliminate many tedious documentation tasks © 2002 by Prentice Hall
263
Controlling Concurrency Processing
Concurrency control ensures that one user’s actions do not adversely impact another user’s actions At the core of concurrency is accessibility. In one extreme, data becomes inaccessible once a user touches the data. This ensures that data that is being considered for update is not shown. In the other extreme, data is always readable. The data is even readable when it is locked for update. © 2002 by Prentice Hall
264
Aspects of Concurrency Control
Rollback/Commit: Ensuring all actions are successful before posting to the database Multitasking: Simultaneously serving multiple users Lost Updates: When one user’s action overwrites another user’s request © 2002 by Prentice Hall
265
Rollback/Commit A database operation typically involves several transactions. These transactions are atomic and are sometimes called logical units of work (LUW). Before an operation is committed to the database, all LUWs must be successfully completed. If one or more LUW is unsuccessful, a rollback is performed and no changes are saved to the database. © 2002 by Prentice Hall
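The all-or-nothing behavior described here can be sketched with SQLite through Python's sqlite3 module. The account table, the balances, and the CHECK constraint used to force a failure are assumptions for illustration. The connection's context manager commits if the block completes and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    with conn:  # commit on success, roll back if an exception escapes
        conn.execute("UPDATE account SET balance = balance + 500 WHERE id = 2")
        # This step violates the CHECK constraint, so the whole unit fails.
        conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
except sqlite3.IntegrityError:
    pass

balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(balances)
```

The first update succeeded on its own, yet it is undone together with the failed one, so neither balance changes.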
266
Lost Update Problem If two or more users are attempting to update the same piece of data at the same time, it is possible that one update may overwrite another update. Resource locking scenarios are designed to address this problem © 2002 by Prentice Hall
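The lost update problem can be reproduced without a database at all. The following sketch uses a plain Python dictionary and a fixed, hand-written interleaving of two hypothetical users, A and B; the item name and quantities are made up.

```python
stock = {"item": 10}

# Both users read the same starting value before either one writes.
a_read = stock["item"]
b_read = stock["item"]

# User A sells 3 units and writes the result back.
stock["item"] = a_read - 3
# User B sells 4 units, computed from the stale read; A's update is overwritten.
stock["item"] = b_read - 4

final_quantity = stock["item"]
correct_quantity = 10 - 3 - 4
print(final_quantity, correct_quantity)
```

The stored quantity reflects only B's sale. A lock held from read to write, or a conflict check at write time, would prevent the overwrite.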
267
Resource Locking A resource lock prevents a user from reading and/or writing to a piece of data The size of the piece of data (e.g., database, table, field) is termed the lock granularity © 2002 by Prentice Hall
268
Types of Resource Locks
Implicit versus Explicit
Implicit locks are issued automatically by the DBMS based on an activity
Explicit locks are issued by users requesting exclusive rights to the data
Exclusive versus Shared
An exclusive lock prevents others from reading or updating the data
A shared lock allows others to read, but not update, the data
269
Two-Phased Resource Locking
Two-phased locking, whereby locks are obtained as they are needed A growing phase, whereby the transaction continues to request additional locks A shrinking phase, whereby the transaction begins to release the locks © 2002 by Prentice Hall
270
Deadlocks As a transaction begins to lock resources, it may have to wait for a particular resource to be released by another transaction. On occasion, two transactions may be indefinitely waiting on one another to release resources. This condition is known as a deadlock or a deadly embrace.
271
Avoiding Deadlocks
Strategy 1: Wait until all resources are available, then lock them all before beginning
Strategy 2: Establish and use clear locking orders/sequences
Strategy 3: Once a deadlock is detected, the DBMS will roll back one transaction
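Strategy 2 can be illustrated with ordinary thread locks in Python. In this sketch the two "transactions" ask for the same two locks in opposite orders, which is the classic recipe for a deadlock, but each sorts its locks by a fixed key before acquiring them. The function name and the use of id() as the ordering key are arbitrary choices for the example.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
completed = []

def run_transaction(name, first, second):
    # Acquire locks in one global order, regardless of the order requested,
    # so two transactions can never each hold the lock the other needs.
    ordered = sorted((first, second), key=id)
    with ordered[0]:
        with ordered[1]:
            completed.append(name)

t1 = threading.Thread(target=run_transaction, args=("T1", lock_a, lock_b))
t2 = threading.Thread(target=run_transaction, args=("T2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(completed))
```

Both threads finish because whichever one takes the first lock in the agreed order can always go on to take the second.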
272
Resource Locking Strategies
Optimistic Locking Read data Process transaction Issue update Look for conflict If conflict occurred, rollback and repeat or else commit Pessimistic Locking Lock required resources Read data Process transaction Issue update Release locks © 2002 by Prentice Hall
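The optimistic sequence (read, process, update, check for conflict) is often implemented with a version column, as in this sketch using SQLite via Python's sqlite3 module. The product table, the version column, and the helper name optimistic_update are assumptions for illustration. The conflict check is folded into the UPDATE itself, which changes a row only if the version is still the one that was read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, "
             "qty INTEGER, version INTEGER)")
conn.execute("INSERT INTO product VALUES (1, 10, 1)")
conn.commit()

def optimistic_update(conn, product_id, new_qty, version_read):
    """Apply the update only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE product SET qty = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_qty, product_id, version_read),
    )
    if cur.rowcount == 1:
        conn.commit()
        return True
    conn.rollback()  # conflict: the caller should re-read and retry
    return False

# Two users read the same row, then both try to write.
qty, version = conn.execute(
    "SELECT qty, version FROM product WHERE id = 1").fetchone()
first = optimistic_update(conn, 1, qty - 3, version)
second = optimistic_update(conn, 1, qty - 4, version)  # stale version
final_qty = conn.execute("SELECT qty FROM product WHERE id = 1").fetchone()[0]
print(first, second, final_qty)
```

The second writer is rejected instead of silently overwriting the first. Pessimistic locking would instead block the second reader until the first had finished.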
273
Consistent Transactions
Consistent transactions are often referred to by the acronym ACID Atomic Consistent Isolated Durable © 2002 by Prentice Hall
274
ACID: Atomic A transaction consists of a series of steps. Each step must be successful for the transaction to be saved. This ensures that the transaction completes everything it intended to do before saving the changes. © 2002 by Prentice Hall
275
ACID: Consistent No other transactions are permitted on the records until the current transaction finishes. This ensures statement-level consistency among all records affected by the transaction.
276
ACID: Isolation Within multi-user environments, different transactions may be operating on the same data. As such, the sequencing of uncommitted updates, rollbacks, and commits continuously change the data content. The 1992 ANSI SQL standards define four isolation levels and specify respective issues. © 2002 by Prentice Hall
277
Summary of Isolation Levels
278
ACID: Durable Durable transactions are saved to the data permanently
Interim calculations, views, and sub-queries are temporary rather than durable; that is, these interim results are not saved
279
Set-at-a-Time Versus Row-at-a-Time
SQL statements act as filters for the entire data set. A cursor may be defined within a SQL statement to point to a particular record. Several types of cursors have been defined. The cursor type defines how the cursor behaves. © 2002 by Prentice Hall
280
Types of Cursors © 2002 by Prentice Hall
281
Database Security Database security strives to ensure…
Only authorized users perform authorized activities at authorized times © 2002 by Prentice Hall
282
Managing Processing Rights and Responsibilities
Processing rights define who is permitted to do what, when The individuals performing these activities have full responsibility for the implications of their actions Individuals are identified by a username and a password © 2002 by Prentice Hall
283
Granting of Processing Rights
Database users are known as individuals and as members of one or more roles
Access and processing rights/privileges may be granted to an individual and/or a role
Users possess the combined rights granted to them individually and to all the roles of which they are members
284
Granting Privileges © 2002 by Prentice Hall
285
Providing Database Recovery
Common causes of database failures… Hardware failures Programming bugs Human errors/mistakes Malicious actions Since these issues are impossible to completely avoid, recovery procedures are essential © 2002 by Prentice Hall
286
Database Recovery Characteristics
Continuing business operations (Fall-back procedures/Continuity planning) Restore from backup Replay database activities since backup was originally made © 2002 by Prentice Hall
287
Fall-back Procedures/ Continuity Planning
The business will continue to operate even when the database is inaccessible The fall-back procedure defines how the organization will continue operations Careful attention must be paid to… saving essential data continuing to provide quality service © 2002 by Prentice Hall
288
Restoring from Backup In the event that the system must be rebuilt or reloaded, the database is restored from the last full backup. Since it is inevitable that activities occurred since the last full backup was made, subsequent activities must be replayed/restored. © 2002 by Prentice Hall
289
Recovery via Reprocessing
This is a brute-force technique. Simply re-type all activities since the backup was performed. This procedure is costly because of the effort involved in re-entering the data. This procedure is risky in that human error is likely and in that paper record-keeping may not be accurate.
290
Recovery via Rollback/Rollforward
Most database management systems provide a mechanism to record activities into a log file. © 2002 by Prentice Hall
291
Rollforward Activities recorded in the log files may be replayed. In doing so, all activities are re-applied to the database. This procedure is used to resynchronize restored database data. This procedure is termed a Rollforward. © 2002 by Prentice Hall
292
Rollback Since log files save activities in sequence order, it is possible to undo activities in the reverse of the order in which they were originally executed. This is performed to correct/undo erroneous or malicious transaction(s). This procedure is known as a Rollback.
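Rollforward and rollback can both be sketched over a toy log in plain Python. Each hypothetical log entry stores a key with its before and after values; real DBMS logs are far richer, so this illustrates the idea rather than any product's log format.

```python
# Toy change log in execution order: each entry keeps before/after images.
log = [
    {"key": "A", "before": 10, "after": 15},
    {"key": "B", "before": 20, "after": 5},
    {"key": "A", "before": 15, "after": 0},
]

def rollforward(backup, log):
    """Re-apply every logged change, oldest first, to a restored backup."""
    db = dict(backup)
    for entry in log:
        db[entry["key"]] = entry["after"]
    return db

def rollback(db, log, count):
    """Undo the last `count` logged changes, newest first."""
    db = dict(db)
    for entry in reversed(log[len(log) - count:]):
        db[entry["key"]] = entry["before"]
    return db

backup = {"A": 10, "B": 20}          # state at the time of the last backup
restored = rollforward(backup, log)  # resynchronized with later activity
undone = rollback(restored, log, 1)  # last change reversed
print(restored, undone)
```

Rollforward walks the log forward using the after values; rollback walks it backward using the before values.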
293
Managing the Database Management System (DBMS)
In addition to controlling and maintaining the users and the data, the DBA must also maintain and monitor the DBMS itself. Performance statistics (performance tuning/optimizing) System and data integrity Establishing, configuring, and maintaining database features and utilities © 2002 by Prentice Hall
294
Maintaining the Data Repository
The data repository contains metadata. Metadata is data about data. The data repository specifies the name, type, size, format, structure, definitions, and relationships among the data. It also contains the details about applications, users, add-on products, etc.
295
Types of Data Repositories
Active data repository The development and management tools automatically maintain and upkeep the metadata. Passive data repository People manually maintain and upkeep the metadata © 2002 by Prentice Hall
296
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
297
Managing Databases with Oracle
Database Processing Eighth Edition Chapter 12 David M. Kroenke Managing Databases with Oracle © 2002 by Prentice Hall
298
What is Oracle? Oracle is the world’s most popular DBMS that…
Is extremely powerful and robust Runs on many different operating systems Can be configured and tailored Operates with most, if not all, add-on products © 2002 by Prentice Hall
299
Oracle Complexity The power and flexibility of Oracle make it very complex:
Installations are difficult
The configuration options are numerous
System requirements are high
System maintenance is complex
300
The Language of Oracle… SQL Plus
SQL Plus is used in Oracle to: Define the structure of a database and the definition of the data Insert, delete, and modify data Define the behavior of the system through stored procedures and triggers Retrieve data and generate reports © 2002 by Prentice Hall
301
Gaining Access to SQL Plus
To gain access to SQL Plus, you will need a username and password and possibly a host string (depending on your system configuration) When Oracle is first installed, it establishes several default accounts, namely... internal/oracle (a privileged account) sys/change_on_install (a privileged account) system/manager (a privileged account) scott/tiger (a non-privileged account) © 2002 by Prentice Hall
302
Creating the Database Ways to create an Oracle database:
Using SQL Plus Start button –> Programs –> Oracle –OraHome81 –> Applications Development –> SQL Plus Using Oracle’s Database Configuration Assistant Start button –> Programs –> Oracle –OraHome81 –> Database Administration –> Database Configuration Assistant © 2002 by Prentice Hall
303
Entering SQL Plus Commands
The SQL Plus Buffer As a user types commands, the commands are saved into the SQL Plus buffer. The SQL Plus Editor Users may edit or alter SQL Plus commands using a text editor. © 2002 by Prentice Hall
304
SQL Plus Buffer Commands
SQL Plus is not case sensitive (except within quotation marks).
List – displays the content of the SQL Plus buffer
List n – displays line number n and changes the current line number to n
Change – performs a search/replace operation on the current line
Semi-colon (;) or slash (/) – executes the statement
305
SQL Plus Editor The SQL Plus Edit command will launch the SQL Plus text editor After the SQL statement is complete and correct, exit the editor To execute the statement, type the slash key (/) at the SQL prompt To retrieve an existing SQL file: SQL> Edit file1.sql © 2002 by Prentice Hall
306
SQL Plus Commands Desc – lists the fields in the specified table
Select – retrieve data Create – create objects Drop – delete objects Alter – change objects Insert – input data Delete – delete data Update – change data © 2002 by Prentice Hall
307
Select Syntax Select field1, field2 From table_a, table_b;
308
Create Syntax Create Table tablename (
field1 data_type(size) NOT NULL, field2 data_type (size) NULL); Create Sequence tableID Increment by 1 start with 1000; (this command creates a counter that automatically increments for each new record –does not ensure uniqueness) © 2002 by Prentice Hall
309
Alter Table Syntax Alter Table tablename1
Add Constraint FieldPK Primary Key (Field1, Field2); Alter Table tablename2 Add Constraint FieldFK Foreign Key (Field1, Field2) references tablename1 On Delete Cascade; © 2002 by Prentice Hall
310
Insert Syntax
Insert into tablename (fieldID, field2) Values (fieldID.NextVal, 'data content');
311
Drop Syntax Drop Table tablename; Drop Sequence fieldID;
312
Indexes Indexes are used to enforce uniqueness and to enable fast retrieval of data. Create Unique Index fieldIndex on Table(field1, field2); © 2002 by Prentice Hall
313
Changing the Table Structures
Alter Table tablename add field4 datatype (size); Alter Table tablename Drop Column field2; –you will permanently lose the data in field2 Alter Table tablename Modify field3 not null; © 2002 by Prentice Hall
314
Changing the Data… Update Syntax
Update tablename Set field1 = 'value_a' Where field3 = value;
315
Check Constraint Provide a list of valid values or a valid range…
Create Table tablename (
  Field1 datatype (size) Not Null,
  Field2 datatype (size) Null Check (field2 in ('value_a', 'value_b')));
316
Check Constraint
Alter Table tablename Add Constraint DateChk Check (DateField1 <= DateField2);
Alter Table tablename Add Constraint NumRange Check (Field1 Between 180 and 400);
Alter Table tablename Drop Constraint constraintname;
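The effect of a range check like NumRange can be observed with the sketch below. It uses SQLite via Python's sqlite3 module rather than Oracle, so the constraint is declared inside CREATE TABLE (SQLite cannot add a CHECK constraint with ALTER TABLE); the table name reading is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading ("
             "Field1 INTEGER CHECK (Field1 BETWEEN 180 AND 400))")

conn.execute("INSERT INTO reading VALUES (250)")      # inside the range
try:
    conn.execute("INSERT INTO reading VALUES (500)")  # outside the range
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]
print(rejected, count)
```

The out-of-range row never reaches the table, so the rule holds no matter which application issues the insert.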
317
Views Displaying the data from the database just the way a user wants it… Create View View1 As Select * From Tablename With Read Only; © 2002 by Prentice Hall
318
PL/SQL Allowing SQL to act more like a programming language.
Row-at-a-time versus set-at-a-time. PL/SQL permits Cursors A stored procedure is a PL/SQL (or other program) stored in the database. A stored procedure may have parameters. © 2002 by Prentice Hall
319
PL/SQL Parameter Types
IN – specifies the input parameters OUT – specifies the output parameters IN OUT – a parameter that may be an input or an output © 2002 by Prentice Hall
320
PL/SQL Code
Variables are declared following the AS keyword
The assignment operator is := as follows: variable1 := 'value';
Comments in PL/SQL are enclosed between /* and */ as follows: /* This is a comment */
321
PL/SQL Control Structures
FOR variable IN list_of_values LOOP
  Instructions
END LOOP;
IF condition THEN
  BEGIN
    Instructions
  END;
END IF;
322
Saving, Compiling, and Executing PL/SQL Code
The last line in the PL/SQL procedure should be a slash (/).
The procedure must be saved to a file
To compile the procedure, type the keyword Start, followed by the procedure filename: START MyProg.SQL
To see any reported errors, type SHOW ERRORS;
To execute the procedure, type EXEC MyProg ('parameter1', 'parameter2');
323
Triggers A trigger is a stored procedure that is automatically invoked by Oracle when a specified activity occurs. A trigger is defined relative to the activity that invokes it:
BEFORE: execute the stored procedure prior to the activity
AFTER: execute the stored procedure after the activity
INSTEAD OF: execute the stored procedure in lieu of the activity
324
Trigger Example Create or Replace Trigger triggername
Before Insert or Update of fieldname on tablename For Each Row Begin /* instructions */ End; © 2002 by Prentice Hall
325
A Trigger Knows the Old and New Values for Fields
The variable :new.fieldname1 stores the new information for fieldname1 as entered by the user. The variable :old.fieldname1 stores the information in fieldname1 prior to the user’s request. © 2002 by Prentice Hall
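A minimal sketch of a trigger reading the old and new field values, again using Python's sqlite3 module instead of Oracle. The tables employee and salary_audit are hypothetical. SQLite writes OLD.col and NEW.col where Oracle PL/SQL writes :old.col and :new.col.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE salary_audit (emp_id INTEGER, old_salary INTEGER, new_salary INTEGER);

    -- Fires once per updated row and records the value before and after the change.
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON employee
    FOR EACH ROW
    BEGIN
        INSERT INTO salary_audit VALUES (OLD.id, OLD.salary, NEW.salary);
    END;

    INSERT INTO employee VALUES (1, 50000);
    UPDATE employee SET salary = 55000 WHERE id = 1;
""")

# The trigger ran automatically as part of the UPDATE statement.
audit_rows = conn.execute("SELECT * FROM salary_audit").fetchall()
print(audit_rows)
```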
326
Activating a Trigger The trigger must be saved to a file
To compile the trigger, type the keyword Start, followed by the trigger filename START MyTrigger.SQL To see any reported errors, type SHOW ERRORS; If no errors were encountered, the trigger is automatically activated © 2002 by Prentice Hall
327
Data Dictionary The data dictionary contains information that Oracle knows about itself… the metadata. It includes information regarding just about everything in the database including the structure and definition of tables, sequences, triggers, indexes, views, stored procedures, etc. The data dictionary table names are stored in the DICT table. © 2002 by Prentice Hall
328
Concurrency Control Since Oracle only reads committed changes, dirty reads and lost updates are avoided Transaction isolation levels: Read Committed Serializable Read-only Explicit Locks © 2002 by Prentice Hall
329
Read Committed Transaction Isolation
Reads may not be repeatable (2 reads may result in 2 data values, based on timing of updates and reads) Phantoms are possible (data from a read may be deleted after the read occurred) Uses exclusive locks Deadlocks are possible and are resolved by rolling-back one of the transactions © 2002 by Prentice Hall
330
Serializable Transaction Isolation
Reads are always repeatable
Phantoms are avoided
Must issue one of the following commands:
Set Transaction Isolation Level Serializable;
Alter Session Set Isolation_Level = Serializable;
Coordinates activities in submission order. When this coordination detects difficulties, the application program(s) must intervene.
331
Read-only Transaction Isolation
An Oracle-only isolation level No inserting, updating, or deleting is permitted © 2002 by Prentice Hall
332
Explicit Locking Not recommended
Oracle does not promote locks. As a result, a table may have many, many locks within it. Oracle manages these locks transparently. Issuing explicit locks may interfere with these transparent locks. © 2002 by Prentice Hall
333
Oracle Security Username and Password is used to manage DBMS access
Users may be assigned to one or more profiles. Oracle provides extensive resource limitations and access rights. These restrictions may be applied to users or profiles. The SQL Grant operator provides additional access rights. The SQL Revoke operator removes access rights.
334
Backup/Recovery Committed changes are saved to destination Tablespaces. Uncommitted changes are saved in the Rollback Tablespace. Redo Log files save all changes made in the Tablespaces. To start and/or recover from a system failure, the Control Files are read. © 2002 by Prentice Hall
335
Archivelog If Oracle is running in ARCHIVELOG mode, backup copies are made of the redo log files. Otherwise, the redo log files are periodically overwritten with new information. © 2002 by Prentice Hall
336
Types of Failures Application Failure
When a program bug is encountered or when a program does not correctly respond to current system conditions. Instance Failure When Oracle is unable to do what it needs to do. Media Failure When a disk becomes inaccessible to Oracle. © 2002 by Prentice Hall
337
Recovery of an Application Failure
Oracle rolls back uncommitted changes. © 2002 by Prentice Hall
338
Recovery of an Instance Failure
Oracle would be restarted using the following sequence… Read the Control File Restore system to last known valid state Roll forward changes not in system (replay the Redo Log Files) © 2002 by Prentice Hall
339
Recovery from a Media Failure
Restore system from Backup Read the Control File Roll forward changes using the Archive Log Files (from the ARCHIVELOG) Roll forward changes from the on-line Log Files (the most recent versions of the Logs) © 2002 by Prentice Hall
340
Types of Recoveries Consistent Backup
After the restoration, delete all uncommitted activities This ensures consistency, may lose recent changes Inconsistent Backup After the restoration, all uncommitted activities remain © 2002 by Prentice Hall
341
Networks, Multi-Tier Architectures, and XML
Database Processing Eighth Edition Chapter 14 David M. Kroenke Networks, Multi-Tier Architectures, and XML © 2002 by Prentice Hall
342
Networks A network is a collection of computers that communicate with one another using standard sets of rules, called protocols Common Network Environments: Internet Intranet Wireless Network Access © 2002 by Prentice Hall
343
The Internet Internet - a publicly accessible network of networks spanning the globe. Uses a communications protocol called Transmission Control Protocol/Internet Protocol (TCP/IP)
344
Key Dates for the Internet
The Internet grew out of ARPANET, a network funded by the US Department of Defense in the late 1960s. HTTP (HyperText Transfer Protocol), used to transfer Web pages, was created at CERN in 1989. Key HTTP characteristics: Request-based (waits for user action) Stateless (does not sequence or remember activities)
345
HTTP: Stateless Property
In applications development, you may often wish to save the application state. Several Internet tools exist to help accomplish this: Microsoft Internet Information Server (IIS) Microsoft Active Server Pages (ASP) Java Servlets with Java ServerPages (JSP) © 2002 by Prentice Hall
346
The Intranet Some organizations use Internet technologies to create their own privately accessible network, called an intranet. If a connection to the Internet exists, it goes through a firewall. An intranet is almost always faster than the Internet.
347
Firewall Firewall - a security gateway that protects an organization from unauthorized access via the Internet Consists of software and sometimes hardware components © 2002 by Prentice Hall
348
Wireless Network Access
Due to less reliability, inferior screen displays, and slower transfer rates, the traditional wired protocols are not appropriate for wireless environments. A few protocols have been developed which allow wireless devices to communicate via the Internet: Wireless Application Protocol (WAP) Wireless Markup Language (WML) © 2002 by Prentice Hall
349
Multi-tier Architectures
Tiers are the number of computers (serving a like function) that a user must use to satisfy his/her request. Common tiers include Web server and database server. © 2002 by Prentice Hall
350
A Three-Tier Architecture
© 2002 by Prentice Hall
351
Functions of Tiers © 2002 by Prentice Hall
352
Processing at the Different Tiers
Since each tier serves a different function, each tier may have a different operating system and different application software offerings. © 2002 by Prentice Hall
353
Processing Client Processing
Using the browser (e.g., Netscape Navigator) Server Processing Using Server Software (e.g., ASP) © 2002 by Prentice Hall
354
Windows 2000 Web Server Languages
JavaScript VBScript Perl ActiveX Control Java © 2002 by Prentice Hall
355
Standards & Languages Common With MS Web Server
© 2002 by Prentice Hall
356
Unix/Linux Web Server Environment
JavaScript Java Applets Java Servlets Java Server Pages Perl Java CGI © 2002 by Prentice Hall
357
N-Tier Processing The 3-Tier architecture may be extended to include additional tiers. This produces a distributed processing model using various servers on the Internet © 2002 by Prentice Hall
358
Markup Languages Markup Languages are used to specify the appearance and behavior of Web Pages. Markup Language flavors: HTML – an application of SGML DHTML RDS/ADO XML
359
HTML HyperText Markup Language PROS Simple Standardized CONS
Static content Limited connectivity Mixed structure/content © 2002 by Prentice Hall
360
DHTML Dynamic HyperText Markup Language
Encapsulates the entire HTML command set Provides access to objects on the page using the Document Object Model (DOM) Allows for Cascading Style Sheets (CSS) © 2002 by Prentice Hall
361
Data Services Data services allow Web pages to exchange data with databases. RDS (Remote Data Services) is a set of ActiveX controls; the data exchanges must be relatively simple. ADO (ActiveX Data Objects) is a set of data access objects; these data exchanges may be more complex.
362
Extensible Markup Language –XML
XML clearly separates content from structure and allows developers to easily define their own elements. Rather than hard-coding Web pages, you create rules that govern how the document should look. Then merge the structure and the content files. So, the very nature of XML is dynamic. © 2002 by Prentice Hall
363
Document Type Declaration –DTD
A DTD defines the data content and may provide the data values While a DTD is desirable, it is not mandatory XML documents using DTDs are termed type-valid documents XML documents not using DTDs are termed not-type-valid documents © 2002 by Prentice Hall
364
XML & CSS Similar to DHTML, Cascading Style Sheets (CSS) may be used with XML documents to present a consistent, standardized Web site. © 2002 by Prentice Hall
365
Extensible Stylesheet Language Transformations (XSLT)
XSLT is used to transform one XML document into another document
366
XML Schema XML Schema is the next generation of DTD
The schema itself is an XML document. A W3C standard is currently being developed. A document that conforms to an XML Schema is termed schema-valid.
367
XML Schema Concepts Simple Elements Consist of a single content value
Complex Elements Consist of multiple content values © 2002 by Prentice Hall
368
XML Namespaces Namespaces define where to look for files
An XML document may have: Up to one default namespace Many labeled namespaces Naming conventions: Must be unique within all schemas Typically resembles a URL, but is not a URL © 2002 by Prentice Hall
369
Wireless Application Protocol (WAP)
WAP has been developed to facilitate Web development for wireless devices such as Personal Digital Assistants (PDAs) or cellular phones
370
WAP Server A WAP Server transforms XML documents into Wireless Markup Language (WML). WML is an application of XML. A WML scripting language also exists.
371
XML and Database Applications
Any document that can process a DTD or XML Schema document can correctly interpret any arbitrary database view XML can easily process multiple multi-valued paths (several SQL statements would be required) © 2002 by Prentice Hall
372
XML and Database Applications
The separation of structure and content allows for: The same data to be displayed in many different ways The same structure (report) may be regenerated many times with different/updated data. Permits document validation checking © 2002 by Prentice Hall
373
OASIS Document structures may be published and made publicly available
Organization for the Advancement of Structured Information Standards (OASIS): A clearinghouse for XML publications and schema standards © 2002 by Prentice Hall
374
DBMS Integration of XML
Oracle XML DOM parser Xpath XSQL SQL Server ADO ASP © 2002 by Prentice Hall
375
Web-based databases © 2002 by Prentice Hall
376
Types of databases Textual databases Semi-structured databases
© 2002 by Prentice Hall
377
Indexing textual data Inverted files Boolean queries Signature files
Signature S1 matches signature S2 if S2&S1=S2 © 2002 by Prentice Hall
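A minimal sketch of the signature matching rule, assuming superimposed coding: each word sets a few bits chosen by a hash, and a text signature is the bitwise OR of its word signatures. The constants SIGNATURE_BITS and BITS_PER_WORD and the function names are illustrative assumptions. Here S1 is the document signature and S2 the query signature.

```python
import hashlib

SIGNATURE_BITS = 64   # assumed signature width
BITS_PER_WORD = 3     # assumed number of bits set per word

def word_signature(word):
    """Set up to BITS_PER_WORD bit positions chosen from a hash of the word."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIGNATURE_BITS)
    return sig

def text_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig

def matches(s1, s2):
    """S1 matches S2 if every bit set in S2 is also set in S1."""
    return (s2 & s1) == s2

doc_sig = text_signature("database design with oracle")
query_sig = text_signature("oracle database")
print(matches(doc_sig, query_sig))
```

A match only says the document may contain the query words; unrelated words can set the same bits, so candidates still need to be checked against the actual text.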
378
XML-QL © 2002 by Prentice Hall
379
XML-QL Two slides from Johannes Gehrke, Cornell University
<IMG SRC=“xysq.gif” ALT=“(x+y)^2”> <apply> <power/> <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply> <cn>2</cn> </apply> WHERE <BOOK> <NAME><LAST>$1</LAST></NAME> </BOOK> in “ CONSTRUCT <RESULT> $1 </RESULT> © 2002 by Prentice Hall
380
XML-QL (continued) WHERE <BOOK> $b <BOOK> IN “ <AUTHOR> $n </AUTHOR> <PUBLISHED> $p </PUBLISHED> in $e CONSTRUCT <RESULT> <PUBLISHED> $p </PUBLISHED> WHERE <LAST> $l </LAST> IN $n CONSTRUCT <LAST> $l </LAST> </RESULT> © 2002 by Prentice Hall
381
XML-QL (continued) <!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA> <!ELEMENT article (author+, title, year?, (shortversion|longversion))> <!ATTLIST article type CDATA> <!ELEMENT publisher (name, address)> <!ELEMENT author (firstname?, lastname)> © 2002 by Prentice Hall
382
XML-QL (continued) WHERE <book>
<publisher><name>Addison-Wesley</name></publisher> <title> $t</title> <author> $a</author> </book> IN " CONSTRUCT $a © 2002 by Prentice Hall
383
XML-QL (continued) WHERE <book>
<publisher><name>Addison-Wesley</></> <title> $t</> <author> $a</> </> IN " CONSTRUCT $a © 2002 by Prentice Hall
384
XML-QL (continued) WHERE <book>
<publisher><name>Addison-Wesley</></> <title> $t</> <author> $a</> </> IN " CONSTRUCT <result> </> © 2002 by Prentice Hall
385
XML-QL (continued) <bib> <book year="1995">
<!-- A good introductory text --> <title> An Introduction to Database Systems </title> <author> <lastname> Date </lastname> </author> <publisher> <name> Addison-Wesley </name > </publisher> </book> <book year="1998"> <title> Foundation for Object/Relational Databases: The Third Manifesto </title> <author> <lastname> Darwen </lastname> </author> </bib> © 2002 by Prentice Hall
386
XML-QL (continued) <result>
<author> <lastname> Date </lastname> </author> <title> An Introduction to Database Systems </title> </result> <title> Foundation for Object/Relational Databases: The Third Manifesto </title> <author> <lastname> Darwen </lastname> </author> © 2002 by Prentice Hall
387
XML-QL (continued) WHERE <book > $p</> IN " <title > $t</>, <publisher><name>Addison-Wesley</>> IN $p CONSTRUCT <result> <title> $t </> WHERE <author> $a </> IN $p CONSTRUCT <author> $a</> </> © 2002 by Prentice Hall
388
XML-QL (continued) <result>
<title> An Introduction to Database Systems </title> <author> <lastname> Date </lastname> </author> </result> <title> Foundation for Object/Relational Databases: The Third Manifesto </title> <author> <lastname> Darwen </lastname> </author> © 2002 by Prentice Hall
389
XML-QL (continued) WHERE <article> <author>
<firstname> $f </> // firstname $f <lastname> $l </> // lastname $l </> </> CONTENT_AS $a IN " <book year=$y> <firstname> $f </> // join on same firstname $f <lastname> $l </> // join on same lastname $l </> IN " y > 1995 CONSTRUCT <article> $a </> © 2002 by Prentice Hall
390
XML-QL (continued) © 2002 by Prentice Hall
391
XML-QL (continued) <!ATTLIST person ID ID #REQUIRED>
<!ATTLIST article author IDREFS #IMPLIED> © 2002 by Prentice Hall
392
XML-QL (continued) <person ID="o123">
<firstname>John</firstname> <lastname>Smith</lastname> </person> <person ID="o234"> . . . <article author="o123 o234"> <title> ... </title> <year> 1995 </year> </article>
393
XML-QL (continued) © 2002 by Prentice Hall
394
XML-QL (continued) WHERE <article author=$i>
WHERE <article><author><lastname> $n</></></> IN "abc.xml” WHERE <article author=$i> <title> </> ELEMENT_AS $t </>, <person ID=$i> <lastname> </> ELEMENT_AS $l </> CONSTRUCT <result> $t $l</> © 2002 by Prentice Hall
395
Scalar values <title>A Trip to <titlepart> the Moon </titlepart></title> NOT! <title><CDATA> A Trip to </CDATA><titlepart><CDATA> the Moon</CDATA></titlepart></title> YES © 2002 by Prentice Hall
396
Tag variables WHERE <$p> <title> $t </title>
<year>1995</> <$e> Smith </> </> IN " $e IN {author, editor} CONSTRUCT <$p> </> © 2002 by Prentice Hall
397
Transforming data <!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA> <!ELEMENT article (author+, title, year?, (shortversion|longversion))> <!ATTLIST article type CDATA> <!ELEMENT publisher (name, address)> <!ELEMENT author (firstname?, lastname)> <!ELEMENT person (lastname, firstname, address?, phone?, publicationtitle*)> © 2002 by Prentice Hall
398
Transforming data (cont’d)
WHERE <$> <author> <firstname> $fn </> <lastname> $ln </> </> <title> $t </> </> IN " CONSTRUCT <person ID=PersonID($fn, $ln)> <firstname> $fn </> <publicationtitle> $t </> © 2002 by Prentice Hall
399
Integrating data from different sources
WHERE <person> <name></> ELEMENT_AS $n <ssn> $ssn</> </> IN " <taxpayer> <income></> ELEMENT_AS $i </> IN " CONSTRUCT <result> $n $i </> © 2002 by Prentice Hall
400
Query blocks WHERE <$e> <title> $t </>
<year> 1995 </> </> CONTENT_AS $p IN " CONSTRUCT <result ID=ResultID($p)> <title> $t </> </>
{ WHERE $e = "journal-paper", <month> $m </> IN $p CONSTRUCT <result ID=ResultID($p)> <month> $m </> </> }
{ WHERE $e = "book", <publisher>$q </> IN $p CONSTRUCT <result ID=ResultID($p)> <publisher>$q </> </> }
401
WSQ © 2002 by Prentice Hall
402
Web-supported queries
SIGMOD2000 (Goldman and Widom) WebPages (SearchExp,T1,T2,…,Tn,URL,Rank, Date) SELECT NAME, COUNT FROM STATES, WEBCOUNT WHERE NAME = T1 ORDER BY COUNT DESC © 2002 by Prentice Hall
403
KDD: Data Mining © 2002 by Prentice Hall
404
The big problem Billions of records
A small number of interesting patterns “Data rich but information poor” © 2002 by Prentice Hall
405
Data mining Knowledge discovery Knowledge extraction
Data/pattern analysis © 2002 by Prentice Hall
406
Types of source data Relational databases Transactional databases
Web logs Textual databases © 2002 by Prentice Hall
407
Association rules 65% of all customers who buy beer and tomato sauce also buy pasta and chicken wings
Association rules: X ⇒ Y
408
Association analysis IF 20 < age < 30 AND
20K < INCOME < 30K THEN Buys (“CD player”) SUPPORT = 2%, CONFIDENCE = 60% © 2002 by Prentice Hall
409
Basic concepts Minimum support threshold Minimum confidence threshold
Itemsets Occurrence frequency of an itemset © 2002 by Prentice Hall
410
Association rule mining
Find all frequent itemsets Generate strong association rules from the frequent itemsets © 2002 by Prentice Hall
411
Support and confidence
Support (X)
$\mathrm{Confidence}(X \Rightarrow Y) = \mathrm{Support}(X \cup Y) / \mathrm{Support}(X)$
412
Example
TID    List of item IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
…
…      I1, I2, I3, I5
T900   I1, I2, I3
413
Example (cont’d) Frequent itemset l = {I1, I2, I5}
I1 AND I2 ⇒ I5, C = 2/4 = 50%
I1 AND I5 ⇒ I2
I2 AND I5 ⇒ I1
I1 ⇒ I2 AND I5
I2 ⇒ I1 AND I5
I5 ⇒ I1 AND I2
414
Example 2
TID    date       items
T100   10/15/99   {K, A, D, B}
T200   …          {D, A, C, E, B}
T300   10/19/99   {C, A, B, E}
T400   10/22/99   {B, A, D}
min_sup = 60%, min_conf = 80%
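The two mining steps (find frequent itemsets, then generate strong rules) can be sketched on the four transactions of Example 2. This is a brute-force enumeration for illustration, not the Apriori algorithm; the function names are assumptions.

```python
from itertools import combinations

# Transactions from Example 2 (dates omitted; they do not affect the mining).
transactions = {
    "T100": {"K", "A", "D", "B"},
    "T200": {"D", "A", "C", "E", "B"},
    "T300": {"C", "A", "B", "E"},
    "T400": {"B", "A", "D"},
}
MIN_SUP = 0.60
MIN_CONF = 0.80

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """Confidence(lhs => rhs) = Support(lhs U rhs) / Support(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

def frequent_itemsets():
    """Step 1: enumerate every itemset and keep those meeting MIN_SUP."""
    all_items = sorted(set().union(*transactions.values()))
    found = []
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            if support(combo) >= MIN_SUP:
                found.append(frozenset(combo))
    return found

def strong_rules():
    """Step 2: split each frequent itemset into lhs => rhs and keep rules meeting MIN_CONF."""
    rules = []
    for itemset in frequent_itemsets():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                rhs = itemset - set(lhs)
                if confidence(lhs, rhs) >= MIN_CONF:
                    rules.append((frozenset(lhs), frozenset(rhs)))
    return rules

for lhs, rhs in strong_rules():
    print(sorted(lhs), "=>", sorted(rhs))
```

With these thresholds the frequent items are A, B and D. A rule such as A ⇒ D is dropped because its confidence is 3/4 = 75%, below min_conf.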
415
Correlations
$\mathrm{Corr}(A,B) = \frac{P(A \wedge B)}{P(A)\,P(B)}$
If Corr < 1: A discourages B (negative correlation)
Corr(A,B) is the lift of the association rule A ⇒ B
416
Contingency table
          Game    ^Game    Sum
Video     4,000   3,500    7,500
^Video    2,000     500    2,500
Sum       6,000   4,000   10,000
417
Example P({game}) = 0.60 P({video}) = 0.75 P({game,video}) = 0.40
P({game,video}) / (P({game}) x P({video})) = 0.40 / (0.60 x 0.75) = 0.89
418
Example 2
              hotdogs   ^hotdogs   Sum
hamburgers       2000        500   2500
^hamburgers      1000       1500   2500
Sum              3000       2000   5000
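A short sketch of the correlation (lift) computation applied to the counts of both contingency tables. The function name lift and its parameter names are assumptions.

```python
def lift(n_both, n_a, n_b, n_total):
    """Corr(A, B) = P(A and B) / (P(A) * P(B)), computed from raw counts."""
    p_both = n_both / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return p_both / (p_a * p_b)

# Game/video table: 4,000 buy both, 6,000 buy games, 7,500 buy videos, 10,000 total.
game_video = lift(4000, 6000, 7500, 10000)

# Hotdog/hamburger table: 2,000 buy both, 3,000 hotdogs, 2,500 hamburgers, 5,000 total.
hotdog_burger = lift(2000, 3000, 2500, 5000)

print(round(game_video, 2), round(hotdog_burger, 2))
```

A value below 1 (game/video) indicates negative correlation; a value above 1 (hotdogs/hamburgers) indicates that the two items occur together more often than independence would predict.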
419
Classification using decision trees
Expected information need
$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i$, with $p_i = s_i / s$
s = number of data samples, s_i = number of samples in class i, m = number of classes
420
Training data
[Table: 14 samples (RID 1 to 14) with attributes Age, Income, Student, Credit and class label Buys?]
421
Decision tree induction
$I(s_1, s_2) = I(9, 5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$
422
Entropy and information gain
$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s}\, I(s_{1j}, \ldots, s_{mj})$, where A has v distinct values
Entropy = expected information based on the partitioning into subsets by A
$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$
423
Entropy
Age <= 30: s11 = 2, s21 = 3, I(s11, s21) = 0.971
30 < Age <= 40: s12 = 4, s22 = 0, I(s12, s22) = 0
Age > 40: s13 = 3, s23 = 2, I(s13, s23) = 0.971
424
Entropy (cont’d) E (age) = 5/14 I (s11,s21) + 4/14 I (s12,s22) + 5/14 I (S13,s23) = 0.694 Gain (age) = I (s1,s2) – E(age) = 0.246 Gain (income) = 0.029, Gain (student) = 0.151, Gain (credit) = 0.048 © 2002 by Prentice Hall
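The entropy and gain figures for the age attribute can be checked with a few lines of Python. The function names info and expected_info are assumptions; the class counts are the ones given on the slides.

```python
from math import log2

def info(*counts):
    """I(s1, ..., sm) = -sum(p_i * log2(p_i)); empty classes contribute nothing."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def expected_info(partitions):
    """E(A): information of each subset, weighted by the subset's share of the samples."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(*p) for p in partitions)

# (yes, no) counts in the three age partitions: <= 30, the middle range, > 40.
age_partitions = [(2, 3), (4, 0), (3, 2)]

i_all = info(9, 5)                      # about 0.940
e_age = expected_info(age_partitions)   # about 0.694
gain_age = i_all - e_age                # about 0.247 unrounded

print(round(i_all, 3), round(e_age, 3), round(gain_age, 3))
```

The unrounded gain is about 0.2467; the 0.246 on the slide comes from subtracting the already rounded values 0.940 and 0.694.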
425
Final decision tree
[Figure: decision tree with the root test on age; subtrees test student (yes/no) and credit (excellent/fair); leaves are labeled yes or no]
426
Other techniques Bayesian classifiers
X: age <=30, income = medium, student = yes, credit = fair P(yes) = 9/14 = 0.643 P(no) = 5/14 = 0.357 © 2002 by Prentice Hall
427
Example
P (age <= 30 | yes) = 2/9 = 0.222
P (age <= 30 | no) = 3/5 = 0.600
P (income = medium | yes) = 4/9 = 0.444
P (income = medium | no) = 2/5 = 0.400
P (student = yes | yes) = 6/9 = 0.667
P (student = yes | no) = 1/5 = 0.200
P (credit = fair | yes) = 6/9 = 0.667
P (credit = fair | no) = 2/5 = 0.400
428
Example (cont’d) P (X | yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P (X | no) = 0.600 x 0.400 x 0.200 x 0.400 = 0.019
P (X | yes) P (yes) = 0.044 x 0.643 = 0.028
P (X | no) P (no) = 0.019 x 0.357 = 0.007
Answer: yes/no?
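The naive Bayes comparison above can be reproduced directly from the conditional probabilities on the previous slide. This sketch hard-codes those fractions instead of estimating them from the training table; the variable names are assumptions.

```python
from math import prod

# Class priors from the 14 training samples.
priors = {"yes": 9 / 14, "no": 5 / 14}

# P(attribute value | class) for X = (age <= 30, income = medium,
# student = yes, credit = fair), in that order.
likelihoods = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],
    "no": [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}

# Naive Bayes score: P(X | class) * P(class), assuming independent attributes.
scores = {label: prod(likelihoods[label]) * priors[label] for label in priors}
prediction = max(scores, key=scores.get)

print({label: round(score, 3) for label, score in scores.items()}, prediction)
```

The class with the larger score is the predicted label for X.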
429
More types of data mining
Classification and prediction Cluster analysis Outlier analysis Evolution analysis © 2002 by Prentice Hall
430
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
431
XQuery © 2002 by Prentice Hall
432
Background Successor to XML-QL, YATL, Lorel, Quilt
Supported by the W3C Draft only © 2002 by Prentice Hall
433
DTD <!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )> <!ATTLIST book year CDATA #REQUIRED > <!ELEMENT author (last, first )> <!ELEMENT editor (last, first, affiliation )> <!ELEMENT title (#PCDATA )> <!ELEMENT last (#PCDATA )> <!ELEMENT first (#PCDATA )> <!ELEMENT affiliation (#PCDATA )> <!ELEMENT publisher (#PCDATA )> <!ELEMENT price (#PCDATA )> © 2002 by Prentice Hall
434
Sample database
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author> <last>Abiteboul</last> <first>Serge</first> </author>
    <author> <last>Buneman</last> <first>Peter</first> </author>
    <author> <last>Suciu</last> <first>Dan</first> </author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
</bib>
435
Sample query
<bib> {
  for $b in document("…")/bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return <book year="{ $b/@year }"> { $b/title } </book>
} </bib>
436
Expected result
<bib>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
</bib>
437
Pointers and Demos http://www.w3.org/TR/xquery/
© 2002 by Prentice Hall
438
Database Application Design
Winter 2004 Dragomir R. Radev © 2002 by Prentice Hall
439
Data Mining (continued)
© 2002 by Prentice Hall
440
arff files
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
441
Predictive models Inputs (e.g., medical history, age)
Output (e.g., will patient experience any side effects) Some models are better than others © 2002 by Prentice Hall
442
Operating curves
[Figure: operating curves labeled optimal, practical and random; axis labels run from failure to success and from most likely to least likely]
443
Principles of data mining
Training/test sets
Error analysis and overfitting
Cross-validation
Supervised vs. unsupervised methods
[Figure: training and test error plotted against input size]
444
Representing data Vector space
[Figure: axes credit and salary; classes pay off and default]
445
Decision surfaces
[Figure: axes credit and salary; classes pay off and default]
446
Decision trees
[Figure: axes credit and salary; classes pay off and default]
447
Linear boundary
[Figure: axes credit and salary; classes pay off and default]
448
kNN models Assign each element the majority class among its k nearest training examples Demos:
© 2002 by Prentice Hall
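A minimal k-nearest-neighbor sketch, assuming the two numeric attributes (temperature, humidity) of the weather data from the arff slide and squared Euclidean distance. The function name knn_classify and the query points are illustrative assumptions.

```python
from collections import Counter

# (temperature, humidity) -> play, taken from the weather arff data.
training = [
    ((85, 85), "no"), ((80, 90), "no"), ((83, 86), "yes"), ((70, 96), "yes"),
    ((68, 80), "yes"), ((65, 70), "no"), ((64, 65), "yes"), ((72, 95), "no"),
    ((69, 70), "yes"), ((75, 80), "yes"), ((75, 70), "yes"), ((72, 90), "yes"),
    ((81, 75), "yes"), ((71, 91), "no"),
]

def knn_classify(point, k):
    """Return the majority class among the k training examples closest to point."""
    by_distance = sorted(
        training,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((74, 78), k=3))
print(knn_classify((84, 87), k=1), knn_classify((84, 87), k=3))
```

The second query shows that the answer can depend on k: its single nearest neighbor is a "yes" example, but two of its three nearest neighbors are "no".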
449
Other methods Decision trees Neural networks Support vector machines
Demos © 2002 by Prentice Hall
451
Weka http://www.cs.waikato.ac.nz/ml/weka Methods: rules.ZeroR
bayes.NaiveBayes trees.j48.J48 lazy.IBk trees.DecisionStump © 2002 by Prentice Hall
452
kMeans clustering java weka.clusterers.SimpleKMeans -t data/weather.arff © 2002 by Prentice Hall
453
More useful pointers http://www.kdnuggets.com/
© 2002 by Prentice Hall