CS 185C/286: The History of Computing October 31 Class Meeting Department of Computer Science San Jose State University Fall 2011 Instructor: Ron Mak

Department of Computer Science Fall 2011: October 31 CS 185C/286: History of Computing © R. Mak

History of Computing Speaker: Don Chamberlin
- Co-inventor of the SQL and XQuery database languages
- “Fifty Years of Data: How Advances in Database Management Have Helped to Shape Our World”
- Wednesday, Nov. 2, 6:00-7:00 PM, Auditorium ENGR 189
- Reception before the talk in ENGR 294 at 5:00 PM

What is a Database?
- A collection of information that lasts over a long period of time
- Can be accessed simultaneously by multiple instances of an application or by instances of many applications
- Managed by a database management system (DBMS)

Database Management System (DBMS)
- Users can create new databases
  - Specify the structure of the data (schema)
- Users can query (ask questions about) the data
- Users can modify the data
- Store large amounts (terabytes) of data
- Store data for a long time (many years)
- Ensure reliability
  - Recover from errors and failures
- Ensure data integrity
  - Maintain proper relationships among data

DBMS, cont’d
- Control access to data by multiple users and applications
- Ensure data operations are completed in full or not at all (atomicity)
  - Roll back partially completed operations
- Maintain a data model, which determines:
  - structure of the data
  - operations on the data
  - constraints on the data
- Types of data models
  - hierarchical
  - relational
  - object-oriented
  - object-relational
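
The rollback behavior described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the account table, its rows, and the transfer function are hypothetical and do not come from the lecture. The sketch shows a two-step operation whose second step fails, after which the DBMS undoes the first step as well.

```python
import sqlite3

# Hypothetical table for illustration only; not part of the lecture's data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100)")
conn.execute("INSERT INTO account VALUES ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing operation."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


# The credit to 'b' succeeds, the debit from 'a' violates the CHECK
# constraint, and the partially completed transfer is rolled back.
ok = transfer(conn, "a", "b", 150)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

After the failed transfer both balances are unchanged, which is the property the slide calls atomicity.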

The Relational Data Model
- Data element: a value that is stored in the database
  - values are typed
  - a value can be null
- Entity: a group of data elements that together are meaningful for a person or an application
  - Each data element is the value of an attribute of the entity

The Relational Data Model, cont’d
- Table: a conceptual two-dimensional structure that contains entities of a particular type. Also called a relation.
  - Each row (also called a record) contains the attribute values of one entity.
  - Each column (also called a field) holds the values of one attribute.
- Equivalent terms
  - table = relation
  - row = entity
  - rows and columns = records and fields

Logical Data Model
- Initial version
  - Student: id, name, which teachers
  - Teacher: id, name, which classes taught
  - Class: class code, subject name, class room number
Student
| Id | Name | Teacher_id_1 | Teacher_id_2 | Teacher_id_3 |
| 1001 | Doe, John | | | |
| 1005 | Novak, Tim | | null | |
| 1009 | Klein, Leslie | null | | |
| 1014 | Jane, Mary | 7051 | null | |
| 1021 | Smith, Kim | | | |
Teacher
| Id | Name | Class_code | Subject | Room |
| 7003 | Rogers, Tom | 926 | Java programming | |
| 7008 | Thompson, Art | 908 | Data structures | |
| 7012 | Lane, John | 951 | Software engineering | |
| 7012 | Lane, John | 974 | Operating systems | 109 |
| 7051 | Flynn, Mabel | 931 | Compilers | 222 |
- John Lane teaches two classes.
- Each table has a primary key (PK) field whose value in each record uniquely identifies that record.

Normalization
- Relational tables need to be normalized.
  - Improve the stability of the model
    - More resilient to change
  - Faster record insertions and updates
  - Improve data quality
- There are six normal forms, but we will only consider the first two.
  - Each normal form includes the lower normal forms
  - Example: A database in second normal form is also in first normal form.

First Normal Form (1NF)
- Separate multi-valued data elements.
  - Break the name fields into last name and first name fields.
Student
| Id | Last | First | Teacher_id_1 | Teacher_id_2 | Teacher_id_3 |
| 1001 | Doe | John | | | |
| 1005 | Novak | Tim | | null | |
| 1009 | Klein | Leslie | null | | |
| 1014 | Jane | Mary | 7051 | null | |
| 1021 | Smith | Kim | | | |
Teacher
| Id | Last | First | Class_code | Subject | Room |
| 7003 | Rogers | Tom | 926 | Java programming | |
| 7008 | Thompson | Art | 908 | Data structures | |
| 7012 | Lane | John | 951 | Software engineering | |
| 7012 | Lane | John | 974 | Operating systems | 109 |
| 7051 | Flynn | Mabel | 931 | Compilers | 222 |

First Normal Form, cont’d
- Move repeating data elements to a new table.
Student
| Id | Last | First |
| 1001 | Doe | John |
| 1005 | Novak | Tim |
| 1009 | Klein | Leslie |
| 1014 | Jane | Mary |
| 1021 | Smith | Kim |
Student_Teacher (linking table)
| Student_id | Teacher_id |
Teacher
| Id | Last | First | Class_code | Subject | Room |
| 7003 | Rogers | Tom | 926 | Java programming | |
| 7008 | Thompson | Art | 908 | Data structures | |
| 7012 | Lane | John | 951 | Software engineering | |
| 7012 | Lane | John | 974 | Operating systems | 109 |
| 7051 | Flynn | Mabel | 931 | Compilers | 222 |

Problem!
- Suppose Prof. Lane decides he doesn’t want to teach Operating Systems anymore and we delete that row. What other information do we lose as a result?
  - We lose the fact that the class is taught in Room 109.
- The problem arises because the Teacher table really contains two separate sets of data: teacher data and class data.
Teacher
| Id | Last | First | Class_code | Subject | Room |
| 7003 | Rogers | Tom | 926 | Java programming | |
| 7008 | Thompson | Art | 908 | Data structures | |
| 7012 | Lane | John | 951 | Software engineering | |
| 7012 | Lane | John | 974 | Operating systems | 109 |
| 7051 | Flynn | Mabel | 931 | Compilers | 222 |
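
The loss described on this slide can be reproduced in a few lines. The following is a minimal sketch using Python's sqlite3 module, with lowercase table and column names chosen for the sketch and only John Lane's two rows loaded. The room for Software engineering did not survive in these notes, so it is stored as NULL here, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The combined Teacher table from the slide: teacher data and class data
# live in the same row.
conn.execute(
    "CREATE TABLE teacher ("
    " id INTEGER, last TEXT, first TEXT,"
    " class_code INTEGER, subject TEXT, room INTEGER)"
)
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Room for class 951 is unknown in these notes (assumed NULL).
        (7012, "Lane", "John", 951, "Software engineering", None),
        (7012, "Lane", "John", 974, "Operating systems", 109),
    ],
)

ROOM_QUERY = "SELECT room FROM teacher WHERE subject = 'Operating systems'"

before = conn.execute(ROOM_QUERY).fetchone()

# Prof. Lane stops teaching Operating Systems, so we delete that row.
conn.execute("DELETE FROM teacher WHERE id = 7012 AND class_code = 974")

after = conn.execute(ROOM_QUERY).fetchone()

print(before)  # the room was recorded before the delete
print(after)   # no row is left to say where the class meets
```

Deleting a fact about the teacher also deleted a fact about the class, because the two kinds of data share one row.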

Second Normal Form
- Keep related data together (cohesiveness).
Teacher
| Id | Last | First |
| 7003 | Rogers | Tom |
| 7008 | Thompson | Art |
| 7012 | Lane | John |
| 7051 | Flynn | Mabel |
Class
| Class_code | Teacher_id | Subject | Room |
| 908 | 7008 | Data structures | |
| 926 | 7003 | Java programming | |
| 931 | 7051 | Compilers | 222 |
| 951 | 7012 | Software engineering | |
| 974 | 7012 | Operating systems | 109 |
- Teacher_id in Class is a foreign key (FK) that refers to the primary key (PK) Id in Teacher.

Final Database Structure
Student
| Id | Last | First |
| 1001 | Doe | John |
| 1005 | Novak | Tim |
| 1009 | Klein | Leslie |
| 1014 | Jane | Mary |
| 1021 | Smith | Kim |
Teacher
| Id | Last | First |
| 7003 | Rogers | Tom |
| 7008 | Thompson | Art |
| 7012 | Lane | John |
| 7051 | Flynn | Mabel |
Class
| Code | Teacher_id | Subject | Room |
| 908 | 7008 | Data structures | |
| 926 | 7003 | Java programming | |
| 931 | 7051 | Compilers | 222 |
| 951 | 7012 | Software engineering | |
| 974 | 7012 | Operating systems | 109 |
Student_Class
| Student_id | Class_code |
- John Doe takes Java programming, software engineering, and data structures.
- The Java programming class has John Doe and Kim Smith.
- Mabel Flynn teaches compilers.
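
One possible way to declare this structure, with its primary and foreign keys, is sketched below using Python's sqlite3 module. The column types, the NOT NULL constraints, and the composite key on student_class are assumptions made for the sketch. Rooms that did not survive in these notes are stored as NULL, and the class with code 999 and teacher 9999 is a deliberately invalid, hypothetical row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE teacher (
    id    INTEGER PRIMARY KEY,
    last  TEXT NOT NULL,
    first TEXT NOT NULL
);
CREATE TABLE student (
    id    INTEGER PRIMARY KEY,
    last  TEXT NOT NULL,
    first TEXT NOT NULL
);
CREATE TABLE class (
    code       INTEGER PRIMARY KEY,
    teacher_id INTEGER NOT NULL REFERENCES teacher(id),
    subject    TEXT NOT NULL,
    room       INTEGER
);
CREATE TABLE student_class (
    student_id INTEGER NOT NULL REFERENCES student(id),
    class_code INTEGER NOT NULL REFERENCES class(code),
    PRIMARY KEY (student_id, class_code)
);
""")

conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(7003, "Rogers", "Tom"), (7008, "Thompson", "Art"),
     (7012, "Lane", "John"), (7051, "Flynn", "Mabel")],
)
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?, ?)",
    [(908, 7008, "Data structures", None),
     (926, 7003, "Java programming", None),
     (931, 7051, "Compilers", 222),
     (951, 7012, "Software engineering", None),
     (974, 7012, "Operating systems", 109)],
)

# A class that points at a teacher who does not exist is rejected.
try:
    conn.execute("INSERT INTO class VALUES (999, 9999, 'No such teacher', NULL)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# "Mabel Flynn teaches compilers", recovered through the foreign key.
flynn = conn.execute(
    "SELECT subject FROM class JOIN teacher ON teacher.id = class.teacher_id "
    "WHERE teacher.last = 'Flynn' AND teacher.first = 'Mabel'"
).fetchall()
print(fk_enforced, flynn)
```

With the keys declared, the DBMS itself maintains the relationships among the tables, which is the data integrity duty listed earlier in the lecture.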

SQL
- Structured Query Language (SQL)
- An industry standard
  - But has many proprietary extensions
- Language for managing data in a relational database
  - Create and drop (delete) databases
  - Create, alter, and drop tables of a database
  - Retrieve, insert, update, and delete data in the tables

SQL Query Examples
- What is the class code of the Java programming class?
Source table: Class
Query:
    SELECT code                           (desired fields)
    FROM class                            (source tables)
    WHERE subject = 'Java programming'    (selection criteria)
Results:
    | code |
    | 926 |

SQL Query Examples, cont’d
- Who is teaching Java programming?
Source tables: Teacher, Class
Query:
    SELECT first, last
    FROM teacher, class
    WHERE id = teacher_id
    AND subject = 'Java programming'
Results:
    | first | last |
    | Tom | Rogers |
- Selecting from multiple tables is called a join.

SQL Query Examples, cont’d
- What classes does John Lane teach?
Source tables: Teacher, Class
Query:
    SELECT code, subject
    FROM teacher, class
    WHERE last = 'Lane'
    AND first = 'John'
    AND id = teacher_id
Results:
    | code | subject |
    | 951 | Software engineering |
    | 974 | Operating systems |

SQL Query Examples, cont’d
- Who is taking Java programming?
Source tables: Student, Class, Student_Class
Query:
    SELECT id, last, first
    FROM student, class, student_class
    WHERE subject = 'Java programming'
    AND code = class_code
    AND id = student_id
Results:
    | id | last | first |
    | 1001 | Doe | John |
    | 1021 | Smith | Kim |

SQL Query Examples, cont’d
- Who are John Lane’s students and in which subjects?
Source tables: Teacher, Student, Class, Student_Class

SQL Query Examples, cont’d
- Who are John Lane’s students and in which subjects?
Source tables: Teacher, Student, Class, Student_Class
Query:
    SELECT student.first, student.last, subject
    FROM student, teacher, class, student_class
    WHERE teacher.last = 'Lane'
    AND teacher.first = 'John'
    AND teacher_id = teacher.id
    AND code = class_code
    AND student.id = student_id
    ORDER BY subject, student.last
Results:
    | first | last | subject |
    | Tim | Novak | Operating systems |
    | Kim | Smith | Operating systems |
    | John | Doe | Software engineering |
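
The four-table query can be run against a small in-memory database. The sketch below uses Python's sqlite3 module and also shows the same join written with explicit JOIN ... ON clauses, a common alternative to listing every table in FROM. Assumptions: the student_class rows are only those implied by the enrollments and query results stated in the slides (the original linking rows did not survive in these notes), and rooms not shown are NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (id INTEGER PRIMARY KEY, last TEXT, first TEXT);
CREATE TABLE student (id INTEGER PRIMARY KEY, last TEXT, first TEXT);
CREATE TABLE class (code INTEGER PRIMARY KEY, teacher_id INTEGER,
                    subject TEXT, room INTEGER);
CREATE TABLE student_class (student_id INTEGER, class_code INTEGER);

INSERT INTO teacher VALUES
    (7003, 'Rogers', 'Tom'), (7008, 'Thompson', 'Art'),
    (7012, 'Lane', 'John'), (7051, 'Flynn', 'Mabel');
INSERT INTO student VALUES
    (1001, 'Doe', 'John'), (1005, 'Novak', 'Tim'), (1009, 'Klein', 'Leslie'),
    (1014, 'Jane', 'Mary'), (1021, 'Smith', 'Kim');
INSERT INTO class VALUES
    (908, 7008, 'Data structures', NULL),
    (926, 7003, 'Java programming', NULL),
    (931, 7051, 'Compilers', 222),
    (951, 7012, 'Software engineering', NULL),
    (974, 7012, 'Operating systems', 109);
-- Assumed rows: only the enrollments the slides state or imply.
INSERT INTO student_class VALUES
    (1001, 926), (1001, 951), (1001, 908),
    (1021, 926), (1005, 974), (1021, 974);
""")

# The query as written on the slide: tables listed in FROM,
# join conditions mixed into WHERE.
implicit_join = """
SELECT student.first, student.last, subject
FROM student, teacher, class, student_class
WHERE teacher.last = 'Lane'
  AND teacher.first = 'John'
  AND teacher_id = teacher.id
  AND code = class_code
  AND student.id = student_id
ORDER BY subject, student.last
"""

# The same join with explicit JOIN ... ON clauses, which keeps the join
# conditions apart from the selection criteria.
explicit_join = """
SELECT s.first, s.last, c.subject
FROM teacher AS t
JOIN class AS c ON c.teacher_id = t.id
JOIN student_class AS sc ON sc.class_code = c.code
JOIN student AS s ON s.id = sc.student_id
WHERE t.last = 'Lane' AND t.first = 'John'
ORDER BY c.subject, s.last
"""

rows_implicit = conn.execute(implicit_join).fetchall()
rows_explicit = conn.execute(explicit_join).fetchall()
for row in rows_implicit:
    print(row)
```

Both forms return the same three rows as the slide's result table.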

XML Data
- Data can also be stored as XML
- XQuery is designed to query XML data
  - SQL : relational databases
  - XQuery : XML

XQuery Examples
Contents of books.xml (titles and authors):
- Everyday Italian: Giada De Laurentiis
- Harry Potter: J.K. Rowling
- XQuery Kick Start: James McGovern, Per Bothner, Kurt Cagle, James Linn, Vaidyanathan Nagarajan
- Learning XML: Erik T. Ray
Query: What titles are in the bookstore?
    doc("books.xml")/bookstore/book/title
Results:
    Everyday Italian
    Harry Potter
    XQuery Kick Start
    Learning XML

XQuery Examples, cont’d
Query: Which books cost less than $30?
    doc("books.xml")/bookstore/book[price<30]
Results:
    Harry Potter, J.K. Rowling
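
The two path queries can be approximated in Python with the standard xml.etree.ElementTree module. This is a rough sketch, not XQuery itself: ElementTree supports only a subset of XPath, so the price comparison is done in Python. The XML below is an assumed reconstruction. The bookstore, book, title, and price element names come from the queries, while the author element name and all four price values are assumptions chosen to agree with the results shown in the slides.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for books.xml; prices and <author> tags are hypothetical.
BOOKS_XML = """
<bookstore>
  <book>
    <title>Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <price>30.00</price>
  </book>
  <book>
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <price>29.99</price>
  </book>
  <book>
    <title>XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <price>49.99</price>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML.strip())  # root is the <bookstore> element

# Counterpart of doc("books.xml")/bookstore/book/title
titles = [title.text for title in root.findall("book/title")]

# Counterpart of doc("books.xml")/bookstore/book[price<30];
# the numeric predicate is evaluated in Python, not in the path expression.
cheap = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) < 30
]

print(titles)
print(cheap)
```

A full XQuery processor would evaluate the predicate inside the path expression and return the matching book elements rather than just their titles.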