1 Relational Databases
2 Find Databases here…
3 And here…
4 The “Deep Web” Dynamic pages, generated from databases Not easily discovered using crawling Perhaps times larger than surface Web Fastest growing source of new information
5 Deep Web 60 Deep Sites Exceed Surface Web by 40 Times Name TypeURL Web Size (GBs) National Climatic Data Center (NOAA)Publichttp:// s.html 366,000 NASA EOSDISPublichttp://harp.gsfc.nasa.gov/~imswww/pub/imswelcome/ plain.html 219,600 National Oceanographic (combined with Geophysical) Data Center (NOAA) Public/Feehttp:// 32,940 AlexaPublic (partial) Right-to-Know Network (RTK Net)Publichttp:// MP3.comPublichttp://
6 Content of the Deep Web
7 Database Basics What is a database? –Collection of data, organized to support access –Models some aspects of reality Components of a relational database: –Field = an “atomic” unit of data –Record = a collection of related fields –Table = a collection of related records Each record is one row in the table Each field is one column in the table –Primary Key = the field that uniquely identifies a record –Database = a collection of tables
8 Why “Relational”? Databases model some aspects of reality A relational database views the world in terms of entities and relations between them
9 The Registrar Example What do we need to know (i.e., model)? –Something about the students (e.g., first name, last name, , department) –Something about the courses (e.g., course ID, description, enrolled students, grades) –Which students are in which courses
10 A First Try Put everything in a big table… Discussion: Why is this a bad idea?
11 Good Database Design Save space –Save each fact only once More rapid updates –Every fact only needs to be updated once More rapid search –Finding something once is good enough Avoid inconsistency –Changing data once changes it everywhere
12 Another Try... Student Table Department TableCourse Table Enrollment Table
13 Approaches to Normalization For simple problems: –Start with “binary relationships”: pairs of fields that are related –Group together wherever possible –Add keys where necessary For more complicated problems: –Entity relationship modeling (LBSC 670)
14 Some Lingo “Primary Key” uniquely identifies a record –e.g., student ID in the student table “Foreign Key” is primary key in the other table –It need not be unique in this table
15 The Data Model Student Table Department TableCourse Table Enrollment Table
16 Project SELECT Student ID, Department
17 Restrict WHERE Department ID = “HIST”
18 Join “Joined” Table Student Table Department Table
19 Relational Operations Choosing columns: SELECT –Based on their label Choosing rows: WHERE –Based on their contents Joining tables: JOIN These can be specified together department ID = “HIST” SELECT Student ID, Dept WHERE Dept = “History”
20 Some SQL SQL = Structured Query Language Used in many types of database systems
21 Select query SELECT LastName, FirstName from StudentTable StudentTable
22 Select with Restriction SELECT LastName, FirstName from StudentTable where DeptID = ‘HIST’ Will return –Peters, Kathy –Smith, Chris
23 Select with Restriction SELECT StudentID from EnrollmentTable where Grade > 81 Enrollment Table
24 Select with Join SELECT LastName, FirstName from StudentTable JOIN EnrollmentTable on StudentTable.StudentID =EnrollmentTable.StudentID where EnrollmentTable.Grade > 95 Results: Smith, John Student Table Enrollment Table
25 Discussion Point How is a relational database different from a spreadsheet?