Download presentation
Presentation is loading. Please wait.
1
1 CS351 Introduction to Database systems quan.wang@wsu.edu
2
2 About me Quan Wang – developer at Oracle Corp. JDBC Web Services Cloud
3
3 Content Introduction – Why databases? Data Model
4
4 Why databases? It used to be about boring stuff: employee records, bank records, etc. Today, the field covers all the largest sources of data, with many new ideas. Web search. Data mining. Scientific and medical databases. Integrating information.
5
5 Why databases? You may not notice it, but databases are behind almost everything you do on the Web. Google searches. Queries at Amazon, eBay, etc.
6
6 Why databases? Databases often have unique concurrency-control problems. Many activities (transactions) at the database at all times. Must not confuse actions, e.g., two withdrawals from the same account must each debit the account.
7
7 Why databases? Changing landscape System R: Oracle, MySQL, IBM, MS NoSQL: MangoDB NewSQL: VoltDB Cloud: Amazon RDS, Google Bigquery Bigdata: Hadoop, MapR
8
8 Why databases? Job perspectives – Startups – Massive industry
9
9 Content Introduction – Why database? – What to cover? Data Model
10
10 What to cover? Database – a set of inter-related data pertaining to some organization – airline, university, store Database Management System (DBMS) – one or more databases – algorithms accessing the databases
11
11
12
12 What to cover? Database Theory – relational model and algebra – norm forms Design of databases. – normalization – E/R, UML
13
13 What to cover? Database programming – SQL, stored procedures. Integration? – ODBC, JDBC – O-R Mapping
14
14 What not covered? Not covered – DB tuning – DB administration(DBA) – DB implementation – Programming languages May need to learn languages and the DB related aspects
15
15 Content Introduction – Why database? – What to cover? – Logistics Data Model
16
16 Logistics Email: quan.wang@wsu.edu Office hour – Tuesday & Thursday – 1 pm to 2pm – 201L
17
17 Logistics No TA No detailed comments on assignments Distribute answer keys when appropriate Check discrepancies Raise issues
18
18 Logistics Course page: – Http://encs.vancouver.wsu.edu/~quw ang Http://encs.vancouver.wsu.edu/~quw ang – weekly schedule, assignments Submission page – http://lms.wsu.edu – Use drop box for assignments and project submission
19
19 Logistics Assignments 30% Team project 20% Midterms 25% Final 25%
20
20 Logistics Team project – mini-ebay – Max 3 people each team – Single grade for each team Assignments and project – Late submission 10% penalty
21
21 Logistics Databases – MySQL – SQLite /usr/bin/sqlite3
22
22 Logistics Exams – Lectures, textbook, and assigned readings – Only areas covered in the lectures – Not curved
23
23 Content Introduction Why database? What to cover? Logistics Data Model – Why not flat files?
24
24 Why not flat files? data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 operations – check balance – deposit or withdraw – change address
25
25 Why not flat files? data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – code coupled with data format changing delimiter would require code change
26
26 Why not flat files? data Savings.data: David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 Checkings.data: David, 789 NE Sany Blvd, 1500, 08/20/1998, 0.001! Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0 problems – redundancy leads to inconsistency
27
27 Why not flat files? data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – concurrency withdraw calls for file lock –withdraw $100 from the same account at the same time, synchronized – withdraw $100 from different accounts at the same time, synchronized unnecessarily
28
28 Why not flat files? data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – user interface file based algorithms
29
29 Why not flat files? summary – code coupled with data – redundancy leading to inconsistency – low concurrency – unfriendly user interface
30
30 Why not flat files? database as abstraction data model hides data format SQL hides algorithms runtime handles concurrency
31
31 Content Introduction Data Model – why not flat files? – relational data model relations schema keys
32
32 What is a Data Model? 1.Underlying structure of a database. 2.Mathematical representation of data. Examples: relational model = tables; semistructured model = trees/graphs; key-value model=key value pairs. 3.Operations on data. 4.Constraints.
33
33 A Relation is a Table name manf WinterbrewPete’s Bud LiteAnheuser-Busch Beers Attributes (column headers) Tuples (rows) Relation name
34
34 Schemas Relation schema = relation name and attribute list. Optionally: types of attributes. Example: Beers(name, manf) or Beers(name: string, manf: string) Database = collection of relations. Database schema = set of all relation schemas in the database.
35
35 Why Relations? Very simple model. Often matches how we think about data. Abstract model that underlies SQL, the most important database language today.
36
36 Example Beers(name, manf) Bars(name, addr) Drinkers(name, addr, phone) Likes(drinker, beer) Sells(bar, beer, price) Frequents(drinker, bar) Underline = key (tuples cannot have the same value in all key attributes). Excellent example of a constraint.
37
37 Example Sells(bar,beer,price )Bars(name,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00
38
38 Database Schemas in SQL SQL is primarily a query language, for getting information from a database. But SQL also includes a data-definition component, DDL, for describing database schemas.
39
39 Creating (Declaring) a Relation Simplest form is: CREATE TABLE ( ); To delete a relation: DROP TABLE ;
40
40 Elements of Table Declarations Most basic element: an attribute and its type. The most common types are: INT or INTEGER (synonyms). REAL or FLOAT (synonyms). CHAR(n ) = fixed-length string of n characters. VARCHAR(n ) = variable-length string of up to n characters.
41
41 Example: Create Table CREATE TABLE Sells ( barCHAR(20), beerVARCHAR(20), priceREAL );
42
42 SQL Values Integers and reals are represented as you would expect. Strings are too, except they require single quotes. Two single quotes = real quote, e.g., ’Joe’’s Bar’. Any value can be NULL.
43
43 Dates and Times DATE and TIME are types in SQL. The form of a date value is: DATE ’yyyy-mm-dd’ Example: DATE ’2007-09-30’ for Sept. 30, 2007.
44
44 Times as Values The form of a time value is: TIME ’hh:mm:ss’ with an optional decimal point and fractions of a second following. Example: TIME ’15:30:02.5’ = two and a half seconds after 3:30PM.
45
45 Declaring Keys An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE. Either says that no two tuples of the relation may agree in all the attribute(s) on the list. There are a few distinctions to be mentioned later.
46
46 Declaring Single-Attribute Keys Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute. Example: CREATE TABLE Beers ( nameCHAR(20) UNIQUE, manfCHAR(20) );
47
47 Declaring Multiattribute Keys A key declaration can also be another element in the list of elements of a CREATE TABLE statement. This form is essential if the key consists of more than one attribute. May be used even for one-attribute keys.
48
48 Example: Multiattribute Key The bar and beer together are the key for Sells: CREATE TABLE Sells ( barCHAR(20), beerVARCHAR(20), priceREAL, PRIMARY KEY (bar, beer) );
49
49 PRIMARY KEY vs. UNIQUE 1.There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes. 2.No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL.
50
50 Content Introduction Data Model – why not flat files? – relational data model relations schema keys – why relations?
51
51 Why relations? data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
52
52 Terminalogy DDL: Data Definition Language CREATE TABLE DML: Data Manipulation Language INSERT SQL: Structured Query Language SELECT...FROM...WHERE
53
53 Why relations? schema Savings( name CHAR(50), addr CHAR(50), balance REAL, opened DATE, interest REAL);
54
54 Why relations? relation operations – check balance – deposit or withdraw – change address nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006
55
55 Why relations? relation advantage – code coupled with data format? nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006 nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006
56
56 Why relations? relation advantage – concurrency simultaneous withdraw calls nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006
57
57 Why relations? relations advantage – redundancy? Savings(name, addr, balance, opened, interest) Checkings(name, addr, balance, opened, interest)
58
58 Why relations? relations advantage – redundancy? nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006 nameaddrbalanceopenedinterest David123123 Sandy Blvd150019980.001 Lisa432 Alberta St350019900 Savings Checkings
59
59 Why relations? relations advantage – redundancy? Customers(cust_id, name, addr) Savings(cust_id, balance, opened, interest) Checkings(cust_id, balance, opened, interest)
60
60 Why relations? relations advantage – redundancy? cust_id nameaddr c1 David123 NW Flander St c2 Lisa432 Alberta St cust_idbalanceopenedinterest c1120019980.001 c2350019900 cust_idbalanceopenedinterest c1500.7520100.005 c24000.0020010.006 Customers Savings Checkings
61
61 Why relations? relation advantage – user interface natural language like support desirable nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006
62
62 Why relations? relation advantage – user interface SELECT name, addr FROM savings WHERE balance>2000 NAMEADDRBALANCEOPENEDINTEREST David123 NW Flander St 500.7520100.005 Lisa432 Alberta St 4000.0020010.006
63
63 Why relations? relation advantage – user interface SELECT name, addr, balance FROM savings ORDER BY balance NAMEADDRBALANCEOPENEDINTEREST David123 NW Flander St 500.7520100.005 Lisa432 Alberta St 4000.0020010.006
64
64 Content Introduction Data Model – why not flat files? – relational data model – why relations?
65
65 Content Introduction Data Model – why not flat files? – relational data model – why relations? – semi-structured data models XML and JSON
66
66 Parsing Techniques Dick Grune Ceriel J.H. Jacobs 2007 Springer XML
67
67 JSON { "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } }
68
68 XML and JSON, side-by-side Parsing Techniques Dick Grune Ceriel J.H. Jacobs 2007 Springer { "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } }
69
69 XML Schema
70
70 JSON Schema { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }
71
71 String type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }
72
72 List type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }
73
73 Year type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }
74
74 Enumeration type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }
75
75 Semi-Structured Data Models XML Adoption – Web Services (SOAP) – XML DB JSON Adoption – Web Services (REST) – MongoDB
76
76 Content Introduction Data Model – why not flat files? – relational data model – why relations? – semi-structured data models – history
77
77 History Hierarchical (IMS): late 1960’s and 1970’s Network (CODASYL): 1970’s Relational: 1970’s and early 1980’s Entity-Relationship: 1970’s Object-oriented: late 1980’s and early 1990’s Object-relational: late 1980’s and early 1990’s Semi-structured (XML): late 1990’s to the present
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.