CS351 Introduction to Database Systems


1 1 CS351 Introduction to Database systems quan.wang@wsu.edu

2 2 About me Quan Wang – developer at Oracle Corp. – JDBC – Web Services – Cloud

3 3 Content  Introduction – Why databases?  Data Model

4 4 Why databases?  It used to be about boring stuff: employee records, bank records, etc.  Today, the field covers all the largest sources of data, with many new ideas.  Web search.  Data mining.  Scientific and medical databases.  Integrating information.

5 5 Why databases?  You may not notice it, but databases are behind almost everything you do on the Web.  Google searches.  Queries at Amazon, eBay, etc.

6 6 Why databases?  Databases often have unique concurrency-control problems.  Many activities (transactions) at the database at all times.  Must not confuse actions, e.g., two withdrawals from the same account must each debit the account.
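
The point that each withdrawal must debit the account can be sketched with SQLite from Python. This is a minimal illustration, not the course's own code: the `accounts` table, its columns, and the `withdraw` helper are assumptions made up for the example, and a single in-memory connection stands in for what would really be several concurrent clients.

```python
import sqlite3

# Hypothetical one-table bank; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('a1', 500.0)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Debit the account in one transaction; return False if funds are short."""
    with conn:  # commits on success, rolls back if an exception escapes
        # The subtraction runs inside the database, so the new balance is
        # never computed from a stale value read earlier by the application.
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        return cur.rowcount == 1


first = withdraw(conn, "a1", 100.0)
second = withdraw(conn, "a1", 100.0)
balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 'a1'"
).fetchone()[0]
print(first, second, balance)
```

Both calls succeed and the balance drops by 200 in total, because neither withdrawal overwrites the other's result.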

7 7 Why databases?  Changing landscape  System R: Oracle, MySQL, IBM, MS  NoSQL: MongoDB  NewSQL: VoltDB  Cloud: Amazon RDS, Google BigQuery  Big data: Hadoop, MapR

8 8 Why databases?  Job prospects – Startups – Massive industry

9 9 Content  Introduction – Why databases? – What to cover?  Data Model

10 10 What to cover?  Database – a set of inter-related data pertaining to some organization – airline, university, store  Database Management System (DBMS) – one or more databases – algorithms accessing the databases

12 12 What to cover?  Database theory – relational model and algebra – normal forms  Design of databases – normalization – E/R, UML

13 13 What to cover?  Database programming – SQL, stored procedures.  Integration? – ODBC, JDBC – O-R Mapping

14 14 What is not covered?  Not covered – DB tuning – DB administration (DBA) – DB implementation – Programming languages. You may need to learn the languages and their DB-related aspects.

15 15 Content  Introduction – Why databases? – What to cover? – Logistics  Data Model

16 16 Logistics Email: quan.wang@wsu.edu Office hours: Tuesday & Thursday, 1 pm to 2 pm, 201L

17 17 Logistics – No TA – No detailed comments on assignments – Distribute answer keys when appropriate – Check discrepancies – Raise issues

18 18 Logistics Course page: – http://encs.vancouver.wsu.edu/~quwang – weekly schedule, assignments Submission page: – http://lms.wsu.edu – Use drop box for assignments and project submission

19 19 Logistics – Assignments 30% – Team project 20% – Midterms 25% – Final 25%

20 20 Logistics Team project – mini-eBay – Max 3 people per team – Single grade for each team Assignments and project – Late submission: 10% penalty

21 21 Logistics Databases – MySQL – SQLite /usr/bin/sqlite3

22 22 Logistics Exams – Lectures, textbook, and assigned readings – Only areas covered in the lectures – Not curved

23 23 Content  Introduction – Why databases? – What to cover? – Logistics  Data Model – Why not flat files?

24 24 Why not flat files?
data: savings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
operations – check balance – deposit or withdraw – change address

25 25 Why not flat files?
data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems – code coupled with data format: changing the delimiter would require a code change
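
A small sketch of that coupling, assuming a hypothetical `read_balance` helper that hard-codes both the delimiter and the column position. The comma-separated record is the one from the slide; the pipe-separated variant is invented to show what happens when the file format changes.

```python
import csv
import io

# The same record stored with two different delimiters.
comma_file = "David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005\n"
pipe_file = "David| 123 NW Flanders St| 500.75| 08/20/2010| 0.005\n"


def read_balance(text, delimiter=","):
    """Return the balance of the first record; assumes balance is column 2."""
    row = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return float(row[2])


balance = read_balance(comma_file)

# Reading the pipe-delimited file with the old code fails, because the
# whole line is parsed as a single field.
try:
    read_balance(pipe_file)
    broke = False
except (IndexError, ValueError):
    broke = True

# Only a code change (here, a different argument) makes it work again.
fixed = read_balance(pipe_file, delimiter="|")
print(balance, broke, fixed)
```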

26 26 Why not flat files?
data
Savings.data:
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
Checkings.data:
David, 789 NE Sandy Blvd, 1500, 08/20/1998, 0.001
Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0
problems – redundancy leads to inconsistency

27 27 Why not flat files?
data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems – concurrency: every withdraw call needs a lock on the whole file – two $100 withdrawals from the same account at the same time are synchronized, as they must be – two $100 withdrawals from different accounts at the same time are also synchronized, unnecessarily

28 28 Why not flat files?
data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems – user interface: file-based algorithms

29 29 Why not flat files? summary – code coupled with data – redundancy leading to inconsistency – low concurrency – unfriendly user interface

30 30 Why not flat files? database as abstraction – data model hides data format – SQL hides algorithms – runtime handles concurrency

31 31 Content  Introduction  Data Model – why not flat files? – relational data model: relations, schema, keys

32 32 What is a Data Model? 1. Underlying structure of a database. 2. Mathematical representation of data.  Examples: relational model = tables; semistructured model = trees/graphs; key-value model = key-value pairs. 3. Operations on data. 4. Constraints.

33 33 A Relation is a Table
Relation name: Beers
Attributes (column headers): name, manf
Tuples (rows):
name | manf
Winterbrew | Pete’s
Bud Lite | Anheuser-Busch

34 34 Schemas  Relation schema = relation name and attribute list.  Optionally: types of attributes.  Example: Beers(name, manf) or Beers(name: string, manf: string)  Database = collection of relations.  Database schema = set of all relation schemas in the database.

35 35 Why Relations?  Very simple model.  Often matches how we think about data.  Abstract model that underlies SQL, the most important database language today.

36 36 Example Beers(name, manf) Bars(name, addr) Drinkers(name, addr, phone) Likes(drinker, beer) Sells(bar, beer, price) Frequents(drinker, bar)  Underline = key (tuples cannot have the same value in all key attributes).  Excellent example of a constraint.

37 37 Example
Sells(bar, beer, price)
bar | beer | price
Joe’s | Bud | 2.50
Joe’s | Miller | 2.75
Sue’s | Bud | 2.50
Sue’s | Coors | 3.00
Bars(name, addr)
name | addr
Joe’s | Maple St.
Sue’s | River Rd.

38 38 Database Schemas in SQL  SQL is primarily a query language, for getting information from a database.  But SQL also includes a data-definition component, DDL, for describing database schemas.

39 39 Creating (Declaring) a Relation  Simplest form is: CREATE TABLE <name> ( <list of elements> );  To delete a relation: DROP TABLE <name>;

40 40 Elements of Table Declarations  Most basic element: an attribute and its type.  The most common types are:  INT or INTEGER (synonyms).  REAL or FLOAT (synonyms).  CHAR(n) = fixed-length string of n characters.  VARCHAR(n) = variable-length string of up to n characters.

41 41 Example: Create Table CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL );
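
As a rough check, the declaration above can be run against SQLite, one of the course databases, through Python's standard `sqlite3` module. The in-memory database and the inserted row are assumptions for illustration. SQLite accepts these type names but treats them more loosely than most servers do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, nothing on disk

# The declaration from the slide.
conn.execute("""
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL
    )
""")
conn.execute("INSERT INTO Sells VALUES (?, ?, ?)", ("Joe's", "Bud", 2.50))

# PRAGMA table_info reports each column's name and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Sells)")]
print(columns)

# DROP TABLE deletes the relation again.
conn.execute("DROP TABLE Sells")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)
```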

42 42 SQL Values  Integers and reals are represented as you would expect.  Strings are too, except they require single quotes.  Two single quotes = real quote, e.g., 'Joe''s Bar'.  Any value can be NULL.
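
These rules can be tried out from Python with SQLite. This is only a sketch, and the variable names are made up for the example. It also shows the usual alternative to hand-written quoting, a `?` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integers and reals are written as plain numbers.
ints_and_reals = conn.execute("SELECT 42, 2.5").fetchone()

# Two single quotes inside a string literal stand for one quote character.
literal = conn.execute("SELECT 'Joe''s Bar'").fetchone()[0]

# NULL comes back as Python's None.
null_value = conn.execute("SELECT NULL").fetchone()[0]

# With a placeholder, the driver takes care of quoting.
bound = conn.execute("SELECT ?", ("Joe's Bar",)).fetchone()[0]

print(ints_and_reals, literal, null_value, bound)
```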

43 43 Dates and Times  DATE and TIME are types in SQL.  The form of a date value is: DATE 'yyyy-mm-dd'  Example: DATE '2007-09-30' for Sept. 30, 2007.

44 44 Times as Values  The form of a time value is: TIME 'hh:mm:ss' with an optional decimal point and fractions of a second following.  Example: TIME '15:30:02.5' = two and a half seconds after 3:30 PM.

45 45 Declaring Keys  An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE.  Either says that no two tuples of the relation may agree in all the attribute(s) on the list.  There are a few distinctions to be mentioned later.

46 46 Declaring Single-Attribute Keys  Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute.  Example: CREATE TABLE Beers ( name CHAR(20) UNIQUE, manf CHAR(20) );

47 47 Declaring Multiattribute Keys  A key declaration can also be another element in the list of elements of a CREATE TABLE statement.  This form is essential if the key consists of more than one attribute.  May be used even for one-attribute keys.

48 48 Example: Multiattribute Key  The bar and beer together are the key for Sells: CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL, PRIMARY KEY (bar, beer) );
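
A short sketch of what the two-attribute key enforces, run against an in-memory SQLite database. The rows come from the earlier Sells example, except the 9.99 price, which is invented to provoke the conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL,
        PRIMARY KEY (bar, beer)
    )
""")

# Repeating a bar or a beer alone is fine; only the pair must be unique.
conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', 2.50)")
conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Miller', 2.75)")
conn.execute("INSERT INTO Sells VALUES ('Sue''s', 'Bud', 2.50)")

# A second (Joe's, Bud) tuple agrees with an existing one on the whole key.
try:
    conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', 9.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Sells").fetchone()[0]
print(duplicate_rejected, row_count)
```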

49 49 PRIMARY KEY vs. UNIQUE 1. There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes. 2. No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULLs, and there may be several tuples with NULL.
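
The UNIQUE half of this distinction can be observed in SQLite, as sketched below. The Drinkers columns follow the earlier schema, but the choice of keys, the `rejected` helper, and the sample rows are assumptions for illustration. The NULL rule for PRIMARY KEY is deliberately not tested here: for historical reasons SQLite does not always enforce it unless the column is also declared NOT NULL, so behavior on that point varies by system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Drinkers (
        name  CHAR(20) PRIMARY KEY,
        addr  CHAR(50),
        phone CHAR(16) UNIQUE
    )
""")
conn.execute("INSERT INTO Drinkers VALUES ('Ann', '1 Example St', '555-0100')")
# Several tuples may have NULL in a UNIQUE attribute.
conn.execute("INSERT INTO Drinkers VALUES ('Bob', '2 Example St', NULL)")
conn.execute("INSERT INTO Drinkers VALUES ('Cat', '3 Example St', NULL)")


def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_phone = rejected(
    "INSERT INTO Drinkers VALUES ('Dan', '4 Example St', '555-0100')"
)
duplicate_name = rejected(
    "INSERT INTO Drinkers VALUES ('Ann', '5 Example St', '555-0199')"
)
null_phones = conn.execute(
    "SELECT COUNT(*) FROM Drinkers WHERE phone IS NULL"
).fetchone()[0]
print(duplicate_phone, duplicate_name, null_phones)
```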

50 50 Content  Introduction  Data Model – why not flat files? – relational data model: relations, schema, keys – why relations?

51 51 Why relations?
data: savings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

52 52 Terminology – DDL: Data Definition Language, e.g., CREATE TABLE – DML: Data Manipulation Language, e.g., INSERT – SQL: Structured Query Language, e.g., SELECT...FROM...WHERE

53 53 Why relations? schema Savings( name CHAR(50), addr CHAR(50), balance REAL, opened DATE, interest REAL);

54 54 Why relations?
relation
name | addr | balance | opened | interest
David | 123 NW Flander St | 500.75 | 2010 | 0.005
Lisa | 432 Alberta St | 4000.00 | 2001 | 0.006
operations – check balance – deposit or withdraw – change address
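
The three operations could look roughly like this in SQL, run here through Python's `sqlite3` module. The table follows the Savings schema from the previous slide; the deposit amount and the new address are invented for the example, and the opening year is stored as a plain integer for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Savings (
        name CHAR(50), addr CHAR(50), balance REAL, opened INT, interest REAL
    )
""")
conn.executemany(
    "INSERT INTO Savings VALUES (?, ?, ?, ?, ?)",
    [
        ("David", "123 NW Flander St", 500.75, 2010, 0.005),
        ("Lisa", "432 Alberta St", 4000.00, 2001, 0.006),
    ],
)

# Check balance.
before = conn.execute(
    "SELECT balance FROM Savings WHERE name = ?", ("David",)
).fetchone()[0]

# Deposit (a withdrawal would subtract instead).
conn.execute(
    "UPDATE Savings SET balance = balance + ? WHERE name = ?", (100.0, "David")
)
after = conn.execute(
    "SELECT balance FROM Savings WHERE name = ?", ("David",)
).fetchone()[0]

# Change address (the new address is made up).
conn.execute(
    "UPDATE Savings SET addr = ? WHERE name = ?", ("1 Example Ave", "Lisa")
)
new_addr = conn.execute(
    "SELECT addr FROM Savings WHERE name = ?", ("Lisa",)
).fetchone()[0]
print(before, after, new_addr)
```

None of the statements mention delimiters, column positions, or file offsets, which is the sense in which the data model hides the data format.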

55 55 Why relations?
relation advantage – code coupled with data format?
name | addr | balance | opened | interest
David | 123 NW Flander St | 500.75 | 2010 | 0.005
Lisa | 432 Alberta St | 4000.00 | 2001 | 0.006

56 56 Why relations?
relation advantage – concurrency: simultaneous withdraw calls
name | addr | balance | opened | interest
David | 123 NW Flander St | 500.75 | 2010 | 0.005
Lisa | 432 Alberta St | 4000.00 | 2001 | 0.006

57 57 Why relations? relations advantage – redundancy? Savings(name, addr, balance, opened, interest) Checkings(name, addr, balance, opened, interest)

58 58 Why relations?
relations advantage – redundancy?
Savings
name | addr | balance | opened | interest
David | 123 NW Flander St | 500.75 | 2010 | 0.005
Lisa | 432 Alberta St | 4000.00 | 2001 | 0.006
Checkings
name | addr | balance | opened | interest
David | 123123 Sandy Blvd | 1500 | 1998 | 0.001
Lisa | 432 Alberta St | 3500 | 1990 | 0

59 59 Why relations? relations advantage – redundancy? Customers(cust_id, name, addr) Savings(cust_id, balance, opened, interest) Checkings(cust_id, balance, opened, interest)

60 60 Why relations?
relations advantage – redundancy?
Customers
cust_id | name | addr
c1 | David | 123 NW Flander St
c2 | Lisa | 432 Alberta St
Savings
cust_id | balance | opened | interest
c1 | 500.75 | 2010 | 0.005
c2 | 4000.00 | 2001 | 0.006
Checkings
cust_id | balance | opened | interest
c1 | 1200 | 1998 | 0.001
c2 | 3500 | 1990 | 0
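
A sketch of why the three-table design removes the redundancy: the address lives only in Customers, so a single UPDATE changes it for every account. The column types and the new address are assumptions; the rows are the ones on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id CHAR(10) PRIMARY KEY, name CHAR(50), addr CHAR(50));
    CREATE TABLE Savings   (cust_id CHAR(10), balance REAL, opened INT, interest REAL);
    CREATE TABLE Checkings (cust_id CHAR(10), balance REAL, opened INT, interest REAL);

    INSERT INTO Customers VALUES ('c1', 'David', '123 NW Flander St');
    INSERT INTO Customers VALUES ('c2', 'Lisa', '432 Alberta St');
    INSERT INTO Savings   VALUES ('c1', 500.75, 2010, 0.005);
    INSERT INTO Savings   VALUES ('c2', 4000.00, 2001, 0.006);
    INSERT INTO Checkings VALUES ('c1', 1200, 1998, 0.001);
    INSERT INTO Checkings VALUES ('c2', 3500, 1990, 0);
""")

# One statement changes David's address (the new address is made up).
conn.execute(
    "UPDATE Customers SET addr = ? WHERE cust_id = ?", ("9 Example Blvd", "c1")
)

# Both account types see the same address through a join on cust_id.
savings_addr = conn.execute("""
    SELECT c.addr FROM Customers c
    JOIN Savings s ON s.cust_id = c.cust_id
    WHERE c.name = 'David'
""").fetchone()[0]
checkings_addr = conn.execute("""
    SELECT c.addr FROM Customers c
    JOIN Checkings k ON k.cust_id = c.cust_id
    WHERE c.name = 'David'
""").fetchone()[0]
print(savings_addr, checkings_addr)
```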

61 61 Why relations?
relation advantage – user interface: natural-language-like support desirable
name | addr | balance | opened | interest
David | 123 NW Flander St | 500.75 | 2010 | 0.005
Lisa | 432 Alberta St | 4000.00 | 2001 | 0.006

62 62 Why relations?
relation advantage – user interface
SELECT name, addr FROM savings WHERE balance > 2000
NAME | ADDR | BALANCE | OPENED | INTEREST
David | 123 NW Flander St | 500.75 | 2010 | 0.005
Lisa | 432 Alberta St | 4000.00 | 2001 | 0.006

63 63 Why relations?
relation advantage – user interface
SELECT name, addr, balance FROM savings ORDER BY balance
NAME | ADDR | BALANCE | OPENED | INTEREST
David | 123 NW Flander St | 500.75 | 2010 | 0.005
Lisa | 432 Alberta St | 4000.00 | 2001 | 0.006
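
The filtering query from the previous slide and the ordering query from this one can be run against the same two rows in SQLite. This is a minimal sketch: the column types are assumptions, and the opening year is stored as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE savings (
        name CHAR(50), addr CHAR(50), balance REAL, opened INT, interest REAL
    )
""")
conn.executemany(
    "INSERT INTO savings VALUES (?, ?, ?, ?, ?)",
    [
        ("Lisa", "432 Alberta St", 4000.00, 2001, 0.006),
        ("David", "123 NW Flander St", 500.75, 2010, 0.005),
    ],
)

# Filter: only accounts with a balance above 2000.
rich = conn.execute(
    "SELECT name, addr FROM savings WHERE balance > 2000"
).fetchall()

# Sort: all accounts, smallest balance first.
ordered = conn.execute(
    "SELECT name, addr, balance FROM savings ORDER BY balance"
).fetchall()

print(rich)
print(ordered)
```

Neither query says how to scan or sort the data; that choice is left to the database.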

64 64 Content  Introduction  Data Model – why not flat files? – relational data model – why relations?

65 65 Content  Introduction  Data Model – why not flat files? – relational data model – why relations? – semi-structured data models: XML and JSON

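66 66 XML
<Book>
  <Title>Parsing Techniques</Title>
  <Authors>
    <Author>Dick Grune</Author>
    <Author>Ceriel J.H. Jacobs</Author>
  </Authors>
  <Date>2007</Date>
  <Publisher>Springer</Publisher>
</Book>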

67 67 JSON
{
  "Book": {
    "Title": "Parsing Techniques",
    "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
    "Date": "2007",
    "Publisher": "Springer"
  }
}
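
For a sense of how such a document is consumed, here is a minimal sketch using Python's standard `json` module. The variable names are made up for the example. Objects become dictionaries and arrays become lists, so the tree structure of the data carries over directly.

```python
import json

document = """
{
  "Book": {
    "Title": "Parsing Techniques",
    "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
    "Date": "2007",
    "Publisher": "Springer"
  }
}
"""

book = json.loads(document)["Book"]  # nested object -> nested dict
title = book["Title"]
second_author = book["Authors"][1]   # array -> list, indexed from 0

# Serializing and parsing again gives back the same structure.
round_trip = json.loads(json.dumps(book)) == book
print(title, second_author, round_trip)
```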

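68 68 XML and JSON, side-by-side
XML: <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher> </Book>
JSON: { "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } }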

69 69 XML Schema

70 70 JSON Schema
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "Book": {
      "type": "object",
      "properties": {
        "Title": {"type": "string"},
        "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": {"type": "string"}},
        "Date": {"type": "string", "pattern": "^[0-9]{4}$"},
        "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
      },
      "required": ["Title", "Authors", "Date"],
      "additionalProperties": false
    }
  },
  "required": ["Book"],
  "additionalProperties": false
}

71 71 String type "Title": {"type": "string"}

72 72 List type "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": {"type": "string"}}

73 73 Year type "Date": {"type": "string", "pattern": "^[0-9]{4}$"}

74 74 Enumeration type "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
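
To make the four constraints concrete, the sketch below checks a document against them by hand, using only the Python standard library. It is not a JSON Schema validator: `check_book` is a hypothetical helper written for this example, and it covers only the rules in the Book schema. A real project would more likely use a schema validation library.

```python
import re

ALLOWED_PUBLISHERS = ("Springer", "MIT Press", "Harvard Press")
KNOWN_PROPERTIES = {"Title", "Authors", "Date", "Publisher"}


def check_book(document):
    """Return a list of violations; an empty list means the document passes."""
    if not isinstance(document, dict) or set(document) != {"Book"}:
        return ["top level must be an object with exactly one property, Book"]
    book = document["Book"]
    if not isinstance(book, dict):
        return ["Book must be an object"]

    errors = []
    for key in ("Title", "Authors", "Date"):       # "required"
        if key not in book:
            errors.append(f"missing required property {key}")
    extra = set(book) - KNOWN_PROPERTIES           # "additionalProperties": false
    if extra:
        errors.append(f"unexpected properties {sorted(extra)}")

    if "Title" in book and not isinstance(book["Title"], str):    # string type
        errors.append("Title must be a string")
    if "Authors" in book:                                         # list type
        authors = book["Authors"]
        if not (isinstance(authors, list) and 1 <= len(authors) <= 5
                and all(isinstance(a, str) for a in authors)):
            errors.append("Authors must be a list of 1 to 5 strings")
    if "Date" in book:                                            # year type
        date = book["Date"]
        if not (isinstance(date, str) and re.fullmatch(r"[0-9]{4}", date)):
            errors.append("Date must be a string of exactly four digits")
    if "Publisher" in book and book["Publisher"] not in ALLOWED_PUBLISHERS:
        errors.append("Publisher must be one of the enumerated values")
    return errors


good = {"Book": {"Title": "Parsing Techniques",
                 "Authors": ["Dick Grune", "Ceriel J.H. Jacobs"],
                 "Date": "2007", "Publisher": "Springer"}}
bad = {"Book": {"Title": "Parsing Techniques",
                "Authors": ["Dick Grune"],
                "Date": "07", "Publisher": "Unknown House"}}

good_errors = check_book(good)
bad_errors = check_book(bad)
print(good_errors)
print(bad_errors)
```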

75 75 Semi-Structured Data Models  XML Adoption – Web Services (SOAP) – XML DB  JSON Adoption – Web Services (REST) – MongoDB

76 76 Content  Introduction  Data Model – why not flat files? – relational data model – why relations? – semi-structured data models – history

77 77 History – Hierarchical (IMS): late 1960’s and 1970’s – Network (CODASYL): 1970’s – Relational: 1970’s and early 1980’s – Entity-Relationship: 1970’s – Object-oriented: late 1980’s and early 1990’s – Object-relational: late 1980’s and early 1990’s – Semi-structured (XML): late 1990’s to the present

