1 CS351 Introduction to Database systems

1 CS351 Introduction to Database systems quan.wang@wsu.edu

2 About me Quan Wang – developer at Oracle Corp. JDBC Web Services Cloud

3 Content  Introduction – Why databases?  Data Model

4 Why databases?  It used to be about boring stuff: employee records, bank records, etc.  Today, the field covers all the largest sources of data, with many new ideas.  Web search.  Data mining.  Scientific and medical databases.  Integrating information.

5 Why databases?  You may not notice it, but databases are behind almost everything you do on the Web.  Google searches.  Queries at Amazon, eBay, etc.

6 Why databases?  Databases often have unique concurrency-control problems.  Many activities (transactions) at the database at all times.  Must not confuse actions, e.g., two withdrawals from the same account must each debit the account.

7 Why databases?  Changing landscape  System R: Oracle, MySQL, IBM, MS  NoSQL: MangoDB  NewSQL: VoltDB  Cloud: Amazon RDS, Google Bigquery  Bigdata: Hadoop, MapR

8 Why databases?  Job perspectives – Startups – Massive industry

9 Content  Introduction – Why database? – What to cover?  Data Model

10 What to cover?  Database – a set of inter-related data pertaining to some organization – airline, university, store  Database Management System (DBMS) – one or more databases – algorithms accessing the databases

12 What to cover?  Database Theory – relational model and algebra – norm forms  Design of databases. – normalization – E/R, UML

13 What to cover?  Database programming – SQL, stored procedures.  Integration? – ODBC, JDBC – O-R Mapping

14 What not covered?  Not covered – DB tuning – DB administration(DBA) – DB implementation – Programming languages May need to learn languages and the DB related aspects

15 Content  Introduction – Why database? – What to cover? – Logistics  Data Model

16 Logistics Email: quan.wang@wsu.edu Office hour – Tuesday & Thursday – 1 pm to 2pm – 201L

17 Logistics No TA No detailed comments on assignments Distribute answer keys when appropriate Check discrepancies Raise issues

18 Logistics Course page: – Http://encs.vancouver.wsu.edu/~quw ang Http://encs.vancouver.wsu.edu/~quw ang – weekly schedule, assignments Submission page – http://lms.wsu.edu – Use drop box for assignments and project submission

19 Logistics Assignments 30% Team project 20% Midterms 25% Final 25%

20 Logistics Team project – mini-ebay – Max 3 people each team – Single grade for each team Assignments and project – Late submission 10% penalty

21 Logistics Databases – MySQL – SQLite /usr/bin/sqlite3

22 Logistics Exams – Lectures, textbook, and assigned readings – Only areas covered in the lectures – Not curved

23 Content  Introduction Why database? What to cover? Logistics  Data Model – Why not flat files?

24 Why not flat files? data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 operations – check balance – deposit or withdraw – change address

25 Why not flat files? data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – code coupled with data format changing delimiter would require code change

26 Why not flat files? data Savings.data: David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 Checkings.data: David, 789 NE Sany Blvd, 1500, 08/20/1998, 0.001! Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0 problems – redundancy leads to inconsistency

27 Why not flat files? data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – concurrency withdraw calls for file lock –withdraw $100 from the same account at the same time, synchronized – withdraw $100 from different accounts at the same time, synchronized unnecessarily

28 Why not flat files? data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – user interface file based algorithms

29 Why not flat files? summary – code coupled with data – redundancy leading to inconsistency – low concurrency – unfriendly user interface

30 Why not flat files? database as abstraction data model hides data format SQL hides algorithms runtime handles concurrency

31 Content  Introduction  Data Model – why not flat files? – relational data model relations schema keys

32 What is a Data Model? 1.Underlying structure of a database. 2.Mathematical representation of data.  Examples: relational model = tables; semistructured model = trees/graphs; key-value model=key value pairs. 3.Operations on data. 4.Constraints.

33 A Relation is a Table name manf WinterbrewPete’s Bud LiteAnheuser-Busch Beers Attributes (column headers) Tuples (rows) Relation name

34 Schemas  Relation schema = relation name and attribute list.  Optionally: types of attributes.  Example: Beers(name, manf) or Beers(name: string, manf: string)  Database = collection of relations.  Database schema = set of all relation schemas in the database.

35 Why Relations?  Very simple model.  Often matches how we think about data.  Abstract model that underlies SQL, the most important database language today.

36 Example Beers(name, manf) Bars(name, addr) Drinkers(name, addr, phone) Likes(drinker, beer) Sells(bar, beer, price) Frequents(drinker, bar)  Underline = key (tuples cannot have the same value in all key attributes).  Excellent example of a constraint.

37 Example Sells(bar,beer,price )Bars(name,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00

38 Database Schemas in SQL  SQL is primarily a query language, for getting information from a database.  But SQL also includes a data-definition component, DDL, for describing database schemas.

39 Creating (Declaring) a Relation  Simplest form is: CREATE TABLE ( );  To delete a relation: DROP TABLE ;

40 Elements of Table Declarations  Most basic element: an attribute and its type.  The most common types are:  INT or INTEGER (synonyms).  REAL or FLOAT (synonyms).  CHAR(n ) = fixed-length string of n characters.  VARCHAR(n ) = variable-length string of up to n characters.

41 Example: Create Table CREATE TABLE Sells ( barCHAR(20), beerVARCHAR(20), priceREAL );

42 SQL Values  Integers and reals are represented as you would expect.  Strings are too, except they require single quotes.  Two single quotes = real quote, e.g., ’Joe’’s Bar’.  Any value can be NULL.

43 Dates and Times  DATE and TIME are types in SQL.  The form of a date value is: DATE ’yyyy-mm-dd’  Example: DATE ’2007-09-30’ for Sept. 30, 2007.

44 Times as Values  The form of a time value is: TIME ’hh:mm:ss’ with an optional decimal point and fractions of a second following.  Example: TIME ’15:30:02.5’ = two and a half seconds after 3:30PM.

45 Declaring Keys  An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE.  Either says that no two tuples of the relation may agree in all the attribute(s) on the list.  There are a few distinctions to be mentioned later.

46 Declaring Single-Attribute Keys  Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute.  Example: CREATE TABLE Beers ( nameCHAR(20) UNIQUE, manfCHAR(20) );

47 Declaring Multiattribute Keys  A key declaration can also be another element in the list of elements of a CREATE TABLE statement.  This form is essential if the key consists of more than one attribute.  May be used even for one-attribute keys.

48 Example: Multiattribute Key  The bar and beer together are the key for Sells: CREATE TABLE Sells ( barCHAR(20), beerVARCHAR(20), priceREAL, PRIMARY KEY (bar, beer) );

49 PRIMARY KEY vs. UNIQUE 1.There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes. 2.No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL.

50 Content  Introduction  Data Model – why not flat files? – relational data model relations schema keys – why relations?

51 Why relations? data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

52 Terminalogy DDL: Data Definition Language CREATE TABLE DML: Data Manipulation Language INSERT SQL: Structured Query Language SELECT...FROM...WHERE

53 Why relations? schema Savings( name CHAR(50), addr CHAR(50), balance REAL, opened DATE, interest REAL);

54 Why relations? relation operations – check balance – deposit or withdraw – change address nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006

55 Why relations? relation advantage – code coupled with data format? nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006 nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006

56 Why relations? relation advantage – concurrency simultaneous withdraw calls nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006

57 Why relations? relations advantage – redundancy? Savings(name, addr, balance, opened, interest) Checkings(name, addr, balance, opened, interest)

58 Why relations? relations advantage – redundancy? nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006 nameaddrbalanceopenedinterest David123123 Sandy Blvd150019980.001 Lisa432 Alberta St350019900 Savings Checkings

59 Why relations? relations advantage – redundancy? Customers(cust_id, name, addr) Savings(cust_id, balance, opened, interest) Checkings(cust_id, balance, opened, interest)

60 Why relations? relations advantage – redundancy? cust_id nameaddr c1 David123 NW Flander St c2 Lisa432 Alberta St cust_idbalanceopenedinterest c1120019980.001 c2350019900 cust_idbalanceopenedinterest c1500.7520100.005 c24000.0020010.006 Customers Savings Checkings

61 Why relations? relation advantage – user interface natural language like support desirable nameaddrbalanceopenedinterest David123 NW Flander St500.7520100.005 Lisa432 Alberta St4000.0020010.006

62 Why relations? relation advantage – user interface SELECT name, addr FROM savings WHERE balance>2000 NAMEADDRBALANCEOPENEDINTEREST David123 NW Flander St 500.7520100.005 Lisa432 Alberta St 4000.0020010.006

63 Why relations? relation advantage – user interface SELECT name, addr, balance FROM savings ORDER BY balance NAMEADDRBALANCEOPENEDINTEREST David123 NW Flander St 500.7520100.005 Lisa432 Alberta St 4000.0020010.006

64 Content  Introduction  Data Model – why not flat files? – relational data model – why relations?

65 Content  Introduction  Data Model – why not flat files? – relational data model – why relations? – semi-structured data models XML and JSON

66 Parsing Techniques Dick Grune Ceriel J.H. Jacobs 2007 Springer XML

67 JSON { "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } }

68 XML and JSON, side-by-side Parsing Techniques Dick Grune Ceriel J.H. Jacobs 2007 Springer { "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } }

69 XML Schema

70 JSON Schema { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }

71 String type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }

72 List type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }

73 Year type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }

74 Enumeration type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false }

75 Semi-Structured Data Models  XML Adoption – Web Services (SOAP) – XML DB  JSON Adoption – Web Services (REST) – MongoDB

76 Content  Introduction  Data Model – why not flat files? – relational data model – why relations? – semi-structured data models – history

77 History Hierarchical (IMS): late 1960’s and 1970’s Network (CODASYL): 1970’s Relational: 1970’s and early 1980’s Entity-Relationship: 1970’s Object-oriented: late 1980’s and early 1990’s Object-relational: late 1980’s and early 1990’s Semi-structured (XML): late 1990’s to the present

1 CS351 Introduction to Database systems

Similar presentations

Presentation on theme: "1 CS351 Introduction to Database systems"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 CS351 Introduction to Database systems

Similar presentations

Presentation on theme: "1 CS351 Introduction to Database systems"— Presentation transcript:

Similar presentations

About project

Feedback