Presentation is loading. Please wait.

Presentation is loading. Please wait.

Structured/System Query Language -- SQL September 20, 2004.

Similar presentations


Presentation on theme: "Structured/System Query Language -- SQL September 20, 2004."— Presentation transcript:

1 Structured/System Query Language -- SQL September 20, 2004

2 Lecture Outline History Basics of SQL Data Definition Language Query Language Data Manipulation Language Miscellany

3 History Originally it is was called SEQUEL for Structured English QUEry Language Designed and implemented at IBM Research San Jose Research (now IBM Almaden Research Center) as the interface for an experimental relational database system called SYSTEM R. The language for IBM’s DB2

4

5 A number of variations of SQL have been implemented by most commercial DBMS vendors A joint effort by ANSI (American National Standards Institute) and ISO (International Standards Organization) led the first standard version of SQL (ANSI 1986) called SQL-1

6 Revised and expanded in SQL-92 (SQL- 2) and SQL-1999 (SQL-3) (newer?)

7 Basics Language developed specifically to query databases It is a comprehensive database language (data definition, query and update) Based on set and relational operations with certain modifications and enhancements

8 A typical SQL query has the form: (PROJECTION) select A 1, A 2,..., A n (CART. PROD.) from r 1, r 2,..., r m (SELECTION) where P A i s represent attributes r i s represent relations P is a predicate. The result of an SQL query is a relation

9 Data Definition Language DDL

10 Introduction Allows the specification of a set of relations along with information about each relation, including The schema for the relations. The domain of values associated with each attribute in every relation. Integrity constraints The set of indices to be maintained for each relation. Security and authorization information for each relation. The physical storage structure of each relation on disk.

11 SQL uses the terms schema, table, row and column for database, relation, tuple and attribute, respectively The main DDL commands are create alter drop

12 Schema and Catalog Concepts Prior to SQL-2, there was no concept of a relational database schema All tables (relations) were considered part of the same schema In SQL-2, the schema concept was incorporated in order to group together tables and other constructs (e.g. views) that belong to the same database application

13 Formally, an SQL schema is identified by a schema name and authorization as well as descriptions for each element in the schema Examples of elements include tables and views Two ways to create a schema using create schema Including all schema element definitions (rare)

14 Assign a name and an authorization identifier but the elements are defined later create schema Bank authorization xyz A collection of schemas is known as a catalogue

15 create table keyword An SQL relation is defined using the create table command: create table r (A 1 D 1, A 2 D 2,..., A n D n, (integrity-constraint 1 ),..., (integrity-constraint k )) r is the name of the relation each A i is an attribute name in the schema of relation r D i is the data type of values in the domain of attribute A i Example: create table branch (branch-namechar(15), branch-citychar(30), assetsinteger)

16 Data Types in SQL char(n). Fixed length character string, with user-specified length n. varchar(n). Variable length character strings, with user-specified maximum length n. int. Integer (a finite subset of the integers that is machine- dependent). smallint. Small integer (a machine-dependent subset of the integer domain type). numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point. real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision. float(n). Floating point number, with user-specified precision of at least n digits. Null values are allowed in all the domain types. Declaring an attribute to be not null prohibits null values for that attribute. create domain construct in SQL-92 creates user-defined domain types create domain person-name as char(20) not null

17 date. Dates, containing a (4 digit) year, month and date E.g. date ‘2001-7-27’ (on MySQL ‘YYYY-MM-DD’) time. Time of day, in hours, minutes and seconds. E.g. time ’09:00:30’ timestamp: date plus time of day E.g. timestamp ‘2001-7-27 09:00:30’ Interval: period of time E.g. Interval ‘1’ day Subtracting a date/time/timestamp value from another gives an interval value Interval values can be added to date/time/timestamp values Can extract values of individual fields from date/time/timestamp E.g. extract (year from r.starttime) Can cast string types to date/time/timestamp E.g. cast as date

18 Integrity Constraints in Create Table not null (after domain) Unique (after domain) primary key (A 1,..., A n ) foreign Key Create table Table_YYY( A1 char(3), A2 int, A3 int UNIQUE, A4float NOT NULL, PRIMARY KEY(A1,A2), FOREIGN KEY (A1) REFERENCES TABLE_XXX(A1) ON DELETE CASCADE ON UPDATE CASCADE);

19 check (P), where P is a predicate Example: Declare branch-name as the primary key for branch and ensure that the values of assets are non-negative. create table branch (branch-name char(15), branch-citychar(30), assets integer, primary key (branch-name), check (assets >= 0))

20 The drop table and alter table keywords The drop table command deletes all information about the dropped relation from the database. The alter table command is used to add attributes to an existing relation. alter table r add A D where A is the name of the attribute to be added to relation r and D is the domain of A. All tuples in the relation are assigned null as the value for the new attribute The alter table command can also be used to drop attributes of a relation alter table r drop A where A is the name of an attribute of relation r

21 Query Language SQL

22 Schema Used in Examples

23 The select clause The select clause list the attributes desired in the result of a query corresponds to the projection operator E.g. find the names of all branches in the loan relation select branch-name from loan SQL names are case insensitive, i.e. you can use capital or small letters.

24 SQL allows duplicates in query results hence in general an SQL table is not a set of tuples but rather a multiset (a.k.a bag) of tuples To force the elimination of duplicates, insert the keyword distinct after select. Find the names of all branches in the loan relations, and remove duplicates select distinct branch-name from loan An asterisk in the select clause denotes “all attributes” select * from loan

25 The select clause can contain arithmetic expressions involving the operation, +, –, , and /, and operating on constants or attributes of tuples. select loan-number, branch-name, amount  100 from loan would return a relation which is the same as the loan relations, except that the attribute amount is multiplied by a 100.

26 The from clause The from clause lists the relations involved in the query corresponds to the Cartesian product operator Find the Cartesian product borrower x loan select  from borrower, loan select * from borrower, loan where borrower.loan-number = loan.loan- number

27 The where clause The where clause specifies conditions that the result must satisfy corresponds to the selection operator To find all loan numbers for loans made at the Fargo branch with loan amounts greater than $1200. select loan-number from loan where branch-name = “Fargo” and amount > 1200 Comparison results can be combined using the logical connectives and, or, and not.

28 SQL includes a between comparison operator E.g. Find the loan number of those loans with loan amounts between $90,000 and $100,000 (that is,  $90,000 and  $100,000) (i.e. inclusive) select loan-number from loan where amount between 90000 and 100000

29 SQL also includes an in operator used for set inclusion testing E.g. Find the loan number of those loans with loan amounts equal to $90000, $100000, $ 110000 or $120000 select loan-number from loan where amount in (90000, 100000, 110000, 120000)

30 Ambiguous Names and Aliasing SQL allows renaming relations and attributes using the as clause: old-name as new-name Find the name, loan number and loan amount of all customers; rename the column name loan-number as loan-id. select customer-name, borrower.loan-number as loan-id, amount from borrower, loan where borrower.loan-number = loan.loan-number

31 Find the customer names, their loan numbers and loan amounts for all customers having a loan at some branch. select customer-name, T.loan-number, S.amount from borrower as T, loan as S where T.loan-number = S.loan-number

32 n Find the names of all branches that have greater assets than some branch located in Fargo. select distinct T.branch-name from branch as T, branch as S where S.branch-city =‘”Fargo” and T.assets > S.assets

33 String Operations SQL includes a string-matching operator for comparisons on character strings. (LIKE) Patterns are described using two special characters: percentage (%). The % character matches any substring. underscore (_). The _ character matches any character. Find the names of all customers whose street includes the substring “Main”. select customer-name from customer where customer-street like ‘ %Main% ’

34 Find the names of all customers whose street includes the substring “Main” with length 5 (don’t use %) select customer-name from customer where customer-street like ‘ Main_ ’ or like ‘_ Main ’ SQL supports a variety of string operations such as concatenation (using “||”) converting from upper to lower case (and vice versa) finding string length, extracting substrings, etc.

35 Soundex operation Database developers have long struggled with the problem of matching words that might not look alike, but actually sound alike…Useful for finding strings for which the sound is known but the precise spelling is not. Soundex (argument) Returns a 4 character code representing the sound of the words in the argument. This result can be used to compare with the sound of other strings. The argument can be a character string that is either a CHAR or VARCHAR not exceeding 4,000 bytes.

36 The result of the function is CHAR(4). The result is null if the argument is null How does it work: A,E,I,O,U,Y,W,H are ignored along with double letters Ignore everything after the 4 th character The first letter of the soundex code corresponds to the first letter of the argument Every subsequent character in the name is looked up according to the scheme presented on the next slide and encoded as a digit between 1 and 6

37 if there are insufficient characters remaining in the name, the coding is padded with zeros. Two adjacent identically coded letters Nonalphabetic characters terminate the soundex evaluation 1 = B,P,F,V 2 = C,S,G,J,K,Q,X,Z 3 = D,T 4 = L 5 = M,N 6 = R All other letters (A,E,I,O,U,Y,W,H) ignored Select Soundex ('Smith'), Soundex ('Smythe')  S530 S530

38 There is also a related function called DIFFERENCE function Used to compare strings based upon their Soundex values. Difference(Argument1,Argument2) Usually returns an integer result ranging in value from 0 (least similar) to 4 (most similar).

39 Let's suppose you overheard a male employee talking in the hallway but didn't quite catch his name. You might overhear a name that sounded like "Ann" but you're positive it was a male. The DIFFERENCE function is ideal for this type of situation. We'd probably begin by specifying a threshold value of 4 to limit the number of results returned.

40 Select firstname, difference(firstname,'ann') as 'difference' from employees where difference(firstname,'ann')=4 FirstName Difference ---------- ----------- Ann 4 Anne 4 Unfortunately, none of the results were male names. So we try to lower the threshold of 2

41 Select firstname, difference(firstname,'ann') as 'difference' from employees where difference(firstname,'ann')>=2 FirstName Difference ---------- ----------- Ann 4 Andrew 2 Janet 2 Laura 2 Anne 4 looks like Andrew might be our man!

42 The order by clause List in alphabetic order the names of all customers having a loan in Fargo branch select distinct customer-name from borrower, loan where borrower loan-number - loan.loan-number and branch-name = “ Fargo” order by customer-name We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default. E.g. order by customer-name desc

43 Set Operations The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations  and  Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding versions union all, intersect all and except all Suppose a tuple occurs m times in r and n times in s, then, it occurs: m + n times in r union all s min(m,n) times in r intersect all s max(0, m – n) times in r except all s

44 Find all customers who have a loan, an account, or both: (select customer-name from depositor) union (select customer-name from borrower) Find all customers who have both a loan and an account. (select customer-name from depositor) intersect (select customer-name from borrower)

45 Find all customers who have an account but no loan (select customer-name from depositor) except (select customer-name from borrower)

46 Divide operator No SQL implementation For instance, if you need to find all students who are taking the three courses {7,6,5} the answer would be E/C

47 Aggregate Functions These functions avg: average value min: minimum value max: maximum value sum: sum of values count: number of values Produce numbers

48 Find the average account balance at the Fargo branch. n Find the distinct number of depositors in the bank. n Find the number of tuples in the customer relation. select avg (balance) from account where branch-name = “Fargo” select count (*) from customer select count (distinct customer-name) from depositor

49 The group by clause Find the number of depositors for each branch. Note: Attributes in select clause outside of aggregate functions must appear in group by list select branch-name, count (customer-name) from depositor, account where depositor.account-number = account.account-number group by branch-name

50 The having clause Find the names of all branches where the average account balance is more than $1,200. Note: predicates in the having clause are applied after the formation of the aggregation groups whereas predicates in the where clause are applied before forming groups select branch-name, avg (balance) from account group by branch-name having avg (balance) > 1200

51 Null Values It is possible for tuples to have a null value, denoted by null, for some of their attributes null signifies an unknown value or that a value does not exist. The predicate is null can be used to check for null values. E.g. Find all loan numbers which appear in the loan relation with null values for amount. select loan-number from loan where amount is null

52 The result of any arithmetic expression involving null is null E.g. 5 + null returns null However, aggregate functions simply ignore nulls (except for count) Any comparison with null returns unknown E.g. 5 null or null = null

53 Three-valued logic using the truth value unknown: OR: (unknown or true) = true (unknown or false) = unknown (unknown or unknown) = unknown AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown NOT: (not unknown) = unknown Result of where clause predicate is treated as false if it evaluates to unknown

54 Total all loan amounts select sum (amount) from loan Above statement ignores null amounts result is null if there is no non-null amount All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.

55 Nested Sub-queries Some queries require that existing values in the DB be fetched and then used in a comparison condition SQL provides a mechanism for the nesting of sub-queries. A sub-query is a select-from-where expression that is nested within another query (called outer query) A common use of sub-queries is to perform tests for set membership, set comparisons, and set cardinality.

56 Find all customers who have both an account and a loan at the bank. n Find all customers who have a loan at the bank 0 but do not have an account at the bank select customer-name from borrower where customer-name not in (select customer-name from depositor) select customer-name from borrower where customer-name in (select customer-name from depositor)

57 Find all customers who have both an account and a loan at the Fargo branch Another way? select distinct customer-name from borrower, loan where borrower.loan-number = loan.loan-number and branch-name = “Fargo” and (branch-name, customer-name) in (select branch-name, customer-name from depositor, account where depositor.account-number = account.account-number) And customer-name in (select customer-name from depositor, account where depositor.account-number = account.account-number and branch-name = “Fargo”)

58 The some keyword - Existential F some r  t  r  s.t. (F t) Where can be:  0 5 6 (5< some) = true 0 5 0 ) = false 5 0 5 (5  some) = true (since 0  5) (read: 5 < some tuple in the relation) (5< some ) = true (5 = some (= some)  in

59 The all keyword - Universal F all r  t  r  (F t) 0 5 6 (5< all) = false 6 10 4 ) = true 5 4 6 (5  all) = true (since 5  4 and 5  6) (5< all ) = false (5 = all

60 Find all branches that have greater assets than some branch located in Brooklyn. n Same query using > some clause select branch-name from branch where assets > some (select assets from branch where branch-city = ‘ Brooklyn ’ ) select distinct T.branch-name from branch as T, branch as S where S.branch-city = ‘Brooklyn’ and T.assets > S.assets

61 Find the names of all branches that have greater assets than all branches located in Brooklyn. select branch-name from branch where assets > all (select assets from branch where branch-city = “Brooklyn”)

62 The exists keyword Used to test for empty relations returns the value true if the argument sub-query is nonempty. exists r  r  Ø Outer Query exists (inner query) not exists r  r = Ø Outer Query not exists (inner query)

63 Find the names of all branches that have greater assets than some branch located in Brooklyn. select branch-name from branch b1 where EXISTS (select * from branch b2 where b2.branch-city = “Brooklyn” and b1.assets>b2.assets)

64 Find the names of all branches that have greater assets than all branches located in Brooklyn. select branch-name from branch b1 where NOT EXISTS (select * from branch b2 where branch-city = “Brooklyn” and b1.assets<b2.assets) Like saying give me branches such that no branch in Brooklyn has greater assets

65 The unique keyword Used to test for absence of duplicate tuples The unique keyword tests whether a sub-query has any duplicate tuples in its result. Find all customers who have at most one account at the Fargo branch. select T.customer-name from depositor as T where unique ( select R.customer-name from account, depositor as R where account.account.branch-name = ‘”Fargo” and R.account-number = account. account-number and T.customer-name = R.customer-name)

66 Views Provide a mechanism to hide certain data from the view of certain users. To create a view we use the command: create view v as where: is any legal expression The view name is represented by v

67 Example Queries A view consisting of branches and their customers create view all-customer as (select branch-name, customer-name from depositor, account where depositor.account-number = account.account-number) union (select branch-name, customer-name from borrower, loan where borrower.loan-number = loan.loan-number)

68 We can query views just like we query tables Find all customers of the Fargo branch select customer-name from all-customer where branch-name = ‘Fargo’

69 The with keyword The with clause allows views to be defined locally to a query, rather than globally. Find all accounts with the maximum balance with max-balance (value) as select max (balance) from account select account-number from account, max-balance where account.balance = max-balance.value

70 Revision of SQL queries Can consist of up to 6 clauses where the first 2 are mandatory The order is important SELECT FROM [WHERE ] [GROUP BY ] [HAVING ] [ORDER By ]

71 To evaluate the query, we apply 1 Evaluate FROM : produces Cartesian product, A, of tables in FROM list 2 Evaluate WHERE : produces table, B, consisting of rows of A that satisfy WHERE condition 3 Evaluate GROUP BY : partitions B into groups that agree on attribute values in GROUP BY list

72 4 Evaluate HAVING : eliminates groups in B that do not satisfy HAVING condition 5 Evaluate SELECT : produces table C containing a row for each group. Attributes in SELECT list limited to those in GROUP BY list and aggregates over group 6 Evaluate ORDER BY : orders rows of C

73 Restrictions SELECT Attr list, aggregates FROM relation list WHERE where cond GROUP BY group list HAVING group cond ORDER BY ordered attr list Attr list must be subset of group list Ordered attr list must be a subset of attr list

74 Joined Relations Join operations take two relations and return as a result another relation. The relations are typically specified in the from clause Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join. Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.

75 Join Conditions natural no condition, an implicit equi-join is used for each pair of attributes in R and S having the same name using (A1, A2,..., An) like natural but restricted to the specified attributes only on

76 Join Types inner join (only those records from tables on both sides of the join that match the join criteria… Inner joins are the most common type of join ) left outer join (Retrieve all records from the table on the left side of the join and only those records that match the join criteria from the table on the right side of the join) right outer join (Retrieve only those records from the table on the left side of the join condition that match the join criteria but all records from the right side of the join condition) full outer join (Retrieve all records from tables on both sides of the join condition regardless of whether records match the join criteria)

77 Relation loan nRelation borrower customer-nameloan-number Jones Smith Hayes L-170 L-230 L-155 amount 3000 4000 1700 branch-name Downtown Redwood Perryridge loan-number L-170 L-230 L-260 nNote: borrower information missing for L-260 and loan information missing for L-155

78 loan inner join borrower on loan.loan-number = borrower.loan-number loan left outer join borrower on loan.loan-number = borrower.loan-number branch-nameamount Downtown Redwood 3000 4000 customer-nameloan-number Jones Smith L-170 L-230 loan-number L-170 L-230 branch-nameamount Downtown Redwood Perryridge 3000 4000 1700 customer-nameloan-number Jones Smith null L-170 L-230 null loan-number L-170 L-230 L-260

79 loan natural inner join borrower nloan natural right outer join borrower branch-nameamount Downtown Redwood 3000 4000 customer-name Jones Smith loan-number L-170 L-230 branch-nameamount Downtown Redwood null 3000 4000 null customer-name Jones Smith Hayes loan-number L-170 L-230 L-155

80 loan full outer join borrower using (loan-number) nFind all customers who have either an account or a loan (but not both) at the bank. (customer_number, account_number, loan_number) branch-nameamount Downtown Redwood Perryridge null 3000 4000 1700 null customer-name Jones Smith null Hayes loan-number L-170 L-230 L-260 L-155 select customer-name from (depositor natural full outer join borrower) where account-number is null or loan-number is null

81 Data Manipulation Language DML

82 The insert clause Add a new tuple to account insert into account values (‘A-9732’, ‘Fargo’,1200) Add a new tuple to account with balance set to null insert into account values (‘A-777’,‘Fargo’, null) or equivalently insert into account (branch-name, account- number) values (‘Fargo’, ‘A-9732’)

83 The select into statement The SELECT INTO statement is most often used to create backup copies of tables or for archiving records select column_name(s) into newtable [in external database] from source The following example makes a backup copy of the “Account" table: select * into Account_backup from Account The IN clause can be used to copy tables into another database: select Account.* into Account in 'backup.mdb' from Account

84 If you only want to copy a few fields, you can do so by listing them after the SELECT statement: select customer-name into borrower_backup from Borrower You can also add a where clause. The following example creates a " Borrower_backup" table with one column (customer-name) by extracting those with acct # = “1120” from the “Accounts" table: select customer-name into borrower_backup from borrower where loan-number=‘1120'

85 Selecting data from more than one table is also possible. The following example creates a new table “Borrower_Rec_backup" that contains data from the two tables Borrower and Loan: select customer-name,amount into borrower_rec_backup from (borrower inner join loan on borrower.loan-number=loan.loan-number)

86 In MySQL MySQL Server doesn't support the Sybase SQL extension: SELECT... INTO TABLE.... Instead, MySQL Server supports the standard SQL syntax INSERT INTO... SELECT Insert Into table_X (A1, A2,…,An) SELECT table_Y.A1, table_Y.A2,…, table_Y.An FROM table_Y WHERE table_Y.Ai > 100 The select from where statement is fully evaluated before any of its results are inserted into the relation

87 Provide as a gift for all loan customers of the Fargo branch, a $200 savings account. Let the loan number serve as the account number for the new savings account insert into account select loan-number, branch-name, 200 from loan where branch-name = ‘Fargo’ insert into depositor select customer-name, loan-number from loan, borrower where branch-name = ‘Fargo’ and loan.account-number = borrower.account-number

88 LOAD DATA INFILE.. For Bulk Insert The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. Load data infile 'file_name.txt' [replace | ignore] into table tbl_name [fields [terminated by '\t']] [lines [terminated by '\n']] … a lot of other options

89 data.txt 1,William,Hock;2,Sami,Hajjar;4,George,Nixon; Load data infile ‘data.txt' ignore into table Client fields terminated by ‘,' lines terminated by ‘;'

90 The update clause Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%. Write two update statements: update account set balance = balance  1.06 where balance > 10000 update account set balance = balance  1.05 where balance  10000 Can be done better using the case statement (next)

91 Same query as before: Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%. update account set balance = case when balance <= 10000 then balance*1.05 else balance * 1.06 end

92 Insert into Views Create a view of all loan data in loan relation, hiding the amount attribute create view branch-loan as select branch-name, loan-number from loan Add a new tuple to branch-loan insert into branch-loan values (‘Perryridge’, ‘L-307’) This insertion must be represented by the insertion of the tuple (‘L-307’, ‘Perryridge’, null) into the loan relation Most SQL implementations allow updates/inserts only on simple views (without aggregates) defined on a single relation Updates on more complex views are difficult or impossible to translate, and hence are disallowed.

93 The delete clause Delete all account records at the Fargo branch delete from account where branch-name = ‘ Fargo ’ Delete all accounts at every branch located in Moorhead city. delete from account where branch-name in (select branch-name from branch where branch-city = ‘ Moorhead ’ )

94 Miscellany

95 Transactions A transaction is a sequence of queries and I/U/D statements executed as a single unit Transactions are terminated by one of commit work: makes all updates of the transaction permanent in the database rollback work: undoes all updates performed by the transaction. Motivating example Transfer of money from one account to another involves two steps: deduct from one account and credit to another If one steps succeeds and the other fails, database is in an inconsistent state Therefore, either both steps should succeed or neither should If any step of a transaction fails, all work done by the transaction can be undone by rollback work. Rollback of incomplete transactions is done automatically, in case of system failures

96 In most database systems, each SQL statement that executes successfully is automatically committed. Each transaction would then consists of a single statement Further discussion is outside the scope of this presentation

97 Embedded SQL The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal, PL/I, Fortran, C, and Cobol. A language to which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.

98 EXEC SQL statement is used to identify embedded SQL request to the preprocessor EXEC SQL END-EXEC

99 References http://www.w3schools.com/sql/default.asp Has tutorials and online tests Soundex paper: A. Richard Miller. 1990. What’s in a name? An MMSFORTH implementation of the Russell-SOUNDEX method. In 1990 Rochester FORTH Conference:Embedded Systems, Thomas Hess, ed., p. 101–3.


Download ppt "Structured/System Query Language -- SQL September 20, 2004."

Similar presentations


Ads by Google