1
CS 292 Special Topics on Big Data
Yuan Xue (yuan.xue@vanderbilt.edu)
2
Part I: Relational Database (SQL)
Yuan Xue (yuan.xue@vanderbilt.edu)
3
Creating and Using a Relational Database
Steps in creating and using a (relational) database:
1. Design schema (using DDL, the data definition language)
2. Initialization: "bulk load" initial data
3. Operation: execute queries and modifications (using DML, the data manipulation language)
The database holds both the data and the meta-data (the database definition).
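The three steps can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for the RDBMS, the database lives in memory, and the column types are simplified to TEXT. The table and rows come from the Follow example used later in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database

# 1. Design schema (DDL). This definition is the database's meta-data.
conn.execute("""
    CREATE TABLE Follow (
        Followee  TEXT NOT NULL,
        Follower  TEXT NOT NULL,
        Timestamp TEXT NOT NULL,
        PRIMARY KEY (Followee, Follower)
    )
""")

# 2. Initialization: bulk load the initial data in a single call.
initial_rows = [
    ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
    ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
    ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
]
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", initial_rows)

# 3. Operation (DML): one modification, then one query.
conn.execute("INSERT INTO Follow VALUES (?, ?, ?)",
             ("Cathy123", "Alice00", "2012.11.1.2.6.6"))
row_count = conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]

# The meta-data is itself queryable; SQLite keeps it in sqlite_master.
table_names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(row_count, table_names)
```

Other database systems expose their meta-data through a different catalog, so the last query is specific to SQLite.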
4
SQL Introduction
Programming language for data management in a relational database management system (RDBMS)
Both Data Definition Language (DDL) and Data Manipulation Language (DML)
  DDL: create, drop table
  DML: query (select), insert, update and delete data from table
Standardized and supported by all major commercial database systems
  One of the major reasons for commercial success of RDBMS
Interactive via GUI or command line, or embedded in programs
5
Data Definition in SQL
6
CREATE in SQL

CREATE SCHEMA MiniTwitter;

CREATE TABLE MiniTwitter.User
( ID    VARCHAR(20)  NOT NULL,
  Name  VARCHAR(20)  NOT NULL,
  …
  PRIMARY KEY (ID),
  FOREIGN KEY (ID) REFERENCES Follow(Followee),
  FOREIGN KEY (ID) REFERENCES Follow(Follower) );

User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@

Follow
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
7
CREATE in SQL
Data types in SQL
  Numeric: INT, FLOAT, DEC
  Character, or String: CHAR, VARCHAR
  Bit-string: BIT, BLOB (binary large object)
  Boolean
  Date, Time: DATE, TIME, TIMESTAMP

CREATE TABLE MiniTwitter.Tweet
( ID         VARCHAR(20)  NOT NULL,
  Timestamp  TIMESTAMP    NOT NULL,
  …
  PRIMARY KEY (ID) );

Tweet
ID    Timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..
8
CREATE in SQL

CREATE TABLE MiniTwitter.Follow
( Followee   VARCHAR(20)  NOT NULL,
  Follower   VARCHAR(20)  NOT NULL,
  Timestamp  TIMESTAMP    NOT NULL,
  PRIMARY KEY (Followee, Follower) );

Follow
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
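A runnable sketch of the User and Follow definitions, using Python's sqlite3 module. Several things here are assumptions rather than part of the slides: the MiniTwitter schema prefix is dropped because SQLite has no CREATE SCHEMA, the Email and Password column types are guesses, and the foreign keys are declared on Follow and point at User(ID), since a foreign key has to reference a key of the target table. The user "Nobody" and its timestamp are made up to trigger the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
CREATE TABLE User (
    ID       VARCHAR(20) NOT NULL,
    Name     VARCHAR(20) NOT NULL,
    Email    VARCHAR(40),            -- assumed type
    Password VARCHAR(20),            -- assumed type
    PRIMARY KEY (ID)
);
CREATE TABLE Follow (
    Followee  VARCHAR(20) NOT NULL,
    Follower  VARCHAR(20) NOT NULL,
    Timestamp TIMESTAMP   NOT NULL,
    PRIMARY KEY (Followee, Follower),
    FOREIGN KEY (Followee) REFERENCES User(ID),  -- assumed placement
    FOREIGN KEY (Follower) REFERENCES User(ID)   -- assumed placement
);
""")

conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice', 'alice00@gmail.com', 'Aadf1234')")
conn.execute("INSERT INTO User VALUES ('Bob2013', 'Bob', 'bob13@gmail.com', 'qwer6789')")
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2011.1.1.3.6.6')")

# A follower that is not in User violates the foreign key and is rejected.
try:
    conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Nobody', '2014.1.1.0.0.0')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

follow_rows = conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]
print(rejected, follow_rows)
```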
9
Data Manipulation in SQL
10
SELECT in SQL
SELECT-FROM-WHERE structure of basic SQL queries:

SELECT A1, A2, …, An      (attributes to return)
FROM   R1, R2, …, Rm      (relations, i.e. tables)
WHERE  Condition;         (conditional expression)
11
Example
Retrieve the timestamp and content of all tweets whose author is "Alice00"

User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@

Tweet
ID    Timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..

Follow
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
12
Example (on the User, Tweet and Follow tables)

SELECT Timestamp, Content
FROM   Tweet
WHERE  Author = 'Alice00';

Select condition: Author = 'Alice00'
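The same query can be run from a program. A minimal sketch with Python's sqlite3 module follows; the in-memory database, the TEXT column types, the helper name tweets_by and the added ORDER BY ID (for a stable row order) are assumptions of this sketch, while the rows are the Tweet table from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT NOT NULL, "
             "Author TEXT NOT NULL, Content TEXT)")
conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", [
    ("0001", "2013.12.20.11.20.2", "Alice00", "Hello"),
    ("0002", "2013.12.20.11.23.6", "Bob2013", "Nice weather"),
    ("0003", "2014.1.6.1.25.2", "Alice00", "@Bob Not sure.."),
])

def tweets_by(author):
    """Return (Timestamp, Content) pairs for one author, ordered by tweet ID."""
    query = "SELECT Timestamp, Content FROM Tweet WHERE Author = ? ORDER BY ID"
    return conn.execute(query, (author,)).fetchall()

alice_tweets = tweets_by("Alice00")
# The = comparison on text is case-sensitive in SQLite by default,
# so the lowercase spelling matches nothing.
lowercase_tweets = tweets_by("alice00")
print(alice_tweets, lowercase_tweets)
```

Passing the author through the ? placeholder, rather than pasting it into the SQL string, keeps the value from being interpreted as SQL.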
13
Example (on the User, Tweet and Follow tables)
Retrieve the content of all tweets whose author is followed by "Alice00"
14
Example (on the User, Tweet and Follow tables)

SELECT Content
FROM   Tweet, Follow
WHERE  Follower = 'Alice00' AND Author = Followee;

Select condition: Follower = 'Alice00'
Join condition: Author = Followee (the tweet's author must be an account that Alice00 follows)
This is a select-project-join query.
15
Example (on the User, Tweet and Follow tables)
Retrieve the timestamp and content of all tweets whose author is followed by "Alice00"
Qualify the attribute name to prevent ambiguity: both Tweet and Follow have a Timestamp attribute.
16
Example (on the User, Tweet and Follow tables)

SELECT Tweet.Timestamp, Content
FROM   Tweet, Follow
WHERE  Follower = 'Alice00' AND Author = Followee;

Select condition: Follower = 'Alice00'
Join condition: Author = Followee
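A sketch of this select-project-join query on the sample rows, again with Python's sqlite3 module and simplified TEXT columns (both assumptions). The join matches Tweet.Author against Follow.Followee, so the authors returned are the accounts that Alice00 follows. The second attempt drops the table qualifier to show why it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT, Author TEXT, Content TEXT);
CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                     PRIMARY KEY (Followee, Follower));
INSERT INTO Tweet VALUES
    ('0001', '2013.12.20.11.20.2', 'Alice00', 'Hello'),
    ('0002', '2013.12.20.11.23.6', 'Bob2013', 'Nice weather'),
    ('0003', '2014.1.6.1.25.2',    'Alice00', '@Bob Not sure..');
INSERT INTO Follow VALUES
    ('Alice00',  'Bob2013',  '2011.1.1.3.6.6'),
    ('Bob2013',  'Cathy123', '2012.10.2.6.7.7'),
    ('Alice00',  'Cathy123', '2012.11.1.2.3.3'),
    ('Cathy123', 'Alice00',  '2012.11.1.2.6.6'),
    ('Bob2013',  'Alice00',  '2012.11.1.2.6.7');
""")

qualified = """
    SELECT Tweet.Timestamp, Content
    FROM Tweet, Follow
    WHERE Follower = 'Alice00' AND Author = Followee
"""
feed = conn.execute(qualified).fetchall()

# Without the qualifier, Timestamp could come from either table,
# and SQLite refuses to guess.
try:
    conn.execute(qualified.replace("Tweet.Timestamp", "Timestamp"))
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

print(feed, ambiguous)
```

Alice00 follows Cathy123 and Bob2013, and only Bob2013 has tweeted, so a single row comes back.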
17
Aggregate Functions in SQL
Aggregate function: summarize information from multiple tuples
Basic aggregate operations in SQL: COUNT, SUM, MAX, MIN, AVG

Retrieve the number of people that "Alice00" is following
Retrieve the number of people who are following "Alice00"
18
Aggregate Functions in SQL
Aggregate function: summarize information from multiple tuples
Basic aggregate operations in SQL: COUNT, SUM, MAX, MIN, AVG

Retrieve the number of people that "Alice00" is following:

SELECT COUNT(*)
FROM   Follow
WHERE  Follower = 'Alice00';

Retrieve the number of people who are following "Alice00":

SELECT COUNT(*)
FROM   Follow
WHERE  Followee = 'Alice00';
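The two counts differ only in which column is filtered. A small sketch on the sample Follow rows, using Python's sqlite3 module; the helper names following_count and follower_count and the TEXT column types are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT, "
             "PRIMARY KEY (Followee, Follower))")
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
    ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
    ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
    ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
    ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
])

def following_count(user):
    """Number of accounts that `user` follows (user appears as Follower)."""
    query = "SELECT COUNT(*) FROM Follow WHERE Follower = ?"
    return conn.execute(query, (user,)).fetchone()[0]

def follower_count(user):
    """Number of accounts that follow `user` (user appears as Followee)."""
    query = "SELECT COUNT(*) FROM Follow WHERE Followee = ?"
    return conn.execute(query, (user,)).fetchone()[0]

alice = (following_count("Alice00"), follower_count("Alice00"))
bob = (following_count("Bob2013"), follower_count("Bob2013"))
print(alice, bob)
```

For Alice00 the two numbers happen to coincide on this data; Bob2013 shows the difference, following one account while being followed by two.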
19
Nested SQL query (on the User, Tweet and Follow tables)
Show the names of users with more than 10 followers
20
Nested SQL query (on the User, Tweet and Follow tables)

SELECT Name
FROM   User
WHERE  (SELECT COUNT(*)
        FROM   Follow
        WHERE  Followee = ID) > 10;
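A sketch of the nested query with Python's sqlite3 module. The inner SELECT is re-evaluated for each User row (a correlated subquery). Since nobody in the sample data has more than 10 followers, the threshold is made a parameter; that parameter, the helper name, the reduced User table (ID and Name only) and the ORDER BY Name are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                     PRIMARY KEY (Followee, Follower));
INSERT INTO User VALUES ('Alice00', 'Alice'), ('Bob2013', 'Bob'), ('Cathy123', 'Cathy');
INSERT INTO Follow VALUES
    ('Alice00',  'Bob2013',  '2011.1.1.3.6.6'),
    ('Bob2013',  'Cathy123', '2012.10.2.6.7.7'),
    ('Alice00',  'Cathy123', '2012.11.1.2.3.3'),
    ('Cathy123', 'Alice00',  '2012.11.1.2.6.6'),
    ('Bob2013',  'Alice00',  '2012.11.1.2.6.7');
""")

def users_with_more_followers_than(threshold):
    """Names of users whose follower count exceeds `threshold`."""
    query = """
        SELECT Name
        FROM User
        WHERE (SELECT COUNT(*) FROM Follow WHERE Followee = User.ID) > ?
        ORDER BY Name
    """
    return [name for (name,) in conn.execute(query, (threshold,))]

more_than_one = users_with_more_followers_than(1)
more_than_ten = users_with_more_followers_than(10)
print(more_than_one, more_than_ten)
```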
21
GROUP in SQL (on the User, Tweet and Follow tables)
Show the names of all users with the number of tweets from them
22
GROUP in SQL (on the User, Tweet and Follow tables)

SELECT   User.Name, COUNT(Tweet.ID)
FROM     User, Tweet
WHERE    User.ID = Tweet.Author
GROUP BY User.ID, User.Name;

Every non-aggregated attribute in the SELECT list (here User.Name) should also appear in GROUP BY. Because this is an inner join, users who have not tweeted do not appear in the result.
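A sketch of the grouped count on the sample rows, with Python's sqlite3 module. It also runs a LEFT OUTER JOIN variant, which is not in the slides and is included only to show how users without tweets could be listed with a count of 0. The reduced User table and the ORDER BY clauses are further assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT, Author TEXT, Content TEXT);
INSERT INTO User VALUES ('Alice00', 'Alice'), ('Bob2013', 'Bob'), ('Cathy123', 'Cathy');
INSERT INTO Tweet VALUES
    ('0001', '2013.12.20.11.20.2', 'Alice00', 'Hello'),
    ('0002', '2013.12.20.11.23.6', 'Bob2013', 'Nice weather'),
    ('0003', '2014.1.6.1.25.2',    'Alice00', '@Bob Not sure..');
""")

# Inner join: one group per user who has at least one tweet.
tweet_counts = conn.execute("""
    SELECT User.Name, COUNT(Tweet.ID)
    FROM User, Tweet
    WHERE User.ID = Tweet.Author
    GROUP BY User.ID, User.Name
    ORDER BY User.Name
""").fetchall()

# Outer join variant (assumption): keeps users with no tweets.
# COUNT(Tweet.ID) skips the NULLs the outer join produces, giving 0.
all_user_counts = conn.execute("""
    SELECT User.Name, COUNT(Tweet.ID)
    FROM User LEFT OUTER JOIN Tweet ON User.ID = Tweet.Author
    GROUP BY User.ID, User.Name
    ORDER BY User.Name
""").fetchall()

print(tweet_counts, all_user_counts)
```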
23
ORDER BY in SQL (on the User, Tweet and Follow tables)
Show the names of users who follow "Alice00", ordered by the time the "following" relationship was created
24
ORDER BY in SQL (on the User, Tweet and Follow tables)

SELECT   User.Name
FROM     User, Follow
WHERE    User.ID = Follow.Follower AND Follow.Followee = 'Alice00'
ORDER BY Follow.Timestamp;
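A sketch of the ordered query with Python's sqlite3 module. One caveat that is specific to this sketch: the timestamps are stored as text in the slides' dotted notation, and text ordering agrees with chronological ordering here only because the two values compared differ in the year. A real TIMESTAMP column would not have that limitation. The helper name and the optional descending flag are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                     PRIMARY KEY (Followee, Follower));
INSERT INTO User VALUES ('Alice00', 'Alice'), ('Bob2013', 'Bob'), ('Cathy123', 'Cathy');
INSERT INTO Follow VALUES
    ('Alice00',  'Bob2013',  '2011.1.1.3.6.6'),
    ('Bob2013',  'Cathy123', '2012.10.2.6.7.7'),
    ('Alice00',  'Cathy123', '2012.11.1.2.3.3'),
    ('Cathy123', 'Alice00',  '2012.11.1.2.6.6'),
    ('Bob2013',  'Alice00',  '2012.11.1.2.6.7');
""")

def followers_by_time(user, newest_first=False):
    """Names of the followers of `user`, ordered by when they started following."""
    direction = "DESC" if newest_first else "ASC"  # fixed keyword, not user input
    query = f"""
        SELECT User.Name
        FROM User, Follow
        WHERE User.ID = Follow.Follower AND Follow.Followee = ?
        ORDER BY Follow.Timestamp {direction}
    """
    return [name for (name,) in conn.execute(query, (user,))]

oldest_first = followers_by_time("Alice00")
newest_first = followers_by_time("Alice00", newest_first=True)
print(oldest_first, newest_first)
```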
25
INSERT in SQL

INSERT INTO Follow
VALUES ('Cathy123', 'Bob2013', '2013.12.1.2.3.3');

Follow (before)
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7

Follow (after)
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
Cathy123  Bob2013   2013.12.1.2.3.3
26
DELETE in SQL

DELETE FROM Follow
WHERE  Followee = 'Alice00' AND Follower = 'Bob2013';

Follow (before)
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7

Follow (after)
Followee  Follower  Timestamp
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
27
UPDATE in SQL

UPDATE Follow
SET    Timestamp = '2013.1.1.3.6.6'
WHERE  Followee = 'Alice00' AND Follower = 'Bob2013';

Follow (before)
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7

Follow (after)
Followee  Follower  Timestamp
Alice00   Bob2013   2013.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
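The three modification statements can be tried together in one sketch with Python's sqlite3 module. Unlike the slides, which apply each statement to the original table, this sketch applies them in sequence (insert, then update, then delete), so the intermediate row counts reflect that order. The helper name row_count and the TEXT column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT, "
             "PRIMARY KEY (Followee, Follower))")
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
    ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
    ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
    ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
    ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
])

def row_count():
    return conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]

# INSERT: Bob2013 starts following Cathy123.
conn.execute("INSERT INTO Follow VALUES ('Cathy123', 'Bob2013', '2013.12.1.2.3.3')")
after_insert = row_count()

# UPDATE: change the timestamp of one existing relationship.
updated = conn.execute(
    "UPDATE Follow SET Timestamp = '2013.1.1.3.6.6' "
    "WHERE Followee = 'Alice00' AND Follower = 'Bob2013'").rowcount
new_timestamp = conn.execute(
    "SELECT Timestamp FROM Follow "
    "WHERE Followee = 'Alice00' AND Follower = 'Bob2013'").fetchone()[0]

# DELETE: Bob2013 stops following Alice00.
deleted = conn.execute(
    "DELETE FROM Follow WHERE Followee = 'Alice00' AND Follower = 'Bob2013'").rowcount
after_delete = row_count()

conn.commit()  # make the modifications permanent
print(after_insert, updated, new_timestamp, deleted, after_delete)
```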
28
More on SQL
Drop table
Outer Join
Indexes, Constraints, Views, Triggers, Transactions, Authorization
Substring Pattern Matching and Arithmetic Operators
Check out: http://cse.unl.edu/~sscott/ShowFiles/SQL/CheatSheet/SQLCheatSheet.html