MIS2502: Data Analytics SQL – Getting Information Out of a Database.

Slides:



Advertisements
Similar presentations
Advanced SQL (part 1) CS263 Lecture 7.
Advertisements

What is a Database By: Cristian Dubon.
30-Jun-15 SQL A Brief Introduction. SQL SQL is Structured Query Language Some people pronounce SQL as “sequel” Other people insist that only “ess-cue-ell”
A Guide to SQL, Seventh Edition. Objectives Retrieve data from a database using SQL commands Use compound conditions Use computed columns Use the SQL.
SQL Intermediate Workshop Authored by Jay Mussan-Levy.
MIS2502: Data Analytics Relational Data Modeling
DAY 21: MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Akhila Kondai October 30, 2013.
Relational DBs and SQL Designing Your Web Database (Ch. 8) → Creating and Working with a MySQL Database (Ch. 9, 10) 1.
 SQL stands for Structured Query Language.  SQL lets you access and manipulate databases.  SQL is an ANSI (American National Standards Institute) standard.
MIS2502: Data Analytics Coverting ERD into a DB Schema David Schuff
HAP 709 – Healthcare Databases SQL Data Manipulation Language (DML) Updated Fall, 2009.
About the Presentations The presentations cover the objectives found in the opening of each chapter. All chapter objectives are listed in the beginning.
SQL 1: GETTING INFORMATION OUT OF A DATABASE MIS2502 Data Analytics.
SQL: Data Manipulation Presented by Mary Choi For CS157B Dr. Sin Min Lee.
1 Single Table Queries. 2 Objectives  SELECT, WHERE  AND / OR / NOT conditions  Computed columns  LIKE, IN, BETWEEN operators  ORDER BY, GROUP BY,
Structure Query Language SQL. Database Terminology Employee ID 3 3 Last name Small First name Tony 5 5 Smith James
Using Special Operators (LIKE and IN)
MIS2502: Data Analytics SQL – Getting Information Out of a Database David Schuff
SQL queries ordering and grouping and joins
SQL “Structured Query Language; standard language for relational data manipulation” DB2, SQL/DS, Oracle, INGRES, SYBASE, SQL Server, dBase/Win, Paradox,
CIS 375—Web App Dev II SQL. 2 Introduction SQL (Structured _______ Language) is an ANSI standard language for accessing databases.ANSI SQL can execute.
Database Fundamental & Design by A.Surasit Samaisut Copyrights : All Rights Reserved.
CIS 375—Web App Dev II SQL. 2 Introduction SQL (Structured _______ Language) is an ANSI standard language for accessing databases.ANSI SQL can execute.
DATA RETRIEVAL WITH SQL Goal: To issue a database query using the SELECT command.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 7 (Part II) INTRODUCTION TO STRUCTURED QUERY LANGUAGE (SQL) Instructor.
Getting to Know SQL. © Jim Hope 2004 All Rights Reserved Data Manipulation SELECT statement INSERT INTO statement UPDATE statement DELETE statement UNION.
SqlExam1Review.ppt EXAM - 1. SQL stands for -- Structured Query Language Putting a manual database on a computer ensures? Data is more current Data is.
MIS2502: Data Analytics Relational Data Modeling
MIS2502: Data Analytics SQL – Putting Information Into a Database David Schuff
RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world.
Database: SQL, MySQL, LINQ and Java DB © by Pearson Education, Inc. All Rights Reserved.
MIS2502: Data Analytics Relational Data Modeling David Schuff
IS2803 Developing Multimedia Applications for Business (Part 2) Lecture 5: SQL I Rob Gleasure robgleasure.com.
1 Chapter 3 Single Table Queries. 2 Simple Queries Query - a question represented in a way that the DBMS can understand Basic format SELECT-FROM Optional.
Exam 2 Review: SQL In, Dimensional Modeling, Pivot Tables, ETL Describe data cube elements Understand facts, dimensions, granularity Create/read a Star.
MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Sravanthi Lakkimsety Mar 14,2016.
Concepts of Database Management, Fifth Edition Chapter 3: The Relational Model 2: SQL.
COM621: Advanced Interactive Web Development Lecture 11 MySQL – Data Manipulation Language.
Chapter 12 Introducing Databases. Objectives What a database is and which databases are typically used with ASP.NET pages What SQL is, how it looks, and.
SQL SQL Ayshah I. Almugahwi Maryam J. Alkhalifa
How to: SQL By: Sam Loch.
Web Systems & Technologies
MIS2502: Data Analytics SQL – Putting Information Into a Database
MIS2502: Data Analytics Relational Data Modeling
The Database Exercises Fall, 2009.
MIS5101: Business Intelligence Relational Data Modeling
MIS2502: Data Analytics SQL – Putting Information Into a Database
MIS2502: Data Analytics SQL – Getting Information Out of a Database
MIS2502: Data Analytics Relational Data Modeling
MIS2502: Data Analytics SQL – Putting Information Into a Database
MIS2502: Review for Exam 1 JaeHwuen Jung
MIS2502: Data Analytics SQL – Getting Information Out of a Database
MIS2502: Data Analytics Converting ERDs to Schemas
MIS2502: Data Analytics SQL – Getting Information Out of a Database Part 2: Advanced Queries Aaron Zhi Cheng
Copyright © 2012 Pearson Education, Inc. Publishing as Prentice Hall
Access: SQL Participation Project
MIS2502: Data Analytics SQL – Putting Information Into a Database
MIS2502: Review for Exam 1 Aaron Zhi Cheng
MIS2502: Data Analytics SQL – Putting Information Into a Database
Structured Query Language – The Fundamentals
MIS2502: Data Analytics SQL – Putting Information Into a Database
Lecture 3 Finishing SQL
M1G Introduction to Database Development
Introduction To Structured Query Language (SQL)
Section 4 - Sorting/Functions
MIS2502: Data Analytics SQL 4– Putting Information Into a Database
MIS2502: Data Analytics SQL – Getting Information Out of a Database Part 1: Basic Queries Aaron Zhi Cheng
MIS2502: Data Analytics SQL – Getting Information Out of a Database Part 2: Advanced Queries Zhe (Joe) Deng
MIS2502: Data Analytics Relational Data Modeling 3
Presentation transcript:

MIS2502: Data Analytics SQL – Getting Information Out of a Database

The relational database Core of Online Transaction Processing (OLTP) A series of tables Linked together through primary/foreign key relationships

What do we want to do? Database Management System Put information into the database (change) Get information out of the database (retrieve)

To do this we use SQL Structured Query Language A high-level set of commands that let you communicate with the database With SQL, you can – Retrieve records – Join (combine) tables – Insert records – Delete records – Update records – Add and delete tables A statement is any SQL command that interacts with a database. A SQL statement that retrieves information is referred to as a query. A statement is any SQL command that interacts with a database. A SQL statement that retrieves information is referred to as a query. We will be doing this.

Some points about SQL It’s not a true programming language It is used by programming languages to interact with databases There is no standard syntax MySQL, Oracle, SQL Server, and Access all have slight differences There are a lot of statements and variations among them We will be covering the basics, and the most important ones This is a great online reference for SQL syntax: Here’s the one specifically for MySQL, but it’s not as well- written: man/5.6/en/sql-syntax.html This is a great online reference for SQL syntax: Here’s the one specifically for MySQL, but it’s not as well- written: man/5.6/en/sql-syntax.html

SELECT statement SELECT column_name(s) FROM schema_name.table_name; Example: SELECT FirstName FROM orderdb.Customer; CustomerIDFirstNameLastNameCityStateZip 1001GregHousePrincetonNJ LisaCuddyPlainsboroNJ JamesWilsonPittsgroveNJ EricForemanWarminsterPA19111 Customer FirstName Greg Lisa James Eric This returns the FirstName column for every row in the Customer table. Called a “View.” This returns the FirstName column for every row in the Customer table. Called a “View.” A schema is a collection of tables. It is, essentially, the database. A schema is a collection of tables. It is, essentially, the database. Your schema will use your mx MySQL ID (i.e., m999orderdb.Customer)

Retrieving multiple columns SELECT FirstName, State FROM orderdb.Customer; SELECT * FROM orderdb.Customer; FirstNameState GregNJ LisaNJ JamesNJ EricPA CustomerIDFirstNameLastNameCityStateZip 1001GregHousePrincetonNJ LisaCuddyPlainsboroNJ JamesWilsonPittsgroveNJ EricForemanWarminsterPA19111 The * is called a wildcard. It means “return every column.” The * is called a wildcard. It means “return every column.” It’s good practice to end every statement with a semicolon, especially when entering multiple statements.

Retrieving unique values SELECT DISTINCT State FROM orderdb.Customer; SELECT DISTINCT City, State FROM orderdb.Customer; State NJ PA Returns only one occurrence of each value in the column. CityState PrincetonNJ PlainsboroNJ PittsgroveNJ WarminsterPA In this case, each combination of City AND State is unique, so it returns all of them.

Counting records SELECT COUNT(FirstName) FROM orderdb.Customer; SELECT COUNT(CustomerID) FROM orderdb.Customer; SELECT COUNT(*) FROM orderdb.Customer; 4 Total number of records in the table where the field is not empty. (don’t forget the parentheses!) Total number of records in the table where the field is not empty. (don’t forget the parentheses!) 4 Why is this the same number as the previous query? ? What number would be returned?

Fancier counting of records SELECT State, COUNT(FirstName) FROM orderdb.Customer GROUP BY State; StateCOUNT(FirstName) NJ3 PA1 GROUP BY organizes the results by column values. So it looks for unique State values and then counts the number of records for each of those values. GROUP BY organizes the results by column values. So it looks for unique State values and then counts the number of records for each of those values. Asks: How many customers from each state are there in the Customer table?

Counting and sorting SELECT State, COUNT(FirstName) FROM orderdb.Customer GROUP BY State ORDER BY COUNT(FirstName); StateCOUNT(FirstName) PA1 NJ3 GROUP BY organizes the results by column values. ORDER BY sorts results from lowest to highest based on a field (in this case, COUNT(FirstName))

ORDER BY ASC and DESC SELECT State, COUNT(FirstName) FROM orderdb.Customer GROUP BY State ORDER BY COUNT(FirstName) DESC; StateCOUNT(FirstName) NJ3 PA1 Forces the results to be sorted in DESCending order SELECT State, COUNT(FirstName) FROM orderdb.Customer GROUP BY State ORDER BY COUNT(FirstName) ASC; StateCOUNT(FirstName) PA1 NJ3 Forces the results to be sorted in ASCending order

Functions: Retrieving highest, lowest, average, and sum SELECT MAX(Price) FROM orderdb.Product; SELECT MIN(Price) FROM orderdb.Product; SELECT AVG(Price) FROM orderdb.Product; SELECT SUM(Price) FROM orderdb.Product; ProductIDProductNamePrice 2251Cheerios Bananas Eggo Waffles2.99 Product Price 3.99 Price 1.29 Price Price 8.27

Returning only certain records We don’t always want every record in the table use: SELECT * FROM schema_name.table_name WHERE condition; so SELECT * FROM orderdb.Customer WHERE State= 'NJ'; returns this: CustomerIDFirstNameLastNameCityStateZip 1001GregHousePrincetonNJ LisaCuddyPlainsboroNJ JamesWilsonPittsgroveNJ EricForemanWarminsterPA19111 Customer Let’s retrieve only those customers who live in New Jersey. CustomerIDFirstNameLastNameCityStateZip 1001GregHousePrincetonNJ LisaCuddyPlainsboroNJ JamesWilsonPittsgroveNJ09121

More conditional statements SELECT * FROM orderdb.Customer WHERE State <> 'NJ'; SELECT ProductName, Price FROM orderdb.Product WHERE Price > 2; CustomerIDFirstNameLastNameCityStateZip 1004EricForemanWarminsterPA19111 ProductIDProductNamePrice 2251Cheerios Eggo Waffles2.99 Put single quotes around string (non-numeric) values. The quotes are optional for numeric values. Put single quotes around string (non-numeric) values. The quotes are optional for numeric values. > means “greater than” means “not equal to” > means “greater than” means “not equal to”

Combining WHERE and COUNT SELECT COUNT(FirstName) FROM orderdb.Customer WHERE State= 'NJ'; SELECT COUNT(ProductName) FROM orderdb.Product WHERE Price < 3; 3 2 Review: Does it matter which field in the table you use in the SELECT COUNT query? Asks: How many customers live in New Jersey? Asks: How many products cost less than $3?

Querying multiple tables Right now, you can answer – How many customers live in New Jersey? – What is the most expensive product sold? Because those two questions can be answered looking at only a single table. But what if we want to find out the orders a customer placed? You need a construct a query that combines two (or more) tables.

The (Inner) Join We’ve seen this before We matched the Order and Customer tables based on the common field (CustomerID) We can construct a SQL query to do this `Order` Number OrderDateCustomer ID FirstNameLastNameCityStateZip GregHousePrincetonNJ LisaCuddyPlainsboroNJ GregHousePrincetonNJ EricForemanWarminsterPA19111 Order Table Customer Table

Joining tables using WHERE SELECT * FROM orderdb.Customer, orderdb.`Order` WHERE Customer.CustomerID=`Order`.CustomerID; Returns this: Customer. CustomerID FirstNameLastNameCityStateZipOrder Number OrderDateOrder. CustomerID 1001GregHousePrincetonNJ LisaCuddyPlainsboroNJ GregHousePrincetonNJ EricForemanWarminsterPA Note that all the fields are there, but depending on the database system, the field order may be different.

A closer look at the JOIN syntax SELECT * FROM orderdb.Customer, orderdb.`Order` WHERE Customer.CustomerID=`Order`.CustomerID; SELECT *Return all the columns from both tables FROM m1orderdb.Customer, m1orderdb.`Order` The two tables to be joined WHERE Customer.CustomerID = `Order`.CustomerID Only choose records where the CustomerID exists in both tables Another way to say it: Choose customers that have placed an order Another way to say it: Choose customers that have placed an order The “.” notation is Table.Field We need this when two tables have the same field name. The “.” notation is Table.Field We need this when two tables have the same field name.

Why is Order surrounded by “back quotes”? SELECT * FROM orderdb.Customer, orderdb.`Order` WHERE Customer.CustomerID=`Order`.CustomerID; Order is a reserved word in SQL. It is a command. – As in “ORDER BY” The back quotes tell MySQL to treat `Order` as a database object and not a command. Sometimes it can figure out the difference without the back quotes, but including them doesn’t hurt. For a list of reserved words in MySQL, go to: For a list of reserved words in MySQL, go to:

A more complex join Let’s say we want to find out what each customer ordered We want to wind up with this view of the database OrderNumberFirstNameLastNameProductNameQuantityPrice 101GregHouseCheerios GregHouseBananas GregHouseEggo Waffles LisaCuddyCheerios LisaCuddyBananas GregHouseEggo Waffles EricForemanEggo Waffles82.99

How to do it? We need information from Customer and Product (and Order-Product) So we need to link all of the tables together – To associate Customers with Products we need to follow the path from Customer to Product

Here’s the query SELECT `Order`.OrderNumber, Customer.FirstName, Customer.LastName, Product.ProductName, `Order-Product`.Quantity, Product.Price FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.`Order-Product` WHERE Customer.CustomerID=`Order`.CustomerID AND `Order`.OrderNumber=`Order-Product`.OrderNumber AND Product.ProductID=`Order-Product`.ProductID; It looks more complicated than it is! Note that we have three conditions in the WHERE clause, and we have three relationships in our schema. It looks more complicated than it is! Note that we have three conditions in the WHERE clause, and we have three relationships in our schema.

Now there are endless variations The total cost of all products bought by the customer “Greg House”? SELECT SUM(Product.Price*`Order-Product`.Quantity) FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.`Order-Product` WHERE Customer.CustomerID=`Order`.CustomerID AND `Order`.OrderNumber=`Order-Product`.OrderNumber AND Product.ProductID=`Order-Product`.ProductID AND Customer.CustomerID=1001; Answer: You could have also said Customer.LastName=‘House’, but it’s better to use the unique identifier.

What’s with the SUM() function? Notice that we’ve introduced something new SELECT SUM(Product.Price*`Order-Product`.Quantity) This multiplies price by quantity for each returned record, and then adds them together. You can perform arithmetic operations as long as the fields are numeric Question: What do you think would get returned if you left off the SUM() and just had SELECT Product.Price * Product.Quantity? Question: What do you think would get returned if you left off the SUM() and just had SELECT Product.Price * Product.Quantity?

LIMITing Results We know that this… Gives us this… SELECT * FROM orderdb.Product ORDER BY Price DESC; What if we want the two most expensive products? SELECT * FROM orderdb.Product WHERE Price >= 2.99; Only works if we know all the prices beforehand… …but then we wouldn’t need the query! ProductIDProductNamePrice 2251Cheerios Eggo Waffles Bananas1.29 Product

The LIMIT clause… SELECT * FROM orderdb.Product ORDER BY Price DESC LIMIT 2; This says: Give me all the columns Put rows in descending order by price But only give me the first two results ProductIDProductNamePrice 2251Cheerios Eggo Waffles2.99 Product What would we get if we left out DESC?

SQL Subselects We could also try to use LIMIT to find the least expensive product: SELECT * FROM orderdb.Product ORDER BY Price ASC LIMIT 1; But what if there is more than one product with the lowest value for price AND we don’t know how many there are?

Where MIN() alone fails us… SELECT MIN(price) FROM orderdb.Product; BUT SELECT MIN(price),ProductName FROM orderdb.Product; Price 1.29 PriceProductName 1.29Cheerios So what’s going on??

What’s going on… SELECT MIN(price),ProductName FROM orderdb.Product; PriceProductName 1.29Cheerios It returns the MIN(price) MIN() will always return only one row It chooses the first row in the Product column And it will do this for any function (AVG, SUM, etc.)

So we need a SQL subselect statement It’s where you have a SELECT statement nested inside another SELECT statement! SELECT Price,ProductName FROM orderdb.Product WHERE Price= (SELECT MIN(Price) FROM orderdb.Product); This is a temporary table from the database with one column and one row. Now you get all records back with that (lowest) price and avoid the quirk of the MIN() function.

Subselects come in handy in other situations too… We want to get a COUNT of how many DISTINCT states there are in the table. SELECT COUNT(*) FROM (SELECT DISTINCT State FROM orderdb.Customer) AS tmp1; To see how this works: – Start with the SELECT DISTINCT… – …then COUNT those values State NJ PA 2

Why do we need AS? SELECT COUNT(*) FROM (SELECT DISTINCT State FROM orderdb.Customer) AS tmp1; You’re basically SELECTing from the temporary table generated by the nested query. But since you’re SELECTing FROM that temporary table you have to give it a name (i.e., tmp1)