MIS2502: Data Analytics SQL – Getting Information Out of a Database Part 2: Advanced Queries Aaron Zhi Cheng http://community.mis.temple.edu/zcheng/ acheng@temple.edu.

Presentation transcript:

MIS2502: Data Analytics SQL – Getting Information Out of a Database Part 2: Advanced Queries Aaron Zhi Cheng http://community.mis.temple.edu/zcheng/ acheng@temple.edu Acknowledgement: David Schuff

Where we are… Now we’re here…
Data entry → Transactional Database → Data extraction → Analytical Data Store → Data analysis
Transactional Database: stores real-time transactional data. Called OLTP: Online transaction processing.
Analytical Data Store: stores historical transactional and summary data. Called OLAP: Online analytical processing.

The relational database
The core of Online Transaction Processing (OLTP): a series of tables, linked together through primary/foreign key relationships.

Querying multiple tables
Right now, you can answer questions with data from a single table. What if you need to combine two (or more) tables? For example, what if we want to find out which orders a customer placed?

The (Inner) Join
We’ve seen this before: we matched the Order and Customer tables based on the common field (CustomerID). We can construct a SQL query to do this.
Order Table
  OrderNumber  OrderDate  CustomerID
  101          2011-3-2   1001
  102          2011-3-3   1002
  103          2011-3-4   1001
  104          2011-3-6   1004
Customer Table
  CustomerID  FirstName  LastName  City        State  Zip
  1001        Greg       House     Princeton   NJ     09120
  1002        Lisa       Cuddy     Plainsboro  NJ     09123
  1004        Eric       Foreman   Warminster  PA     19111
Recall that we need to use back quotes for `Order`, because ORDER is a reserved word in SQL.

Joining tables using WHERE
  SELECT * FROM orderdb.Customer, orderdb.`Order`
  WHERE Customer.CustomerID=`Order`.CustomerID;
Returns this:
  Customer.CustomerID  FirstName  LastName  City        State  Zip    OrderNumber  OrderDate  Order.CustomerID
  1001                 Greg       House     Princeton   NJ     09120  101          2011-3-2   1001
  1002                 Lisa       Cuddy     Plainsboro  NJ     09123  102          2011-3-3   1002
  1001                 Greg       House     Princeton   NJ     09120  103          2011-3-4   1001
  1004                 Eric       Foreman   Warminster  PA     19111  104          2011-3-6   1004
Note that all the fields are there, but depending on the database system, the field order may be different.

A closer look at the JOIN syntax
  SELECT * FROM orderdb.Customer, orderdb.`Order`
  WHERE Customer.CustomerID=`Order`.CustomerID;
SELECT *: return all the columns from both tables.
FROM orderdb.Customer, orderdb.`Order`: the two tables to be joined.
WHERE Customer.CustomerID = `Order`.CustomerID: only choose records where the CustomerID exists in both tables. Another way to say it: choose customers that have placed an order.
The “.” notation is table_name.column_name. We need this when two tables have the same column name.

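What If We Don’t Have the WHERE condition?
It will fetch every possible combination (pair) of records from the two tables.
  SELECT * FROM orderdb.Customer, orderdb.`Order`;
Returns this (3 customers × 4 orders = 12 rows):
  Customer.CustomerID  FirstName  LastName  City        State  Zip    OrderNumber  OrderDate  Order.CustomerID
  1001                 Greg       House     Princeton   NJ     09120  101          2011-3-2   1001
  1002                 Lisa       Cuddy     Plainsboro  NJ     09123  101          2011-3-2   1001
  1004                 Eric       Foreman   Warminster  PA     19111  101          2011-3-2   1001
  1001                 Greg       House     Princeton   NJ     09120  102          2011-3-3   1002
  1002                 Lisa       Cuddy     Plainsboro  NJ     09123  102          2011-3-3   1002
  1004                 Eric       Foreman   Warminster  PA     19111  102          2011-3-3   1002
  1001                 Greg       House     Princeton   NJ     09120  103          2011-3-4   1001
  1002                 Lisa       Cuddy     Plainsboro  NJ     09123  103          2011-3-4   1001
  1004                 Eric       Foreman   Warminster  PA     19111  103          2011-3-4   1001
  1001                 Greg       House     Princeton   NJ     09120  104          2011-3-6   1004
  1002                 Lisa       Cuddy     Plainsboro  NJ     09123  104          2011-3-6   1004
  1004                 Eric       Foreman   Warminster  PA     19111  104          2011-3-6   1004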
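
The difference between the join and the unfiltered combination can be checked with a small script. This is only a sketch: it assumes Python's built-in sqlite3 module as a stand-in for the MySQL server used in the course, and it attaches an in-memory database under the name orderdb so the table references look the same as on the slides. The column types are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; attaching an in-memory database as "orderdb"
# lets the orderdb.Table references from the slides run unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orderdb")
conn.executescript("""
CREATE TABLE orderdb.Customer (
    CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
    City TEXT, State TEXT, Zip TEXT);
CREATE TABLE orderdb.`Order` (
    OrderNumber INTEGER PRIMARY KEY, OrderDate TEXT, CustomerID INTEGER);
INSERT INTO orderdb.Customer VALUES
    (1001, 'Greg', 'House',   'Princeton',  'NJ', '09120'),
    (1002, 'Lisa', 'Cuddy',   'Plainsboro', 'NJ', '09123'),
    (1004, 'Eric', 'Foreman', 'Warminster', 'PA', '19111');
INSERT INTO orderdb.`Order` VALUES
    (101, '2011-3-2', 1001), (102, '2011-3-3', 1002),
    (103, '2011-3-4', 1001), (104, '2011-3-6', 1004);
""")

# Inner join: keep only the pairs whose CustomerID values match.
joined = conn.execute("""
    SELECT * FROM orderdb.Customer, orderdb.`Order`
    WHERE Customer.CustomerID = `Order`.CustomerID
""").fetchall()

# No WHERE condition: every customer is paired with every order.
cross = conn.execute(
    "SELECT * FROM orderdb.Customer, orderdb.`Order`"
).fetchall()

print(len(joined), "rows with the WHERE condition")
print(len(cross), "rows without it")
```

With three customers and four orders, the unfiltered query returns 3 × 4 rows, while the join keeps one row per order.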

A more complex join
Question: What products did each customer order?
We want to wind up with this view of the database:
  OrderNumber  FirstName  LastName  ProductName   Quantity  Price
  101          Greg       House     Cheerios      2         3.99
  101          Greg       House     Bananas       3         1.29
  101          Greg       House     Eggo Waffles  1         2.99
  102 Lisa Cuddy 5 103 104 Eric Foreman 8

How to do it?
We need information from Customer and Product (and the tables that connect them, Order and OrderProduct). To associate the Customer table with the Product table, we need to follow the path from Customer to Product: Customer → Order → OrderProduct → Product.

Here’s the query
It looks more complicated than it actually is!
  SELECT `Order`.OrderNumber, Customer.FirstName, Customer.LastName,
         Product.ProductName, OrderProduct.Quantity, Product.Price
  FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
  WHERE Customer.CustomerID=`Order`.CustomerID
    AND `Order`.OrderNumber=OrderProduct.OrderNumber
    AND Product.ProductID=OrderProduct.ProductID;
Note that we have three conditions in the WHERE clause, and we have three relationships in our schema.
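
The four-table join can be tried out with the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. The ProductID values and the contents of orders 102 to 104 are made up for illustration (only order 101 is taken from the slide), and customer columns the query does not use are omitted.

```python
import sqlite3

# SQLite stands in for MySQL; the in-memory database is attached as "orderdb".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orderdb")
conn.executescript("""
CREATE TABLE orderdb.Customer (
    CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, State TEXT);
CREATE TABLE orderdb.`Order` (
    OrderNumber INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE orderdb.Product (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, Price REAL);
CREATE TABLE orderdb.OrderProduct (
    OrderNumber INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO orderdb.Customer VALUES
    (1001, 'Greg', 'House', 'NJ'), (1002, 'Lisa', 'Cuddy', 'NJ'),
    (1004, 'Eric', 'Foreman', 'PA');
INSERT INTO orderdb.`Order` VALUES
    (101, 1001), (102, 1002), (103, 1001), (104, 1004);
-- ProductID values are hypothetical.
INSERT INTO orderdb.Product VALUES
    (1, 'Cheerios', 3.99), (2, 'Bananas', 1.29), (3, 'Eggo Waffles', 2.99);
-- Order 101 follows the slide; the products in orders 102-104 are assumed.
INSERT INTO orderdb.OrderProduct VALUES
    (101, 1, 2), (101, 2, 3), (101, 3, 1),
    (102, 1, 5), (103, 3, 3), (104, 3, 8);
""")

# One WHERE condition per relationship on the path
# Customer -> Order -> OrderProduct -> Product.
rows = sorted(conn.execute("""
    SELECT `Order`.OrderNumber, Customer.FirstName, Customer.LastName,
           Product.ProductName, OrderProduct.Quantity, Product.Price
    FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
    WHERE Customer.CustomerID = `Order`.CustomerID
      AND `Order`.OrderNumber = OrderProduct.OrderNumber
      AND Product.ProductID = OrderProduct.ProductID
""").fetchall())

for row in rows:
    print(row)
```

The result has one row per OrderProduct record, because each of those records matches exactly one order, one customer, and one product.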

Now there are endless variations
Question: What is the total cost (price × quantity) of all products bought by the customer “Greg House”?
  SELECT SUM(Product.Price*OrderProduct.Quantity)
  FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
  WHERE Customer.CustomerID=`Order`.CustomerID
    AND `Order`.OrderNumber=OrderProduct.OrderNumber
    AND Product.ProductID=OrderProduct.ProductID
    AND Customer.CustomerID=1001;
Answer: 23.81
You could have also said Customer.LastName='House', but it’s better to use the unique identifier.

What’s with the SUM() function?
Notice that we’ve introduced something new:
  SELECT SUM(Product.Price*OrderProduct.Quantity)
This multiplies price by quantity for each returned record, and then adds them together. You can perform arithmetic operations as long as the fields are numeric.
Question: What do you think would get returned if you left off the SUM() and just had SELECT Product.Price * OrderProduct.Quantity?
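
The question above can be explored with the following sketch, which runs the query with and without SUM(). It assumes Python's built-in sqlite3 module as a stand-in for MySQL. The ProductID values and the contents of order 103 (three Eggo Waffles) are assumptions, chosen so that the total matches the 23.81 reported on the slide.

```python
import sqlite3

# SQLite stands in for MySQL; the in-memory database is attached as "orderdb".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orderdb")
conn.executescript("""
CREATE TABLE orderdb.Customer (
    CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE orderdb.`Order` (
    OrderNumber INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE orderdb.Product (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, Price REAL);
CREATE TABLE orderdb.OrderProduct (
    OrderNumber INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO orderdb.Customer VALUES
    (1001, 'Greg', 'House'), (1002, 'Lisa', 'Cuddy'), (1004, 'Eric', 'Foreman');
INSERT INTO orderdb.`Order` VALUES
    (101, 1001), (102, 1002), (103, 1001), (104, 1004);
-- ProductID values are hypothetical.
INSERT INTO orderdb.Product VALUES
    (1, 'Cheerios', 3.99), (2, 'Bananas', 1.29), (3, 'Eggo Waffles', 2.99);
-- Order 101 follows the slide; orders 102-104 are assumed.
INSERT INTO orderdb.OrderProduct VALUES
    (101, 1, 2), (101, 2, 3), (101, 3, 1),
    (102, 1, 5), (103, 3, 3), (104, 3, 8);
""")

# The FROM and WHERE clauses shared by both versions of the query.
JOIN_FOR_GREG = """
    FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
    WHERE Customer.CustomerID = `Order`.CustomerID
      AND `Order`.OrderNumber = OrderProduct.OrderNumber
      AND Product.ProductID = OrderProduct.ProductID
      AND Customer.CustomerID = 1001
"""

# Without SUM(): one price*quantity value per joined record.
per_row = [r[0] for r in conn.execute(
    "SELECT Product.Price * OrderProduct.Quantity" + JOIN_FOR_GREG)]

# With SUM(): those values are added into a single number.
total = conn.execute(
    "SELECT SUM(Product.Price * OrderProduct.Quantity)" + JOIN_FOR_GREG
).fetchone()[0]

print(len(per_row), "separate line totals")
print("total:", round(total, 2))
```

Leaving off SUM() gives one value per purchased item rather than a single total. The total is rounded before printing because SQLite stores these prices as floating-point numbers.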

Recall the LIMIT Clause
We could try to use LIMIT to find the least expensive product:
  SELECT * FROM orderdb.Product ORDER BY Price ASC LIMIT 1;
But what if more than one product shares the lowest price AND we don’t know how many there are?

Where MIN() alone fails us…
  SELECT MIN(price) FROM orderdb.Product;
  Price
  1.29
BUT
  SELECT MIN(price),ProductName FROM orderdb.Product;
  Price  ProductName
  1.29   Cheerios
Wait… Cheerios’ price should be 3.99. So what’s going on??

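What’s wrong…
  SELECT MIN(price),ProductName FROM orderdb.Product;
  Price  ProductName
  1.29   Cheerios
It returns the MIN(price): $1.29. MIN() will always return only one row, so for ProductName it simply takes the value from the first row of the Product table, i.e., Cheerios, which has nothing to do with the lowest price. And it will do this for any function (AVG, SUM, etc.). Note that MySQL servers running with the ONLY_FULL_GROUP_BY mode, which is on by default since MySQL 5.7.5, reject this query with an error instead of returning a misleading row.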

So we need a SQL subselect statement
It’s where you have a SELECT statement nested inside another SELECT statement!
  SELECT Price,ProductName FROM orderdb.Product
  WHERE Price=(SELECT MIN(Price) FROM orderdb.Product);
The SELECT in parentheses produces a temporary table from the database with one column and one row. Now you get all records back with that (lowest) price and avoid the quirk of the MIN() function.

How would SQL execute this query?
  SELECT Price,ProductName FROM orderdb.Product
  WHERE Price=(SELECT MIN(Price) FROM orderdb.Product);
Step 1: Execute what is in the parentheses to find the lowest price.
  SELECT MIN(Price) FROM orderdb.Product
  MIN(Price)
  1.29
Step 2: Plug the lowest price into the main query, and execute the main query.
  SELECT Price,ProductName FROM orderdb.Product WHERE Price=1.29;
  Price  ProductName
  1.29   Bananas
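
To see why the subselect handles ties while LIMIT 1 does not, the sketch below adds a second product at the lowest price. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. The product "Apples" and all ProductID values are hypothetical and exist only to create the tie.

```python
import sqlite3

# SQLite stands in for MySQL; the in-memory database is attached as "orderdb".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orderdb")
conn.executescript("""
CREATE TABLE orderdb.Product (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, Price REAL);
-- 'Apples' is a made-up product that ties with Bananas for the lowest price.
INSERT INTO orderdb.Product VALUES
    (1, 'Cheerios', 3.99), (2, 'Bananas', 1.29),
    (3, 'Eggo Waffles', 2.99), (4, 'Apples', 1.29);
""")

# LIMIT 1 returns a single row even though two products share the lowest price.
limit_rows = conn.execute(
    "SELECT * FROM orderdb.Product ORDER BY Price ASC LIMIT 1"
).fetchall()

# The subselect first finds the lowest price, then the outer query returns
# every product that has that price.
lowest_rows = conn.execute("""
    SELECT Price, ProductName FROM orderdb.Product
    WHERE Price = (SELECT MIN(Price) FROM orderdb.Product)
""").fetchall()

print("LIMIT 1:  ", limit_rows)
print("subselect:", sorted(lowest_rows))
```

The LIMIT query silently drops one of the tied products, whereas the subselect returns both without needing to know in advance how many ties there are.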

Subselects come in handy in other situations too…
We want to get a COUNT of how many DISTINCT states there are in the table.
  SELECT COUNT(*) FROM (SELECT DISTINCT State FROM orderdb.Customer) AS tmp1;
MySQL requires a subselect used in the FROM clause to have an alias (here tmp1).
To see how this works, start with what is in the parentheses:
  SELECT DISTINCT State FROM orderdb.Customer
  State
  NJ
  PA
…then COUNT those values: 2
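
The two steps can be checked with the sketch below, which assumes Python's built-in sqlite3 module as a stand-in for MySQL and omits customer columns the query does not use. It also shows COUNT(DISTINCT ...), a standard SQL alternative that is not part of the slides and gives the same count without a subselect.

```python
import sqlite3

# SQLite stands in for MySQL; the in-memory database is attached as "orderdb".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orderdb")
conn.executescript("""
CREATE TABLE orderdb.Customer (
    CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, State TEXT);
INSERT INTO orderdb.Customer VALUES
    (1001, 'Greg', 'House', 'NJ'), (1002, 'Lisa', 'Cuddy', 'NJ'),
    (1004, 'Eric', 'Foreman', 'PA');
""")

# Step 1: the inner query lists each state once.
states = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT State FROM orderdb.Customer"))

# Step 2: the outer query counts the rows of that temporary table.
# The alias tmp1 is optional in SQLite but is kept for MySQL compatibility.
via_subselect = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT State FROM orderdb.Customer) AS tmp1
""").fetchone()[0]

# Alternative that is not in the slides: COUNT(DISTINCT column).
via_count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT State) FROM orderdb.Customer"
).fetchone()[0]

print(states, via_subselect, via_count_distinct)
```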

Subselects with Joins
Question: What is the least expensive product bought by customers from New Jersey?
To do this, we will need all the tables.

Subselects with Joins
First, we need to figure out the lowest price of the products bought by customers from New Jersey:
  SELECT MIN(Product.Price)
  FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
  WHERE Customer.CustomerID=`Order`.CustomerID
    AND `Order`.OrderNumber=OrderProduct.OrderNumber
    AND Product.ProductID=OrderProduct.ProductID
    AND Customer.State='NJ';
  Price
  1.29
But this is not enough… We also need to find the product name.

Subselects with Joins
So we nest the previous query in a bigger query:
  SELECT Product.ProductName, Product.Price
  FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
  WHERE Customer.CustomerID=`Order`.CustomerID
    AND `Order`.OrderNumber=OrderProduct.OrderNumber
    AND Product.ProductID=OrderProduct.ProductID
    AND Customer.State='NJ'
    AND Product.Price=(
      SELECT MIN(Product.Price)
      FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
      WHERE Customer.CustomerID=`Order`.CustomerID
        AND `Order`.OrderNumber=OrderProduct.OrderNumber
        AND Product.ProductID=OrderProduct.ProductID
        AND Customer.State='NJ');
  ProductName  Price
  Bananas      1.29
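
The nested query can be run end to end with the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. The ProductID values and the contents of orders 102 to 104 are assumptions, and the subselect repeats the same three join conditions as the outer query.

```python
import sqlite3

# SQLite stands in for MySQL; the in-memory database is attached as "orderdb".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orderdb")
conn.executescript("""
CREATE TABLE orderdb.Customer (
    CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, State TEXT);
CREATE TABLE orderdb.`Order` (
    OrderNumber INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE orderdb.Product (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, Price REAL);
CREATE TABLE orderdb.OrderProduct (
    OrderNumber INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO orderdb.Customer VALUES
    (1001, 'Greg', 'House', 'NJ'), (1002, 'Lisa', 'Cuddy', 'NJ'),
    (1004, 'Eric', 'Foreman', 'PA');
INSERT INTO orderdb.`Order` VALUES
    (101, 1001), (102, 1002), (103, 1001), (104, 1004);
-- ProductID values are hypothetical.
INSERT INTO orderdb.Product VALUES
    (1, 'Cheerios', 3.99), (2, 'Bananas', 1.29), (3, 'Eggo Waffles', 2.99);
-- Order 101 follows the slide; orders 102-104 are assumed.
INSERT INTO orderdb.OrderProduct VALUES
    (101, 1, 2), (101, 2, 3), (101, 3, 1),
    (102, 1, 5), (103, 3, 3), (104, 3, 8);
""")

# The subselect finds the lowest price among products bought by NJ customers;
# the outer query returns the matching product(s) bought by NJ customers.
cheapest_nj = conn.execute("""
    SELECT Product.ProductName, Product.Price
    FROM orderdb.Customer, orderdb.`Order`, orderdb.Product, orderdb.OrderProduct
    WHERE Customer.CustomerID = `Order`.CustomerID
      AND `Order`.OrderNumber = OrderProduct.OrderNumber
      AND Product.ProductID = OrderProduct.ProductID
      AND Customer.State = 'NJ'
      AND Product.Price = (
          SELECT MIN(Product.Price)
          FROM orderdb.Customer, orderdb.`Order`, orderdb.Product,
               orderdb.OrderProduct
          WHERE Customer.CustomerID = `Order`.CustomerID
            AND `Order`.OrderNumber = OrderProduct.OrderNumber
            AND Product.ProductID = OrderProduct.ProductID
            AND Customer.State = 'NJ')
""").fetchall()

print(cheapest_nj)
```

If several New Jersey orders contained the cheapest product, the query would return one row per matching order line; SELECT DISTINCT would collapse those into a single row.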

Summary
Given a schema of a database, we should now be able to create a SQL statement (query) to answer a question based on multiple tables.
Understand how to use:
  Joins
  Subselects