GROUP BY & Subset Data Analysis


Farrokh Alemi, Ph.D.

This section provides a brief introduction to the GROUP BY command within SQL and shows how it can be used to create summaries of data. This brief presentation was organized by Dr. Alemi. It was narrated by xxx

Purpose
The GROUP BY command tells the software to summarize the values in a column for subsets of data. If several values are reported within the subset, the GROUP BY command reports only one value per subset.

Syntax
The syntax of the GROUP BY command is given in this slide:

    SELECT expression1, expression2, ... expression_n,
           aggregate_function (other_expressions)
    FROM tables
    [WHERE conditions]
    GROUP BY expression1, expression2, ... expression_n
    [ORDER BY aggregate_function (expression) [ ASC | DESC ]];

Every field, or expression of fields, in the SELECT portion of the command must either be listed in the GROUP BY command or be enclosed in an aggregate function.
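
The rule above can be sketched with a small in-memory table. This is a minimal illustration using Python's sqlite3 module, not the presenter's code: the rows are invented, and SQLite stands in for the SQL dialect of the presentation, which appears to be SQL Server. One caveat: SQLite is lenient and accepts an unaggregated field that is missing from GROUP BY, while SQL Server rejects such a query, so only the valid form is shown.

```python
import sqlite3

# Hypothetical records: (ID, icd9) pairs invented for illustration.
rows = [(1, "250.00"), (1, "401.9"), (1, "250.00"), (2, "401.9")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final (ID INTEGER, icd9 TEXT)")
conn.executemany("INSERT INTO final VALUES (?, ?)", rows)

# ID is listed in GROUP BY, so it can be selected as is.
# icd9 is not listed in GROUP BY, so it is enclosed in the aggregate COUNT.
query = """
    SELECT ID, COUNT(icd9) AS CountRecords
    FROM final
    GROUP BY ID
    ORDER BY ID
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 3), (2, 1)]
```

Each patient ID comes back exactly once, with one summary value for its subset of records.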

Aggregate Functions
Aggregate functions include AVG, which averages all records in the subset of data.

They also include STDEV, which calculates the standard deviation of all records in the subset of data.

A common aggregate function is COUNT, which counts all values in the subset. COUNTIF counts a value only if it meets a logical test; it is available in some SQL dialects but not in SQL Server, where the same result is obtained by aggregating a CASE expression. COUNT(DISTINCT Field) counts the distinct values in the field.

Finally, the MAX and MIN functions select the maximum or minimum value for the subset of data. The maximum of a numerical field gives the largest number in the subset. The maximum of a date field selects the most recent date. The minimum of a date field selects the first date in the subset.
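
The aggregate functions described above can be tried side by side on a toy table. The sketch below makes several assumptions: the table `visits`, its columns and its rows are hypothetical; SQLite has no built-in STDEV, so a sample standard deviation is registered from Python under that name; the conditional count is written as SUM over a CASE expression; and dates are stored as ISO text, so MIN and MAX compare them chronologically.

```python
import sqlite3
import statistics


class Stdev:
    """Sample standard deviation, registered because SQLite lacks STDEV."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the subset; nulls are skipped.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # A standard deviation needs at least two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


# Hypothetical rows: (ID, icd9, Age, VisitDate).
rows = [
    (1, "250.00", 60, "2010-01-05"),
    (1, "401.9", 62, "2012-03-09"),
    (1, "250.00", 64, "2014-07-21"),
    (2, "401.9", 70, "2011-11-30"),
    (2, "428.0", 72, "2013-02-14"),
]

conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDEV", 1, Stdev)
conn.execute(
    "CREATE TABLE visits (ID INTEGER, icd9 TEXT, Age REAL, VisitDate TEXT)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT ID,
           AVG(Age)                                         AS AvgAge,
           STDEV(Age)                                       AS SdAge,
           COUNT(icd9)                                      AS CountAll,
           COUNT(DISTINCT icd9)                             AS CountDistinct,
           SUM(CASE WHEN icd9 = '250.00' THEN 1 ELSE 0 END) AS Count250,
           MIN(VisitDate)                                   AS FirstVisit,
           MAX(VisitDate)                                   AS LastVisit
    FROM visits
    GROUP BY ID
    ORDER BY ID
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Patient 1 has three records but only two distinct codes, which is the difference between COUNT and COUNT(DISTINCT ...).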

Optional Portion
The WHERE and ORDER BY commands are optional.

The ORDER BY command lists the results in ascending or descending order of a set of fields.

The WHERE command restricts the data to the records for which the stated condition is met.
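
To see what the two optional commands change, the same grouped query can be run without them, with ORDER BY only, and with both. This is a sketch on invented rows, run in SQLite through Python's sqlite3 module; the table layout (ID, icd9, AgeAtDeath) is an assumption made for illustration.

```python
import sqlite3

# Hypothetical rows: (ID, icd9, AgeAtDeath); None means no death is recorded.
rows = [
    (1, "250.00", None), (1, "401.9", None), (1, "428.0", None),
    (2, "401.9", 81),
    (3, "428.0", 77), (3, "250.00", 77),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final (ID INTEGER, icd9 TEXT, AgeAtDeath INTEGER)")
conn.executemany("INSERT INTO final VALUES (?, ?, ?)", rows)

base = "SELECT ID, COUNT(icd9) AS n FROM final"

# Neither optional command: all rows are grouped, row order is not guaranteed.
unordered = conn.execute(base + " GROUP BY ID").fetchall()

# ORDER BY only: same groups, sorted by the aggregate in ascending order.
ascending = conn.execute(
    base + " GROUP BY ID ORDER BY COUNT(icd9) ASC"
).fetchall()

# WHERE removes rows before grouping, so patient 1 never forms a group.
deceased = conn.execute(
    base + " WHERE AgeAtDeath IS NOT NULL GROUP BY ID ORDER BY COUNT(icd9) DESC"
).fetchall()

print(sorted(unordered))  # [(1, 3), (2, 1), (3, 2)]
print(ascending)          # [(2, 1), (3, 2), (1, 3)]
print(deceased)           # [(3, 2), (2, 1)]
```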

Example
The code snippet shows an example of the use of the GROUP BY command. The code reports the number of distinct diagnoses for the 10 patients with the most distinct diagnoses among those who have not died.

    USE AgeDx
    SELECT TOP 10 ID, COUNT(DISTINCT icd9) AS CountDx
    FROM dbo.final
    WHERE AgeAtDeath IS NULL
    GROUP BY ID
    ORDER BY COUNT(DISTINCT icd9) DESC;

In the USE and FROM parts, the code specifies that the table “final” from the database AgeDx should be used.

The GROUP BY command tells the computer to do a separate analysis for each patient. Since there are several records for each patient ID, the GROUP BY command tells the computer to return only one value per patient.

In the SELECT portion of the code, ID is listed without an aggregate function because it is already part of the GROUP BY command. The ICD9 code is not in the GROUP BY command, so it must be listed with an aggregate function, in this case the COUNT function. The same holds true for fields listed in the ORDER BY command.

The COUNT command tells the computer to report the number of distinct entries in the field ICD9.

The WHERE command tells the computer to focus on living patients. Note that variables in the WHERE portion of the code do not need to be enclosed in an aggregate function. The WHERE command is executed before the GROUP BY command. In large data sets, the WHERE command can make GROUP BY computations much faster because fewer records have to be grouped.
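
The example query can be reproduced on a toy table to check each step of the walkthrough. The sketch below is an adaptation, not the original code: it runs in SQLite through Python's sqlite3 module, so TOP is replaced by LIMIT and the USE and dbo. parts are dropped; the rows are invented; and the limit is 2 rather than 10 to fit the small table.

```python
import sqlite3

# Hypothetical rows: (ID, icd9, AgeAtDeath); None means the patient is alive.
rows = [
    (10, "250.00", None), (10, "250.00", None),
    (10, "401.9", None), (10, "428.0", None),
    (20, "401.9", None), (20, "401.9", None),
    (30, "250.00", 84), (30, "401.9", 84),
    (30, "428.0", 84), (30, "585.9", 84),
    (40, "428.0", None), (40, "585.9", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final (ID INTEGER, icd9 TEXT, AgeAtDeath INTEGER)")
conn.executemany("INSERT INTO final VALUES (?, ?, ?)", rows)

query = """
    SELECT ID, COUNT(DISTINCT icd9) AS CountDx
    FROM final
    WHERE AgeAtDeath IS NULL              -- applied before grouping
    GROUP BY ID                           -- one result row per patient
    ORDER BY COUNT(DISTINCT icd9) DESC    -- most distinct diagnoses first
    LIMIT 2                               -- SQLite counterpart of TOP
"""
top_patients = conn.execute(query).fetchall()
print(top_patients)  # [(10, 3), (40, 2)]
```

Patient 30 has the most distinct codes but is excluded by the WHERE command, and patient 10's repeated code 250.00 is counted once.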

Resulting Data
The slide reports the resulting data. Each ID is followed by the count of the patient’s distinct diagnoses.

    ID      CountDx
    134748  195
    153091  187
    244694  187
    728678  184
    694089  180
    571207  179
    222254  178
    756012  176
    636920  176
    541352  175

For ID 134748 there were 195 distinct diagnoses. This seems like a lot, but we need to see over what time frame.

If you summarize one field in your query, every other field listed in the SELECT portion must also be summarized or appear in the GROUP BY command. Fields used in the WHERE command are the exception. So if you do not need to summarize a field, simply leave it out of the query.

The GROUP BY command summarizes the fields for subsets of data.