Notes on SQL

SQL Programming

Employers increasingly tell us that they look for 3 things on a resume: SAS, R and SQL. In these notes you will learn:

1. What SQL is
2. Why it is used
3. The basics of SQL syntax

And, we will go through a few REALLY fun and exciting examples.

What is SQL?

SQL stands for “Structured Query Language”. It was designed as a language to manage data in relational database management systems (DBMS).

The SQL language is sub-divided into several language elements, including:

- Queries, which retrieve the data based on specific criteria. This is the most important element of SQL.
- Clauses, which are constituent components of statements and queries.
- Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
- Statements, which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.

SQL statements also include the semicolon statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.

Why is PROC SQL better than Data steps?

- The syntax is transferable to other SQL software packages
- You can join up to 256 SAS tables
- There is no need to sort any of the input tables

When is PROC SQL not better than Data steps?

- It can use more memory than regular data/procedure steps
- It could take longer than other procedures when working with very large contributing tables
- Logic flow becomes harder to implement
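To make the "no need to sort" point concrete, here is a minimal sketch of a PROC SQL join. The tables work.students and work.scores, their columns and their values are made up for illustration and are not part of the course data. Neither table is sorted by its key, and the key columns have different names.

```sas
/* Hypothetical example data: names and values are made up for illustration. */
data work.students;
    input id name $;
    datalines;
3 Cara
1 Ann
2 Bob
;
run;

data work.scores;
    input student_id score;
    datalines;
2 88
3 75
1 92
;
run;

/* Join the two unsorted tables on keys that have different names. */
proc sql;
    create table work.joined as
    select s.id, s.name, t.score
    from work.students as s
         inner join work.scores as t
         on s.id = t.student_id
    order by s.id;
quit;

proc print data=work.joined;
run;
```

A Data step MERGE would need both tables sorted by a BY variable that has the same name in each, which is why the join is often the shorter route.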

Why do we use SQL?

SQL is used primarily to:

- Retrieve data from and manipulate tables/datasets
- Add or modify data values in a table/dataset
- Add, modify, or drop columns in a table/dataset
- Create tables and views
- Join/Merge multiple tables (whether or not they contain columns with the same name)
- Generate reports
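Several of these uses fit in one short PROC SQL step. The following is only a sketch: the table work.pets, the view work.old_pets and all of their values are hypothetical and made up for illustration.

```sas
/* Hypothetical table; all names and values are made up for illustration. */
proc sql;
    /* Create a table */
    create table work.pets
        (name char(10),
         age  num);

    /* Add data values */
    insert into work.pets
        values ('Rex', 3)
        values ('Milo', 7);

    /* Modify data values */
    update work.pets
        set age = age + 1
        where name = 'Rex';

    /* Add a column, then fill it */
    alter table work.pets
        add species char(8);
    update work.pets
        set species = 'dog';

    /* Create a view (a stored query, not a copy of the data) */
    create view work.old_pets as
        select name, age
        from work.pets
        where age > 5;

    /* Generate a report: a SELECT without CREATE TABLE prints its result */
    select * from work.old_pets;
quit;
```

Each statement ends with its own semicolon, and the whole procedure ends with QUIT.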

You probably noticed that the previous list includes a lot of things that we do with DATA steps in SAS. In many cases, SQL is a better alternative to a DATA step in SAS because it can be more efficient.

Clarification regarding SQL in SAS: we use SQL like DATA steps in SAS, NOT like (most) PROC steps. SQL is used to extract data, merge data and create variables, not to analyze data. Let's take a look…

Consider the Pennstate1 dataset. Let's say that you needed to:

- Retain only the sex, earpierces, tattoos, height, height choice, looks and friends variables.
- Sort by sex.
- Delete observations with 4 or more earpierces.
- Create a new variable called HeightDiff, which is their Height Choice minus their current height.
- Create a new dataset called “Modeling” from the above requirements.

My guess is that at this point, you would use a DATA step and your code would look something like this:

    Data Modeling (keep = sex earprces tattoo height htchoice looks friends HeightDiff);
        set jlp.pennstate2;
        where earprces < 4;
        HeightDiff = Htchoice - Height;
    run;

    Proc sort data=modeling;
        by Sex;
    run;

This code would run and produce what you need. Note that the new variable has to appear in the keep= list, otherwise it is dropped from the output dataset.

Here is what this same requirement would look like using Proc SQL:

    proc sql;
        create table work.modeling as
        select sex, earprces, tattoo, height, htchoice, looks, friends,
               Htchoice - Height as HeightDiff
        from jlp.pennstate2
        where earprces < 4
        order by sex;
    quit;

What do you notice about this code that is unexpected in SAS?

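Let's pull this apart:

    proc sql;
        create table work.modeling as

PROC SQL starts the procedure, and CREATE TABLE ... AS names the output dataset.

        select sex, earprces, tattoo, height, htchoice, looks, friends,
               Htchoice - Height as HeightDiff
        from jlp.pennstate2

SELECT lists the variables to keep, separated by commas, and creates the new variable with AS. FROM names the input dataset.

        where earprces < 4
        order by sex;
    quit;

WHERE filters the observations and ORDER BY sorts the result. The whole query is a single statement with one semicolon at the end, and the procedure ends with QUIT rather than RUN.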

Let's look at another example, this time focused on categorizing a variable. Consider the UCDAVIS1 dataset.

- Create a new dataset called UCTEST.
- Retain only GPA, SEX and ALCOHOL.
- Create a new variable “GPACAT” which is a categorization of the GPA variable, where 2 or below is low, above 2 and up to 3 is medium, and above 3 is high.

How would we do this without using SQL, and how would we do it using SQL?

Using a Data step, your code probably looks like this:

    Data UCTEST (keep = GPA GPACAT SEX ALCOHOL);
        set jlp.ucdavis1;
        Format GPACAT $CHAR7.;
        If GPA = . then GPACAT = " ";
        else if GPA <= 2 then GPACAT = "LOW";
        else if GPA <= 3 then GPACAT = "MEDIUM";
        else GPACAT = "HIGH";
    Run;

    Proc print data=UCTEST;
    Run;

Why do we need this format statement?

Using SQL, your code probably looks like this:

    PROC SQL;
        CREATE TABLE work.UCTEST AS
        SELECT GPA, Sex, alcohol,
               CASE
                   WHEN GPA = . THEN ' '
                   WHEN GPA <= 2.0 THEN 'LOW'
                   WHEN GPA <= 3.0 THEN 'MEDIUM'
                   ELSE 'HIGH'
               END AS GPACAT
        FROM jlp.ucdavis1;
    QUIT;

What do you notice about this code that is different from the Data step?

Let's look at another example, this time focused on creating a new quantitative variable using a mathematical operator. Consider the UCDAVIS1 dataset again.

- Create a new dataset called UCTEST1.
- Create a new variable called “Leisure” which is the amount of TV time plus the amount of Computer time.
- Create a new variable that is 2x the sleep variable.
- Only retain those sitting in the front and the back.
- Sort the data by seat.

How would we do this without using SQL, and how would we do it using SQL?

Using a Data step, your code probably looks like this:

    Data UCTEST1 (keep = TV Computer Sleepx2 Seat Leisure);
        set jlp.ucdavis1;
        Leisure = (TV + Computer);
        Sleepx2 = Sleep * 2;
        If seat = "Middle" then delete;
    Run;

    Proc sort data = UCTEST1;
        by seat;
    Run;

Using SQL, your code probably looks like this:

    PROC SQL;
        CREATE TABLE work.UCTEST1 AS
        SELECT TV, Computer, Seat,
               (TV + Computer) AS Leisure,
               Sleep * 2 AS Sleepx2
        FROM jlp.ucdavis1
        WHERE SEAT IN ('Front', 'Back')
        ORDER BY SEAT;
    QUIT;

The general form of PROC SQL includes the following:

    PROC SQL;
        CREATE TABLE <new table> AS
        SELECT <variables>,
               CASE
                   WHEN <condition> THEN <value>
                   ELSE <value>
               END AS <new variable>
        FROM <input table>
        WHERE <condition>
        ORDER BY <variables>;
    QUIT;
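Here is a minimal, self-contained sketch that uses every piece of the general form at once. The dataset work.survey, its variables and its values are hypothetical and made up for illustration, so you can paste this into SAS without the course libraries.

```sas
/* Hypothetical data, made up for illustration. Dan's hours are missing. */
data work.survey;
    input name $ hours;
    datalines;
Ann 2
Bob 9
Cara 5
Dan .
;
run;

proc sql;
    create table work.survey_cat as
    select name,
           hours,
           /* SAS treats a missing numeric value as smaller than any number, */
           /* so the missing check has to come before the <= comparisons.    */
           case
               when hours = . then ' '
               when hours <= 3 then 'LOW'
               when hours <= 6 then 'MEDIUM'
               else 'HIGH'
           end as hourscat
    from work.survey
    where name ne 'Bob'
    order by hours;
quit;

proc print data=work.survey_cat;
run;
```

Try moving the missing-value check to the bottom of the CASE expression and see what category Dan ends up in.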

SQL Programming – Summary Statistics:

    Proc SQL syntax    Description
    AVG, MEAN          mean or average of values
    COUNT, FREQ, N     number of nonmissing values
    CSS                corrected sum of squares
    CV                 coefficient of variation (percent)
    MAX                largest value
    MIN                smallest value
    NMISS              number of missing values
    PRT                probability of a greater absolute value of Student's t
    RANGE              range of values
    STD                standard deviation
    STDERR             standard error of the mean
    SUM                sum of values
    SUMWGT             sum of the WEIGHT variable values
    T                  Student's t value for testing the hypothesis that the population mean is zero
    USS                uncorrected sum of squares
    VAR                variance

The above table can be found here
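A few of these summary functions in action, as a minimal sketch. The dataset work.grades and its values are hypothetical and made up for illustration. GROUP BY gives one row of statistics per group.

```sas
/* Hypothetical data, made up for illustration. One GPA is missing. */
data work.grades;
    input sex $ gpa;
    datalines;
F 3.5
F 2.9
M 3.1
M .
M 2.4
;
run;

proc sql;
    select sex,
           count(gpa) as n_gpa,
           nmiss(gpa) as n_missing,
           avg(gpa)   as mean_gpa format=4.2,
           std(gpa)   as sd_gpa   format=4.2,
           min(gpa)   as min_gpa,
           max(gpa)   as max_gpa,
           range(gpa) as range_gpa
    from work.grades
    group by sex;
quit;
```

Because there is no CREATE TABLE, the result is printed as a report instead of being saved as a dataset.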

Any Questions?