Haas MFE SAS Workshop Lecture 3:


Haas MFE SAS Workshop Lecture 3: Peng Liu http://faculty.haas.berkeley.edu/peliu/computing Haas School of Business, Berkeley, MFE 2006

SAS SQL

PROC SQL - What

What can SQL do?

- Selecting
- Ordering/sorting
- Subsetting
- Restructuring
- Creating tables/views
- Joining/merging
- Transforming variables
- Editing

Haas School of Business, Berkeley, MFE 2006. Peng Liu and Alexander Vedrashko

PROC SQL - Why

The advantages of using SQL:

- Combined functionality
- Faster for smaller tables
- SQL code is more portable to non-SAS applications
- Does not require presorting
- Does not require common variable names to join on (the join variables need the same type and length)

Selecting Data

    PROC SQL;
    SELECT DISTINCT rating FROM MFE.MOVIES;
    QUIT;

- The simplest SQL code needs three statements.
- By default the query result is printed; use the NOPRINT option to suppress this.
- Begin with PROC SQL and end with QUIT, not RUN.
- At least one SELECT … FROM statement is needed.
- DISTINCT is an option that removes duplicate rows.

Ordering/Sorting Data

    PROC SQL;
    SELECT *
    FROM MFE.MOVIES
    ORDER BY category;
    QUIT;

- Remember that line breaks within a SAS statement have no effect, so the middle statement can be split across three lines.
- SELECT * selects all variables from the dataset MFE.MOVIES.
- Put ORDER BY after FROM.
- The data is sorted by the variable "category".

Subsetting Data - Character searching in WHERE

    PROC SQL;
    SELECT title, category
    FROM MFE.MOVIES
    WHERE category CONTAINS 'Action';
    QUIT;

- Use a comma (,) to separate the selected variables.
- CONTAINS in the WHERE statement works only for character variables.
- Also try WHERE UPCASE(category) LIKE '%ACTION%';
- Use the wildcard character, the percent sign (%), with the LIKE operator.
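
A minimal sketch of how the two searches above differ, assuming a small hypothetical WORK.MOVIES table in place of MFE.MOVIES (the rows are invented for illustration). CONTAINS is case-sensitive, so it skips the lowercase value, while UPCASE combined with LIKE matches both:

```sas
/* Hypothetical stand-in for MFE.MOVIES so the example runs on its own */
DATA movies;
   INFILE DATALINES DSD;
   LENGTH title $30 category $20;
   INPUT title $ category $;
   DATALINES;
Jaws,Action Adventure
Rush Hour,action comedy
Casablanca,Drama
;
RUN;

PROC SQL;
   /* Case-sensitive: only 'Action Adventure' matches */
   SELECT title, category
   FROM movies
   WHERE category CONTAINS 'Action';

   /* Case-insensitive: both action rows match */
   SELECT title, category
   FROM movies
   WHERE UPCASE(category) LIKE '%ACTION%';
QUIT;
```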

Subsetting Data - Phonetic Matching in WHERE

    PROC SQL;
    SELECT title, category, rating
    FROM MFE.MOVIES
    WHERE category =* 'Drana';
    QUIT;

- Always put WHERE after FROM.
- =* is the sounds-like operator.
- The query searches the movie category for phonetic variations of "drama", which also helps catch spelling variations such as 'Drana'.
- This is another form of character searching.

Case Logic - reassigning/recategorizing

    PROC SQL;
    SELECT title, rating,
           CASE rating
              WHEN 'G' THEN 'General'
              ELSE 'Other'
           END AS level
    FROM MFE.MOVIES;
    QUIT;

- The order of the clauses is important.
- CASE … END AS should sit between SELECT and FROM.
- Note the comma (,) after the variables you want to select.
- Use WHEN … THEN … ELSE … to redefine values.
- AS names the recoded column "level"; it is derived from "rating".
- More complicated,
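
As a sketch of a slightly richer recoding than the one above, the searched form of CASE allows several WHEN conditions. The WORK.MOVIES table, its rows, and the label 'Parental guidance' are assumptions made only so the example runs on its own:

```sas
/* Hypothetical stand-in for MFE.MOVIES */
DATA movies;
   INFILE DATALINES DSD;
   LENGTH title $30 rating $5;
   INPUT title $ rating $;
   DATALINES;
Toy Story,G
Jaws,PG
Rocky,PG
Alien,R
;
RUN;

PROC SQL;
   /* Searched CASE: each WHEN carries its own condition,
      and the first condition that is true decides the value */
   SELECT title, rating,
          CASE
             WHEN rating = 'G' THEN 'General'
             WHEN rating IN ('PG', 'PG-13') THEN 'Parental guidance'
             ELSE 'Other'
          END AS level
   FROM movies;
QUIT;
```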

Creating New Data - Create Table

    PROC SQL;
    CREATE TABLE ACTION AS
    SELECT title, category
    FROM MFE.MOVIES
    WHERE category CONTAINS 'Action';
    QUIT;

- CREATE TABLE … AS can always be placed in front of a SELECT … FROM statement to build a SAS data file.
- This produces a new dataset (table) ACTION in the WORK library, with no printed output.
- With SELECT alone, the results of a query are converted to an output object (printed). Query results can also be stored as data: the CREATE TABLE statement creates a table with the results of a query, and the CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can be used in later SQL statements or in other SAS steps.

Creating New Data - Create View

    PROC SQL;
    CREATE VIEW G_MOVIES AS
    SELECT title, category, rating
    FROM MFE.MOVIES
    WHERE rating = 'G'
    ORDER BY title;
    SELECT * FROM G_MOVIES;
    QUIT;

- The first step creates a view and produces no output; the second step displays the desired results.
- Use a semicolon (;) to separate two blocks of code inside PROC SQL.
- The CREATE VIEW statement creates an SQL view. It is the same as the CREATE TABLE statement except that the word VIEW replaces the word TABLE.
- A view can be used the same way a table is used, but it does not operate the same way. When a table is created, the query is executed and the resulting data is stored in a file. When a view is created, the query itself is stored in the file; the data is not accessed at all in the process of creating a view.
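
The table-versus-view distinction can be sketched by changing the source data after both objects exist. The WORK.MOVIES table and the names g_table and g_view are hypothetical, chosen only to keep the example self-contained:

```sas
/* Hypothetical stand-in for MFE.MOVIES */
DATA movies;
   INFILE DATALINES DSD;
   LENGTH title $30 category $20 rating $5;
   INPUT title $ category $ rating $;
   DATALINES;
Toy Story,Animation,G
Jaws,Action Adventure,PG
Casablanca,Drama,PG
;
RUN;

PROC SQL;
   /* The table stores a copy of the query result as of now */
   CREATE TABLE g_table AS
   SELECT title, rating FROM movies WHERE rating = 'G';

   /* The view stores only the query text */
   CREATE VIEW g_view AS
   SELECT title, rating FROM movies WHERE rating = 'G';

   /* Add one more G-rated movie to the source data */
   INSERT INTO movies
      SET title = 'Bambi', category = 'Animation', rating = 'G';

   /* The stored table still holds the single original G row */
   SELECT COUNT(*) AS n_table FROM g_table;

   /* The view re-runs its query and now sees both G rows */
   SELECT COUNT(*) AS n_view FROM g_view;
QUIT;
```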

Join Tables (Merge datasets) - Cartesian Join

    PROC SQL;
    SELECT *
    FROM MFE.CUSTOMERS, MFE.MOVIES;
    QUIT;

- Terminology: joining tables corresponds to merging datasets.
- No prior sorting is required, which is one advantage over a DATA step MERGE.
- Use a comma (,) to separate the two datasets in FROM.
- Without WHERE, all possible combinations of rows from the two tables are produced, and all columns are included.
- Turn on the HTML result option for better display: Tools / Options / Preferences… / Results / check Create HTML / OK.

Join Tables (Merge datasets) - Inner Join using WHERE

    PROC SQL;
    SELECT *
    FROM MFE.MOVIES, MFE.ACTORS
    WHERE MOVIES.title = ACTORS.title;
    QUIT;

- Use WHERE to specify the connecting columns (title): table1.matchvar = table2.matchvar.
- This produces the rows that have the same movie title in both tables.
- The matching variables can have different names in the two datasets.
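
A small sketch of joining on variables with different names. The WORK tables and the column name movie_title are hypothetical; in the workshop data both tables call the variable title:

```sas
/* Hypothetical stand-ins for MFE.MOVIES and MFE.ACTORS */
DATA movies;
   INFILE DATALINES DSD;
   LENGTH title $30 rating $5;
   INPUT title $ rating $;
   DATALINES;
Jaws,PG
Casablanca,PG
Toy Story,G
;
RUN;

DATA actors;
   INFILE DATALINES DSD;
   LENGTH movie_title $30 actor_leading $30;
   INPUT movie_title $ actor_leading $;
   DATALINES;
Jaws,Roy Scheider
Casablanca,Humphrey Bogart
;
RUN;

PROC SQL;
   /* The join columns are named differently but have the same type */
   SELECT movies.title, movies.rating, actors.actor_leading
   FROM movies, actors
   WHERE movies.title = actors.movie_title;
QUIT;
```

Only the movies that appear in both tables are returned, so Toy Story is left out of this inner join.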

Join Tables (Merge datasets) - Inner Join using WHERE (Cont.)

    PROC SQL;
    SELECT M.title, M.rating, A.actor_leading
    FROM MFE.MOVIES M, MFE.ACTORS A
    WHERE M.title = A.title;
    QUIT;

- M and A are aliases, shortcuts for the table names.
- Aliases can be used in the SELECT and WHERE statements.
- They need to be declared in the FROM statement.

Join Tables (Merge datasets) - Join three tables

    PROC SQL;
    SELECT C.cust_no, M.title, M.rating, M.category, A.actor_leading
    FROM MFE.CUSTOMERS C, MFE.MOVIES2 M, MFE.ACTORS A
    WHERE C.cust_no = M.cust_no AND M.title = A.title;
    QUIT;

- Use AND in the WHERE statement to specify two matching conditions.
- This produces the rows that satisfy all the conditions.
- Note: this example uses MOVIES2.
- Up to 32 tables can be joined in one SQL query.

Join Tables (Merge datasets) - Inner Joins using ON

    PROC SQL;
    SELECT M.title, rating, actor_leading
    FROM MFE.MOVIES M INNER JOIN MFE.ACTORS A
    ON M.title = A.title;
    QUIT;

- This gives the same result as the WHERE version.
- WHERE is used to select rows from inner joins.
- ON is used to select rows from outer or inner joins.

Join Tables (Merge datasets) - Left Outer Joins

    PROC SQL;
    SELECT MOVIES.title, actor_leading, rating
    FROM MFE.MOVIES LEFT JOIN MFE.ACTORS
    ON MOVIES.title = ACTORS.title;
    QUIT;

- The output contains all rows for which the SQL expression in the ON clause matches both tables, plus all rows from the LEFT table (MOVIES) that did not match any row in the right table (ACTORS).
- Essentially, the rows from the LEFT table are preserved exactly as they are stored in the table itself, regardless of whether a match exists.
- The matching variable (title) must be qualified with a table name in SELECT.

Join Tables (Merge datasets) - Right Outer Joins

    PROC SQL;
    SELECT ACTORS.title, actor_leading, rating
    FROM MFE.MOVIES RIGHT JOIN MFE.ACTORS
    ON MOVIES.title = ACTORS.title;
    QUIT;

- The output contains all rows for which the SQL expression in the ON clause matches both tables, plus all rows from the RIGHT table (ACTORS) that did not match any row in the left table (MOVIES).
- Essentially, the rows from the RIGHT table are preserved exactly as they are stored in the table itself, regardless of whether a match exists.
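
To see which rows each outer join preserves, here is a minimal sketch with one unmatched row on each side. The WORK tables are hypothetical stand-ins for MFE.MOVIES and MFE.ACTORS:

```sas
/* Hypothetical stand-ins; Toy Story has no actor row, Rocky has no movie row */
DATA movies;
   INFILE DATALINES DSD;
   LENGTH title $30 rating $5;
   INPUT title $ rating $;
   DATALINES;
Jaws,PG
Casablanca,PG
Toy Story,G
;
RUN;

DATA actors;
   INFILE DATALINES DSD;
   LENGTH title $30 actor_leading $30;
   INPUT title $ actor_leading $;
   DATALINES;
Jaws,Roy Scheider
Casablanca,Humphrey Bogart
Rocky,Sylvester Stallone
;
RUN;

PROC SQL;
   /* LEFT JOIN keeps every MOVIES row:
      Toy Story appears with a missing actor_leading */
   SELECT movies.title, actor_leading, rating
   FROM movies LEFT JOIN actors
   ON movies.title = actors.title;

   /* RIGHT JOIN keeps every ACTORS row:
      Rocky appears with a missing rating */
   SELECT actors.title, actor_leading, rating
   FROM movies RIGHT JOIN actors
   ON movies.title = actors.title;
QUIT;
```

Selecting the title from the preserved table (MOVIES for the left join, ACTORS for the right join) is what keeps the title filled in for the unmatched rows.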

Join Tables (Concatenating) - Outer Union

    PROC SQL;
    SELECT * FROM MFE.CUSTOMERS
    OUTER UNION
    SELECT * FROM MFE.MOVIES;
    QUIT;

- SQL performs an OUTER UNION, similar to a DATA step with a SET statement that concatenates datasets.
- The result contains all the rows produced by the first table-expression followed by all the rows produced by the second table-expression.
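
When the two tables share column names, a plain OUTER UNION does not overlay them. Adding the CORR keyword aligns like-named columns, which is closer to what SET does in a DATA step. A minimal sketch, assuming two hypothetical WORK tables with the same columns:

```sas
/* Two hypothetical tables with identical column names */
DATA list_a;
   INPUT title :$30. year;
   DATALINES;
Jaws 1975
Casablanca 1942
;
RUN;

DATA list_b;
   INPUT title :$30. year;
   DATALINES;
Rocky 1976
;
RUN;

PROC SQL;
   /* Plain OUTER UNION: columns are not overlaid,
      so title and year each appear twice in the output */
   SELECT * FROM list_a
   OUTER UNION
   SELECT * FROM list_b;

   /* OUTER UNION CORR: like-named columns are aligned,
      giving one title column and one year column */
   SELECT * FROM list_a
   OUTER UNION CORR
   SELECT * FROM list_b;
QUIT;
```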

Transforming Data - Creating new Variables

    /* Creating new variables */
    PROC SQL;
    SELECT title, length, category, year, rating,
           2006-year AS age
    FROM MFE.MOVIES;
    QUIT;

- You can create new variables within the SELECT statement; the name of the new variable follows AS.
- Note that the order is reversed compared with an assignment: the expression comes first and the new name comes last.

Transforming Data - Summarizing Data using SQL functions

    PROC SQL;
    SELECT *,
           COUNT(title) AS notitle,
           MAX(year) AS most_recent,
           MIN(year) AS earliest,
           SUM(length) AS total_length,
           NMISS(rating) AS nomissing
    FROM MFE.MOVIES
    GROUP BY rating;
    QUIT;

- Simple summarization functions are available.
- All of these functions can be applied within groups (GROUP BY).
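
Because the query above selects * together with summary functions, SAS remerges the group statistics back onto every original row. If one row per rating is wanted instead, a sketch like the following selects only the grouping variable and the summaries. The WORK.MOVIES table and its values are hypothetical:

```sas
/* Hypothetical stand-in for MFE.MOVIES; values are illustrative only */
DATA movies;
   INFILE DATALINES DSD;
   LENGTH title $30 rating $5;
   INPUT title $ rating $ year length;
   DATALINES;
Jaws,PG,1975,124
Casablanca,PG,1942,102
Toy Story,G,1995,81
;
RUN;

PROC SQL;
   /* Only the GROUP BY variable and summary functions are selected,
      so the result has one row per rating */
   SELECT rating,
          COUNT(title) AS notitle,
          MAX(year)    AS most_recent,
          MIN(year)    AS earliest,
          SUM(length)  AS total_length
   FROM movies
   GROUP BY rating;
QUIT;
```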

Editing Data - Insert observations

    PROC SQL NOPRINT;
    INSERT INTO MFE.CUSTOMERS
       VALUES(1, 'Peng');
    INSERT INTO MFE.CUSTOMERS
       SET Cust_no=2, Name='Sasha';
    QUIT;

- There are two ways of inserting observations into a table. In both, the data types should match those of the columns.
- VALUES( ): the new values are listed in column order, separated by commas.
- SET: column name = newly assigned value pairs, delimited by commas.
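
Both insert forms can be tried safely on a scratch table. This sketch assumes a hypothetical WORK.CUSTOMERS table with the same two columns as MFE.CUSTOMERS; the extra names are invented:

```sas
PROC SQL;
   /* Hypothetical scratch table mirroring the CUSTOMERS columns */
   CREATE TABLE customers (cust_no NUM, name CHAR(20));

   /* Form 1: values listed in column order */
   INSERT INTO customers
      VALUES (1, 'Peng');

   /* Form 2: explicit column = value pairs */
   INSERT INTO customers
      SET cust_no = 2, name = 'Sasha';

   /* Several rows can be added with repeated VALUES clauses */
   INSERT INTO customers
      VALUES (3, 'Ann')
      VALUES (4, 'Bob');

   SELECT * FROM customers;
QUIT;
```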

Editing Data - Deleting rows and Dropping columns

    /* Deleting rows */
    PROC SQL;
    DELETE FROM MFE.MOVIES
    WHERE length LE 100;
    QUIT;

    /* Dropping variables */
    PROC SQL;
    CREATE TABLE NEW (DROP=rating) AS
    SELECT * FROM MFE.MOVIES;
    QUIT;

- Columns can be dropped either by leaving them out of SELECT or with the DROP= option on the created table.
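
The two ways of dropping a column can be compared side by side. In this sketch the WORK.MOVIES table and the output names new_select and new_drop are hypothetical:

```sas
/* Hypothetical stand-in for MFE.MOVIES; values are illustrative only */
DATA movies;
   INFILE DATALINES DSD;
   LENGTH title $30 rating $5;
   INPUT title $ rating $ year length;
   DATALINES;
Jaws,PG,1975,124
Toy Story,G,1995,81
;
RUN;

PROC SQL;
   /* Option 1: list only the columns to keep in SELECT */
   CREATE TABLE new_select AS
   SELECT title, year, length
   FROM movies;

   /* Option 2: select everything, then drop with the DROP= dataset option */
   CREATE TABLE new_drop (DROP=rating) AS
   SELECT * FROM movies;
QUIT;
```

Both tables end up with title, year and length only. The DROP= form is shorter when a table has many columns and only a few need to go.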

Editing Data - Update observations

    /* Updating observations */
    PROC SQL NOPRINT;
    UPDATE MFE.CUSTOMERS
    SET Name='Liu'
    WHERE Cust_no=1;
    QUIT;

- The pattern is UPDATE … SET … WHERE.
- It finds the observation and sets the new value.
- If more than one observation satisfies the condition, all of them are updated with the new data in the SET statement.