Presentation is loading. Please wait.

Presentation is loading. Please wait.

SAS Workshop SAS Data Management Hun Myoung Park, Ph.D.

Similar presentations


Presentation on theme: "SAS Workshop SAS Data Management Hun Myoung Park, Ph.D."— Presentation transcript:

1 SAS Workshop SAS Data Management Hun Myoung Park, Ph.D.
University Information Technology Services Center for Statistical and Mathematical Computing Saturday, September 22, 2018 © The Trustees of Indiana University (812) , (317)

2 Outline INPUT Statement: Input Styles INFILE Statement
SAS Data Management September 22, 2018 Outline INPUT Statement: Input Styles INFILE Statement PROC IMPORT/EXPORT Data Manipulation IF…ELSE & DO…END Clauses Recode Variables LABEL and RENAME Variables Select Observations & Variables Working with Data Sets University Information Technology Services Center for Statistical and Mathematical Computing

3 SAS DATA SOURCES University Information Technology Services
SAS Data Management September 22, 2018 SAS DATA SOURCES University Information Technology Services Center for Statistical and Mathematical Computing

4 DATA STEP DATA steps read and manipulate data.
SAS Data Management September 22, 2018 DATA STEP DATA steps read and manipulate data. INPUT tells how SAS reads data. DATALINES indicates data items to be read begin from the next line. INFILE reads data from an external file. LIBNAME sm ‘c:\temp\sas’; DATA sm.grade0; INPUT name $ id stat math; DATALINES; John RUN; University Information Technology Services Center for Statistical and Mathematical Computing

5 INPUT STATEMENT There are six input styles.
SAS Data Management September 22, 2018 INPUT STATEMENT There are six input styles. List INPUT: simply list variable names Modified list INPUT: for ill-organized data Column INPUT: location of data items Formatted INPUT: format of data items Named INPUT: variable=data Mixed INPUT: combine input styles SAS recognizes styles automatically. List (previous slide), column, and formatted INPUT are commonly used. University Information Technology Services Center for Statistical and Mathematical Computing

6 COLUMN INPUT Specify column locations of data items
SAS Data Management September 22, 2018 COLUMN INPUT Specify column locations of data items $ follows a character variable name Data should be aligned correctly Some data may be skipped (id below) DATA sm.grade1; INPUT name $ 1-6 stat math 15-17; DATALINES; John Rachel RUN; University Information Technology Services Center for Statistical and Mathematical Computing

7 FORMATTED INPUT Specify formats of data items.
SAS Data Management September 22, 2018 FORMATTED INPUT Specify formats of data items. Data should be aligned correctly. $6. means a string 6 characters long. 3.0 or 3. means a 3 digit number without a decimal point. DATA sm.grade2; INPUT name $6. id 5. male 3. stat 3.0; DATALINES; John Rachel University Information Technology Services Center for Statistical and Mathematical Computing

8 COLUMN CONTROLS @n moves the input pointer to the nth column.
SAS Data Management September 22, 2018 COLUMN CONTROLS @n moves the input pointer to the nth column. +n shifts the input pointer by n column to the right. You may skip some variables (e.g., id) INPUT name $6. +5 stat math 3.; DATALINES; John Rachel University Information Technology Services Center for Statistical and Mathematical Computing

9 LINE CONTROLS @ and @@ trailing hold input records
SAS Data Management September 22, 2018 LINE CONTROLS @ and trailing hold input records INPUT name $ stat math DATALINES; John Ron #n moves the input pointer to nth line, / moves the input pointer to the next line INPUT name $6. #2 stat 3. / math 3.; John 89 95 University Information Technology Services Center for Statistical and Mathematical Computing

10 SAS Data Management September 22, 2018 USING INFORMAT INFORMAT defines a format of data entry and FORMAT for data output COMMA and PERCENT INPUT income COMMA9. saving PERCENT3.; DATALINES; 170,200 89% 92,540 95% MMDDYY and DOLLAR INPUT dates MMDDYY10. sales DOLLAR12.0; 10/01/ $343,750 10/05/2009 $9,784,650 University Information Technology Services Center for Statistical and Mathematical Computing

11 SAS Data Management September 22, 2018 INFILE STATEMENT 1 INFILE specifies an external data file that is created in advance. INFILE ‘c:\temp\sas\grade2.txt’; INPUT name $6. id 5. stat math (3.); If the data file contains missing values, add the MISSOVER option to INFILE. The MISSOVER option treats all remaining variables without values as missing values. INFILE ‘c:\temp\sas\grade2.txt’ MISSOVER; University Information Technology Services Center for Statistical and Mathematical Computing

12 SAS Data Management September 22, 2018 INFILE STATEMENT 2 Alternatively, use FILENAME to make an alias of an external file. Like LIBNAME, FILENAME makes it easy to referring a file name especially when the name and path are long. FILENAME gFile ‘c:\temp\grade2.txt’; DATA sm.grade2; INFILE gFile MISSOVER; INPUT name $6. id 5. stat math (3.); RUN; University Information Technology Services Center for Statistical and Mathematical Computing

13 SAS Data Management September 22, 2018 INFILE STATEMENT 3 DLM specifies a delimiter. More than one delimiter can be used. e.g., *iu* DSD (Delimiter Sensitive Data) assumes a comma delimiter and is needed for missing values INFILE DATALINES DLM=’,’ DSD; INPUT name $ id stat math; DATALINES; John,10091,89,95 Rachel,10100,,99 University Information Technology Services Center for Statistical and Mathematical Computing

14 SAS Data Management September 22, 2018 READ ODD STRINGS What if a string variable is long and/or contain commas? Use ampersand (&) INFILE DATALINES DLM=‘,’ DSD; INPUT name : $20. title & $100.; DATALINES; Lindblom,”Still Muddling, Not Yet Through” Park, “Reading “”Small is Beautiful””” How to read quotation marks in a string? Use tilde(~) INPUT name : $20 title ~ $100.; University Information Technology Services Center for Statistical and Mathematical Computing

15 SAS Data Management September 22, 2018 READ THROUGH NETWORK SAS can read external data files through TCP/IP (http and ftp) FILENAME masil URL ' DATA sm.grade2; INFILE masil FIRSTOBS=1; INPUT name $6. id 5. stat math (3.); URL denotes HTTP (protocol) FIRSTOBS=1 tells SAS to read observations from the 1st row University Information Technology Services Center for Statistical and Mathematical Computing

16 VIEW A DATA SET PROC CONTENTS displays data structures of a data set.
SAS Data Management September 22, 2018 VIEW A DATA SET PROC CONTENTS displays data structures of a data set. PROC CONTENTS DATA=sm.grade2; RUN; PROC MEANS displays summary statistics of a data set. PROC MEANS DATA=sm.grade2; RUN; PROC PRINT displays values of variables in a data set. PROC PRINT DATA=sm.grade2; VAR stat math; University Information Technology Services Center for Statistical and Mathematical Computing

17 SAS Data Management September 22, 2018 PROC IMPORT 1 PROC IMPORT reads external files such as CSV, Excel, dBase, and Access. FILENAME airline URL ' PROC IMPORT DATAFILE=‘airline’ OUT=sm.airline DBMS=CSV; GETNAMES=YES; DATAROW=2; DBMS specifies the format of the file. IMPORT Wizard is more user-friendly in MS- Windows. Click File->Import Data. University Information Technology Services Center for Statistical and Mathematical Computing

18 PROC IMPORT 2 An example of reading an Excel file.
SAS Data Management September 22, 2018 PROC IMPORT 2 An example of reading an Excel file. PROC IMPORT DATAFILE="c:\airline.xls“ OUT=sm.airline2 DBMS=EXCEL2000; SHEET=“sheet1"; GETNAMES=YES; SHEET specifies a worksheet with a range of data to be read in an Excel file. SHEET=“sheet1$A1:G91"; GETNAMES=YES reads variable names from the file. CAUTION: Excel 2007 file should be saved as 2003 format for SAS 9.13. University Information Technology Services Center for Statistical and Mathematical Computing

19 SAS Data Management September 22, 2018 PROC EXPORT Export to a CSV file. DATA specifies a data set to be exported. PROC EXPORT DATA=sm.airline OUTFILE="c:\temp\airline.csv“ DBMS=CSV REPLACE; Export to an Excel file. REPLACE overwrites the existing data set. DATA=sm.ariline OUTFILE="c:\temp\airline2.xls“ DBMS=EXCEL2000 REPLACE; RUN; University Information Technology Services Center for Statistical and Mathematical Computing

20 DATA MANIPULATION Data sets are manipulated in DATA steps
SAS Data Management September 22, 2018 DATA MANIPULATION Data sets are manipulated in DATA steps SET statement loads existing SAS data sets DATA sm.grade7; SET sm.grade2; sum1 = stat+math; sum2 = SUM(stat, math); mean1 = (stat+math)/2; mean2 = MEAN(stat, math); RUN; The result is stored into grade7 without changing anything in grade2 University Information Technology Services Center for Statistical and Mathematical Computing

21 IF…THEN CLAUSE IF… THEN IF… THEN… ELSE IF… THEN… ELSE IF…
SAS Data Management September 22, 2018 IF…THEN CLAUSE IF… THEN IF mean1 GT 90 THEN DELETE; IF sum1 >= 98 THEN grade=‘A+’; IF (stat>90 AND math>90) THEN grade=‘A’; IF… THEN… ELSE IF mean2 >= 98 THEN grade=‘A+’; ELSE grade=‘A’; IF… THEN… ELSE IF… ELSE IF mean2>=92 THEN grade=‘A’; ELSE IF mean2>=88 THEN grade=‘A-’; ELSE grade=‘B’; University Information Technology Services Center for Statistical and Mathematical Computing

22 DO LOOP DO…TO… END DO num= 4, 6, 9, 10; …; END;
SAS Data Management September 22, 2018 DO LOOP DO…TO… END DO i=1 TO 100 BY 1; random=RANNOR( ); OUTPUT sm.randomtest; END; DO num= 4, 6, 9, 10; …; END; DO str=‘Mon’, ‘Tue’, ‘Wed’; …; END; DO WHILE (n<=10); …; END; DO OVER for an array variable DO OVER arr; …; END; University Information Technology Services Center for Statistical and Mathematical Computing

23 SAS Data Management September 22, 2018 IF & DO… END This combination executes a list of statements in a block. Imagine { and } in C and most computer languages. IF mean2 GT 90 THEN DO; …; END; ELSE DO; University Information Technology Services Center for Statistical and Mathematical Computing

24 SAS Data Management September 22, 2018 INPUT & DO… END This combination enables you to read experiment data in an easy manner. INPUT drug $ DO level=‘High’, ‘Medium’, ‘Low’; INPUT symptom OUTPUT; END; DATALINES; Drug Drug Drug University Information Technology Services Center for Statistical and Mathematical Computing

25 RECODE VARIABLES 1 Use operators and functions
SAS Data Management September 22, 2018 RECODE VARIABLES 1 Use operators and functions DATA sm.grade8; SET sm.grade2; random=RANNOR( ); id=_N_; /* serial observation number */ IF… THEN and Special Usage IF age <= 20 THEN age2=0; ELSE IF age<=30 THEN age2=1; ELSE IF age<=40 THEN age2=2; ELSE age2=3; Age2 = (age>20)+(age>30)+(age>40); University Information Technology Services Center for Statistical and Mathematical Computing

26 SAS Data Management September 22, 2018 RECODE VARIABLES 2 ARRAY links between variables and corresponding arrays (temporary vars.) Suppose trust has 1 through 5 and needs reverse recoding. ARRAY arr trust; DO OVER arr; arr=6-arr; END; ARRAY arr(5) q1-q5; DO i=1 TO 5; arr(i)=6-arr(i); University Information Technology Services Center for Statistical and Mathematical Computing

27 SAS Data Management September 22, 2018 MULTIPLE RESPONSE Suppose respondents are asked to choose three items and their choices are stored in the wide form from c1, c2, c3, respectively. For frequency analysis, data need to be rearranged in to the long form using OUTPUT. DATA sm.patient2; SET sm.patient; choice=c1; OUTPUT; choice=c2; OUTPUT; choice=c3; OUTPUT; RUN; The number of observations becomes N*3 . University Information Technology Services Center for Statistical and Mathematical Computing

28 RENAME VARIABLES Use the DATA statement in DATA Step.
SAS Data Management September 22, 2018 RENAME VARIABLES Use the DATA statement in DATA Step. “old variable= new variable” DATA sm.grade9( RENAME=(sum1=total1) ); SET sm.grade7; RUN; Use the RENAME statement in a DATA step. “old variable= new variable” DATA sm.grade10; RENAME sum2=total2 mean2=average2; University Information Technology Services Center for Statistical and Mathematical Computing

29 LABEL VARIABLES Variable labels
SAS Data Management September 22, 2018 LABEL VARIABLES Variable labels SET sm.grade2; LABEL stat=‘Statistics’ math=‘Math’; Value labels using PROC FORMAT. PROC FORMAT; VALUE yes_L 1=‘Yes’ 0=‘No’; VALUE $male_L ‘M’=‘Male’ ‘F’=‘Female’; A period in label name ($male_L.) may not be omitted in a FORMAT statement. DATA sm.survey2; SET sm.survey1; FORMAT trust yes_L. male $male_L.; University Information Technology Services Center for Statistical and Mathematical Computing

30 SELECT OBSERVATIONS Use the IF clause
SAS Data Management September 22, 2018 SELECT OBSERVATIONS Use the IF clause Keep only obs that meet the condition IF mean2 >= 90; IF (stat GE 80 OR math GE 80); Delete obs that meet the condition IF mean2 LT 80 THEN DELETE; IF (stat<80 AND math <80) THEN DELETE; WHERE in DATA/PROC uses a subset of a data set w/o changing the data set WHERE mean2 GE 90; University Information Technology Services Center for Statistical and Mathematical Computing

31 SAS Data Management September 22, 2018 SELECT VARIABLES Use either KEEP or DROP statement; You may not use both in a DATA step. DATA sm.grade15; SET sm.grade7; DROP sum1 mean1; RUN; In a DATA statement. Do not omit =. DATA sm.grade16 (KEEP=sum2 mean2); SET grade7; DATA sm.grade17 (DROP=sum1 mean1); University Information Technology Services Center for Statistical and Mathematical Computing

32 APPEND DATA SETS Use SET with a list of data sets to be appended.
SAS Data Management September 22, 2018 APPEND DATA SETS Use SET with a list of data sets to be appended. DATA sm.grade20; SET sm.grade12 sm.grade13; You may use PROC APPEND or PROC DATASETS when the master data set is huge. PROC APPEND BASE=sm.grade12 DATA=sm.grade13; RUN; BASE= specifies a master data set and DATA= a data set to be appended. University Information Technology Services Center for Statistical and Mathematical Computing

33 SPLIT DATA SETS Use DATA statement. Do not omit = after DROP and KEEP.
SAS Data Management September 22, 2018 SPLIT DATA SETS Use DATA statement. Do not omit = after DROP and KEEP. DATA sm.grade21 (DROP=math) sm.grade22(DROP=stat); SET sm.grade2; Use OUTPUT statement in IF… THEN… ELSE clause. DATA … SET sm.grade7; IF mean2 GT 90 THEN OUTPUT sm.gradeHigh; ELSE OUTPUT sm.gradeLow; University Information Technology Services Center for Statistical and Mathematical Computing

34 SAS Data Management September 22, 2018 SORT DATA SETS Sort data sets before using BY in a DATA/PROC step (e.g., match merge) PROC SORT DATA=sm.grade21 OUT=grade23; BY id; PROC SORT by default sorts in a ascending order. For a descending order, add DESCENDING before the variable name PROC SORT DATA=sm.grade22 OUT=sm.grade24; BY id DESCENDING mean2; RUN; University Information Technology Services Center for Statistical and Mathematical Computing

35 MERGE DATA SETS Simple merge using MERGE.
SAS Data Management September 22, 2018 MERGE DATA SETS Simple merge using MERGE. DATA sm.grade25; MERGE sm.grade21 sm.grade24; Match merge using MERGE and BY. DATA sm.grade26; MERGE sm.grade23 sm.grade24; BY id; Be careful when data structure is different across data sets to be merged. Can be used for updating data sets. University Information Technology Services Center for Statistical and Mathematical Computing

36 SAS Data Management September 22, 2018 UPDATE DATA SETS UPDATE updates a data set using a transaction data set. Data sets need to be sorted in advance. DATA sm.surveyNew; UPDATE sm.survey08 sm.survey10 UPDATEMODE=MISSINGCHECK; BY id; MISSINGCHECK prevents missing in the transaction file from replacing values in a master file. University Information Technology Services Center for Statistical and Mathematical Computing

37 SAS Data Management September 22, 2018 GET AGGREGATE DATA PROC MEANS generates aggregate statistics into a data set. PROC MEANS DATA=sm.airline; VAR cost output; CLASS airline; OUTPUT OUT=sm.air_mean1 MEAN=costM outputM; Variables costM and outputM are stored into a new data set air_mean1. You may specify other statistics as well. PROC MEANS DATA=sm.airline N SUM MEAN VAR; VAR cost output fuel load; OUTPUT OUT=sm.air_mean2; University Information Technology Services Center for Statistical and Mathematical Computing

38 PROC DATASETS Display data sets in the library of interest.
SAS Data Management September 22, 2018 PROC DATASETS Display data sets in the library of interest. PROC DATASETS LIBRARY=sm; RUN; The procedure can also manage (e.g., copy, rename, repair, delete) SAS files. University Information Technology Services Center for Statistical and Mathematical Computing

39 SAS Data Management September 22, 2018 RESOURCES Burlew, Michele M Reading External Data Files Using SAS: Examples Handbook. Cary, NC: SAS Institute. Long, J. Scott The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press. University Information Technology Services Center for Statistical and Mathematical Computing


Download ppt "SAS Workshop SAS Data Management Hun Myoung Park, Ph.D."

Similar presentations


Ads by Google