Tamara Arenovich Tony Panzarella


1 Introduction to SAS
Tamara Arenovich, Tony Panzarella

2 I. OBJECTIVES
This session is intended to introduce you to SAS: what it is, how it works, and how you will use it. The focus of this session is on the SAS programming essentials needed to get you started on your SAS session. We will also cover some basic descriptive statistics.
II. WHAT IS SAS?
The acronym SAS stands for Statistical Analysis System. Simply put, it is a software program that allows you to analyze large amounts of data quickly. It works by having you tell it what to do through a sequence of steps (or commands). Through this sequence of steps, four major tasks are generally performed:
1. Data access
2. Data management
3. Data analysis
4. Data presentation

3 III. EXPLORING THE SAS ENVIRONMENT (IN WINDOWS) A. Enhanced Editor
[I] Syntax Rules: SAS programs must be written following syntax rules. SAS statements usually begin with a keyword and always end with a semicolon. SAS statements are not case-sensitive, except inside quotation marks: sex = 'm'; is the same as SEX = 'm'; but sex = 'm'; is not the same as sex = 'M';

4 [II] Steps in a SAS Program
There are two types of steps in a SAS program: DATA steps PROC steps A SAS program can contain any number of DATA and PROC steps Examples:
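
The example that followed on the original slide did not survive in the transcript. As a minimal sketch of the two step types, the program below has one DATA step and one PROC step. The dataset name demo and its values are made up for illustration.

```sas
/* DATA step: builds a small temporary dataset (WORK.DEMO, a made-up name) */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
3 1 62
;
run;

/* PROC step: prints the dataset created above */
proc print data=demo;
run;
```

Each step ends at its RUN statement, so the two steps can be highlighted and submitted separately.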

5 [III] Running (Submitting) SAS Programs
Running your SAS program: Until you’re sure your SAS program is completely error-free, submitting only sections of your SAS code at a time is preferred

6 B. SAS Log
After submitting a SAS program, the SAS log contains information about the processing of the SAS program, including any error or warning messages.
In the SAS log, you should always see:
  The SAS statements
  NOTES
You might also see:
  WARNINGS
  ERRORS
Note: Always read your SAS log after running a SAS program.

7 C. SAS Results Viewer / Output
As a general rule, PROC steps generate output, DATA steps do not. The results of your PROC steps can be viewed in the Results Viewer window. Earlier versions of SAS (e.g. 9.1, 9.2) print results to the Output window by default; in those versions, ODS commands are required to make the output look nicer and to generate graphics. Your practicum sites may be working with earlier versions of SAS.

8

9 D. SAS Files
All of your SAS programs, SAS datasets, SAS log, and SAS output can be saved:
Type of File    File Extension
SAS program     .sas
SAS log         .log
SAS output      .mht (default), .lst (earlier versions)
SAS dataset     .sas7bdat

10 E. Results Window
Lists all the reports that appear in the Output window. You can also use the Results window to jump through your output for easy navigation.
F. Explorer Window
The Explorer window allows you to browse SAS libraries and SAS datasets.

11 IV. THE FOUR MAJOR TASKS 1. Data access Data management Data analysis
Data presentation

12 Task 1: Data Access (Reading your data into SAS)
[I] SAS Libraries
All SAS files follow a 2-part naming system: libref.filename. A libref is a short name that refers to a directory (a physical location) on your computer. To define the libref component, we use a LIBNAME statement (code) OR the New Library window (point & click).
Example: libname lunl7 'C:\Users\projects\Data';
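
As a rough sketch of how a libref is used once it is defined, the code below assigns the libref from the slide, lists the datasets in that folder, and then clears the libref. The path is the one shown on the slide and has to exist on your machine for the statements to succeed.

```sas
/* Assign the libref; change the path to a folder that exists on your computer */
libname lunl7 'C:\Users\projects\Data';

/* List the SAS datasets stored in that library (NODS keeps the listing short) */
proc contents data=lunl7._all_ nods;
run;

/* Remove the libref when finished; the files themselves are not deleted */
libname lunl7 clear;
```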

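13 When a library is created through the New Library window, there is no record of the library reference in your program or log file.
If the "Enable at startup" box is checked, SAS will remember this library designation across sessions.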

14 Permanent SAS Libraries:
Libraries that you create are known as permanent SAS libraries. You may create as many permanent SAS libraries as you wish. Rules for naming your permanent library (the LIBREF): Must be 1 to 8 characters. Must begin with a letter or underscore; the remaining characters can be letters, numbers or underscores. Temporary SAS library: Known as the WORK library. If no libref is specified, WORK is assumed. Datasets in WORK are deleted when the SAS session ends.

15 [II] Reading a SAS Dataset into SAS
Assign a library reference that refers to the directory on your computer where the SAS dataset is saved. No CARDS statement is required because the data is already in SAS format; the SET statement is used instead.
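
A minimal sketch of this pattern, assuming a dataset called data1 is stored in the folder the libref points to. The libref yoga and the dataset name data1 are borrowed from a later slide; the path and the output name data1_copy are hypothetical.

```sas
/* Point the libref at the folder that holds the .sas7bdat file (hypothetical path) */
libname yoga 'C:\Users\projects\Data';

/* Read the permanent dataset into a temporary copy in WORK */
data work.data1_copy;
    set yoga.data1;
run;
```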

16 [III] Import/Export Wizard
The Import/Export wizard guides you through the importing or exporting process You can import your data from a variety of data sources (e.g. Excel, Access, SPSS, Stata), but make sure your data is structured appropriately prior to importing
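
The same import can be written as code with PROC IMPORT. The sketch below assumes a comma-separated file with variable names in its first row; the file path and the output dataset name are hypothetical.

```sas
/* Import a CSV file into a temporary SAS dataset */
proc import datafile='C:\Users\projects\Data\baseline.csv'  /* hypothetical file */
    out=work.baseline   /* name of the SAS dataset to create */
    dbms=csv            /* type of the source file */
    replace;            /* overwrite WORK.BASELINE if it already exists */
    getnames=yes;       /* take variable names from the first row */
run;
```

Unlike the point-and-click wizard, this leaves a record of the import in your program and log.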

17

18 Data access 2. Data management Data analysis Data presentation

19 Task 2: Data Management
In Task 2, you are modifying the current SAS dataset and turning it into a new SAS dataset that is appropriate for analysis. All of this cleaning is performed in the DATA step.
[I] Naming a SAS Dataset
All SAS datasets have two-level names: libref.filename. The filename (the dataset name):
  Can be up to 32 characters long
  Is not case-sensitive
  Must begin with a letter or underscore. Subsequent characters can be letters, underscores or numbers. Special characters (e.g., #) are not allowed.

20 Examples of valid SAS dataset names:
  baseline
  baseline1
  _baseline
Examples of non-valid SAS dataset names:
  base line (cannot have spaces)
  baseline#1 (# is not a valid character)
  1baseline (cannot begin with a number)

21 [II] Viewing Contents of SAS Dataset
Use PROC CONTENTS to view descriptive information about the contents of your dataset. Use PROC PRINT to view the actual data. Within PROC PRINT, use the VAR statement to specify the variables to be displayed and the WHERE statement to specify the observations to be displayed.
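
A short sketch of both procedures on a made-up dataset. The dataset demo, its variables, and its values are invented for illustration.

```sas
/* Made-up example data */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
3 1 62
;
run;

/* Descriptive information: variable names, types, lengths, number of observations */
proc contents data=demo;
run;

/* The data itself: only ID and AGE, only observations with sex = 1 */
proc print data=demo;
    var id age;
    where sex = 1;
run;
```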

22 [IV] Types of SAS Variables
Character variables: Letters, numbers, special characters and blanks. Length 1 to 32,767 bytes [default length is 8 characters]. When creating new variables, the LENGTH statement precedes the SET statement.
Numeric variables: 8 bytes of storage by default, which provides space for roughly 15 to 16 significant digits on Windows.
SEX as a character variable might be coded as: a). sex = '1' or sex = '2' b). sex = 'Male' or sex = 'Female'
SEX as a numeric variable might be coded as: a). sex = 1 or sex = 2
The way variable values are stored affects what you can do with the variables.
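
To illustrate why the LENGTH statement matters for character variables, here is a small sketch that derives a character version of a numeric SEX variable. The dataset names, the variable sex_label, and the coding (1 = Male, 2 = Female) are assumptions made for this example.

```sas
/* Made-up data: SEX stored as a numeric variable */
data demo;
    input id sex;
    datalines;
1 1
2 2
;
run;

data demo2;
    /* Declared before SET: 6 bytes, long enough to hold 'Female' */
    length sex_label $ 6;
    set demo;
    /* Without the LENGTH statement, the first assignment ('Male') would fix */
    /* the length at 4 and 'Female' would be truncated to 'Fema'            */
    if sex = 1 then sex_label = 'Male';
    else if sex = 2 then sex_label = 'Female';
run;
```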

23 [V] Creating New SAS Datasets
Within the DATA step, use the DATA and SET statements to create a new SAS dataset. In the DATA statement, specify the new SAS dataset that you are about to create. In the SET statement, specify the SAS dataset that you are reading from.
Example:
    data yoga.females;
        set yoga.data1;
        /* insert data management & cleaning statements here */
        if sex=1;
    run;

24 [VI] Common Data Management Activities Performed in the DATA Step
Keep or Remove Observations Use the WHERE statement or the IF statement.

25 Comparison Operators that Can be Used with a WHERE or IF statement:
Definition                  Mnemonic   Symbol
Equal to                    EQ         =
Not equal to                NE         ^= or ~=
Greater than                GT         >
Less than                   LT         <
Greater than or equal to    GE         >=
Less than or equal to       LE         <=
Equal to one of a list      IN         in ()

26 Logical Operators that Can Be Used with a WHERE or IF statement:
Definition                          Mnemonic   Symbol
If both expressions are true        AND        &
If either expression is true        OR         |
To reverse logic of a comparison    NOT        ^
So, any of the WHERE statements below could have been used to restrict the dataset to female participants only (assuming sex is coded 1 for female and 0 otherwise):
    where sex = 1;
    where sex eq 1;
    where sex ^in (0);
    where sex not in (0);
The following WHERE statement could be used to restrict your dataset to female participants age 65 and older only:
    where sex = 1 AND age GE 65;
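
The sketch below puts the operators above into two complete DATA steps, one with WHERE and one with a subsetting IF. The dataset names and values are made up, and sex = 1 is assumed to mean female, as in the slides.

```sas
/* Made-up example data */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
3 1 62
4 0 81
;
run;

/* WHERE: only matching observations are read from the input dataset */
data females65;
    set demo;
    where sex = 1 and age ge 65;
run;

/* Subsetting IF: every observation is read, non-matching ones are discarded */
data females;
    set demo;
    if sex in (1);
run;
```

With this data, FEMALES65 keeps only id 1 and FEMALES keeps ids 1 and 3.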

27 Keep or Drop Variables There are three different ways to do this, and all three methods are done in the DATA step. Method 1: Use the KEEP (or DROP) statement in the DATA step. Method 2: Use the KEEP = (or DROP =) dataset option in the DATA statement. Method 3: Use the KEEP = (or DROP =) dataset option in the SET statement.

28 Method 1: Use the KEEP (or DROP) statement in the DATA step.

29 Method 2: Use the KEEP= (or DROP=) dataset option in the DATA statement.

30 Method 3: Use the KEEP= (or DROP=) dataset option in the SET statement.
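
The code shown on slides 28 to 30 is not part of the transcript. As a sketch of what the three methods can look like, each step below produces a dataset containing only ID, SEX and AGE. All dataset and variable names are made up for illustration.

```sas
/* Made-up example data with four variables */
data demo;
    input id sex age weight;
    datalines;
1 1 70 61.5
2 0 54 82.0
;
run;

/* Method 1: KEEP statement inside the DATA step */
data m1;
    set demo;
    keep id sex age;
run;

/* Method 2: KEEP= dataset option on the output dataset (DATA statement) */
data m2 (keep=id sex age);
    set demo;
run;

/* Method 3: DROP= dataset option on the input dataset (SET statement); */
/* WEIGHT is never read into the step                                   */
data m3;
    set demo (drop=weight);
run;
```

One practical difference: with Method 3 the dropped variable is not available to any statement inside the DATA step, whereas with Methods 1 and 2 it can still be used in calculations before the output dataset is written.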

31 Creating a new variable
There are many ways to create new variables. I will show you two ways here: Using equations Using conditional (if-then-else) logic.
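
A minimal sketch of both approaches, using made-up height and weight data. The variables bmi and bmi_group and the cut-off of 25 are chosen only for illustration.

```sas
/* Made-up example data; the third participant has a missing weight */
data demo;
    input id height_cm weight_kg;
    datalines;
1 160 55
2 175 90
3 168 .
;
run;

data demo2;
    set demo;
    length bmi_group $ 10;

    /* New variable from an equation (missing if weight or height is missing) */
    bmi = weight_kg / (height_cm / 100)**2;

    /* New variable from if-then-else logic; missing values are handled first */
    /* because SAS treats a missing numeric value as smaller than any number  */
    if bmi = . then bmi_group = ' ';
    else if bmi < 25 then bmi_group = 'Under 25';
    else bmi_group = '25 or over';
run;
```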

32 Renaming a variable: Use the RENAME statement in the DATA step.

33 Create descriptive labels for variable names:
Use the LABEL statement in the DATA step.
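
A short sketch combining the RENAME and LABEL statements in one DATA step. The dataset, the new name age_years, and the label text are made up.

```sas
/* Made-up example data */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
;
run;

data demo2;
    set demo;
    /* RENAME uses the form old-name = new-name */
    rename age = age_years;
    /* LABEL attaches descriptive text that procedures can display */
    label id  = 'Participant ID'
          sex = 'Sex of participant';
run;

/* Confirm the new variable name and the labels */
proc contents data=demo2;
run;
```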

34 Format the Values of a Variable
Create and apply formats when you wish to change the appearance of variable values. Creating and applying user-defined formats involves two steps. First, you must create the formats using the PROC FORMAT step. Then, you must apply the format in the DATA step.
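
The two steps can be sketched as follows. The format name sexfmt and the coding (0 = Male, 1 = Female) are assumptions for this example.

```sas
/* Made-up example data */
data demo;
    input id sex;
    datalines;
1 1
2 0
;
run;

/* Step 1: create the user-defined format */
proc format;
    value sexfmt 0 = 'Male'
                 1 = 'Female';
run;

/* Step 2: apply the format in the DATA step (note the period after the name) */
data demo2;
    set demo;
    format sex sexfmt.;
run;

/* SEX now prints as Male / Female; the stored values are still 0 and 1 */
proc print data=demo2;
run;
```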

35 Merging Files
The MERGE statement can be used to combine two or more SAS datasets. Ensure unique identifiers are present and that the files are sorted by those identifiers. One-to-one and one-to-many merges work as expected; many-to-many merges DO NOT WORK (the result is not every combination of the matching records)!
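
A sketch of a one-to-many merge on a shared identifier. Both datasets and all values are made up: demog has one record per participant and visits has several.

```sas
/* One record per participant */
data demog;
    input id sex;
    datalines;
2 0
1 1
;
run;

/* Several records per participant */
data visits;
    input id visit score;
    datalines;
1 1 12
1 2 15
2 1 20
;
run;

/* Both datasets must be sorted by the identifier before merging */
proc sort data=demog;  by id; run;
proc sort data=visits; by id; run;

/* Each visit record picks up the SEX value of its participant */
data combined;
    merge demog visits;
    by id;
run;
```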

36 Summary – in the Data Step
Keep or remove observations:
  Use the WHERE statement
Keep or drop variables:
  Use keep statement in Data Step
  Use keep= dataset option in the Data statement
  Use keep= dataset option in the Set statement
Create new variable:
  Using equations
  Using conditional (if-then-else) logic
Renaming a variable:
  Use the RENAME statement
Create descriptive labels:
  Use the LABEL statement
Format the values of a variable:
  Using the PROC FORMAT step
  Applying the format in the DATA step

37 Data access Data management 3. Data analysis Data presentation

38 Task 3: Data Analysis Two very common SAS procedures: PROC FREQ and PROC MEANS.

39 [I] PROC FREQ To produce frequency counts and cross tabular frequency tables. Can be used with either numerical or character variables. In the PROC FREQ statement, specify the name of the SAS dataset you wish to analyze. In the TABLES statement, list the variables you want frequencies of. For a cross tabular frequency table, use an asterisk (*) symbol in the TABLES statement to cross variables.
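
A minimal sketch of one-way and cross-tabulated frequencies. The dataset and the variables sex and agegrp are made up.

```sas
/* Made-up example data: one numeric and one character variable */
data demo;
    input id sex agegrp $;
    datalines;
1 1 65plus
2 0 under65
3 1 under65
4 1 65plus
;
run;

proc freq data=demo;
    tables sex agegrp;   /* one frequency table per variable */
    tables sex*agegrp;   /* cross-tabulation: SEX in rows, AGEGRP in columns */
run;
```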

40 [II] PROC MEANS To display simple descriptive statistics for variables in a SAS dataset. Numerical variables only. In the PROC MEANS statement, specify the name of the SAS dataset to be analyzed. In the VAR statement, list the variables to be analyzed. An optional CLASS statement may be used.
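
A short sketch of PROC MEANS with the optional CLASS statement, on made-up data.

```sas
/* Made-up example data */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
3 1 62
4 0 81
;
run;

/* Default statistics (N, mean, standard deviation, minimum, maximum) */
/* for AGE, reported separately for each value of SEX                 */
proc means data=demo;
    var age;
    class sex;
run;
```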

41 [III] Useful Statements That Can Be Used in Most PROC steps
BY Statement This statement may be used with the PROC FREQ procedure. It allows you to perform subgroup analysis, working similarly to the CLASS statement in the PROC MEANS procedure. Before using the BY statement in any procedure, you must first sort your data on the BY variable.
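
The sort-then-BY pattern can be sketched as follows, with a made-up dataset.

```sas
/* Made-up example data */
data demo;
    input id sex agegrp $;
    datalines;
1 1 65plus
2 0 under65
3 1 under65
4 0 65plus
;
run;

/* Sort on the BY variable first */
proc sort data=demo;
    by sex;
run;

/* One frequency table of AGEGRP for each value of SEX */
proc freq data=demo;
    by sex;
    tables agegrp;
run;
```

Skipping the PROC SORT step on unsorted data produces an error in the log saying the BY variables are not properly sorted.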

42

43 WHERE Statement This statement may be used with both PROC FREQ and PROC MEANS (and others!)

44 FORMAT Statement This statement may be used in the PROC FREQ procedure on the analysis variable, or it may be used in the PROC MEANS procedure on the class variable. Use this statement if you did not assign the format of interest in your DATA step, but wish to assign it for a specific procedure only.

45 LABEL Statement This statement may be used in both PROC FREQ and PROC MEANS. In PROC MEANS, it may be used with both the analysis variable and the class variable. Use this statement if you did not assign the descriptive label of interest in your DATA step, but wish to assign it for a specific procedure only.
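
As a sketch of how the WHERE, FORMAT and LABEL statements can be combined inside a single PROC step, the example below applies all three in PROC MEANS. The dataset, the format sexfmt, its coding, and the label text are made up.

```sas
/* Made-up example data */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
3 1 62
4 0 81
;
run;

proc format;
    value sexfmt 0 = 'Male' 1 = 'Female';
run;

proc means data=demo;
    where age >= 60;             /* analyze a subset of observations */
    class sex;
    var age;
    format sex sexfmt.;          /* format on the class variable */
    label age = 'Age (years)'    /* label on the analysis variable */
          sex = 'Sex';           /* label on the class variable */
run;
```

The format and labels apply to this procedure only; the stored dataset is unchanged.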

46 SUMMARY – in the PROC Steps
BY Statement:
  To be used with PROC FREQ
WHERE Statement:
  To be used with both PROC FREQ and PROC MEANS
FORMAT Statement:
  To be used with PROC FREQ on the analysis variable
  To be used with PROC MEANS on the class variable
LABEL Statement:
  To be used with both PROC FREQ and PROC MEANS

47 Data access Data management Data analysis 4. Data presentation

48 Task 4: Data Presentation
Here we will show you three methods: 1). SAS System Options; 2). Adding titles and/or footnotes; 3). Saving your output as a PDF or EXCEL file.

49 [I] SAS System Options You may use the OPTIONS statement to change SAS system options.

50 Commonly used options:
Option          Description
linesize = n    Specifies the line size (printer line width) for the SAS log and the SAS output files
pagesize = n    Specifies the number of lines that can be printed per page of SAS output
nonumber        Suppresses the printing of page numbers (by default, page numbers are printed)
nodate          Suppresses the printing of today's date (by default, the date is printed)
errors = n      Specifies the maximum number of observations with error messages
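
A minimal sketch of an OPTIONS statement that sets the options listed above. The particular values are arbitrary examples.

```sas
/* Several options can be set in one OPTIONS statement */
options linesize=80 pagesize=60 nonumber nodate errors=5;

/* The options stay in effect until changed; this turns page numbers */
/* and the date back on                                              */
options number date;
```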

51

52 [II] Titles and/or Footnotes
Add titles and/or footnotes to your SAS output by using the TITLE and/or FOOTNOTE statement, respectively, in any PROC step.
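
A short sketch with a made-up dataset and made-up title text.

```sas
/* Made-up example data */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
;
run;

proc print data=demo;
    title 'Listing of study participants';
    footnote 'Example data only';
run;

/* Titles and footnotes stay in effect for later steps until cleared */
title;
footnote;
```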

53 [III] Create PDF Reports
Use ODS (Output Delivery System) statements to write your output to a PDF, HTML, or RTF file. For a PDF, you need to write two statements: The ODS PDF FILE = statement specifies the destination of the new PDF file you are about to create. Note that in this statement you must also give the PDF file a name. The ODS PDF CLOSE statement closes the PDF destination.
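
The two statements wrap around whatever PROC steps should appear in the report. In the sketch below, the dataset is made up and the file path is hypothetical.

```sas
/* Made-up example data */
data demo;
    input id sex age;
    datalines;
1 1 70
2 0 54
;
run;

/* Open the PDF destination and name the file (hypothetical path) */
ods pdf file='C:\Users\projects\Data\report.pdf';

/* Everything produced between the two ODS statements goes into the PDF */
proc print data=demo;
run;

/* Close the PDF destination; the file is not complete until this runs */
ods pdf close;
```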

54 When working in older versions of SAS, the ODS GRAPHICS ON statement may generate additional results (e.g. residual plots in PROC MIXED)

55 ODS can be extremely useful when you need to save some part of your output directly to a SAS dataset (e.g. simulations…)
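
One way to do this is the ODS OUTPUT statement, which writes a named output table to a dataset. The slide does not say which statement it used, so treat this as an assumption. OneWayFreqs is the name PROC FREQ gives its one-way frequency table; the dataset names are made up.

```sas
/* Made-up example data */
data demo;
    input id sex;
    datalines;
1 1
2 0
3 1
;
run;

/* Send the one-way frequency table to a dataset as well as to the output */
ods output OneWayFreqs=work.sex_freq;

proc freq data=demo;
    tables sex;
run;

/* The saved table can now be processed like any other SAS dataset */
proc print data=work.sex_freq;
run;
```

Placing an ODS TRACE ON statement before a procedure lists the names of the output tables it produces in the log.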

56 [V] SAS Online Documentation
Access the SAS Online Documentation by clicking the last icon in the toolbar: There is a lot of information here! It is particularly useful to you if you know the procedure or statement you want to use and would like to get the syntax for it. Two useful chapters in the Online Documentation are: Procedures: Select Contents -> SAS Products -> Base SAS -> Procedures SAS Stat: Select Contents -> SAS Products -> SAS/STAT -> SAS/STAT User’s Guide

57 Contact information: Tamara Arenovich Manager, Biostatistical Consulting Service Centre for Addiction and Mental Health Tel: ext

58 Acknowledgments We thank Ms. Thi Ho & Ms. Anthea Lau for preparation of this material and this PowerPoint file.

59 - THE END -

