Published by Annabelle Lane. Modified over 9 years ago.
1
Statistical Analysis System (SAS)
2
SAS (Statistical Analysis System) was developed by James Goodnight.
- 1970: it was a package
- 1980: a language
- 1990: software
3
SAS
Technical     | Techno-Functional     | Functional
SAS/BASE      | SAS/Warehouse admin   | SAS/Stat
SAS/MACROS    | SAS/ETL Studio        | SAS/Graph
SAS/ACCESS    | SAS/OLAP              | SAS/OR
SAS/AF        |                       |
4
Domains in which SAS can be used:
- Clinical
- Banking
- Insurance
5
INTRODUCTION TO THE SAS SYSTEM: SAS is an integrated system of software solutions that enables you to perform the following tasks:
- data entry, retrieval, and management
- report writing and graphics design
- statistical and mathematical analysis
- business forecasting and decision support
- applications development
6
Base SAS software provides you with essential tools for the basic data-driven tasks that you commonly perform as a programmer:
7
Accessing Data: You can access data that is stored almost anywhere, whether it is in a file on your system or in another database system. The data can be in almost any format, including raw data, SAS data sets, and files created by other vendors' software.
8
Managing Data: After you have accessed your data, you can use the SAS programming language to manipulate it. You can format your data, create variables (columns), use operators to evaluate data values, use functions to create and recode data values, subset data, perform conditional processing, merge a wide range of data sources, and create, retrieve, and update database information.
9
Analyzing Data and Presenting Information: Once your data is in shape, you can use SAS to analyze data and produce reports. Your SAS output can range from a simple listing of a data set to customized reports of complex relationships.
10
Analysis: Base SAS provides powerful data analysis tools. For example, you can
- produce tables, frequency counts, and cross-tabulation tables
- create a variety of charts and plots
- compute a variety of descriptive statistics, including the mean, sum, variance, standard deviation, and more
- compute correlations and other measures of association, as well as multi-way cross-tabulations and inferential statistics.
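The analysis tasks listed above can be sketched with three Base SAS procedures. This is a minimal illustration, assuming the SASHELP.CLASS sample data set that ships with SAS; that data set and its variables (height, weight, sex, age) are not part of the original slides.

```sas
/* Descriptive statistics for two numeric variables */
proc means data=sashelp.class mean sum var std;
   var height weight;
run;

/* Frequency counts for sex and a cross-tabulation of age by sex */
proc freq data=sashelp.class;
   tables sex age*sex;
run;

/* Pearson correlation between height and weight */
proc corr data=sashelp.class;
   var height weight;
run;
```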
11
Presentation: For reporting and displaying analytical results, SAS offers a wide range of output formats, such as
- an array of markup languages, including HTML4 and XML
- output that is formatted for a high-resolution printer, such as PostScript, PDF, and PCL files
- RTF
- color graphs that you can make interactive using ActiveX controls or Java applets.
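One common way to produce these formats is the Output Delivery System (ODS). The sketch below is an illustration only: the file names are hypothetical, and SASHELP.CLASS is a sample data set shipped with SAS rather than data from the slides.

```sas
/* Open two ODS destinations; the file names are hypothetical */
ods html file="class_report.html";
ods pdf  file="class_report.pdf";

/* Everything printed between open and close goes to both files */
proc print data=sashelp.class;
run;

/* Close the destinations so the files are written completely */
ods pdf close;
ods html close;
```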
12
SAS WINDOW ENVIRONMENT: There are five main windows in SAS:
1. Editor window
2. Output window
3. Log window
4. Results window
5. Explorer window
1. Editor window: You write and submit programs here. Programs are saved with the extension .sas. You can type any number of programs in the Editor window and run them all at once or individually.
13
2. Output window: The results of a program are displayed in the Output window. Saved output has the extension .lst.
3. Log window: Any errors or warnings in the program are reported in the Log window. The log also shows the SAS license and version information, and after each step it reports the number of variables and observations.
4. Results window: Lists the results of all the programs submitted from the Editor window. It has no file extension of its own.
5. Explorer window: Contains Libraries and My Computer.
14
SAS LANGUAGE: The SAS language consists of statements, expressions, options, formats, and functions similar to those of many other programming languages. In SAS, you use these elements within one of two groups of SAS statements:
- DATA steps
- PROC steps
15
DATA STEP: A DATA step consists of a group of statements in the SAS language that can
- read data from external files
- write data to external files
- read SAS data sets and data views
- create SAS data sets and data views
- create multiple SAS data sets in one DATA step
- combine existing data sets
- create accumulating totals
- manipulate numeric and character values
16
Syntax:
DATA <data-set-name>;
   INPUT ….;
   CARDS;
data values
;
RUN;
17
EG:
data temp;
   input name $ no;
   datalines;
hari 102
ravi 104
ganesh 105
kiran 109
;
run;
18
SAS DATA SETS: A SAS data set consists of the following: -descriptor information -data values. The descriptor information describes the contents of the SAS data set to SAS. The data values are data that has been collected or calculated. They are organized into rows, called observations, and columns, called variables. An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a given characteristic.
19
SAS VARIABLES AND OBSERVATIONS The below figure shows a SAS data set. The data describes participants in a 16-week weight program at a health and fitness club. The data for each participant includes an identification number, name, team name, and weight at the beginning and end of the program.
20
PROC STEP: Once your data is accessible as a SAS data set, you can analyze the data and write reports by using a set of tools known as SAS procedures. A group of procedure statements is called a PROC step. SAS procedures analyze data in SAS data sets to produce statistics, tables, reports, charts, and plots, to create SQL queries, and to perform other analyses and operations on your data. They also provide ways to manage and print SAS files.
21
PROCEDURE STEP BLOCK
Syntax:
PROC <procedure-name>;
   statement 1;
   statement 2;
   ...
   statement n;
RUN;
22
EG:
proc print data=temp;
run;

proc sort data=temp out=samp;
   by name;
run;
23
Data Types in SAS System:
1) Numeric data (0-9)
2) Character data (A-Z)
- By default, the INPUT statement reads every variable as numeric. To read character data, put a $ symbol after the variable name.
- A character variable can hold up to 32,767 characters.
- SAS reads data values observation by observation.
- The number of observations in a SAS data set is limited mainly by system configuration and hard disk space.
- In output, a missing numeric value is shown as a period (.) and a missing character value is shown as a blank.
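The points about the $ symbol and missing values can be tried with a small DATA step. This is a sketch with made-up data lines; the data set name missing_demo is hypothetical.

```sas
data missing_demo;
   /* $ marks name as character; no is numeric by default */
   input name $ no;
   datalines;
hari 102
ravi .
. 105
;
run;

/* The second observation shows a period for no;
   the third shows a blank for name */
proc print data=missing_demo;
run;
```

With list input, a single period in the raw data stands for a missing value in either a numeric or a character field.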
24
Different databases and storage formats: Text, Excel, Access, DB2, Oracle, Teradata
25
LIBRARIES: There are two ways of creating libraries:
1. Menu driven
2. Program code
1. Menu driven: Explorer > right-click > New
26
2. Through programming, in the Editor window:
LIBNAME <libref> "<folder path>";
Example:
LIBNAME Hari "D:\Ganesh";
To clear a library reference (the files on disk are not deleted):
LIBNAME Guru CLEAR;
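A short sketch of how a libref is typically used once it is assigned. The libref mylib is hypothetical, the folder must already exist, and the step assumes the data set temp from the earlier example is present in the WORK library.

```sas
/* Assign a libref to an existing folder (path taken from the slide) */
libname mylib "D:\Ganesh";

/* A two-level name stores the data set permanently in that folder */
data mylib.temp;
   set work.temp;
run;

/* Remove the libref; the stored files stay on disk */
libname mylib clear;
```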
27
RULES FOR SAS STATEMENTS: There are only a few rules for writing SAS statements:
- SAS statements end with a semicolon.
- You can enter SAS statements in lowercase, uppercase, or a mixture of the two.
- You can begin SAS statements in any column of a line and write several statements on the same line.
- You can begin a statement on one line and continue it on another line, but you cannot split a word between two lines.
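These rules can be seen in a deliberately untidy but valid program. It is a sketch that reuses the data set temp from the earlier example.

```sas
/* Uppercase keywords, several statements on one line */
DATA temp; INPUT name $ no; DATALINES;
hari 102
ravi 104
;
RUN;

/* Mixed case, one statement continued across two lines */
Proc Print
      data=temp;
run;
```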
28
RULES FOR MOST SAS NAMES: SAS names are used for SAS data set names, variable names, and other items. The following rules apply:
- A SAS name can contain from one to 32 characters.
- The first character must be a letter or an underscore (_).
- Subsequent characters must be letters, numbers, or underscores.
- Blanks cannot appear in SAS names.
29
DATA STEP PROCESSING: The DATA step is one of the basic building blocks of SAS programming. It creates the data sets that are used in a SAS program’s analysis and reporting procedures.
30
OVERVIEW OF THE DATA STEP: The DATA step consists of a group of SAS statements that begins with a DATA statement. The DATA statement begins the process of building a SAS data set and names the data set. The statements that make up the DATA step are compiled, and the syntax is checked. If the syntax is correct, then the statements are executed. In its simplest form, the DATA step is a loop with an automatic output and return action.
31
DURING THE COMPILE PHASE: When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. SAS further processes the code, and creates the following two items:
- input buffer
- program data vector (PDV)
32
INPUT BUFFER: The input buffer is a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement.
PROGRAM DATA VECTOR (PDV): The PDV is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation.
33
The PDV contains two automatic variables:
1) _N_ counts the number of times the DATA step has begun to iterate.
2) _ERROR_ signals whether an error occurred during the current iteration:
i) _ERROR_=0 means no error was encountered
ii) _ERROR_=1 means an error, such as invalid input data, was encountered.
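Both automatic variables can be watched in the log with a PUT statement. The following is a sketch with made-up data; the second data line contains a non-numeric value for no on purpose, so that _ERROR_ is set to 1 on that iteration.

```sas
data check;
   input name $ no;
   /* Write the automatic variables and the values read to the log */
   put _n_= _error_= name= no=;
   datalines;
hari 102
ravi abc
kiran 109
;
run;
```

Neither _N_ nor _ERROR_ is written to the output data set; they exist only in the PDV while the step runs.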
34
Creating the Input Buffer and the Program Data Vector: When DATA step statements are compiled, SAS determines whether to create an input buffer. If the input file contains raw data (as in the example that follows), SAS creates an input buffer to hold the data before moving the data to the program data vector (PDV).
35
data total_points (drop=TeamName);
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
   datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
Knights John 7 7 7
Knights Lisa 8 9 9
Knights Fran 7 6 6
Knights Walter 9 8 10
;
run;
36
The following figure shows the Input Buffer and the program data vector after DATA step compilation.
37
Figure: Position of the Pointer in the Input Buffer Before SAS Reads Data
The INPUT statement then reads data values from the record in the input buffer and writes them to the PDV where they become variable values. The following figure shows both the position of the pointer in the input buffer, and the values in the PDV after SAS reads the first record.
39
Program Data Vector with Computed Value of the Sum Statement
40
Writing an Observation to the SAS Data Set
Figure: The First Observation in Data Set TOTAL_POINTS
Output SAS data set TOTAL_POINTS: 1st observation
41
SAS then returns to the DATA statement to begin the next iteration. SAS resets the values in the PDV in the following way: The values of variables created by the INPUT statement are set to missing. The value created by the Sum statement is automatically retained. The value of the automatic variable _N_ is incremented by 1, and the value of _ERROR_ is reset to 0.
42
Compilation: During the compilation phase, SAS
- checks the code for errors
- translates the code to machine code
- establishes an area of memory called the input buffer, if reading raw data
- establishes an area of memory called the program data vector
- assigns the required attributes to variables
- creates the descriptor portion of the new data set.
43
Execution: During the execution phase, SAS
- initializes the PDV to missing
- reads data values into the PDV
- carries out assignment statements and conditional processing
- writes the observation in the PDV to the output SAS data set at the end of the DATA step
- returns to the top of the DATA step
- initializes any variables that are not read from SAS data sets to missing
- repeats the process.
44
Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1. As SAS continues to read records, the value in TeamTotal grows larger as more participant scores are added to the variable. _N_ is incremented at the beginning of each iteration of the DATA step. This process continues until SAS reaches the end of the input file. The DATA step stops executing after it processes the last input record.
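The way a sum statement keeps its value across iterations can be written out explicitly. The sketch below uses made-up data and hypothetical names (running_total, Total); it shows that a sum statement such as `Total + Event1;` behaves like a RETAIN statement combined with the SUM function.

```sas
data running_total;
   input ParticipantName $ Event1;
   /* Keep Total across iterations instead of resetting it to missing */
   retain Total 0;
   /* SUM ignores missing arguments, as the sum statement does */
   Total = sum(Total, Event1);
   datalines;
Sue 6
Jane 9
John 7
;
run;
```

After the three iterations, Total holds 6, 15, and 22 in the successive observations.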
45
Figure: Word Scanner, Compiler, and Input Stack. The input stack holds the submitted program:
data temp;
   input name $ no;
   cards;
Hari 101
;
%let list=name;
proc print data=temp;
   var &list;
run;
46
Figure: The word scanner has passed the tokens "data" and "temp" to the compiler. The input stack now holds:
;
   input name $ no;
   cards;
Hari 101
;
%let list=name;
proc print data=temp;
   var &list;
run;
47
"The process that SAS uses to extract words and symbols from the input stack to the word scanner is called tokenization." Tokenization is performed by a component of SAS called the word scanner. The word scanner starts at the first character in the input stack and examines each character in turn. It recognizes four classes of tokens:
- Literal: a string of characters enclosed in quotation marks.
- Number: digits, date values, time values, and hexadecimal numbers.
- Name: a string of characters beginning with an underscore or letter.
- Special: any character or group of characters that have special meaning to SAS. Examples of special characters include: * / + - ** ; $ ( ) . & % =
48
What Are the SAS Language Elements?
- Data set options
- Informats and formats
- Functions
- Statements
- SAS system options
49
Definition of Data Set Option: Data set options specify actions that apply only to the SAS data set with which they appear. They enable you to perform operations such as these: Renaming variables Selecting only the first or last n observations for processing Dropping variables from processing or from the output data set Specifying a password for a data set.
50
Syntax for Data Set Options: Specify a data set option in parentheses after a SAS data set name. To specify several data set options, separate them with spaces.
(option-1=value-1 )
This example shows data set options in a SAS statement:
data scores (keep=team game1 game2 game3);
51
data points (keep=Event1 Event2);
   input TName $ PName $ Event1 Event2;
   datalines;
Knights Sue 6 8
Cardinals Jane 9 7
Knights John 7 7
Knights Lisa 8 9
;
run;
52
Formats and Informats
Definition of a Format: A format is an instruction that SAS uses to write data values.
Syntax of a Format: SAS formats have the following form:
<$>format<w>.<d>
Here is an explanation of the syntax:
- $ indicates a character format; its absence indicates a numeric format.
- format names the format.
53
- w specifies the format width, which for most formats is the number of columns in the output data.
- d specifies an optional decimal scaling factor in the numeric formats.
data temp;
   amount=1145.32;
   put amount dollar10.2;
run;
The DOLLARw.d format in the PUT statement produces this result: $1,145.32
54
Informats
Definition of an Informat: An informat is an instruction that SAS uses to read data values into a variable. For example, the following value contains a dollar sign and commas: $1,000,000. To remove the dollar sign ($) and commas (,) before storing the numeric value 1000000 in a variable, read this value with the COMMA11. informat.
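The COMMA11. case described above can be sketched as a complete step. The data set name sales and the DOLLAR12. display format are assumptions added for illustration.

```sas
data sales;
   /* COMMA11. strips the dollar sign and commas while reading */
   input amount comma11.;
   /* DOLLAR12. puts them back when the value is displayed */
   format amount dollar12.;
   datalines;
$1,000,000
;
run;

proc print data=sales;
run;
```

The stored value is the plain number 1000000; the informat affects only how the raw text is read, and the format affects only how the value is shown.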
55
Syntax of an Informat: SAS informats have the following form:
<$>informat<w>.<d>
Here is an explanation of the syntax:
- $ indicates a character informat; its absence indicates a numeric informat.
- informat names the informat.
- w specifies the informat width.
- d specifies an optional decimal scaling factor in the numeric informats.
56
data tmp1;
   input ename $ edate;
   informat edate ddmmyy8.;
   format edate date9.;
   cards;
hari 10/10/07
;
run;
57
Functions
Definition of Functions: A SAS function performs a computation or system manipulation on arguments and returns a value.
Syntax of Functions: The syntax of a function is as follows:
function-name (argument-1 )
Examples:
x=max(cash,credit);
x=sqrt(1500);
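Function calls such as these only run inside a DATA step or similar context. A minimal sketch, with made-up values for cash and credit and one extra character function (UPCASE) added for illustration:

```sas
data _null_;
   cash = 1200;
   credit = 950;
   x = max(cash, credit);   /* larger of the two arguments */
   y = sqrt(1500);          /* square root of a constant */
   z = upcase('sas');       /* character function: uppercase a string */
   /* Write the results to the log */
   put x= y= z=;
run;
```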
58
Statements
Definition of Statements: A SAS statement is a series of items that may include keywords, SAS names, special characters, and operators. All SAS statements end with a semicolon. Examples of statements:
INPUT (list), PUT, DATALINES, DO (iterative), DO UNTIL, DO WHILE, SELECT, DROP, MERGE, SET, FILE, LENGTH, Sum, END, OUTPUT, KEEP, DATA, RETAIN
59
SAS System Options
System options are instructions that affect your SAS session.
Syntax of SAS System Options: The syntax for specifying system options in an OPTIONS statement is
OPTIONS option(s);
Here is an explanation of the syntax: option specifies one or more SAS system options that you want to change.
Example:
options nodate linesize=72;
60
STANDARD DATA: If the data values are in a standard format, the data is called standard data. Eg: 467
NON-STANDARD DATA: If the data values are not in a standard format, the data is called non-standard data. Eg: 18-10-05; 45,000; $21,000
61
Informats are used to read non-standard data:
data dates;
   input name $ Bdate : ddmmyy8.;
   format Bdate ddmmyy8.;
   cards;
hari 21-10-84
ravi 22-11-86
;
run;
62
Date Informats:
Date        Informat     Format
12-07-78    DDMMYY8.     DDMMYY8.
21-09-05    DDMMYY10.    DDMMYY10.
22Jan89     DATE7.       DATE7.
22jan1989   DATE9.       DATE9.
63
Numeric Informats:
Numeric     Informat     Format
25,000      COMMA6.      COMMA6.
$3,000      DOLLAR6.     DOLLAR6.
25,000      COMMA6.      WORDS6.
64
DEFINING VARIABLES IN SAS: The INPUT statement provides instructions for reading data, and it defines the variables for the data set that come from the raw data. SAS variables can have these attributes:
- name
- type
- length
- informat
- format
- label
65
DIFFERENT WAYS TO READ DATA: 1.RAW DATA IN THE JOB STREAM: You can place data directly in the job stream with the programming statements that make up the DATA step. The DATALINES statement tells SAS that raw data follows. The single semicolon that follows the last line of data marks the end of the data. The DATALINES statement and data lines must occur last in the DATA step statements:
66
data weight_club;
   input IdNumber 1-4 Name $ 6-20 Team $ StartWeight EndWeight;
   datalines;
1023 David Shaw      red 189 165
1049 Amelia Serrano  yellow 145 124
1219 Alan Nance      red 210 192
1246 Ravi Sinha      yellow 194 177
1078 Ashley McKnight red 127 118
;
run;
67
3. DATA IN A SAS DATA SET: You can also use data that is already stored in a SAS data set as input to a new data set. To read data from an existing SAS data set, you must specify the existing data set's name in one of these statements:
- SET statement
- MERGE statement
data temp;
   set weight_club;
run;
68
2. DATA IN AN EXTERNAL FILE: If your raw data is already stored in a file, then you do not have to bring that file into the job stream. Use an INFILE statement to specify the file containing the raw data. The statements that follow show the general form when the raw data is stored in an external file:
data ;
   infile 'your-input-file path\filename.extension';
   input …….;
run;
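A filled-in sketch of the same pattern, assuming the weight club records were saved to a text file with the same column layout. The file path and the data set name weight_club_ext are hypothetical.

```sas
/* The path below is hypothetical; point it at a real file */
data weight_club_ext;
   infile 'C:\data\weight_club.txt';
   /* Same layout as the in-stream example: ID in columns 1-4, name in 6-20 */
   input IdNumber 1-4 Name $ 6-20 Team $ StartWeight EndWeight;
run;
```

No DATALINES statement is needed here, because the INFILE statement tells SAS where the raw records come from.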
69
4. DATA IN A DBMS FILE: If you have data that is stored in another vendor's database management system (DBMS) files, then you can use SAS/ACCESS software to bring this data into a SAS data set. SAS/ACCESS software enables you to assign a libref to a library containing the DBMS file. In this example, a libref is declared that points to a library containing Oracle data. SAS reads data from an Oracle file into a SAS data set:
libname dblib oracle user=scott password=tiger;
data employees;
   set dblib.employees;
run;
70
DATA SET OPTIONS: Data set options specify actions that apply only to the SAS data set with which they appear. They enable you to perform operations such as these:
KEEP=: This example uses the KEEP= data set option in the SET statement to read only the variables IdNumber and Team:
data samp;
   set weight_club (keep=IdNumber Team);
run;
71
DROP=: Use the DROP= option to create a subset of a larger data set when you want to specify which variables are being excluded rather than which ones are being included. The following DATA step reads all of the variables from the data set weight_club except for those that are specified with the DROP= option, and then creates a data set named A1.
data A1;
   set weight_club (drop=IdNumber Name);
run;
72
OBS=: Specifies the last observation to process.
data s1;
   set weight_club (obs=3);
run;
FIRSTOBS=: Specifies which observation SAS processes first. The following step reads observations 2 through 4:
data s1;
   set weight_club (obs=4 firstobs=2);
run;
73
RENAME=: Changes the name of a variable.
data two (rename=(name=Pname));
   set weight_club;
run;
PW=: Assigns a read, write, or alter password to a SAS file and enables access to a password-protected SAS file.
data two1 (pw=ram);
   set weight_club;
run;
74
WHERE=: Selects observations that meet the specified condition.
data tmp;
   set weight_club (where=(Name="David Shaw"));
run;
IN=: Creates a variable that indicates whether the data set contributed data to the current observation.
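The IN= option is most often used in a MERGE. The sketch below is an illustration with two small made-up data sets (club_a, club_b) and hypothetical flag names (inA, inB); it keeps only the IDs that appear in both inputs.

```sas
data club_a;
   input IdNumber Team $;
   datalines;
1023 red
1049 yellow
;
run;

data club_b;
   input IdNumber EndWeight;
   datalines;
1049 124
1219 192
;
run;

/* Both inputs are already in IdNumber order, so no PROC SORT is needed */
data both;
   merge club_a (in=inA) club_b (in=inB);
   by IdNumber;
   /* inA and inB are 1 when that data set contributed to the observation */
   if inA and inB;
run;
```

Only IdNumber 1049 appears in both inputs, so the data set both contains a single observation. The IN= variables are temporary and are not written to the output.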
75
DATA STEP STATEMENTS:
DATA statement: Begins a DATA step and provides names for any output SAS data sets.
Creating an Output Data Set:
data example1;
   set weight_club;
run;
76
When Not Creating a Data Set:
data _NULL_;
   set weight_club;
   put Name;
run;
77
CARDS Statement: Indicates that data lines follow.
DATALINES Statement (newer name for CARDS): Indicates that data lines follow.
Using the DATALINES Statement: In this example, SAS reads a data line and assigns values to two character variables, NAME and DEPT, for each observation in the DATA step:
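The example referred to here did not survive in this transcript. A minimal sketch that matches the description, with a hypothetical data set name and made-up data lines:

```sas
data person;
   /* Two character variables, read with list input */
   input name $ dept $;
   datalines;
John Sales
Mary Acctng
;
run;
```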
78
DELETE Statement: Stops processing the current observation.
if Team="red" then delete;
FORMAT Statement: Associates formats with variables.
INFORMAT Statement: Associates informats with variables.
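DELETE and FORMAT can be seen together in one step. This sketch assumes the weight_club data set from the earlier example already exists; the output name no_red and the computed variable Loss are hypothetical.

```sas
data no_red;
   set weight_club;
   /* Drop red-team observations; the rest of the step is skipped for them */
   if Team = "red" then delete;
   Loss = StartWeight - EndWeight;
   /* Display Loss with one decimal place */
   format Loss 5.1;
run;
```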
79
data two1;
   input ename $ eid hiredate;
   informat hiredate mmddyy8.;
   datalines;
hari 101 12/01/05
ravi 102 11/03/06
;
run;
80
DATALINES4 Statement (or CARDS4): Indicates that data lines that contain semicolons follow.
data biblio;
   input number citation $50.;
   datalines4;
6 1988
2 LIN ET AL., 1995; BRADY, 1993
3 BERG, 1990; ROA, 1994; WILLIAMS, 1992
;;;;
81
DM Statement: Submits SAS Program Editor, Log, Procedure Output, or text editor commands as SAS statements.
dm log 'clear';
KEEP Statement: Includes variables in output SAS data sets.
data average;
   set weight_club;
   keep name team;
run;
82
LABEL Statement: Assigns descriptive labels to variables.
data rtest;
   set weight_club;
   label name='teamname';
run;
83
LENGTH Statement: Specifies the number of bytes for storing variables.
data testlength;
   input firstname $ lastname $ n1 n2;
   length name $25;
   datalines;
Alexander Robinson 35 11
;
run;
84
INPUT Statement: Reads input values from specified columns and assigns them to the corresponding SAS variables. This DATA step demonstrates how to read input data records with column input:
data scores;
   input name $ 1-18 score1 25-27 score2 30-32;
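The step above is shown without its data lines. A completed sketch follows, with made-up records laid out so that the name falls in columns 1-18, score1 in columns 25-27, and score2 in columns 30-32. The exact spacing of the data lines matters for column input.

```sas
data scores;
   input name $ 1-18 score1 25-27 score2 30-32;
   datalines;
Joseph                  11   32
Mitchel                 13   29
;
run;

proc print data=scores;
run;
```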
85
INPUT METHODS:
1) List INPUT METHOD
2) Column INPUT METHOD
3) NAMED INPUT METHOD
4) FORMATTED INPUT METHOD
5) ABSOLUTE INPUT METHOD
86
1) List INPUT METHOD: In this method the data values must be separated by at least one space. Example: as in the DATA steps shown earlier.
2) Column INPUT METHOD: In this method each variable is read from fixed columns, so character values can be longer than 8 characters and can contain embedded blanks.
87
data temp;
   input id 1-3 name $ 7-18 age 21-22;
   cards;
101   shiva krish   38
102   ravi krish    38
103   rama krish    38
;
run;
3) NAMED INPUT METHOD: In this method each data value is preceded by its variable name and an equal sign.
88
data samp;
   input id= name= $ age=;
   cards;
id=290 name=ravi age=20
id=291 name=rani age=19
;
run;
4) FORMATTED INPUT METHOD: In this method an informat (a width followed by a period) after each variable name specifies how many columns to read for that variable.
89
data one;
   input id 3. name $11. age 3.;
   datalines;
101 praveenraj 25
102 kiranraj   23
;
run;
5) ABSOLUTE INPUT METHOD: In this input method a column pointer control (@n) gives the exact starting column of each data value.
90
data two;
   input @1 id 3. +5 @10 name $4. +5 @19 age;
   cards;
102      hari     29
;
run;
91
Holding a Record Across Iterations of the DATA Step: The INPUT statement uses the double trailing @ to control the input pointer across iterations of the DATA step.
data test;
   input name $ age @@;
   datalines;
John 13 Monica 12 Sue 15 Stephen 10
Marc 22 Lily 17
;
run;
92
data num;
   infile datalines dsd;
   input x y z;
   datalines;
1,2,3
4,5,6
7,8,9
;

data nums;
   infile datalines dsd delimiter='*';
   input X Y Z;
   datalines;
1*2*3
4*5*6
7*8*9
;
run;
93
data weather;
   infile datalines missover;
   input temp1-temp5;
   datalines;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;
run;
94
Thanks Feedback at info@Sitworld.in