Download presentation
Presentation is loading. Please wait.
1
Instructor: Raul Cruz-Cano raulcruz@umd.edu 7/9/2012
EPIB 698C Lecture2 Instructor: Raul Cruz-Cano 7/9/2012
2
Creating Data in SAS, an overview
Creating datasets by hand entry (Viewtable window, CARDS, DATALINES statement) Reading dataset from external files (not SAS data, INFILE statement) Using Import/Export facility (a point-and-click approach)
3
Entering Data with Viewtable window
To open viewtable window, select “Table Editor” from the Tools menu. An empty viewtable window will appear The letters at the tops of columns are default variable names. Right click on the letter and open the Column attributes window. You can replace variables name, type, etc
4
Entering Data with Viewtable window
Entering you data once you have defined your columns. To save your table, select “Save as” from the File menu, then select a library and specify the name of your table (SAS dataset) To open an existing table, go to ToolsTable editor, the view table window will be opened. Then go the File menu, click Openchoose libraryselect Table name. To switch from browse mode (default) to edit mode, select “Edit Mode” from the Edit menu Using Viewtable you can easily create a data table by setting the columns, you can add rows to the table. However, you can not add columns once you finish defining your columns. This is a considerable disadvantage.
5
Reading Data Inline, CARDS Statement
You enter the actual data points inside the PROGRAM EDITOR Example: CARDS statement data instructor; input name $ gender $ age; cards; Jane F 30 Mary F 29 Mike M 28 ; run;
6
Examples Reading multiple observations in each line of data by adding the symbol at the end of the input statement data aaa; input x y datalines; ; run;
7
Reading Dataset from External Files: INFILE Statement
Identifies the external file that contains the data and has options that control how the records in file are read into SAS Must be used before the input statement because it locates the data file to be read Syntax data data_set_name; Infile directory_and_file_name; input variable_list; run;
8
Reading raw data separated by spaces—list input
List input (also called free formatted input) can read data separated by at least one space. By default, SAS assumes data values are separated by one or more blanks. Will read all the data in a record, no skipping unwanted values Any missing data must by indicated with a period Character data must be simple, no embedded spaces
9
Reading raw data separated by spaces—list input
Example: data demographics; infile ’F:\teaching\SAS\lecture2\mydata.txt'; input Gender $ Age Height Weight; run; $ sign after gender means that gender is a character variable
10
Specify missing values with list input
We use a period to represent missing values M M F F M M F F M M
11
Reading raw data separated by commas
Comma separated values file (csv file) use commas as data delimiters They may or may not enclose character values in quotes. Example: mydata.csv "M",50,68,155 "F",23,60,101 "M",65,72,220 "F",35,65,133 "M",15,71,166
12
Reading raw data separated by commas
data demographics; infile 'c:\books\learning\mydata.csv' dsd; input Gender $ Age Height Weight; run; Dsd: means delimiter sensitive data. It has several functions. (1)change the default delimiter from a blank to a comma. (2)If there are two delimiters in a row, it assumes there is a missing value in between (3)If character values are placed in quotes, the quotes are stripped from the value
13
INFILE statement, useful options
DSD It recognizes two consecutive delimiters as a missing value. Example: 20,30,,50, SAS will treat this as but with the the dsd option SAS will treat it as It allows you to include the delimiter within quoted strings. Example: a comma separated file and your data included values like "George Bush, Jr.” With the dsd option, SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.
14
INFILE statement (cont.)
DLM= The dlm= option can be used to specify the delimiter that separates the variables in your raw data file. dlm=‘,’ indicates a comma is the delimiter (e.g., a comma separated file, .csv file). dlm='09'x indicates that tabs are used to separate your variables dlm=‘:’ indicates a colon is the delimiter
15
INFILE statement (cont.)
We can use dsd and dlm at the same time: infile ‘file-description’ dsd dlm=‘:’ ; This combination of options performs all the actions requested by the DSD option, but overrides the default delimiter (comma) with a delimiter of your choice
16
INFILE statement (cont.)
useful options continued… missover: if number of variables in file does not match number of variables in input file all remaining variables are set to missing. obs: specifies the last record to be read into the data set firstobs: specifies the first line of data to be read into data set. Useful if there is a header row in the dataset.
17
Filename statement Filename statement identifies the file and associate it with a reference name then use this reference in your INFILE statement instead of the actual file name filename mydata_file 'F:\teaching\SAS\lecture2\mydata.csv'; data demographics; infile mydata_file dsd; input Gender $ Age Height Weight; run;
18
Specifying INFILE options with the DATALINES statement
data demographics; infile datalines dsd; input gender $ Age Height Weight; datalines; "M",50,68,155 "F",23,60,101 "M",65,72,220 "F",35,65,133 "M",15,71,166 ; run;
19
Reading data from fixed columns
Many raw data files store specific information in fixed columns The advantage of fixed column files: (1) don’t need to worry about missing values (2) you can choose which variables to read and in what order to read them
20
Bank data columns 00110/ 21/1955M 00211/ 18/2001F Column 1-3 : subject ID Column 4-13: Date of birth Column 14-14: gender Column 15-21: Account balance
21
Column Input data financial; infile ‘f:\SAS\learning\bank.txt';
input Subj $ 1-3 DOB $ 4-13 Gender $ 14 Balance ; run;
22
Formatted Input Formatted input can read both character and standard numerical values as well as nonstandard numerical values, such as numbers with dollar signs and commas, and dates. Formatted input is the most common and powerful of all input methods SAS Formats and Informats: An informat is a specification for how raw data should be read. A format is a layout specification for how a variable should be printed or displayed.
23
Formatted Input The bank data: data financial;
infile ‘F:\teaching\SAS\bank.txt'; input @1 Subj $3. @4 DOB mmddyy10. @14 Gender $1. @15 Balance ; run;
24
Formatted Input sign the INPUT statement are called column tells SAS to go to column 4. Following variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value
25
Formatted Input Two of the most basic informats are w.d and $w.
The w.d informat reads standard numeric values. The w tells SAS how many columns to read; the optional d tells SAS that there is an implied decimal point in the value. For examples: data value is 123, With informat 3.0, SAS will save it as 123; With informat 3.1, SAS will save it as 12.3; If the data value already has a decimal in it, then SAS ignores the d option. For examples: data value is 1.23, With informat 4.1, SAS will it as 1.23;
26
Formatted Input The $w. Informat tells SAS to read w columns of character data The MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date. SAS stores dates as numeric values equal to the number of days from January 1, Eg, if you read 01/01/1960, SAS stores a value of 0. The data 01/02/1960 is stored as a value of 1.
27
The format statement The format statements are built-in SAS command that allow you to display data in easily readable ways. All SAS formats command ends either in a period or in a period follows by a number. title "Listing of FINANCIAL"; proc print data=financial; format DOB mmddyy10. Balance dollar11.2; The dollar11.2 tells SAS to put a $ sign in front of the number, and allow up to 11 columns to print the balance values, the 2 tells SAS to include two decimal places after the decimal points.
28
Using a format/informat statement in a DATA step
It is usually more useful to place your format statement with a SAS data step. There is a permanent association of the formats and variables in the data set. You can override any permanent format by placing a FORMAT statement in a particular procedure.
29
A informat statement with list input
Following the key word informat, you list each variable and information you want to use to read each variable data list_example; informat Subj $3. Name $20. DOB mmddyy10. Salary dollar8.; infile 'c:\books\learning\list.csv' dsd; input Subj Name DOB Salary; format DOB date9. Salary dollar8.;
30
infile 'c:\books\learning\list.csv ' dsd; input Subj : $3. Name : $20.
data list_example; infile 'c:\books\learning\list.csv ' dsd; input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.; format DOB date9. Salary dollar8.; there is a colon (called an informat modifier) preceding of each informat. It tells SAS to use informat supplied but to stop reading the values for this variable when a delimiter is met. (2) Without it, SAS may read past a delimiter to satisfy the width specified in the informat. 30 30
31
What are the Differences
Informats used at input Usually reading external data Formats used during output cycle Write formatted value to output
32
Selected date informats
Data form Input data Input statement Datew. ddmmmyyyy 1Feb1961 Input date Date9. ddmmmyy 1Feb61 Input date Date7. DDMMYYw. ddmmyy Input date ddmmyy8. ddmmyyyy 01/02/1961 Input date ddmmyy10. MMDDYYw. mmddyy Input date mmddyy8. Input date mmddyy10.
33
Selected date formats formats Data displayed Input data
format statement Results Datew. ddmmmyyyy 366 format date Date9. 01Jan1961 ddmmmyy format date Date7. 01Jan61 MMDDYYw. mmddyy format date mmddyy8. 01/01/61 mmddyyyy format date mmddyy10. 01/01/1961
34
More examples of informat for numeric data
Definition Input data INPUT statement Results COMMAw.d Removes embedded commas and $, converts left parentheses to minus sign $1,000 Input income comma6.0; 1000 (1,234) Input income comma7.0; -1234 Percentw. Converts percent to numbers (20%) Input value Percent5.; -0.2 w.d Reads standard numeric data -12.3 5.1;
35
More examples of informat for character data
Definition Input data INPUT statement Results $CHARw. Reading character data, do not trim leading or trailing blanks my cat Input animal $char10.; $w. Read character data, trim blanks
36
Import/Export Data To Export SAS datasets
Go to the File menu and select “Export Data” Choose the data file ( from the library Work) Locate and select file type using the browse button Save the data set and finish Check the log to make sure the data set was created This method does not require a data step, but any modification may require a data step Convenient for Excel file Import a SAS data set follows similar step
37
Read data in Excel file (1) Use Import procedure
FileImport dataChoose Microsoft ExcelClick “Next”Select work sheet using browseselect “Table” you want to importClick “Next”Select Library and assign a file nameClick “Next” if you want Proc import SAS code generated, otherwise click “Finish” Check the log window to make sure the data was created successfully. (2) Save Excel file as a csv file, then read in using the infile statement with dsd option.
38
Read existing SAS data:Libname and set statement
libname epib "D:\SAS_summer10\lecture2_3\lecture_data"; data new; set epib.list_example2; run;
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.