Download presentation
Presentation is loading. Please wait.
1
Chapter 2: Getting Data into SAS
Data is stored in many different formats Here are four categories of methods to read in data Entering data directly through the keyboard (small data sets) Creating SAS data sets from raw data lines Book likes direct data entry via Viewtable; I find it cumbersome. See class demos for use of raw data lines. © Fall 2011 John Grego and the University of South Carolina
2
Getting Data into SAS Here are four categories of methods to read in data Converting other software’s data files (e.g., Excel) into SAS data sets (my favorite!) Reading other software’s data files directly (often requires additional SAS/ACCESS products) © Fall 2011 John Grego and the University of South Carolina
3
Direct Data Entry with Viewtable window
Select Tools → Table Editor, Edit column attributes Enter data File→Save As… © Fall 2011 John Grego and the University of South Carolina
4
Import Window Allows you to import various types of data files (Microsoft Excel formats) The default is for the first row to be variable names. Use Options… to change. Options… button also selects worksheet from workbook We will step through this using ToughSpreadsheet.sas © Fall 2011 John Grego and the University of South Carolina
5
Import Window-Libraries
Work library--the data set is deleted after exiting SAS Other libraries—data set saved (but not necessarily library location) after exiting SAS © Fall 2011 John Grego and the University of South Carolina
6
PROC IMPORT Can save PROC IMPORT step used to import data
The default generated step includes options discussed in text DBMS=EXCEL may work better than DBMS=XLSX © Fall 2011 John Grego and the University of South Carolina
7
Reading in Raw Data Type data directly into a SAS program using the statements that appear to the right cards; datalines; lines; Bring up agency.sas © Fall 2011 John Grego and the University of South Carolina
8
Reading in Raw Data-INFILE
If your raw data is an external text file, you could use an INFILE statement to tell SAS where it is. Specify the full path name If your lines are longer than 256 charcters, use LRECL=.. Fall95.sas has an old INFILE statement. LongRecord.sas has an example of LRECL later.
9
Data Separated by Spaces
This style is called “free format” or list input since the number of spaces in between variables is flexible Use the INPUT statement to name variables Include a $ after names of character variables Bring up agency.sas. Again. Add some spaces to show free format still works.
10
Data Arranged in Columns
Knowledge of this approach is particularly critical for large data sets Each value of a variable is found at the same spot on the data line Advantages: Don’t need spacing between values Missing values don’t need a special symbol (can be blank) Show Fall95.sas and David Hitchcock’s example
11
Data Arranged in Columns-II
Advantages: Character data can have embedded blanks You can skip variables you don’t need to read into SAS Example: INPUT var var var3 $ 16-30; Fixed width
12
Data Not in Standard Format
Types of non-standard data: Numbers with commas or dollar signs Dates and times of day We can read nonstandard data using codes known as informats Most informats end in . (so SAS won’t confuse them with a variable) Show ToughSpreadsheet.xlsx or ToughSpreadsheet.txt
13
SAS Informats Import from Excel often assigns informats automatically
Pages list many SAS informats (Note that date informats are converted to a numerical value—Julian date) Run ToughSpreadsheet.sas
14
Other Inputting Issues
You can mix input styles: read in some variables list-style; others column-style; others using informats; even the order can be shuffled; E.g, you can explicity move the pointer to a specific column number: @50 moves the pointer to the 50th column I for formatted input. @’charstring’—the book has a fake-y example of this.
15
Messy Data colon modifier: Tells SAS exactly how many columns a variable occupies, but stops when it reaches a space This method is not appropriate for characters with embedded spaces Example: Deptname :$15. tells SAS to read Deptanme for 15 characters or until it reaches a space (or delimiter character). It handles ragged free format data well.
16
Multiple Lines of Data per Observation
Sometimes each observation will be on several lines in the raw data file (census data, standardized test scores, etc) Use / to tell SAS when to go to the next line Or use #2, for example, to tell SAS to go to the 2nd line of the observation PACT data we worked with had over 2000 columns. Run LongRecord.sas
17
Multiple Records per Line of Raw Data
Sometimes several observations will be on one line of data This is common for textbook exercises Use to tell SAS to keep the pointer on the raw data line and prepare to read the next observation Simple example: data a; input y x ; run;
18
Reading Part of a Data File
Sometimes we want to modify data input based on values of one variable We can read just the first variable(s) using sign and are not closely related, but the text draws a connection Run Fall95.sas and atandatat.sas
19
Reading Delimited Files
These instructions have been almost completely subsumed by the Import Wizard and the widespread use of Excel DLM= specifies alternative delimiters Comma: DLM=‘,’ Tab: DLM=‘09’X #: DLM=‘#’ Skim INFILE material, though we cover it in HW 3. Show INFILE options in David Hitchcock’s example.
20
Missing values Two delimiters in a row are read as a single delimiter
What if two commas in a row indicate a missing value? Or data values contain commas? Can use DSD option Data values with commas must be in quotes
21
Truncated Records MISSOVER adds missing values at the end of a short record rather than reading more data from the next line TRUNCOVER is used to advance to the next line when the length of a record is shorter than the variable field.
22
SAS data sets: Temporary and Permanent
Data sets stored in Work library are temporary (removed upon exiting SAS) Data sets stored in other libraries are permanent (will be saved upon exiting SAS)
23
Permanent SAS Data Sets
You can specify the library when creating a data set in the DATA step Example: Suppose you have a library called sportlib (this is a “libref”) DATA sportlib.baseball; creates a data set baseball to be stored in the permanent sportlib library
24
Direct References You can create a permanent SAS data set without creating a library Example: Suppose you want to save baseball.sas7bdat in your 540 directory: DATA “e:\STAT 540\baseball”;
25
Temporary SAS Data Sets
DATA work.baseball; stores baseball in the temporary work library DATA baseball; stores baseball in the default temporary work library Book covers PROC CONTENTS
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.