
1 14b. Accessing Data Files in SAS ®

2 Prerequisites
Recommended modules to complete before viewing this module
• 1. Introduction to the NLTS2 Training Modules
• 2. NLTS2 Study Overview
• 3. NLTS2 Study Design and Sampling
• NLTS2 Data Sources, either
  - 4. Parent and Youth Surveys, or
  - 5. School Surveys, Student Assessments, and Transcripts
• NLTS2 Documentation
  - 10. Overview
  - 11. Data Dictionaries
  - 12. Quick References

3 Overview
• Purpose
• Opening and viewing data files
• Limiting variables
• Subsetting cases
• Joining/combining data files
• Summary
• Closing
• Important information

4 NLTS2 restricted-use data
NLTS2 data are restricted.
Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data.
Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.

5 Purpose
Learn to
• Open a data file
• See what is in a file (i.e., contents of the file)
• “Size” a data file for a perfect fit
  - Reduce the number of variables
  - Reduce the number of cases (i.e., subset the data)
• Combine information from multiple sources
  - Bring in data from another source or another wave
  - Join or combine files
• Create a new file

6 Open and view data files
SAS ® and SPSS ® data are in separate folders.
SAS data files have a “.sas7bdat” extension.
Associated value formats are stored in a SAS format catalog, “Formats.sas7bcat.”
SAS programming code is available for re-creating the user-defined formats in the SAS format library.
• Hyperlinked from the table of contents on the database and documentation disk.

7 Open and view data files
Files are either read from or written to.
Files have a name and a location where they are stored.
• SAS needs to know the name of the file and where to find it.
SAS uses a LIBNAME statement to identify the path.
• The path describes the nesting of folders.
• C:\myprojects\NLTS2\Data is a path or location; i.e., the file is located on drive “C”, in the folder “Data,” which is nested inside the “myprojects” and “NLTS2” folders.
An example LIBNAME statement for a path would be
    LIBNAME sasdb 'C:\myprojects\NLTS2\Data' ;
These results cannot be replicated with full dataset; all output in modules generated with a random subset of the full data.

8 Open and view data files
• An easy way to view the contents of the file is PROC CONTENTS.
• Specify LIBNAME statements for the data files and format library.
    LIBNAME [ddname] '[path]' ;
    LIBNAME library '[path]' ;
• Syntax
    PROC CONTENTS DATA = [ddname].[file name] ;
    RUN ;
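The syntax template above can be filled in as follows. This is a minimal sketch, assuming the example path from the earlier slide and assuming the format catalog sits in the same folder as the data files; the file name n2w1parent is the Wave 1 parent file used later in this module.

```sas
/* Assumed location of the NLTS2 data files (example path from the earlier slide). */
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;

/* SAS looks for a catalog named FORMATS under the libref LIBRARY by default.
   Assumption: the format catalog is stored in the same folder as the data. */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;

/* List the variables, types, lengths, labels, and formats in the file. */
PROC CONTENTS DATA = sasdb.n2w1parent ;
RUN ;
```

If the catalog is stored somewhere else, only the second path should need to change.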

9 Open and view data files: Example
Viewing files
• Use the Wave 1 teacher file.
• Run a PROC CONTENTS.

10 Open and view data files: Example

11 Viewing files

12 Limiting variables
How to reduce the number of variables in the file
• Large files with many cases and many variables are unwieldy; simplify.
  - Fewer variables to search through.
  - Fewer cases to process.
• Create data files that are limited to just those variables needed for analysis.
• You have the choice of drop or keep. Which one is best?
  - The one that requires less typing!
  - If you are dropping more variables than you are keeping, use “keep,” and vice versa.

13 Limiting variables
Note: When making changes to your data
• Use work files for temporary changes.
  - Work files exist only for the duration of the program or SAS interactive session.
  - Work files are temporary files unless they are saved.
  - Work files have a one-level name; no ddname is needed.
• To save modified data permanently, create a new permanent data file.
  - Usually it is best to create a new file rather than to modify the source file.
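The difference between a work file and a permanent file can be sketched as below. The data set names ParentTemp and ParentCopy are hypothetical, and both paths are assumptions carried over from the example LIBNAME statement.

```sas
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;   /* assumed data location */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;   /* assumed format catalog location */

/* One-level name: written to the WORK library and deleted when the session ends. */
DATA ParentTemp ;                 /* hypothetical name */
   SET sasdb.n2w1parent ;
RUN ;

/* Two-level name: saved permanently under a new name; the source file is not modified. */
DATA sasdb.ParentCopy ;           /* hypothetical name */
   SET sasdb.n2w1parent ;
RUN ;
```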

14 Limiting variables
Syntax
    DATA Par_w1_lmt_vars (DROP= np1E2a np1GroupMember) ;
       SET sasdb.n2w1parent (KEEP= ID w1_DisHdr2001 w1_GendHdr2001 w1_IncomeHdr2001
                                   w1_AgeHdr2001 np1Weight np1HealthProb np1GroupMember
                                   np1ProblemCount np1E2a np1B2a) ;
    run ;
A “KEEP” or “DROP” option on a “SET” or “MERGE” statement controls which variables come into the data set.
A “KEEP” or “DROP” option on a “DATA” statement controls which variables are saved on the output data set.
Notice that SAS statements end in a semicolon.

15 Limiting variables: Example
Limiting variables
• Create a file with fewer variables.
• Create a new file called “PrScores” from n2w2dirassess.
• Keep only the following variables:
    ID ndacalc_pr ndaPC_PR ndasyn_pr NDaF1_friend na_age4
    w2_dis12 w2_gend2 na_grade4 w2_incm3 wt_na
• Save the new file.
• Review the new file with a PROC CONTENTS.
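One way to work through this exercise is sketched below. It assumes the libref sasdb and the format catalog point to the example path, and it saves PrScores as a permanent file under sasdb, which matches the two-level name used in the later sort syntax; a one-level name would create a work file instead.

```sas
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;   /* assumed data location */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;   /* assumed format catalog location */

/* Read only the listed variables from the Wave 2 direct assessment file. */
DATA sasdb.PrScores ;
   SET sasdb.n2w2dirassess (KEEP= ID ndacalc_pr ndaPC_PR ndasyn_pr NDaF1_friend
                                  na_age4 w2_dis12 w2_gend2 na_grade4 w2_incm3 wt_na) ;
RUN ;

/* Confirm that the new file holds only the variables that were kept. */
PROC CONTENTS DATA = sasdb.PrScores ;
RUN ;
```

Putting KEEP= on the SET statement means the unneeded variables are never read into the DATA step, which is usually the cheaper choice for a wide source file.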

16 Limiting variables: Example

17 Subsetting cases
How to reduce the number of cases in a data step or in a procedure
Often analysis is done on a subset; for example:
• Select only youth with visual impairment.
• Select only youth who are out of secondary school.
• Exclude younger students.

18 Subsetting cases
Example: Limit Wave 4 parent/youth interview data to those who are 21 or older, excluding youth who are 19 or 20 (W4_Age2007 = 19 or 20).
Syntax to limit cases in a file
    DATA AgeGT20 ;
       SET sasdb.n2w4paryouth (WHERE=(W4_Age2007>20)) ;
    run ;
or
    DATA AgeGT20 ;
       SET sasdb.n2w4paryouth ;
       IF W4_Age2007>20 ;
    run ;

19 Subsetting cases
Syntax to limit cases in a procedure
• No change to file with this syntax
    PROC FREQ data=sasdb.n2w4paryouth ;
       WHERE W4_Age2007>20 ;
       TABLES W4_Age2007 ;
    run ;

20 Subsetting cases: Example
Subsetting cases
• Create a small data set with a subset of cases.
• Use “PrScores” created in the previous example.
• Limit cases to those classified with hearing impairment only, i.e., those with a value of “5” for “w2_dis12”.
• Look at the notes in the log window. Are there any clues that the file has changed?
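A possible solution to this exercise is sketched here. It assumes PrScores was saved as sasdb.PrScores and that w2_dis12 is stored as a numeric variable; if it is character, the comparison would need to be written as '5'. The output name PrScoresHI is hypothetical.

```sas
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;   /* assumed data location */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;   /* assumed format catalog location */

/* Keep only youth whose disability category is hearing impairment (value 5). */
DATA PrScoresHI ;                 /* hypothetical work file name */
   SET sasdb.PrScores ;
   WHERE w2_dis12 = 5 ;           /* assumes w2_dis12 is numeric */
RUN ;
```

After the step runs, the log notes report how many observations were read from the source file and how many observations and variables the new data set contains; comparing the two counts shows that cases were dropped.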

21 Joining/combining data files
How to bring in data from another file
Purpose
• Learn to combine or join files
  - Bring in data from another source
  - Bring in data from another wave
• Learn what to watch for
  - Number of cases in the combined file
  - How cases are joined
    - Key variable, i.e., which variable to match on
    - Keyed file, i.e., which cases to keep

22 Joining/combining data files
Why do this?
• Often it is necessary to combine information from different files to perform comparative analyses, create new variables, or measure differences over time.
For example, you may want to
• Create composite variables from multiple sources.
• Look at similar items at different points in time.

23 Joining/combining data files
Example of composite variables from multiple sources.
• Create a variable for “if parent attended a parent/teacher conference” using Wave 2 teacher survey item nts2C8, and fill in with parent interview item np2E1a_d if teacher data are missing.
Example of items at different points in time.
• Create a variable to look at the pattern of employment between Waves 2 and 3: employed both waves, either wave, or neither wave.
  - Set to “employed both waves” if np2HasPdJob (W2) and np3HasJob (W3) are both “yes.”
  - Else set to “employed in either wave” if np2HasPdJob or np3HasJob is “yes.”
  - Else set to “not employed” if np2HasPdJob and np3HasJob are both “no.”
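The employment-pattern rule described above could be coded roughly as follows. This is only a sketch: it assumes both items are numeric and coded 1 = yes and 0 = no, which should be checked against the data dictionary, and the names EmpPattern and EmpW2W3 are hypothetical.

```sas
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;   /* assumed data location */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;   /* assumed format catalog location */

/* Both files must already be sorted by ID for the BY statement to work. */
DATA EmpPattern ;                              /* hypothetical work file name */
   MERGE sasdb.n2w2paryouth (KEEP= ID np2HasPdJob)
         sasdb.n2w3paryouth (KEEP= ID np3HasJob) ;
   BY ID ;

   LENGTH EmpW2W3 $ 8 ;                        /* hypothetical new variable */
   /* Assumed coding: 1 = yes, 0 = no. */
   IF      np2HasPdJob = 1 AND np3HasJob = 1 THEN EmpW2W3 = 'Both' ;
   ELSE IF np2HasPdJob = 1 OR  np3HasJob = 1 THEN EmpW2W3 = 'Either' ;
   ELSE IF np2HasPdJob = 0 AND np3HasJob = 0 THEN EmpW2W3 = 'Neither' ;
   /* Any other combination (an item is missing) leaves EmpW2W3 blank. */
RUN ;

/* Check the distribution of the new variable, including blanks. */
PROC FREQ DATA = EmpPattern ;
   TABLES EmpW2W3 / MISSING ;
RUN ;
```

Testing the conditions in this order mirrors the "set, else set, else set" wording of the rule, so a youth who is employed in one wave and missing in the other falls into the "either" group.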

24 Joining/combining data files
Hypothetical Example: Data Availability Across Instruments
Columns: Youth | Interview Data W1 | Assessment Data W2 | Interview Data W2 | Program Data W2
Youth 1: Yes, No
Youth 2: Yes, No, Yes, No
Youth 3: Yes
Youth 4: No, Yes
Youth 5: No, Yes, No
Youth 6: Yes
There will be missing records across files and missing items within files.
If data look like this, ask which file is the main file being analyzed.

25 Joining/combining data files
Data in all files must be sorted by the key variable.
• The key variable matches files case by case.
• Key variable is “ID.”
Files on CD should be sorted by key variable, but as you work with files they may become unsorted.
Syntax to sort data
    PROC SORT data=sasdb.PrScores ;
       by ID ;
    run ;
Note for those who do not want to destroy data
• If you use a KEEP or DROP statement, do a temporary sort, or otherwise change the data, save to a new file.
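To follow the note about not destroying data, PROC SORT can write the sorted cases to a separate file with the OUT= option instead of replacing the input file. A minimal sketch, assuming sasdb.PrScores exists from the earlier example; the output name PrScores_sorted is hypothetical.

```sas
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;   /* assumed data location */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;   /* assumed format catalog location */

/* Sorted copy goes to a work file; sasdb.PrScores itself is left unchanged. */
PROC SORT DATA = sasdb.PrScores OUT = PrScores_sorted ;
   BY ID ;
RUN ;
```

Without OUT=, the sorted data replace the data set named in DATA=.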

26 Joining/combining data files
Syntax to join files
    DATA Emplmt ;
       MERGE sasdb.n2w1parent   (KEEP=id np1i_3a_7)
             sasdb.n2w2paryouth (KEEP=id np2HasPdJob)
             sasdb.n2w3paryouth (KEEP=id np3HasJob)
             sasdb.n2w4paryouth (KEEP=id np4HasJob) ;
       BY ID ;
    run ;

27 Joining/combining data files
SAS log

28 Joining/combining data files
Syntax to join files keeping only those who have Wave 4 data
    DATA Emplmt ;
       MERGE sasdb.n2w1parent   (KEEP=id np1i_3a_7)
             sasdb.n2w2paryouth (KEEP=id np2HasPdJob)
             sasdb.n2w3paryouth (KEEP=id np3HasJob)
             sasdb.n2w4paryouth (KEEP=id np4HasJob in=inW4) ;
       BY ID ;
       IF inW4 ;
    run ;
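The IN= option can also be used to check how many cases each file contributes before deciding which cases to keep. The sketch below is an illustration under assumptions, not part of the original module: the names EmplmtCheck, HasW1, HasW2, HasW3, and HasW4 are hypothetical, and all four files are assumed to be sorted by ID.

```sas
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;   /* assumed data location */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;   /* assumed format catalog location */

DATA EmplmtCheck ;                             /* hypothetical work file name */
   MERGE sasdb.n2w1parent   (KEEP= ID np1i_3a_7   IN= inW1)
         sasdb.n2w2paryouth (KEEP= ID np2HasPdJob IN= inW2)
         sasdb.n2w3paryouth (KEEP= ID np3HasJob   IN= inW3)
         sasdb.n2w4paryouth (KEEP= ID np4HasJob   IN= inW4) ;
   BY ID ;
   /* IN= variables are temporary; copy them so they are saved in the output. */
   HasW1 = inW1 ;
   HasW2 = inW2 ;
   HasW3 = inW3 ;
   HasW4 = inW4 ;
RUN ;

/* One row per combination of files in which a youth has a record. */
PROC FREQ DATA = EmplmtCheck ;
   TABLES HasW1*HasW2*HasW3*HasW4 / LIST MISSING ;
RUN ;
```

Each flag is 1 when the file contributed a record for that ID and 0 otherwise, so the table shows the pattern of missing records across waves.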

29 Joining/combining data files
SAS log

30 Joining/combining data files
Suggestion: Name files in a meaningful way, such as
• By date: AnFile_29July
• By type of analysis: PI_CrossWave
• By source: PI_W123
• By sequence: File_5
Be sure to use a two-level data set name (using a ddname) to save the file; otherwise, it is a work or temporary file.
    DATA Emplmt ;        /* temporary work file */
    DATA sasdb.Emplmt ;  /* saved file */

31 Joining/combining data files: Example
Joining/combining data
• Combine data from another file with an existing file.
• Sort PrScores by ID.
• Bring in np2HasPdJob from n2w2paryouth.
• Bring in np3HasJob from n2w3paryouth.
• Save the file as PrScoresEmp.
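One way to approach this exercise is sketched below. It assumes PrScores was saved as sasdb.PrScores and that the two parent/youth files are already sorted by ID. Keeping only youth who have a record in PrScores is an assumption about which file is the main file; the exercise itself does not say which cases to keep.

```sas
LIBNAME sasdb   'C:\myprojects\NLTS2\Data' ;   /* assumed data location */
LIBNAME library 'C:\myprojects\NLTS2\Data' ;   /* assumed format catalog location */

/* Sort a work copy so the saved PrScores file is not overwritten. */
PROC SORT DATA = sasdb.PrScores OUT = PrScores ;
   BY ID ;
RUN ;

/* Two-level name saves the combined file permanently. */
DATA sasdb.PrScoresEmp ;
   MERGE PrScores           (IN= inScores)
         sasdb.n2w2paryouth (KEEP= ID np2HasPdJob)
         sasdb.n2w3paryouth (KEEP= ID np3HasJob) ;
   BY ID ;
   IF inScores ;   /* assumption: keep only youth with assessment scores */
RUN ;
```

Dropping the IF statement would keep every youth who appears in any of the three files, so the case count in the log is worth checking either way.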

32 Joining/combining data files: Example

33 Summary
Congratulations! You have learned to
• Open and view a file
• Create a new file
• Reduce the size of files by specifying the
  - Variables needed
  - Cases needed
• Join files using a key variable
• Save files with a new name

34 Closing
Topics discussed in this module
• Purpose
• Opening and viewing data files
• Limiting variables
• Subsetting cases
• Joining/combining data files
Next module
• 15b. Accessing Data: Frequencies in SAS

35 Important information
• The NLTS2 website contains reports, data tables, and other project-related information: http://nlts2.org/
• Information about obtaining the NLTS2 database and documentation can be found on the NCES website: http://nces.ed.gov/statprog/rudman/
• General information about restricted data licenses can be found on the NCES website: http://nces.ed.gov/statprog/instruct.asp
• E-mail address: nlts2@sri.com

