
17b. Accessing Data: Manipulating Variables in SAS®


Prerequisites
Recommended modules to complete before viewing this module:
- 1. Introduction to the NLTS2 Training Modules
- 2. NLTS2 Study Overview
- 3. NLTS2 Study Design and Sampling
- NLTS2 Data Sources, either
  - 4. Parent and Youth Surveys, or
  - 5. School Surveys, Student Assessments, and Transcripts
- NLTS2 Documentation
  - 10. Overview
  - 11. Data Dictionaries
  - 12. Quick References
- 13. Analysis Example: Descriptive/Comparative Using Longitudinal Data
- Accessing Data
  - 14b. Files in SAS
  - 15b. Frequencies in SAS

Overview
- Purpose
- Modifying existing variables
- Creating new variables
- Summary
- Closing
- Important information

NLTS2 restricted-use data
NLTS2 data are restricted. Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data. Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.

Purpose
Learn to:
- Modify an existing variable
- Create a new variable
- Join/combine data from different sources

Modifying existing variables
How to modify a variable
To collapse categories, break a continuous variable into categories, or recode a variable, it is not always necessary to create a new variable in SAS.
- User-assigned formats control how output prints but do not change the variable.

Syntax for categorizing an existing variable with a format:

    PROC FORMAT ;
      VALUE b2catfmt
        low-1   = "(<=1) 1 or younger"
        2-5     = "(2-5) 2 to 5 years of age"
        6-10    = "(6-10) 6 to 10 years of age"
        11-high = "(>=11) 11 or older" ;
    RUN ;

    PROC FREQ data = collapse ;
      TABLES np1B2a ;
      FORMAT np1B2a b2catfmt. ;
    RUN ;

These results cannot be replicated with full dataset; all output in modules generated with a random subset of the full data.
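A minimal, self-contained sketch of the format-only approach follows. Because NLTS2 data are restricted, it runs on a few made-up ages in a hypothetical WORK data set, demo_ages, rather than on np1B2a from the real file. The PUT function is used here only for illustration, to place the formatted label next to the stored value, which stays unchanged.

```sas
/* Hypothetical data: made-up ages standing in for np1B2a. */
data work.demo_ages;
  input id np1B2a;
  datalines;
1 0
2 1
3 4
4 9
5 14
6 .
;
run;

/* Range format from the slide: groups values for display only. */
proc format;
  value b2catfmt
    low-1   = "(<=1) 1 or younger"
    2-5     = "(2-5) 2 to 5 years of age"
    6-10    = "(6-10) 6 to 10 years of age"
    11-high = "(>=11) 11 or older";
run;

/* The frequency table is grouped by the format ranges. */
proc freq data=work.demo_ages;
  tables np1B2a / missing;
  format np1B2a b2catfmt.;
run;

/* np1B2a keeps its original numbers; age_label holds the formatted text. */
data work.format_check;
  set work.demo_ages;
  length age_label $30;
  age_label = strip(put(np1B2a, b2catfmt.));
run;

proc print data=work.format_check noobs;
run;
```

In the printed listing, np1B2a still shows 0, 1, 4, 9, 14, and a missing value; only the displayed grouping changed.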

Syntax to modify an existing variable
- Create a new variable rather than permanently changing the existing variable.
- Create a new format so the values are meaningful. (This redefines b2catfmt, replacing the range version above for the rest of the session.)

    PROC FORMAT ;
      VALUE b2catfmt
        1 = "(1) 1 or younger"
        2 = "(2) 2 to 5 years of age"
        3 = "(3) 6 to 10 years of age"
        4 = "(4) 11 or older" ;
    RUN ;

- Recode the variable in a DATA step. This would result in a temporary change. Why? What would make it a permanent change? (A one-level name such as collapse stores the data set in the temporary WORK library, which SAS deletes when the session ends; a two-level name in a permanent library, such as DATA sasdb.collapse, would keep it.)

    DATA collapse ;
      SET sasdb.n2w1parent ;

Syntax to recode an existing variable into a new variable with value and variable labels (these statements continue the DATA step):

      /* create age of youth when diagnosed - with age range categories */
      if missing(np1B2a) then np1B2a_Cat = np1B2a ;
      else if np1B2a <= 1 then np1B2a_Cat = 1 ;
      else if 2 <= np1B2a <= 5 then np1B2a_Cat = 2 ;
      else if 6 <= np1B2a <= 10 then np1B2a_Cat = 3 ;
      else if np1B2a > 10 then np1B2a_Cat = 4 ;
      FORMAT np1B2a_Cat b2catfmt. ;
      LABEL np1B2a_Cat = '(np1B2a_cat) Age of youth when diagnosed - categorized into ranges' ;
    RUN ;
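Run end to end, the recode above can be sketched on made-up data as follows. WORK.demo_parent is a hypothetical stand-in for sasdb.n2w1parent, and the special missing value .R is an assumption included only to show why the first IF copies np1B2a instead of assigning a plain period: copying keeps whichever kind of missing value the source holds.

```sas
/* Hypothetical data standing in for sasdb.n2w1parent.               */
/* MISSING R lets the letter R be read as the special missing value .R. */
data work.demo_parent;
  missing R;
  input id np1B2a;
  datalines;
1 0
2 1
3 5
4 6
5 10
6 11
7 .
8 R
;
run;

proc format;
  value b2catfmt
    1 = "(1) 1 or younger"
    2 = "(2) 2 to 5 years of age"
    3 = "(3) 6 to 10 years of age"
    4 = "(4) 11 or older";
run;

/* One-level name, so COLLAPSE is written to the temporary WORK library. */
data collapse;
  set work.demo_parent;
  /* Copying the source value keeps the kind of missing (. or .R). */
  if missing(np1B2a) then np1B2a_Cat = np1B2a;
  else if np1B2a <= 1 then np1B2a_Cat = 1;
  else if 2 <= np1B2a <= 5 then np1B2a_Cat = 2;
  else if 6 <= np1B2a <= 10 then np1B2a_Cat = 3;
  else if np1B2a > 10 then np1B2a_Cat = 4;
  format np1B2a_Cat b2catfmt.;
  label np1B2a_Cat = '(np1B2a_cat) Age of youth when diagnosed - categorized into ranges';
run;
```

To keep the result after the session ends, the DATA statement would name a permanent library instead, for example DATA sasdb.collapse; (assuming sasdb is a libref already assigned to a permanent location).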

Look at results
- Run a frequency of the new variable.
- It is useful to look at a crosstab of the original variable by the new variable to check how values were coded.

Look at frequency distributions and a crosstab of the new vs. old variables:
- The LIST option on the TABLES statement prints the crosstab more compactly.
- The MISSPRINT option shows missing-value frequencies in the table without using them in the percentages.
- A FORMAT statement that names a variable without a format strips its existing format, so the raw values print.

    PROC FREQ data = collapse ;
      TABLES np1B2a_Cat ;
      TABLES np1B2a_Cat * np1B2a / MISSPRINT LIST ;
      FORMAT np1B2a ;
    RUN ;
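The crosstab check can be sketched on made-up data. For illustration only, np1B2a is assumed to arrive with a stored display format (the hypothetical agefmt), and ID 4 is deliberately miscoded (6 placed in category 2). With agefmt left in place, 5 and 6 would both print as "2 or older" and the slip would be hidden; stripping the format exposes it. OUT= saves the crosstab counts so they can also be checked programmatically.

```sas
/* Hypothetical format that np1B2a is assumed to carry in the file. */
proc format;
  value agefmt low-1  = "1 or younger"
               2-high = "2 or older";
run;

/* Made-up data; ID 4 is deliberately miscoded (6 belongs in category 3). */
data work.collapse_demo;
  input id np1B2a np1B2a_Cat;
  format np1B2a agefmt.;
  datalines;
1 1 1
2 5 2
3 5 2
4 6 2
5 7 3
6 12 4
7 . .
;
run;

/* FORMAT np1B2a; with no format named strips agefmt, so raw ages print. */
/* LIST gives one row per combination; MISSPRINT shows missing counts.   */
proc freq data=work.collapse_demo;
  tables np1B2a_Cat;
  tables np1B2a_Cat * np1B2a / missprint list out=work.crosscheck;
  format np1B2a;
run;
```

In the LIST output, the row pairing category 2 with the value 6 is the coding error to fix in the recode.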

Modifying existing variables: Example
Modifying a variable
- Use the Wave 3 parent/youth interview file.
- Collapse np3NbrProbs into a new variable with the categories 0-1, 2, 3, and 4-6.
- Remember to label the variable, add value formats, and account for missing values.

The example output slides show:
- PROC FREQ with a user-defined format (no change made to np3NbrProbs)
- PROC FREQ with the new variable np3NbrProbs_Cat created from np3NbrProbs
- The new np3NbrProbs_Cat compared with the original np3NbrProbs, with existing formats stripped from np3NbrProbs by the statement FORMAT np3NbrProbs;
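One possible solution to the exercise, sketched on made-up data: WORK.demo_w3 stands in for the Wave 3 parent/youth file, and the format names nbrrng and nbrcat and the variable label are hypothetical. It shows both approaches (a format only, then a new variable) and the comparison crosstab. Values outside 0-6 are deliberately left missing so they stand out in the frequency table.

```sas
/* Made-up values standing in for np3NbrProbs in the Wave 3 file. */
data work.demo_w3;
  input id np3NbrProbs;
  datalines;
1 0
2 1
3 2
4 3
5 4
6 6
7 .
;
run;

proc format;
  /* Format-only approach: groups values in output; data unchanged. */
  value nbrrng 0-1 = "0-1"
               2   = "2"
               3   = "3"
               4-6 = "4-6";
  /* Value labels for the new categorical variable. */
  value nbrcat 1 = "(1) 0-1"
               2 = "(2) 2"
               3 = "(3) 3"
               4 = "(4) 4-6";
run;

proc freq data=work.demo_w3;
  tables np3NbrProbs / missing;
  format np3NbrProbs nbrrng.;
run;

data work.w3_collapse;
  set work.demo_w3;
  if missing(np3NbrProbs) then np3NbrProbs_Cat = np3NbrProbs;
  else if 0 <= np3NbrProbs <= 1 then np3NbrProbs_Cat = 1;
  else if np3NbrProbs = 2 then np3NbrProbs_Cat = 2;
  else if np3NbrProbs = 3 then np3NbrProbs_Cat = 3;
  else if 4 <= np3NbrProbs <= 6 then np3NbrProbs_Cat = 4;
  /* Any other value is not assigned, so np3NbrProbs_Cat stays missing. */
  format np3NbrProbs_Cat nbrcat.;
  label np3NbrProbs_Cat = '(np3NbrProbs_Cat) np3NbrProbs collapsed: 0-1, 2, 3, 4-6';
run;

/* Frequency of the new variable, then new vs. original with formats stripped. */
proc freq data=work.w3_collapse;
  tables np3NbrProbs_Cat / missing;
  tables np3NbrProbs_Cat * np3NbrProbs / missprint list;
  format np3NbrProbs;
run;
```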

Creating new variables
How to create a new variable
The values in a new variable can be the results of calculations, assignments, or logic. A new variable can be created from one existing variable or from multiple variables, including variables from other sources and/or waves.
- Variables from other sources/waves must be added to the active data file before the new variable is created.

- Be aware of any coding differences between the variables when combining values.
- Decide what to do with missing values.

Example: Create a variable using parent/youth interview data from Waves 1 through 4.
- Has the student been suspended and/or expelled in any wave?

Create a format for the new variable and join the data needed. MERGE with a BY statement requires each input file to be sorted by ID.

    PROC FORMAT ;
      VALUE fmta
        0 = "(0) Never suspended/expelled"
        1 = "(1) Suspended or expelled in any wave"
        2 = "(2) Suspended or expelled every wave" ;
    RUN ;

    DATA collapse ;
      MERGE sasdb.n2w1parent   (keep=ID np1d7h)
            sasdb.n2w2paryouth (keep=ID np2d5d)
            sasdb.n2w3paryouth (keep=ID np3d5d)
            sasdb.n2w4paryouth (keep=ID np4d5d) ;
      BY ID ;

Syntax (these statements continue the DATA step):

      if np1D7h >= 0 and np2D5d >= 0 and np3D5d >= 0 and np4D5d >= 0 then do ;
        if np1D7h = 1 and np2D5d = 1 and np3D5d = 1 and np4D5d = 1 then np4D5d_ever = 2 ;
        else if np1D7h = 1 or np2D5d = 1 or np3D5d = 1 or np4D5d = 1 then np4D5d_ever = 1 ;
        else np4D5d_ever = 0 ;
      end ;
      FORMAT np4D5d_ever fmta. ;
    RUN ;

The code will result in a variable that:
- Requires a value for every wave (SAS numeric missing values are lower than 0, so a missing value in any wave fails the >= 0 test and leaves np4D5d_ever missing)
- Is 0 if never suspended/expelled
- Is 1 if suspended/expelled in any wave
- Is 2 if suspended/expelled in all four waves
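The whole pipeline (sort, merge, derive) can be sketched on made-up data. The four WORK data sets w1 to w4 are hypothetical stand-ins for the four wave files, assumed to be coded 1 = yes, 0 = no; w4 is deliberately out of order to show why each file is sorted before the BY merge.

```sas
/* Hypothetical stand-ins for the four wave files: 1 = yes, 0 = no. */
data work.w1; input ID np1D7h; datalines;
101 0
102 1
103 0
104 1
105 1
;
run;

data work.w2; input ID np2D5d; datalines;
101 0
102 1
103 1
104 .
;
run;

data work.w3; input ID np3D5d; datalines;
101 0
102 1
103 0
104 1
;
run;

/* Deliberately out of order, to show why sorting matters. */
data work.w4; input ID np4D5d; datalines;
104 1
103 0
102 1
101 0
;
run;

/* MERGE with BY needs every input sorted by the BY variable. */
proc sort data=work.w1; by ID; run;
proc sort data=work.w2; by ID; run;
proc sort data=work.w3; by ID; run;
proc sort data=work.w4; by ID; run;

proc format;
  value fmta 0 = "(0) Never suspended/expelled"
             1 = "(1) Suspended or expelled in any wave"
             2 = "(2) Suspended or expelled every wave";
run;

data collapse;
  merge work.w1 work.w2 work.w3 work.w4;
  by ID;
  /* Missing values are lower than 0, so a value is required in every wave. */
  if np1D7h >= 0 and np2D5d >= 0 and np3D5d >= 0 and np4D5d >= 0 then do;
    if np1D7h = 1 and np2D5d = 1 and np3D5d = 1 and np4D5d = 1 then np4D5d_ever = 2;
    else if np1D7h = 1 or np2D5d = 1 or np3D5d = 1 or np4D5d = 1 then np4D5d_ever = 1;
    else np4D5d_ever = 0;
  end;
  format np4D5d_ever fmta.;
run;

proc freq data=collapse;
  tables np4D5d_ever / missing;
run;
```

IDs 101, 102, and 103 come out as 0, 2, and 1. ID 104 reported a suspension/expulsion in three waves and ID 105 in Wave 1, yet both stay missing: 104 has a missing Wave 2 value, and 105 is absent from Waves 2 to 4, so the merge fills those waves with missing values. That is the trade-off of requiring a value in every wave.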

Creating new variables: Example
Creating a new variable
- Use the Wave 4 parent/youth interview file.
- Bring in np1F7 from the Wave 1, np2P8_J4 from the Wave 2, and np3P8_J4 from the Wave 3 interview files.
- Create a new variable, np4P8_J4_ever (ever done volunteer or community service).
- Initialize its value to 0 if any of np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is 0.
- Reassign it to 1 if any of np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is 1.

Creating a new variable (cont'd)
- Assign a variable label and value labels.
- Run a frequency of np4P8_J4_ever.
- Run a crosstabulation of np4P8_J4_ever by np4P8_J4.
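A sketch of the two-step rule, starting from a made-up, already-merged data set (WORK.demo_vol is hypothetical and skips the merge step). It assumes every item is coded 1 = yes, 0 = no; that assumption should be checked in the codebook, especially for np1F7, whose name differs from the later items. The format name everfmt and the variable label are also hypothetical.

```sas
/* Made-up, already-merged values for the four volunteer items.       */
/* ASSUMPTION: each item is coded 1 = yes, 0 = no (verify in codebook). */
data work.demo_vol;
  input ID np1F7 np2P8_J4 np3P8_J4 np4P8_J4;
  datalines;
201 0 0 0 0
202 0 1 0 0
203 . . . 1
204 . . . .
205 0 . . .
;
run;

proc format;
  value everfmt 0 = "(0) Never reported volunteer/community service"
                1 = "(1) Volunteer/community service in any wave";
run;

data work.volunteer;
  set work.demo_vol;
  /* Step 1: initialize to 0 if any wave reports 0. */
  if np1F7 = 0 or np2P8_J4 = 0 or np3P8_J4 = 0 or np4P8_J4 = 0
    then np4P8_J4_ever = 0;
  /* Step 2: reassign to 1 if any wave reports 1. */
  if np1F7 = 1 or np2P8_J4 = 1 or np3P8_J4 = 1 or np4P8_J4 = 1
    then np4P8_J4_ever = 1;
  /* No 0 or 1 in any wave: np4P8_J4_ever stays missing. */
  format np4P8_J4_ever everfmt.;
  label np4P8_J4_ever = 'Ever done volunteer or community service (Waves 1-4)';
run;

proc freq data=work.volunteer;
  tables np4P8_J4_ever / missing;
  tables np4P8_J4_ever * np4P8_J4 / missprint list;
run;
```

Unlike the suspension example, this rule does not require every wave: ID 203 is classified as 1 from Wave 4 alone, and ID 205 as 0 from Wave 1 alone.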

Summary
Be aware of differences in coding between similar variables when building composite variables.
Missing values must be considered.
- Know how missing values are coded, particularly when using more than one variable to create another.
- Joined data are more likely to have missing values.
Weights
- When combining data, the analysis weight would generally be the weight from the smallest sample.
- When filling in values for a variable in the active file with values from another file, it is OK to use the weight in the active file.

Know the values, mind the missing, and watch your weights!

Closing
Topics discussed in this module:
- Modifying existing variables
- Creating new variables
- Summary
Next module:
- 18b. PROC SURVEY Procedures in SAS

Important information
- The NLTS2 website contains reports, data tables, and other project-related information: http://nlts2.org/
- Information about obtaining the NLTS2 database and documentation can be found on the NCES website: http://nces.ed.gov/statprog/rudman/
- General information about restricted data licenses can be found on the NCES website: http://nces.ed.gov/statprog/instruct.asp
- E-mail address: nlts2@sri.com

