Lesson 9 - Topics Restructuring datasets LSB: 6:14


1 Lesson 9 - Topics Restructuring datasets LSB: 6:14
Programs in course notes
Welcome to lesson 9. Today we will look at applications that involve restructuring SAS datasets. These topics are illustrated by programs in the course notes.

2 Some Tabulate Options
    PROC TABULATE S=[cellwidth=150];
      CLASS sex;
      VAR varlist / S=[cellwidth=275];
      TABLE … / box='Dietary Nutrient';
    RUN;
Note first that the CLASS and VAR statements in PROC TABULATE are the same as you would use in PROC MEANS. What follows is the TABLE statement. This can be a bit tricky to follow, but let's give it a try. Take consolation that there are entire classes and books on how to use PROC TABULATE, so don't be too discouraged if you don't get it all the first time.
Remember from the output on the previous slide that the row information related to treatment group in TOMHS, the variable group. That is placed first, followed by the keyword ALL to give the total. The part in quotes is the label for the total. A comma is then typed, followed by the column information you want: here the N and MEAN for each of the variables dbp12 and sbp12. The f=8.1 tells SAS to display the mean in a column 8 characters wide with one decimal place. The RTS option sets the number of spaces in the first column (where only labels are placed, not data). We add LABEL statements for each variable and apply a format for group so that the formatted values are displayed rather than the values 1-6.

3 Two Structures for Data
3 obs/patient
Advantages for each format
There are two basic ways longitudinal data can be structured. The first way is for all the data to be on one row. In the illustration here we see that subject A001 has three longitudinal values of blood pressure and three sequential dates; all the data is on one row.
Often databases are structured in the second form shown here, where subjects have multiple rows of data, one row per visit. This second structure is useful for producing various listings, for example, listing all occurrences where a patient has an elevated blood pressure. This format is sometimes also necessary for some types of multivariate statistical analyses.
However, if you want to compare values within a person, you need to have the data on the same row, so you can compute a variable that is, say, the difference between BPs on sequential dates. We will see how to go from one format to the other using SAS.
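As a minimal sketch of why the one-row structure helps with within-person comparisons, the step below computes the change between sequential blood pressures. The dataset name wide and the variable names bp1-bp3, change12, and change23 are hypothetical; the three values are the ones shown for subject A001 on the "Going the Other Way" slide.

```sas
DATA wide;
  INPUT ptid $ bp1 bp2 bp3;
  * Differences between sequential visits are simple when all values share a row;
  change12 = bp2 - bp1;
  change23 = bp3 - bp2;
  DATALINES;
A001 125 130 140
;
RUN;

PROC PRINT DATA=wide;
RUN;
```

With one row per visit, the same calculation would first require bringing the values onto a common observation, which is what the rest of this lesson covers.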

4 Restructure Data
Read in multiple observations per patient and write out one observation per patient
Read in one observation per patient and write out multiple observations per patient
So here are the two tasks: the first is going from multiple observations per patient to one, and the second is going from one observation to multiple observations. We will first look at how to go from multiple records to a single record.

5 Restructuring Datasets
Data for a patient sometimes comes as multiple observations
[Table: ptid, visit, date, weight; six rows, values not recoverable]
Wish to collapse data so all information for a patient is contained on 1 observation
In this example, each patient has three observations (rows), with a visit code, the date of the visit, and the patient's weight recorded. This structure is useful for some things, including entering the data into a database, but what if we want to compute the average weight across visits, or the change in weight between visits? To do that, all the weights for a patient have to be on the same observation. So we need a way to restructure the dataset so that all the data for a patient is on the same observation or row.

6 Program 16
Collapsing multiple observations per patient to one observation per patient
    DATA visit;
      INFILE 'C:\SAS_Files\multrec.data';
      INPUT ptid $ visit group weight sbp;
    RUN;
    PROC SORT DATA=visit;
      BY ptid visit;
    RUN;
    PROC PRINT DATA=visit;
      TITLE 'Display of Mult obs/patient dataset';
    RUN;
Let's see how SAS can be used to do that. In program 16 we read in data from a file called multrec.data. This data is structured as one row per patient visit. The data is sorted by patient ID, and then within ID by visit number. To see how the dataset looks we use PROC PRINT.

7 Display of Mult obs/patient dataset
[Listing: Obs, ptid, visit, group, weight, sbp; 10 observations, values not recoverable]
This is the display. The first patient has 4 rows, the second patient has 2, and the third patient has 4. The second patient missed two visits, so no record is there for those visits. We want to restructure the data to look like the display on the next slide.

8 Want dataset to look like this:
[Listing: Obs, ptid, group, wt3, wt6, wt9, wt12, sbp3, sbp6, sbp9, sbp12; 7 observations, values not recoverable]
Several ways to do this.
Here each weight and BP is placed into a separate variable. Note also that we have missing values for variables corresponding to missing visits. Patient A00301, who has no data for 9 and 12 months, has missing values for weight and systolic BP at these visits. That is what we would want. How can we do this? SAS programmers have come up with several ways, and I will show you a couple of them.

9 One Solution: Separate Datasets for Each Visit, Then Merge
    /*******************************************************
    Here is one solution. Separate datasets are created for
    each visit and then merged. The visit-varying variables
    are renamed using PROC DATASETS.
    ********************************************************/
    DATA visit3 visit6 visit9 visit12;
      SET visit;
      if visit = 3 then output visit3;
      else if visit = 6 then output visit6;
      else if visit = 9 then output visit9;
      else if visit = 12 then output visit12;
    run;
This solution is perhaps the easiest if the number of visits is not too large. The method is to create separate datasets for each visit and then merge all the datasets together. We have seen how to create multiple datasets in one DATA step in a previous example that created separate datasets for men and women. We do the same thing here. We create four datasets, one for each visit. The visit 3 data is sent to the dataset visit3, the visit 6 data is sent to the dataset visit6, and so on. These all have one row per ID.

10 Need to rename variables to have unique names
    * Need to rename variables to have unique names;
    PROC DATASETS;
      MODIFY visit3;
        RENAME weight = weight3 sbp = sbp3;
      MODIFY visit6;
        RENAME weight = weight6 sbp = sbp6;
      MODIFY visit9;
        RENAME weight = weight9 sbp = sbp9;
      MODIFY visit12;
        RENAME weight = weight12 sbp = sbp12;
    RUN;
    DATA patient;
      MERGE visit3 visit6 visit9 visit12;
      BY ptid;
    PROC PRINT;
      TITLE 'Desired Dataset Using Separate Dataset Technique';
PROC DATASETS: procedure to add/change names, labels, or formats
We would now like to merge the datasets by patient ID. However, the names of the variables for weight and BP are the same on each dataset. So if we merge the datasets we will get only one of the values (the last one in the merge list). We therefore first have to rename all the visit-varying variables. We can do that with the utility procedure PROC DATASETS. This procedure can be used for several things, including renaming variables. We use the MODIFY statement to indicate the dataset we want to modify. We then use the RENAME statement to rename the weight and BP variables, adding a numeric suffix to each name to indicate the visit. Note that we do not need to rename variables that are fixed, like the variable group. Since these have the same values on each dataset, we can just let the MERGE statement pick the last one.
Lastly, we merge the four datasets by patient ID in a new DATA step. The resulting dataset will have one observation per patient, with all the data for that patient on this observation.
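An alternative worth knowing, sketched here rather than taken from the course program, is the RENAME= dataset option, which renames variables as each dataset is read by MERGE. Under the assumption that visit3 through visit12 still carry the original names weight and sbp, this step would replace both the PROC DATASETS step and the merge step above. The DROP statement is an added assumption, since the visit variable no longer has a single meaning once the rows are combined.

```sas
DATA patient;
  * RENAME= is applied as each dataset is read, so the stored datasets are unchanged;
  MERGE visit3  (RENAME=(weight=weight3  sbp=sbp3))
        visit6  (RENAME=(weight=weight6  sbp=sbp6))
        visit9  (RENAME=(weight=weight9  sbp=sbp9))
        visit12 (RENAME=(weight=weight12 sbp=sbp12));
  BY ptid;
  DROP visit;
RUN;
```

The design difference is that PROC DATASETS permanently changes the names on the four visit datasets, while the dataset option leaves them as they were.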

11 1-Obs per patient dataset
    PROC PRINT DATA=patient (OBS=7);
      TITLE '1-Obs per patient dataset';
    RUN;
[Listing: Obs, ptid, group, wt3, wt6, wt9, wt12, sbp3, sbp6, sbp9, sbp12; 7 observations, values not recoverable]
Here is a display of the new dataset for the first 7 subjects. Note that subject A00301, who did not attend the 9 or 12 month visit, has missing values for data at these visits. Our technique accomplished that because, when SAS merges datasets, if a subject is not in one of the datasets then the variables from that dataset are set to missing.

12 A General Way Using Data Step
Read in 1st obs for a patient
Create new variables to hold values of weight and BP from 1st obs of patient
Repeat above 2 steps for 2nd obs of patient
When all data for a patient is complete, output variables to dataset
The multiple datasets and merge solution works pretty well. However, I want to illustrate a more general way using the DATA step. The program code to do this, although fairly short, requires understanding some new statements and how the DATA step works. Plan on spending some time on this. I will outline the structure of the program and then look at the program itself.
The program will flow as follows. We will read in the first observation for a patient and create new variables to hold the first weight and blood pressure. We will not output this observation yet but go back and read in the second observation and create new variables for this observation (with different names than for the first observation). We continue until all the data for a patient is read in. Only then will we output the observation to the new dataset. We continue this process for all patients. Let's look at the code to do this.

13 Program 16
    DATA patient;
      SET visit;
      BY ptid;
      RETAIN wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12;
      if FIRST.ptid = 1 then do;
        wt3=.; wt6=.; wt9=.; wt12=.;
        sbp3=.; sbp6=.; sbp9=.; sbp12=.;
      end;
      if visit = 3 then do; wt3 = weight; sbp3 = sbp; end;
      if visit = 6 then do; wt6 = weight; sbp6 = sbp; end;
      if visit = 9 then do; wt9 = weight; sbp9 = sbp; end;
      if visit = 12 then do; wt12 = weight; sbp12 = sbp; end;
      if LAST.ptid = 1 then OUTPUT;
      KEEP ptid group sbp3 sbp6 sbp9 sbp12 wt3 wt6 wt9 wt12;
    RUN;
Here is the complete program. Note that it is not long. There are some new statements that are important. The BY statement in combination with the SET statement tells us whether we are processing the first or last observation for a patient. The RETAIN statement holds the values of new variables between loops of the DATA step. The OUTPUT statement tells SAS when to output the observation. We will look at these statements in more detail.

14 Creates two special variables called FIRST.ptid
    SET visit;
    BY ptid;
Creates two special variables called
FIRST.ptid = 1 when reading first obs for a patient, = 0 otherwise
LAST.ptid = 1 when reading last obs for a patient, = 0 otherwise
With these variables you can know if you are processing the first or last observation for a patient
Here is a usual SET statement with a BY statement following. Remember that the SET statement brings in an observation from a SAS dataset. If we follow this with a BY statement then SAS, in addition to bringing in the observation, creates two special logical variables named (in our case) FIRST.ptid and LAST.ptid. FIRST.ptid is set to one when SAS is processing the first observation for a patient. Similarly, LAST.ptid is set to one when SAS is processing the last observation for a patient. Can you see how these variables might help? We will see how in a minute.
Some of you smart folks have noted that these new variables contain a period within their name, which is supposed to be illegal. I guess the answer is that if SAS creates the variables, it can use whatever rules it wants. The period makes it easy to spot these special variables.
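A small sketch can make the two special variables visible. They are not written to the output dataset, so this example prints them to the SAS log with a PUT statement. The dataset demo and the IDs P1 and P2 are hypothetical, and the input is already in ptid order, which BY processing requires.

```sas
DATA demo;
  INPUT ptid $ visit;
  DATALINES;
P1 3
P1 6
P1 9
P2 3
;
RUN;

DATA _NULL_;
  SET demo;
  BY ptid;
  * Write the automatic BY variables to the log for every observation read;
  PUT ptid= visit= FIRST.ptid= LAST.ptid=;
RUN;
```

For a patient with a single observation, such as P2 here, that one observation is both the first and the last, so both variables equal 1.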

15 RETAIN wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12;
    * Tells SAS not to reset these variables to missing when going to top of datastep;
    RETAIN wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12;
    * Set variables to missing when reading new patient - clear previous patients data!;
    if FIRST.ptid = 1 then do;
      wt3=.; wt6=.; wt9=.; wt12=.;
      sbp3=.; sbp6=.; sbp9=.; sbp12=.;
    end;
The RETAIN statement tells SAS not to reset the listed variables to missing when SAS goes back to the top of the DATA step. We need this statement; otherwise we would lose the information from the previous observation, because by default SAS sets new variables to missing between loops of the DATA step. However, because we retain the values, we need to set these new variables to missing when a new patient is processed. This clears the previous patient's data values. If we did not do this, we might use the previous patient's value for the new patient.

16 Assign variables depending on visit
    * Assign variables depending on visit;
    if visit = 3 then do;
      wt3 = weight; sbp3 = sbp;
    end;
    if visit = 6 then do;
      wt6 = weight; sbp6 = sbp;
    end;
    if visit = 9 then do;
      wt9 = weight; sbp9 = sbp;
    end;
    if visit = 12 then do;
      wt12 = weight; sbp12 = sbp;
    end;
    *Output variables only when done with patient;
    if LAST.ptid = 1 then OUTPUT;
Continuing the program, we next define new variables to hold the values for weight and blood pressure, depending on the visit. We use a series of IF/THEN statements to accomplish this. The DO keyword is paired with the END keyword. The statement IF visit = 3 THEN DO tells SAS to process the statements that follow, up to the END statement. Thus, when visit equals 3 the two statements between the DO and END are executed. Here wt3 and sbp3 are assigned the values of the variables weight and sbp. Note that only one of the DO/END blocks is executed for each observation; which one depends on the value of visit.
The last statement tells SAS to output the observation (with all new variables) only when we are done processing the last observation for a patient. Here we use the special SAS variable LAST.ptid. If it is not the last observation, the program goes back to the top of the DATA step and processes the next observation for the patient. Study this program and see if you can understand how it works. Use the data from the first couple of patients shown before, and mentally go through the DATA step, statement by statement, with each observation.
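To trace the logic by running it, here is a self-contained sketch of the same technique on a tiny dataset. The IDs and all data values are made up for illustration; P2 has only the 3 and 6 month visits. Two stylistic substitutions are mine rather than the course's: CALL MISSING replaces the eight assignments to missing, and a SELECT group replaces the series of IF/THEN blocks.

```sas
DATA visit;
  INPUT ptid $ visit group weight sbp;
  DATALINES;
P1 3 1 180 140
P1 6 1 178 136
P1 9 1 177 134
P1 12 1 175 130
P2 3 2 150 128
P2 6 2 151 126
;
RUN;

DATA patient;
  SET visit;
  BY ptid;
  RETAIN wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12;
  * Clear the retained values at the start of each new patient;
  IF FIRST.ptid THEN CALL MISSING(wt3, wt6, wt9, wt12, sbp3, sbp6, sbp9, sbp12);
  * Copy the visit values into the variables for that visit;
  SELECT (visit);
    WHEN (3)  DO; wt3  = weight; sbp3  = sbp; END;
    WHEN (6)  DO; wt6  = weight; sbp6  = sbp; END;
    WHEN (9)  DO; wt9  = weight; sbp9  = sbp; END;
    WHEN (12) DO; wt12 = weight; sbp12 = sbp; END;
    OTHERWISE;
  END;
  * Write one observation per patient after the last visit is read;
  IF LAST.ptid THEN OUTPUT;
  KEEP ptid group wt3 wt6 wt9 wt12 sbp3 sbp6 sbp9 sbp12;
RUN;

PROC PRINT DATA=patient;
RUN;
```

Because the retained variables are cleared when P2 begins, P2's 9 and 12 month variables stay missing instead of inheriting P1's values.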

17 1-Obs per patient dataset
    PROC PRINT DATA=patient (OBS=7);
      TITLE '1-Obs per patient dataset';
    RUN;
[Listing: Obs, ptid, group, wt3, wt6, wt9, wt12, sbp3, sbp6, sbp9, sbp12; 7 observations, values not recoverable]
When you do something this complicated you will certainly want to check that the dataset created is correct. Here we display the first 7 observations of the new dataset. We can verify that it is correct: we get the same results as with the previous method. If you understand how this program works, you can consider yourself a more advanced SAS programmer. The DATA step is very powerful, and understanding what it can do will come in handy.

18 PROC TRANSPOSE can sometimes help you restructure datasets
    PROC TRANSPOSE DATA=visit OUT=wtdata PREFIX=wt;
      BY ptid;
      ID visit;
      VAR weight;
    PROC PRINT DATA=wtdata;
      TITLE 'Weights For Patient On Same Record';
[Listing: Weights For Patient On Same Record; Obs, ptid, _NAME_, wt3, wt6, wt9, wt12; 7 observations, each with _NAME_ = weight; values not recoverable]
There is a procedure called PROC TRANSPOSE that can be used to help restructure SAS datasets. This procedure interchanges the rows and columns of a dataset. It works well if you have only one variable to transpose (weight here). However, with multiple variables it requires multiple PROC TRANSPOSE steps and/or additional DATA steps, making it perhaps more complicated than the DATA step method just described.
In this example the PROC PRINT display should clue you in to how this procedure works. The BY statement makes a separate row for each ID. The variable transposed is weight. The ID statement tells SAS to use the value of visit as a suffix in the names of the weight variables. The PREFIX=wt option tells SAS to start the variable names with wt. In general, if SAS has a utility procedure such as PROC TRANSPOSE that can do the hard work for you, use it. However, DATA step processing is the more general method.
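To illustrate the point about multiple variables, here is one possible sketch that transposes weight and sbp in two separate steps and then merges the results. It assumes the visit dataset from program 16, already sorted by ptid; the dataset names sbpdata and patient2 are hypothetical.

```sas
* One transpose per visit-varying variable, dropping the automatic _NAME_ column;
PROC TRANSPOSE DATA=visit OUT=wtdata (DROP=_NAME_) PREFIX=wt;
  BY ptid;
  ID visit;
  VAR weight;
RUN;

PROC TRANSPOSE DATA=visit OUT=sbpdata (DROP=_NAME_) PREFIX=sbp;
  BY ptid;
  ID visit;
  VAR sbp;
RUN;

* Combine the two one-row-per-patient datasets;
DATA patient2;
  MERGE wtdata sbpdata;
  BY ptid;
RUN;
```

Note that this sketch does not carry the fixed variable group into patient2; it would need to be merged in separately, which is part of why the DATA step method can end up simpler.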

19 Going the Other Way
From:
A001 125 130 140 01/15/95 07/15/95 01/15/96
To:
A001 125 01/15/95
A001 130 07/15/95
A001 140 01/15/96
3 obs/patient
Sometimes we may want to go the other way: we have a dataset with one record per patient and we want to change it so that the dataset has multiple observations per patient. This format is useful for producing various listings, for example, listing all occurrences where a patient has an elevated blood pressure. This format is sometimes also necessary for some types of multivariate statistical analyses. You will see that the SAS code needed here is much easier than going the other way.

20 Program 17
    DATA multrec;
      INFILE 'C:\SAS_Files\' OBS=8;
      INPUT @ ptid $10.
            @ rdate mmddyy10.
            @ group
            @ date6 mmddyy10.
            @ date12 mmddyy10.
            @ sbpbl
            @ sbp6
            @ sbp12;
Recall that the TOMHS dataset has one observation per patient, with multiple dates of visits, blood pressures, etc. all on that one observation. In program 17 we start by reading in the systolic BP variables from three visits along with the corresponding dates. We also read in the patient ID and the treatment group. For illustration we read in data for only the first 8 patients. Our goal is to write out 3 observations for each patient: the first observation will contain the first visit (baseline) data, the second will contain the second visit data (month 6), and the third will contain the third visit data (month 12). Let's see how we can do that.

21 Creating 3 observations per patient
      * Creating 3 observations per patient;
      sbp = sbpbl;
      visit = '00';
      datev = rdate;
      OUTPUT;  *output 1st obs for patient;
      sbp = sbp6;  *change values;
      visit = '06';
      datev = date6;
      OUTPUT;  *output 2nd obs with new values;
      sbp = sbp12;
      visit = '12';
      datev = date12;
      OUTPUT;  *output 3rd obs and values;
      KEEP ptid group sbp visit datev;
    RUN;
Here is the next section of the program. Note two things: 1) there are three sets of similar code containing four statements each (the first set is in the box), and 2) there is an OUTPUT statement at the end of each block. We learned before that the OUTPUT statement writes the current values of all variables to the dataset we are creating. Since there are three of them, three observations are written for each one read.
Now let's look at the code in the first block. We first create a new variable called sbp that is assigned the baseline systolic BP (variable sbpbl). We then create a character variable called visit and set its value to '00' to indicate the baseline (or the 0 visit). We then create a new variable called datev that is assigned the baseline visit date (variable rdate). We follow this with the OUTPUT statement, which writes an observation to the dataset. For this observation the 3 new variables contain the baseline information.
In the next block of code we change the values of sbp, visit, and datev, assigning the 6-month data to these variables. We needn't worry that we changed the values; the baseline values are safely on the SAS dataset. We then output these values with the second OUTPUT statement. We repeat this process a third time for the 12-month data. If there were more visits there would be more blocks of similar statements.
Finally, the KEEP statement tells SAS to include only the listed variables in the SAS dataset. We include only the new variables plus the ID and group. We don't need the old SBP variables; their values are all contained in the variable sbp, but on different records. Note that we created the visit variable to identify when each row of data was collected.

22 Listing of Dataset With Multiple Observations Per Patient
    PROC SORT DATA=multrec;
      BY ptid visit;
    PROC PRINT DATA=multrec;
      VAR ptid group sbp visit datev;
    RUN;
[Listing: Obs, ptid, group, sbp, visit, datev; 12 observations, values not recoverable]
Let's see what the new dataset looks like. To do this we run a PROC PRINT. Note that each patient has 3 observations. The 3 systolic BPs and dates are now strung down the rows. The variable visit was created to keep track of which visit the data came from. Note that patient B00644 has missing data for visit 12. This means that the variables sbp12 and date12 were missing on the original dataset for this patient. If we did not want that "empty" record on the dataset, we could have used a conditional OUTPUT statement. So now we have the dataset the way we want it.
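The conditional OUTPUT idea can be sketched as follows. This is not program 17 itself: the input dataset onerec, the output dataset multrec2, the IDs, and the data values are all hypothetical, the date variables are left out for brevity, and the three repeated blocks are written as a loop over an array.

```sas
DATA onerec;
  INPUT ptid $ group sbpbl sbp6 sbp12;
  DATALINES;
P1 1 150 142 138
P2 2 144 139 .
;
RUN;

DATA multrec2;
  SET onerec;
  ARRAY bp(3) sbpbl sbp6 sbp12;
  * Visit codes held in a temporary array, in the same order as the bp array;
  ARRAY vcode(3) $2 _TEMPORARY_ ('00' '06' '12');
  DO i = 1 TO 3;
    sbp = bp(i);
    visit = vcode(i);
    * Conditional output that skips visits with no blood pressure recorded;
    IF sbp NE . THEN OUTPUT;
  END;
  KEEP ptid group sbp visit;
RUN;
```

Here P1 contributes three observations and P2 contributes two, because P2's 12 month value is missing and that record is never written.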

23 Observations With SBP > 140
    PROC PRINT DATA=multrec;
      VAR ptid group sbp visit datev;
      WHERE sbp > 140;
      TITLE 'Observations With SBP > 140';
    RUN;
[Listing: Obs, ptid, group, sbp, visit, datev; 8 observations shown (Obs 1, 4, 7, 12, 16, 19, 23, 24), values not recoverable]
Now let's see some applications of having the dataset structured in this way. To display patients who have a systolic BP above 140 at some visit, we simply run a PROC PRINT with a WHERE statement. SAS filters the rows and displays values only when the WHERE condition is true, in our case when sbp > 140. We see here that patient D01348 had a systolic BP above 140 at both the 6 and 12 month visits (but not at baseline). Note that this listing would be difficult to produce if you were working with the one-record-per-patient dataset.

24 Listing of Highest SBP for Each Patient
    PROC SORT DATA=multrec;
      BY ptid DESCENDING sbp;
    DATA highbp;
      SET multrec;
      BY ptid;
      if FIRST.ptid;  *Select the first obs for each patient - highest BP;
    PROC PRINT DATA=highbp;
      VAR ptid group sbp visit datev;
      TITLE 'Listing of Highest SBP for Each Patient';
    RUN;
[Listing: Obs, ptid, group, sbp, visit, datev; 4 observations shown, values not recoverable]
Next, suppose we want to display the highest systolic BP for each patient. Using a few tricks we have learned, we can do this pretty easily. First we sort the dataset by patient ID and, within ID, from highest to lowest systolic BP (using the DESCENDING option). This is an example of a two-level sort. Next, we run a little DATA step that picks off the first observation for each patient using the FIRST. and LAST. variables created by the BY statement after the SET statement. The BY statement here creates new variables called FIRST.ptid and LAST.ptid. FIRST.ptid has a value of 1 if the observation being processed is the first for the patient. The statement IF FIRST.ptid tells SAS to continue in the DATA step and output the observation only if FIRST.ptid = 1. Since we sorted the data from highest to lowest BP within patient, the first observation is the one with the highest BP. We then display the highest systolic BP for each patient with PROC PRINT.
What would we have got if we replaced FIRST.ptid with LAST.ptid? You guessed it: you would get the observation with the lowest systolic BP for each patient. You can see that with proper sorting we can use this technique for many applications, for example, listing the last time a patient came in for a visit.
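The last application mentioned, listing the last time each patient came in, might be sketched like this. It assumes the multrec dataset created in program 17; the dataset names bydate and lastvisit are hypothetical.

```sas
* Sort a copy by date within patient so the latest visit comes last;
PROC SORT DATA=multrec OUT=bydate;
  BY ptid datev;
RUN;

DATA lastvisit;
  SET bydate;
  BY ptid;
  * Keep only the final observation for each patient;
  IF LAST.ptid;
RUN;

PROC PRINT DATA=lastvisit;
  VAR ptid group sbp visit datev;
RUN;
```

Missing dates sort before nonmissing ones in SAS, so a record with a missing datev is selected only when a patient has no dated visits at all.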

25 Program 18
    *Want to know the most common side effects;
    LIBNAME class 'C:\SAS_Files';
    DATA se;
      SET class.tomhs (KEEP = ptid group clinic sex se12_1-se12_20);
      sidenum = 1; severity = se12_1; if severity > 1 then output;
      sidenum = 2; severity = se12_2;
In program 18, instead of the most common cars, we want to list the most common side-effects reported by TOMHS patients. Unlike the car data, where you listed up to three cars, there is a variable for each of the 20 side-effects. Patients were asked about each of 20 conditions, so a variable for each is included even if the patient was not bothered by the condition. A response of 1 means no side-effect for that condition, and responses of 2-4 indicate a side-effect.
In program 18 we read in the 20 side-effect variables (named se12_1 through se12_20) along with patient ID, group, clinical center, and sex. We use the same technique as in the car example, but we want to keep track of the side-effect number because each number corresponds to a different side-effect. We start by setting the new variable sidenum to 1, corresponding to drowsiness. We set a new variable severity to the value of side-effect 1. This will be the common variable that holds the values of all the side-effects. We then output an observation only if the severity is greater than 1, which means the side-effect was present. We repeat this same code 20 times, once for each side-effect. To shorten the code we will use arrays, as illustrated next.

26 Program 18
    *Want to know the most common side effects;
    LIBNAME class 'C:\SAS_Files';
    DATA se;
      SET class.tomhsp (KEEP = ptid group clinic sex se12_1-se12_20);
      ARRAY se(20) se12_1 - se12_20;
      DO sidenum = 1 to 20;
        severity = se(sidenum);
        if severity > 1 then OUTPUT;  *output only if have se;
      END;
      KEEP ptid sidenum group severity;
    RUN;
We place the 20 side-effect variables into an array called se. We then loop through the 20 variables with the DO/END statements. This produces the same dataset as the 20 blocks of code shown on the previous slide. A record is output only for positive side-effects, that is, values greater than 1. We use the KEEP statement to write to the dataset only the new variables plus ptid and group.

27 List of Side Effects by Patients - First 20
    PROC PRINT DATA=se (OBS=20);
      TITLE 'List of Side Effects by Patient';
    RUN;
[Listing: Obs, ptid, group, sidenum, severity; first 20 observations, values not recoverable]
Note: Number of obs on dataset is total number of se
How many observations per patient will this dataset have? 20 per patient? No; since we output only when the side-effect is present, there will be one observation per side-effect. If a patient had no side-effects among the 20 possibilities, then no observations for this patient would be on the dataset. The total number of observations on the dataset is the total number of side-effects. Here we display the dataset using PROC PRINT. Patient A0083 had just one side-effect, number 13 (joint pain), with a mild severity (severity = 2). Patient A00301 had three side-effects: number 2 (tiredness), number 13 (joint pain), and number 18 (wake up early).

28 PROC FORMAT
    PROC FORMAT;
      VALUE setype
        1='Drowsiness'        2='Tiredness'
        3='Faintness'         4='Itchy Skin'
        5='Skin Rash'         6='Headaches'
        7='Ringing in Ears'   8='Stuffy Nose'
        9='Dry Mouth'        10='Cough'
       11='Fast Heart Rate'  12='Chest Pain'
       13='Joint Pain'       14='Swelling Feet'
       15='Muscle Cramps'    16='Numbness'
       17='Trouble Sleeping' 18='Wake Up Early'
       19='Mood Changes'     20='Depressed';
We can define a format for the side-effect number so that we don't have to constantly refer to the form. Here we define a format called setype, which assigns a text description to each of the 20 side-effects.

29 PROC FREQ on sidenum
    PROC FREQ DATA=se;
      TABLES sidenum;
      FORMAT sidenum setype.;
      TITLE 'Number of Patients With Indicated Side Effect In Order on Form';
    RUN;
    PROC FREQ DATA=se ORDER=FREQ;
      TABLES sidenum;
      FORMAT sidenum setype.;
      TITLE 'Number of Patients With Indicated Side Effect - In Order of Frequency';
    RUN;
Remember, we wanted to know the most common side-effect. We can get a complete frequency distribution of side-effects from a PROC FREQ on the variable sidenum. We assign the format we created so the descriptions will display. By default the rows are displayed in order of side-effect number. If we want to list the side-effects in order of frequency, we add the ORDER=FREQ option to the PROC FREQ statement as shown here.

30 Number of Patients With Indicated Side Effect In Order on Form
[PROC FREQ table: sidenum, Frequency, Percent, Cum Frequency, Cum Percent; 20 rows in form order, from Drowsiness through Depressed; counts not recoverable]
Here is the complete distribution displayed by the first PROC FREQ. We see that there are 249 total side-effects reported. If we look down the list we see that joint pain is the most common.

31 Number of Patients With Indicated Side Effect - In Order of Frequency
The FREQ Procedure
[PROC FREQ table: sidenum, Frequency, Percent, Cumulative Frequency, Cumulative Percent; counts not recoverable. Rows in order of frequency: Joint Pain, Stuffy Nose, Headaches, Tiredness, Depressed, Wake Up Early, Cough, Itchy Skin, Drowsiness, Muscle Cramps, Ringing in Ears, Dry Mouth, Numbness, Trouble Sleeping, Mood Changes, Skin Rash, Chest Pain, Fast Heart Rate, Faintness, Swelling Feet]
With the ORDER=FREQ option we get the frequency distribution in order of frequency. Here joint pain is listed on top: 14% of all side-effects reported were joint pain. This is a nice summary of the side-effect data.

32 List of Patients With a Severe Side Effect
    PROC PRINT DATA=se NOOBS;
      VAR ptid group sidenum;
      FORMAT sidenum setype. group group.;
      WHERE severity = 4;
      TITLE 'List of Patients With a Severe Side Effect';
    RUN;
What if we wanted a list of all patients with any severe side-effect? This is easily done with the WHERE statement, as seen here. We also add a format for the variable group.

33 List of Patients With a Severe Side Effect
ptid  group                    sidenum
A     Calcium Channel Blocker  Stuffy Nose
A     Placebo                  Headaches
A     Placebo                  Joint Pain
B     Placebo                  Chest Pain
B     Calcium Channel Blocker  Drowsiness
B     Calcium Channel Blocker  Tiredness
B     Calcium Channel Blocker  Joint Pain
C     Calcium Channel Blocker  Joint Pain
C     ACE Inhibitor            Muscle Cramps
C     ACE Inhibitor            Itchy Skin
C     ACE Inhibitor            Skin Rash
D     Calcium Channel Blocker  Trouble Sleeping
D     Calcium Channel Blocker  Wake Up Early
Here is the listing. Patient A00608 in the calcium channel blocker group experienced a severe stuffy nose. Patient D01348 had two severe side-effects: trouble sleeping and waking up early.
Changing the structure of datasets is a common task when using SAS. I hope you caught on to these techniques; they may prove useful when doing your analyses.

