Biostat 513 Discussion Week 1 (4/1/13 – 4/5/13)
Aims
- Review key Stata commands
  - Categorical data: OR, RR, RD
  - Data manipulation
Stata
- tab, tabi – basic tabling
- cs, csi – analysis of prospective/cross-sectional studies, single binary covariate
- cc, cci – analysis of case-control studies, single binary covariate
- mcc, mcci – analysis of matched data
- epitab – “help epitab” provides a summary of the most relevant commands
- expand – expand a summary dataset
- reshape – convert between long and wide formats
UGDP
The ugdp.dta dataset describes the results from a drug trial among diabetics. The exposure (exposed) is tolbutamide, and the outcome (case) is death within a fixed time period. The dataset is provided in tabular form, with pop indicating the number of subjects in each cell.
. use http://courses.washington.edu/b513/datasets/ugdp.dta

. list

     +----------------------------+
     | age   case   exposed   pop |
     |----------------------------|
  1. | <55      0         0   115 |
  2. | <55      0         1    98 |
  3. | <55      1         0     5 |
  4. | <55      1         1     8 |
  5. | 55+      0         0    69 |
  6. | 55+      0         1    76 |
  7. | 55+      1         0    16 |
  8. | 55+      1         1    22 |
     +----------------------------+
. tab case exposed

           |        exposed
      case |         0          1 |     Total
-----------+----------------------+----------
         0 |         2          2 |         4
         1 |         2          2 |         4
-----------+----------------------+----------
     Total |         4          4 |         8

Without weights, tab counts the 8 rows of the summary dataset, not the 409 subjects.
. tab case exposed [freq=pop]

           |        exposed
      case |         0          1 |     Total
-----------+----------------------+----------
         0 |       184        174 |       358
         1 |        21         30 |        51
-----------+----------------------+----------
     Total |       205        204 |       409
Common options for tab: by, chi2, exact, row, col, missing

. expand pop
(401 observations created)

. tab case exposed

           |        exposed
      case |         0          1 |     Total
-----------+----------------------+----------
         0 |       184        174 |       358
         1 |        21         30 |        51
-----------+----------------------+----------
     Total |       205        204 |       409

. tabi 30 21 \ 174 184

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |        30         21 |        51
         2 |       174        184 |       358
-----------+----------------------+----------
     Total |       204        205 |       409

           Fisher's exact =                 0.181
   1-sided Fisher's exact =                 0.112
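Both routes give the same 2×2 table. tab ... [freq=pop] sums pop within each cell. expand pop first creates one record per subject and then counts records. The Python sketch below illustrates that equivalence. The eight rows of the ugdp listing are typed in by hand, and the names UGDP, expand, tab_unweighted and tab_freq_weighted are illustrative only; they are not Stata or library functions.

```python
# Rows of the ugdp summary dataset: (age, case, exposed, pop)
UGDP = [
    ("<55", 0, 0, 115), ("<55", 0, 1, 98),
    ("<55", 1, 0, 5), ("<55", 1, 1, 8),
    ("55+", 0, 0, 69), ("55+", 0, 1, 76),
    ("55+", 1, 0, 16), ("55+", 1, 1, 22),
]


def expand(rows):
    """Like `expand pop`: repeat each summary row pop times, one record per subject."""
    records = []
    for age, case, exposed, pop in rows:
        records.extend([(age, case, exposed)] * pop)
    return records


def tab_unweighted(records):
    """Like `tab case exposed` on individual records: count records per cell."""
    counts = {}
    for _age, case, exposed in records:
        counts[(case, exposed)] = counts.get((case, exposed), 0) + 1
    return counts


def tab_freq_weighted(rows):
    """Like `tab case exposed [freq=pop]`: sum pop within each (case, exposed) cell."""
    counts = {}
    for _age, case, exposed, pop in rows:
        counts[(case, exposed)] = counts.get((case, exposed), 0) + pop
    return counts


subjects = expand(UGDP)
print(len(subjects) - len(UGDP), "observations created")    # 401 observations created
print(tab_freq_weighted(UGDP))    # {(0, 0): 184, (0, 1): 174, (1, 0): 21, (1, 1): 30}
print(tab_unweighted(subjects) == tab_freq_weighted(UGDP))  # True
```

Running tab_unweighted on the 8 unexpanded rows gives 2 in every cell, which matches the unweighted tab shown earlier.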
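. bysort age: tab case exposed

-----------------------------------------------------------------------------------
-> age = <55

           |        exposed
      case |         0          1 |     Total
-----------+----------------------+----------
         0 |       115         98 |       213
         1 |         5          8 |        13
-----------+----------------------+----------
     Total |       120        106 |       226

-----------------------------------------------------------------------------------
-> age = 55+

           |        exposed
      case |         0          1 |     Total
-----------+----------------------+----------
         0 |        69         76 |       145
         1 |        16         22 |        38
-----------+----------------------+----------
     Total |        85         98 |       183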
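Each age stratum is its own 2×2 table, so you can summarize it the same way as the pooled table. The Python sketch below computes the risk ratio and odds ratio within each age group, which tab itself does not report. The counts are copied from the ugdp listing, and the helper names are hypothetical.

```python
# Rows of the ugdp summary dataset: (age, case, exposed, pop)
UGDP = [
    ("<55", 0, 0, 115), ("<55", 0, 1, 98),
    ("<55", 1, 0, 5), ("<55", 1, 1, 8),
    ("55+", 0, 0, 69), ("55+", 0, 1, 76),
    ("55+", 1, 0, 16), ("55+", 1, 1, 22),
]


def stratum_counts(rows, age):
    """(a, b, c, d) for one age group, in csi order:
    exposed cases, unexposed cases, exposed noncases, unexposed noncases."""
    cell = {(case, exposed): pop
            for row_age, case, exposed, pop in rows if row_age == age}
    return cell[(1, 1)], cell[(1, 0)], cell[(0, 1)], cell[(0, 0)]


def risk_ratio(a, b, c, d):
    """Risk of death in exposed divided by risk of death in unexposed."""
    return (a / (a + c)) / (b / (b + d))


def odds_ratio(a, b, c, d):
    """Odds of death in exposed divided by odds of death in unexposed."""
    return (a * d) / (b * c)


for age in ("<55", "55+"):
    counts = stratum_counts(UGDP, age)
    print(age, round(risk_ratio(*counts), 3), round(odds_ratio(*counts), 3))
# <55 1.811 1.878
# 55+ 1.193 1.248
```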
. tab case exposed, chi2 exact col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |        exposed
      case |         0          1 |     Total
-----------+----------------------+----------
         0 |       184        174 |       358
           |     89.76      85.29 |     87.53
         1 |        21         30 |        51
           |     10.24      14.71 |     12.47
-----------+----------------------+----------
     Total |       205        204 |       409
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   1.8651   Pr = 0.172
           Fisher's exact =                 0.181
   1-sided Fisher's exact =                 0.112

Why choose column percents? In the case = 1 row they estimate P(died | exposure group): 10.24% among the unexposed vs 14.71% among the exposed.
Which Fisher’s test corresponds to the chi-squared test?
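All the statistics on this slide can be reproduced by hand for a 2×2 table. The Python sketch below assumes the layout printed by tab (rows case 0/1, columns exposed 0/1). It computes the column percents, the Pearson chi-squared statistic with its 1-df p-value, and Fisher's exact test from the hypergeometric distribution. The two-sided Fisher p-value here sums the probabilities of all tables no more likely than the observed one, which is the usual definition. Treat its agreement with Stata's 0.181 as a check, not as a description of Stata's internals.

```python
from math import comb, erfc, sqrt

# `tab case exposed`: rows case = 0, 1; columns exposed = 0, 1
TABLE = [[184, 174],
         [21, 30]]


def column_percents(t):
    """Percent of each column in each row; row 1 estimates P(died | exposure group)."""
    totals = [t[0][j] + t[1][j] for j in range(2)]
    return [[100 * t[i][j] / totals[j] for j in range(2)] for i in range(2)]


def pearson_chi2(t):
    """Pearson chi-squared (no continuity correction) and its 1-df p-value."""
    (a, b), (c, d) = t
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))   # P(chi2_1 > x) = P(|Z| > sqrt(x))


def fisher_exact(t):
    """Fisher's exact test, returning (two-sided p, one-sided p).

    With all margins fixed, x = t[1][1] (exposed deaths) is hypergeometric.
    Two-sided: sum over tables no more likely than the observed one.
    One-sided: tail in the direction of the observed association.
    """
    (a, b), (c, d) = t
    deaths, exposed, n = c + d, b + d, a + b + c + d
    lo, hi = max(0, deaths + exposed - n), min(deaths, exposed)
    total = comb(n, exposed)
    prob = {x: comb(deaths, x) * comb(n - deaths, exposed - x) / total
            for x in range(lo, hi + 1)}
    p_obs = prob[d]
    two_sided = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-7))
    if d >= deaths * exposed / n:
        one_sided = sum(p for x, p in prob.items() if x >= d)
    else:
        one_sided = sum(p for x, p in prob.items() if x <= d)
    return two_sided, one_sided


print(column_percents(TABLE))
print(pearson_chi2(TABLE))
print(fisher_exact(TABLE))
```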
Better to use cs or cc for these data? Why? cs: this is a prospective study.
Okay or not to use the other? Why? The OR is fine to report: it is a valid measure in a prospective study too.
. cs case exposed, or

                 | exposed                |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        30          21  |         51
        Noncases |       174         184  |        358
-----------------+------------------------+------------
           Total |       204         205  |        409
                 |                        |
            Risk |  .1470588     .102439  |   .1246944
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0446198       |   -.0192936    .1085332
      Risk ratio |         1.435574       |    .8510221    2.421645
 Attr. frac. ex. |         .3034146       |   -.1750577    .5870576
 Attr. frac. pop |         .1784792       |
      Odds ratio |         1.510673       |    .8381198    2.722012 (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =     1.87  Pr>chi2 = 0.1720

Note “exposed, case” in the upper left: the first cell is exposed cases (30). Interpret the OR and RR.
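The point estimates and the RD and RR intervals in this output follow from simple formulas. The minimal Python sketch below assumes a Wald interval for the RD and a log-scale interval for the RR. For these counts, those choices agree closely with the printed limits. The sketch does not reproduce the Cornfield interval that Stata reports for the OR, and the function name cs_measures is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)   # about 1.96, for 95% intervals


def cs_measures(a, b, c, d):
    """Cohort 2x2 measures; arguments in csi order:
    a = exposed cases, b = unexposed cases, c = exposed noncases, d = unexposed noncases."""
    n1, n0 = a + c, b + d
    p1, p0 = a / n1, b / n0                  # risk in exposed, unexposed
    p = (a + b) / (n1 + n0)                  # overall risk
    rd = p1 - p0
    se_rd = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)   # Wald SE (assumed)
    rr = p1 / p0
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)      # SE of log RR (assumed)
    return {
        "risk_exposed": p1,
        "risk_unexposed": p0,
        "rd": (rd, rd - Z * se_rd, rd + Z * se_rd),
        "rr": (rr, exp(log(rr) - Z * se_log_rr), exp(log(rr) + Z * se_log_rr)),
        "af_exposed": (rr - 1) / rr,          # attributable fraction, exposed
        "af_population": (p - p0) / p,        # attributable fraction, population
        "or": (a * d) / (b * c),              # point estimate only
    }


for name, value in cs_measures(30, 21, 174, 184).items():
    print(name, value)
```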
csi <exposed cases> <unexposed cases> <exposed noncases> <unexposed noncases>

. csi 30 21 174 184, or

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        30          21  |         51
        Noncases |       174         184  |        358
-----------------+------------------------+------------
           Total |       204         205  |        409
                 |                        |
            Risk |  .1470588     .102439  |   .1246944
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0446198       |   -.0192936    .1085332
      Risk ratio |         1.435574       |    .8510221    2.421645
 Attr. frac. ex. |         .3034146       |   -.1750577    .5870576
 Attr. frac. pop |         .1784792       |
      Odds ratio |         1.510673       |    .8381198    2.722012 (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =     1.87  Pr>chi2 = 0.1720

Predict the RD, RR and OR for each of these:

                                                          RD      RR       OR
. csi 21 30 184 174   /* switch exposed and unexposed */  -.045   1/1.44   1/1.5
. csi 174 184 30 21   /* switch death and no death */     -.045   ???      1/1.5
. csi 184 174 21 30   /* switch both */                    .045   ???      1.5
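One way to check these predictions is to compute all four tables directly. In the Python sketch below, rd_rr_or is a hypothetical helper whose arguments follow the csi order. The output shows that swapping the exposure groups negates the RD and inverts both ratios. Swapping the outcome negates the RD and inverts the OR, but it does not invert the RR.

```python
def rd_rr_or(a, b, c, d):
    """RD, RR, OR from csi-ordered counts:
    a = exposed cases, b = unexposed cases, c = exposed noncases, d = unexposed noncases."""
    p1, p0 = a / (a + c), b / (b + d)   # "case" risk in exposed, unexposed
    return p1 - p0, p1 / p0, (a * d) / (b * c)


calls = {
    "original": (30, 21, 174, 184),
    "switch exposed and unexposed": (21, 30, 184, 174),
    "switch death and no death": (174, 184, 30, 21),
    "switch both": (184, 174, 21, 30),
}
for label, counts in calls.items():
    rd, rr, or_ = rd_rr_or(*counts)
    print(f"{label:30s} RD={rd:+.4f} RR={rr:.4f} OR={or_:.4f}")
```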
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Alive |       174         184  |        358
            Died |        30          21  |         51
-----------------+------------------------+------------
           Total |       204         205  |        409
                 |                        |
            Risk |  .8529412     .897561  |   .8753056
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0446198       |   -.1085332    .0192936
      Risk ratio |         .9502877       |    .8830483    1.022647
 Prev. frac. ex. |         .0497123       |    -.022647    .1169517
 Prev. frac. pop |         .0247954       |
      Odds ratio |         .6619565       |    .3673752    1.193147 (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =     1.87  Pr>chi2 = 0.1720

Which of these statements is correct?
- The odds of death among those exposed to tolbutamide are 1.51 (1/.662) times the odds of death among those not exposed to tolbutamide.
- The risk of death among those exposed to tolbutamide is 1.05 (1/.950) times the risk of death among those not exposed to tolbutamide.
- The risk of death among those exposed to tolbutamide is ~1.51 (1/.662) times the risk of death among those not exposed to tolbutamide.
HIVNET VPS
750 individuals participating in an HIV vaccine preparedness study were administered a questionnaire at enrollment and after 6 months. Between the two questionnaires, all subjects participated in an educational program about HIV and vaccines. We focus on a single question asking about the safety of an HIV vaccine (coded 1 = correct answer, 0 = incorrect).
. use http://courses.washington.edu/b513/datasets/hivnet.dta

. tab q4safe0

    q4safe0 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        331       44.13       44.13
          1 |        419       55.87      100.00
------------+-----------------------------------
      Total |        750      100.00

. tab q4safe6

    q4safe6 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        254       33.87       33.87
          1 |        496       66.13      100.00
------------+-----------------------------------
      Total |        750      100.00

. cci 496 419 254 331
                                                         Proportion
                 |   month 6     month 0  |      Total     Exposed
-----------------+------------------------+------------------------
         correct |       496         419  |        915       0.5421
       incorrect |       254         331  |        585       0.4342
-----------------+------------------------+------------------------
           Total |       750         750  |       1500       0.5000
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         1.542631       |    1.244963    1.911765 (exact)
 Attr. frac. ex. |         .3517567       |    .1967632    .4769231 (exact)
 Attr. frac. pop |          .190679       |
                 +-------------------------------------------------
                               chi2(1) =    16.61  Pr>chi2 = 0.0000

What’s wrong with this analysis? The data are paired: the same 750 people answered at month 0 and at month 6, so the two columns are not independent samples. Use McNemar's test (mcc, mcci).
. tab q4safe6 q4safe0

           |        q4safe0
   q4safe6 |         0          1 |     Total
-----------+----------------------+----------
         0 |       169         85 |       254
         1 |       162        334 |       496
-----------+----------------------+----------
     Total |       331        419 |       750

Discordant pairs: 85 (correct at month 0 only) and 162 (correct at month 6 only).

. mcci 334 162 85 169

                 | month 0                |
         month 6 |   correct   incorrect  |      Total
-----------------+------------------------+------------
         correct |       334         162  |        496
       incorrect |        85         169  |        254
-----------------+------------------------+------------
           Total |       419         331  |        750

McNemar's chi2(1) =     24.00    Prob > chi2 = 0.0000
Exact McNemar significance probability       = 0.0000

Proportion with factor
        Cases       .6613333
        Controls    .5586667     [95% Conf. Interval]
                    ---------    --------------------
        difference  .1026667     .0609249    .1444084
        ratio       1.183771     1.106427    1.266522
        rel. diff.  .2326284      .151107    .3141498

        odds ratio  1.905882     1.456995    2.508149   (exact)

Interpret the OR.
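McNemar's test and the matched OR use only the discordant pairs. The minimal Python sketch below assumes the usual formulas: chi2 = (b − c)²/(b + c), OR = b/c, and an exact p-value from a Binomial(b + c, 1/2) distribution. mcnemar is a hypothetical helper, and the counts are copied from the mcci table above. For contrast, the sketch also computes the unpaired cci odds ratio.

```python
from math import comb


def mcnemar(a, b, c, d):
    """Paired 2x2 analysis with counts in mcci order:
    a = both correct, b = correct at month 6 only,
    c = correct at month 0 only, d = both incorrect."""
    n = a + b + c + d
    m = b + c                          # number of discordant pairs
    chi2 = (b - c) ** 2 / m            # McNemar's chi-squared, 1 df
    # Exact two-sided p: under H0 each discordant pair is equally likely to go
    # either way, so min(b, c) is a lower tail of Binomial(m, 1/2).
    tail = sum(comb(m, k) for k in range(min(b, c) + 1)) / 2 ** m
    return {
        "p_month6": (a + b) / n,       # "Cases": proportion correct at month 6
        "p_month0": (a + c) / n,       # "Controls": proportion correct at month 0
        "chi2": chi2,
        "p_exact": min(1.0, 2 * tail),
        "odds_ratio": b / c,           # matched OR uses discordant pairs only
    }


result = mcnemar(334, 162, 85, 169)
print(result)

# For contrast: the unpaired cci odds ratio treats the two months as independent samples
unpaired_or = (496 * 331) / (419 * 254)
print(round(unpaired_or, 6))   # 1.542631
```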
RESHAPE

. input id cd1 cd2 cd3 cd4 cd5 cd6
  1. 1 450 423 387 320 349 299
  2. 2 187 220 201 177 140 101
  3. 3 380 369 348 331 303 329
  4. end

. list

     +----------------------------------------+
     | id   cd1   cd2   cd3   cd4   cd5   cd6 |
     |----------------------------------------|
  1. |  1   450   423   387   320   349   299 |
  2. |  2   187   220   201   177   140   101 |
  3. |  3   380   369   348   331   303   329 |
     +----------------------------------------+

Wide form
reshape keyword stem, i(unit id) j(newvar)

. reshape long cd, i(id) j(visit)

. list

     +------------------+
     | id   visit    cd |
     |------------------|
  1. |  1       1   450 |
  2. |  1       2   423 |
  3. |  1       3   387 |
  4. |  1       4   320 |
  5. |  1       5   349 |
  6. |  1       6   299 |
  7. |  2       1   187 |
  8. |  2       2   220 |
  9. |  2       3   201 |
 10. |  2       4   177 |
 11. |  2       5   140 |
 12. |  2       6   101 |
 13. |  3       1   380 |
 14. |  3       2   369 |
 15. |  3       3   348 |
 16. |  3       4   331 |
 17. |  3       5   303 |
 18. |  3       6   329 |
     +------------------+

Long form
reshape keyword stem, i(unit id) j(dropvar)

. reshape wide cd, i(id) j(visit)

New variables are named stem + value of dropvar (cd + visit, giving cd1 to cd6).

. list

     +----------------------------------------+
     | id   cd1   cd2   cd3   cd4   cd5   cd6 |
     |----------------------------------------|
  1. |  1   450   423   387   320   349   299 |
  2. |  2   187   220   201   177   140   101 |
  3. |  3   380   369   348   331   303   329 |
     +----------------------------------------+

Wide form
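reshape long turns each stem-plus-number variable into its own row. reshape wide reverses this by appending the j value to the stem to name each new variable. The plain-Python sketch below shows both directions on the cd data. reshape_long and reshape_wide are illustrative names, not real functions. Unlike Stata, the sketch does not check that id uniquely identifies rows, and it does not fill in missing visits.

```python
# Wide data from the RESHAPE slide: one record per id, variables cd1..cd6
WIDE_ROWS = [
    (1, [450, 423, 387, 320, 349, 299]),
    (2, [187, 220, 201, 177, 140, 101]),
    (3, [380, 369, 348, 331, 303, 329]),
]
WIDE = {i: {f"cd{v}": x for v, x in enumerate(vals, start=1)} for i, vals in WIDE_ROWS}


def reshape_long(wide, stem):
    """Sketch of `reshape long <stem>, i(id) j(visit)`:
    every variable named stem + number becomes one (id, visit, value) row."""
    rows = []
    for i in sorted(wide):
        for name, value in wide[i].items():
            suffix = name[len(stem):]
            if name.startswith(stem) and suffix.isdigit():
                rows.append((i, int(suffix), value))
    return sorted(rows)


def reshape_wide(long_rows, stem):
    """Sketch of `reshape wide <stem>, i(id) j(visit)`:
    the j value is appended to the stem to name the new variable (cd + visit)."""
    wide = {}
    for i, visit, value in long_rows:
        wide.setdefault(i, {})[f"{stem}{visit}"] = value
    return wide


long_rows = reshape_long(WIDE, "cd")
print(len(long_rows))                           # 18
print(long_rows[:3])                            # [(1, 1, 450), (1, 2, 423), (1, 3, 387)]
print(reshape_wide(long_rows, "cd") == WIDE)    # True
```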