Published by Victor Osborn Bates. Modified over 9 years ago.
1
EPID 623-88
Introduction to Analysis and Interpretation of HIV/STD Data
LECTURE 1: Examining Your Data and Steps in Data Analysis
Manya Magnus, Ph.D.
Summer 2001
2
Lecture Outline
The relationship between data, “answers,” and graphics
Examining data
–Investigating variables and raw data
–Plotting
–Frequencies/crosstabs
Data analysis
–Deciding on a question
–Dummy (aka: mock) tables
–Data management
–Data analysis
3
Getting from Point A to Point B
How do you go from this:
PID DOB Gender GA MOT County
105562/1/9903912
106881/8/9912716
222585/19/9904015
6665611/5/99041113
2882510/29/62138412
4
Getting from Point A to Point B To this?
5
Getting from Point A to Point B
Or this:
AK LA TN FL …
PCP 20521299…
LIP 166519105…
SBI 10292857…
Wasting 18191375…
6
Getting from Point A to Point B To this?
7
{Please note that these data were made up for the purpose of examples only… they are not real. The slides are real, however, and come from the following website if you are interested: http://www.cdc.gov/hiv/graphics.htm}
8
Each of the previous slides uses graphic means to communicate data. We are most interested in the data underlying the graphics. The purpose of graphics should be to INFORM and COMMUNICATE effectively and truthfully—not confuse or mislead. Starting with the data themselves is the best way to begin. But having an idea of the question you are asking, or the table or chart you want to present, can be helpful.
9
What question is each of the following slides answering?
10
AIDS-Defining Conditions Most Commonly Reported for Children <13 Years of Age, N=8,718, Reported through 1999, United States
Condition                              Number   % of Cases*
Pneumocystis carinii pneumonia         2900     33
Lymphoid interstitial pneumonitis      2061     24
Recurrent bacterial infections         1794     21
HIV wasting syndrome                   1564     18
Candida esophagitis                    1462     17
HIV encephalopathy                     1372     16
Cytomegalovirus disease                838      10
Mycobacterium avium infection          709      8
Pulmonary candidiasis                  418      5
Severe herpes simplex infection        422      5
Cryptosporidiosis                      326      4
*>1 diagnosis reported for some children
11
AIDS in Children <13 Years of Age by Exposure Category, Reported in 1999 and Cumulative, United States
Exposure Category          1999 Number   1999 %   Cumulative (1982-1999) Number   Cumulative %
Perinatally acquired       232           88       7,943                           91
Transfusion-associated     2             1        379                             4
Hemophilia                 3             1        235                             3
Other/not reported         26            10       161                             2
Total                      263           100      8,718                           100
12
Adult/Adolescent AIDS Cases by Exposure Category and Year of Diagnosis, 1985 - June 1999, United States
[Chart. Vertical axis: Percent of Cases (0 to 80). Horizontal axis: Year of Diagnosis (1985 to June 1999). Series: Men who have sex with men (MSM), Injection drug use (IDU), Heterosexual contact, MSM & IDU, Other.]
Other includes cases with other or unreported risk exposure. Data adjusted for reporting delays and risk redistribution.
13
AIDS Cases in Adult/Adolescent Men, Reported July 1998 - June 1999, and Estimated AIDS Incidence,* Diagnosed July 1998 - June 1999, by Risk Exposure, United States
[Two charts: AIDS Incidence Reported July 1998 - June 1999; Estimated AIDS Incidence* Diagnosed July 1998 - June 1999. Risk exposure categories: Men who have sex with men, Injection drug use (IDU), MSM/IDU, Hemophilia, Transfusion, Heterosexual contact, Other/not identified.]
* Data adjusted for reporting delays and estimated proportional redistribution of cases initially reported without risk. Data reported through March 2000.
14
AIDS Cases in Adult/Adolescent Women, Reported July 1998 - June 1999, and Estimated AIDS Incidence,* Diagnosed July 1998 - June 1999, by Risk Exposure, United States
[Two charts: AIDS Incidence Reported July 1998 - June 1999; Estimated AIDS Incidence* Diagnosed July 1998 - June 1999. Risk exposure categories: Injection drug use (IDU), Hemophilia, Transfusion, Heterosexual contact, Other/not identified.]
* Data adjusted for reporting delays and estimated proportional redistribution of cases initially reported without risk. Data reported through March 2000.
15
Sample Dataset Human development data (adapted from StataQuest text and disk). Comparing average life expectancy in different groups of countries. From UN Human Development Report dataset.
16
Examining Data
What are your data and your variables?
–Type: numeric, string, counts
–Individual level or aggregate
–What do they indicate: labels, questionnaire responses, categories
–Which variables are predictors
–Which variables are outcomes
–Which variables are potential confounders
17
Examining Data (1)
Type: numeric, string, counts
First assess what type of variables you are working with and how they are stored in your dataset. For each variable in your dataset, see if it is a continuous variable (e.g., age in years), a categorical variable (e.g., age in categories), or a count. How are the variables stored? Are they stored as string or numeric variables?
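As a rough illustration of checking how variables are stored, here is a minimal Python sketch. The rows, the variable names (`country`, `life_exp`, `region`), and the helper functions are hypothetical and are not taken from the lecture's dataset.

```python
# Hypothetical rows as they might arrive from a raw text export:
# every value is a string, and a blank means "missing".
rows = [
    {"country": "A", "life_exp": "71.2", "region": "2"},
    {"country": "B", "life_exp": "64.8", "region": "1"},
    {"country": "C", "life_exp": "", "region": "2"},
]

def is_number(text):
    """Return True if text can be parsed as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def storage_type(values):
    """Classify a column as 'numeric' or 'string', ignoring blank entries."""
    present = [v for v in values if v.strip() != ""]
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "string"

types = {name: storage_type([r[name] for r in rows]) for name in rows[0]}
print(types)
```

Note that `region` comes out as numeric even though it is a categorical code. How a variable is stored and what kind of variable it is are separate questions, which is why the coding guide still has to be consulted.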
18
Dataset: Humandev.sav
19
Examining Data (2)
Individual level or aggregate data
Do the data relate to a person or to a group (e.g., a country)? What is the source of each line of data? Is there more than one line of data per ID number?
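One way to answer the last question is to count how often each ID occurs. The sketch below assumes made-up records of the form (ID, date); the field layout is an illustration only.

```python
from collections import Counter

# Hypothetical records: (id, visit_date). ID 101 appears twice.
records = [(101, "1999-01-08"), (102, "1999-02-01"), (101, "1999-05-19")]

# Count the lines of data per ID, then list the IDs with more than one line.
lines_per_id = Counter(pid for pid, _ in records)
repeated_ids = sorted(pid for pid, n in lines_per_id.items() if n > 1)
print(repeated_ids)  # [101]
```

If the list is empty, each line is a distinct unit. If not, the repeats may be legitimate (several visits per person) or data entry duplicates, and that needs to be settled before analysis.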
20
Examining Data (3)
What do they indicate: labels, questionnaire responses, categories
Where do the variables come from? Are they from a coding guide? If so, keep all the codes used available for reference. Are they from a survey or questionnaire? If so, keep all the codes used, as well as the questions, available for reference.
21
Examining Data (4)
Which variables are predictors, outcomes, confounders?
–Variables are variables: their meaning is imposed by the research question and the researcher
–[Lecture on confounders upcoming…]
–Helpful to decide in advance which are which
–Can organize data to reflect these decisions
22
Examining Data (5)
Summarize data first.
–Mean
–Median
–Mode
–Standard deviation
–Variance
–Range, min, max
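The summary measures listed above can be sketched with Python's standard `statistics` module. The values in `life_exp` are made up for illustration, and the sample (n - 1) versions of the standard deviation and variance are an assumption, since the slide does not say which version is intended.

```python
import statistics

# Made-up life expectancy values (years), for illustration only.
life_exp = [52.0, 61.5, 61.5, 70.1, 74.3, 78.0]

summary = {
    "mean": statistics.mean(life_exp),
    "median": statistics.median(life_exp),
    "mode": statistics.mode(life_exp),
    "sd": statistics.stdev(life_exp),           # sample SD (n - 1 denominator)
    "variance": statistics.variance(life_exp),  # sample variance
    "min": min(life_exp),
    "max": max(life_exp),
    "range": max(life_exp) - min(life_exp),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```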
23
Examining Data (6)
Plotting data and examining them visually are very important. Look at the raw data. Check normality visually and using statistics (kurtosis, a measure of peakedness, is 3 for a normal distribution; skewness, a measure of symmetry, is 0 for a normal distribution; etc.). Transform as needed. Check residuals. Remember the assumptions for statistical testing.
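To make the skewness and kurtosis benchmarks concrete, here is a minimal sketch using the plain moment-based formulas with no small-sample correction. That choice is an assumption: some statistical packages apply corrections, and some report excess kurtosis (kurtosis minus 3), for which a normal distribution scores 0 instead of 3. The data are made up.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and kurtosis (population formulas, no correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

symmetric = [1, 2, 3, 4, 5]          # made-up, perfectly symmetric
right_skewed = [1, 1, 2, 2, 3, 10]   # made-up, long right tail

skew_sym, kurt_sym = skewness_kurtosis(symmetric)
skew_right, kurt_right = skewness_kurtosis(right_skewed)
print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

The symmetric list has skewness 0, while the list with one large value has positive skewness, the kind of pattern that might prompt a transformation.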
24
Examining Data (7)
Frequencies and crosstabs
How many in each category? How many for each value? Look at how many are in each new grouping. Know which variable is dependent and which is independent. Study design can dictate whether you want row or column percentages. Be careful, because the interpretation differs.
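The difference between row and column percentages can be seen in a small sketch. The exposure and outcome labels and the counts are hypothetical.

```python
from collections import Counter

# Hypothetical individual-level data: (exposure, outcome) pairs.
data = (
    [("exposed", "case")] * 30
    + [("exposed", "non-case")] * 70
    + [("unexposed", "case")] * 10
    + [("unexposed", "non-case")] * 90
)

cells = Counter(data)                                  # crosstab cell counts
row_totals = Counter(exposure for exposure, _ in data)
col_totals = Counter(outcome for _, outcome in data)

# Row percent: share of each exposure group with a given outcome.
row_pct = {cell: 100 * n / row_totals[cell[0]] for cell, n in cells.items()}
# Column percent: share of each outcome group with a given exposure.
col_pct = {cell: 100 * n / col_totals[cell[1]] for cell, n in cells.items()}

print(row_pct[("exposed", "case")])  # 30.0
print(col_pct[("exposed", "case")])  # 75.0
```

The same cell reads as "30% of the exposed are cases" by row and "75% of cases were exposed" by column, so the choice has to follow from the study design.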
25
Steps in data management
Make a backup copy first
Refer to codebook/card layout
Recode missing values
Make categorical variables from continuous
Make summary variables/outcome variables
Save your work as you go
Confirm that new variables are correctly generated each time; QA
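A few of these steps can be sketched in Python. Everything specific here is an assumption made for illustration: the missing-value code 99, the age cut points, and the variable names would in practice come from the codebook.

```python
from collections import Counter

# Hypothetical coding: 99 means "missing" for age (check the codebook).
MISSING_CODE = 99
ages = [23, 35, 99, 41, 17, 60]

# Keep the original untouched and work on a copy.
ages_backup = list(ages)

# Recode the missing-value code to None.
age_clean = [None if a == MISSING_CODE else a for a in ages]

def age_group(age):
    """Make a categorical variable from continuous age (hypothetical cut points)."""
    if age is None:
        return None
    if age < 25:
        return "<25"
    if age < 45:
        return "25-44"
    return "45+"

age_cat = [age_group(a) for a in age_clean]

# QA: one category per record, and frequencies that can be checked by eye.
assert len(age_cat) == len(ages_backup)
print(list(zip(age_clean, age_cat)))
print(Counter(age_cat))
```

Printing the old and new variables side by side is a simple way to confirm that each new variable was generated as intended.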
26
Steps in data analysis (1)
Decide what question you are asking
Plan the analysis on paper
–Determine type of analysis
–Identify and define outcome, predictors, confounders and effect modifiers for your analysis
–Make dummy tables
27
Example of dummy table
             Trich no (n=)   Trich yes (n=)   OR (95% CI)   P-value
Gender
  Male
  Female
Race
  AA
  Non-AA
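To show how the OR (95% CI) column of a dummy table like this might eventually be filled, here is a minimal sketch. It uses the log-based (Woolf) confidence interval, which is an assumption since the slide does not name a method, and the counts are made up.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) 95% CI for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts for one row of the table.
or_est, ci_low, ci_high = odds_ratio_ci(30, 70, 10, 90)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

With small or zero cell counts this approximation breaks down, and an exact method would be the safer choice.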
28
Steps in data analysis (2)
Perform Univariate Analysis
–Frequencies
–Descriptives (means and s.d., medians and ranges)
Perform Bivariate Analysis
–Do cross-tabulations
–Check for confounding
Perform multivariate analyses
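One common way to check for confounding at the bivariate stage is to compare the crude odds ratio with one adjusted across strata of a third variable. The sketch below uses the Mantel-Haenszel summary odds ratio. The slides do not specify a method, so this choice is an assumption, and the strata are made up.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table (a, d on the diagonal)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Made-up 2x2 tables, one per level of a hypothetical third variable.
# Each tuple: (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
strata = [(40, 10, 8, 2), (2, 8, 10, 40)]

# Crude table: add the strata together cell by cell.
crude_cells = [sum(cell) for cell in zip(*strata)]
crude_or = odds_ratio(*crude_cells)
adjusted_or = mantel_haenszel_or(strata)

print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
```

In this constructed example the odds ratio is 1 within each stratum but well above 1 in the collapsed table, so the crude association is produced entirely by the third variable.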
29
Helpful Hints
Save data file frequently
Give *.sav, *.spo, *.sps files the same prefix (e.g., NIR.sav, NIR.spo)
Keep a notebook of what you did so that you remember what the new variables mean, and add new variables to your coding guide.
Use labels.