Using Secondary Datasets


1 Using Secondary Datasets
Lee Cheng, MD, MSc
Joint Primary Care Fellowship
Department of Family & Community Medicine
The University of Texas Medical School at Houston
2005

2 Definition of Secondary Data
- Collected by other studies
  - For the first researcher, they are primary data
  - For the second researcher, they are secondary data
- Collected by federal, state, and local government agencies, and by nationally representative surveys
  - Health and health care
  - Social and behavioral data
  - Policies

3 Benefits
- Cost-effective way to test specific hypotheses
- Create new research questions
- Demonstrate new and improved research designs and analytic approaches

4 Potential Drawbacks
- Only as good as the research that produced them
- Must assume what the authors meant by the terms they used
- Data may be neither valid nor reliable
- Instruments or data collection methods may have changed over time
- Data may have been modified by the researcher already (e.g., weighted)

5 Potential Drawbacks (cont’d)
- Poor documentation of the secondary data set
- Limited access to the data, e.g., on-site only
- Substantial purchase cost

6 Start with Research Questions
- Begin with well-defined research questions and testable hypotheses
- Identify variables needed
- Identify the most appropriate data source available to achieve the study's aims

7 Steps in Using Data

8 Verify the Data
Check for the following:
- Proper documentation
- The correct number of observations or cases
- The correct number of variables
- The correct coding scheme
- The original summary statistics are reproducible
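
The checks on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the inline data extract, the variable names (`id`, `age`, `sex`), and the "documented" values are all hypothetical stand-ins for a real file and its documentation.

```python
import csv
import io
import statistics

# Hypothetical extract standing in for a downloaded public-use file.
RAW = """id,age,sex
1,34,1
2,51,2
3,47,2
4,29,1
"""

# Values the documentation is assumed to report for this extract.
DOCUMENTED = {
    "n_cases": 4,
    "n_variables": 3,
    "sex_codes": {"1", "2"},
    "mean_age": 40.25,
}


def verify(raw_text, documented):
    """Compare a data file against the counts, codes, and statistics in its documentation."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    ages = [float(r["age"]) for r in rows]
    return {
        "n_cases": len(rows) == documented["n_cases"],
        "n_variables": len(rows[0]) == documented["n_variables"],
        "coding": {r["sex"] for r in rows} <= documented["sex_codes"],
        "mean_age": abs(statistics.mean(ages) - documented["mean_age"]) < 1e-9,
    }


checks = verify(RAW, DOCUMENTED)
for name, passed in checks.items():
    print(f"{name}: {'ok' if passed else 'MISMATCH'}")
```

Any mismatch is a signal to go back to the documentation before starting the analysis.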

9 Determine Variables of Interest
- Identify the variables needed
  - Independent variables
  - Dependent variables
  - Confounding variables

10 Study Design of Data
- Cross-sectional
- Longitudinal
  - Examine trends over time

11 Codebooks
- Variable name
- Descriptive label
- Position in the record
- Numeric or character code type
- Applicable field input format
- Questionnaire number that generated the measures

12 Get the Data
- Download some public-use files directly from the web as either ASCII files or SAS transport files
- CD-ROM
- Restricted-use file
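
A fixed-width ASCII file is read by applying the column positions and types listed in the codebook. The sketch below assumes a hypothetical three-variable codebook and two made-up records; a real file would use the positions given in its own documentation.

```python
# Hypothetical codebook: variable name -> (start column, end column, type).
# Columns are 1-based and inclusive, as codebooks usually list them.
CODEBOOK = {
    "id": (1, 4, int),
    "age": (5, 6, int),
    "sex": (7, 7, str),
}


def parse_record(line, codebook):
    """Slice one fixed-width line into named, typed fields."""
    record = {}
    for name, (start, end, cast) in codebook.items():
        field = line[start - 1:end].strip()
        # Blank fields are kept as None rather than guessed.
        record[name] = cast(field) if field else None
    return record


# Two made-up records in the layout described by CODEBOOK.
lines = ["0001341", "0002512"]
records = [parse_record(line, CODEBOOK) for line in lines]
for record in records:
    print(record)
```

Keeping the layout in one dictionary makes it easy to compare against the codebook line by line.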

13 Survey Design
- Ask about the sampling frame: it defines the population represented, as well as which population groups are excluded
- Ask about the sampling unit
  - Individuals
  - Households
  - Institutions
- Ask whether surveys allow for cross-sectional and longitudinal analyses, as well as serial cross-sectional analyses to examine trends
- Ask whether surveys allow for regional or state-level comparisons

14 Survey Sampling Design

15 Multistage Cluster Probability Sample
- An efficient strategy for data collection
- Otherwise, it would be inordinately expensive to travel the entire United States to interview a random sample of individuals or households

16 Multistage Cluster Probability Sample
- States (strata)
- Counties (PSUs)
- Enumeration areas (SSUs)
- Households (EUs)
- Respondents
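
A simplified simulation can make the staged selection concrete. The sketch below is an assumption-laden illustration: it uses only two stages (counties within states, then households within counties), skips the enumeration-area stage, draws equal-probability samples at each stage, and runs on a made-up sampling frame. The function name and frame layout are hypothetical.

```python
import random


def two_stage_sample(frame, n_psus, n_households, seed=0):
    """Sample PSUs within each stratum, then households within each sampled PSU.

    frame maps stratum -> {psu: [household ids]}.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, psus in frame.items():
        for psu in rng.sample(sorted(psus), n_psus):
            households = psus[psu]
            picked = rng.sample(households, n_households)
            # Overall selection probability is the product of the stage probabilities.
            prob = (n_psus / len(psus)) * (n_households / len(households))
            for household in picked:
                sample.append({
                    "stratum": stratum,
                    "psu": psu,
                    "household": household,
                    "weight": 1 / prob,  # base weight = inverse selection probability
                })
    return sample


# Made-up frame: 5 counties of 20 households in State A, 4 counties of 10 in State B.
frame = {
    "State A": {f"County A{i}": [f"A{i}-{h}" for h in range(20)] for i in range(5)},
    "State B": {f"County B{i}": [f"B{i}-{h}" for h in range(10)] for i in range(4)},
}
sample = two_stage_sample(frame, n_psus=2, n_households=5)
print(len(sample), "households sampled")
print("sum of weights:", sum(row["weight"] for row in sample))
```

Only the sampled counties need a household list, which is where the cost saving comes from. The base weights sum to the number of households in the frame.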

17 Information and variables needed in the statistical analysis

18 Primary Sampling Unit (PSU)
The first unit that is sampled in the design. For example, school districts from Texas may be sampled and then schools within districts may be sampled. The school district would be the PSU. 

19 Stratification (Strata)
A method of breaking up the population into different groups, often by demographic variables such as gender, race or SES. 

20 Weight Variable
- The weight variable reflects information about the sampling design, including:
  - the probability of being sampled
  - adjustments for non-response
  - post-stratification adjustments
- A weight variable is already included in the data set to be analyzed
- Care must be taken to use the correct weight variable
- Survey documentation serves as the guide for selecting the appropriate weight
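
The effect of the weight variable is easiest to see by comparing a weighted and an unweighted estimate. The four respondents and their weights below are invented for illustration, and `weighted_mean` is a hypothetical helper, not a function from any survey package.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of the weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Invented data: 1 = current smoker, 0 = non-smoker.
smoker = [1, 0, 0, 1]
# Invented survey weights: the first respondent represents fewer people.
weight = [100.0, 300.0, 300.0, 300.0]

unweighted = sum(smoker) / len(smoker)
weighted = weighted_mean(smoker, weight)
print(f"unweighted prevalence: {unweighted:.2f}")
print(f"weighted prevalence:   {weighted:.2f}")
```

The two figures differ because the respondents do not represent equal shares of the population. The same logic explains why picking the wrong weight variable changes the estimate.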

21 Cluster
- A naturally occurring unit or grouping within the population (e.g., enumeration areas, cities, universities, provinces, hospitals)
- A unit for which the administrative level has clear, non-overlapping boundaries
- It is useful because it avoids having to compile exhaustive lists of every single person in the population

22 Panel Design
- Participants (or subjects of the survey) are followed over multiple survey rounds for a specified period of time
- New panels are created at designated intervals
- A certain percentage of subjects is rotated out and replaced by newly sampled subjects each round

23 Understanding the Potential Sources of Errors
- Survey non-response
- Item non-response
- Sampling frame undercoverage
- Measurement error
- Instrument error
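
Item non-response, one of the error sources listed here, can be quantified directly from the data. The sketch below assumes hypothetical missing-value codes (`99` for age, `-1` for income) and invented records; the real codes come from the survey's codebook.

```python
# Hypothetical missing-value codes, as a codebook might define them.
MISSING_CODES = {"age": {99}, "income": {-1}}


def item_nonresponse_rates(records, missing_codes):
    """Share of records with a missing value, per variable."""
    rates = {}
    for var, codes in missing_codes.items():
        missing = sum(
            1 for r in records if r.get(var) is None or r[var] in codes
        )
        rates[var] = missing / len(records)
    return rates


# Invented records for illustration.
records = [
    {"age": 34, "income": 52000},
    {"age": 99, "income": -1},
    {"age": 51, "income": -1},
    {"age": 47, "income": None},
]
rates = item_nonresponse_rates(records, MISSING_CODES)
for var, rate in rates.items():
    print(f"{var}: {rate:.0%} missing")
```

A variable with a high missing share may be unusable for the planned analysis, so this is worth checking before committing to a data source.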

24 Software for Survey Data
- Designed to analyze complex survey data
- Can account for important design parameters
  - Survey weights
  - Cluster and stratified sampling (PSU)
- Software: SAS, Stata, SUDAAN
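
To show why the design parameters matter, the sketch below estimates a weighted mean and a standard error that uses the stratum and PSU identifiers. It assumes a simplified Taylor linearization estimator that treats PSUs as sampled with replacement. It is an illustration on invented data, not the actual code or default behavior of SAS, Stata, or SUDAAN, and `design_based_mean` is a hypothetical name.

```python
import statistics
from collections import defaultdict
from math import sqrt


def design_based_mean(rows):
    """Weighted mean and linearized standard error.

    rows: dicts with keys y, weight, stratum, psu.
    """
    total_w = sum(r["weight"] for r in rows)
    mean = sum(r["weight"] * r["y"] for r in rows) / total_w

    # Linearized score of each row, totalled within its PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        score = r["weight"] * (r["y"] - mean) / total_w
        psu_totals[r["stratum"]][r["psu"]] += score

    # Between-PSU variation within each stratum, summed over strata.
    variance = 0.0
    for psus in psu_totals.values():
        n = len(psus)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        centre = sum(psus.values()) / n
        variance += n / (n - 1) * sum((t - centre) ** 2 for t in psus.values())
    return mean, sqrt(variance)


# Invented data: answers are identical within each PSU (strong clustering).
rows = [
    {"y": 1, "weight": 1.0, "stratum": "S1", "psu": "P1"},
    {"y": 1, "weight": 1.0, "stratum": "S1", "psu": "P1"},
    {"y": 0, "weight": 1.0, "stratum": "S1", "psu": "P2"},
    {"y": 0, "weight": 1.0, "stratum": "S1", "psu": "P2"},
]
mean, se = design_based_mean(rows)
# Standard error that wrongly treats the rows as a simple random sample.
naive_se = statistics.stdev(r["y"] for r in rows) / sqrt(len(rows))
print(f"mean={mean:.2f}  design-based SE={se:.3f}  naive SE={naive_se:.3f}")
```

In this example the naive standard error is smaller than the design-based one. Ignoring the clustering would overstate the precision of the estimate.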

25 Power Calculations
- Make sure that there is sufficient data to test the hypotheses
- Or show that the relevant statistics did not occur simply by chance
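
One way to check for sufficient data is to compare the available sample with the size needed to detect a difference between two proportions. The sketch uses the standard normal-approximation formula for two equal-sized groups. The proportions, significance level, power, and available sample size are all assumed values, and `n_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two proportions."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Assumed scenario: detect a prevalence of 20% versus 25%.
required = n_per_group(0.20, 0.25)
available = 1500  # hypothetical number of respondents per group in the data set
print(f"required per group: {required}, available: {available}")
print("sufficient" if available >= required else "insufficient")
```

The formula assumes simple random sampling. For a clustered survey the required size would typically need to be inflated by the design effect, which this sketch ignores.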

26 Databases for Your Use

27 The United States Department of Health & Human Services (HHS)
- Major Federal health and human service surveys and data systems and state agency data sites
- Information on completed and in-progress program evaluations and policy research
- Links to published scientific literature in human services and health

28 Behavioral Risk Factor Surveillance System
- State data available at:
- English and Spanish questionnaires available at:
- Evaluation of survey questionnaires which could be adapted for BRFSS can be found at: ss/surveys/ratings.html

29 Youth Risk Behavior Surveillance Survey
Developed in 1990 to monitor priority health risk behaviors that contribute markedly to the leading causes of death, disability, and social problems among youth and adults in the United States. These behaviors are often established during childhood and early adolescence (e.g., tobacco use, unhealthy dietary behaviors).

30 Cancer Registries http://www.cdc.gov/cancer/dbdata.htm
Age-adjusted mortality rates and number of new cases for lung, colorectal, breast, and prostate cancer

31 Other Sources of Demographic, Socioeconomic and Health Data
- National Association of County and City Health Officials
- Community Health Status Indicators CD-ROM ($75)
- Area Resource File ($500)

32 Health and Medical Care Archive of the Inter-university Consortium for Political and Social Research (ICPSR)
- Funded by the Robert Wood Johnson Foundation
- Datasets are organized by: Health Care Providers, Cost/Access to Health Care, Substance Abuse & Health, Chronic Health Conditions, Other
- Students, faculty, and staff can download any data file on the Web site from authorized IP addresses

33 Other Federal Data Systems
Local VAMC databases: the Decentralized Hospital Computer Program (DHCP/VistA) or Austin Help Desk at

34 Meetings and Training http://www.resdac.umn.edu/Index.asp
- Research Data Assistance Center (ResDAC) provides assistance to researchers and gives workshops and seminars on how to analyze Medicare and Medicaid data
- Joint Program in Survey Methodology

35 Expectation
A brief report on using secondary data with the following format:
- Brief introduction (rationale)
- Hypothesis
- Methods
  - Data sources
  - Variables needed
  - Major analysis steps
- Timelines from accessing data to finishing draft manuscript

