Using Secondary Datasets

Using Secondary Datasets Lee Cheng, MD, MSc Joint Primary Care Fellowship Department of Family & Community Medicine The University of Texas Medical School at Houston 2005

Definition of Secondary Data: Data collected by other studies. For the first researcher, they are primary data; for the second researcher, they are secondary data. Data collected by federal, state, and local government agencies and by nationally representative surveys. Health and health care; social and behavioral data; policies.

Benefits: A cost-effective way to test specific hypotheses. Create new research questions. Demonstrate new and improved research designs and analytic approaches.

Potential Drawbacks: Only as good as the research that produced them. Must assume what the authors meant by the terms they used. Data may be neither valid nor reliable. Instruments or data collection methods may have changed over time. Data may have been modified by the researcher already (e.g., weighted).

Potential Drawbacks (cont'd): Poor documentation of the secondary data set. Limited access to the data, e.g., on-site only. Substantial purchase cost.

Start with Research Questions: Begin with well-defined research questions and testable hypotheses. Identify the variables needed. Identify the most appropriate data source available to achieve the study's aims.

Steps in Using Data

Verify the Data: Check for the following: proper documentation; the correct number of observations or cases; the correct number of variables; the correct coding scheme; and that the original summary statistics are reproducible.
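
The checks listed above can be scripted before any analysis starts. The following is a minimal sketch in Python, assuming the data have already been read into a list of dictionaries; the variable names, valid codes, and documented mean are hypothetical and would come from the survey's own documentation.

```python
import statistics

def verify_dataset(rows, expected_n, expected_vars, valid_codes, documented_means, tol=0.01):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Check the number of observations against the documentation.
    if len(rows) != expected_n:
        problems.append(f"expected {expected_n} observations, found {len(rows)}")
    # Check that the set of variables matches the codebook.
    found_vars = set(rows[0]) if rows else set()
    if found_vars != set(expected_vars):
        problems.append(f"variable mismatch: {sorted(found_vars ^ set(expected_vars))}")
    # Check that every value uses a documented code.
    for var, codes in valid_codes.items():
        bad = {r[var] for r in rows if r[var] not in codes}
        if bad:
            problems.append(f"{var}: undocumented codes {sorted(bad)}")
    # Check that the published summary statistics can be reproduced.
    for var, doc_mean in documented_means.items():
        mean = statistics.mean(r[var] for r in rows)
        if abs(mean - doc_mean) > tol:
            problems.append(f"{var}: mean {mean:.2f} differs from documented {doc_mean:.2f}")
    return problems

# Hypothetical three-record file with one undocumented code for "sex".
rows = [{"age": 34, "sex": 1}, {"age": 51, "sex": 2}, {"age": 45, "sex": 9}]
problems = verify_dataset(
    rows,
    expected_n=3,
    expected_vars=["age", "sex"],
    valid_codes={"sex": {1, 2}},
    documented_means={"age": 43.33},
)
print(problems)
```

Any non-empty result is a reason to go back to the documentation or the data provider before proceeding.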

Determine Variables of Interest: Identify the variables needed: independent variables, dependent variables, and confounding variables.

Study Design of Data Cross-sectional Longitudinal Examine trends over time

Codebooks: Variable name; descriptive label; position in the record; numeric or character code type; applicable field input format; questionnaire number that generated the measures.
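
To illustrate how codebook entries are used, the sketch below reads one fixed-width ASCII record by the column positions a codebook would supply. The three variables, their positions, and the sample record are hypothetical and not taken from any actual survey.

```python
# Hypothetical codebook: (name, descriptive label, start column (1-based), width, code type)
CODEBOOK = [
    ("AGE", "Age in years", 1, 3, "numeric"),
    ("SEX", "Sex of respondent", 4, 1, "numeric"),
    ("STATE", "State postal code", 5, 2, "character"),
]

def parse_record(line, codebook):
    """Slice one fixed-width record into named fields using codebook positions."""
    record = {}
    for name, _label, start, width, kind in codebook:
        raw = line[start - 1 : start - 1 + width].strip()
        if kind == "numeric":
            # Blank numeric fields are treated as missing.
            record[name] = int(raw) if raw else None
        else:
            record[name] = raw
    return record

parsed = parse_record(" 472TX", CODEBOOK)
print(parsed)
```

Reading a file this way makes every position and type assumption explicit, so a mismatch with the codebook shows up as an obviously wrong value rather than a silent error.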

Get the Data: Some public-use files can be downloaded directly from the web as either ASCII files or SAS transport files. Other files are available on CD-ROM or as restricted-use files.

Survey Design: Ask how the sampling frame defines the population represented, as well as which population groups are excluded. Ask what the sampling unit is: individuals, households, or institutions. Ask whether surveys allow for cross-sectional and longitudinal analyses, as well as serial cross-sectional analyses to examine trends. Ask whether surveys allow for regional or state-level comparisons.

Survey Sampling Design

Multistage Cluster Probability Sample: An efficient strategy for data collection. Otherwise, it would be inordinately expensive to travel the entire United States to interview a random sample of individuals or households.

Multistage Cluster Probability Sample: States (strata); counties (PSUs); enumeration areas (SSUs); households (EUs); respondents.

Information and variables needed in the statistical analysis

Primary Sampling Unit (PSU) The first unit that is sampled in the design. For example, school districts from Texas may be sampled and then schools within districts may be sampled. The school district would be the PSU. 

Stratification (Strata) A method of breaking up the population into different groups, often by demographic variables such as gender, race or SES. 

Weight Variable: The weight variable reflects information about the sampling design, including the probability of being sampled, adjustments for non-response, and post-stratification adjustments. A weight variable is already included in the data set to be analyzed. Care must be taken to use the correct weight variable. Survey documentation serves as the guide for selecting the appropriate weight.
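
A small worked example shows why the weight matters. The sketch below compares an unweighted and a weighted prevalence estimate; the four records and their weights are made up for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of weights."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical data: 1 = current smoker, 0 = not; each weight is the number
# of people in the population that the respondent represents.
smoker = [1, 0, 0, 1]
weight = [100.0, 300.0, 300.0, 300.0]

unweighted = sum(smoker) / len(smoker)
weighted = weighted_mean(smoker, weight)
print(unweighted, weighted)
```

Here the first respondent was oversampled and so carries a smaller weight; ignoring the weights would overstate the prevalence.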

Cluster: A naturally occurring unit or grouping within the population (e.g., enumeration areas, cities, universities, provinces, hospitals). A unit for which the administrative level has clear, non-overlapping boundaries. It is useful because it avoids having to compile exhaustive lists of every single person in the population.

Panel Design: Participants (or subjects of the survey) are followed over multiple survey rounds for a specified period of time. New panels are created at designated intervals. A certain percentage of subjects is rotated out and replaced by newly sampled subjects each round.

Understanding the Potential Sources of Error: Survey non-response; item non-response; sampling frame undercoverage; measurement error; instrument error.
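
Of these error sources, item non-response is the easiest to quantify directly from a secondary data file. The following is a minimal sketch, assuming missing answers are stored either as empty values or as special codes listed in the codebook; the variables and the code 9 are hypothetical.

```python
def item_nonresponse_rates(rows, missing_codes):
    """Return the share of records with a missing answer for each variable."""
    if not rows:
        raise ValueError("no records supplied")
    rates = {}
    for var, codes in missing_codes.items():
        # A value counts as missing if it is absent or uses a documented missing code.
        missing = sum(1 for r in rows if r.get(var) is None or r.get(var) in codes)
        rates[var] = missing / len(rows)
    return rates

# Hypothetical records: code 9 means "refused / don't know".
records = [
    {"income": 3, "smoker": 1},
    {"income": 9, "smoker": None},
    {"income": 2, "smoker": None},
    {"income": 4, "smoker": 0},
]
rates = item_nonresponse_rates(records, {"income": {9}, "smoker": {9}})
print(rates)
```

Variables with high rates may need imputation or may be unsuitable for the planned analysis.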

Software for Survey Data: Designed to analyze complex survey data. Can account for important design parameters: survey weights, and cluster and stratified sampling (PSU). Software: SAS, Stata, SUDAAN.
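
To show what "accounting for design parameters" means, here is a simplified Python sketch of a design-based standard error for a weighted mean, using Taylor linearization with PSUs treated as sampled with replacement within strata. It is an illustration under those assumptions, not a substitute for the packages named above, and the data are made up.

```python
import math
from collections import defaultdict

def design_based_mean(y, w, strata, psu):
    """Weighted mean and its linearized standard error for a stratified cluster sample."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Sum the linearized scores w * (y - mean) / total_w within each PSU.
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_totals[(h, c)] += wi * (yi - mean) / total_w
    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _c), z in psu_totals.items():
        by_stratum[h].append(z)
    # Add up the between-PSU variance within each stratum.
    variance = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h} needs at least two PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(variance)

# Hypothetical sample: two strata, two PSUs per stratum, equal weights.
y = [1, 0, 1, 1, 0, 0, 1, 0]
w = [1.0] * 8
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]
est_mean, est_se = design_based_mean(y, w, strata, psu)
print(est_mean, est_se)
```

Treating the same records as a simple random sample would generally give a different standard error, which is why the strata and PSU variables have to be passed to the analysis software.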

Power Calculations: Make sure that there are sufficient data to test the hypotheses and to show that the relevant statistics did not occur simply by chance.
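
As one example of such a calculation, the sketch below estimates the sample size per group needed to detect a difference between two proportions with a two-sided test, using the usual normal approximation. The prevalences are hypothetical, and the design-effect multiplier is an assumed value that would normally come from the survey documentation.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80, deff=1.0):
    """Approximate sample size per group to detect p1 versus p2."""
    if p1 == p2:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n_srs = (z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2
    # Inflate the simple-random-sample size by the design effect.
    return math.ceil(n_srs * deff)

# Hypothetical question: can the data detect 20% versus 25% prevalence?
n_simple = sample_size_two_proportions(0.20, 0.25)
n_complex = sample_size_two_proportions(0.20, 0.25, deff=1.5)
print(n_simple, n_complex)
```

Comparing the result with the number of cases actually available in the subgroup of interest shows whether the data set can support the planned test.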

Databases for your use

The United States Department of Health & Human Services (HHS) http://aspe.hhs.gov/statinfo/ Major Federal health and human service surveys and data systems and state agency data sites. Information on completed and in-progress program evaluations and policy research. Links to published scientific literature in human services and health.

Behavioral Risk Factor Surveillance System: State data available at: http://apps.nccd.cdc.gov/brfss/index.asp English and Spanish questionnaires available at: http://www.cdc.gov/brfss/brfsques-qustionnairesesp.htm Evaluation of survey questionnaires which could be adapted for BRFSS can be found at: www.schs.state.nc.us/SCHS/about/programs/brfss/surveys/ratings.html

Youth Risk Behavior Surveillance System: http://apps.nccd.cdc.gov/YRBSS/index.asp Developed in 1990 to monitor priority health risk behaviors that contribute markedly to the leading causes of death, disability, and social problems among youth and adults in the United States. These behaviors are often established during childhood and early adolescence (e.g., tobacco use, unhealthy dietary behaviors).

Cancer Registries: http://www.cdc.gov/cancer/dbdata.htm Age-adjusted mortality rates and number of new cases for lung, colorectal, breast, and prostate cancer.

Other Sources of Demographic, Socioeconomic and Health Data www.census.gov National Association of County and City Health Officials Community Health Status Indicators CD-ROM ($75) Area Resource File http://www.arfsys.com/ ($500)

Health and Medical Care Archive of the Inter-university Consortium for Political and Social Research (ICPSR) www.icpsr.umich.edu/ Funded by Robert Wood Johnson Foundation Datasets are organized by: Health Care Providers, Cost/Access to Health Care, Substance Abuse & Health, Chronic Health Conditions, Other. Students, faculty, and staff can download any data file on Web site to authorized IP addresses.

Other Federal Data Systems: http://www.fedstats.gov/ Local VAMC databases: the Decentralized Hospital Computer Program (DHCP/VISTA), http://www.virec.research.med.va.gov/ or Austin Help Desk at 512-326-6780

Meetings and Training: The Research Data Assistance Center (ResDAC), http://www.resdac.umn.edu/Index.asp, provides assistance to researchers and gives workshops and seminars on how to analyze Medicare and Medicaid data. Joint Program in Survey Methodology: http://www.jpsm.umd.edu/

Expectation: A brief report on using secondary data with the following format: brief introduction (rationale); hypothesis; methods; data sources; variables needed; major analysis steps; timeline from accessing the data to finishing a draft manuscript.