Data File Structure and Content Joe Larson 5 / 6 / 09.


Outline
- What’s in a Data Set?
  - File Setup
  - Key Variables
- Data Conventions
- Fun With Demographics

What’s in a Data Set?

File Setup
- Data on the web is broken up into the forms it was collected on.
- Different forms can have different collection time(s) and different participant subgroups

Available Data is Broken up by Form
- All data on the web is arranged by form. Exceptions:
  - Outcomes file
  - Demographics file
- Variables within a data set are in the order of the questionnaire, with any computed variables at the end of the file

Available Data is Broken up by Form

Different Forms…Different Participants…Different Times
- Forms collected only once result in a file with one record per person
- Forms collected numerous times throughout follow-up result in a file with multiple records per person
- Some data is only available for specific groups of participants (e.g., DM only, blood subsample)
- Specifics for an individual file can be found in its corresponding data dictionary
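A quick way to see which kind of file you have is to count records per ID. The following is a minimal sketch in plain Python, not the WHI tooling; the rows and ID values are made up for illustration, and only the ID column name comes from the slides.

```python
from collections import Counter

# Hypothetical rows from a form file; the values are invented for illustration.
rows = [
    {"ID": 100001, "F80VNUM": 0},
    {"ID": 100001, "F80VNUM": 3},
    {"ID": 100002, "F80VNUM": 0},
]

def records_per_person(rows, id_field="ID"):
    """Count how many records each participant has in a file."""
    return Counter(row[id_field] for row in rows)

def is_one_record_per_person(rows, id_field="ID"):
    """True when no participant ID appears more than once."""
    return all(n == 1 for n in records_per_person(rows, id_field).values())

counts = records_per_person(rows)
one_per_person = is_one_record_per_person(rows)
```

If `one_per_person` is false, the file is probably from a form collected at several visits, and the data dictionary should confirm that.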

Example from Form 80

Key Variables
- Some variables are found in every file (with the exceptions of the demographics and outcomes files)
  - ID
  - Days since randomization/enrollment
  - Visit type / Visit number
  - Form closest to visit
  - Expected for visit

Key Variables
- Let’s take a look at an actual Form 80 file

WHI Participant ID (ID)

Participant ID (ID)
- The ID variable is common to all of the web files.
- Completely independent of the member ID that is used at the individual clinics.
- Also independent of the Public and blood draw IDs.

Days Since Randomization / Enrollment (F80DAYS)

- We do not give out actual dates for forms or events.
- Time is calculated between randomization (CT) or enrollment (OS) and the form date.

Visit Type (F80VTYP) & Visit Number (F80VNUM)

- These variables combine to let you know when data was collected.
- For example, in the second line of the data on the previous slide we can see that the record is for “Annual Visit 3”. This matches up well with the 1189 days since randomization.
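The 1189-day example can be checked with a little arithmetic. This is only a rough sketch: the 365.25-day year and the function names are assumptions made for illustration, not part of the WHI files.

```python
DAYS_PER_YEAR = 365.25  # assumption: average calendar year length

def approx_years(days_since_randomization):
    """Convert days since randomization/enrollment to approximate years."""
    return days_since_randomization / DAYS_PER_YEAR

def nearest_annual_visit(days_since_randomization):
    """Round to the nearest whole year as a sanity check on the visit number."""
    return round(approx_years(days_since_randomization))

years = approx_years(1189)            # a little over 3 years
visit = nearest_annual_visit(1189)    # 3, consistent with "Annual Visit 3"
```

A large gap between this rough number and the recorded visit number is a reason to look at the record more closely, not proof of an error.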

Closest to Visit Within Visit Type and Number (F80VCLO)

- On rare occasions multiple forms were filled out or entered for the same participant at the same follow-up visit
- This variable identifies the visit closest to the actual date. For example, a year 1 annual visit with a value of “Yes” for VCLO will be the year 1 visit that is closest to 365 days from randomization/enrollment
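The flag is already in the file, so there is no need to compute it yourself. Still, the idea behind it can be sketched as follows. The target of 365 days times the visit year generalizes the year 1 example and is an assumption, as are the sample records.

```python
def closest_to_visit(records, visit_year, days_field="F80DAYS"):
    """Pick the record whose days value is nearest the target day.

    Assumption: the target for annual visit N is 365 * N days.
    """
    target = 365 * visit_year
    return min(records, key=lambda r: abs(r[days_field] - target))

# Two hypothetical year 1 forms for the same participant.
year1_forms = [
    {"ID": 100001, "F80DAYS": 350},
    {"ID": 100001, "F80DAYS": 402},
]

best = closest_to_visit(year1_forms, visit_year=1)
```

Here the form at day 350 is 15 days from the target and the one at day 402 is 37 days away, so the first one would carry the “Yes”.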

Expected for Visit (F80EXPC)

- Sometimes forms are filled out by participants who should not be filling them out
- The expected for visit flag identifies data that were expected by protocol
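In practice the two flags are often used together to get one expected record per visit. A minimal sketch, assuming both flags are coded 1 for “Yes” and 0 for “No” (check the data dictionary for the real coding); the rows are invented.

```python
def analysis_records(rows, closest_field="F80VCLO", expected_field="F80EXPC"):
    """Keep records that are closest to the visit and expected by protocol.

    Assumption: both flags use 1 = Yes and 0 = No.
    """
    return [
        r for r in rows
        if r[closest_field] == 1 and r[expected_field] == 1
    ]

# Hypothetical Form 80 records.
form80 = [
    {"ID": 100001, "F80DAYS": 350, "F80VCLO": 1, "F80EXPC": 1},
    {"ID": 100001, "F80DAYS": 402, "F80VCLO": 0, "F80EXPC": 1},
    {"ID": 100002, "F80DAYS": 366, "F80VCLO": 1, "F80EXPC": 0},
]

kept = analysis_records(form80)
```

Whether to drop unexpected forms depends on the analysis, so treat this as one option rather than a rule.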

File Setup / Key Variables
- Files are arranged by form on the web at
- File structure and participant group vary by form and are documented in the data dictionary
- ID, Visit Type, and other important variables can be found at the start of each file

Any Questions?

Data Conventions
- Skip patterns
- Mark all that apply
- Version differences

Skip Patterns
- Questions within a form are often set up with a hierarchical structure with parent questions and sub-questions
- In most cases, the sub-questions are set to missing if the parent value indicates the sub-questions should not be answered. This is the application of a skip pattern
- In a few cases where the error percentage is high, the skip pattern is not applied
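The skip pattern rule is easy to express in code. Below is a minimal sketch, assuming a parent question coded 0 for “No” and missing values stored as `None`; the variable names are hypothetical and only loosely follow the pet example in the slides.

```python
SUB_QUESTIONS = ["DOG", "CAT", "BIRD", "FISH", "OTHER"]  # hypothetical names

def apply_skip_pattern(record, parent="PET", subs=SUB_QUESTIONS, no_value=0):
    """Return a copy with sub-questions set to missing when the parent is 'No'.

    Assumption: the parent question is coded 0 = No, and None means missing.
    """
    cleaned = dict(record)
    if cleaned.get(parent) == no_value:
        for name in subs:
            cleaned[name] = None
    return cleaned

# A participant who said "no pet" but still marked "Dog" by mistake.
raw = {"ID": 100001, "PET": 0, "DOG": 1, "CAT": 0, "BIRD": 0, "FISH": 0, "OTHER": 0}
cleaned = apply_skip_pattern(raw)
```

Returning a copy keeps the raw answers available, which matters when you want to measure the error percentage before deciding whether to apply the pattern.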

Example: Skip Pattern Applied
[Table: Pet, Dog, Cat, Bird, Fish, Other; skip pattern QA applied to the sub-questions]
Error Percentage < 1%

Example: Skip Pattern Not Applied
Error Percentage ~ 6-12%

If the Skip Pattern is not Applied
- It will be in the data dictionary

Mark All That Apply
What kind of pet do you have? (mark all that apply)
- Dog(s)
- Cat(s)
- Bird(s)
- Fish
- Other
One question with multiple choices is converted to separate indicator variables of 0’s and 1’s

Mark all conversion
Order   Question            Question Number   Value   Value Description
17      Do you have a pet   11                1       Yes
18      Dog
19      Cat                 11.1              2       Marked
20      Bird                11.1              3       Marked
21      Fish
22      Other               11.1              5       Marked
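The mark-all conversion can be sketched in a few lines. The output variable names here are made up for illustration; the real names are in the data dictionary.

```python
CHOICES = ["Dog", "Cat", "Bird", "Fish", "Other"]

def to_indicators(marked, choices=CHOICES):
    """Turn a list of marked choices into one 0/1 indicator per choice.

    Assumption: indicator names are just the upper-cased choice labels.
    """
    marked_set = set(marked)
    return {choice.upper(): int(choice in marked_set) for choice in choices}

# A participant who marked both "Dog" and "Fish".
indicators = to_indicators(["Dog", "Fish"])
# {'DOG': 1, 'CAT': 0, 'BIRD': 0, 'FISH': 1, 'OTHER': 0}
```

Separate indicators make it simple to count, say, everyone with a fish regardless of what else they marked.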

Version Issues
- Sometimes questions are not asked on all versions of a form, leading to higher percentages of missing data
- The Data Dictionary will have this

Data Conventions
- Some cleaning was done to the data before it reached the web
- Skip patterns and mark-all-that-apply conversions were usually done
- Sometimes questions were not collected on all versions of a form
- In all cases, any issues are documented in the data dictionary

Any Questions?

Fun With Demographics

The Demographics File
- The demographics file is the glue that pulls most analyses together
- It contains important variables that are used in just about every analysis
- The file has one record per person
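Because the demographics file has one record per person, it can be attached to any form file by ID. Here is a minimal sketch of that merge in plain Python; the demographic column names and values are hypothetical, and the later demo does the equivalent step in SAS.

```python
def merge_on_id(form_rows, demographics_rows, id_field="ID"):
    """Attach each participant's demographics to every one of their form records.

    Form records with no matching demographics row are dropped (an inner join).
    """
    demo_by_id = {row[id_field]: row for row in demographics_rows}
    merged = []
    for row in form_rows:
        demo = demo_by_id.get(row[id_field])
        if demo is not None:
            merged.append({**demo, **row})
    return merged

# Hypothetical data: column names other than ID and F80DAYS are invented.
demographics = [
    {"ID": 100001, "AGE": 63, "REGION": "West"},
    {"ID": 100002, "AGE": 71, "REGION": "South"},
]
form80 = [
    {"ID": 100001, "F80DAYS": 0},
    {"ID": 100001, "F80DAYS": 1189},
    {"ID": 100003, "F80DAYS": 12},
]

merged = merge_on_id(form80, demographics)
```

Participant 100001 ends up with two merged records, one per form, each carrying the same demographics.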

Trial Participation Flags

- Trial Flags distinguish what part of the WHI a participant is in
- In addition to CT and OS indicators, there are indicator variables for each clinical trial component

Basic Demographic Data

- Age, ethnicity, education, and income can be found here
- Because clinical center data has not been released, the “U.S. Region” variable is the best variable to use for geographic location

Trial Arms

- These are the key variables for any analysis on the clinical trial.
- The hormone arm variable can also be used to separate out participants in the two hormone trials

Days from CT to CaD Randomization

- Key variable used to determine how far a follow-up visit is from CaD randomization
- To determine days from CaD randomization:
  - Start with the days from CT randomization
  - Subtract the days from CT to CaD randomization
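The two steps above amount to a single subtraction. A minimal sketch, with hypothetical parameter names and made-up numbers:

```python
def days_from_cad_randomization(days_from_ct, ct_to_cad_days):
    """Days between CaD randomization and a form or event.

    days_from_ct:   days from CT randomization to the form (e.g. F80DAYS)
    ct_to_cad_days: days from CT randomization to CaD randomization
    """
    return days_from_ct - ct_to_cad_days

# Hypothetical values: a form at day 1189, CaD randomization at day 400.
cad_days = days_from_cad_randomization(1189, 400)   # 789
before_cad = days_from_cad_randomization(350, 400)  # -50
```

A negative result means the form was collected before the participant was randomized to CaD.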

BMD Subsample Indicator

- A ‘yes’ response indicates that the participant was at one of the three BMD clinics

Fun With Demographics
- The demographics file is a key file used in most analyses
- It includes trial participation and treatment status variables, as well as basic demographic data

Questions?

Stay Tuned
- Later I’ll be doing a beginning-to-end example:
  - Going to the web
  - Hunting down variables
  - Downloading the data
  - Loading it into SAS
  - Merging files together
  - Running some basic frequencies
- And taking questions while I do it!

Thanks and Good Night