Data Quality Assurance
Beverly Musick
Introduction
Electronic data play a critical role in health care delivery:
- by providing decision support to health care providers
- by forming the basis for periodic reporting to administrators and funders
- by aiding investigators in addressing relevant research questions
Electronic data typically originate with paper data collection forms, which is why appropriate form design is so critical.
Quality Assurance
Quality assurance is the set of processes, procedures, and activities that are initiated before data collection to ensure that the expected level of quality will be reached. It includes:
- Form design
- Data element dictionary
- Electronic error prevention and control
- Anticipation of missing data
- Training and informing of clinical staff
Form Design
Data collection forms must:
- Employ clear, precise, unambiguous questions
  - Organize questions into logical groupings
  - Consider numbering questions/items
- Use appropriate responses
  - Mutually exclusive categories
  - Consider quantity vs. frequency
  - Explicit vs. implicit
Coding Responses: Tick all that apply vs. Tick one
10d. How do you think you were exposed to HIV? (Check all that apply)
□ Patient knows spouse or partner is HIV+
□ Suspected exposure in prior relationship
□ Blood Transfusion   Year of Transfusion ________
□ History of Intravenous Drug Use
□ Contaminated Needle Stick
□ Unknown
□ Other
Coding Responses: Tick all that apply vs. Tick one
10. What is your current relationship status? (select one)
□ Never married and not living with a partner
□ Legally married: Number of wives ________
□ Living with a partner
□ Separated
□ Divorced
□ Widowed
For "Tick one" items, the responses must be mutually exclusive.
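The two question types above are usually stored differently. A "check all that apply" item becomes one yes/no indicator per option, while a "select one" item becomes a single coded variable. The sketch below is a minimal illustration under stated assumptions: the field names (`exposure_*`, `rel_status`) and the numeric codes 1 to 6 are hypothetical and do not come from the AMPATH forms.

```python
# Hypothetical field names; options follow question 10d (check all that apply).
EXPOSURE_OPTIONS = [
    "partner_hiv_pos", "prior_relationship", "transfusion",
    "iv_drug_use", "needle_stick", "unknown", "other",
]

# Hypothetical numeric codes for question 10 (select one).
RELATIONSHIP_CODES = {
    1: "Never married and not living with a partner",
    2: "Legally married",
    3: "Living with a partner",
    4: "Separated",
    5: "Divorced",
    6: "Widowed",
}


def code_tick_all(ticked):
    """Check all that apply: store one 0/1 indicator per option."""
    unknown = set(ticked) - set(EXPOSURE_OPTIONS)
    if unknown:
        raise ValueError(f"unrecognized options: {sorted(unknown)}")
    return {f"exposure_{opt}": int(opt in ticked) for opt in EXPOSURE_OPTIONS}


def code_tick_one(ticked_codes):
    """Select one: store a single code; more than one tick must be queried."""
    if not ticked_codes:
        return {"rel_status": None}  # nothing ticked: record as missing
    if len(ticked_codes) > 1:
        raise ValueError(f"expected one response, got {sorted(ticked_codes)}")
    (code,) = ticked_codes
    if code not in RELATIONSHIP_CODES:
        raise ValueError(f"invalid code: {code}")
    return {"rel_status": code}
```

For example, `code_tick_all({"iv_drug_use", "needle_stick"})` yields seven indicators, two of which are 1. A relationship form with two boxes ticked is rejected rather than silently coded, because the categories were designed to be mutually exclusive.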
Coding Responses: Common Mistake
Example: How often have you had pain in the past week?
□ never □ 1-2 days □ 2-4 days □ 5 or more days
The response categories are not mutually exclusive. A respondent who had pain on 2 days could tick either "1-2 days" or "2-4 days".
Appropriate Responses: Quantity vs. Frequency
How much pain have you had in the past week? (responses address quantity)
□ none □ mild □ moderate □ severe
How often have you had pain in the past week? (responses address frequency)
□ never □ 1-2 days □ 3-4 days □ 5 or more days

If you ask someone how much pain they've had in the past week, you'll want to categorize the responses as an amount of pain (for example: none, very little, some, a lot, very much). If you want to know how often they've had pain during the past week, use responses such as never, 1-2 days, 3-4 days, 5 or more days. You may need to ask both questions (amount and frequency) to get a true picture of the level of pain during the past week. In addition, make sure the responses leave no gaps and, when only one answer is intended, are mutually exclusive. When more than one answer can legitimately apply, have respondents check all that apply. For example, if you ask people what type of insurance they have, a few of them may have both Medicaid and private insurance.
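For numeric categories such as "days in the past week", overlaps and gaps can be checked mechanically before a form is printed. The following is a small sketch. The function name and the `(low, high)` representation are illustrative assumptions, and the check applies only to integer counts like the pain-frequency item.

```python
def check_categories(bins, lowest, highest):
    """List overlaps and gaps in integer response categories.

    bins: (low, high) inclusive ranges; high=None means open-ended ("5 or more").
    lowest, highest: the full set of possible answers, e.g. 0..7 days in a week.
    """
    problems = []
    ranges = sorted((lo, highest if hi is None else hi) for lo, hi in bins)
    next_expected = lowest  # smallest value not yet covered by any category
    for lo, hi in ranges:
        if lo < next_expected:
            problems.append(f"overlap: {lo}-{hi} repeats values up to {next_expected - 1}")
        elif lo > next_expected:
            problems.append(f"gap: {next_expected}-{lo - 1} not covered")
        next_expected = max(next_expected, hi + 1)
    if next_expected <= highest:
        problems.append(f"gap: {next_expected}-{highest} not covered")
    return problems


# The flawed item: "1-2 days" and "2-4 days" both contain 2.
print(check_categories([(0, 0), (1, 2), (2, 4), (5, None)], 0, 7))
# -> ['overlap: 2-4 repeats values up to 2']
# The corrected frequency item.
print(check_categories([(0, 0), (1, 2), (3, 4), (5, None)], 0, 7))
# -> []
```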
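Appropriate Responses: Explicit vs. Implicit
10a. □ Yes □ No – Sexually active last 6 months (explicit)
10b. □ Sexually active last 6 months (implicit)
With the explicit version, a blank means the question was not answered. With the implicit version, an unchecked box could mean either "no" or "not answered", and the two cannot be told apart later.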
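The difference shows up as soon as the boxes are turned into data. The sketch below decodes both layouts. The function names and the returned labels are hypothetical, chosen only to make the information loss visible.

```python
def decode_explicit(yes_ticked, no_ticked):
    """10a layout: separate Yes and No boxes."""
    if yes_ticked and no_ticked:
        return "conflict"  # both boxes ticked: query the form
    if yes_ticked:
        return "yes"
    if no_ticked:
        return "no"
    return "missing"  # neither ticked: the question was skipped


def decode_implicit(box_ticked):
    """10b layout: a single box that is ticked only for 'yes'."""
    # An empty box may mean 'no' or 'skipped'; the form cannot say which.
    return "yes" if box_ticked else "no or missing"
```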
Form Design
Data collection forms must:
- Employ clear, precise, unambiguous questions
- Use appropriate responses
- Include units of measurement
Include Units of Measurement
- Lab results can have different units of measure depending on the assay used.
- Weights may be recorded in either kg or lbs.
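When a form records the unit next to the value, the values can be normalized safely. When it does not, a weight of 70 could be either kilograms or pounds. This minimal sketch converts weights to kilograms and refuses to guess when the unit is absent. The function name and the one-decimal rounding are assumptions. The conversion factor 0.45359237 kg per pound is exact by definition.

```python
KG_PER_LB = 0.45359237  # exact, by the international definition of the pound


def weight_to_kg(value, unit):
    """Convert a recorded weight to kilograms; never guess a missing unit."""
    unit = (unit or "").strip().lower()
    if unit in ("kg", "kgs"):
        return value
    if unit in ("lb", "lbs"):
        return round(value * KG_PER_LB, 1)
    raise ValueError(f"missing or unrecognized unit for weight {value!r}: {unit!r}")
```

For example, `weight_to_kg(150, "lbs")` gives 68.0, and `weight_to_kg(70, None)` raises an error instead of storing an ambiguous number.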
Form Design
Data collection forms must:
- Employ clear, precise, unambiguous questions
- Use appropriate responses
- Include units of measurement
- Explicitly identify data
Explicitly Identify Data
Name: ___________________________
vs.
First Name: _______________________
Middle Name: _____________________
Surname: _______________________
Form Design
Data collection forms must:
- Employ clear, precise, unambiguous questions
- Use appropriate responses
- Include units of measurement
- Explicitly identify data
- Avoid open-ended questions
Original AMPATH Form
The original method for collecting ART data was a free-text field:
Plan/Comments: ______________________________________________________________________________________________________
Avoid open-ended questions. They produce messy data that must be recoded afterward, and you can never be sure that every variant has been captured.
Ways to Enter Triomune
BEGIN TRIOMUNE 30 CHANGE MEDS TO TRIOMUNE CT EMTR REFILL N, Z40, E. CT EMTRI CT N,E CT N, E, Z CT N, Z, E CT N, Z40 CT N, Z30 CT N,Z CT TRIOMUNE EMTRI EMTRI 30 EMTRI-30 EMTRI-40 EMTRI – 40 EMTRI 40 E-40 E - 40 PUT ON TRIOMUNE PUT ON TRIPLE THERAPY RATHER THAN TRIOMUNE REFILL EPIVIR,ZERIT, NEVIRAPINE REFILL N, Z40, E RE-START BACK(2ndTIME) ON TRIOMUNE RESTART TRIOMUNE 30 SWITCH TO TRIOMUNE SWITCH TO T-30 SWITCH TO T-40 TO COMPLETE TRIOMUNE 40 TRY AGAIN T 30 TRIOMUNE TRIOMUNE 30 TRIOMUNE-30 TRIOMUNE 40 T 30 T 40 T-30 T-40 T030 T30 T40
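To see why these entries are costly, consider what recoding them after the fact involves. The sketch below applies a few hypothetical regular-expression rules to entries like those listed above. The rules and code names are illustrative assumptions, not the AMPATH recoding scheme. Also, treating "EMTRI" as a Triomune variant follows only from its appearance in this list.

```python
import re

# Hypothetical rules, tried in order; the first match wins.
RULES = [
    (re.compile(r"\b(TRIOMUNE|T|EMTRI|E)\s*[-–0]?\s*30\b"), "TRIOMUNE_30"),
    (re.compile(r"\b(TRIOMUNE|T|EMTRI|E)\s*[-–0]?\s*40\b"), "TRIOMUNE_40"),
    (re.compile(r"\b(TRIOMUNE|EMTRI)\b"), "TRIOMUNE_DOSE_UNSPECIFIED"),
]


def recode(text):
    """Map one free-text entry to a code, or None if a person must review it."""
    upper = text.upper()
    for pattern, code in RULES:
        if pattern.search(upper):
            return code
    return None
```

Even this small rule set shows the problem. Component-level entries such as "CT N, Z30" fall through to manual review. Worse, "PUT ON TRIPLE THERAPY RATHER THAN TRIOMUNE" is coded as Triomune, the opposite of what was meant.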
Alternatives to Open-Ended Item
Current Medication Plan: □ Triomune 30 □ Triomune 40
or
Triomune 30: □ yes □ no
Triomune 40: □ yes □ no
Form Design
Data collection forms must:
- Employ clear, precise, unambiguous questions
- Use appropriate responses
- Include units of measurement
- Explicitly identify data
- Avoid open-ended questions
- Reflect flow of data collection
Flow of Data Collection
- Who will be collecting data
- In what order
- How often data elements will be collected (cross-sectional vs. longitudinal variables)
- What the interval of data collection is (appropriate windows)

Frequency and interval: determine how often each instrument will be administered and what its windows of administration are. In a longitudinal study you don't need to re-ask questions whose answers don't change over time, such as race, sex, and date of birth. Be sure to include a visit number and date if the same instrument will be administered multiple times. Repeated items must be asked the same way at each time point to be comparable.

Order: the order of instrument collection can have a significant impact on the results. Consider subject fatigue, overall time constraints, and, for something like cognitive testing, proper placement of instruments to control time-dependent variables (delayed memory, for example). You may want to vary the order of instrument collection; if so, record the order used.

Mode:
- self-administered vs. interviewer-administered
- electronic (PDA, PC, tablet PC) vs. paper/manual vs. direct output from a medical procedure/equipment
- phone vs. in-person
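Interval windows determine which scheduled visit an actual encounter counts as. The following sketch assigns a visit number from the visit date. It assumes a hypothetical schedule of a follow-up every 90 days with a window of ±30 days. Both numbers are parameters for illustration, not a recommendation.

```python
from datetime import date


def visit_number(enroll_date, visit_date, interval_days=90, half_window=30):
    """Return the scheduled visit number for a date, or None if outside every window.

    Visit 0 is enrollment; visit k is scheduled k * interval_days after enrollment.
    """
    days = (visit_date - enroll_date).days
    if days < 0:
        raise ValueError("visit date precedes enrollment")
    k = round(days / interval_days)  # nearest scheduled visit
    if abs(days - k * interval_days) <= half_window:
        return k
    return None  # falls between windows: flag for review
```

For example, with enrollment on 2024-01-01, a visit on 2024-04-05 (95 days later) counts as visit 1. A visit on 2024-02-15 (45 days later) falls outside every window.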
Form Design Summary
Well-designed forms employ clear questions and appropriately coded responses, explicitly identify data, and include measurement units. Open-ended questions should be avoided. Consider the overall organization of the data, both cross-sectional (collected once) and longitudinal (multiple observations collected over time). Cross-sectional data such as gender and date of birth should not be re-collected on follow-up forms. Information that is collected repeatedly over time should be formulated the same way at each time point.
PRACTICUM DQA1 Review data collection form
Form Implementation
- Identify key fields
- Select appropriate data attributes (type, length, and format)
- Choose meaningful field names
- Prepare for missing data

Implementation is easy when you have well-defined data collection forms. I've already touched on a few issues regarding the recording of responses and the storage of longitudinal data. A few more issues follow.

Identify key field(s). Each questionnaire needs one piece of information that clearly distinguishes it from all other questionnaires. This may be a single field such as AMPATH ID or, for longitudinal questionnaires, a combination of fields such as AMPATH ID and visit date.

Something else we do when reviewing a data collection form is assign a field name to each question. This forces us to pay close attention to the information being collected and how it relates to other questions. You may also notice ways to combine questions that will reduce interview time. For example, instead of asking "Do you have any children? Y/N; if yes, how many? ____", simply ask how many. Assigning field names also gives you an idea of what data the statistician will receive.

Choose meaningful field names and data types. Meaningful field names improve the readability and generation of reports in the same way that self-evident codes do. A variable for mother's race, for example, could be called momrace or racemom to convey the full meaning of the field. In most relational database management systems, data types can be numeric, character, date, or logical. If you know you'll need to calculate the mean of a field, make it numeric. Date types can still be used even when some of the information is incomplete. For example, someone may know the month and year of their knee surgery but not the exact day. Choose the first or fifteenth of the month to fill in the unknown day, and you'll be able to use the built-in operations that most software provides for date variables.
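The partial-date approach above can be sketched as follows, using the fifteenth as the fill-in day. The function name is hypothetical. Returning an imputation flag alongside the date is an added assumption, intended to keep filled-in days distinguishable from reported ones.

```python
from datetime import date


def complete_partial_date(year, month, day=None, fill_day=15):
    """Build a usable date from a partial one.

    Returns (date, day_imputed) so the imputation stays visible to analysts.
    """
    if day is None:
        return date(year, month, fill_day), True
    return date(year, month, day), False
```

With a real `date` value, ordinary date arithmetic works. For example, the number of days between a knee surgery reported only as "June 2003" and a later visit is simply `(visit - surgery).days`.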
Prepare for missing data. Even if you think that no one could skip a question, you need to allow for missing information. You can use a special code or rely on the system default, as long as that code or default is not also a valid response.
Data Element Dictionary
- Used to map individual items and responses to the electronic data
- A guidebook for understanding the organization and storage of data
- Prevents ambiguity and misinterpretation
Example: RefDocs\20050112_ampath_adult_return_Key.pdf
Example: Data Element Dictionary
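A data element dictionary can also be kept in machine-readable form so that programs and people read the same definitions. The sketch below is a hypothetical miniature dictionary built from items discussed earlier, plus a helper that renders a stored record back into labeled, decoded values. The field names, types, and codes are illustrative only.

```python
# Hypothetical entries modeled on items discussed earlier in these notes.
DATA_DICTIONARY = {
    "ampath_id": {"label": "AMPATH patient ID", "type": "character"},
    "visit_date": {"label": "Visit date", "type": "date"},
    "weight_kg": {"label": "Weight", "type": "numeric", "units": "kg"},
    "rel_status": {
        "label": "Current relationship status",
        "type": "numeric",
        "codes": {
            1: "Never married and not living with a partner",
            2: "Legally married",
            3: "Living with a partner",
            4: "Separated",
            5: "Divorced",
            6: "Widowed",
        },
    },
}


def describe(record):
    """Render a stored record in human terms using the dictionary."""
    lines = []
    for field, value in record.items():
        entry = DATA_DICTIONARY.get(field)
        if entry is None:
            lines.append(f"{field}: {value!r} (not in dictionary)")
            continue
        if "codes" in entry:
            text = entry["codes"].get(value, f"undefined code {value!r}")
        else:
            text = str(value)
        if "units" in entry:
            text += " " + entry["units"]
        lines.append(f"{entry['label']}: {text}")
    return lines
```

Fields or codes missing from the dictionary are reported rather than silently passed through. This is one way the dictionary prevents ambiguity and misinterpretation.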
Electronic Error Prevention and Checking
- The electronic interface should mirror the paper form as much as possible.
- Use drop-down menus for ease of selecting the desired response.
- Force entry of required fields such as patient ID and date before allowing further entry.
- Prevent entry of duplicate patient IDs into the patient registry, and of duplicate observations for the same patient at the same visit.
- Use range restrictions on numeric fields to prevent entry of erroneous data.
- Include logical checks that conditionally restrict entry.
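Several of these checks can run in a single function at the moment a record is saved. This is a minimal sketch under stated assumptions: the field names, the numeric limits, and the on-ARV/adherence rule are hypothetical examples. The adherence rule echoes the "not applicable" example given later under missing data.

```python
# Hypothetical limits and required fields, for illustration only.
RANGES = {"weight_kg": (1.0, 250.0), "age_years": (0, 120)}
REQUIRED = ("patient_id", "visit_date")


def check_entry(record, seen_keys):
    """Return error messages; an empty list means the record may be saved.

    seen_keys: set of (patient_id, visit_date) pairs already in the database.
    """
    errors = []
    for field in REQUIRED:
        if not record.get(field):
            errors.append(f"{field} is required")
    key = (record.get("patient_id"), record.get("visit_date"))
    if all(key) and key in seen_keys:
        errors.append(f"duplicate visit for patient {key[0]} on {key[1]}")
    for field, (lo, hi) in RANGES.items():
        value = record.get(field)
        if value is not None and not lo <= value <= hi:
            errors.append(f"{field}={value} outside {lo}-{hi}")
    # Logical check: adherence applies only to patients taking ARVs.
    if record.get("on_arv") == "no" and record.get("arv_adherence") is not None:
        errors.append("arv_adherence entered but patient is not on ARVs")
    return errors
```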
Out of Range Values
Out of range values are values outside the expected scope of response. They:
- can be numeric values that fall below the anticipated minimum or exceed the anticipated maximum
- can be response categories that were not previously identified
- can be due to changes in the environment, such as the availability of new medications
- can be due to data entry errors
Use range restrictions on numeric fields to prevent entry of erroneous data, and include logical checks that conditionally restrict entry.
Anticipating Out of Range Values
Data collection:
- Include an 'Other' response category with a place to specify what the other is.
- Include version numbers on data collection forms so that you can track when a response category was added.
Data entry:
- Use range restrictions on numeric fields to prevent entry of erroneous data.
- Tag out of range values.
- Allow data entry clerks to make notes about unexpected values.
Dealing with Out of Range Values
- Reject: prevent entry of the out of range value.
- Accept unconditionally: allow entry even though the value is out of range.
- Accept conditionally: allow entry but mark the value to indicate that it is out of range.
- Correct: convert the out of range value to the nearer bound (upper or lower) of the valid range.
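The four strategies differ only in what gets stored and whether the value is marked. This sketch applies them to a single numeric value. The enum and function names are hypothetical. For "Correct", the sketch also sets the flag so the change remains traceable; this is an assumption that goes beyond the list above.

```python
from enum import Enum


class Policy(Enum):
    REJECT = "reject"
    ACCEPT = "accept unconditionally"
    ACCEPT_FLAGGED = "accept conditionally"
    CORRECT = "correct"


def apply_range_policy(value, lo, hi, policy):
    """Return (stored_value, out_of_range_flag); REJECT raises ValueError."""
    if lo <= value <= hi:
        return value, False
    if policy is Policy.REJECT:
        raise ValueError(f"{value} outside {lo}-{hi}; entry refused")
    if policy is Policy.ACCEPT:
        return value, False  # stored as entered, with no mark
    if policy is Policy.ACCEPT_FLAGGED:
        return value, True  # stored as entered, marked for review
    # CORRECT: clamp to the nearer bound and mark the change (assumption).
    return min(max(value, lo), hi), True
```

For example, a weight of 300 kg against a hypothetical 1 to 250 kg range is stored as `(300, True)` under conditional acceptance and as `(250, True)` under correction.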
PRACTICUM DQA2 Electronic Error Prevention
Plan for Missing Data
Before data collection begins, determine how missing data will be recorded and entered.
Possibilities for coding missing data:
- Not applicable, N/A (e.g., adherence to medications for a patient not taking any meds)
- Not available (e.g., a variable added to the questionnaire at a later date, or weight temporarily missing due to a broken scale)
- Unknown (e.g., HIV status of partner)
- Refusal to answer (e.g., questions associated with stigma)
- True missing (e.g., question skipped)
Understand how missing data will be managed in the analysis. This helps determine how much information should be gathered about missing data.
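One common way to record these reasons is to give each one a distinct code that can never be a valid answer. The sketch below uses hypothetical negative codes, which suit variables such as weight or age that cannot be negative. Variables that can legitimately be negative would need different codes. The key point is that codes must be removed before computing statistics, or a mean of weights will silently include values of -1 and -2.

```python
from enum import IntEnum


class Missing(IntEnum):
    """Hypothetical negative codes, chosen so they can never be valid answers."""
    NOT_APPLICABLE = -1
    NOT_AVAILABLE = -2
    UNKNOWN = -3
    REFUSED = -4
    SKIPPED = -5


MISSING_CODES = {m.value for m in Missing}


def valid_values(values):
    """Drop missing codes before analysis so they are not averaged as real data."""
    return [v for v in values if v not in MISSING_CODES]


def missing_summary(values):
    """Count how often each missing reason occurs."""
    counts = {m.name: 0 for m in Missing}
    for v in values:
        if v in MISSING_CODES:
            counts[Missing(v).name] += 1
    return counts
```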
Types of Missingness
- Missing Completely at Random: the probability of missing data on variable Y is unrelated to the true value of Y or to other variables in the dataset.
  Ex. Water damage to paper forms prior to entry
- Missing at Random: the probability of missing data on Y is unrelated to Y only after adjusting for one or more other variables.
  Ex. For really sick patients, clinicians may not draw blood for routine labs.
- Not Missing at Random: the probability of missing data on Y depends on the value of Y itself.
  Ex. Higher income patients may be less likely to report income.
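A quick simulation shows why the distinction matters. When income is missing completely at random, the mean of the reported values stays close to the true mean. When higher earners are less likely to report, the observed mean is biased downward. All numbers here are assumptions: a log-normal income distribution, a 30% MCAR loss rate, and NMAR non-response of 60% above the median versus 10% below it. Missing at Random is not simulated.

```python
import random
import statistics


def simulate(n=10_000, seed=1):
    """Return (true mean, mean observed under MCAR, mean observed under NMAR)."""
    rng = random.Random(seed)
    incomes = [rng.lognormvariate(8, 0.5) for _ in range(n)]
    # MCAR: every value has the same 30% chance of being missing.
    mcar = [x for x in incomes if rng.random() >= 0.3]
    # NMAR: the chance of not reporting depends on the income itself.
    cutoff = statistics.median(incomes)
    nmar = [x for x in incomes if rng.random() >= (0.6 if x > cutoff else 0.1)]
    return statistics.mean(incomes), statistics.mean(mcar), statistics.mean(nmar)
```

Running `simulate()` should show the MCAR mean within a few percent of the true mean and the NMAR mean noticeably lower. No amount of additional data fixes this bias unless the missingness mechanism is understood.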
Documenting Missingness
- Embed missing codes and/or variables in the dataset and/or on data collection forms
  - Pros: permanently associated with the variable; immediately available for analysis; reduces the need to re-look for data
  - Cons: takes up a lot of digital and physical space; increases the time needed to complete forms
- Provide an explanation in a separate meta document
  - Pros: global explanation of missing data; minimal digital and physical space
  - Cons: eliminates the ability to code subject-level data; tends to get lost or separated from the data
Benefits of Documenting Missingness
- Informs quality control reporting: for example, query data collectors about missingness due to "result not available" but not about "test not ordered"
- Allows for full disclosure in publication or presentation of data
- Some statistical analysis methods assume data are Missing Completely at Random or Missing at Random, and documentation helps judge whether that assumption is reasonable
- Useful for methodological research related to missing data
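Recorded missing reasons turn quality control reporting into a simple tally. The sketch below counts the reasons and lists only the patients whose missing result warrants a query. The reason strings and the `(patient_id, reason)` row layout are hypothetical.

```python
from collections import Counter

# Hypothetical reason codes for a missing lab result.
# "result_not_available" is worth chasing; "test_not_ordered" is legitimately absent.
QUERY_REASONS = {"result_not_available"}


def qc_report(rows):
    """rows: (patient_id, missing_reason or None). Returns (reason counts, IDs to query)."""
    counts = Counter(reason for _, reason in rows if reason is not None)
    to_query = sorted(pid for pid, reason in rows if reason in QUERY_REASONS)
    return counts, to_query
```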
Procedures for Minimizing Missing Data
- In the clinic: review the data collection forms in the clinic, preferably while the patient is still there. This should be part of clinical staff training and oversight.
- At the point of data entry: prevent entry of a form that is missing key variables such as patient ID and visit date, and alert the data entry clerk about missing fields.
Training and Informing Clinical Staff
- Regular training in accurate completion of encounter forms is vital to ensuring data quality.
- Have an independent individual review key fields to ensure they were completed properly.
- Include the provider/user ID of the person completing the form to encourage and ensure accountability.