Review of Assignment 3, Loose Ends, Security, Web-based Data Collection Michael A. Kohn, MD, MPP 1 February 2011.

Outline Assignment 3 Review Loose Ends: Yes/No Fields, BLOBs, Field Names, Front Ends HIPAA Privacy Rule and CFR 21 Part 11 Web-based Data Entry Assignment 4

Housekeeping Database demos with advice for Assignment 4/Final Project: Tuesday 2/8 Assignment 4/Final Project is due 3/8

Final Project Due 3/8. More time for: students who need to personally demo their databases to satisfy Part A of the assignment; students who are using REDCap and QuesGen, to set up their accounts; students who need consulting, either from me or from the CTSI DMU.

Assignment 3 Extra Credit: Write a sentence or two for the “Methods” or “Results” section on inter-rater reliability. (Use Bland and Altman, BMJ 1996; 313:744) Lab 3: Exporting and Analyzing Data 1/25/2011 Determine if neonatal jaundice was associated with the 5-year IQ scores and create a table, figure, or paragraph appropriate for the “Results” section of a manuscript summarizing the association.

Answer: Of the infants with neonatal jaundice, 149 had IQ tests at age 5, and of the infants without neonatal jaundice, 248 had IQ tests. The mean (±SD) IQ score was significantly higher in the jaundice group, 111.5 (±21.1), than in the no-jaundice group, 101.4 (±20.5); difference 10.1 (95% CI 5.9 – 14.4).

Table. Mean Five-Year IQ Scores for Infants With and Without Neonatal Jaundice
               N      Mean (SD)*
Jaundice       149    111.5 (21.1)
No Jaundice    248    101.4 (20.5)
*Difference in mean scores of 10.1 (95% CI 5.9 – 14.4)

Table. Mean Five-Year IQ Scores for Infants Without and With Neonatal Jaundice
             No Jaundice     Jaundice        Difference (95% CI)
N            248             149
Mean (SD)    101.4 (20.5)    111.5 (21.1)    10.1 (5.9 – 14.4)*
*p<

Newman T et al. N Engl J Med 2006;354:

[Stata two-sample t test output; numeric values not recoverable. Rows: No, Yes, combined, diff. Columns: Obs, Mean, Std. Err., Std. Dev., 95% Conf. Interval. Degrees of freedom: 395. Ho: mean(No) - mean(Yes) = diff = 0.] Would you submit this for publication?

Essential Elements Sample size (149 jaundiced, 248 non-jaundiced) Indication of effect size (report both means, or the difference between them) Get direction of effect right. (Jaundiced group did better!) Indication of variability (Sample SDs, SEs of means, CIs of means, or CI of difference between means.)
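The essential elements above (sample sizes, both means, the difference, and its confidence interval) can all be recovered from the summary statistics alone. The following is a minimal sketch, not the analysis used in the lab: it assumes a pooled-variance comparison and uses a normal critical value in place of the t value (a reasonable approximation with 395 degrees of freedom). The function name `mean_difference_ci` is illustrative. Because the inputs are the rounded means and SDs from the slides, the interval it prints should be close to, but need not exactly match, the reported one.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_ci(n1, mean1, sd1, n2, mean2, sd2, level=0.95):
    """Return (difference, lower, upper) for mean1 - mean2.

    Pooled-variance standard error with a normal critical value,
    which approximates the t critical value when df is large.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean1 - mean2
    return diff, diff - z * se, diff + z * se


# Summary statistics from the slides: jaundice group first, then no-jaundice.
diff, lower, upper = mean_difference_ci(149, 111.5, 21.1, 248, 101.4, 20.5)
print(f"Difference {diff:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Putting the jaundice group first keeps the sign of the difference positive, which matches the "get direction of effect right" point.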

Browner on Figures Figures should have a minimum of four data points. A figure that shows that the rate of colon cancer is higher in men than in women, or that diabetes is more common in Hispanics than in whites or blacks, [or that jaundiced babies had higher IQs at age 5 years than non-jaundiced babies,] is not worth the ink required to print it. Use text instead. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 90

Cutoff at 50? Caption should be below figure. What are the error bars? “Neuopsychiatric”

Cutoff at 60? Caption should be below figure.

Browner on 3-D Figures Three dimensional graphs usually are not helpful. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 97 Also, note that the 3-D is only an effect. The data are two dimensional (score by jaundice).

Takes the prize for ugliest figure.

Caption not sufficiently explanatory. Sample size?

Figure 1: In 149 infants with neonatal jaundice, the average IQ scores were higher compared to the 248 non-jaundiced infants when evaluated at age 5 (p<0.0001).

Box Plot: Median line. Box extends from 25th to 75th percentile. Whiskers extend to the upper and lower adjacent values. Adjacent value = the most extreme observation that lies within 1.5 x IQR (interquartile range) of the 75th/25th percentile. Values outside the adjacent values are graphed individually. Would be nice if area (or at least width) of box were proportional to sample size (N). In some box plots the width of the box is proportional to log N, but not in Stata.
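To make the box plot definitions concrete, here is a small sketch that computes the quartiles, the 1.5 x IQR limits, the adjacent values (the most extreme observations inside those limits), and the points that would be graphed individually. The scores are made up for illustration, and `box_plot_summary` is a hypothetical helper. Percentile conventions differ between packages, so the quartiles from Python's `statistics.quantiles` may differ slightly from Stata's.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, adjacent values, and outside values for a box plot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_adjacent": min(inside),  # end of the lower whisker
        "upper_adjacent": max(inside),  # end of the upper whisker
        "outside": [v for v in data if v < lower_limit or v > upper_limit],
    }


# Made-up IQ scores, for illustration only.
scores = [70, 85, 90, 95, 100, 100, 105, 110, 115, 160]
summary = box_plot_summary(scores)
print(summary)
```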

Extra Credit: Report within-subject SD (4.0) as a measure of reliability. Calculate repeatability (11.0). Bland-Altman plot with mean difference and 95% limits of agreement.* *Erika did this.

Methods/Results. Methods: We assessed inter-rater reliability of the IQ test by having different examiners re-test some of the children. We calculated the within-subject standard deviation and repeatability. (Bland and Altman, BMJ 1996; 313:744) Results: Different examiners re-tested 198 children. The within-subject standard deviation was 4.0, so the “repeatability” was 11.0, meaning that two examiners of the same subject would score within 11 points of each other 95 percent of the time. (Bland and Altman, BMJ 1996; 313:744)
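The relationship between the within-subject SD and repeatability can be sketched as follows, assuming exactly two measurements per child. With two measurements, the within-subject variance is the mean of the squared differences divided by 2, and repeatability is 1.96 x √2 x the within-subject SD (about 2.77 times it), so a within-subject SD of 4.0 gives roughly 11. The paired scores below are hypothetical and the function names are illustrative.

```python
from math import sqrt


def within_subject_sd(pairs):
    """Within-subject SD when each subject has exactly two measurements."""
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs)))


def repeatability(sw):
    """Difference that two measurements on the same subject stay within 95% of the time."""
    return 1.96 * sqrt(2) * sw


# Hypothetical (first examiner, second examiner) scores.
pairs = [(100, 104), (95, 91), (110, 118), (88, 88), (102, 98)]
sw = within_subject_sd(pairs)
print(f"Within-subject SD {sw:.2f}, repeatability {repeatability(sw):.2f}")
```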

N = 142 (children examined by both Satcher and Richmond) Mean Difference = 0.49 (95% CI – 1.38) 95% Limits of Agreement: –

N = 142 (children examined by both Satcher and Richmond) Mean Difference = 0.49 (95% CI – 1.38)
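The quantities on a Bland-Altman plot can be computed from the paired differences. This is a rough sketch under stated assumptions: the pairs are hypothetical, `bland_altman` is an illustrative name, and 1.96 is used for both intervals, which is only appropriate for reasonably large samples such as the 142 children here.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(pairs):
    """Mean difference, its 95% CI, and the 95% limits of agreement."""
    diffs = [a - b for a, b in pairs]
    d_bar = mean(diffs)
    sd = stdev(diffs)            # SD of the paired differences
    se = sd / sqrt(len(diffs))   # standard error of the mean difference
    return {
        "mean_difference": d_bar,
        "ci_mean_difference": (d_bar - 1.96 * se, d_bar + 1.96 * se),
        "limits_of_agreement": (d_bar - 1.96 * sd, d_bar + 1.96 * sd),
    }


# Hypothetical (examiner A, examiner B) scores for the same children.
example_pairs = [(100, 104), (95, 91), (110, 118), (88, 88), (102, 98)]
agreement = bland_altman(example_pairs)
print(agreement)
```

The limits of agreement describe individual differences and are always wider than the confidence interval for the mean difference, which describes the average bias between examiners.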

Outline DONE Assignment 3 Review Loose Ends: Yes/No Fields, BLOBs, Field Names, Front Ends HIPAA Privacy Rule, CFR 21 Part 11 Web-based Data Entry Assignment 4

Loose Ends Yes/No Fields BLOBs Field Names “Front End” vs. “Back End”

Yes/No fields Binary fields are not very useful, because you can’t distinguish “No” from blank (not valued). I create a combo box like we used for Race in Lab 1 with 0 for “No” and 1 for “Yes”. This allows blank.
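The same idea carries over to any back end that allows a column to be empty. A minimal sketch using SQLite (not Access), with hypothetical table and field names: the field stores 0 for "No" and 1 for "Yes", and a missing value stays NULL so that it can be told apart from "No".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Exam (
        ExamID   INTEGER PRIMARY KEY,
        Jaundice INTEGER CHECK (Jaundice IN (0, 1))  -- NULL means not valued
    )
    """
)
# One "Yes", one "No", and one blank.
conn.executemany("INSERT INTO Exam (Jaundice) VALUES (?)", [(1,), (0,), (None,)])

counts = dict(
    conn.execute(
        "SELECT COALESCE(CAST(Jaundice AS TEXT), 'blank'), COUNT(*) "
        "FROM Exam GROUP BY Jaundice"
    ).fetchall()
)
print(counts)
```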

Demonstration (BLOB) Memo fields in the Infant Jaundice Database Word Document Fields on the “Class” form of the ATCR Student Database Photograph fields in the ATCR Student Database Jpegs in Simon Knops’s Syndesmosis Database Field types are not limited to numbers, text, dates. You can put an “object”, such as a Word document or a photo, in a field

Field Names Establish and follow naming conventions for columns and tables. Short field names without spaces or underscores are convenient for programming, querying, and other manipulations. Instead of spaces or underscores, use “IntraCaps” (upper case letters within the variable name) to distinguish words, e.g. “SubjectID”, “FName”, or “ExamDate”. Table names should be singular, e.g. “Subject” instead of “Subjects”, “Exam” instead of “Exams”.

“Front End” vs. “Back End”: “Back End” – tables and data. “Front End” – forms and reports for entering and viewing the data. The Access database that you have been using combines “back end” (tables and relationships) with “front end” (forms and reports).* QuesGen uses MySQL for the back end. *Even if both are in Access, you usually want to split the front end from the back end.

Start with Data Tables or Data Collection Forms? It doesn’t matter as long as the process is iterative. Can start with the tables and then develop the forms, test the forms, find problems, and update the tables. Can start with a word-processed form, create the tables, test, and update.* *This seems to work better for most investigators

Sometimes it helps to start with the data collection forms, but remember, you do NOT need one table per data collection form. In the labs you learned that one form can combine data from several tables. And data from one table can appear on several forms.

Before seeking help with data management Search the internet and ask other researchers for already developed data collection forms. Draft your data collection form. Test your data collection form with dummy subjects and, even better, with real (de-identified) study subjects. Enter your test data into a data table with rows corresponding to subjects and columns corresponding to data elements. (Use Excel, Access, Stata, or even Word.) Create or at least think about a data dictionary. Decide who will collect the data, and when/how the data will be collected.
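A data dictionary does not need special software; even a simple structure that records each field's name, type, description, and allowed range is a useful start. The sketch below is one possible layout, not a standard: the field names follow the naming style used in this course, but the specific fields, ranges, and the `check_row` helper are assumptions made for illustration.

```python
DATA_DICTIONARY = [
    {"name": "SubjectID", "type": int, "required": True,
     "range": (1, None), "description": "Arbitrary subject identifier"},
    {"name": "ExScore", "type": int, "required": False,
     "range": (40, 160), "description": "Five-year IQ score"},
]


def check_row(row, dictionary=DATA_DICTIONARY):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    for field in dictionary:
        name = field["name"]
        value = row.get(name)
        if value is None:
            if field["required"]:
                problems.append(f"{name}: missing")
            continue
        if not isinstance(value, field["type"]):
            problems.append(f"{name}: expected {field['type'].__name__}")
            continue
        low, high = field["range"]
        if (low is not None and value < low) or (high is not None and value > high):
            problems.append(f"{name}: {value} outside allowed range")
    return problems


print(check_row({"SubjectID": 1, "ExScore": 400}))
```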

Common Sequence: Develop data collection forms in Word. Create Excel spreadsheets to store the data (one column per field/attribute, one row per record/entity). Move from Excel to Access because of need for one or more of: –data entry forms (front end), –multiple related tables, –queries using the Access query design tool. Move from Access to QuesGen or REDCap because of need for web-based data entry, hosting, auditing, richer user administration and security, but continue to use Access for querying of data extracts to filter, sort, format, and generate derived fields. Export to Stata for analysis.

What Have You Learned? The meaning and importance of the terms “normalization”, “primary key”, and “foreign key”. The difference between a flat-file database, and a normalized, multi-table relational database. A little bit of Microsoft Access Querying data Exporting data for analysis in a statistical package Field types “Front End” (forms) vs. “Back End” (tables)

HIPAA Privacy Rule: Patient identifying information must be secure and available only to authorized personnel, with auditing of all accesses. Patient identifying data include dates such as date of visit, date of surgery, etc.

Name; Address (all geographic subdivisions smaller than state); All elements (except years) of dates related to an individual (birth date, admission date, date of death, and exact age if over 89); Telephone numbers; Fax number; E-mail address; Social Security number; Medical record number; Health plan beneficiary number; Account number; Certificate/license number; Any vehicle or other device serial number; Device identifiers or serial numbers; Web URL; Internet Protocol (IP) address numbers; Finger or voice prints; Photographic images.
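The date and age items in the list above can be illustrated with a short sketch. It handles only those two items (keeping the year of a date, and not reporting exact ages over 89) and is not a complete de-identification procedure. The function names, and the choice to report 90 for anyone aged 90 or older, are assumptions made for illustration.

```python
from datetime import date


def year_only(d):
    """Keep only the year of a date related to an individual."""
    return d.year


def capped_age(birth_date, on_date):
    """Age in whole years, with 90 standing for '90 or older'."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    age = on_date.year - birth_date.year - (0 if had_birthday else 1)
    return min(age, 90)


print(year_only(date(2010, 1, 27)))
print(capped_age(date(1915, 6, 1), date(2011, 2, 1)))
```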

Wednesday, January 27, 2010 (SF Chronicle) UCSF patient records possibly compromised Victoria Colliver, Chronicle Staff Writer (01-27) 16:01 PST SAN FRANCISCO -- Medical records for about 4,400 UCSF patients are at risk after thieves stole a laptop from a medical school employee in November, UCSF officials said today. bin/article.cgi?file=/c/a/2010/01/27/BA1U1BOI6U.DTL This problem can be avoided just by using a Remote Desktop as we do in this class.

CFR 21 Part 11: Required for submission of electronic data to the FDA when applying for drug or device approval. Requires an audit trail of all data entries, updates, and deletions.
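An audit trail of entries, updates, and deletions can be sketched with database triggers. The example below uses SQLite with hypothetical table and field names. It only shows the mechanism and is not a compliant implementation: among other things, a real audit trail also records who made each change, which this sketch omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Exam (ExamID INTEGER PRIMARY KEY, ExScore INTEGER);
    CREATE TABLE AuditLog (
        AuditID   INTEGER PRIMARY KEY,
        ChangedAt TEXT DEFAULT CURRENT_TIMESTAMP,
        Action    TEXT,
        ExamID    INTEGER,
        OldScore  INTEGER,
        NewScore  INTEGER
    );
    CREATE TRIGGER ExamInsert AFTER INSERT ON Exam BEGIN
        INSERT INTO AuditLog (Action, ExamID, NewScore)
        VALUES ('insert', NEW.ExamID, NEW.ExScore);
    END;
    CREATE TRIGGER ExamUpdate AFTER UPDATE ON Exam BEGIN
        INSERT INTO AuditLog (Action, ExamID, OldScore, NewScore)
        VALUES ('update', OLD.ExamID, OLD.ExScore, NEW.ExScore);
    END;
    CREATE TRIGGER ExamDelete AFTER DELETE ON Exam BEGIN
        INSERT INTO AuditLog (Action, ExamID, OldScore)
        VALUES ('delete', OLD.ExamID, OLD.ExScore);
    END;
    """
)

# Enter, correct, and delete one record; each step leaves an audit row.
conn.execute("INSERT INTO Exam (ExamID, ExScore) VALUES (1, 100)")
conn.execute("UPDATE Exam SET ExScore = 105 WHERE ExamID = 1")
conn.execute("DELETE FROM Exam WHERE ExamID = 1")

audit = conn.execute(
    "SELECT Action, OldScore, NewScore FROM AuditLog ORDER BY AuditID"
).fetchall()
print(audit)
```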

Three Types of Research Database
1. Combination of paper files, Excel spreadsheets, and direct keyboard entry into the statistical analysis package.
2. Desktop multi-table relational database.
   --Access
   --Filemaker Pro
3. Web-Enabled Research Platform.
   --QuesGen (private vendor)
   --REDCap (academic consortium)
   --SurveyMonkey (private)

Web-Enabled Research Platform Browser based entry from anyplace with an internet connection. Enterprise database back end Available as a hosted service

Web-based Data Collection Platforms Vendor Hosted –QuesGen –SurveyMonkey –Medrio Institution Hosted –REDCap –Velos –LabMatrix –OpenClinica Not Discussed Here –Phase Forward –Oracle Clinical

Advantages of Being Web-Based Available anywhere with an internet connection No software requirement beyond a browser Easy to share data No PHI on laptops or USB drives

Disadvantages of Being Web-based Limited look-and-feel options on forms (In contrast, Access forms are highly customizable.) Limited data structures Requires an internet connection

Advantages of Being Hosted No need for servers, system administrators, etc.

QuesGen Demo Enter Robert’s data. (Delete record first if necessary.) Delete and restore Helen’s record Select extract “AvgScore” Use training.studydata.net, Jif02. Run the NIH Report.

Advantages of QuesGen Multiple user roles (DB admin, team member, view-only, site-specific) PHI fields explicitly identified (masked from user without PHI privileges) UCSF IT reviewed New functionality for institutional review Templates for clinical research (medication, lab sample, etc) and systematic reviews (publication) Survey/Questionnaires with skip logic Extensive auditing Supports complex data structures Good user/client support

Disadvantages of QuesGen Not Free.

REDCap Demo: Infant Jaundice Copy 1. Delete Robert. Re-enter Robert. Run export. Then, in Stata:
    by subjectid, sort: egen avgscore = mean(exscore)
    keep if redcap_event_name == "Exam1"
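For readers who do not use Stata, the two commands in this demo (average each subject's exam scores, then keep one row per subject from the "Exam1" event) can be approximated in Python. This is a sketch under assumptions: the column names come from the slide, but the rows are invented, and a real export would be read from a file rather than from an in-memory string.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented stand-in for an exported file with one row per subject per event.
export = io.StringIO(
    "subjectid,redcap_event_name,exscore\n"
    "1,Exam1,100\n"
    "1,Exam2,110\n"
    "2,Exam1,90\n"
    "2,Exam2,\n"
)
rows = list(csv.DictReader(export))

# Collect each subject's non-missing scores (missing scores are skipped).
scores = defaultdict(list)
for row in rows:
    if row["exscore"] != "":
        scores[row["subjectid"]].append(float(row["exscore"]))

# Keep one row per subject (the Exam1 row) with the subject's average score.
result = [
    {
        "subjectid": row["subjectid"],
        "avgscore": mean(scores[row["subjectid"]]) if scores[row["subjectid"]] else None,
    }
    for row in rows
    if row["redcap_event_name"] == "Exam1"
]
print(result)
```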

Advantages of REDCap Multiple user roles PHI fields explicitly identified Provided by UCSF Templates for clinical research Survey/Questionnaires with skip logic Extensive auditing Free!

Disadvantages of REDCap* No subject or exam list Supports limited data structures (nearly flat file) Flawed data import/export tool. No filtering or reporting. Minimal querying. User/client support?

Data Management Protocol General description of database Data collection and entry Error checking and data validation Analysis (e.g., export to Stata) Security/confidentiality Back up

General Description of Database DBMS, e.g. MS Access XP # of dynamic tables # of static “lookup” tables # of forms # of reports An appendix could include the relationships diagram, the table names and descriptions, and the field names and descriptions (data dictionary). Print relationships diagram using either “Print Relationships” or taking a screen shot.

Data Collection and Entry Import baseline data from existing systems Import lab results, scan results (e.g. DEXA), holter monitor data, and other digital data. For each form, who will collect the data? Collect onto paper forms and then transcribe? Enter directly using screen forms? Scannable forms?

Error Checking and Validation: Database automatically checks data against the range of allowed values. Periodic outlier detection. (Outliers are values still within the range of allowed values.) Calculation checks. Is double data entry really needed?
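The difference between a range check and a periodic outlier query can be sketched as follows. The allowed range, the cutoff of 3 standard deviations, the scores, and the name `flag_outliers` are all assumptions chosen for illustration; a real study would choose its own rules.

```python
from statistics import mean, stdev


def flag_outliers(values, allowed=(40, 160), z_cutoff=3.0):
    """Split problem values into out-of-range values and in-range outliers."""
    low, high = allowed
    out_of_range = [v for v in values if not low <= v <= high]
    in_range = [v for v in values if low <= v <= high]
    if len(in_range) < 2:
        return out_of_range, []
    m, s = mean(in_range), stdev(in_range)
    outliers = [v for v in in_range if s > 0 and abs(v - m) > z_cutoff * s]
    return out_of_range, outliers


# Invented scores: 400 fails the range check; 155 is allowed but unusual.
scores = [98, 102, 100, 97, 103, 101, 99, 100, 96, 104,
          100, 102, 98, 101, 99, 100, 103, 97, 100, 155, 400]
out_of_range, outliers = flag_outliers(scores)
print(out_of_range, outliers)
```

The range check is something the database can enforce at entry time; the outlier query is something to run periodically and review by hand, since an unusual value may still be correct.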

Analysis How will you get the data out of the database?

Security/Confidentiality Keep identifying data (name, SSN, MRN) in a separate table. Link rest of DB to this table via a Subject ID that has no meaning external to the DB. Restrict access to identifying data. Password protect at both OS and application levels. Audit entries and updates.
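The separate-table arrangement described above can be sketched in SQLite with hypothetical table and field names. Identifying data live in one table; everything else links to it through a SubjectID that means nothing outside the database, so an analysis extract can be produced without touching the identifiers. Restricting access to the identifier table (for example, by keeping it in a separate, protected file) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SubjectIdentifier (
        SubjectID INTEGER PRIMARY KEY,  -- arbitrary; no meaning outside the DB
        FName     TEXT,
        LName     TEXT,
        MRN       TEXT
    );
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES SubjectIdentifier (SubjectID),
        ExamDate  TEXT,
        ExScore   INTEGER
    );
    INSERT INTO SubjectIdentifier VALUES (1, 'Test', 'Subject', '0000000');
    INSERT INTO Exam VALUES (1, 1, '2011-02-01', 100);
    """
)

# The analysis extract uses only the meaningless SubjectID and the score;
# it leaves out names, MRN, and the exam date.
analysis_rows = conn.execute("SELECT SubjectID, ExScore FROM Exam").fetchall()
print(analysis_rows)
```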

Back ups Ask your system person to restore a file periodically. This tests both the back-up and restore systems.

Assignment 4, Part A: Send in your study database. De-identify it first. If you can’t send it, arrange a demo. Prefer real (de-identified) data, but realistic test records are okay. If doing secondary analysis, mock up a database for a follow-up study.

Assignment 4, Part B Data Management Protocol One-page data management section for your research study protocol or description of your current research study database. Briefly describe your study, including design, predictors, outcomes, target population, and sample size. (1 or 2 sentences) If you are doing secondary data analysis, you still need to submit the data collection forms. Send assignment to by 3/8/2011.

Assignment 4 1) What is your study? ("The [CUTE ACRONYM] study is a [DESIGN] study of the associations between [PREDICTOR] and [OUTCOME] in [STUDY POPULATION]"). 2) What data points are you collecting? (Helps to have an actual data collection form mocked up in Word or Access.) 3) Who will collect the data? You? RAs? MDs? Maybe the study subjects will enter the data themselves.

Assignment 4 (cont’d) 4) How will the data be collected? Written onto a paper form and then transcribed into a computer file? Entered directly into the computer? (If it's going to be transcribed, will you be doing that? Will you hire somebody? Or will you enlist some med students?) 5) Will the above-mentioned computer file be an Excel file, Stata file, Access file, or something else? 6) If it's a single table database (e.g., Excel or Stata), what will the rows represent, what will the columns be? Try to provide a detailed data dictionary with the name, data type, description, and validation rules for each field (column) in the single table.

Assignment 4 7) If it's a multi-table database, even a hand-drawn relationships diagram would help but is not required. 8) How will you validate the data for correctness and monitor the data collection effort? (Usually you have some range checks on individual variables and you periodically query for outliers that are nonetheless within the allowed range.) 9) You should periodically analyze the data, not only to look for problems, but also to see where the study is headed. How will you do this? Query in Access and export to Stata? 10) How will you protect your subjects' identifying data? 11) How will you ensure that you don't lose your data file in a computer crash or if a water pipe leaks?

Answering these questions is an essential part of doing a clinical research study.