Planning and Budgeting for Data Management in a Clinical Research Study Michael A. Kohn, MD, MPP 30 January 2007.

Outline
- Assignment 3 Review
- Guidelines for Research Databases
- Loose Ends: BLOBs, Front Ends, Forms
- Planning and Budgeting for Data Management in a Research Project
- Assignment 4

Housekeeping
- Database demos with advice for Assignment 4, Tuesday 2/6:
  - Kathy Yang
  - Andrew High
  - Mike Jarrett
- Assignment 4 is due 2/12.

Assignment 3
Lab 3: Exporting and Analyzing Data (1/23/2007)
Determine whether neonatal jaundice was associated with the 5-year neuropsychological scores, and create a table, figure, or paragraph appropriate for the “Results” section of a manuscript summarizing the association.
Extra Credit: Write a sentence or two for the “Methods” or “Results” section on inter-rater reliability. (Use Bland and Altman, BMJ 1996; 313:744.)

Answer
Of the infants with neonatal jaundice, 149 had neuropsychological exams at age 5; of the infants without neonatal jaundice, 248 did. The mean (±SD) neuropsychological score was significantly higher in the jaundice group, 111.5 (±21.1), than in the no-jaundice group, 101.4 (±20.5); difference 10.1 (95% CI 5.9–14.4).

Table. Mean Five-Year Neuropsychiatric Scores for Infants With and Without Neonatal Jaundice
Group          N     Mean (SD)*
Jaundice       149   111.5 (21.1)
No Jaundice    248   101.4 (20.5)
*Difference in mean scores of 10.1 (95% CI 5.9–14.4)

Table. Mean Five-Year Neuropsychological Scores for Infants Without and With Neonatal Jaundice
             No Jaundice    Jaundice       Difference (95% CI)
N            248            149
Mean (SD)    101.4 (20.5)   111.5 (21.1)   10.1 (5.9–14.4)*
*p<

Table 3: WPPSI-R and VMI-4 scores in both groups
Columns: unadjusted mean (controls); N (controls); unadjusted mean (TSB ≥ 25 mg/dL); N (TSB ≥ 25 mg/dL); adjusted* difference; 95% CI (adjusted difference); P (adjusted difference)
Rows: WPPSI-R Verbal IQ; WPPSI-R Performance IQ; WPPSI-R Full Scale IQ; VMI-4 Visual-Motor Integration; VMI-4 Visual Perception; VMI-4 Motor Coordination
*Multiple linear regression. Models varied; most included paternal race and education. Covariates included in each model are available upon request.
From Newman TB, et al. 5-Year Outcome of Newborns with Total Serum Bilirubin Levels of 25 mg/dL or More. In review by NEJM. (Not accepted. Not in press.)

Newman T et al. N Engl J Med 2006;354:

Results of Testing with the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R) Test and the Beery-Buktenica Developmental Test of Visual-Motor Integration, 4th edition (VMI-4)

   Group | Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
      No |
     Yes |
combined |
    diff |
Degrees of freedom: 395
Ho: mean(No) - mean(Yes) = diff = 0
Ha: diff < 0          Ha: diff != 0          Ha: diff > 0
t =                   t =                    t =
P < t =               P > |t| =              P > t =
Would you submit this for publication?

Essential Elements
- Sample size (149 jaundiced, 248 non-jaundiced)
- Indication of effect size (report both means, or the difference between them)
- Get the direction of effect right (the jaundiced group did better!)
- Indication of variability (sample SDs, SEs of the means, CIs of the means, or the CI of the difference between means)

Browner on Figures Figures should have a minimum of four data points. A figure that shows that the rate of colon cancer is higher in men than in women, or that diabetes is more common in Hispanics than in whites or blacks, [or that jaundiced babies had higher IQs at age 5 years than non-jaundiced babies,] is not worth the ink required to print it. Use text instead. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 90

Axis cut off at 50? Caption should be below the figure. What are the error bars? Misspelled label: “Neuopsychiatric”

Axis cut off at 60? Caption should be below the figure.

Browner on 3-D Figures
Three-dimensional graphs usually are not helpful. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 97
Also note that the 3-D is only a visual effect: the data are two-dimensional (score by jaundice status).

Takes the prize for ugliest figure.

Caption not sufficiently explanatory.

Figure 1: In 149 infants with neonatal jaundice, the average IQ scores were higher compared to the 248 non-jaundiced infants when evaluated at age 5 (p<0.0001).

Box Plot
- Median: line inside the box
- Box: extends from the 25th to the 75th percentile
- Whiskers: extend to the upper and lower adjacent values
- Upper adjacent value: largest observation ≤ 75th percentile + 1.5 × IQR (interquartile range); lower adjacent value: smallest observation ≥ 25th percentile − 1.5 × IQR
- Values outside the adjacent values are graphed individually
It would be nice if the area (or at least the width) of the box were proportional to the sample size (N). In some box plots the width of the box is proportional to log N, but not in Stata.
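The box plot rules can be checked numerically. Below is a minimal Python sketch. It assumes Tukey's 1.5 × IQR rule and takes quartiles from Python's statistics.quantiles with method="inclusive". Stata computes percentiles its own way, so its quartiles, and occasionally the set of points plotted individually, may differ slightly. The function name box_plot_stats is illustrative and does not come from any package.

```python
import statistics

def box_plot_stats(values):
    """Summarize data the way a Tukey-style box plot draws it."""
    data = sorted(values)
    # Quartiles by linear interpolation (an assumption; Stata may differ slightly).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    # Adjacent values: the most extreme observations that lie inside the fences.
    upper_adjacent = max(x for x in data if x <= upper_fence)
    lower_adjacent = min(x for x in data if x >= lower_fence)
    # Observations beyond the adjacent values are graphed individually.
    outliers = [x for x in data if x < lower_adjacent or x > upper_adjacent]
    return {
        "median": median, "q1": q1, "q3": q3, "iqr": iqr,
        "lower_adjacent": lower_adjacent, "upper_adjacent": upper_adjacent,
        "outliers": outliers,
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["median"], stats["upper_adjacent"], stats["outliers"])  # 5.5 9 [100]
```

The whisker stops at 9, the largest value inside the upper fence (7.75 + 1.5 × 4.5 = 14.5). The value 100 is drawn as a separate point.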

Extra Credit
- Report the within-subject SD (4.0) as a measure of reliability.
- Calculate the repeatability (11.0).
- Bland-Altman plot with mean difference and 95% limits of agreement*
*Nobody did this.

We assessed inter-rater reliability of the neuropsychological test scores by having different examiners re-test 198 of the children. The within-subject standard deviation was 4.0, so the “repeatability” was 11.0, meaning that two examiners of the same subject would score within 11 points of each other 95 percent of the time. (Bland and Altman, BMJ 1996; 313:744)
Methods or Results?
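The within-subject SD and repeatability in this sentence follow Bland and Altman's definitions. With two measurements per subject, each subject's variance is d²/2, where d is the difference between the two scores. The within-subject SD (s_w) is the square root of the average of those variances. Repeatability is 1.96 × √2 × s_w ≈ 2.77 × s_w, so 2.77 × 4.0 ≈ 11. The sketch below assumes exactly two scores per child; the function names and the example pairs are hypothetical.

```python
import math

def within_subject_sd(pairs):
    """Within-subject SD from two measurements (e.g., two examiners) per subject."""
    # With two measurements, each subject's sample variance is (a - b)**2 / 2;
    # the within-subject variance is the mean of these across subjects.
    variances = [(a - b) ** 2 / 2 for a, b in pairs]
    return math.sqrt(sum(variances) / len(variances))

def repeatability(sw):
    """Difference that two measurements on one subject stay within 95% of the time."""
    return 1.96 * math.sqrt(2) * sw   # roughly 2.77 * sw

# Hypothetical re-test pairs: (first examiner, second examiner).
pairs = [(104, 100), (95, 99), (110, 107), (88, 93)]
sw = within_subject_sd(pairs)
print(round(sw, 2), round(repeatability(sw), 1))
```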

N = 142 (children examined by both Satcher and Richmond)
Mean Difference = 0.49 (95% CI – 1.38)
95% Limits of Agreement: –
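The summary numbers for a Bland-Altman analysis can be computed as follows. This sketch is not the analysis behind the figures above. It assumes paired scores from two examiners (the data are hypothetical). It uses 1.96 for both the confidence interval of the mean difference and the limits of agreement (mean difference ± 1.96 SD of the differences). The plot itself, difference against the mean of the two scores, is left to a graphics package.

```python
import statistics

def bland_altman(scores_a, scores_b):
    """Mean difference (with 95% CI) and 95% limits of agreement for paired scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    # 1.96 is the large-sample normal multiplier; for small n a t quantile is more exact.
    se_mean = sd_diff / n ** 0.5
    ci = (mean_diff - 1.96 * se_mean, mean_diff + 1.96 * se_mean)
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return mean_diff, ci, limits

# Hypothetical scores for the same children from two examiners.
examiner_1 = [102, 95, 118, 87, 110, 99]
examiner_2 = [100, 97, 115, 88, 108, 98]
mean_diff, ci, limits = bland_altman(examiner_1, examiner_2)
print(f"Mean difference {mean_diff:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
print(f"95% limits of agreement {limits[0]:.2f} to {limits[1]:.2f}")
```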

What Have You Learned?
- The meaning and importance of the terms “normalization”, “primary key”, and “foreign key”
- The difference between a flat-file database and a normalized, multi-table relational database
- A little bit of Microsoft Access 2000
- Querying data
- Exporting data for analysis in a statistical package

Guidelines for Data Management in Clinical Research 1. Establish the database tables, their rows and columns, and their relationships correctly at the outset. A poorly organized database makes data maintenance and retrieval nearly impossible. Make sure the data are normalized. Try to avoid duplicate data entry or redundant storage. Sometimes it helps to start with the data collection forms, but remember, you do NOT need one table per data collection form. In the labs you learned that one form can combine data from several tables. And data from one table can appear on several forms.
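To make normalization concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Access. The Baby and Exam tables and their columns are hypothetical. Baby data are stored once, and each exam row points back to its baby through a foreign key. A query puts the pieces back together, so there is no need for a separate table per form.

```python
import sqlite3

# Hypothetical two-table design: one row per baby, one row per exam.
# Several exams (and several forms) can draw on the same Baby row.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Baby (
    BabyID    INTEGER PRIMARY KEY,                        -- primary key
    BirthDate TEXT    NOT NULL,
    Jaundice  INTEGER NOT NULL CHECK (Jaundice IN (0, 1))
);
CREATE TABLE Exam (
    ExamID   INTEGER PRIMARY KEY,
    BabyID   INTEGER NOT NULL REFERENCES Baby(BabyID),    -- foreign key
    ExamDate TEXT    NOT NULL,
    Examiner TEXT,
    Score    REAL
);
""")
conn.execute("INSERT INTO Baby VALUES (1, '2001-03-14', 1)")
conn.executemany(
    "INSERT INTO Exam (BabyID, ExamDate, Examiner, Score) VALUES (?, ?, ?, ?)",
    [(1, "2006-03-20", "A", 112), (1, "2006-04-02", "B", 109)],
)

# Referential integrity: an exam for a baby who is not in the Baby table is refused.
try:
    conn.execute("INSERT INTO Exam (BabyID, ExamDate) VALUES (99, '2006-05-01')")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)

# A join recombines the tables, as a form or report would; birth data are stored once.
rows = conn.execute("""
    SELECT Baby.BabyID, Baby.BirthDate, Exam.ExamDate, Exam.Score
    FROM Baby JOIN Exam ON Exam.BabyID = Baby.BabyID
    ORDER BY Exam.ExamDate
""").fetchall()
for row in rows:
    print(row)
```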

Start with Data Tables or Data Collection Forms? It doesn’t matter as long as the process is iterative. Can start with the tables and then develop the forms, test the forms, find problems, and update the tables. Can start with a word-processed form, create the tables, test, and update.

Guidelines for Data Management in Clinical Research 2. Establish and follow naming conventions for columns and tables. Short field names without spaces or underscores are convenient for programming, querying, and other manipulations. Instead of spaces or underscores, use “IntraCaps” (upper case letters within the variable name) to distinguish words, e.g. “SubjectID”, “FName”, or “ExamDate”. Table names should be singular, e.g. “Baby” instead of “Babies”, “Exam” instead of “Exams”.
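A naming convention is easiest to follow when it is checked mechanically. The sketch below assumes that IntraCaps means a leading capital letter followed only by letters and digits. The plural test is a deliberately crude heuristic, not a real grammar check. Both function names are hypothetical.

```python
import re

# IntraCaps: begins with a capital letter, then only letters and digits
# (no spaces, no underscores), e.g. SubjectID, FName, ExamDate.
INTRACAPS = re.compile(r"[A-Z][A-Za-z0-9]*")

def bad_field_names(names):
    """Return the names that do not follow the IntraCaps convention."""
    return [name for name in names if not INTRACAPS.fullmatch(name)]

def looks_plural(table_name):
    """Crude check: flags names ending in 's' (but not 'ss').

    Expect false alarms on singular words such as 'Status'; review flagged names by hand.
    """
    return table_name.endswith("s") and not table_name.endswith("ss")

print(bad_field_names(["SubjectID", "FName", "ExamDate", "exam date", "Exam_Date"]))
# ['exam date', 'Exam_Date']
print([t for t in ["Baby", "Babies", "Exam", "Exams"] if looks_plural(t)])
# ['Babies', 'Exams']
```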

Guidelines for Data Management in Clinical Research 3. Obtain baseline demographic and clinical information about members of the study population from existing computer databases. Avoid re-entering data which are already available (in digital formats) from other sources. In the JIFee Study, the patient demographic data and contact information are obtained from the hospital database. Computer systems can almost always produce text-delimited or fixed-column-width character files that the database management system can import.
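Importing a delimited extract usually takes only a few lines. In this sketch the tab-delimited layout, the column names, and the Patient table are hypothetical, and SQLite stands in for the study database. An in-memory string replaces the file so that the example runs as written.

```python
import csv
import io
import sqlite3

# Stand-in for a tab-delimited extract from a hospital system (hypothetical columns).
# In practice: open("extract.txt", newline="") instead of io.StringIO(...).
extract = io.StringIO(
    "MRN\tLastName\tFirstName\tBirthDate\n"
    "00123\tSmith\tAna\t2001-03-14\n"
    "00456\tLee\tSam\t2001-04-02\n"
)

conn = sqlite3.connect(":memory:")
# MRN is stored as TEXT so leading zeros survive the import.
conn.execute(
    "CREATE TABLE Patient (MRN TEXT PRIMARY KEY, LName TEXT, FName TEXT, BirthDate TEXT)"
)

reader = csv.DictReader(extract, delimiter="\t")
conn.executemany(
    "INSERT INTO Patient (MRN, LName, FName, BirthDate) VALUES (?, ?, ?, ?)",
    ((r["MRN"], r["LastName"], r["FirstName"], r["BirthDate"]) for r in reader),
)
conn.commit()
print(conn.execute("SELECT * FROM Patient ORDER BY MRN").fetchall())
```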

Guidelines for Data Management in Clinical Research 4. Minimize the extent to which study measurements are recorded on paper forms. Enter data directly into the computer database or move data from paper forms into the computer database as close to the data collection time as possible. When you define a variable in a computer database, you specify both its format and its domain or range of allowed values. Using these format and domain specifications, computer data entry forms give immediate feedback about improper formats and values that are out of range. The best time to receive this feedback is when the study subject is still on site.
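The format and domain checks that an entry form performs can be sketched in a few lines. This is not how Access implements validation rules; it only illustrates the idea. The field names, types, and ranges, including the newborn weight range, are hypothetical.

```python
from datetime import date

# Hypothetical specifications: each field has a format (type) and a domain.
FIELD_SPECS = {
    "Weight":   {"type": float, "min": 0.3, "max": 6.0},   # kg; illustrative newborn range
    "ExamDate": {"type": date, "min": date(2000, 1, 1), "max": date.today()},
    "Sex":      {"type": str, "allowed": {"M", "F"}},
}

def validate(record):
    """Return one message per field that is missing, mis-formatted, or out of range."""
    errors = []
    for field, spec in FIELD_SPECS.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not isinstance(value, spec["type"]):
            errors.append(f"{field}: expected {spec['type'].__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            errors.append(f"{field}: {value!r} is not an allowed value")
        elif "min" in spec and not spec["min"] <= value <= spec["max"]:
            errors.append(f"{field}: {value} is outside {spec['min']} to {spec['max']}")
    return errors

print(validate({"Weight": 32.0, "ExamDate": date(2006, 5, 1), "Sex": "F"}))
# ['Weight: 32.0 is outside 0.3 to 6.0']
```

Run at the moment of entry, a check like this catches the misplaced decimal point while the subject is still on site.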

On-screen vs. paper forms: You can always print out a paper copy of the screen form or a report of the exam/interview results once the data are collected. Examples: an ATM's printed transaction record, a gas station's printed receipt.

Guidelines for Data Management in Clinical Research 5. Follow standard data entry conventions. Several conventions for data entry and display have developed over time. Although most users of screen forms are not aware of these conventions, they have come to expect them subconsciously. For example, a series of mutually exclusive, collectively exhaustive choices is usually displayed as an “option group” consisting of several different “radio buttons”, whereas choices which are not mutually exclusive are displayed as check boxes. N.B. An “option group” of mutually exclusive choices is a single column or field. A group of N check boxes represents N yes/no fields.

Use check boxes when options are not mutually exclusive. (5 fields) Use radio buttons when options are mutually exclusive. (1 field) Computer chart abstraction form showing two common data entry conventions.
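The “one field versus N fields” rule shows up directly in the table design. This hypothetical chart-abstraction table uses SQLite via Python's sqlite3. The radio-button question is a single column restricted to its choices, and each check box gets its own yes/no column. Column names and choices are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical chart-abstraction table.
conn.execute("""
CREATE TABLE ChartAbstract (
    ChartID     INTEGER PRIMARY KEY,
    -- Radio buttons (mutually exclusive): ONE field that holds a single choice.
    Disposition TEXT CHECK (Disposition IN ('Home', 'Admitted', 'Transferred', 'Died')),
    -- Check boxes (not mutually exclusive): one yes/no field per box.
    SxFever     INTEGER NOT NULL DEFAULT 0 CHECK (SxFever IN (0, 1)),
    SxCough     INTEGER NOT NULL DEFAULT 0 CHECK (SxCough IN (0, 1)),
    SxRash      INTEGER NOT NULL DEFAULT 0 CHECK (SxRash IN (0, 1))
)
""")
# Fever and rash both checked; a single disposition selected.
conn.execute(
    "INSERT INTO ChartAbstract (ChartID, Disposition, SxFever, SxRash) VALUES (1, 'Home', 1, 1)"
)

# A value that is not one of the option group's choices is refused.
try:
    conn.execute("INSERT INTO ChartAbstract (ChartID, Disposition) VALUES (2, 'Hospice')")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)

print(conn.execute(
    "SELECT Disposition, SxFever, SxCough, SxRash FROM ChartAbstract"
).fetchall())   # [('Home', 1, 0, 1)]
```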

On-Screen Data Collection Forms Will return to this at the end of lecture, time permitting.

Guidelines for Data Management in Clinical Research 6. Back up the database regularly and check the adequacy of the back up procedure by periodically restoring a file from the back up medium.
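Here is one way to automate the restore test, sketched with SQLite as a stand-in. The sketch backs up the database, restores the backup into a scratch database, and compares row counts table by table. Row counts are only a coarse check because they will not catch a changed value. The function name and file names are hypothetical.

```python
import os
import sqlite3
import tempfile

def backup_and_test_restore(db_path, backup_path):
    """Back up db_path, restore the backup into a scratch database, and compare row counts."""
    src = sqlite3.connect(db_path)
    bak = sqlite3.connect(backup_path)
    restored = sqlite3.connect(":memory:")
    try:
        src.backup(bak)          # online backup API (Python 3.7+)
        bak.backup(restored)     # the restore test: copy the backup back out
        tables = [name for (name,) in src.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        for table in tables:
            n_src = src.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            n_restored = restored.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            if n_src != n_restored:
                raise RuntimeError(f"{table}: {n_src} rows in source, {n_restored} after restore")
        return tables
    finally:
        for c in (src, bak, restored):
            c.close()

with tempfile.TemporaryDirectory() as folder:
    db_file = os.path.join(folder, "study.db")
    conn = sqlite3.connect(db_file)
    conn.execute("CREATE TABLE Baby (BabyID INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO Baby VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()
    conn.close()
    checked = backup_and_test_restore(db_file, os.path.join(folder, "study_backup.db"))
    print("Restored and verified:", checked)
```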

Demonstration (BLOB: binary large object)
- Memo fields in the Infant Jaundice Database
- Word document fields on the “Class” form of the ATCR Student Database
- Photograph fields in the ATCR Student Database
Field types are not limited to numbers, text, and dates. You can put an “object”, such as a Word document or a photo, in a field.
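Storing a BLOB amounts to putting raw bytes in a field. In this SQLite sketch the Student table and its columns are hypothetical, and eight bytes (the PNG file signature) stand in for a real photo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, LName TEXT, Photo BLOB)")

# Stand-in for image bytes; a real photo would come from open(path, "rb").read().
photo = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])   # the 8-byte PNG signature
conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (1, "Doe", photo))

stored = conn.execute("SELECT Photo FROM Student WHERE StudentID = 1").fetchone()[0]
print(type(stored).__name__, len(stored))   # bytes 8
```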

“Front End” vs. “Back End”
- “Back end”: tables and data
- “Front end”: forms and reports for entering and viewing the data
- Are queries part of the “front end”?
The Access databases that we will be using combine the “back end” (tables and relationships) with the “front end” (forms, reports, and queries). QuesGen uses MySQL for the “back end” and PHP for the “front end.” The Neuro Clinics database uses MS SQL Server for the “back end” and Visual Basic for the “front end.”

Four Types of Research Database
1. Combination of paper files, Excel spreadsheets, and direct keyboard entry into the statistical analysis package.*
2. Desktop multi-table relational database.**
3. Client-server or “enterprise” multi-table relational database.***
4. Internet database server.***
*Can do yourself. **Might be able to do yourself. ***Definitely need to hire help.

Desktop multi-table relational database**
- All study data in one database (with many tables)
- Proper normalization eliminates redundancy and opportunities for inconsistencies
- Can enforce referential integrity
- Easy to develop on-screen forms for data entry and viewing
- Graphical querying tool
- Report writer
**Might be able to start yourself. Eventually may need to hire help.

Client-Server or “Enterprise” multi-table relational database***
- Richer security model
- Detailed auditing of data entry and revision
- BIG databases
- Transaction-intensive databases
***Need to pay somebody to build it.

Web-Enabled Research Platform***
- Browser-based entry from anywhere with an internet connection
- Enterprise database back end
- Available as a hosted service
***Not a “do-it-yourself” proposition.

Desktop DBMS
- Microsoft Access
- Claris FileMaker Pro
- Paradox
- Microsoft Visual FoxPro
- DataEase
The processing of records is done by the desktop. The server simply stores files (file server).

Client-Server DBMS
- Microsoft SQL Server
- Oracle
- Informix
- Sybase
- MySQL
The processing of records is done by the server. The desktop manages the screen but passes queries on to the server. (Just to confuse things, MS Access can be a client for SQL Server and other enterprise systems. The ultimate in “thin” clients is a browser, e.g., Internet Explorer or Firefox; in this case, the server is an intranet or internet database server.)

File Server vs. Client Server
- File server: the workstation (client machine) does all the “thinking”…
- Client server (SQL Server): the server thinks too!

Suggested Approach
- Build a prototype database in Access using what you learned in this course. (You will describe this prototype in Assignment 4.)
- Pay someone to help you:
  - Turn your prototype Access database into a production system:
    - Split it into front and back ends
    - Make it available via Remote Desktop
  - Upsize to an enterprise database
  - Convert to a web-enabled research platform

Advice on Building a Desktop Multi-Table Relational Database for Your Study
- Build it yourself using what you learned in this class, with occasional help from a database expert.
- Budget $500–$1000 per month out of your grant for database consulting during the design phase.
- Take advantage of your departmental resources.
- Take advantage of campus resources.
- Don't confuse database development with network administration and systems management.

Costs
There is no adequate on-campus resource for database design consulting. (If there were, it would cost $100/hour, just like biostatistical consulting.) Independent database consultants also cost $100+ per hour.
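The rates above translate directly into consulting hours. The arithmetic is trivial, but writing it down makes the budget request explicit. The six-month design phase in the example is an assumption, not a figure from this lecture.

```python
def consulting_hours(budget_dollars, hourly_rate=100.0):
    """Hours of consulting that a budget buys at a given hourly rate."""
    return budget_dollars / hourly_rate

# Monthly range from the advice above ($500 to $1000), at about $100/hour.
low, high = consulting_hours(500), consulting_hours(1000)
print(f"{low:.0f} to {high:.0f} hours per month")    # 5 to 10 hours per month

# Hypothetical 6-month design phase (the length is an assumption).
months = 6
print(f"{low * months:.0f} to {high * months:.0f} hours in {months} months")
# 30 to 60 hours in 6 months
```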

Costs The JIFee Study developed a comprehensive database for study administrative data as well as results. They have a full time project coordinator and spent about $10,000 on database consulting. Total cost of the JIFee Database in time and money was at least $25,000.

Departmental Resources Your department should provide you with a networked desktop computer, as well as network support, server access, and database hosting. However, the departmental computer person will NOT be able to help you with database design or development. System administrators do not and cannot build database management systems.

Campus Resources
- CTSI Research Database Consulting Service:
  - Still doesn't exist
  - Investigators with career development grants will get priority
  - Consulting assistance will cost recharge $$
- Independent consultants
- Other campus resources? Library? PSG?

Investigators Using Data Management Skills from this Course

- Petra Liljestrand (Study Coordinator, JIFee)
- Jessica Zegre (Study Coordinator, Immediate AIM)
- Jon Zaroff (PI, CRASH)
- Jim Quinn (PI, ED Syncope and Dog Bite Studies)
- Candice Wong (PI, Chinese Smoking Cessation Study)
- Rebecca Sudore (Co-Investigator, Advance Directives Study)
- Cari DeLoa (Study Coordinator, MS Genetics)
- Mark Pletcher (PI, Alcohol Withdrawal Study)
- Matthew Riley (PI, Pediatric NAFLD Study)
- Grace Yoon (PI, ATS Study)
- Cathy Lomen-Hoerth (PI, ALS Studies)
- Jay Garg (Co-Investigator, ESRD and Hypertension)
- Irina Garadetskaya (Study Coordinator, CRIC)
- Serge Lindner (PI, Advance Directives)
- Anil Sapru (PI, Pediatric DKA)
- Emily Von Scheven (PI, Pediatric Rheumatology Studies)
- Roberta Keller (PI, Neonatology Studies)
- Carolyn Hoppe (Pediatric Hematology)
- Mark Eisner (PI, Kaiser COPD Study)
- Heidi Flori (PICU Studies, CHO)
- Matthew Reeves (OB/Gyn)
- Lee Zane (PI, Stress and Acne Study)
- Yvonne Wu (Pediatric Neurology)
- Neera Gupta (Pediatric Gastroenterology)
- Dan Raz (Thoracic Surgery)
- Annette Sohn (Pediatric ID)

Data Management Protocol
- General description of database
- Data collection and entry
- Error checking and data validation
- Analysis (e.g., export to Stata)
- Security/confidentiality
- Back up

General Description of Database
- DBMS, e.g., MS Access XP
- # of dynamic tables
- # of static “lookup” tables
- # of forms
- # of reports
An appendix should include the relationships diagram, the table names and descriptions, and the field names and descriptions (data dictionary). Print the relationships diagram using either “Print Relationships” or a screen shot.
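Much of the data dictionary can be generated from the database itself. The sketch below reads SQLite's schema with PRAGMA table_info. Field descriptions still have to be written by hand and are passed in here as a dictionary. Table, field, and description text are hypothetical.

```python
import sqlite3

def data_dictionary(conn, descriptions=None):
    """One row per field: table, field, declared type, NOT NULL, primary key, description."""
    descriptions = descriptions or {}
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, ftype, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            rows.append((table, name, ftype, bool(notnull), bool(pk),
                         descriptions.get((table, name), "")))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Baby (BabyID INTEGER PRIMARY KEY, BirthDate TEXT NOT NULL)")
notes = {("Baby", "BirthDate"): "Date of birth from hospital database"}
for entry in data_dictionary(conn, notes):
    print(entry)
```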

Data Collection and Entry
- Import baseline data from existing systems.
- Import lab results, scan results (e.g., DEXA), Holter monitor data, and other digital data.
- For each form, who will collect the data?
- Collect onto paper forms and then transcribe? Enter directly using screen forms? Scannable forms?

Error Checking and Validation
- Database automatically checks data against the range of allowed values.
- Periodic outlier detection (outliers still within the range of allowed values)
- Calculation checks
- Is double data entry really needed?
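Periodic outlier detection can be as simple as flagging values far from the mean. This sketch uses a z-score threshold of 3 SDs, which is an assumption; the 1.5 × IQR box-plot rule is another common choice. The scores are hypothetical and all lie within a plausible allowed range.

```python
import statistics

def flag_outliers(values, z=3.0):
    """Return (position, value) pairs lying more than z SDs from the mean.

    These values have already passed the allowed-range check, so they are flagged
    for review against the source documents, not rejected automatically.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) > z * sd]

# Hypothetical scores, all inside an allowed range of, say, 40 to 160.
scores = [98, 101, 99, 102, 100] * 4 + [150]
print(flag_outliers(scores))   # [(20, 150)]
```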

Analysis How will you get the data out of the database?
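One common answer is to run a query that flattens the tables into one row per observation and write the result as comma-delimited text, which statistical packages such as Stata can import. In this sketch the tables, columns, and output file name are hypothetical, and an in-memory buffer stands in for the file.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Baby (BabyID INTEGER PRIMARY KEY, Jaundice INTEGER);
CREATE TABLE Exam (ExamID INTEGER PRIMARY KEY, BabyID INTEGER, Score REAL);
INSERT INTO Baby VALUES (1, 1), (2, 0);
INSERT INTO Exam VALUES (10, 1, 112), (11, 2, 98);
""")

# Flatten the tables into one row per exam (here, one exam per baby).
cur = conn.execute("""
    SELECT Baby.BabyID AS BabyID, Baby.Jaundice AS Jaundice, Exam.Score AS Score
    FROM Baby JOIN Exam ON Exam.BabyID = Baby.BabyID
    ORDER BY Baby.BabyID
""")
out = io.StringIO()               # for a real file: open("analysis.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow([col[0] for col in cur.description])   # header row = variable names
writer.writerows(cur)
print(out.getvalue())
```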

Security/Confidentiality Keep identifying data (name, SSN, MRN) in a separate table. Link rest of DB to this table via a Subject ID that has no meaning external to the DB. Restrict access to identifying data. Password protect at both OS and application levels. Audit entries and updates.
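The table split described above can be sketched as follows. The Identity table, its columns, and the ID format (S followed by eight hex digits) are hypothetical. Restricting access to the Identity table is a job for the DBMS's permissions, and SQLite itself has none, so the sketch shows only the structure and the meaningless ID.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Identifying data in a table of their own, so access to it can be restricted.
CREATE TABLE Identity (SubjectID TEXT PRIMARY KEY, Name TEXT, MRN TEXT);
-- Study data link to it only through SubjectID.
CREATE TABLE Exam (
    ExamID    INTEGER PRIMARY KEY,
    SubjectID TEXT NOT NULL REFERENCES Identity(SubjectID),
    Score     REAL
);
""")

def new_subject_id(conn):
    """Random study ID with no meaning outside the database (not built from name, MRN, or SSN)."""
    while True:
        candidate = "S" + secrets.token_hex(4).upper()     # e.g. S3F9A0C21
        taken = conn.execute(
            "SELECT 1 FROM Identity WHERE SubjectID = ?", (candidate,)
        ).fetchone()
        if taken is None:                                   # retry on the rare collision
            return candidate

sid = new_subject_id(conn)
conn.execute("INSERT INTO Identity VALUES (?, ?, ?)", (sid, "Ana Smith", "00123"))
conn.execute("INSERT INTO Exam (SubjectID, Score) VALUES (?, ?)", (sid, 108))

# An analysis extract can be built from Exam alone, without touching Identity.
print(conn.execute("SELECT SubjectID, Score FROM Exam").fetchall())
```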

Back ups Ask your system person to restore a file periodically. This tests both the back-up and restore systems.

Assignment 4 Data Management Protocol Write a one-page data management section for your research study protocol or a one-page description of your current research study database. At the beginning of your assignment, for the readers, briefly describe your study, including design, predictors, outcomes, target population, and sample size. (1 or 2 sentences) Include with your assignment a relationships diagram showing the structure of your study database. Send assignment to by 2/12/2007.

On-Screen Data Collection Forms
Will demonstrate using the “race” field from the Infant Jaundice Study:
- Free text versus coded response
- Single response (mutually exclusive choices) versus “all that apply”

Free Text vs. Coded Responses
Same as “open-ended” vs. “closed-ended” questions. Free-text responses are useful in developing coded response options.

Mutually Exclusive, Collectively Exhaustive Response Options
- One field (= column)
- Can always make responses exhaustive by including an “Other” response
- Drop-down list (combo box) vs. pick list (field list) vs. option group
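Behind a combo box, field list, or option group there is usually a lookup table of allowed codes, and the subject table stores one code per subject. This SQLite sketch is hypothetical throughout: the table names, codes, and category labels are illustrative and are not the Infant Jaundice Study's. Code 9 plays the role of the “Other” response, with a free-text field for the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Static lookup table: the row source for a combo box, field list, or option group.
CREATE TABLE Race (RaceCode INTEGER PRIMARY KEY, RaceText TEXT NOT NULL);
INSERT INTO Race VALUES (1, 'White'), (2, 'Black'), (3, 'Asian'), (9, 'Other');
-- A mutually exclusive choice is stored as one field holding one code.
CREATE TABLE Subject (
    SubjectID INTEGER PRIMARY KEY,
    RaceCode  INTEGER REFERENCES Race(RaceCode),
    RaceOther TEXT          -- free text, filled in only when RaceCode = 9 ('Other')
);
""")
conn.execute("INSERT INTO Subject VALUES (1, 2, NULL)")
conn.execute("INSERT INTO Subject VALUES (2, 9, 'Samoan')")

# A code that is not in the lookup table is refused.
try:
    conn.execute("INSERT INTO Subject VALUES (3, 5, NULL)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)

print(conn.execute("""
    SELECT Subject.SubjectID, Race.RaceText, Subject.RaceOther
    FROM Subject JOIN Race ON Race.RaceCode = Subject.RaceCode
    ORDER BY Subject.SubjectID
""").fetchall())   # [(1, 'Black', None), (2, 'Other', 'Samoan')]
```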

Drop-down List (Combo Box)
- Saves screen real estate
- Doesn't work on paper forms
(Master form)

Combo Box

Pick List (Field List)
- Uses up screen real estate
- Useful on paper forms
(MasterRaceAsFieldList form)

Field List

Option Group
- Radio buttons (by convention)
- Uses up screen real estate
(MasterRaceAsOptionGroup form)

Option Group

Mutually Exclusive = One Field

“All that apply”
- Multiple fields (= columns)
- Use check boxes (by convention)
(MasterRaceAsAllThatApply form)

All That Apply

“All that Apply” = Multiple Fields