Managing Your Own Data (…if you have to) Kathryn A. Carson, Sc.M. Senior Research Associate Department of Epidemiology Johns Hopkins Bloomberg School of Public Health

Presentation transcript:

Managing Your Own Data (…if you have to) Kathryn A. Carson, Sc.M. Senior Research Associate Department of Epidemiology Johns Hopkins Bloomberg School of Public Health

Overview
Principles of Data System Design
Data entry and management systems
  – Self-managed systems
      How to manage data in MS Excel
  – Programmer-managed systems
      How to manage data in MS Access
Sample data sets
How to prepare data for statistical analysis
Confidentiality/security
7/20/2010, Introduction to Clinical Research

Principles of Data System Design
Data Input/Entry
  – What resources are available
  – Amount of data
  – Set-up time versus usage
  – Double data entry
Data Validation
  – Data type (e.g., numeric, date, text)
  – Range checks
  – Missing data
  – Violation of protocol checks
  – Coding and spelling errors
  – Consistency checks
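
The data type, range, and missing-data checks listed above could be sketched in Python roughly as follows. The field names, limits, and date format in RULES are hypothetical and chosen only for illustration; a real study would take them from its protocol.

```python
from datetime import datetime

# Hypothetical field rules for illustration; a real study defines its own.
RULES = {
    "age": {"type": int, "min": 18, "max": 100},
    "sbp": {"type": int, "min": 60, "max": 260},
    "visit_date": {"type": "date"},
}


def validate_record(record):
    """Return a list of problems found in one data-entry record."""
    problems = []
    for field, rule in RULES.items():
        raw = record.get(field, "")
        # Missing-data check
        if raw is None or str(raw).strip() == "":
            problems.append(f"{field}: missing")
            continue
        # Data type check for dates (assumed MM/DD/YYYY)
        if rule["type"] == "date":
            try:
                datetime.strptime(str(raw), "%m/%d/%Y")
            except ValueError:
                problems.append(f"{field}: not a valid MM/DD/YYYY date")
            continue
        # Data type check for numeric fields
        try:
            value = rule["type"](raw)
        except ValueError:
            problems.append(f"{field}: not numeric")
            continue
        # Range check
        if not rule["min"] <= value <= rule["max"]:
            problems.append(
                f"{field}: {value} outside {rule['min']}-{rule['max']}"
            )
    return problems
```

Calling validate_record on each row as it is entered, or on the whole file before analysis, gives a list of issues to resolve against the paper forms. Consistency checks across fields or forms would need additional study-specific rules.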

Principles of Data System Design (2)
Data Audit
  – Computer audits
  – Manual audits
Data Edit
  – Draw a single line through the incorrect value on the data form, write the correct value, then initial and date the change
  – Make the same changes to the database file
Data Maintenance
  – Single database file
Data Archive
  – Have plans to archive data after the end of the study
  – Data can be archived on a CD
  – Data need to be stored for at least five years

Principles of Data System Design (3)
Identification of Study Participant
  – Do not use names, hospital history numbers, or Social Security numbers
  – Patients should be identified with a unique study-assigned ID number
  – Maintain a log linking the patient’s name and other personal information to the study ID
      Kept separately under lock and key, or encrypted and password protected
      Only personnel who need access to the information should have it
  – HIPAA guideline compliant: collect the least amount of protected health information (PHI) needed for the study
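
One way to picture the separation between the linking log and the research file is the Python sketch below. The field names ("name", "mrn"), the "P0001" ID format, and the function name are assumptions made for illustration, not part of the presentation.

```python
def assign_study_ids(patients, identifying_fields=("name", "mrn")):
    """Split patient records into a linking log and de-identified research rows.

    `patients` is a list of dicts. Keys named in `identifying_fields`
    (hypothetical names) go only into the linking log.
    """
    linking_log = []    # study ID plus identifiers; store separately and securely
    research_rows = []  # study ID plus research data only
    for number, patient in enumerate(patients, start=1):
        study_id = f"P{number:04d}"  # sequential, so each ID is unique
        identifiers = {
            key: patient[key] for key in identifying_fields if key in patient
        }
        research_data = {
            key: value
            for key, value in patient.items()
            if key not in identifying_fields
        }
        linking_log.append({"study_id": study_id, **identifiers})
        research_rows.append({"study_id": study_id, **research_data})
    return linking_log, research_rows
```

Only research_rows would be passed on for analysis; the linking log would be kept under lock and key or in an encrypted, password-protected file, as the slide describes.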

Data Management Computer Systems
Self-managed computer systems:
  – Spreadsheets
      Excel, Lotus
Programmer-managed computer systems:
  – Database management software
      Access, dBase
  – Statistical software
      SAS, SPSS, Stata
  – Web-based systems
      Gsurvey, REDCap
  – Fax/Teleform systems

Self-managed Systems
Advantages
  – Self managed
  – Convenient for small data sets
  – Descriptive statistics and graphics available
Disadvantages
  – Data types are defined by the first few entries
  – Not conducive to data validation
  – Cumbersome for very large data sets
  – Forms need to be designed separately
  – Repeated column names or no column names are allowed
  – Data codes are entered manually into a separate file
  – Unable to do consistency checks across forms

Creating an Excel Spreadsheet
Unique variable names should be in the first row
Data should be in column format
Data in the same column should be of the same data type
Some data validation features are available
Data audit features are available for existing spreadsheets

Programmer-managed Systems
Advantages
  – Friendlier data entry environment
  – Computerized data validation
  – Ability to perform consistency checks within and across tables
  – Ability to track editing changes
  – More manageable for large data sets
Disadvantages
  – Require more up-front planning and resources
  – Require database knowledge to develop a file
  – User training

Creating an Access Database
Primary Key(s)
  – Must be unique and not missing
  – Indexes on this value
Be careful of “Default value”
  – Default setting is zero for numeric data (pre-2007 versions)
Use “Required” only when necessary
  – Will not allow field to be left blank
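
The same ideas (a unique, non-missing key; the risk of a numeric default of zero; a required field) can be illustrated outside Access. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it shows the general database behavior rather than Access itself; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE visits (
        study_id TEXT PRIMARY KEY NOT NULL,  -- unique and never missing
        sbp_default_zero INTEGER DEFAULT 0,  -- risky: a blank entry becomes 0
        sbp_no_default INTEGER               -- a blank entry stays NULL
    )
    """
)

# Enter a record without any blood pressure value.
conn.execute("INSERT INTO visits (study_id) VALUES ('P0001')")
row = conn.execute(
    "SELECT sbp_default_zero, sbp_no_default FROM visits"
).fetchone()
# The defaulted column now holds 0, which looks like a real measurement;
# the column without a default holds None (NULL), which is clearly missing.


def try_insert(study_id):
    """Return 'ok' if the key is accepted, 'rejected' if a constraint fails."""
    try:
        conn.execute("INSERT INTO visits (study_id) VALUES (?)", (study_id,))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_result = try_insert("P0001")  # same key again
missing_result = try_insert(None)       # key left blank
```

The point of the default-value caution is visible in row: a zero that was never entered is indistinguishable from a recorded zero once the data reach analysis.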

Sample Dataset 1: Binary/Continuous Data
Columns: ID, age, gender, height, treatment, disease (table values were lost in extraction)

Sample Dataset 2: Survival Data
Columns: ID, startdt, eventdt, fudt, survival, cens (table values were largely lost in extraction; the surviving fragments are dates in MM/DD/YY form)
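
A survival-style layout like this one usually implies a derived follow-up time. The sketch below assumes, without confirmation from the slide, that survival is the number of days from startdt to eventdt when an event occurred, otherwise to the last follow-up date fudt, and that cens is 1 for censored records. The dates are hypothetical and use four-digit years.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # four-digit years assumed


def survival_days(startdt, eventdt, fudt):
    """Return (days, cens) for one participant.

    Assumed coding: cens = 0 if an event date is present,
    cens = 1 if the participant is censored at last follow-up.
    """
    start = datetime.strptime(startdt, DATE_FORMAT)
    if eventdt:
        end = datetime.strptime(eventdt, DATE_FORMAT)
        cens = 0
    else:
        end = datetime.strptime(fudt, DATE_FORMAT)
        cens = 1
    return (end - start).days, cens
```

For example, survival_days("01/01/2005", "", "01/01/2006") returns 365 days with cens set to 1 under these assumptions.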

Sample Dataset 3: Longitudinal Data (Vertical)
Columns: ID, Visit, Treatment, SBP (table values were lost in extraction)

Sample Dataset 4: Longitudinal Data (Horizontal)
Columns: ID, Treatment, SBP1, SBP2, SBP3 (table values were lost in extraction)
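
The vertical layout (one row per visit) and the horizontal layout (one row per participant) hold the same information. A minimal Python sketch of converting the first into the second follows; the function name and the example values in the usage note are hypothetical.

```python
def long_to_wide(rows):
    """Convert one-row-per-visit records into one-row-per-participant records.

    `rows` is a list of dicts with keys ID, Visit, Treatment, and SBP.
    Each visit's SBP becomes its own column: SBP1, SBP2, and so on.
    """
    wide = {}
    for row in rows:
        record = wide.setdefault(
            row["ID"], {"ID": row["ID"], "Treatment": row["Treatment"]}
        )
        record[f"SBP{row['Visit']}"] = row["SBP"]
    return list(wide.values())
```

A participant with SBP readings at visits 1 and 2 ends up as a single row with SBP1 and SBP2 columns; a participant who missed a visit simply lacks that column, which the analyst would then need to code as missing.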

Sample Datasets
What not to do!
Real life examples

Preparing Files for Statistical Analysis
Allow adequate time for data preparation
The better the quality of the data, the less time spent analyzing
  – Know your data
      Look at frequency distributions and scatterplots
  – Multiple checks for errors
  – Minimize missing data if at all possible
      Be aware of the amount of data missing and why
Freeze the dataset
  – Copy to another file and date the file
  – Document any corrections made to the file, and also correct them in the original database and on the forms
Plan on recoding categorical variables so each group has a sufficient sample size
Prepare a separate code sheet for the data
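
Freezing a dataset, as described above, amounts to making a dated copy that is never edited afterwards. A minimal Python sketch, assuming a hypothetical naming pattern of "name_frozen_YYYY-MM-DD":

```python
import shutil
from datetime import date
from pathlib import Path


def freeze_dataset(source, freeze_date=None):
    """Copy a data file to a dated 'frozen' copy and return the new path.

    The '_frozen_<date>' naming pattern is an assumption for illustration.
    """
    source = Path(source)
    stamp = (freeze_date or date.today()).isoformat()
    frozen = source.with_name(f"{source.stem}_frozen_{stamp}{source.suffix}")
    if frozen.exists():
        # Never overwrite an earlier freeze made on the same date.
        raise FileExistsError(f"{frozen} already exists")
    shutil.copy2(source, frozen)
    return frozen
```

Analyses would then run against the frozen copy, while any later corrections are documented and applied to the original database and forms.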

Preparing Files for Statistical Analysis (2)
General spreadsheet design
  – One-line header row with a unique one-word name for each variable
  – Do not mix data types within one column
  – Unique identifying number for each case
  – Only include raw, un-summarized data, i.e., no summary statistics or graphs in the spreadsheet
  – Date format with four-digit years
  – Avoid underlining, bold fonts, or italics
  – Do not leave blank rows or columns in between data
  – Do not use a row to label a group; use a grouping variable (column)
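
Several of these design rules can be checked automatically once the spreadsheet is saved as comma-delimited text. The Python sketch below covers three of them (unique one-word names, no blank rows, no mixed data types in a column); the function names and messages are hypothetical, and the numeric test is deliberately simple.

```python
import csv
import io


def is_number(value):
    """Return True if the text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_spreadsheet(csv_text):
    """Return a list of design problems found in comma-delimited text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    problems = []
    # Header row: unique, one-word variable names
    for name in header:
        if not name.strip() or " " in name.strip():
            problems.append(f"header {name!r}: not a single word")
        duplicate_message = f"header {name!r}: duplicated"
        if header.count(name) > 1 and duplicate_message not in problems:
            problems.append(duplicate_message)
    # No blank rows between data (line numbers count the header as line 1)
    for line_number, row in enumerate(data, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {line_number}: blank")
    # No mixed data types within a column (empty cells are ignored)
    for index, name in enumerate(header):
        cells = [
            row[index].strip()
            for row in data
            if index < len(row) and row[index].strip()
        ]
        if len({is_number(cell) for cell in cells}) > 1:
            problems.append(f"column {name!r}: mixes numbers and text")
    return problems
```

An empty list means none of these three problems was found; it does not confirm the remaining rules, such as four-digit years or the absence of summary rows.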

Preparing Files for Statistical Analysis (3)
Missing Data
  – Consider what software will be used for analysis
  – Use different codes to indicate the reason data are missing
      e.g., not applicable, unable to complete, or missing
  – If numeric field
      The code must not be a valid data point
      Do not use text, such as “NA”, “missing”, “*”
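
Numeric missing-data codes only work if the analysis step recognizes them. A small Python sketch of that step follows; the codes -7, -8, and -9 are hypothetical and would have to be values that can never occur as real measurements in the field concerned.

```python
# Hypothetical numeric codes; each must be impossible as a real measurement.
MISSING_CODES = {
    -7: "not applicable",
    -8: "unable to complete",
    -9: "missing",
}


def split_missing(values):
    """Separate real measurements from missing-data codes.

    Returns (clean, reasons): `clean` has None wherever a code was found,
    and `reasons` counts how often each reason occurred.
    """
    clean = []
    reasons = {}
    for value in values:
        if value in MISSING_CODES:
            clean.append(None)
            reason = MISSING_CODES[value]
            reasons[reason] = reasons.get(reason, 0) + 1
        else:
            clean.append(value)
    return clean, reasons
```

Keeping the reason counts alongside the cleaned values supports the earlier advice to be aware of how much data is missing and why.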

Exporting Excel Data into Stata
Save the Excel file as comma delimited
  – In the Save As dialog box, choose CSV (comma delimited) for Save As type
In Stata
  – Go to the drop-down menu “File”, then “Import”, then “ASCII data created from a spreadsheet”
  – Or use the command: insheet using "filepath.csv", comma names

Exporting Access Data into Stata
Save the Access file as a comma-delimited file
  – Open the Access table
  – From the File menu, select Export
  – In the pop-up dialog box, click on “Save as file type” and select Text Files
  – Click “Save All” and, in the Export Text Wizard, select delimited and comma
Follow the instructions for importing a comma-delimited file into Stata

Data Transfer Software
Software is available to transfer data between applications
Stat/Transfer and DBMS/COPY
  – Access, ASCII, dBase, Epi Info, Excel, JMP, Paradox, QuattroPro, SAS, S-Plus, SPSS, Stata, Statistica
  – Need to update as software updates

ICTR Resources
Data management and statistical support available for ICTR (CRU) protocols
Computer facilities with data management and statistical software
  – Located in Carnegie 446
  – Gsurvey, Teleform, scanning, sample size programs, statistical software, data transfer software

Data Security and Confidentiality
Do not include unnecessary protected health information on research data files
  – No names, addresses, phone numbers, or Social Security numbers
  – No medical record numbers
Files that link the study ID to the PHI should not be maintained on removable storage drives or laptops
E-mailing of data files should be limited
  – If PHI are on the file, then the files should be encrypted and password protected
  – Do not e-mail the password
Use JShare or SharePoint to share or transfer data files

Summary
Well-designed systems minimize data errors and future problems
Data management systems should be chosen based on resources and individual needs
  – Spreadsheets are appropriate for small and simple data sets
  – Databases provide more options for data management
Add simple validations to check data entry
Following guidelines for preparing files for statistical analysis will save time
Data transfer software is available to transfer data between applications
Limit PHI and keep data secure