P20 Seminar, November 12, 2009. Statistical Collaboration. Part 1: Working with Statisticians from Start to Finish. Part 2: Essentials of Data Management.

Presentation transcript:

Statistical Collaboration
- Part 1: Working with Statisticians from Start to Finish
- Part 2: Essentials of Data Management

Objectives
Participants will learn about:
- the process of consulting and collaborating with a statistician
- general principles of database setup, data entry, verification, cleaning and storage

Part 1: Working with Statisticians from Start to Finish
Kay Savik, MS

Collaboration
“Collaboration implies that statistician and researcher want to learn and exchange information. This exchange should be mutually beneficial.” (Gerald van Belle)

Types of Consulting
- Cross-sectional: statistical advice for data already collected or analyzed
- Longitudinal: a long-term relationship between statistician and researcher

First Meeting
- Intent of study
- Source of data
- Sampling unit
- Randomization
- Model of effects
- Type of study
- Type of data

First Meeting
- What is the research question?
- What level of statistical knowledge does the researcher have?
- What are the data and what form are they in?
- What are the conventions in this specific area of study?

The Conversation
To prevent a type III error (the right answer to the wrong question!):
- Clarify research aims
- Appropriate design
- Measurement
- Data management
- Analysis

Analysis Choice
- Sir David Cox: “Begin with very simple methods and, if possible, end with simple methods”
- Rindskopf’s Rules of Statistical Consulting: “Sometimes the “best” or “right” statistical procedure is not the best for a particular situation.”

Which Statistical Package?
- There is no one “perfect” software package for any procedure
- All standard packages have been tested and are reliable
- “Specialized” procedures are found in several packages

Collaborate Rather than Consult
- Collaboration is a communal activity
- Decide who is responsible for what at first meeting
- Politely and quickly leave a collaboration where any party seems misguided or unethical
- Decide on questions of authorship at first meeting

Part 2: Essentials of Data Management (DM)
Olga Gurvich, MA

Data Management
- Essential part of any research
- Interactive and collaborative venture of both investigator and statistician
- Requires a system that is well defined in advance and implemented consistently

Data Management Stages
- Database setup
- Raw data collection [who, what, when, how]
- Raw data entry, verification and cleaning
- Data storage
- [Data re-structuring for statistical analyses]
- [Data analysis]
- Data archiving

Database Setup - Software
Choice mainly depends on:
- Amount of data to be collected
- Complexity of data structure
- Type of data
- Export/import capabilities to/from
- Planned statistical analyses and software
Software:
- Try avoiding Excel
- SPSS, ACCESS, EpiInfo, output of survey software, plain text (ASCII)

Database Setup - Structure
- Participants => rows; variables => columns
- Logical Record: one row contains all data for a single study participant
- Multiple Record: multiple rows per single participant
- Relational: multiple data files that can be merged
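
The difference between the multiple-record and logical-record layouts can be illustrated with a minimal Python sketch. The data, the variable names (`id`, `visit`, `sbp`) and the helper `to_logical_records` are hypothetical and not part of the seminar; statistical packages offer their own restructuring commands.

```python
from collections import defaultdict

# Hypothetical multiple-record data: one row per participant per visit.
long_rows = [
    {"id": 1, "visit": 1, "sbp": 120},
    {"id": 1, "visit": 2, "sbp": 118},
    {"id": 2, "visit": 1, "sbp": 135},
]

def to_logical_records(rows, id_key="id", time_key="visit"):
    """Collapse multiple rows per participant into one row per participant."""
    wide = defaultdict(dict)
    for row in rows:
        record = wide[row[id_key]]
        record[id_key] = row[id_key]
        for name, value in row.items():
            if name not in (id_key, time_key):
                # Suffix each measurement with its visit number, e.g. sbp_1.
                record[f"{name}_{row[time_key]}"] = value
    return [wide[key] for key in sorted(wide)]

wide_rows = to_logical_records(long_rows)
for record in wide_rows:
    print(record)
```

Participant 2 has no second visit, so the logical record simply lacks `sbp_2`; in a real database that cell would carry the agreed missing-value code.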

Database Setup - General
- Give the database (DB) a short, meaningful and “dated” name
- A DB given to a statistician for cleaning and analyses should include:
  - ONLY collected raw data
  - NO graphs, comments, titles, summaries, hidden rows, split-spreadsheets, multiple spreadsheets, imposed “special” formats or highlighting

Database Setup - Variables
- Set unique numeric ID(s) in the first column(s)
- Identify types of variables, measurement units and type of recording [auto/manual]
- Carefully choose variables’ format and length
- Date format MM/DD/YYYY; if parts are missing, create three separate variables
- Time format dd hh:mm:ss or similar
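
As a rough sketch of the date rule above, the Python below splits an MM/DD/YYYY entry into three separate variables and builds a full date only when every part is known. The function names and the convention that an unknown part is left empty or written as "??" are assumptions made for illustration.

```python
from datetime import date

def split_date(text):
    """Split 'MM/DD/YYYY' into (month, day, year); unknown parts become None."""
    parts = text.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected MM/DD/YYYY, got {text!r}")
    # Assumption: an unknown part is left empty or written as '??' on the form.
    return tuple(int(p) if p.isdigit() else None for p in parts)

def full_date(month, day, year):
    """Return a date only when all three parts are known and form a valid date."""
    if None in (month, day, year):
        return None
    return date(year, month, day)  # raises ValueError for impossible dates

print(split_date("11/12/2009"))
print(split_date("11/??/2009"))
```

Storing month, day and year separately keeps the partial information usable (for example, year of diagnosis) instead of discarding the whole date.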

Database Setup - Variables
- Create a separate variable for every separate piece of information
- Give unique, short [6-8 char], meaningful names
- No special characters [!, %, $, spaces]
- Do not start with a number
- Consider other restrictions of specific software [e.g., lower/upper case letters]
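
The naming rules above can be checked automatically before data entry starts. The following Python sketch is one possible check; the pattern, the 8-character limit (the upper end of the 6-8 guideline) and the case-insensitive duplicate test are assumptions, and a specific package may impose different restrictions.

```python
import re

# Letters, digits and underscores only; first character must not be a digit.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_LENGTH = 8  # upper end of the 6-8 character guideline

def check_names(names):
    """Return {name: [problems]} for every name that breaks a rule."""
    problems = {}
    seen = set()
    for name in names:
        issues = []
        if not NAME_PATTERN.fullmatch(name):
            issues.append("invalid characters or starts with a number")
        if len(name) > MAX_LENGTH:
            issues.append(f"longer than {MAX_LENGTH} characters")
        # Compare case-insensitively, since some packages ignore case.
        if name.lower() in seen:
            issues.append("duplicate name")
        seen.add(name.lower())
        if issues:
            problems[name] = issues
    return problems

report = check_names(["age", "sbp_1", "1stvisit", "wt%", "AGE", "cholesterol_total"])
for name, issues in report.items():
    print(name, "->", "; ".join(issues))
```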

Database Setup - Coding
- Assign short and meaningful codes, consistent for same-response variables
- Use numeric coding (if possible); do not combine numeric and character codes within a numeric variable
- Address missing values
- Avoid using “N/A”, “?”, etc. entirely
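
A minimal sketch of consistent numeric coding follows. The code scheme (`0` = no, `1` = yes) and the missing-value code `-9` are assumptions chosen for illustration; the actual codes should be agreed with the statistician and recorded in the codebook.

```python
# Hypothetical code scheme shared by all yes/no questions in a study.
YES_NO_CODES = {"no": 0, "yes": 1}
MISSING = -9  # assumed missing-value code; must lie outside the valid range

def encode(response, codes=YES_NO_CODES, missing=MISSING):
    """Convert one raw response to its numeric code."""
    if response is None:
        return missing
    key = response.strip().lower()
    if key in ("", "n/a", "na", "?"):
        return missing
    if key not in codes:
        # Refuse to guess: unexpected text goes back to the investigator.
        raise ValueError(f"unexpected response: {response!r}")
    return codes[key]

raw = ["Yes", "no", "N/A", "?", None, " yes "]
coded = [encode(r) for r in raw]
print(coded)
```

Using one numeric missing-value code keeps the variable numeric, which is the point of avoiding “N/A” or “?” in the data file.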

Database Setup - Codebook/Data Dictionary
A written handbook with information on study data:
- Study title, PI name, date of last update, DB name and location
- # of observations, # of variables
- Study variables and their attributes [name, label, location (ASCII), coding (values), format, measurement units]
- Other [formulae, weights, scoring documentation, etc.]
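
The variable-attribute part of a codebook can be kept as a small table alongside the data. The sketch below writes such a table as CSV text using Python's standard `csv` module; the variables, labels and codes shown are hypothetical examples, not taken from any study.

```python
import csv
import io

# Hypothetical variable attributes; names and codes are illustrative only.
codebook = [
    {"name": "id", "label": "Participant ID", "format": "numeric",
     "units": "", "coding": ""},
    {"name": "sex", "label": "Sex of participant", "format": "numeric",
     "units": "", "coding": "0=male; 1=female; -9=missing"},
    {"name": "sbp", "label": "Systolic blood pressure", "format": "numeric",
     "units": "mmHg", "coding": "-9=missing"},
]

def codebook_to_csv(entries):
    """Render the codebook entries as CSV text with a header row."""
    fields = ["name", "label", "format", "units", "coding"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()

print(codebook_to_csv(codebook))
```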

Data Entry, Verification and Cleaning
The ultimate aim is a fully documented, backed-up archive of verified, validated and ready-for-use data

Data Entry
- “Do it promptly, completely and consistently”
- Preferably one trained data entry person [unless double entry]
- Unique ID(s)
- All the data must be entered in its “raw” form directly from the original records: NO hand calculations
- Frequent back-up
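
Where double entry is used, the two entered files can be compared record by record. The Python sketch below is one simple way to list disagreements; the function name, the tuple layout and the sample data are assumptions for illustration, and dedicated data entry software usually has this built in.

```python
def compare_entries(first, second, id_key="id"):
    """Return (id, variable, value_1, value_2) for every disagreement."""
    second_by_id = {row[id_key]: row for row in second}
    first_ids = {row[id_key] for row in first}
    mismatches = []
    for row in first:
        other = second_by_id.get(row[id_key])
        if other is None:
            mismatches.append((row[id_key], "<record>", "present", "absent"))
            continue
        for name, value in row.items():
            if other.get(name) != value:
                mismatches.append((row[id_key], name, value, other.get(name)))
    # Records that exist only in the second entry.
    for row_id in second_by_id:
        if row_id not in first_ids:
            mismatches.append((row_id, "<record>", "absent", "present"))
    return mismatches

entry_a = [{"id": 1, "age": 34, "sex": 1}, {"id": 2, "age": 51, "sex": 0}]
entry_b = [{"id": 1, "age": 34, "sex": 1}, {"id": 2, "age": 15, "sex": 0}]
differences = compare_entries(entry_a, entry_b)
print(differences)
```

Each disagreement is resolved by going back to the original record, not by choosing one of the two entered values.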

Data Verification and Cleaning
- Optimally done by a statistician or DM professional in close collaboration with the investigator
- Includes (but is not limited to) general and logic checks to detect errors and outliers, and verification of data completeness (subjects and variables)
- Audit trail/log book for a complete record of changes made
- Following all necessary corrections, ONE FINAL CLEAN DB is created
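
A range check combined with an audit trail might look like the following Python sketch. The plausibility limits, the missing-value code `-9`, the function names and the sample records are all assumptions for illustration; real limits come from the investigator.

```python
import copy

# Hypothetical plausibility limits; real limits come from the investigator.
RANGES = {"age": (18, 100), "sbp": (60, 260)}

def range_check(rows, ranges=RANGES, missing=-9):
    """Return (id, variable, value) for every value outside its range."""
    flagged = []
    for row in rows:
        for name, (low, high) in ranges.items():
            value = row.get(name)
            if value is None or value == missing:
                continue  # completeness is checked separately
            if not low <= value <= high:
                flagged.append((row["id"], name, value))
    return flagged

def correct(rows, audit_log, row_id, name, new_value, reason):
    """Apply one correction and append it to the audit trail."""
    for row in rows:
        if row["id"] == row_id:
            audit_log.append({"id": row_id, "variable": name,
                              "old": row[name], "new": new_value,
                              "reason": reason})
            row[name] = new_value
            return
    raise KeyError(f"no record with id {row_id}")

raw = [{"id": 1, "age": 34, "sbp": 120}, {"id": 2, "age": 510, "sbp": 135}]
clean = copy.deepcopy(raw)  # the initial raw DB stays untouched
audit_log = []
print(range_check(clean))
correct(clean, audit_log, 2, "age", 51, "typing error, checked against paper form")
print(range_check(clean))
```

Corrections are applied to a copy, so the initial raw DB and the final clean DB both survive, and the log records every change between them.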

Data Storage
Stored on a password-protected server are:
1. ONE INITIAL RAW DB
2. ONE FINAL CLEAN DB
3. CODEBOOK
4. Audit trail or log book [if used]
Frequent BACK-UPs are performed
All previous DB versions EXCEPT the initial raw one are destroyed

Data Re-Structuring
- If not foreseen in advance, may be needed for certain analyses
- Usually can be done in statistical packages
- Keep a record of any re-structuring
- Use “version-” or “date-numbering” system
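
One way to combine a date-numbering system with a record of re-structuring is sketched below in Python. The file-name pattern (`stem_YYYYMMDD.csv`), the stem `p20bp` and the log fields are hypothetical conventions, not ones prescribed by the seminar.

```python
from datetime import date

def versioned_name(stem, when, extension="csv"):
    """Build a date-numbered file name such as 'p20bp_20091112.csv'."""
    return f"{stem}_{when:%Y%m%d}.{extension}"

restructure_log = []

def record_restructuring(source, description, when):
    """Log one re-structuring step and return the new file name."""
    # Assumption: the stem is everything before the first underscore.
    target = versioned_name(source.split("_")[0], when)
    restructure_log.append({"date": when.isoformat(), "source": source,
                            "target": target, "change": description})
    return target

new_file = record_restructuring("p20bp_20091101.csv",
                                "multiple-record to logical-record layout",
                                date(2009, 11, 12))
print(new_file)
print(restructure_log)
```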

Data Archiving
- At the end of a project, the data, codebook, log book and programs [syntax] must be archived
- The archive serves as permanent storage and gives access to all project-related information
- Keep a copy of the archive and a detailed report of the archive’s structure