Data Management – Overview
Claire Osgood
November 2017
Children’s Environmental Health Initiative

What is “Data Management”?
Collecting and organizing data so that it is accurate, complete, secure, and easy to interpret and use appropriately.

Poll: Have you ever…
-- Forgotten what you called a file and/or where you put it?
-- Discovered duplicate files and struggled over which to keep?
-- Tried to recreate an analysis, or to pick up work where someone else left off?
-- Had to submit code and data for a publication?

Data sharing is increasingly expected in research: if you publish, you can expect to be asked to share your data and documentation.
Good data management can help with all of these issues.
DATA MANAGEMENT COMPONENTS
• Regulatory Procedures (IRB, DUAs, BAAs…)
• Security / IT
• Architecture (folders, files, data elements)
• Programming and Processing
• Quality Assurance / Quality Control
• Audit Trail and Replicability
• Documentation / Metadata
• Retention / Archiving
DM Components: Regulatory Procedures (IRB, DUAs, BAAs…)
IRB protocols, Data Use Agreements (DUAs), Business Associate Agreements (BAAs), and compliance with both university and data supplier policies.
Working with the following departments:
• Industrial Contracts
• Sponsored Projects & Research Compliance
• Office of Technology Transfer
DM Components: Security / IT
The infrastructure for a system to store and share files, a mechanism for transferring files on and off the system, managing access to and permissions on the system, and training users on security policies and procedures.
Working with the following departments:
• IT Security
• Systems Engineering
• Networking Telecom DCO
• Campus Services (local IT)
The Human Element: you need to do your part to help!
DM Components: Architecture (folders, files, data elements)
Managing the folder structure and setting standards for folder and file names, for the structure of common variables, and for documentation.
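The slides do not specify a naming standard. As a minimal sketch, assuming a hypothetical convention of `<project>_<description>_<YYYYMMDD>_v<NN>.<ext>` (lowercase, no spaces), a short Python check can flag file names that break the standard before they spread through a project folder. The pattern, the function name `check_file_name`, and the example names are illustrative assumptions, not a CEHI standard.

```python
import re
from datetime import datetime

# Hypothetical naming standard (an assumption, not from the slides):
#   <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
# e.g. "study1_birthrecords_20171115_v01.csv"
FILE_NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<description>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})\."
    r"(?P<ext>[a-z0-9]+)$"
)


def check_file_name(name: str) -> list[str]:
    """Return a list of problems with `name`; an empty list means it conforms."""
    match = FILE_NAME_PATTERN.match(name)
    if match is None:
        return ["does not match <project>_<description>_<YYYYMMDD>_v<NN>.<ext>"]
    problems = []
    # The pattern only checks for eight digits; confirm they form a real date.
    try:
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        problems.append(f"invalid date: {match.group('date')}")
    return problems
```

For example, `check_file_name("study1_birthrecords_20171115_v01.csv")` returns an empty list, while a name like `Final FINAL data (2).xlsx` is reported as non-conforming. Writing the date as `YYYYMMDD` also makes files sort chronologically by name.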
DM Components: Programming and Processing
Converting received raw data files into analysis-ready datasets: investigating, cleaning, and standardizing the data, and linking it to other sources.
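A minimal sketch of the cleaning, standardizing, and linking steps, assuming hypothetical raw records with `study_id`, `sex`, and `dob` fields and a second source keyed by the same `study_id`. The field names, the accepted date formats, and the helper names (`standardize_record`, `link_records`) are illustrative assumptions; a real project would typically use a dataframe library and a documented linkage protocol.

```python
from datetime import datetime


def standardize_record(raw: dict) -> dict:
    """Clean and standardize one raw record (hypothetical field names)."""
    # Trim whitespace and normalize case so the same value is always spelled the same way.
    study_id = raw["study_id"].strip().upper()
    sex = raw["sex"].strip().upper()[:1]  # "female" / "F " -> "F"
    # Accept two date formats assumed to appear in raw files; emit ISO 8601.
    dob_text = raw["dob"].strip()
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            dob = datetime.strptime(dob_text, fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        dob = None  # unparseable date: leave empty and investigate by hand
    return {"study_id": study_id, "sex": sex, "dob": dob}


def link_records(records: list[dict], other: dict[str, dict]) -> tuple[list[dict], list[str]]:
    """Link standardized records to another source keyed by study_id.

    Returns the linked records plus the IDs that found no match, so unmatched
    cases can be investigated rather than silently dropped.
    """
    linked, unmatched = [], []
    for rec in records:
        match = other.get(rec["study_id"])
        if match is None:
            unmatched.append(rec["study_id"])
            continue
        linked.append({**rec, **match})
    return linked, unmatched
```

For instance, standardizing `{"study_id": " a001 ", "sex": "female", "dob": "03/15/2010"}` yields `{"study_id": "A001", "sex": "F", "dob": "2010-03-15"}`. Returning the unmatched IDs, rather than dropping them, leaves an explicit list for the investigation step.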
DM Components: Quality Assurance / Quality Control
Checking and verifying all manual and programming processes.
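One common quality-control practice (an assumption here; the slide does not name a method) is independent double programming: a second analyst rebuilds the dataset from the same raw files and specifications, and the two versions are compared record by record. A minimal sketch of that comparison, using a hypothetical `compare_datasets` helper with `study_id` as the assumed key:

```python
def compare_datasets(primary: list[dict], qc: list[dict], key: str = "study_id") -> list[str]:
    """Compare a dataset against an independently produced QC version.

    Returns human-readable discrepancy messages; an empty list means the two agree.
    """
    issues = []
    primary_by_key = {row[key]: row for row in primary}
    qc_by_key = {row[key]: row for row in qc}
    # Duplicate keys collapse in the dicts above, so compare lengths to detect them.
    if len(primary_by_key) != len(primary):
        issues.append("primary dataset has duplicate keys")
    if len(qc_by_key) != len(qc):
        issues.append("QC dataset has duplicate keys")
    # Records present in only one version.
    for k in sorted(primary_by_key.keys() - qc_by_key.keys()):
        issues.append(f"{k}: only in primary")
    for k in sorted(qc_by_key.keys() - primary_by_key.keys()):
        issues.append(f"{k}: only in QC")
    # Field-by-field comparison for records present in both.
    for k in sorted(primary_by_key.keys() & qc_by_key.keys()):
        a, b = primary_by_key[k], qc_by_key[k]
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                issues.append(f"{k}.{field}: {a.get(field)!r} != {b.get(field)!r}")
    return issues
```

A day/month swap in the record keyed `A002`, for example, is reported as `A002.dob: '2011-07-04' != '2011-04-07'`, which is exactly the kind of error a single programmer checking their own work tends to miss.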
DM Components: Audit Trail and Replicability
Every finalized dataset must have a clean audit trail and must be replicable. This means:
• date/time stamps are in chronological order;
• raw and intermediate datasets exist and are frozen;
• programs are checked and frozen;
• documentation is in order.
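Parts of this checklist can be automated. The sketch below is a minimal example, assuming file modification times stand in for the date/time stamps and SHA-256 checksums record that frozen files have not changed. The helpers `freeze` and `audit`, and the convention of listing files in the order they were produced, are hypothetical. Modification times can be reset by copying or restoring files, so a check like this complements a written log rather than replacing it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def freeze(paths: list[Path]) -> dict[str, str]:
    """Build a checksum manifest for files being frozen (store it with the docs)."""
    return {str(p): sha256_of(p) for p in paths}


def audit(pipeline: list[Path], manifest: dict[str, str]) -> list[str]:
    """Check files listed in the order they should have been produced.

    Flags files that are missing, were never frozen, changed after freezing,
    or have modification times out of chronological order.
    """
    problems = []
    for p in pipeline:
        if not p.exists():
            problems.append(f"missing: {p}")
        elif str(p) not in manifest:
            problems.append(f"not frozen: {p}")
        elif sha256_of(p) != manifest[str(p)]:
            problems.append(f"changed since frozen: {p}")
    # Each existing file should be no newer than the file produced after it.
    existing = [p for p in pipeline if p.exists()]
    for earlier, later in zip(existing, existing[1:]):
        if earlier.stat().st_mtime > later.stat().st_mtime:
            problems.append(f"out of order: {earlier} is newer than {later}")
    return problems
```

In practice, `freeze` would be called on the raw data, programs, and outputs when a dataset is finalized, the manifest kept with the documentation, and `audit` rerun before any release or publication.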
DM Components: Documentation / Metadata
Document every step in the process: standard documents, any additional documentation needed, and metadata for GIS files.
DM Components: Retention / Archiving
Retention varies with the project, IRB protocol, DUAs, and other agreements. Typically, files are retained for 3 to 5 years and then archived for up to 7 years.
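Retention periods are set by the governing agreements, so the year counts below are placeholders. As a minimal sketch, assuming the retention clock starts at a project's end date (an assumption; an agreement may specify a different trigger), the key dates can be computed with hypothetical helpers `add_years` and `retention_schedule`:

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Add whole years, moving Feb 29 to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_schedule(project_end: date, retain_years: int, archive_years: int) -> dict[str, date]:
    """Return when a project's files move to the archive and when archiving ends.

    The year counts must come from the IRB protocol, DUAs, and other
    agreements; nothing here encodes a real policy.
    """
    archive_on = add_years(project_end, retain_years)
    return {"archive_on": archive_on, "archive_ends": add_years(archive_on, archive_years)}
```

For example, `retention_schedule(date(2017, 11, 30), 5, 7)` gives an archive date of 2022-11-30 and an archive end date of 2029-11-30.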
DATA MANAGEMENT COMPONENTS
What do researchers tend to focus on?
When submitting grants, do you include personnel and time for ALL components?
Data Life Cycle
Collect / Acquire → Document
→ Store unprocessed data
→ Process (clean, standardize, construct variables) → Verify / Check → Document
→ Processed / analysis data (cleaned, value-added, analysis-ready)
→ Stats / Analysis → Verify / Check → Document
→ Publish results
→ Archive

What DM components are used in each step of the life cycle?