This presentation is made available through a Creative Commons Attribution-Noncommercial license. Details of the license and permitted uses are available.

Slides:



Advertisements
Similar presentations
This presentation is made available through a Creative Commons Attribution- Noncommercial license. Details of the license and permitted uses are available.
Advertisements

Introduction to Likelihood Meaningful Modeling of Epidemiologic Data, 2012 AIMS, Muizenberg, South Africa Steve Bellan, MPH, PhD Department of Environmental.
PG Planning the Network Task M3/M8 Performance Indicators Issues around delivery of Action Plan 20 March 2007.
Determinants of Penicilliosis Seasonality in Ho Chi Minh City, Vietnam Philip L. Bulterys, 1 Thuy Le, 2,3 Vo Minh Quang, 4 Kenrad E. Nelson, 5 James O.
February 3,  1/20/11-1/21/11 – IEEE CSU Engineering Summit  1/22/11 – IEEE UCLA Student Professional Awareness Venture.
Line Efficiency     Percentage Month Today’s Date
This presentation is made available through a Creative Commons Attribution- Noncommercial license. Details of the license and permitted uses are available.
NAEP 2008 Trends in Academic Progress in Reading and Mathematics.
1 st Meeting, January 10 th, 2012 U-Hour To Contact Us : Pre-Pharmacy Society Website Pre-Pharmacy Society Facebook.
BSc (Honours) in Environmental Science & Sustainable Technology CR365.
Mission-Based Management January 2008 Electronic CV System Users Group.
Including images with the img tag Skills: using the img tag IT concepts: none This work is licensed under a Creative Commons Attribution-Noncommercial-
Developing a Questionnaire. Goals Discuss asking the right questions in the right way as part of an epidemiologic study. Review the steps for creating.
The CalREN backbone Skills: none IT concepts: backbone topology, LAN backbone connection, backbone management, link traffic statistics, device status statistics.
Study Design and Analysis in Epidemiology: Where does modeling fit? Meaningful Modeling of Epidemiologic Data, 2010 AIMS, Muizenberg, South Africa Steve.
Project: Ghana Emergency Medicine Collaborative Document Title: Open Educational Resources Author(s): University of Michigan Department of Emergency Medicine.
Satisfaction with the quality of care in the integrated chronic disease management model in rural South Africa Soter Ameh 1, Xavier Gómez-Olivé 1,2, Kathleen.
Methods to Study and Control Diseases in Wild Populations Steve Bellan, MPH Department of Environmental Sci, Pol & Mgmt University of California at Berkeley.
What are research data? July 2015 This work is licensed under a Creative Commons Attribution 4.0 International LicenseCreative Commons Attribution 4.0.
This presentation is made available through a Creative Commons Attribution-Noncommercial license. Details of the license and permitted uses are available.
SSuN Cycle 2 Conference call #5 Population-based gonorrhea surveillance Lori Newman & Kristen Mahle November 13, 2008.
Using IPUMS.org Katie Genadek Minnesota Population Center University of Minnesota The IPUMS projects are funded by the National Science.
National Chlamydia Screening Programme Chlamydia testing and diagnoses in year olds, England January – December 2014 CTAD Team HIV & STI Department.
Title of Presentation: And a Very Very Very Very Long Subtitle First Last, Title, Institution Second Presenter, Title, Institution Select your preferred.
Theresa L. Smith, MD, MPH Division of Vector-Borne Infectious Diseases Centers for Disease Control and Prevention Fort Collins, CO West Nile Virus Epidemiology.
Plans for Access to UK Microdata from 2011 Census Emma White Office for National Statistics 24 May 2012.
AP Style Shenanigans Miss Landon 2.0. Numbers Spell out numbers one through nine. Use numerals for 10 and up. (One, five, 10, 15, 17, etc.) For ages,
Fractions and Decimals
Ronald W. Reagan 9 Jan 15 Make sure you have all documentation Use BLACK INK when completing forms.
Country update on AI situation Ly Sovann, MD, DTMH, MCTM Deputy of CDC Department, MoH, Cambodia.
This presentation is made available through a Creative Commons Attribution- Noncommercial license. Details of the license and permitted uses are available.
This presentation is made available through a Creative Commons Attribution- Noncommercial license. Details of the license and permitted uses are available.
Learn to identify rational numbers and place them on a number line. Course Fractions and Decimals.
Presented at the UCI Undergraduate Research Symposium by Rebecca Christensen May 15, 2004 Social Support and Foster-Care Children’s Adjustment: A Comparison.
EML Analysis Tools Introduction Ecoinformatics Working Group Taiwan Forestry Research Institute (TFRI)
Introduction to CIS 301 Professor Rosenthal Only required enterprise technology course.
OMICS international Contact us at: OMICS International through its Open Access Initiative is committed to make genuine and.
This presentation is made available through a Creative Commons Attribution- Noncommercial license. Details of the license and permitted uses are available.
OASIS O NLINE A CCESS TO S TUDENT I NFORMATION AND S CHEDULING.
Course Title: Using Epi Info™ 7 Using Classic Analysis (Continuation) April Epi Info™ 7 Training Software for Public Health Epi Info™ 7 Training.
Fitting Mathematical Models to Data Adapting Likelihood Based Inference Meaningful Modeling of Epidemiologic Data, 2010 AIMS, Muizenberg, South Africa.
HOW TO REQUEST ARTICLES FROM OTHER LIBRARIES IN SOUTH AFRICA Antoinette Lourens March 2005.
OMICS international Contact us at: OMICS International through its Open Access Initiative is committed to make genuine and.
OMICS international Contact us at: OMICS International through its Open Access Initiative is committed to make genuine and.
Place of Employment: University of California, Los Angeles (UCLA) Type of Work: Human population genetics, with emphasis on Sub-Saharan Africa Postdoctoral.
Riders on the Storm Rule 31 compliance in the first nine months.
Poster Title Weak D Types in the Egyptian Population Eiman Hussein, MD 1 and Jun Teruya, MD, DSc 2 Clinical Pathology, Division of Transfusion Medicine.
Research Colloquium on Post-School Education and Training 4 November 2014, Burgers Park Hotel Higher Education and Training Information System Jean Skene,
Why do you need to cite your sources? To give credit to the words and ideas of others. P a r a p h r a s e d T e x t Wordsworth extensively explored the.
TEMPLATE INSTRUCTIONS Please use these instructions to complete your TV Ad Find YOUR Department / Program template or use the general College of Arts &
Instruction This template should be used Only for The Best Employee Engagement category. Template can be modified, subject to your company template or.
Introduction to Statistics
HTW Berlin University of Applied Sciences
Employee engagement The Best Contact Center Indonesia 2018.
مناهــــج البحث العلمي
Data Quality Out of Range Values
MAINTENANCE TRAINING SCHEDULE
Professional Organizations
MONTH CYCLE BEGINS CYCLE ENDS DUE TO FINANCE JUL /2/2015
Let’s Talk About Variable Attributes
Value-Based Payments.
General Online Research Conference GOR February to 2 March 2018, TH Köln – University of Applied Sciences, Cologne, Germany First author, e.g. John.
General Online Research Conference GOR 19 6 March to 8 March 2019, TH Köln – University of Applied Sciences, Cologne, Germany First author, e.g. John.
By Sandeep Patil, Department of Computer Engineering, I²IT
More to Learn Viewing file details
SIRCAUS - BAU March 2018 title title title title title title title title title title title title title title title Author, Author, and Author Address(es)
World Geography Course Team Content and Testing Dates
Technology Title One-line Description Name, PhD Title Department
Value-Based Payments.
Presentation transcript:

This presentation is made available through a Creative Commons Attribution-Noncommercial license. Details of the license and permitted uses are available at © 2010 Dr. Juliet Pulliam and the Clinic on the Meaningful Modeling of Epidemiological Data Title: Introduction to Data Management and Cleaning Attribution: Dr. Juliet Pulliam, Clinic on the Meaningful Modeling of Epidemiological Data Source URL: inghttp://lalashan.mcmaster.ca/theobio/mmed/index.php/Introduction_to_data_management_and_clean ing For further information please contact Dr. Juliet Pulliam

Introduction to data management and cleaning Clinic on the Meaningful Modeling of Biological Data and BSc Honours Course in Biomathematics African Institute for the Mathematical Sciences Muizenberg, South Africa 28 May 2010 Dr. Juliet Pulliam RAPIDD Program, DIEPS Fogarty International Center, NIH, USA and Department of Ecology & Evolutionary Biology University of California at Los Angeles (UCLA)

Levels of data aggregation Aggregated data De-identified data Personally identifying data } Individual-level

Read in the dataset # cases.R # JRCP 06 Jan 2010 # # Last modified: rm(list=ls()) library(foreign) library(date) cases<-read.spss(”cases.sav",to.data.frame=T,trim.factor.names=T) dim(cases) [1] names(cases) [1] "idno" … "sex" … "age" "occup" "dor" "dtadm" "dtill" … "fever" "dtfever" … "cough" "dtcough" … "headache" … "dthead" … "outcome" … "dtdeath" … "igm1" "igg1" "pcr1ser" "dtserum1" "Igm2" "Igg2" "dtserum2" … "virusiso” … "CXR" "ARDS" …

Look at the dataset - format head(cases$dor) [1] cases$dor<-as.date(cases$dor/60/60/ ) range(cases$dor,na.rm=T) [1] 6Mar Mar2008

Look at the dataset - format head(cases$dor) [1] cases$dor<-as.date(cases$dor/60/60/ ) range(cases$dor,na.rm=T) [1] 6Mar Mar2008 Categorical data - factor (levels) Continuous data - numeric (range) Dates - date, POSIXct (range) Binary data - T, F

Look at the dataset - format Categorical data - factor (levels) Continuous data - numeric (range) Dates - date, POSIXct (range) Binary data - T, F "idno" … "sex" … "age" "occup" "dor" "dtadm" "dtill" … "fever" "dtfever" … "cough" "dtcough" … "headache" … "dthead" … "outcome" … "dtdeath" … "igm1" "igg1" "pcr1ser" "dtserum1" "Igm2" "Igg2" "dtserum2" … "virusiso” … "CXR" "ARDS" …

Look at the dataset - logic subset(cases,dor-dtill<0)$idno AT567 range(cases$dor-cases$dtill) [1] NA NA range(cases$dor-cases$dtill,na.rm=T) [1] hist(cases$dor-cases$dtill,20,xlab="Days from onset to report",main="",col=1)

Inconsistencies IDNO: SF999, Year: 2005 dtill given as 16 Feb 2005 dtfever given as 16 Feb 2003 Correction: Change year of dtfever from 2003 to 2005 IDNO: GW321, Year: 2003 dtill given as 12 Jan 2003 dtcough given as 12 Mar 2003 dthead given as 12 Mar 2003 dtdeath given as 15 Jan 2003 Correction: Change month of dtcough and dthead from 12 Mar to 12 Jan

Inconsistencies IDNO: LP309, Year: 2002 dtill given as 12 Aug 2002 dthead given as 11 Aug 2002 Correction: none IDNO: AT567, Year: 2005 dor given as 6 May 2005 dtill given as 14 May 2005 Correction: delete case (sample negative when tested at CDC)

Correct the dataset # DATABASE CORRECTIONS # cases$dtfever[cases$idno==”SF999"]<-as.date("16Jan2005") # Onset 2005; dtfever was 2003 cases$dthead[cases$idno==”GW321"]<-as.date("12Jan2003") # Onset/death Jan; dtfever was March cases$dtcough[cases$idno==”GW321"]<-as.date("12Jan2003") # Onset/death Jan; dtcough was March cases<-subset(cases,idno!=”AT567") # sample negative when tested at CDC Save corrected dataset as new file today<-strptime(Sys.time(),"%d%b%Y") fn<-paste("casesClean",today,".Rdata",sep="") fn [1] "casesClean07Jan2010.Rdata" save(cases,today,file=fn)