Download presentation
Presentation is loading. Please wait.
Published byKrista Gaskell Modified over 9 years ago
1
Working with Statisticians At some point, a statistician is likely to be asked to analyze your data. This can lead to much unhappiness.
2
STATISTICIANS COME IN MANY SHAPES AND SIZES
4
BUT
6
Data formats Ideally, use a normalized database with validated data entry as part of LIMS… But 99% of the time => Excel spreadsheet Some statisticians prefer to work with raw data (i.e. FCS files) but not common – Scott will cover consistent annotation for raw data at another lecture
7
Basic principle #1 Statisticians do not like Excel – The first thing they will try to do is export to a CSV or delimited file, for import into SAS or R – If this is difficult to do, they will not like you
8
Excel rules for happy statisticians 1 worksheet = 1 table 1 cell = 1 value Data? Metadata? Formatting? Validation?
9
1 worksheet = 1 table A table has column headers and a number of rows and nothing else – it is RECTANGULAR Do not put more than 1 table in a worksheet Do not use non-rectangular tables Example of good worksheet
10
1 worksheet = 1 table
11
1 cell = 1 value Easy to filter by tube, sample or subject Easy to write validation rules or lookup table
12
1 cell = 1 value ID column has 3 different values Need to do text parsing to recover information – very error prone
13
Data: column names Consistent column names across worksheets – Singlets/Lymphocytes – Singlet/Lymphs – Singlets / Lymphocytes – Singlets/Lymphoctyes Use full gating path for column name – Singlets/Lymphocytes/Viable/CD4+/CM/IFN+
14
Data: What to record Better to have more data than less data – Sample type (PBMC, whole blood) – Recovery – Viability Better to have basic than derived data – Counts better than relative frequencies Keep link to raw data for reproducibility – Path to FCS file on server Use special indicator for missing data (e.g. NAN), not zero Can have extra column for notes – Ideally codified so Error 23 rather than “Sample sat > 8 hours before processing”
15
Data: Versioning Do not change the data in the worksheet once it has been handed to statistician. If there are errors that must be corrected, make a new copy, label the filename with date and version, and send that to statistician – ArcticRatExperiment_07May2013_Version01.xlsx – ArcticRatExperiment_17May2013_Version02.xlsx
16
Metadata Should have SOP document for metadata – How missing data is represented (e.g. NA or blank) – Keys for interpretation – e.g. Table of error codes – Contact person: phone #, email – Metadata can be in 2 nd worksheet or separate document Gating scheme with labeled gates matching cell subsets used in column names (PDF or PPT) Panel information – Antibodies, clones, batches, fluorochromes, peptide mixes Path to Flowjo.jo or.xml analysis file
17
Metadata There are minimal information standards that should be followed – MiFlowcyte – MIATA Google for them if you’re not familiar with them – increasingly these are required by journals for publication, so worth making it an SOP for documentation of results
18
Formatting Don’t do it. Avoid putting information via: – Highlighting – Fancy spacing – Different fonts and font effects – Merging cells – Comments Will it survive a round-trip from Excel to CSV and back again?
19
Formatting - Before
20
Formatting - After Comments are lost Highlighting is lost Bad cell formatting is lost Merged cells become missing information
21
Validation Can set up validation rules in Excel to minimize data entry errors: – Number range (0, 10000000) Can use lookup tables for codes to use – E.g. Error codes with explanation If possible, once format for data is decided, get local Excel wizard to create template and lookup rules
22
Questions? If no questions and need to kill time, watch Biologist talks to Statistician video – http://www.youtube.com/watch?v=Hz1fyhVOjr4 http://www.youtube.com/watch?v=Hz1fyhVOjr4
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.