Published by Silvia Reed. Modified over 9 years ago.
1
Data Entry and Assembly
2
Data Acquisition
◦ Best Practices for Creating Data
◦ Data Entry Options
◦ Data Manipulation Options
◦ Gathering Existing Data
3
Data Acquisition
After completing this lesson, the participant will be able to:
◦ Describe the characteristics of an easily understood and manipulated dataset
◦ Identify data entry tools and validation measures that can be applied as data are entered
◦ Define what a relational database is and why it is useful
◦ Describe documentation associated with using existing data
4
Data Acquisition
Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze
5
Data Acquisition
Whether recording data on paper forms or entering data digitally, accuracy and usability are important.
Have a plan for storing your data before you collect it.
The quality of your data records should be defensible.
Following proper procedures when creating data results in digital datasets that are:
◦ Valid
◦ Organized
◦ Easy to understand
◦ Easy to subset
6
Data Acquisition
Inconsistency between data collection events (and more than one activity type):
– Date in title or as a data element
– Column names vary
– Order of columns differs
Other issues:
– Inconsistent date formats
– Different site spellings
– Typos in site spellings
– Poor column names (ex: Acult)
– Inappropriate column use ('got away', J, N in Acult column)
– Mean1 label is in Species column
– Text and numbers in same column
7
Data Acquisition
Descriptive file name
Column data types are consistent: only numbers, dates, or text
Consistent names, codes, and formats (e.g., date) used in each column
Similar data (separate samples) are all stored in one table with a single format
– Computing is simpler using a single table than multiple, unrelated small tables that require a lot of individual human manipulation and synchronization
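A consistent column type is easy to check before a file is shared. The following is a minimal Python sketch, not part of the original presentation: the helper name `non_numeric_entries` and the sample values are hypothetical, and empty cells are treated as missing rather than as errors.

```python
def non_numeric_entries(values):
    """Return (row_index, value) pairs for entries that do not parse as numbers.

    Empty strings are skipped: an empty cell means "no value", not a type error.
    """
    problems = []
    for index, value in enumerate(values):
        text = value.strip()
        if text == "":
            continue
        try:
            float(text)
        except ValueError:
            problems.append((index, value))
    return problems


# Hypothetical contents of a count column that mixes text and numbers.
count_column = ["3", "12", "got away", "J", "", "7", "N"]

for row, value in non_numeric_entries(count_column):
    print(f"row {row}: {value!r} is not a number")
```

Entries such as 'got away' belong in a separate comment or flag column, which leaves the count column purely numeric.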
8
Data Acquisition
Create simple, consistently formatted file names. Consider using a date suffix in the name to help with version control.
Example: MyFieldData_20110910a.xls
Avoid using -, +, *, ^, /, and other special or higher ASCII characters in column names. Software may interpret them as operators or reserved characters.
Create readable, descriptive column names without spaces or special characters:
◦ Soil T30 → Soil_Temp_30cm
◦ Species-Code → Species_Code
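The column-name advice can be applied mechanically. A minimal Python sketch follows; the helpers `clean_column_name` and `is_clean` are hypothetical names introduced here for illustration. The cleaner only swaps problem characters for underscores. It cannot make a name more descriptive, so `Soil T30` becomes `Soil_T30`, and expanding that to `Soil_Temp_30cm` remains a human decision.

```python
import re


def clean_column_name(name: str) -> str:
    """Replace each run of characters other than letters, digits, and
    underscores with a single underscore, then trim stray underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    return cleaned.strip("_")


def is_clean(name: str) -> bool:
    """True if the name starts with a letter and holds only letters,
    digits, and underscores."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None


for raw in ["Soil T30", "Species-Code", "Soil_Temp_30cm"]:
    print(f"{raw!r}: clean={is_clean(raw)}, suggestion={clean_column_name(raw)!r}")
```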
9
Data Acquisition
Date | Time | NO3_N_Conc | NO3_N_Conc_Flag
20081011 | 1300 | 0.013 |
20081011 | 1330 | 0.016 |
20081011 | 1400 | | M
20081011 | 1430 | 0.018 |
20081011 | 1500 | 0.001 | Est
M = missing; no sample collected
Est = estimated from grab sample
◦ Whenever possible, leave the value empty (NULL = no value)
◦ Use a Data_Flag column to qualify missing values (and other issues). Ex: NA, Est, Null (no qualification)
◦ For numeric fields, you might need to provide a distinct unlikely value such as 9999 to indicate a missing value
◦ For text fields, "NA" may be appropriate (for "Not Applicable" or "Not Available")
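To show how an empty value plus a flag column behaves downstream, here is a minimal Python sketch. It assumes the table above has been saved as comma-separated text with `Date` and `Time` as separate columns (an assumption about the original layout); the function name `read_records` is hypothetical.

```python
import csv
import io

# Assumed CSV rendering of the slide's example table.
RAW = """Date,Time,NO3_N_Conc,NO3_N_Conc_Flag
20081011,1300,0.013,
20081011,1330,0.016,
20081011,1400,,M
20081011,1430,0.018,
20081011,1500,0.001,Est
"""


def read_records(text):
    """Parse rows, mapping an empty concentration to None (no value)
    and an empty flag to None (no qualification)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["NO3_N_Conc"]
        records.append({
            "date": row["Date"],
            "time": row["Time"],
            "conc": float(value) if value != "" else None,
            "flag": row["NO3_N_Conc_Flag"] or None,
        })
    return records


records = read_records(RAW)

# Missing values are skipped explicitly instead of being averaged in
# as a sentinel such as 9999.
measured = [r["conc"] for r in records if r["conc"] is not None]
mean_conc = sum(measured) / len(measured)
print(f"{len(measured)} of {len(records)} rows measured, mean = {mean_conc:.3f}")
```

Because the missing row holds no number at all, it cannot silently distort a mean, which is the main risk of sentinel values like 9999.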
10
Data Acquisition
Enter complete lines of data; don't assume a person or program will know to 'fill in below' for cells in a spreadsheet.
Sorting can remove the best of orderly intentions!
Also avoid creating columns that are rarely used.
11
Data Acquisition
◦ Document the (authoritative) data source
◦ Obtain and keep all metadata about the format, content, quality, and limitations of the data
◦ Document any data-sharing agreements or data-use constraints
◦ Before use, evaluate the data for its fitness to your needs
◦ Track the provenance of the data in derived data products
◦ Consider informing the data source of the use of their data
12
Data Acquisition
◦ Spreadsheets
◦ Databases
◦ Google Docs Forms
13
Data Acquisition
Control data entry by using standardized pick lists.
Also look into using Excel's automatic 'data form' feature.
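Outside a spreadsheet, a pick list amounts to a membership test against a controlled vocabulary. A minimal Python sketch, assuming a hypothetical list of species codes (the codes are taken from the sample table later in this presentation) and a hypothetical helper `check_pick_list`:

```python
# Hypothetical controlled vocabulary of species codes.
SPECIES_PICK_LIST = {"BOGR2", "HODR", "BOER4", "PLJA"}


def check_pick_list(value, allowed):
    """Return the normalized value if it is on the pick list,
    otherwise raise ValueError so the entry is rejected at once."""
    candidate = value.strip().upper()
    if candidate not in allowed:
        raise ValueError(f"{value!r} is not in the pick list {sorted(allowed)}")
    return candidate


print(check_pick_list(" bogr2 ", SPECIES_PICK_LIST))
```

Normalizing case and whitespace before the test stops variant spellings of the same code from entering the dataset.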
14
Data Acquisition
Avoid entry errors by trapping values by range, type, etc.
Let the spreadsheet do the work of warning you!
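The same kind of trap can be written as a small function when data are entered through a script instead of a spreadsheet. This is a minimal Python sketch; the function name `validate_soil_temperature` and the bounds of -40 to 60 are hypothetical and should be replaced by limits that make sense for the measurement.

```python
def validate_soil_temperature(raw, low=-40.0, high=60.0):
    """Trap an entry by type and by range.

    Returns (value, None) when the entry is acceptable, or
    (None, message) describing why it was rejected.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None, f"not a number: {raw!r}"
    # NaN and infinity fail this comparison and are rejected too.
    if not low <= value <= high:
        return None, f"outside [{low}, {high}]: {value}"
    return value, None


for entry in ["12.8", "128", "got away"]:
    value, problem = validate_soil_temperature(entry)
    print(entry, "->", value if problem is None else f"REJECTED ({problem})")
```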
17
Data Acquisition
Spreadsheets:
◦ Great for charts, graphs, calculations
◦ Poor choice for large or complex datasets
◦ Difficult to subset
◦ Flexible about cell content type (no type enforcement)
◦ Lack record integrity (can independently sort columns relative to other columns)
◦ Easy to use, but harder to maintain as complexity and size of data grow (lots of repeats in records)
Databases:
◦ Not a math or graphics tool, but can provide data to those tools
◦ Work well with lots of data
◦ Easy to query and subset data
◦ Data fields are typed: only integers in integer fields
◦ Columns cannot be sorted independently of each other (a good thing!!)
◦ A learning curve compared to a spreadsheet
◦ Normalization reduces complexity and entry effort
18
Data Acquisition
A relational database has:
◦ sets of well-defined tables
◦ formal relationships
◦ a query language for manipulating data
Site: *siteID, site_name, latitude, longitude, description
Species: *speciesID, species_name, common_name, family, order
SpeciesSample: *sampleID, siteID, sample_date, speciesID, height, flowering, flag, comments
◦ Each data value is stored in only one place
◦ Tables contain attributes describing a single thing (Site)
◦ Primary keys (marked with *) establish row identities, and are shared between tables to link them
◦ Based on set theory, the query language provides a concise way of grouping, sorting, filtering, and summarizing
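The three tables can be sketched as a working schema. The example below uses Python's built-in `sqlite3` module with an in-memory database. The column types, the `NOT NULL` choices, and the helper `try_insert_sample` are assumptions made for illustration; only the table and column names come from the slide. Note that `order` is a reserved word in SQL and has to be quoted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Site (
    siteID      INTEGER PRIMARY KEY,
    site_name   TEXT NOT NULL,
    latitude    REAL,
    longitude   REAL,
    description TEXT
);
CREATE TABLE Species (
    speciesID    INTEGER PRIMARY KEY,
    species_name TEXT NOT NULL,
    common_name  TEXT,
    family       TEXT,
    "order"      TEXT
);
CREATE TABLE SpeciesSample (
    sampleID    INTEGER PRIMARY KEY,
    siteID      INTEGER NOT NULL REFERENCES Site(siteID),
    sample_date TEXT NOT NULL,
    speciesID   INTEGER NOT NULL REFERENCES Species(speciesID),
    height      REAL,
    flowering   TEXT,
    flag        TEXT,
    comments    TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(SCHEMA)

# Placeholder rows; the names are illustrative only.
conn.execute("INSERT INTO Site (siteID, site_name) VALUES (1, 'A')")
conn.execute("INSERT INTO Species (speciesID, species_name) VALUES (1, 'BOGR2')")


def try_insert_sample(site_id, species_id):
    """Insert a sample row; return False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO SpeciesSample (siteID, sample_date, speciesID) VALUES (?, ?, ?)",
            (site_id, "2010-02-13", species_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert_sample(1, 1))   # site 1 and species 1 exist
print(try_insert_sample(99, 1))  # site 99 does not exist
```

The second insert is refused because the shared keys are enforced by the database itself, not by the person typing.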
19
Data Acquisition
Date | Site | Species | Height | Flowering
Advantages:
◦ quality control
◦ performance
◦ organization
20
Data Acquisition
Join tables and their data using queries

Date | Site | Species | Flowering?
2/13/2010 | A | BOGR2 | y
2/13/2010 | B | HODR | y
4/15/2010 | B | BOER4 | y
4/15/2010 | C | PLJA | n

Site | Latitude | Longitude
A | 34.1 | -109.3
B | 35.2 | -108.6
C | 32.6 | -107.5

Date | Site | Species | Flowering? | Latitude | Longitude
2/13/2010 | A | BOGR2 | y | 34.1 | -109.3
2/13/2010 | B | HODR | y | 35.2 | -108.6
4/15/2010 | B | BOER4 | y | 35.2 | -108.6
4/15/2010 | C | PLJA | n | 32.6 | -107.5
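The join shown on this slide can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch: the table names `Samples` and `Sites` are assumptions (the slide does not name the tables), and the dates are rewritten in ISO format so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites (
    Site      TEXT PRIMARY KEY,
    Latitude  REAL,
    Longitude REAL
);
CREATE TABLE Samples (
    Date      TEXT,
    Site      TEXT REFERENCES Sites(Site),
    Species   TEXT,
    Flowering TEXT
);
""")

conn.executemany(
    "INSERT INTO Sites VALUES (?, ?, ?)",
    [("A", 34.1, -109.3), ("B", 35.2, -108.6), ("C", 32.6, -107.5)],
)
conn.executemany(
    "INSERT INTO Samples VALUES (?, ?, ?, ?)",
    [
        ("2010-02-13", "A", "BOGR2", "y"),
        ("2010-02-13", "B", "HODR", "y"),
        ("2010-04-15", "B", "BOER4", "y"),
        ("2010-04-15", "C", "PLJA", "n"),
    ],
)

# Coordinates are stored once per site and attached to samples on demand.
joined = conn.execute("""
    SELECT Samples.Date, Samples.Site, Samples.Species, Samples.Flowering,
           Sites.Latitude, Sites.Longitude
    FROM Samples
    JOIN Sites ON Samples.Site = Sites.Site
    ORDER BY Samples.Date, Samples.Site
""").fetchall()

for row in joined:
    print(row)
```

Site B's coordinates appear in two joined rows but are stored only once, so correcting them means editing a single record.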
21
Data Acquisition
Date | Plot | Treatment | SensorDepth | Soil_Temperature
2010-02-01 | C | R | 30 | 12.8
2010-02-01 | B | C | 10 | 13.2
2010-02-02 | A | N | 0 | 15.1

Select Date, Plot, Treatment, SensorDepth, Soil_Temperature from SoilTemp where Date > '2010-01-01'
OR
Select * from SoilTemp where Treatment = 'N' and SensorDepth = 0
OR
Select Max(Soil_Temperature) from SoilTemp
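These queries can be tried without installing a database server. The sketch below loads the three example rows into an in-memory SQLite database through Python's built-in `sqlite3` module. The column types are assumptions, and the temperature column is assumed to be named `Soil_Temperature` with no space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SoilTemp (
        Date             TEXT,
        Plot             TEXT,
        Treatment        TEXT,
        SensorDepth      INTEGER,
        Soil_Temperature REAL
    )
""")
conn.executemany(
    "INSERT INTO SoilTemp VALUES (?, ?, ?, ?, ?)",
    [
        ("2010-02-01", "C", "R", 30, 12.8),
        ("2010-02-01", "B", "C", 10, 13.2),
        ("2010-02-02", "A", "N", 0, 15.1),
    ],
)

# Subset by date: ISO dates stored as text compare correctly as strings.
after_jan = conn.execute(
    "SELECT Date, Plot, Treatment, SensorDepth, Soil_Temperature "
    "FROM SoilTemp WHERE Date > '2010-01-01'"
).fetchall()

# Subset by treatment and depth, passing the values as parameters.
surface_n = conn.execute(
    "SELECT * FROM SoilTemp WHERE Treatment = ? AND SensorDepth = ?",
    ("N", 0),
).fetchall()

# Summarize: highest recorded soil temperature.
max_temp = conn.execute("SELECT MAX(Soil_Temperature) FROM SoilTemp").fetchone()[0]

print(len(after_jan), surface_n, max_temp)
```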
22
Data Acquisition
SAS, R, SPSS
◦ Good for calculations, data analysis, subsetting data
◦ Also can be used for quality assurance
23
Data Acquisition
◦ Be aware of best practices when designing data file structures
◦ Make sure you adequately document data that you obtain from others, and track the use of those data in your derived data products
◦ Choose a data entry method that allows some validation of data as it is entered
◦ Consider investing time in learning how to use a database if datasets are large or complex
24
Data Acquisition
Les A. Hook, Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson. Best Practices for Preparing Environmental Data Sets to Share and Archive. September 2010. http://daac.ornl.gov/PI/BestPractices-2010.pdf
25
START QUIZ
26
Data Acquisition

date | site | sp | wght
1/2/2010 | 1 | 1 | 0.2
2/5/2010 | 1 | 2 | lost
3/2/2010 | 1 | 3 | 0.5

date | site | species | weight_gm
1/2/2010 | 1 | 1 | 0.2
2/5/2010 | 1 | 2 | Missing – eaten?
3/2/2010 | 1 | 3 | 0.5

date | site | species | weight_gm | flag | comment
1/2/2010 | 1 | Acacia constricta | 0.2 | |
2/5/2010 | 1 | Acacia greggii | | Missing from plot | Eaten?
3/2/2010 | 1 | Acacia redolens | 0.5 | |

date | site | sp | wt | flag | comment
1/2/2010 | 1 | Ac | 0.2 | |
2/5/2010 | 1 | Ag | | Missing from plot | Eaten?
3/2/2010 | 1 | Ar | 0.5 | |
29
Data Acquisition consistent missing different special characters in the
32
Data Acquisition
◦ “value missing”
◦ *
◦ -9999
35
Data Acquisition
◦ valid or void
◦ null or NA
◦ missing or void
◦ missing or NA
38
Data Acquisition
◦ periods
◦ data flags
◦ dashes
◦ brackets
41
Data Acquisition
◦ numbers or special characters
◦ spaces or special characters
◦ dashes or lengthy descriptions
◦ errors or spaces
44
Data Acquisition
◦ Bird population monitoring in a river valley that will go on for several years
◦ Greenhouse plant growth study with a 3x3 factorial design and a one-time destructive sampling of biomass
◦ Nitrogen concentrations in leaves from three different tree species in spring of 2010