Data Entry and Assembly: Data Acquisition; Best Practices for Creating Data; Data Entry Options; Data Manipulation Options; Gathering Existing Data.

1 Data Entry and Assembly

2 Data Acquisition
• Best Practices for Creating Data
• Data Entry Options
• Data Manipulation Options
• Gathering Existing Data

3 Data Acquisition
After completing this lesson, the participant will be able to:
◦ Describe the characteristics of an easily understood and manipulated dataset
◦ Identify data entry tools and validation measures that can be applied as data is entered
◦ Define what a relational database is and why it is useful
◦ Describe documentation associated with using existing data

4 Data Acquisition
Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

5 Data Acquisition
• Whether recording data on paper forms or entering data digitally, accuracy and usability are important
• Have a plan for storing your data before you collect it
• The quality of your data records should be defensible
• Following proper procedures when creating data results in digital datasets that are:
  ◦ Valid
  ◦ Organized
  ◦ Easy to understand
  ◦ Easy to subset

6 Data Acquisition
Inconsistency between data collection events (and more than one activity type):
– Date in title or as a data element
– Column names vary
– Order of columns differs
Other issues:
– Inconsistent Date formats
– Different Site spellings
– Typos in Site spellings
– Poor column names (ex: Acult)
– Inappropriate column use (‘got away’, J, N in Acult column)
– Mean1 label is in Species column
– Text and numbers in same column

7 Data Acquisition
• Descriptive file name
• Column data types are consistent: only numbers, dates, or text
• Consistent names, codes, and formats (e.g., date) are used in each column
• Similar data (separate samples) are all stored in one table with a single format
  – computing is simpler with a single table than with multiple, unrelated small tables that require a lot of individual human manipulation and synchronization
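The rule that each column holds only one type of data can be checked mechanically. What follows is a minimal Python sketch, not part of the original lesson. The helper names `kind` and `mixed_columns` are hypothetical, and the sample rows reuse the small weight table from the quiz slide, where a text note was typed into a numeric column. Dates are simply treated as text here.

```python
import csv
import io

# Sample rows: the wght column mixes numbers with a text note.
RAW = """date,site,sp,wght
1/2/2010,1,1,0.2
2/5/2010,1,2,lost
3/2/2010,1,3,0.5
"""

def kind(value):
    """Classify one cell as 'empty', 'number', or 'text'."""
    value = value.strip()
    if value == "":
        return "empty"
    try:
        float(value)
    except ValueError:
        return "text"
    return "number"

def mixed_columns(text):
    """Return {column: sorted kinds} for columns holding more than one kind of value."""
    kinds = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            cell_kind = kind(value)
            if cell_kind != "empty":  # empty cells are allowed in any column
                kinds.setdefault(column, set()).add(cell_kind)
    return {column: sorted(found) for column, found in kinds.items() if len(found) > 1}

print(mixed_columns(RAW))  # {'wght': ['number', 'text']}
```

A check like this only reports the problem; deciding where the note belongs (a flag or comment column) is still up to the person who knows the data.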

8 Data Acquisition
• Create simple, consistently formatted file names. Consider using a date suffix in the name to help with version control, e.g. MyFieldData_20110910a.xls
• Avoid using -, +, *, ^, /, and other special or higher-ASCII characters in column names. Software may interpret them as operators or reserved characters.
• Create readable, descriptive column names without spaces or special characters:
  ◦ Soil T30 → Soil_Temp_30cm
  ◦ Species-Code → Species_Code
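The naming advice on this slide can be partly automated. Below is a minimal Python sketch under stated assumptions: `clean_column_name` and `dated_file_name` are hypothetical helpers, not something the lesson prescribes. The first only removes spaces and special characters; making a name descriptive (Soil T30 to Soil_Temp_30cm) still needs a person.

```python
import re
from datetime import date

def clean_column_name(raw):
    """Replace each run of characters other than letters, digits, or underscore with one underscore."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", raw.strip())
    return cleaned.strip("_")

def dated_file_name(stem, day, revision="a", ext="csv"):
    """Build a name such as MyFieldData_20110910a.xls (stem, YYYYMMDD date, revision letter)."""
    return f"{stem}_{day:%Y%m%d}{revision}.{ext}"

print(clean_column_name("Species-Code"))  # Species_Code
print(clean_column_name("Soil T30"))      # Soil_T30
print(dated_file_name("MyFieldData", date(2011, 9, 10), ext="xls"))  # MyFieldData_20110910a.xls
```

Writing the date as year, month, day makes file names sort in chronological order in a directory listing.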

9 Data Acquisition
  Date      Time  NO3_N_Conc  NO3_N_Conc_Flag
  20081011  1300  0.013
  20081011  1330  0.016
  20081011  1400              M
  20081011  1430  0.018
  20081011  1500  0.001       Est
M = missing; no sample collected
Est = estimated from grab sample
◦ Whenever possible, leave the value empty (NULL = no value)
◦ Use a Data_Flag column to qualify missing values (and other issues). Ex: NA, Est, Null (no qualification)
◦ For numeric fields, you might need to provide a distinct unlikely value such as 9999 to indicate a missing value
◦ For text fields, “NA” may be appropriate (for “Not Applicable” or “Not Available”)
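One way to see why an empty value plus a flag column is convenient: the value column stays purely numeric, and the flag explains each gap. The sketch below is illustrative Python, not part of the lesson. It retypes the slide's table as comma-separated text, and `read_records` is a hypothetical helper.

```python
import csv
import io

# The table from the slide, retyped as CSV; the 1400 reading is empty and flagged M.
RAW = """Date,Time,NO3_N_Conc,NO3_N_Conc_Flag
20081011,1300,0.013,
20081011,1330,0.016,
20081011,1400,,M
20081011,1430,0.018,
20081011,1500,0.001,Est
"""

def read_records(text):
    """Parse rows, turning an empty concentration into None and keeping the flag."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        raw_value = row["NO3_N_Conc"].strip()
        records.append({
            "Date": row["Date"],
            "Time": row["Time"],
            "NO3_N_Conc": float(raw_value) if raw_value else None,
            "Flag": row["NO3_N_Conc_Flag"].strip() or None,
        })
    return records

records = read_records(RAW)
measured = [r["NO3_N_Conc"] for r in records if r["NO3_N_Conc"] is not None]
mean_conc = sum(measured) / len(measured)
flagged = [(r["Time"], r["Flag"]) for r in records if r["Flag"]]

print(round(mean_conc, 3))  # 0.012
print(flagged)              # [('1400', 'M'), ('1500', 'Est')]
```

Had the missing reading been stored as 9999 in the value column, every summary would need to remember to exclude it; with an empty cell the gap cannot be mistaken for a measurement.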

10 Data Acquisition
• Enter complete lines of data; don’t assume a person or program will know to ‘fill in below’ for cells in a spreadsheet. Sorting can remove the best of orderly intentions!
• Also avoid creating columns that are rarely used
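The sorting hazard is easy to reproduce. The following Python sketch is only an illustration: the species codes come from the join example later in the lesson, the heights are made up, and `incomplete_rows` is a hypothetical helper for finding lines that were left for someone to "fill in below".

```python
# Hypothetical field sheet: the blank Site in the second row was meant as "same as above".
rows = [
    {"Site": "A", "Species": "BOGR2", "Height": 12},
    {"Site": "",  "Species": "HODR",  "Height": 7},
    {"Site": "B", "Species": "BOER4", "Height": 9},
]

def incomplete_rows(rows, required):
    """Return the indexes of rows with an empty value in any required column."""
    return [
        i for i, row in enumerate(rows)
        if any(row.get(col) in ("", None) for col in required)
    ]

print(incomplete_rows(rows, ["Site", "Species"]))  # [1]

# After sorting by Height the blank row comes first, so there is no row
# above it from which the intended Site could be recovered.
by_height = sorted(rows, key=lambda row: row["Height"])
print(by_height[0])  # {'Site': '', 'Species': 'HODR', 'Height': 7}
```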

11 Data Acquisition
• Document the (authoritative) data source
• Obtain and keep all metadata about the format, content, quality, and limitations of the data
• Document any data-sharing agreements or data-use constraints
• Before use, evaluate the data for its fitness to your needs
• Track the provenance of the data in derived data products
• Consider informing the data source of the use of their data

12 Data Acquisition
• Spreadsheets
• Databases
• Google Docs Forms

13 Data Acquisition
• Control data entry by using standardized pick lists
• Also look into using Excel’s automatic ‘data form’ feature

14 Data Acquisition
• Avoid entry errors by trapping values by range, type, etc.
• Let the spreadsheet do the work of warning you!
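The same idea, trapping values by pick list, type, and range at entry time, can be sketched outside a spreadsheet. This Python example is an assumption-laden illustration: the field names follow the soil temperature table used later in the lesson, while the allowed treatment codes, the temperature limits (taken to be degrees Celsius), and the `check_entry` helper are all hypothetical.

```python
from datetime import datetime

ALLOWED_TREATMENTS = {"C", "N", "R"}  # assumed pick list
TEMP_RANGE = (-40.0, 60.0)            # assumed plausible range, degrees Celsius

def check_entry(entry):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    try:
        datetime.strptime(entry["Date"], "%Y-%m-%d")
    except ValueError:
        problems.append("Date must be a real date written as YYYY-MM-DD")
    if entry["Treatment"] not in ALLOWED_TREATMENTS:
        problems.append("Treatment must be one of " + ", ".join(sorted(ALLOWED_TREATMENTS)))
    try:
        temperature = float(entry["Soil_Temperature"])
    except ValueError:
        problems.append("Soil_Temperature must be a number")
    else:
        low, high = TEMP_RANGE
        if not low <= temperature <= high:
            problems.append(f"Soil_Temperature must be between {low} and {high}")
    return problems

good = {"Date": "2010-02-01", "Treatment": "C", "Soil_Temperature": "12.8"}
bad = {"Date": "02/01/2010", "Treatment": "X", "Soil_Temperature": "128"}
print(check_entry(good))      # []
print(len(check_entry(bad)))  # 3
```

Returning every problem at once, rather than stopping at the first, lets the person entering data fix a whole line in one pass.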

15 Data Acquisition

16 Data Acquisition

17 Data Acquisition
Spreadsheets:
• Great for charts, graphs, calculations
• Poor choice for large or complex datasets
• Difficult to subset
• Flexible about cell content type (no type enforcement)
• Lack record integrity (columns can be sorted independently of each other)
• Easy to use, but harder to maintain as the complexity and size of the data grow (lots of repeats in records)
Databases:
• Not a math or graphics tool, but can provide data to those tools
• Work well with lots of data
• Easy to query and subset data
• Data fields are typed (e.g., only integers in integer fields)
• Columns cannot be sorted independently of each other (a good thing!)
• Slight learning curve compared to a spreadsheet
• Normalization reduces complexity and entry effort

18 Data Acquisition
• sets of well-defined tables
• formal relationships
• a query language for manipulating data
Site: *siteID, site_name, latitude, longitude, description
Species: *speciesID, species_name, common_name, family, order
SpeciesSample: *sampleID, siteID, sample_date, speciesID, height, flowering, flag, comments
• Each data value is stored in only one place
• Tables contain attributes describing a single thing (e.g., Site)
• Primary keys (marked with *) establish row identities and are shared between tables to link them
• Based on set theory, the query language provides a concise way of grouping, sorting, filtering, and summarizing
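The three tables on this slide can be written down as a schema. The sketch below uses Python's built-in sqlite3 module purely for illustration; the lesson does not name a database product, and the column types, the NOT NULL constraints, and the ISO-style sample date are assumptions. Note that the column called order has to be quoted because ORDER is an SQL keyword.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Site (
    siteID      TEXT PRIMARY KEY,
    site_name   TEXT,
    latitude    REAL,
    longitude   REAL,
    description TEXT
);
CREATE TABLE Species (
    speciesID    TEXT PRIMARY KEY,
    species_name TEXT,
    common_name  TEXT,
    family       TEXT,
    "order"      TEXT   -- quoted because ORDER is an SQL keyword
);
CREATE TABLE SpeciesSample (
    sampleID    INTEGER PRIMARY KEY,
    siteID      TEXT NOT NULL REFERENCES Site(siteID),
    sample_date TEXT,
    speciesID   TEXT NOT NULL REFERENCES Species(speciesID),
    height      REAL,
    flowering   TEXT,
    flag        TEXT,
    comments    TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Site (siteID, latitude, longitude) VALUES ('A', 34.1, -109.3)")
conn.execute("INSERT INTO Species (speciesID) VALUES ('BOGR2')")
conn.execute(
    "INSERT INTO SpeciesSample (siteID, sample_date, speciesID, flowering) "
    "VALUES ('A', '2010-02-13', 'BOGR2', 'y')"
)

# A sample that points at a site missing from the Site table is refused.
try:
    conn.execute(
        "INSERT INTO SpeciesSample (siteID, sample_date, speciesID) "
        "VALUES ('Z', '2010-02-13', 'BOGR2')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The refused insert is the "formal relationships" point in practice: a sample cannot refer to a site or species that has not been defined once, in its own table.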

19 Data Acquisition
Date, Site, Species, Height, Flowering
Advantages:
• quality control
• performance
• organization

20 Data Acquisition
  Date       Site  Species  Flowering?
  2/13/2010  A     BOGR2    y
  2/13/2010  B     HODR     y
  4/15/2010  B     BOER4    y
  4/15/2010  C     PLJA     n

  Site  Latitude  Longitude
  A     34.1      -109.3
  B     35.2      -108.6
  C     32.6      -107.5

  Date       Site  Species  Flowering?  Latitude  Longitude
  2/13/2010  A     BOGR2    y           34.1      -109.3
  2/13/2010  B     HODR     y           35.2      -108.6
  4/15/2010  B     BOER4    y           35.2      -108.6
  4/15/2010  C     PLJA     n           32.6      -107.5
Join tables and their data using queries
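The join shown on this slide can be reproduced with a single query. This is a small sketch using Python's built-in sqlite3 module as a stand-in for whatever database is actually used; the table names Samples and Sites and the column name Flowering (without the question mark) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites (Site TEXT PRIMARY KEY, Latitude REAL, Longitude REAL);
CREATE TABLE Samples (Date TEXT, Site TEXT REFERENCES Sites(Site), Species TEXT, Flowering TEXT);
""")
conn.executemany("INSERT INTO Sites VALUES (?, ?, ?)", [
    ("A", 34.1, -109.3),
    ("B", 35.2, -108.6),
    ("C", 32.6, -107.5),
])
conn.executemany("INSERT INTO Samples VALUES (?, ?, ?, ?)", [
    ("2/13/2010", "A", "BOGR2", "y"),
    ("2/13/2010", "B", "HODR", "y"),
    ("4/15/2010", "B", "BOER4", "y"),
    ("4/15/2010", "C", "PLJA", "n"),
])

# Each sample row picks up the coordinates of its site through the shared Site key.
joined = conn.execute("""
    SELECT s.Date, s.Site, s.Species, s.Flowering, t.Latitude, t.Longitude
    FROM Samples AS s
    JOIN Sites AS t ON t.Site = s.Site
    ORDER BY s.rowid
""").fetchall()

for row in joined:
    print(row)
```

The coordinates for site B are stored once but appear on two joined rows, which is the practical meaning of keeping each data value in only one place.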

21 Data Acquisition
  Date        Plot  Treatment  SensorDepth  Soil_Temperature
  2010-02-01  C     R          30           12.8
  2010-02-01  B     C          10           13.2
  2010-02-02  A     N          0            15.1

  SELECT Date, Plot, Treatment, SensorDepth, Soil_Temperature FROM SoilTemp WHERE Date > '2010-01-01'
OR
  SELECT * FROM SoilTemp WHERE Treatment = 'N' AND SensorDepth = '0'
OR
  SELECT MAX(Soil_Temperature) FROM SoilTemp
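To try the three queries from this slide, the table can be loaded into an in-memory database. The sketch below uses Python's built-in sqlite3 module; the column types are assumptions, the temperature column is written Soil_Temperature throughout, and SensorDepth is compared as a number rather than as quoted text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SoilTemp (
        Date TEXT, Plot TEXT, Treatment TEXT,
        SensorDepth INTEGER, Soil_Temperature REAL
    )
""")
conn.executemany("INSERT INTO SoilTemp VALUES (?, ?, ?, ?, ?)", [
    ("2010-02-01", "C", "R", 30, 12.8),
    ("2010-02-01", "B", "C", 10, 13.2),
    ("2010-02-02", "A", "N", 0, 15.1),
])

# Subset by treatment and sensor depth.
surface_n = conn.execute(
    "SELECT * FROM SoilTemp WHERE Treatment = 'N' AND SensorDepth = 0"
).fetchall()

# Summarize: the highest recorded temperature.
warmest = conn.execute("SELECT MAX(Soil_Temperature) FROM SoilTemp").fetchone()[0]

# Filter by date; year-month-day text sorts in calendar order, so > works on it.
since_2010 = conn.execute(
    "SELECT Date, Plot, Treatment, SensorDepth, Soil_Temperature "
    "FROM SoilTemp WHERE Date > '2010-01-01' ORDER BY Date, Plot"
).fetchall()

print(surface_n)        # [('2010-02-02', 'A', 'N', 0, 15.1)]
print(warmest)          # 15.1
print(len(since_2010))  # 3
```

The date filter works as a plain text comparison only because the dates are written year first; the same comparison on dates written as 2/13/2010 would not sort correctly.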

22 Data Acquisition
• SAS, R, SPSS
• Good for calculations, data analysis, subsetting data
• Also can be used for quality assurance

23 Data Acquisition
• Be aware of best practices when designing data file structures
• Make sure you adequately document data that you obtain from others, and track the use of those data in your derived data products
• Choose a data entry method that allows some validation of data as it is entered
• Consider investing time in learning how to use a database if datasets are large or complex

24 Data Acquisition
• Les A. Hook, Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson. Best Practices for Preparing Environmental Data Sets to Share and Archive. September 2010. http://daac.ornl.gov/PI/BestPractices-2010.pdf

25 START QUIZ

26 Data Acquisition
  date      site  sp  wght
  1/2/2010  1     1   0.2
  2/5/2010  1     2   lost
  3/2/2010  1     3   0.5

  date      site  species  weight_gm
  1/2/2010  1     1        0.2
  2/5/2010  1     2        Missing – eaten?
  3/2/2010  1     3        0.5

  date      site  species            weight_gm  flag               comment
  1/2/2010  1     Acacia constricta  0.2
  2/5/2010  1     Acacia greggii                Missing from plot  Eaten?
  3/2/2010  1     Acacia redolens    0.5

  date      site  sp  wt   flag               comment
  1/2/2010  1     Ac  0.2
  2/5/2010  1     Ag       Missing from plot  Eaten?
  3/2/2010  1     Ar  0.5

29 Data Acquisition consistent missing different special characters in the

32 Data Acquisition “value missing” * - 9999

35 Data Acquisition
• valid or void
• null or NA
• missing or void
• missing or NA

38 Data Acquisition
• periods
• data flags
• dashes
• brackets

41 Data Acquisition
• numbers or special characters
• spaces or special characters
• dashes or lengthy descriptions
• errors or spaces

44 Data Acquisition
• Bird population monitoring in a river valley that will go on for several years
• Greenhouse plant growth study with a 3x3 factorial design and a one-time destructive sampling of biomass
• Nitrogen concentrations in leaves from three different tree species in spring of 2010
