Introduction to Ecoinformatics: Past, Present & Future William Michener LTER Network Office, University of New Mexico January 2007.


1 Introduction to Ecoinformatics: Past, Present & Future
William Michener, LTER Network Office, University of New Mexico
January 2007

2 Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

4 Ecoinformatics
A broad science and technology (S&T) discipline that incorporates both concepts and practical tools for the understanding, generation, processing, and propagation of ecological data, information and knowledge.

5 Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

6 Many studies employ a restricted scale of observation, commonly 1 m².
The literature is biased toward single- and small-scale results.

7 Thinking Outside the “Box”
[Figure: axes labeled Space, Time and Parameters; programs shown: LTER, Biocomplexity, and NEON, WATERS, OOI, …]
Increase in breadth and depth of understanding

8 Grand environmental challenges
[Figure: items labeled 1998, 2000, 2001, 2003 and 2004]

9 More and more of the ecological questions that confront society are national, continental and global in scope.
[Figures: an image credited "Source: CDC"; an image titled "Drought", credited "Source: Drought Monitor"]

10 LTER
26 NSF LTER sites in the U.S. and the Antarctic: > 1,600 scientists; 6,000+ data sets with different themes, methods, units, structure, …

12 NEON Climate Domains
[Figure: map of 20 numbered domains]
1 Northeast
2 Mid Atlantic
3 Southeast
4 Atlantic Neotropical
5 Great Lakes
6 Prairie Peninsula
7 Appalachians / Cumberland Plateau
8 Ozarks Complex
9 Northern Plains
10 Central Plains
11 Southern Plains
12 Northern Rockies
13 Southern Rockies / Colorado Plateau
14 Desert Southwest
15 Great Basin
16 Pacific Northwest
17 Pacific Southwest
18 Tundra
19 Taiga
20 Pacific Tropical

13 Aquatic Arrays BioMesonet Tower and Sensor Arrays

14 Soil Sensor Arrays Micron-scale nitrate ISE

15 Small-Organism Tracking: Mobile animals as bio-sentinels for environmental change, forecasting biological invasions, emerging disease spread

16 Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

17 Characteristics of Ecological Data
[Figure: data types plotted by Complexity/Metadata Requirements (low to high) against Data Volume per data set (low to high). Items shown: Satellite Images, Soil Cores, Primary Productivity, GIS, Population Data, Biodiversity Surveys, Gene Sequences, Business Data, Weather Stations. Regions labeled "Most Ecological Data" and "Most Software".]

18 Data Entropy (Michener et al. 1997)
[Figure: Information Content plotted against Time. Labels along the curve: Time of publication; Specific details; General details; Accident; Retirement or career change; Death.]

19 Semantics
[Figure: data sets A, B and C linked by a schema transform, a coding transform, a taxon lookup and a semantic transform]
Imagine scaling!!

20 Semantics: Linking Taxonomic Semantics to Ecological Data
[Figure, from R. Peet: treatments of Rhynchospora plumosa s.l. by Elliot 1816, Gray 1834, Chapman 1860, Kral 1998 and Peet 2002?. Names shown: R. plumosa; R. plumosa v. intermedia; R. plumosa v. plumosa; R. plumosa v. interrupta; R. intermedia; R. pineticola; R. plumosa v. pineticola; R. sp. 1. Labels A, B, C.]
• Taxon concepts change over time (and space)
• Multiple competing concepts coexist
• Names are re-used for multiple concepts

21 What Users Really Want…

22 Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

23 [Figure: data processing flowchart. Elements: Research Program; Investigators; Studies; Experimental Design; Methods; Data Design; Data Forms; Data Entry; Field Computer Entry; Electronically Interfaced Field Equipment; Electronically Interfaced Lab Equipment; Raw Data File; Quality Assurance Checks; Data Contamination; decision "Data verified?" (yes / no); Data Validated; Archive Data File; Archival Mass Storage (Magnetic Tape / Optical Disk / Printouts); Off-site Storage; Access Interface; Secondary Users; Summary Analyses; Quality Control; Metadata; Synthesis; Publication. Also shown: Standard Operating Procedures; Policies (data sharing, computer use, archive storage).]

24 Ecoinformatics solutions
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

25 Ecoinformatics solutions
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

26 Data Design
• Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval and manipulation.

27 Database Types (Porter 2000)
• File-system based
• Hierarchical
• Relational
• Object-oriented
• Hybrid (e.g., combination of relational and object-oriented schema)

28 Data Design: 7 Best Practices (from Cook et al. 2000)
1. Assign descriptive file names
2. Use consistent and stable file formats
3. Define the parameters
4. Use consistent data organization
5. Perform basic quality assurance
6. Assign descriptive data set titles
7. Provide documentation (metadata)

29 1. Assign descriptive file names
• File names should be unique and reflect the file contents
• Bad file names: Mydata, 2001_data
• A better file name: Sevilleta_LTER_NM_2001_NPP.asc
  - Sevilleta_LTER is the project name
  - NM is the state abbreviation
  - 2001 is the calendar year
  - NPP represents Net Primary Productivity data
  - asc stands for the file type (ASCII)
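
The naming convention on this slide can be sketched in a few lines of Python. This is an illustration only, not part of the original presentation: the names `FileNameParts`, `build_file_name` and `parse_file_name` are hypothetical, and the field order (project, state, year, parameter, file type) is simply read off the Sevilleta example.

```python
from typing import NamedTuple


class FileNameParts(NamedTuple):
    project: str    # e.g. "Sevilleta_LTER"
    state: str      # state abbreviation, e.g. "NM"
    year: int       # calendar year, e.g. 2001
    parameter: str  # short parameter code, e.g. "NPP"
    extension: str  # file type, e.g. "asc" for ASCII


def build_file_name(parts: FileNameParts) -> str:
    """Join the parts into a name like Sevilleta_LTER_NM_2001_NPP.asc."""
    stem = "_".join([parts.project, parts.state, str(parts.year), parts.parameter])
    return f"{stem}.{parts.extension}"


def parse_file_name(name: str) -> FileNameParts:
    """Split a name produced by build_file_name back into its parts.

    The project name may itself contain underscores, so the last three
    underscore-separated fields are read from the right.
    """
    stem, extension = name.rsplit(".", 1)
    project, state, year, parameter = stem.rsplit("_", 3)
    return FileNameParts(project, state, int(year), parameter, extension)
```

Building names from structured parts, rather than typing them by hand, is one way to keep them unique and consistent across a project.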

30 2. Use consistent and stable file formats
• Use ASCII file formats; avoid proprietary formats
• Be consistent in formatting:
  - don’t change or re-arrange columns
  - include header rows (first row should contain file name, data set title, author, date, and companion file names)
  - column headings should describe the content of each column, including one row for parameter names and one for parameter units
  - within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference)
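
A minimal Python sketch of the header layout described on this slide follows. It assumes comma delimiters (the slide's first preference) and three header rows; the function name `write_ascii_table` and every value in the first header row are made up for illustration.

```python
import csv
import io


def write_ascii_table(out, file_info, names, units, rows):
    """Write a comma-delimited ASCII table with three header rows.

    Row 1: file name, data set title, author, date, companion file names.
    Row 2: parameter names.  Row 3: parameter units.
    """
    if len(names) != len(units):
        raise ValueError("each parameter needs exactly one unit entry")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(file_info)
    writer.writerow(names)
    writer.writerow(units)
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row does not match the column layout")
        writer.writerow(row)


buffer = io.StringIO()
write_ascii_table(
    buffer,
    # Hypothetical file-level header values, for illustration only.
    ["example_1996_met.asc", "Example data set title", "A. Author",
     "19961231", "example_1996_met_readme.asc"],
    ["Station", "Date", "Temp", "Precip"],
    ["Units", "YYYYMMDD", "C", "mm"],
    [["HOGI", "19961001", 12, 0], ["HOGI", "19961002", 14, 3]],
)
text = buffer.getvalue()
```

Checking every row against the column layout before writing is one way to guarantee that columns are never silently changed or re-arranged.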

31 3. Define the parameters
• Use commonly accepted parameter names that describe the contents
  - e.g., precip for precipitation
• Use consistent capitalization
  - e.g., not temp, Temp, and TEMP in the same file
• Explicitly state units of reported parameters in the data file and the metadata
  - SI units are recommended
• Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file
  - e.g., use yyyymmdd; January 2, 1999 is 19990102
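
Two of these rules are easy to check mechanically. The Python sketch below is an illustration under stated assumptions, not part of the original slides: the helper names are hypothetical, dates use the yyyymmdd format from the slide, and "inconsistent" is taken to mean names that differ only in capitalization.

```python
from datetime import date, datetime


def to_yyyymmdd(d: date) -> str:
    """Format a date as yyyymmdd, e.g. January 2, 1999 becomes 19990102."""
    return d.strftime("%Y%m%d")


def from_yyyymmdd(text: str) -> date:
    """Parse a yyyymmdd string; raises ValueError if the text does not match."""
    return datetime.strptime(text, "%Y%m%d").date()


def inconsistent_names(names):
    """Return groups of parameter names that differ only in capitalization."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), set()).add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For example, `inconsistent_names(["temp", "Temp", "TEMP", "precip"])` reports the three spellings of the temperature column as one group and leaves `precip` alone.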

32 4. Use consistent data organization (one good approach)
Station  Date      Temp  Precip
Units    YYYYMMDD  C     mm
HOGI     19961001  12    0
HOGI     19961002  14    3
HOGI     19961003  19    -9999
Note: -9999 is a missing value code
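
Reading such a table might look like the Python sketch below. It assumes the file is comma-delimited with one row of parameter names and one row of units, and it turns the missing value code -9999 into `None`; the function name `read_wide` and the constant names are hypothetical.

```python
import csv
import io

MISSING = "-9999"  # missing value code from the slide

# The slide's table, written as comma-delimited text (an assumed encoding).
WIDE_TABLE = """Station,Date,Temp,Precip
Units,YYYYMMDD,C,mm
HOGI,19961001,12,0
HOGI,19961002,14,3
HOGI,19961003,19,-9999
"""


def read_wide(text, value_fields=("Temp", "Precip")):
    """Return (units, records); missing values are converted to None."""
    reader = csv.reader(io.StringIO(text))
    names = next(reader)                    # row 1: parameter names
    units = dict(zip(names, next(reader)))  # row 2: parameter units
    records = []
    for row in reader:
        record = dict(zip(names, row))
        for field in value_fields:
            value = record[field]
            record[field] = None if value == MISSING else float(value)
        records.append(record)
    return units, records


units, records = read_wide(WIDE_TABLE)
```

Converting the missing value code on input keeps -9999 from leaking into means and sums later on.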

33 4. Use consistent data organization (a second good approach)
Station  Date      Parameter  Value  Unit
HOGI     19961001  Temp       12     C
HOGI     19961002  Temp       14     C
HOGI     19961001  Precip     0      mm
HOGI     19961002  Precip     3      mm
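
The two layouts carry the same information, and converting from one column per parameter to one row per value is mechanical. The Python sketch below shows one way to do it; the function name `wide_to_long` is hypothetical, and the rows come out grouped by observation rather than by parameter as on the slide.

```python
def wide_to_long(records, units, id_fields=("Station", "Date")):
    """Turn records with one column per parameter into one row per value."""
    long_rows = []
    for record in records:
        for name, value in record.items():
            if name in id_fields:
                continue  # identifying fields are copied, not unpivoted
            row = {field: record[field] for field in id_fields}
            row.update({"Parameter": name, "Value": value, "Unit": units[name]})
            long_rows.append(row)
    return long_rows


wide = [
    {"Station": "HOGI", "Date": "19961001", "Temp": 12, "Precip": 0},
    {"Station": "HOGI", "Date": "19961002", "Temp": 14, "Precip": 3},
]
units = {"Temp": "C", "Precip": "mm"}
long_rows = wide_to_long(wide, units)
```

The long layout lets new parameters be added without changing the column structure, at the cost of more rows.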

34 5. Perform basic quality assurance
• Assure that data are delimited and line up in proper columns
• Check that there are no missing values for key parameters
• Scan for impossible and anomalous values
• Perform and review statistical summaries
• Map location data (lat/long) and assess errors
• Verify automated data transfers, e.g., with check-sum techniques
• For manual data transfers, consider double keying data and comparing the 2 data sets
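
Several of these checks can be sketched in Python. The code below is a minimal illustration under stated assumptions, not a complete QA procedure: the function names are hypothetical, the missing value code -9999 is taken from the data organization slide, SHA-256 stands in for whichever check-sum technique a project actually uses, and acceptable value ranges must be chosen by the investigator.

```python
import hashlib


def find_missing(records, key_fields, missing_code=-9999):
    """Return (row index, field) pairs where a key field is absent or coded missing."""
    problems = []
    for i, record in enumerate(records):
        for field in key_fields:
            if record.get(field) is None or record[field] == missing_code:
                problems.append((i, field))
    return problems


def find_out_of_range(records, field, low, high):
    """Return indices of rows whose value for field lies outside [low, high]."""
    return [i for i, record in enumerate(records)
            if record.get(field) is not None and not (low <= record[field] <= high)]


def checksum(data: bytes) -> str:
    """Digest to compare before and after an automated transfer."""
    return hashlib.sha256(data).hexdigest()


def compare_double_keyed(first, second):
    """Return row indices where two independently keyed copies disagree."""
    mismatches = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
    # Rows present in only one of the copies also count as mismatches.
    shorter, longer = sorted((len(first), len(second)))
    mismatches.extend(range(shorter, longer))
    return mismatches
```

Each function returns the positions of suspect rows rather than correcting them, so that a person can review the flagged values against the original data sheets.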

35 6. Assign descriptive data set titles
• Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7).
• Data set titles should be similar to the names of the data files
  - Good: “Shrub Net Primary Productivity at the Sevilleta LTER, New Mexico, 2000-2001”
  - Bad: “Productivity Data”

36 7. Provide documentation (metadata)

37 Ecoinformatics solutions
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

38 High-quality data depend on:
• Proficiency of the data collector(s)
• Instrument precision and accuracy
• Consistency (e.g., standard methods and approaches)
• Design and ease of data entry
• Sound QA/QC
• Comprehensive metadata (e.g., documentation of anomalies, etc.)

39 What’s wrong with this data sheet?
[Figure: a data sheet with two column headings, Plant and Life Stage, above blank lines]

40 Important questions
• How well does the data sheet reflect the data set design?
• How well does the data entry screen (if available) reflect the data sheet?

41 PHENOLOGY DATA SHEET
Rio Salado - Transect 1
Collectors: _________________________________ Date: ___________________ Time: _________
Notes: _________________________________________
Plant  Life Stage
ardi   P/G V B FL FR M S D NP
arpu   P/G V B FL FR M S D NP
atca   P/G V B FL FR M S D NP
bamu   P/G V B FL FR M S D NP
zigr   P/G V B FL FR M S D NP
____   P/G V B FL FR M S D NP
Key: P/G = perennating or germinating; V = vegetating; B = budding; FL = flowering; FR = fruiting; M = dispersing; S = senescing; D = dead; NP = not present

42 PHENOLOGY DATA SHEET
Rio Salado - Transect 1
Collectors: Troy Maddux Date: 16 May 1991 Time: 13:12
Notes: Cloudy day, 3 gopher burrows on transect
ardi   P/G V B FL FR M S D NP   Y N
arpu   P/G V B FL FR M S D NP   Y N
asbr   P/G V B FL FR M S D NP   Y N
deob   P/G V B FL FR M S D NP   Y N

43 Ecoinformatics solutions
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

44 Generic Data Processing (Brunt 2000)
[Figure: flowchart. Elements: Research Program; Investigators; Studies; Experimental Design; Methods; Data Design; Data Forms; Data Entry; Field Computer Entry; Electronically Interfaced Field Equipment; Electronically Interfaced Lab Equipment; Raw Data File; Quality Assurance Checks; Data Contamination; decision "Data verified?" (yes / no); Data Validated; Archive Data File; Archival Mass Storage (Magnetic Tape / Optical Disk / Printouts); Off-site Storage; Access Interface; Secondary Users; Summary Analyses; Quality Control; Metadata; Synthesis; Publication.]

45 Ecoinformatics solutions
• Project / experimental design
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata), to be addressed
• Data archival

46 Ecoinformatics solutions
• Project / experimental design
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

47 Cycles of Research: “A Conventional View”
[Figure: cycle linking Problem, Planning, Collection, Data, Analysis and modeling, and Publications]

48 Cycles of Research: “A New View”
[Figure: cycle linking Problem Definition (Research Objectives), Planning (shown twice), Collection, Original Observations, Archive of Data, Selection and extraction, Secondary Observations, Analysis and modeling, and Publications]

49 Data Archive
• A collection of data sets, usually electronic, stored in such a way that a variety of users can locate, acquire, understand and use the data.
• Examples:
  - ESA’s Ecological Archive
  - NASA’s DAACs (Distributed Active Archive Centers)

52 References
Brunt (2000) Ch. 2 in Michener and Brunt (2000)
Porter (2000) Ch. 3 in Michener and Brunt (2000)
Edwards (2000) Ch. 4 in Michener and Brunt (2000)
Michener (2000) Ch. 7 in Michener and Brunt (2000)
Cook, R.B., R.J. Olson, P. Kanciruk, and L.A. Hook. 2000. Best practices for preparing ecological and ground-based data sets to share and archive. (online at http://www.daac.ornl.gov/cgi-bin/MDE/S2K/bestprac.html)

