Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Introduction: Context for the Week William Michener LTER Network Office Department of Biology University of New Mexico.

Similar presentations


Presentation on theme: "1 Introduction: Context for the Week William Michener LTER Network Office Department of Biology University of New Mexico."— Presentation transcript:

1 1 Introduction: Context for the Week William Michener LTER Network Office Department of Biology University of New Mexico

2 2 Ecological Informatics A broad interdisciplinary science that incorporates both conceptual and practical tools for the understanding, generation, processing, and propagation of ecological data and information.

3 3 Ecoinformatics Course Topics Introduction and ConceptsIntroduction and Concepts ActivitiesActivities –Data documentation (metadata) –Database design and creation Ecological and environmental databases Data models Querying with SQL –QA/QC –Website design for dynamic data delivery Basic InfrastructureBasic Infrastructure –Hardware/software –Wireless communication –GPS

4 4 Ecoinformatics Introduction and ConceptsIntroduction and Concepts – Vanderbilt and Romanello (Monday) ActivitiesActivities –Data documentation (metadata) – Romanello (Monday) –Database design and creation Ecological and environmental databases –Porter (Tuesday) Data models – Porter and McCartney (Tuesday) Querying with SQL – Porter and McCartney (Wednesday) –QA/QC – Vanderbilt (Tuesday) –Website design for data delivery – White (Friday) Basic InfrastructureBasic Infrastructure –Hardware/software – Murillo (Thursday) –Wireless communication – Vande Castle (Thursday) –GPS – Friggens (Thursday)

5 5 Ecoinformatics Introduction and ConceptsIntroduction and Concepts – Vanderbilt and Romanello (Monday) ActivitiesActivities –Data documentation (metadata) – Romanello (Monday) –Database design and creation Ecological and environmental databases –Porter (Tuesday) Data models – Porter and McCartney (Tuesday) Implementing a database in MS Access – Vanderbilt & McCartney (Tuesday) Querying with SQL – Porter and McCartney (Wednesday) –QA/QC – Vanderbilt (Tuesday) –Website design for data delivery – White (Friday) Basic InfrastructureBasic Infrastructure –Hardware/software – Murillo (Thursday) –Wireless communication – Vande Castle (Thursday) –GPS – Friggens (Thursday)

6 6 Ecoinformatics Introduction and ConceptsIntroduction and Concepts – Vanderbilt and Romanello (Monday) ActivitiesActivities –Data documentation (metadata) – Romanello (Monday) –Database design and creation Ecological and environmental databases –Porter (Tuesday) Data models – Porter and McCartney (Tuesday) Querying with SQL – Porter and McCartney (Wednesday) –QA/QC – Vanderbilt (Tuesday) –Website design for data delivery – White (Friday) Basic InfrastructureBasic Infrastructure –Hardware/software – Murillo (Thursday) –Wireless communication – Vande Castle (Thursday) –GPS – Friggens (Thursday)

7 7 Ecoinformatics Introduction and ConceptsIntroduction and Concepts – Vanderbilt and Romanello (Monday) ActivitiesActivities –Data documentation (metadata) – Romanello (Monday) –Database design and creation Ecological and environmental databases –Porter (Tuesday) Data models – Porter and McCartney (Tuesday) Querying with SQL – Porter and McCartney (Wednesday) –QA/QC – Vanderbilt (Tuesday) –Website design for data delivery – White (Friday) Basic InfrastructureBasic Infrastructure –Hardware/software – Murillo (Thursday) –Wireless communication – Vande Castle (Thursday) –GPS – Friggens (Thursday)

8 8 Ecoinformatics Introduction and ConceptsIntroduction and Concepts – Vanderbilt and Romanello (Monday) ActivitiesActivities –Data documentation (metadata) – Romanello (Monday) –Database design and creation Ecological and environmental databases –Porter (Tuesday) Data models – Porter and McCartney (Tuesday) Querying with SQL – Porter and McCartney (Wednesday) –QA/QC – Vanderbilt (Tuesday) –Website design for data delivery – White (Friday) Basic InfrastructureBasic Infrastructure –Hardware/software – Murillo (Thursday) –Wireless communication – Vande Castle (Thursday) –GPS – Friggens (Thursday)

9 9 Questions ???

10 Cyber-Infrastructure Challenges & Ecoinformatics: An Ecologist’s Perspective William Michener LTER Network Office Department of Biology University of New Mexico

11 Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

12 12 Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

13 Most studies use a single scale of observation -- Commonly 1 m 2 The literature is biased toward single and small scale results

14 Time (yrs) Variable Change transition from one state or condition to another

15 Space Parameters Time Thinking Outside the “Box” LTER Biocomplexity ???? – NEON, CUAHSI, CLEANER, …. Increase in breadth and depth of understanding.....

16 24 7 11 10 8 6 5 4 3 2 20 1 9 18 16 15 17 14 13 12 23 22 21 19 LTER 26 NSF LTER Sites in the U.S. and the Antarctic: > 1500 Scientists; 6,000+ Data Sets— different themes, methods, units, structure, ….

17

18 18 Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

19 19 Knowledge Data Information Phenomena

20 20 Knowledge Data Information Phenomena

21 Abstraction of phenomena

22 22 Knowledge Data Information Phenomena

23 23 Yon Hither Hunter-gatherers

24 24 A Paradigm Shift Taxon 1 Taxon 2 Taxon 3 Taxon 4 Abioticfactors Integrated,InterdisciplinaryDatabases Harvesters

25 25 Information Content Time Time of publication Specific details General details Accident Retirement or career change Death (Michener et al. 1997) Data Entropy

26 26 A B Schema transform Coding transform Taxon Lookup Semantic transform Imagine scaling!! C A B C Semantics

27 27 Semantics—Linking Taxonomic Semantics to Ecological Data Rhynchospora plumosa s.l. Elliot 1816 Gray 1834 Kral 1998 Peet 2002? Chapman 1860 R. plumosa R. Plumosa v. intermedia R. plumosa v. plumosa R. Plumosa v. interrupta R. plumosa R. intermedia R. pineticola R. plumosa v. plumosa R. plumosa v. pinetcola R. sp. 1  Taxon concepts change over time (and space)  Multiple competing concepts coexist  Names are re-used for multiple concepts from R. Peet ABC

28 28 Knowledge Data Information Phenomena

29 29 Characteristics of Ecological Data Complexity/Metadata Requirements Satellite ImagesDataVolume(perdataset) Low High High Soil Cores PrimaryProductivity GIS Population Data BiodiversitySurveys Gene Sequences Business Data Weather Stations Most Ecological Data MostSoftware

30 30 What Users Really Want…

31 31 Data Collection Analysis Translation Use By Non-Scientists Publish For Other Scientists

32 32 Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

33 33 A broad interdisciplinary science that incorporates both conceptual and practical tools for the understanding, generation, processing, and propagation of ecological data and information. Ecological Informatics

34 34 Data Design and Metadata Data Acquisition and Quality Control Access and Archiving Analysis and Interpretation Data Manipulation and Quality Assurance Project Initiation Publication

35 35 Experimental Design Methods Data Design Data Forms Data Entry Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Raw Data File Quality Assurance Checks Data Contamination Data verified? Data Validated Archive Data File Archival Mass Storage Magnetic Tape / Optical Disk / Printouts Access Interface Off-site Storage Secondary Users Publication Synthesis Investigators Summary Analyses Quality Control Metadata Research Program Investigators Studies yes no

36 36 “Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

37 37 “Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

38 38 Project / Experimental Design ExperimentalDesign Analyses Data / Database Design

39 39 Project / Experimental Design Some Classic References –Green, R.H. 1979. Sampling Design and Statistical Methods for Environmental Biologists. John Wiley & Sons, Inc., New York. –Resetarits, Jr., W.J. and J. Bernardo (eds.). 1998. Experimental Ecology. Oxford University Press, New York. –Scheiner, S.M. and J. Gurevitch (eds.). 1993. Design and Analysis of Ecological Experiments. Chapman & Hall, New York. –Sokal, R.R. and F.J. Rohlf. 1995. Biometry. W.H. Freeman & Company, New York. –Underwood, A.J. 1997. Experiments in Ecology: Their Logical Design and Interpretation Using Analysis of Variance. Cambridge University Press, Cambridge, UK. –Gotelli, N.J. and A.M. Ellison. 2004. A Primer of Ecological Statistics. Sinauer Associates, Sunderland, MA.

40 40 “Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

41 41 Data Design Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval and manipulation.

42 42 Data Set Design: Best Practices Assign descriptive file names Use consistent and stable file formats Define the parameters Use consistent data organization Perform basic quality assurance Assign descriptive data set titles Provide documentation (metadata) from Cook et al. 2000

43 43 1. Assign descriptive file names File names should be unique and reflect the file contents Bad file names –Mydata –2001_data A better file name –Sevilleta_LTER_NM_2001_NPP.asc Sevilleta_LTER is the project name NM is the state abbreviation 2001 is the calendar year NPP represents Net Primary Productivity data asc stands for the file type--ASCII

44 44 2. Use consistent and stable file formats Use ASCII file formats – avoid proprietary formats Be consistent in formatting –don’t change or re-arrange columns –include header rows (first row should contain file name, data set title, author, date, and companion file names) –column headings should describe content of each column, including one row for parameter names and one for parameter units –within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference) Don’t include summary statistics in the data file

45 45 3. Define the parameters Use commonly accepted parameter names that describe the contents (e.g., precip for precipitation) Use consistent capitalization (e.g., not temp, Temp, and TEMP in same file) Explicitly state units of reported parameters in the data file and the metadata (SI units are recommended) Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file –e.g., use yyyymmdd; January 2, 1999 is 19990102 Use a decimal point or extreme value (-9999) to define missing values

46 46 4. Use consistent data organization (one good approach) StationDateTempPrecip Units YYYYMMDDCmm HOGI19961001120 HOGI19961002143 HOGI1996100319-9999 Note: -9999 is a missing value code for the data set

47 47 4. Use consistent data organization (a second good approach) StationDateParameterValueUnit HOGI19961001Temp12C HOGI19961002Temp14C HOGI19961001Precip0mm HOGI19961002Precip3mm

48 48 5. Perform basic quality assurance Assure that data are delimited and line up in proper columns Check that there no missing values for key parameters Scan for impossible and anomalous values Perform and review statistical summaries Map location data (lat/long) and assess errors Verify automated data transfers For manual data transfers, consider double keying data and comparing 2 data sets

49 49 6. Assign descriptive data set titles Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7). Titles should be restricted to 80 characters. Data set title should be similar to names of data files –Good: “Shrub Net Primary Productivity at the Sevilleta LTER, New Mexico, 2000-2001” –Bad: “Productivity Data”

50 50 7. Provide documentation (metadata) To be discussed in detail

51 51 Database Types File-system based Hierarchical Relational Object-oriented Hybrid (e.g., combination of relational and object-oriented schema) Porter 2000

52 52 File-system-based Database Directory Files Porter 2000

53 53 Hierarchical Database Project Data setsInvestigators VariablesLocations CodesMethods Porter 2000

54 54 Relational Database Projects Data sets Locations Location_id Data_id Location_id Porter 2000

55 55 “Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

56 56 High-quality data depend on: Proficiency of the data collector(s) Instrument precision and accuracy Consistency (e.g., standard methods and approaches) –Design and ease of data entry Sound QA/QC Comprehensive metadata (e.g., documentation of anomalies, etc.)

57 57 How are data to be acquired? Automatic Collection ? Tape Recorder Data Sheet Field entry into hand-held computer

58 58 “Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

59 59 Experimental Design Methods Data Design Data Forms Data Entry Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Raw Data File Quality Assurance Checks Data Contamination Data verified? Data Validated Archive Data File Archival Mass Storage Magnetic Tape / Optical Disk / Printouts Access Interface Off-site Storage Secondary Users Publication Synthesis Investigators Summary Analyses Quality Control Metadata Research Program Investigators Studies yes no Brunt 2000 Generic Data Processing

60 60 “Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

61 61 “Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

62 62 Traditional Fates of Data Post-Publication Paper to filing cabinets Data to floppy disks or tape Data and information lost over time entropy

63 63 Data Archive A collection of data sets, usually electronic, stored in such a way that a variety of users can locate, acquire, understand and use the data. Examples: –ESA’s Ecological Archive –NASA’s DAACs (Distributed Active Archive Centers)

64 64 Planning Problem Analysis and modeling Cycles of Research “A Conventional View” Collection Publication s Data

65 65 Cycles of Research “A New View” Planning Problem Definition (Research Objectives) Analysis and modeling Planning Collection Selection and extraction Archive of Data Original Observations Secondary Observations Publication s

66 66 Start small and keep it simple – building on simple successes is much easier than failing on large inclusive attempts.  Involve scientists - ecological data management is a scientific endeavor that touches every aspect of the research program. Scientists should be involved in the planning and operation of a data management system.  Support science – data management must be driven by the research and not the other way around, a data management system must produce the products and services that are needed by the community. Keys to Success

67 67 Data Value Time Serendipitous Discovery Inter-site Synthesis Gradual Increase In Data Equity Methodological Flaws, Instrumentation Obsolescence Non-scientific Monitoring Increasing value of data over time

68 68 LTER Data Access Policy 1)There are two types of data: Type I (data that is freely available within 2-3 years) with minimum restrictions and, Type II (Exceptional data sets that are available only with written permission from the PI/investigator(s)). Implied in this timetable, is the assumption that some data sets require more effort to get on-line and that no "blanket policy" is going to cover all data sets at all sites. However, each site would pursue getting all of their data on-line in the most expedient fashion possible. 2) The number of data sets that are assigned TYPE II status should be rare in occurrence and that the justification for exceptions must be well documented and approved by the lead PI and site data manager. Some examples of Type II data may include: locations of rare or endangered species, data that are covered by copyright laws (e.g. TM and/or SPOT satellite data) or some types of census data involving human subjects.

69 69 Reasons to Not Share Data: Fear of getting scooped Number of publications will decrease People will find errors Someone will misinterpret my data

70 70 Benefits of Data Sharing Publicity, accolades, media attention Renewed or increased funding Teaching: long-term data sets adapted for teaching & texts Archival: back-up copy of critical data sets Research: new synthetic studies peer-reviewed publications Document global and regional change Conservation and resource management: species and natural areas protection new environmental laws

71 71 Brunt (2000) Ch. 2 in Michener and Brunt (2000) Porter (2000) Ch. 3 in Michener and Brunt (2000) Edwards (2000) Ch. 4 in Michener and Brunt (2000) Michener (2000) Ch. 7 in Michener and Brunt (2000) Cook, R.B., R.J. Olson, P. Kanciruk, and L.A. Hook. 2000. Best practices for preparing ecological and ground-based data sets to share and archive. (online at http://www.daac.ornl.gov/cgi- bin/MDE/S2K/bestprac.html)

72 72 Thanks !!!


Download ppt "1 Introduction: Context for the Week William Michener LTER Network Office Department of Biology University of New Mexico."

Similar presentations


Ads by Google