1 Cushing – EIM 2008
Integrating Ecological Data: Notes from the Grasslands ANPP Data Integration Project
http://canopy.
LTER Network Office, NSF CISE and BIO 04-0417311, 03-019309, 01-31952, 01-9309
Ecologists: Daniel Milchunas (SGS), Esteban Muldavin (JRN), Judith Kruger (Kruger NP), Christine Laney (JRN)
Information Managers: Nicole Kaplan (SGS), Kristin Vanderbilt (SEV), Ken Ramsey (JRN), Jincheng Gao (KNZ)
Computer Scientists, The Evergreen State College: Judith B. Cushing, Juli Mallett, Lee Zeman, Natalie Kopytko
Ecologist: Carri Leroy
LTER and iLTER

2 Integrating Ecological Data: Notes from the Grasslands ANPP* Data Integration Project
1. Motivation
   ANPP is important!
   Case study for CS
2. Challenges
   Sampling methods, idiosyncratic formats, species codes
3. The Data Model
4. Results & Products
   The database
   Preliminary scientific results
   Species code mappings – a web application: Specifik
5. Conclusions, Future Work
   Response variables “different”
   “Best Practices” and advice for curation
* Above-ground Net Primary Productivity

3 Motivation
1. ANPP is important! Broken down by plant species and life forms, the data could help assess community and population responses to global change.
2. Case study for CS

4 Challenges: Sampling methods differ – even over grasslands!
Columns: Site | Sampling Method | Times Measured per Year | Years of Data | Number of Vegetation Types or Other Relevant Treatments | Number of Sub-Sites | Number of Sampling Units (replicates) | Experimental Units* in Each Sampling Unit (plots per rep) | Total Number of Experimental Units (plots)
Kruger National Park (Kruger) | Regression relationship | 11735 9-41315-1435
Konza Prairie (KNZ) | Biomass harvest | 251124080
Jornada Basin (JRN) | Regression relationship | 317515 49735
Sevilleta Wildlife Refuge (SEV) | Regression relationship | 38331516720
Shortgrass Steppe (SGS) | Biomass harvest | 123163590

5 Challenges (cont)
2. Idiosyncratic data formats …
   Robust, repeatable data integration process
   Tools (scripts) for integration
   “Best practices” dictate using CSV, no blank fields, etc.
   Correct data errors closer to collection
   Validate at curation
3. Site-specific species codes
   Used the PLANTS database codes
   A tool to map site codes to PLANTS db …
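
The "no blank fields" practice and "validate at curation" can be illustrated with a small check. This is a minimal sketch in Python, not one of the project's integration scripts; the column names (`site`, `year`, `species_code`, `biomass_g_m2`) and the sample rows are hypothetical.

```python
import csv
import io

def find_blank_fields(csv_text):
    """Return (line_number, column_name) for every empty or whitespace-only field.

    Line numbers assume one record per physical line (no quoted multi-line
    fields); line 1 is the header, so data rows start at line 2.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if column is None:
                # DictReader stores fields beyond the header under the key None.
                problems.append((line_number, "<extra field>"))
            elif value is None or value.strip() == "":
                # None means the row was shorter than the header.
                problems.append((line_number, column))
    return problems

# Hypothetical sample data, not taken from any site's files.
sample = (
    "site,year,species_code,biomass_g_m2\n"
    "SGS,1999,BOGR,52.1\n"
    "SGS,2000,,48.7\n"
    "JRN,2000,BOER,\n"
)

for line_number, column in find_blank_fields(sample):
    print(f"line {line_number}: blank value in column {column}")
```

Running such a check at the site, before data are submitted, moves error correction closer to collection.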

6 The GDI Data Model

7 Results: The GDI Database
5 LTERs (JRN, SEV, SGS, KNZ, Kruger NP)
20 years’ data
1697 distinct plots
160,000 distinct measurements
1600 species

8 Results: The GDI Database (cont)
[Figure: sites KNZ, KRG, SGS, SEV, JRN along a time axis from 1980 to 2005]
This database supports aggregation by species, family, growth form, vegetation biome type, physical location, etc., and cross-site analysis of abiotic drivers of ANPP, e.g., temperature and precipitation.
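
Aggregation by species, family, growth form, or site can be sketched as a group-and-sum over measurement records. The Python below is illustrative only: the record layout, field names, species labels (`SP_A`, `SP_B`, `SP_C`), and values are assumptions, not the GDI schema or data. A real query would also need to account for sampling units and replicates.

```python
from collections import defaultdict

# Hypothetical measurement records; "anpp" is in g/m^2.
measurements = [
    {"site": "SGS", "year": 2000, "species": "SP_A", "family": "Poaceae",
     "growth_form": "grass", "anpp": 50.0},
    {"site": "SGS", "year": 2000, "species": "SP_B", "family": "Asteraceae",
     "growth_form": "forb", "anpp": 12.5},
    {"site": "JRN", "year": 2000, "species": "SP_C", "family": "Poaceae",
     "growth_form": "grass", "anpp": 30.0},
    {"site": "JRN", "year": 2000, "species": "SP_B", "family": "Asteraceae",
     "growth_form": "forb", "anpp": 7.5},
]

def aggregate_anpp(records, *keys):
    """Sum ANPP over all records that share the same values for `keys`."""
    totals = defaultdict(float)
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group] += record["anpp"]
    return dict(totals)

print(aggregate_anpp(measurements, "site"))
print(aggregate_anpp(measurements, "family"))
print(aggregate_anpp(measurements, "site", "growth_form"))
```

Because the grouping keys are passed as arguments, the same function covers single-attribute and cross-site, multi-attribute summaries.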

9 Preliminary Scientific Results
CART model explains 64% of ANPP variation over 23 yrs at SEV, SGS, JRN
Palmer Drought Severity Index (PDSI), temperatures (max, mean), precipitation
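
For readers unfamiliar with CART, the sketch below shows the basic building block of a regression tree (one best split on one predictor) and how "fraction of variation explained" is computed. It is a simplified illustration in Python, not the project's model: a full CART fit applies such splits recursively across several predictors, and the precipitation and ANPP numbers here are invented toy values.

```python
def variance_explained(observed, predicted):
    """R^2: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total

def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    values = sorted(set(x))
    if len(values) < 2:
        raise ValueError("need at least two distinct predictor values")
    best = None
    # Candidate thresholds are midpoints between adjacent distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

# Invented toy data: annual precipitation (mm) and ANPP (g/m^2).
precip = [100, 150, 200, 300, 350, 400]
anpp = [20, 30, 25, 60, 70, 65]

threshold, dry_mean, wet_mean = best_split(precip, anpp)
predicted = [dry_mean if p <= threshold else wet_mean for p in precip]
print(threshold, dry_mean, wet_mean)
print(round(variance_explained(anpp, predicted), 2))
```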

10 Preliminary Scientific Results (cont)

11 Results: Web App Specifik
Species Table: LTER | LTER Code | PLANTS Code(s) (a code for each species that matches the LTER code) | Accepted PLANTS code(s) (one synonym for each PLANTS code)
Perl Script
PLANTS data: Family | Rank | Genus | Genus Author | Species | Species Author | Trinomial Rank | Variety or Subspecies | Variety Author | Scientific Name

12 Results: Web App Specifik (cont)
Export species table as CSV
Create a new species table in Specifik
Upload species table CSV
Answer questions about table schema
Select correct PLANTS code for each species in your table from list
Download new CSV table with PLANTS codes appended
Import table to database
Database with species data
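
The core of the workflow above, appending a PLANTS code to each row of a site's species table, can be sketched as follows. This Python is an illustration of the idea only and is not Specifik's implementation; the function name, column names, the `UNRESOLVED` placeholder, and the site-code-to-PLANTS mapping are assumptions, and any real mapping should be checked against the PLANTS database.

```python
import csv
import io

def append_plants_codes(csv_text, code_map, code_column="species_code",
                        missing="UNRESOLVED"):
    """Return (new_csv_text, unmapped_codes) with a plants_code column added.

    Codes absent from code_map get the `missing` placeholder instead of a
    blank field, and are reported so a person can resolve them by hand.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["plants_code"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    unmapped = []
    for row in reader:
        site_code = row[code_column]
        if site_code in code_map:
            row["plants_code"] = code_map[site_code]
        else:
            row["plants_code"] = missing
            unmapped.append(site_code)
        writer.writerow(row)
    return out.getvalue(), unmapped

# Hypothetical site table and mapping, for illustration only.
site_table = (
    "site,species_code,biomass_g_m2\n"
    "SGS,BOGR,52.1\n"
    "JRN,BOER,18.4\n"
    "SEV,XXXX,3.0\n"
)
site_to_plants = {"BOGR": "BOGR2", "BOER": "BOER4"}

new_table, unmapped = append_plants_codes(site_table, site_to_plants)
print(new_table)
print("needs manual review:", unmapped)
```

Returning the unmapped codes separately mirrors the manual step in the workflow, where a person selects the correct PLANTS code from a list.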

13 Conclusions
1. Response (aka biotic) variables are “different”
2. Data integration prep. should be done at site
   Data Validation Tools
   Web Application – Specifik
3. Data integration will find data errors … It’s good
4. Curator will probably be needed
   harvest coordination
   maintain integration tools
5. Interdisciplinary collaboration important!

14 Future Work
1. Work up the metadata … and release the database
2. Find a home and caretaker for it (SEV?)
3. Extend to other ecosystems and methodologies
4. Provide (or find) contextual data (ANPP drivers)
5. Determine appropriate analysis
   Drivers → response variables (connectivity)
6. Develop tools to make 3 & 4 easier

15 Take-Home Messages
Collaboration between Information Managers, Ecologists, and Computer Scientists was good.
Compare methodologies and identify sampling units early.
Wherever possible, standardize units of measurement and derivations.
Standardize species codes, vegetative characteristics and other metadata to facilitate analysis.
Exploratory analysis aids quality assurance.
Design the data model to support important & interesting analyses.

