LTER IM Town Hall Panel
Challenges in Integrating Diverse Data for Ecological Synthesis: Special Roles & Responsibilities for Information Managers
Judy Cushing, The Evergreen State College, Olympia, WA
NSF EIA , EIA NSF DBI , DBI , …
Challenges in Integrating Diverse Data: Lessons Learned from the Grasslands Data Integration (GDI) Project*
Integrate Above-Ground Net Primary Productivity (ANPP) data with its drivers (contextual data) for cross-site comparisons (ecological synthesis), past and future (come visit our poster!)
LTER Ecologists: Christine Laney (JRN), Alan Knapp (SGS), Daniel Milchunas (SGS), Esteban Muldavin (SEV)
LTER Information Managers: Jincheng Gao (KNZ), Nicole Kaplan (SGS), Ken Ramsey (JRN), Mark Servilla (NET), Kristin Vanderbilt (SEV)
Computer Scientists and Data Analysts: Judy Cushing, Carri LeRoy, Juli Mallett, Lee Zeman
What's in the GDI Database?
Recorded or calculated annual aboveground NPP values from 5 LTERs: Jornada, Sevilleta, SGS, Konza, Kruger
4,126,700 grams, over 20 years, in 1697 plots
[Map labels: KNZ, KRG, SGS, SEV, JRN]
What Did We Find?
1. Ecology: environmental drivers of ANPP; ANPP-based grassland community composition.
2. Preliminary definition & provision of contextual data: EcoTrends ++….
3. Information Management: species table fixes, ideas for better experimental design documentation, scripting for data integration…. CHANGE LOGS WERE ESSENTIAL; USDA PLANTS DB
4. CS: case study on data integration; need for TOOLS: PASTA-LIKE SERVICE & TAXONOMIC CONCEPT SERVICE
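The "species table fixes" and the point that change logs were essential can be sketched as follows. This is a minimal illustration, not the GDI project's actual scripts: the function name `harmonize`, the record layout, and every species code in the mapping are hypothetical placeholders rather than values taken from the GDI database or the USDA PLANTS DB.

```python
# Hypothetical mapping from (site, site-specific code) to a shared code.
# In practice the shared code might come from a reference list such as
# USDA PLANTS; the entries below are placeholders for illustration only.
SITE_TO_SHARED = {
    ("JRN", "BOER"): "BOER4",
    ("SEV", "BOER4"): "BOER4",
    ("SGS", "BOGR"): "BOGR2",
}


def harmonize(records, mapping):
    """Recode species per site and log every change or unmapped code.

    Returns (cleaned_records, change_log); the input is not mutated.
    """
    cleaned, change_log = [], []
    for rec in records:
        shared = mapping.get((rec["site"], rec["species"]))
        new_rec = dict(rec)  # copy so the original record stays intact
        if shared is None:
            # Keep the record as-is but flag it for a human to review.
            change_log.append({"site": rec["site"], "old": rec["species"],
                               "new": None, "action": "unmapped"})
        elif shared != rec["species"]:
            new_rec["species"] = shared
            change_log.append({"site": rec["site"], "old": rec["species"],
                               "new": shared, "action": "recoded"})
        cleaned.append(new_rec)
    return cleaned, change_log
```

Keeping the log separate from the cleaned records means each recoding decision can be reviewed by the site's information manager and reversed if it turns out to be wrong.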
ANPP vs. Precip: No climate data yet
[Figure: plot panels annotated with correlation coefficients; only one value is recoverable: r = 0.196]
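Assuming the r values on these plots are Pearson correlation coefficients (the slides do not say which statistic was used), the calculation for one site's paired observations can be sketched as below. The function name `pearson_r` is illustrative, and no GDI data are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold, for example, annual precipitation and `ys` the matching annual ANPP values for the same site and years.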
CART Model: Classification and Regression Tree Model, R² = 0.642!!
Variables included in model: LTER, year, PDSI, NH4, NO3, absTmax, absTmin, Tmax, Tmin, Tmean, Precip
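To show what a regression tree does with variables like these, here is a minimal sketch of a single split (a one-level tree) and the R² it yields. It is not the software or the model behind the 0.642 figure. It assumes numeric predictors only, so a categorical variable such as LTER would need separate handling, and the names `best_split` and `r_squared` are illustrative.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_targets(rows, target, feature, threshold):
    """Target values on each side of `feature <= threshold`."""
    left = [r[target] for r in rows if r[feature] <= threshold]
    right = [r[target] for r in rows if r[feature] > threshold]
    return left, right


def best_split(rows, target, features):
    """Return (feature, threshold, sse) for the split with the lowest SSE.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None when no feature has two distinct values.
    """
    best = None
    for feature in features:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left, right = split_targets(rows, target, feature, threshold)
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


def r_squared(rows, target, feature, threshold):
    """Share of target variance explained by predicting each side's mean."""
    total = sse([r[target] for r in rows])
    if total == 0:
        raise ValueError("target has no variance")
    left, right = split_targets(rows, target, feature, threshold)
    return 1 - (sse(left) + sse(right)) / total
```

A full CART model applies `best_split` recursively to each side until a stopping rule is met; the one-level version is enough to show how a variable such as Precip is chosen over the others.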
Lesson 1: What you (IMs) do is important
ANPP: a critical ecological measure (indicator?)
You (Kristin, Ken, Nicole) made GDI happen….
It's a collaborative & interdisciplinary project, and not a technology problem….
Team: IMs, Computer Scientists, Ecologists, Statistician (Data Analyst)
You know the issues and physically possess the data for important ecological & scientific DB problems, e.g., global climate change, resource management
Lesson 2: The GDI DB should be dynamic, not static
A static data warehouse is an oxymoron, as is “Museum of Innovation”
More years, future years
Current data, further refined
More sites, different ecosystems
Lesson 3: Volume Matters….
More sites, more years, more trouble….
More species codes
Differences in experimental design
Cross-site comparison highlights data anomalies
High volumes make a qualitative difference
A good data structure* matters even more….
* Ask me why GIS has not been a priority to illustrate my field datasets….
Lesson 4: Information Managers Are Critical
Computer Science in Crisis….
There won't be enough CS graduates … to do all the jobs … even today ….
NSF's ICER (CPATH) Initiative: Integrative Computing Education & Research
NSF:
1. CS content changed (changing!) radically….
2. No uniform agreement on the core…
3. Graduates lack a systems approach….
4. Dwindling pipeline….
5. US industry [& science] competitiveness threatened….
NSF's ICER (CPATH) Initiative
NSF asked: Why is CS in crisis? What can be done?
Northwest Region: Improve the quality of computing education …. Attract more people …. Improve retention…. Strengthen interdisciplinary connections…. Improve CS educational research ….
Google asked: What can industry do?
I ask: What should the LTER IMs do?
Lesson 4 (cont.): Computer Science in Crisis….
My charge on this panel: IMs typically come from “the sciences” (essential), yet their tasks are programming & managing software projects. What skills or tools are essential for IMs? …As an educator, which are effectively learned on the job, and which require formal training?
Tools are learned on the job.
Skills are learned through practice (but should be demonstrable before hiring).
Concepts require (some) formal training…. (Is there a handful of critical concepts?)
Lesson 4 (cont.): What CS is needed to do the GDI?
Concepts: Formal Languages & Parsing; Data Structures
Abilities: See patterns (and non-patterns); learn new technology fast; see when the tools won't do it; build new technology, services….
Skills (tools): Scripting languages, database tools and SQL
But CS is not enough… we needed an interdisciplinary team….: historical perspective, ecology vision, statistical expertise
Future tools: PASTA-like & taxonomic services, contextual data provision (ClimDB, EcoTrends)
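As one illustration of the scripting-plus-SQL skills listed here, the sketch below loads plot-level ANPP and site-level precipitation into an in-memory SQLite database and joins them by site and year. The table layout, column names, units, and the function name `site_year_means` are assumptions made for this example; they do not describe the GDI database schema.

```python
import sqlite3


def site_year_means(anpp_rows, precip_rows):
    """Average plot-level ANPP per site-year and attach precipitation.

    anpp_rows:   iterable of (site, year, plot, anpp) tuples
    precip_rows: iterable of (site, year, precip) tuples
    Returns a list of (site, year, mean_anpp, precip) tuples; precip is
    None when no contextual record exists for that site-year.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE anpp (site TEXT, year INTEGER, plot TEXT, anpp REAL)")
        conn.execute(
            "CREATE TABLE precip (site TEXT, year INTEGER, precip REAL)")
        conn.executemany("INSERT INTO anpp VALUES (?, ?, ?, ?)", anpp_rows)
        conn.executemany("INSERT INTO precip VALUES (?, ?, ?)", precip_rows)
        # LEFT JOIN keeps site-years that have ANPP but no driver data,
        # which makes the gaps in contextual data visible.
        query = """
            SELECT a.site, a.year, AVG(a.anpp), MAX(p.precip)
            FROM anpp AS a
            LEFT JOIN precip AS p
              ON p.site = a.site AND p.year = a.year
            GROUP BY a.site, a.year
            ORDER BY a.site, a.year
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The left join is a deliberate choice: an inner join would silently drop site-years with no driver data, hiding exactly the kind of anomaly that cross-site comparison is meant to surface.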
Questions? Judy Cushing