Christine Laney, Ken Ramsey, Mark Servilla. Information management issues and the Trends project: A drawing board for making cross-site comparisons feasible.

Presentation transcript:

Christine Laney, Ken Ramsey, Mark Servilla. Information management issues and the Trends project: A drawing board for making cross-site comparisons feasible

THANK YOU!!
- LTER Information Managers – you know who you are
- LTER Network Office – Mark, James B., Bob, Duane, Marshall, Inigo, James V.
- NCEAS – Mark, Callie, Will, Jim, Rick
- Jornada staff – Ken, Justin & technicians

Trends in Long-Term Ecological Data: a multi-agency synthesis project
Objectives:
- to create a platform for synthesis by producing a compendium of easily accessible long-term graphs and data from long-term ecological research sites
- to illustrate the utility of this platform in addressing important within-site and network-level scientific questions

Products
- Folio-sized book to be published by Oxford Univ. Press
- Website (data, metadata, graphs) for synthesis and analysis

Book organization
- Introduction: value and importance of long-term research
- Within-site graphs/tables arranged by four themes in the LTER Planning Process
  - Climate and variability in the physical environment, including disturbance characteristics
  - Human population and economy
  - Biogeochemistry (e.g., atmospheric deposition, surface water chemistry)
  - Biotic structure (e.g., ANPP, plant biomass, species richness)
- Among-site comparison graphs: e.g., atmospheric chemistry, N fertilization, climate variability, ENSO signal responses
- Site descriptions and photos organized by biomes

Website – Trends Data Store
- Initial design:
  - Static datasets, metadata & book graphs listed by chapter and figure number, some search capability
  - Metadata for static data provide access to raw data and the script used to generate the derived product
  - Prototype near completion
- Final design:
  - Routinely harvested and derived data from ongoing projects
  - Metadata & links back to sources and revisions
  - Search, sort, analysis & graphing tools
  - Prototype in development

Coming soon:

Participating Sites

Process
Selecting variables
- Submitted broad request for long-term data
- Downloaded data from other online compilations
- Examined submitted data for consistent variables across sites (e.g., precipitation, nitrogen)
- Refined data request and requested additional data from sites for variables that should exist but may not have been submitted (e.g., ANPP, species richness)
- Generated “wish list” of variables that may be important for cross-site and network-level questions, but for which long-term data do not yet exist at many sites (e.g., soil respiration, foliar nutrients). This will be used in planning grant activities.
Selecting variables for the webpage
- Use variables from book in static form first
- Update data sets with time and include additional variables

Progress to date
- Contributors: 26 LTER (84%), 13 FS (12%), 6 ARS sites (3%) & Santa Rita ER (<1%)
- Climate datasets: ~300
- Biogeochemistry datasets: ~150
- Biotic datasets: ~100
- Others: ~50
- Total: over 600 datasets, plus 190 illustrative graphs
- Human population and economy: collected for all LTER sites from census data (funded by NSF supplement)
- Metadata: most data have at least rudimentary metadata; few have full EML with attribute-level description of the datasets.

What we’re doing with the data
- Downloading and storing data & documentation
- Writing R or SAS scripts to generate:
  - Datasets containing monthly or annual averages or totals, depending on the variable
  - Strict time plots with simple linear regression
  - Tables that record all derived statistics
  - Plots that show change over time among different sites for each variable
  - Anomaly plots of monthly climate data
- Generating metadata with EML for each derived product. Metadata contain links to original data and associated scripts.
- Recording each product (data, metadata, graphs), along with links between products, in a multi-purpose database.
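The project's own scripts were written in R or SAS and are not reproduced in this transcript. As a rough illustration only, the aggregation and anomaly steps could look like the following Python sketch. The function names, the ISO date format, and the choice of a per-calendar-month mean as the anomaly baseline are all assumptions made for this example, not the project's documented method.

```python
from collections import defaultdict
from statistics import mean


def monthly_totals(daily):
    """Sum daily values into (year, month) totals.

    `daily` is an iterable of (iso_date, value) pairs such as
    ("2001-03-14", 2.5). The ISO date format is an assumption.
    """
    totals = defaultdict(float)
    for date_str, value in daily:
        year, month, _day = (int(part) for part in date_str.split("-"))
        totals[(year, month)] += value
    return dict(totals)


def monthly_anomalies(totals):
    """Subtract each calendar month's long-term mean from its values.

    The baseline (mean over all available years for that month) is a
    hypothetical choice for illustration.
    """
    by_month = defaultdict(list)
    for (_year, month), value in totals.items():
        by_month[month].append(value)
    baseline = {month: mean(values) for month, values in by_month.items()}
    return {
        (year, month): value - baseline[month]
        for (year, month), value in totals.items()
    }
```

For averages rather than totals (temperature, for instance), the same grouping would apply with a mean in place of the sum.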

MULTI-SITE ANALYSES: nitrate in precipitation
Step 1. Graph similar data through time for sites with those data.
Step 2. Determine trend line by site.

Step 3. Compare slopes of trend lines among sites.
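Steps 2 and 3 can be sketched with an ordinary least-squares fit per site. This is a minimal Python illustration, not the project's R or SAS code; the record layout (site, year, value) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a trend line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slopes_by_site(records):
    """Fit one trend line per site and return {site: slope}.

    `records` is an iterable of (site, year, value) tuples
    (a hypothetical layout for this sketch).
    """
    grouped = defaultdict(lambda: ([], []))
    for site, year, value in records:
        years, values = grouped[site]
        years.append(year)
        values.append(value)
    return {site: ols_fit(years, values)[0] for site, (years, values) in grouped.items()}


def rank_sites(slopes):
    """List sites from the most negative to the most positive slope."""
    return sorted(slopes, key=slopes.get)
```

Comparing slopes this way says nothing about statistical significance; a real comparison would also need the standard errors of the fits.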

Step 4. Compare slopes spatially. Mean change in total deposition of N in nitrate form in precipitation

Challenges, solutions & opportunities
- Obtaining data
- Quality and quantity of data and documentation
- Utilizing data toward specific goals
- Properly documenting received data and products derived from the data
- Making final products accessible to editorial committee and available on website

Obtaining Data: time-intensive and inconsistent process on both sides!
- Located data on individual websites
  - Few had their long-term data separated out from short-term data
  - Unable to search for long-term data
- Utilized Metacat via LTER, KNB, Morpho
  - Slow search engine
  - Unable to search for particular record lengths
  - Unable to sort filtered records by time
  - Metadata often available without attached data files
- No prior knowledge of types of available long-term data beyond basics (precip, temp, etc.)
- Result: a lot of emails and phone calls!

Challenges, opportunities & solutions
- Obtaining data
- Quality and quantity of data and documentation
- Utilizing data toward specific goals
- Properly documenting received data and products derived from the data
- Making final products accessible to editorial committee and available on website

Quality and quantity of data and documentation
- Lots of great data, varied level of detailed metadata in text or EML format
- Small problems with single datasets → large problems with many datasets
- Online data sometimes not quality-checked or ready for use, but no markers to say so
- Examples:

Looks nice…but….

The nit-picky details
Dates as an example:
- 2-digit years
- range of dates in single cell (e.g., 02/01-03/2006 or 02/01/2006,02/03/2006)
- date with a letter appended to the end (e.g., 02/01/1999A)
- single-digit day and month, especially when there are no delimiters between month, day, year
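A small normalizer shows how much guesswork each of these date variants forces on the person doing the synthesis. The Python sketch below is illustrative only. It assumes month/day/year order, reads "02/01-03/2006" as days 1 to 3 of February 2006, treats 2-digit years of 50 or above as 19xx, and refuses undelimited dates because they cannot be split unambiguously. None of these conventions come from the presentation.

```python
import re
from datetime import date

# Patterns assume month/day/year order (an assumption for this sketch).
_RANGE = re.compile(r"^(\d{1,2})/(\d{1,2})-(\d{1,2})/(\d{4}|\d{2})$")
_SINGLE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})([A-Za-z]?)$")


def expand_year(year, pivot=50):
    """Map a 2-digit year to 4 digits; `pivot` is a hypothetical cutoff."""
    if year >= 100:
        return year
    return 1900 + year if year >= pivot else 2000 + year


def parse_messy_date(text, pivot=50):
    """Return (dates, flag): a list of date objects and any trailing letter.

    Raises ValueError for anything that does not match, including dates
    written without delimiters.
    """
    text = text.strip()
    match = _RANGE.match(text)
    if match:
        month, first, last, year = (int(group) for group in match.groups())
        year = expand_year(year, pivot)
        return [date(year, month, first), date(year, month, last)], ""
    dates, flag = [], ""
    for part in text.split(","):
        match = _SINGLE.match(part.strip())
        if not match:
            raise ValueError(f"unrecognized date: {part!r}")
        month, day, year = (int(match.group(i)) for i in (1, 2, 3))
        dates.append(date(expand_year(year, pivot), month, day))
        flag = match.group(4) or flag
    return dates, flag
```

The trailing letter is returned rather than discarded because its meaning (a quality flag, a replicate label) can only be settled from the metadata or by asking the site.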

Preferred data formats for synthesis
- Simple ASCII delimited with commas, spaces, tabs, etc., with headers, or very simple Excel spreadsheets. If fixed-width, give widths and spaces.
- Metadata in separate file
- All data in single file, not separated by year. If not possible, each file in exactly the same format.
Complex formatting systems, like multiple sheets or several tables in one sheet, are more difficult to interpret and extract information from.

Challenges, opportunities & solutions
- Obtaining data
- Quality and quantity of data and documentation
- Utilizing data toward specific goals
- Properly documenting received data and products derived from the data
- Making final products accessible to editorial committee and available on website

Utilizing data toward specific goals
- Selected variables with specified summary time spans (monthly or yearly) and specified units.
- Converting short time scales to longer time scales: OK
- Converting long time scales to shorter: impossible
- Unit conversion
  - Often simple: °F → °C, W/m^2 → MJ/m^2
  - Can be really difficult: flow in m from a weir → m^3/s using weir dimensions
  - Raw shield count data without calibrations given → % moisture: impossible
- Missing data leads to bias in particular months/years, especially with totals.
- Lots of consultation with metadata and PIs. What happens when metadata is incomplete & PIs are unavailable?
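The two "simple" conversions, and one way to guard monthly totals against missing days, can be sketched in Python as follows. The radiation conversion assumes the input is a mean flux over a full day, and the 90% completeness threshold is a hypothetical value chosen for illustration; the presentation does not state what rule the project used.

```python
SECONDS_PER_DAY = 86_400


def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def mean_wm2_to_mj_per_m2_day(watts_per_m2):
    """Convert a daily mean flux in W/m^2 to a daily total in MJ/m^2.

    1 W = 1 J/s, so the daily total is the mean flux times 86,400 s,
    divided by 1e6 to go from J to MJ.
    """
    return watts_per_m2 * SECONDS_PER_DAY / 1e6


def monthly_total(daily_values, days_in_month, min_fraction=0.9):
    """Sum daily values, or return None when too many days are missing.

    `daily_values` uses None for missing days. `min_fraction` is a
    hypothetical completeness threshold, not a rule from the project.
    """
    present = [value for value in daily_values if value is not None]
    if len(present) < min_fraction * days_in_month:
        return None
    return sum(present)
```

Returning None instead of a partial sum keeps an incomplete month from appearing as an unusually dry or low-flux month in a trend plot.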

Challenges, opportunities & solutions
- Obtaining data
- Quality and quantity of data and documentation
- Utilizing data toward specific goals
- Properly documenting received data and products derived from the data
- Making final products accessible to editorial committee and available on website

Properly documenting received data and products derived from the data
- Morphing system: Hierarchical folder system with s; attempted EML documentation (help from NCEAS); Concurrent Versions System (CVS) & multipurpose SQL Server & MySQL database; documentation of deriving data and graphs
- EML template
- Scripts; Metacat (versioning)

Challenges, opportunities & solutions
- Obtaining data
- Quality and quantity of data and documentation
- Utilizing data toward specific goals
- Properly documenting received data and products derived from the data
- Making final products accessible to editorial committee and available on website

Trends editorial page jornada-

Voting page

Trends IM meeting, 15 min breakout
- Site involvement/commitment to Trends
  - Within site:
    - Percentage of IM time/resources spent compared to PIs
    - Percentage of time/resources spent on Trends compared to time spent on site needs
    - Too much, enough, too little?
  - Among sites:
    - Has there been communication between sites about Trends data requests?
    - Has Trends triggered any new collaborations or strengthened old ones?
- Communication
  - Progress reports: frequent and/or adequate enough?
  - Recommendations for further communications

Trends IM meeting, 15 min breakout
- Keeping track of data use & proper citation
  - Now (by the Trends project itself)
  - In the future via the website

Trends IM meeting, 15 min breakout
- International site involvement
  - Interest in Trends project: how can ILTER sites use the current set of data in their own research?
  - Reasons pro and con for initiating a similar effort among ILTER sites
  - What would it take to do a Trends-like project at the international level?
  - List of contacts