Information Management & Technology of the VCR/LTER Project

Similar presentations
Desktop, Mobile & Web Based GIS/ Collaborative GIS

SOFTWARE PRESENTATION ODMS (OPEN SOURCE DOCUMENT MANAGEMENT SYSTEM)
2009 Mid–Term Review El Verde Field Station June 4, 2009.
John Porter Why this presentation? The forms data take for analysis are often different than the forms data take for archival storage Spreadsheets are.
Overview of Search Engines
Web Interfaces and Data Portals John Porter Department of Environmental Sciences University of Virginia.
Developing Health Geographic Information Systems (HGIS) for Khorasan Province in Iran (Technical Report) S.H. Sanaei-Nejad, (MSc, PhD) Ferdowsi University.
INTRODUCTION TO WEB DATABASE PROGRAMMING
Linux Operations and Administration
NETWORK CENTRIC COMPUTING (With included EMBEDDED SYSTEMS)
ClimDB/HydroDB (ClimHy) Integration ClimHy has been migrated from AND to LNO and will remain status quo in 2011 – Public page (
CS621 : Seminar-2008 DEEP WEB Shubhangi Agrawal ( )‏ Jayalekshmy S. Nair ( )‏
1Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall. Exploring Microsoft Office Access 2010 by Robert Grauer, Keith Mast, and Mary Anne.
NASRULLAH KHAN.  Lecturer : Nasrullah   Website :
© 2007 by Prentice Hall 1 Introduction to databases.
© 2001 Business & Information Systems 2/e1 Chapter 8 Personal Productivity and Problem Solving.
EASI a free web database application for collecting and managing monitoring records.
Chad Berkley NCEAS National Center for Ecological Analysis and Synthesis (NCEAS), University of California Santa Barbara Long Term Ecological Research.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Mercury – A Service Oriented Web-based system for finding and retrieving Biogeochemical, Ecological and other land- based data National Aeronautics and.
Introduction to Morpho RCN Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.
INTRODUCTION TO GIS  Used to describe computer facilities which are used to handle data referenced to the spatial domain.  Has the ability to inter-
Information Management Jornada Basin LTER. Jornada Information management system Six major components: a)Data management implementation/process b)Management.
Distributed Data Analysis & Dissemination System (D-DADS ) Special Interest Group on Data Integration June 2000.
John Porter Sheng Shan Lu M. Gastil Gastil-Buhl With special thanks to Chau-Chin Lin and Chi-Wen Hsaio.
Web Design Terminology Unit 2 STEM. 1. Accessibility – a web page or site that address the users limitations or disabilities 2. Active server page (ASP)
General Architecture of Retrieval Systems 1Adrienn Skrop.
University of Colorado at Denver and Health Sciences Center Department of Preventive Medicine and Biometrics Contact:
1 Chapter 1 INTRODUCTION TO WEB. 2 Objectives In this chapter, you will: Become familiar with the architecture of the World Wide Web Learn about communication.
Information Retrieval in Practice
Big Data Analytics and Machine Intelligence Capability Team
The CUAHSI Hydrologic Information System Spatial Data Publication Platform David Tarboton, Jeff Horsburgh, David Maidment, Dan Ames, Jon Goodall, Richard.
Architecture Review 10/11/2004
IOT – Firefighting Example
Internet Made Easy! Make sure all your information is always up to date and instantly available to all your clients.
John H. Porter and David E. Smith
WWW and HTTP King Fahd University of Petroleum & Minerals
DIAS & DIAS data release 2 years DIAS-GCI Cooperation Hiroko KINUTANI DIAS (Data Integration and Analysis System in Japan) , St. Petersburg.
INTRODUCTION TO GEOGRAPHICAL INFORMATION SYSTEM
BASIC INFORMATION ABOUT DATABASE MANAGEMENT SOFTWARE
Flanders Marine Institute (VLIZ)
CUAHSI HIS Sharing hydrologic data
A video coding and data visualization tool
Week 12 Option 3: Database Design
Software Documentation
OGSA Data Architecture Scenarios
Microsoft Access 2003 Illustrated Complete
Workshop on XML-Based Library Applications 5
Databases.
Content Management Systems
Printer Admin Print Job Manager
Tutorial 8 Objectives Continue presenting methods to import data into Access, export data from Access, link applications with data stored in Access, and.
Evergreen Data Systems
Chapter 12: Automated data collection methods
Data Management: Documentation & Metadata
Exploring Microsoft® Access® 2016 Series Editor Mary Anne Poatsy

Administrative Software
Staying afloat in the sensor data deluge
Chapter 1: The Database Environment
Manuscript Transcription Assistant Initiative
The Database Environment
Getting Started With Solr
Reportnet 3.0 Database Feasibility Study – Approach
Lab 2: Information Retrieval
Management of Streaming Data
Integrated Statistical Production System WITH GSBPM
Presentation transcript:

Information Management & Technology of the VCR/LTER Project John Porter, VCR LTER

Roadmap
Overview – Objectives, Resources
Collecting metadata and data from researchers
Curation of data and metadata
Data Sharing – producing EML
Special Topics
Streaming Data
Evolution of the VCR/LTER system

Objectives of Information Management
Promote advances in ecological science by:
Providing the information resources needed by VCR/LTER researchers (the web site acts as the “file cabinet” for the project)
Making those resources available to the rest of the ecological research community

VCR/LTER Information Resources

[Diagram: data sources (individual researchers, technicians, automated sensors, GIS & remote sensing) and VCR LTER Information Management (spreadsheets, DBMS, statistical packages, metadata, XML, GIS)]

Data Priorities
Long-term data collected by VCR/LTER researchers
Baseline and other short-term data collected by VCR/LTER researchers
Graduate students are strongly encouraged to include a “Data appendix” in their thesis
External data useful to VCR/LTER researchers

Identifying Data
Existing long-term data are relatively easy: they have date stamps that make them easy to extract from the metadatabase
Brand-new datasets require some detective work
When I see new papers and presentations, I always ask “where is the data?”
Investigators are reminded that we can’t highlight work in our reports and proposals if it lacks archived data

Motivating Investigators
“If your research is important, then so is your data, and it needs to be archived”
Implicit: “If your data isn’t important enough to archive, your research probably isn’t important either”
“NSF will be checking to see that the data that support our major conclusions are in the data archive, so we can’t highlight your research unless you have shared the data”

Information Management System
Metadata is generated using a “Metadatabase” system consisting of a relational database and web forms for input
Data is stored in native forms, typically comma-separated-value (CSV) files

VCR/LTER Metadatabase
[Entity diagram: Personnel, Projects, Datasets, Tables, Data_Objects, Keywords, Publications, Permissions & physical location of files, Variables, Codes, Locations; junction tables for many-to-many relations are not shown]
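The many-to-many relations in the diagram are handled with junction tables. A minimal sketch using Python's built-in sqlite3 module, assuming a heavily simplified, hypothetical subset of the schema (only datasets, keywords, and one junction table). The table names, column names, and sample rows are illustrative and are not the actual VCR/LTER metadatabase. The two queries show the kind of plain SQL that any language with a database interface could run against such a schema.

```python
import sqlite3

# Hypothetical, simplified subset of a metadatabase; names are illustrative only.
SCHEMA = """
CREATE TABLE datasets (dataset_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword TEXT UNIQUE NOT NULL);
-- Junction table for the many-to-many relation: a dataset has many keywords,
-- and a keyword can describe many datasets.
CREATE TABLE dataset_keywords (
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
    PRIMARY KEY (dataset_id, keyword_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO datasets VALUES (?, ?)",
                     [(1, "Example meteorology dataset"),
                      (2, "Example groundwater dataset")])
    conn.executemany("INSERT INTO keywords VALUES (?, ?)",
                     [(1, "climate"), (2, "groundwater"), (3, "barrier island")])
    conn.executemany("INSERT INTO dataset_keywords VALUES (?, ?)",
                     [(1, 1), (1, 3), (2, 2), (2, 3)])
    return conn


def keywords_for(conn: sqlite3.Connection, dataset_id: int) -> list[str]:
    """Follow the junction table from one dataset to its keywords."""
    rows = conn.execute(
        """SELECT k.keyword
           FROM keywords AS k
           JOIN dataset_keywords AS dk ON dk.keyword_id = k.keyword_id
           WHERE dk.dataset_id = ?
           ORDER BY k.keyword""",
        (dataset_id,),
    )
    return [keyword for (keyword,) in rows]


def datasets_with(conn: sqlite3.Connection, keyword: str) -> list[str]:
    """Follow the same junction table in the other direction."""
    rows = conn.execute(
        """SELECT d.title
           FROM datasets AS d
           JOIN dataset_keywords AS dk ON dk.dataset_id = d.dataset_id
           JOIN keywords AS k ON k.keyword_id = dk.keyword_id
           WHERE k.keyword = ?
           ORDER BY d.title""",
        (keyword,),
    )
    return [title for (title,) in rows]
```

Keeping keywords in their own table, rather than in a text column on each dataset, is what later makes a bulk change such as converting to a standard keyword list a single update.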

Metadata Editing

Metadatabase Advantages
Flexible output using standard database tools (SQL interfaces in C, Python, PERL, R, Matlab, etc.)
A web-form-based user interface can be exposed directly to researchers: if someone wants a change, they can make it themselves
That said, XML can also perform similarly to a relational database

Challenges
Investigators often provide spreadsheet files that deviate from good practice:
Lack of consistent columns
Column headings that span multiple lines, include spaces, or include mathematical operations or special characters
Inconsistent handling of dates, times, and codes
These problems must be resolved before the data can be properly described in the metadatabase
Specialized database input systems solve these problems, but are time-consuming to set up
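Two of these cleanup steps can be sketched in Python: turning messy column headings into safe names, and normalizing dates to ISO 8601. The function names and the list of accepted date formats are assumptions for illustration. A value such as 03/07/2015 is ambiguous between month-first and day-first, which is why such issues usually have to be settled with the investigator rather than guessed.

```python
import re
from datetime import datetime


def clean_header(name: str) -> str:
    """Make a spreadsheet column heading safe for a database or statistical package."""
    name = " ".join(name.split())               # join multi-line headings, squeeze spaces
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # spaces, operators, symbols -> "_"
    return name.strip("_")


# Assumed formats seen in submitted files. No day-first numeric format is
# listed, so a value like 03/07/2015 is read as March 7.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")
```

For example, `clean_header("Temp (deg C)")` gives `Temp_deg_C`, and `normalize_date("07-Mar-2015")` gives `2015-03-07`.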

EML Generation
[Diagram: Metadatabase, PERL program, EML documents, EML index (resultSet)]
The PERL program uses the following to populate the EML tags:
SQL calls
Retrieval of text files
Calls to external programs that return information about data files (e.g., line terminators, header lengths)
Retrieval of XML code snippets (e.g., spatial vector and raster, taxonomicCoverage)
Subroutines correspond to specific EML tags
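The original system does this in PERL. The Python sketch below uses xml.etree.ElementTree to illustrate the same design idea: a small function per EML element, fed by helpers that inspect a data file for its line terminator and header length. The header-detection rule (leading lines whose first field does not start with a digit) is an assumption. The SQL calls that would supply titles, creators, and attributes are omitted. Element names follow EML's physical/dataFormat/textFormat structure, but the fragment is not validated against the EML schema.

```python
import xml.etree.ElementTree as ET


def record_delimiter(raw: bytes) -> str:
    """Report the file's line terminator in the escaped form often used in EML."""
    if b"\r\n" in raw:
        return "\\r\\n"
    if b"\r" in raw:
        return "\\r"
    return "\\n"


def header_lines(raw: bytes) -> int:
    """Assumed heuristic: header lines are the leading lines whose first
    comma-separated field does not start with a digit."""
    count = 0
    for line in raw.decode("utf-8").splitlines():
        first_field = line.split(",", 1)[0].strip().strip('"')
        if first_field[:1].isdigit():
            break
        count += 1
    return count


# One function per EML element, in the spirit of "subroutines correspond to tags".
def text_format(raw: bytes) -> ET.Element:
    tf = ET.Element("textFormat")
    ET.SubElement(tf, "numHeaderLines").text = str(header_lines(raw))
    ET.SubElement(tf, "recordDelimiter").text = record_delimiter(raw)
    ET.SubElement(tf, "attributeOrientation").text = "column"
    delimited = ET.SubElement(tf, "simpleDelimited")
    ET.SubElement(delimited, "fieldDelimiter").text = ","
    return tf


def physical(object_name: str, raw: bytes) -> ET.Element:
    ph = ET.Element("physical")
    ET.SubElement(ph, "objectName").text = object_name
    ET.SubElement(ph, "size", unit="byte").text = str(len(raw))
    ET.SubElement(ph, "dataFormat").append(text_format(raw))
    return ph
```

`ET.tostring(physical("met.csv", raw), encoding="unicode")` serializes the fragment so it can be spliced into a larger EML document.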

Retrieving Data
Direct style-sheet transform of EML document

Sensor Data Processing

Processing of Streaming Data
[Diagram components: previous data, new data, transformations & range checks, merge, eliminate duplicates, integrated data, “problem” observations, analysis, data visualization]
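The merge step can be expressed in Python, assuming each observation is keyed by station, timestamp, and variable. New observations that fail a range check are set aside as “problem” observations, and re-delivered records are dropped as duplicates. The Obs type, the variable names, and the range limits are hypothetical. Whether problem values are excluded (as here) or kept with a flag is a design choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    station: str
    timestamp: str   # ISO 8601, so string order matches time order
    variable: str
    value: float


# Hypothetical plausible-range limits; real limits would come from sensor metadata.
RANGES = {"air_temp_c": (-30.0, 50.0), "rel_humidity": (0.0, 100.0)}


def in_range(obs: Obs) -> bool:
    low, high = RANGES.get(obs.variable, (float("-inf"), float("inf")))
    return low <= obs.value <= high


def integrate(previous: list[Obs], new: list[Obs]) -> tuple[list[Obs], list[Obs]]:
    """Merge new observations into previous ones.

    Returns (integrated, problems). Out-of-range new observations go to
    `problems`. A new record whose key is already present is treated as a
    duplicate, and the existing copy is kept.
    """
    merged = {(o.station, o.timestamp, o.variable): o for o in previous}
    problems = []
    for o in new:
        if not in_range(o):
            problems.append(o)
            continue
        merged.setdefault((o.station, o.timestamp, o.variable), o)
    integrated = sorted(merged.values(), key=lambda o: (o.station, o.timestamp, o.variable))
    return integrated, problems
```

Keying on (station, timestamp, variable) is what makes re-running the workflow on overlapping downloads safe: the same reading arriving twice collapses to one record.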

VCR Wireless

Streaming data from a network of ground-water wells

Real-world Example
Here is an outline of how meteorological data have been processed at the VCR/LTER since 1994:
Campbell Scientific “LoggerNet” collects raw data into a comma-separated-value (CSV) file on a small PC on the Eastern Shore
Every 3 hours, an MS-DOS batch file copies the CSV file to a server back at the University of Virginia
Shortly thereafter, a Linux shell script combines CSV files from different stations into a single file, and compresses and archives the original CSV files
It then runs a SAS statistical program that performs range checks and merges the new data with previous data
A SAS program queries a web service to get code to run to address specific quality-control issues
Another SAS job updates graphics and reports for the current month
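The VCR/LTER step described above uses a Linux shell script. To stay consistent with the other examples, here is a Python sketch of the same combine, compress, and archive step. The directory layout and the convention that each file's name identifies its station are assumptions.

```python
import gzip
import shutil
from pathlib import Path


def combine_and_archive(incoming: Path, combined: Path, archive: Path) -> int:
    """Append each station's CSV in `incoming` to `combined` (prefixing each line
    with the station name), then gzip the originals into `archive` and delete them.
    Returns the number of files processed."""
    archive.mkdir(parents=True, exist_ok=True)
    files = sorted(incoming.glob("*.csv"))
    with combined.open("a", encoding="utf-8") as out:
        for f in files:
            station = f.stem  # assumption: the file name identifies the station
            for line in f.read_text(encoding="utf-8").splitlines():
                if line.strip():  # skip blank lines
                    out.write(f"{station},{line}\n")
    # Archive only after the combined file has been written and closed.
    for f in files:
        with f.open("rb") as src, gzip.open(archive / (f.name + ".gz"), "wb") as dst:
            shutil.copyfileobj(src, dst)
        f.unlink()
    return len(files)
```

Deleting an original only after it has been copied into both the combined file and the archive means an interrupted run leaves the raw files in place to be reprocessed.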

Workflow output

Addressing Sensor Problems
[Diagram: raw Level 0 data → identify problem → add to sensor problem database → generate code → run code → corrected Level 1 data; generate report]

Listing of Problems
Automatically generated code to address the problems
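A minimal sketch of the “generate code, then run it” idea: each entry in a hypothetical sensor-problem table is rendered as a line of Python, and the generated code is applied to copies of the Level 0 records to produce Level 1 records. At the VCR/LTER the generated code is SAS, obtained from a web service. The SensorProblem fields and the two actions (“missing” and “offset”) are assumptions for illustration. One advantage of generating code is that the listing itself documents exactly what was changed. A simpler alternative is to interpret the problem table directly.

```python
from dataclasses import dataclass


@dataclass
class SensorProblem:
    """One row of a hypothetical sensor-problem table."""
    variable: str
    start: str       # ISO 8601 timestamps, so string comparison follows time order
    end: str
    action: str      # "missing" or "offset"
    amount: float = 0.0


def generate_code(problems: list[SensorProblem]) -> str:
    """Render each problem as one line of Python that edits a record dict `r`."""
    lines = []
    for p in problems:
        var = repr(p.variable)
        when = f"{p.start!r} <= r['timestamp'] <= {p.end!r}"
        if p.action == "missing":
            lines.append(f"if {when}: r[{var}] = None")
        elif p.action == "offset":
            lines.append(f"if {when} and r[{var}] is not None: "
                         f"r[{var}] = r[{var}] + {p.amount!r}")
        else:
            raise ValueError(f"unknown action: {p.action!r}")
    return "\n".join(lines)


def to_level1(level0: list[dict], problems: list[SensorProblem]) -> list[dict]:
    """Run the generated code on copies of the Level 0 records."""
    # The problem table is assumed to be trusted: exec runs whatever is generated.
    code = compile(generate_code(problems) or "pass", "<sensor-fixes>", "exec")
    level1 = []
    for record in level0:
        r = dict(record)  # copy, so the Level 0 data stay untouched
        exec(code, {}, {"r": r})
        level1.append(r)
    return level1
```

Printing `generate_code(problems)` gives the kind of listing shown on the slide, one corrective statement per recorded problem.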

Available Tools
EML to KML web service
EML to statistical program web service (now used by the LTER Portal)
PASTASummary – sends data download reports for individual datasets to investigators
PastaUseCount – summarizes downloads for reports
PastaMetacatSyncReport – compares an EML harvest list with what is already published in PASTA and lists updated and new datasets
emlStats – tallies statistics on EML files (numbers of attributes, keywords, locations, taxonomic coverage, etc.)
Style sheet for EndNote to work with Research.gov reports
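emlStats is described here only by what it tallies. A Python sketch of that kind of count, assuming EML's usual layout in which only the root element is namespaced. The element choices are assumptions, not the tool's actual rules: for example, counting each geographicCoverage as a “location”, and counting only the most specific level of each nested taxonomicClassification as a taxon.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# EML element (local name) -> label used in the tally; the mapping is an assumption.
TALLIED = {
    "attribute": "attributes",
    "keyword": "keywords",
    "geographicCoverage": "locations",
    "taxonomicClassification": "taxa",
}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix if present."""
    return tag.rsplit("}", 1)[-1]


def eml_stats(xml_text: str) -> Counter:
    counts = Counter({label: 0 for label in TALLIED.values()})
    for el in ET.fromstring(xml_text).iter():
        name = local_name(el.tag)
        if name not in TALLIED:
            continue
        # taxonomicClassification nests one level per rank; count only the
        # innermost (most specific) level of each chain.
        if name == "taxonomicClassification" and any(
                local_name(child.tag) == name for child in el):
            continue
        counts[TALLIED[name]] += 1
    return counts
```

Running `eml_stats` over every document in a harvest list and summing the Counters would give repository-wide totals.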

VCR/LTER Information Management Timeline
1989 – Metadata system created using dBASE III
1990 – GIS Lab established
1990 – Data Management Policy
1992 – Electronic Mail Calendar
1992 – Gopher Information Server
1993 – WWW Server
1994 – Online Research Summaries
1995 – Web-based Personnel Directory
1996 – Automated System for Research Summaries
1996 – ClimDB harvest document created
1996 – Biodiversity Database
1997 – Web-form-based Information Management Tools; dBASE III system ported to MiniSQL
1999 – Automated Statistical Programs

VCR/LTER Information Management Timeline
2000 – EML 1.4 Metadata
2001 – ClimDB harvest document revised
2002 – Wireless Internet connection to island field site
2003 – MapServer online maps created
2004 – Upgrade of computer systems
2004 – EML 2.1 Metadata
2005 – Web page revised using PostNuke Content Management System
2005 – EML to SAS, SPSS and R software converters
2007 – Hog well wireless network installed
2008 – Upgrade of computers to Linux virtual machines
2009 – Web page revised using Drupal Content Management System

VCR/LTER Information Management Timeline
2010 – EML 2.1 Metadata
2010 – 40% of datasets made accessible through LTER Data Access Server
2011 – Web service for converting EML to R, SAS and SPSS programs
2011 – Keywords converted to LTER-standard keywords
2012 – All data access via LTER Data Access Server
2012 – Dataset displays shifted from locally written programs to XSL transforms of EML
2013 – Selected datasets available via PASTA
2014 – spatialVector and spatialRaster data types added
2014 – PASTA repository used for all stable data packages
2014 – Tools for tracking and disseminating PASTA status reports
2015 – LTER Portal uses PASTAprog statistical program generation
2016 – Upgrade of systems; move to WordPress CMS for the web site
2017 – Move to server at UVA Data Center
2017 – Add taxonomicCoverage elements to appropriate datasets

Philosophy
Create reusable workflows whenever possible, sufficiently automated that they require minimal supervision
Use generic, widely available tools: less turnover, better transition tools
Don’t put data or metadata into something you aren’t sure you can get it OUT of!

Questions?
JPORTER@Virginia.EDU