Published by Gwen Rich. Modified over 6 years ago.
1
Information Management & Technology of the VCR/LTER Project
John Porter, VCR LTER
2
Roadmap
Overview – Objectives, Resources
Collecting metadata and data from researchers
Curation of data and metadata
Data Sharing – producing EML
Special Topics
  Streaming Data
  Evolution of the VCR/LTER system
3
Objectives of Information Mgmt.
Promote advances in ecological science through:
  Providing the information resources needed by VCR/LTER researchers
    The web site acts as the “file cabinet” for the project
  Making those resources available to the rest of the ecological research community
5
VCR/LTER Information Resources
6
Data Sources
Sources: Individual Researchers, Technicians, Automated Sensors, GIS & Remote Sensing
VCR LTER Information Management: Spreadsheets, DBMS, Statistical Packages, Metadata XML, GIS
7
Data Priorities
Long-Term Data Collected by VCR/LTER Researchers
Baseline and Other Short-Term Data Collected by VCR/LTER Researchers
  Graduate students are strongly encouraged to include a “Data appendix” in their thesis
External Data useful to VCR/LTER Researchers
8
Identifying Data
Existing long-term data are relatively easy to identify
  They have date stamps that make them easy to extract from the metadatabase
Brand new datasets require some detective work
  When I see new papers and presentations I always ask “where is the data?”
  Investigators are reminded that we can’t highlight work in our reports and proposals if it lacks archived data
9
Motivating Investigators
“If your research is important, then so is your data and it needs to be archived”
  Implicit: “If your data isn’t important enough to archive, your research probably isn’t important either”
“NSF will be checking to see that the data that support our major conclusions are in the data archive, so we can’t highlight your research unless you have shared the data”
10
Information Management System
Metadata is generated using a “Metadatabase” system, consisting of a relational database and web forms for input
Data is stored in native forms, typically comma-separated value files
11
VCR/LTER Metadatabase
Diagram of metadatabase tables: Personnel, Projects, Datasets, Tables, Data_Objects, Keywords, Publications, Variables, Codes, Locations
Annotation: Permissions & physical location of files
(Junction tables not shown for many-to-many relations)
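A relational layout like the one in this diagram could be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the VCR/LTER schema itself: only the table names Datasets, Tables, Variables and Keywords come from the slide, while every column name, the `data_tables` and `dataset_keywords` names, and the sample rows are assumptions. It includes one junction table of the kind the slide says is omitted from the diagram.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Only the table concepts come from
# the slide; all column names and the sample rows are assumptions.
SCHEMA = """
CREATE TABLE datasets (
    dataset_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL);
CREATE TABLE data_tables (
    table_id   INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
    file_name  TEXT NOT NULL);
CREATE TABLE variables (
    variable_id INTEGER PRIMARY KEY,
    table_id    INTEGER NOT NULL REFERENCES data_tables(table_id),
    name        TEXT NOT NULL,
    units       TEXT);
CREATE TABLE keywords (
    keyword_id INTEGER PRIMARY KEY,
    keyword    TEXT UNIQUE NOT NULL);
-- Junction table for the many-to-many Datasets <-> Keywords relation.
CREATE TABLE dataset_keywords (
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
    PRIMARY KEY (dataset_id, keyword_id));
"""


def build_demo_db():
    """Create an in-memory database holding one made-up dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO datasets VALUES (1, 'Example meteorological data')")
    conn.execute("INSERT INTO data_tables VALUES (1, 1, 'met_hourly.csv')")
    conn.executemany(
        "INSERT INTO variables VALUES (?, ?, ?, ?)",
        [(1, 1, "air_temp", "celsius"), (2, 1, "precip", "millimeter")],
    )
    conn.executemany(
        "INSERT INTO keywords VALUES (?, ?)", [(1, "climate"), (2, "temperature")]
    )
    conn.executemany("INSERT INTO dataset_keywords VALUES (?, ?)", [(1, 1), (1, 2)])
    return conn


def keywords_for(conn, dataset_id):
    """Return the keywords attached to a dataset, in alphabetical order."""
    rows = conn.execute(
        """SELECT k.keyword
             FROM keywords k
             JOIN dataset_keywords dk ON dk.keyword_id = k.keyword_id
            WHERE dk.dataset_id = ?
            ORDER BY k.keyword""",
        (dataset_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(keywords_for(build_demo_db(), 1))
```

Because the metadata live in ordinary tables, the same query could be issued from any language with an SQL interface.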
12
Metadata Editing
13
Metadatabase Advantages
Flexible output using standard database tools (SQL interfaces in C, Python, PERL, R, Matlab, etc.)
Web-form-based user interface can be exposed directly to researchers
  If someone wants a change, they can make it themselves
That said, XML can also perform similarly to a relational database
14
Challenges
Investigators often provide spreadsheet files that deviate from good practice
  Lack of consistent columns
  Column headings
    Span multiple lines
    Include spaces
    Include mathematical operations or special characters
  Inconsistent handling of dates, times, codes
These problems must be resolved before the data can be properly described in the metadatabase
Specialized database input systems solve these problems, but are time-consuming to set up
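The column-heading problems listed on this slide (multi-line headings, spaces, mathematical operators and special characters) can often be handled mechanically. The following is a minimal sketch of one possible clean-up, not a tool from the presentation; the function names and the `col_` prefix and numeric-suffix conventions are assumptions chosen for illustration.

```python
import re


def clean_heading(raw):
    """Turn a free-form spreadsheet heading into a plain identifier."""
    # Collapse multi-line headings and runs of whitespace into single spaces.
    text = " ".join(raw.split())
    # Replace spaces, operators and other special characters with underscores.
    text = re.sub(r"[^0-9A-Za-z_]+", "_", text).strip("_")
    # Many statistical packages reject names that are empty or start with a digit.
    if not text or text[0].isdigit():
        text = "col_" + text
    return text.lower()


def clean_headings(raw_headings):
    """Clean a header row, adding a numeric suffix to repeated names."""
    seen = {}
    cleaned = []
    for raw in raw_headings:
        name = clean_heading(raw)
        seen[name] = seen.get(name, 0) + 1
        cleaned.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return cleaned


if __name__ == "__main__":
    print(clean_headings(["Air Temp\n(deg C)", "N+P / 2", "2010 count", "Depth", "depth "]))
```

Inconsistent dates, times and codes usually need dataset-specific rules, so they are left out of this sketch.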
15
EML Generation
Diagram: Metadatabase, PERL program, EML Index (resultSet), EML Documents
The PERL program uses:
  SQL calls
  Retrieval of text files
  Calls to external programs that return information about data files (e.g., line terminators, header lengths)
  Retrieval of XML code snippets (e.g., spatial vector and raster, taxonomicCoverage)
to populate the EML tags
Subroutines correspond to specific EML tags
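The idea that subroutines correspond to specific EML tags can be illustrated with one small function per element. This is a sketch in Python rather than the PERL program described on the slide, and it is not that program's code: the function names and the layout of the `RECORD` dictionary (standing in for the results of SQL calls) are assumptions, and the output is only a fragment, not a complete or schema-valid EML document.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for values that would come from metadatabase queries.
RECORD = {
    "title": "Example meteorological data",
    "creators": [{"given": "Example", "sur": "Researcher"}],
    "keywords": ["climate", "temperature"],
}


def title_element(record):
    """Build the <title> element."""
    elem = ET.Element("title")
    elem.text = record["title"]
    return elem


def creator_element(person):
    """Build one <creator> element with an <individualName> child."""
    creator = ET.Element("creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "givenName").text = person["given"]
    ET.SubElement(name, "surName").text = person["sur"]
    return creator


def keyword_set_element(keywords):
    """Build a <keywordSet> element with one <keyword> per entry."""
    kset = ET.Element("keywordSet")
    for keyword in keywords:
        ET.SubElement(kset, "keyword").text = keyword
    return kset


def dataset_element(record):
    """Assemble a <dataset> fragment by calling one function per tag."""
    dataset = ET.Element("dataset")
    dataset.append(title_element(record))
    for person in record["creators"]:
        dataset.append(creator_element(person))
    dataset.append(keyword_set_element(record["keywords"]))
    return dataset


if __name__ == "__main__":
    print(ET.tostring(dataset_element(RECORD), encoding="unicode"))
```

Keeping one function per tag means a change to how a single element is populated touches only that function.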
16
Retrieving Data
Direct style-sheet transform of EML document
17
Sensor Data Processing
18
Processing of Streaming Data
Diagram elements: Previous Data, New Data, Transformations & Range Checks, Merge Data, Eliminate Duplicates, “Problem” Observations, Integrated Data, Analysis, Visualization
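The range-check, merge and duplicate-elimination steps named in this diagram could look roughly like the sketch below. It is an illustration under stated assumptions, not the VCR/LTER workflow: the record layout (dictionaries with `station` and `timestamp` keys), the rule that a new record replaces an older one with the same key, the choice to keep flagged records in the integrated output, and the station code and limits in the demo are all hypothetical.

```python
def range_check(record, limits):
    """Return the names of variables that are missing or outside their limits."""
    problems = []
    for name, (low, high) in limits.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


def integrate(previous, new, limits):
    """Merge new records into previous ones and list "problem" observations.

    Records are keyed by (station, timestamp), so a record that arrives twice
    is stored once. Returns (integrated_records, problem_list).
    """
    merged = {(r["station"], r["timestamp"]): r for r in previous}
    problem_list = []
    for record in new:
        key = (record["station"], record["timestamp"])
        failed = range_check(record, limits)
        if failed:
            problem_list.append((key, failed))
        merged[key] = record  # assumed rule: the newest copy wins
    integrated = [merged[key] for key in sorted(merged)]
    return integrated, problem_list


if __name__ == "__main__":
    limits = {"air_temp": (-40.0, 50.0)}  # hypothetical plausibility limits
    previous = [{"station": "HOG", "timestamp": "2017-01-01T00:00", "air_temp": 5.0}]
    new = [
        {"station": "HOG", "timestamp": "2017-01-01T00:00", "air_temp": 5.0},
        {"station": "HOG", "timestamp": "2017-01-01T01:00", "air_temp": 99.0},
    ]
    integrated, problems = integrate(previous, new, limits)
    print(len(integrated), problems)
```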
19
VCR Wireless
20
Streaming data from a network of ground-water wells
21
Real-world Example
Here is an outline of how meteorological data has been processed at the VCR/LTER since 1994
  Campbell Scientific “Loggernet” collects raw data to a comma-separated-value (CSV) file on a small PC on the Eastern Shore
  Every 3 hours, an MSDOS BATCH file copies the CSV file to a server back at the Univ. of Virginia
  Shortly thereafter, a LINUX shell-script combines CSV files from different stations into a single file, and compresses and archives the original CSV files
    Runs a SAS statistical program that performs range checks, and merges with previous data
    A SAS program queries a web service to get code to run to address specific quality-control issues
    Runs another SAS job that updates graphics and reports for the current month
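The step that combines CSV files from different stations and compresses the originals can be sketched in a few lines. The workflow on the slide uses a LINUX shell-script and SAS; the Python below only illustrates the same idea and is not that script. It works on in-memory text so it can be run anywhere, and the station codes, the choice to prefix each row with its station, and the function names are assumptions.

```python
import csv
import gzip
import io


def combine_station_csv(station_texts):
    """Combine per-station CSV text into one CSV, prefixing each row with its station."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for station in sorted(station_texts):
        for row in csv.reader(io.StringIO(station_texts[station])):
            if row:  # skip blank lines
                writer.writerow([station] + row)
    return out.getvalue()


def archive_original(text):
    """Return gzip-compressed bytes of an original CSV file's contents."""
    return gzip.compress(text.encode("utf-8"))


if __name__ == "__main__":
    raw = {
        "B": "2017-01-01T00:00,1.5\n",      # hypothetical station codes and values
        "A": "2017-01-01T00:00,2.5\n\n",
    }
    print(combine_station_csv(raw), end="")
    print(len(archive_original(raw["A"])), "bytes in compressed archive copy")
```

In a file-based version the compressed bytes would be written to an archive directory before the original is removed, so the raw data is never lost.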
22
Workflow output
23
Addressing Sensor Problems
Diagram elements: Raw Level 0 Data, Identify Problem, Add to Sensor Problem Database, Generate Code, Run Code, Corrected Level 1 Data, Generate Report
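The "Generate Code" and "Run Code" steps in this diagram suggest turning rows of a sensor problem database into executable corrections. The sketch below shows one way that could work in Python; it is not the system from the presentation, which runs SAS. The problem-record fields (`station`, `variable`, `start`, `end`, `note`), the single supported correction (setting the value to missing), and the use of ISO 8601 timestamp strings are all assumptions.

```python
def generate_code(problems):
    """Return Python source for a correct(record) function built from problem rows."""
    lines = ["def correct(record):", "    record = dict(record)  # leave Level 0 untouched"]
    for p in problems:
        note = " ".join(str(p["note"]).split())  # keep the comment on one line
        lines.append(
            f"    if record['station'] == {p['station']!r} and "
            f"{p['start']!r} <= record['timestamp'] <= {p['end']!r}:"
        )
        lines.append(f"        record[{p['variable']!r}] = None  # {note}")
    lines.append("    return record")
    return "\n".join(lines) + "\n"


def load_corrector(source):
    """Compile generated source and return its correct() function."""
    namespace = {}
    exec(source, namespace)  # acceptable here because the source is generated locally
    return namespace["correct"]


if __name__ == "__main__":
    problems = [  # hypothetical row from a sensor problem database
        {"station": "HOG", "variable": "air_temp",
         "start": "2017-01-01T00:00", "end": "2017-01-02T00:00",
         "note": "sensor iced over"},
    ]
    source = generate_code(problems)
    print(source)
    correct = load_corrector(source)
    level0 = {"station": "HOG", "timestamp": "2017-01-01T12:00", "air_temp": 3.0}
    print(correct(level0))
```

Generating the code from a database keeps a record of every correction, and the raw Level 0 values are never overwritten.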
25
Listing of Problems
Automatically generated code to address the problems
26
Available Tools
EML to KML web service
EML to statistical program web service (now used by LTER Portal)
PASTASummary: sends out data download reports to investigators for individual datasets
PastaUseCount: summarizes downloads for reports
PastaMetacatSyncReport: compares an EML harvestlist with what is already published in PASTA and lists updates and new datasets
emlStats: tallies statistics on EML files (number of attributes, keywords, locations, taxonomic etc.)
Stylesheet for ENDNOTE to work with Research.gov reports
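A tally of the kind emlStats is described as producing can be sketched by counting elements in an EML file. This is not the emlStats tool itself: the function names are made up, and the assumption is that "locations" and "taxonomic" correspond to the EML elements `geographicCoverage` and `taxonomicCoverage`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# EML element names to tally; the mapping from the slide's wording is assumed.
COUNTED_TAGS = ("attribute", "keyword", "geographicCoverage", "taxonomicCoverage")


def local_name(tag):
    """Strip an XML namespace, e.g. '{uri}eml' becomes 'eml'."""
    return tag.rsplit("}", 1)[-1]


def eml_stats(xml_text):
    """Count selected elements anywhere in an EML document."""
    root = ET.fromstring(xml_text)
    counts = Counter(local_name(elem.tag) for elem in root.iter())
    return {tag: counts.get(tag, 0) for tag in COUNTED_TAGS}


if __name__ == "__main__":
    sample = (
        "<dataset>"
        "<keywordSet><keyword>climate</keyword><keyword>temperature</keyword></keywordSet>"
        "<attributeList><attribute/></attributeList>"
        "</dataset>"
    )
    print(eml_stats(sample))
```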
27
VCR/LTER Information Management Timeline
1989 – Metadata system created using Dbase III
1990 – GIS Lab Established
Data Management Policy
Electronic Mail Calendar
Gopher Information Server
WWW Server
Online Research Summaries
Web-based Personnel Directory
Automated System for Research Summaries
1996 – ClimDB harvest document created
Biodiversity Database
Web form-based Information Management Tools, Dbase III system ported to MiniSQL
Automated Statistical Programs
28
VCR/LTER Information Management Timeline
2000 – EML 1.4 Metadata
2001 – ClimDB harvest document revised
2002 – Wireless Internet connection to island field site
2003 – Mapserver online maps created
2004 – Upgrade of computer systems
2004 – EML 2.1 Metadata
2005 – Web Page revised using PostNuke Content Management System
2005 – EML to SAS, SPSS and R software converters
2007 – Hog well wireless network installed
2008 – Upgrade of computers to Linux Virtual Machines
2009 – Web Page revised using Drupal Content Management System
29
VCR/LTER Information Management Timeline
2010 – EML 2.1 Metadata
2011 – Web service for converting EML to R, SAS and SPSS programs
2010 – 40% of datasets made accessible through LTER Data Access Server
2011 – Keywords converted to LTER-standard keywords
2012 – All data access via LTER Data Access Server
2012 – Dataset displays shifted from locally-written programs to XSL transforms of EML
2013 – Selected datasets available via PASTA
2014 – spatialVector and spatialRaster data types added
2014 – PASTA repository used for all stable data packages
2014 – Tools for tracking and disseminating PASTA status reports
2015 – LTER Portal uses PASTAprog statistical program generation
2016 – Upgrade system, move to WordPress CMS for the web site
2017 – Move to server at UVA Data Center
2017 – Add taxonomicCoverage elements to appropriate datasets
30
Philosophy
Create reusable workflows whenever possible
  Sufficiently automated that they require minimal supervision
Use generic, widely available tools
  Less turnover, better transition tools
Don’t put data or metadata into something you aren’t sure you can get it OUT of!
31
Questions?