Mark Servilla & Duane Costa
LTER Network Office
LTER 2012 All Scientists Meeting
Why LTER Data Co-op?
A Diamond in the Rough
Demonstrations
  How can I contribute data?
  How do I find data?
  How can I see who is using my data?
  How is Network synthesis enabled?
  How is provenance captured?
Where do we go from here?
Panel Discussion
It's about community
A cooperative … is an autonomous association of persons who voluntarily cooperate for their mutual social, economic, and cultural benefit. - Wikipedia
Producers – LTER sites
Middleware – PASTA
Consumers – Science Community
Data producers can evaluate their data package prior to harvesting into PASTA
Data packages are discovered via browsing and/or search tools
Derived data may be generated when a data package insert or update event occurs
Provenance metadata can be generated for derived data packages
Data package use information is viewed by a contributor
LTER Network Data Portal portal.lternet.edu
PASTA Web Service API
Subcomponent of the Data Package Manager component in PASTA
Generates a quality report for each data package
A quality report contains a set of quality checks
Stored as XML but usually rendered in HTML for human readability
27 quality checks implemented in the NIS prototype (of 52 proposed by EML Metrics Working Group)
Available to the greater ecoinformatics community via the Data Manager Library (ecoinformatics.org)
An individual metric or a best practice
May involve looking at:
  metadata (independent of data), or
  data (independent of metadata), or
  congruency between metadata and data
Can result in one of four statuses:
  valid
  info
  warn
  error
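As a rough illustration of the slide above, a quality check and its four statuses could be modeled as follows. This is a minimal sketch, not PASTA's actual data model: the names `Status`, `QualityCheck`, `target`, and `explanation` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The four statuses a quality check can report."""
    VALID = "valid"
    INFO = "info"
    WARN = "warn"
    ERROR = "error"


@dataclass
class QualityCheck:
    """Hypothetical record for one check inside a quality report."""
    name: str
    target: str          # "metadata", "data", or "congruency"
    status: Status
    explanation: str = ""


# Example: a metadata-only check that reports a warning.
check = QualityCheck(
    name="At least one keyword element is present",
    target="metadata",
    status=Status.WARN,
    explanation="No keyword element found",
)
print(check.name, "->", check.status.value)
```

A quality report is then simply a collection of such records, one per check.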
Users can evaluate data packages before inserting them into PASTA
An error status reported by any quality check blocks insertion of the data package into PASTA
Every data package stored in PASTA has a quality report that can be accessed along with its metadata and data
Data Package Quality Report
Evaluate
  Runs quality checks on the data package but doesn't insert it into PASTA
  May reveal more diagnostic information (as compared to harvest) because it doesn't necessarily halt after encountering the first error
Harvest
  Runs quality checks on the data package; if no errors are discovered, inserts (or updates) the data package into PASTA
  May reveal less diagnostic information (as compared to evaluate) because it may halt as soon as an error is encountered
Bottom line: Always evaluate before harvesting!
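The difference between the two operations can be sketched in a few lines. This is an illustration under stated assumptions, not PASTA code: the functions `evaluate` and `harvest`, the two sample checks, and the use of a plain dict and list to stand in for a data package and for PASTA storage are all hypothetical. The sketch also assumes harvest always halts at the first error, whereas the slide only says it may.

```python
from typing import Callable, List, Tuple

# A check takes a data package (a plain dict here) and returns (status, message).
Check = Callable[[dict], Tuple[str, str]]


def has_title(package: dict) -> Tuple[str, str]:
    return ("valid", "title present") if package.get("title") else ("error", "title missing")


def has_methods(package: dict) -> Tuple[str, str]:
    return ("valid", "methods present") if package.get("methods") else ("error", "methods missing")


def evaluate(package: dict, checks: List[Check]) -> List[Tuple[str, str]]:
    """Run every check and report all results; nothing is inserted."""
    return [check(package) for check in checks]


def harvest(package: dict, checks: List[Check], store: list) -> List[Tuple[str, str]]:
    """Run checks, halting at the first error; insert only if no error occurred."""
    results = []
    for check in checks:
        result = check(package)
        results.append(result)
        if result[0] == "error":
            return results  # halted early: the package is not inserted
    store.append(package)
    return results


checks = [has_title, has_methods]
store: list = []
broken = {}  # a package missing both title and methods

print(len(evaluate(broken, checks)))        # both problems are reported
print(len(harvest(broken, checks, store)))  # only the first problem is reported
```

Evaluating first surfaces every problem in one pass, which is why the slide recommends it before harvesting.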
EML is version or beyond
Document is schema-valid EML
Document is EML parser-valid
All entity-level data URLs are live
The packageId pattern matches scope.identifier.revision
There are no duplicate entity names
An entity-level URL which is not set to information returns data
Data table does not have more fields than metadata attributes
Data table does not have fewer fields than metadata attributes
Database table can be created from EML metadata
Field delimiter in metadata is a single character
Document is schema-valid after dereferencing
enumeratedDomain codes are unique (not yet implemented)
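Three of the checks listed above are simple enough to sketch directly. The code below is illustrative only and is not taken from the Data Manager Library. The regular expression assumes a scope made of letters, digits, and hyphens followed by a numeric identifier and revision. The function names and the sample package id are hypothetical, and the field count uses a naive split that ignores quoted delimiters.

```python
import re
from collections import Counter

# Assumed shape of scope.identifier.revision (an assumption, not the official rule).
PACKAGE_ID = re.compile(r"[A-Za-z0-9-]+\.\d+\.\d+")


def is_valid_package_id(package_id: str) -> bool:
    """True if the whole string looks like scope.identifier.revision."""
    return PACKAGE_ID.fullmatch(package_id) is not None


def duplicate_entity_names(names: list) -> list:
    """Return the entity names that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def compare_field_count(row: str, attribute_count: int, delimiter: str = ",") -> str:
    """Compare the fields in one data row against the number of metadata attributes."""
    field_count = len(row.split(delimiter))  # naive: does not handle quoted fields
    if field_count > attribute_count:
        return "more"
    if field_count < attribute_count:
        return "fewer"
    return "match"


print(is_valid_package_id("knb-lter-nin.1.1"))
print(duplicate_entity_names(["temperature", "salinity", "temperature"]))
print(compare_field_count("2012-01-01,12.5,3", attribute_count=2))
```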
Data can be loaded into the database
Length of entityName is not excessive
A methods element is present
Record delimiter is present in metadata
Data examined and possible record delimiters returned
Number of records in metadata matches number of rows loaded
At least one keyword element is present
Dataset title length is at least 5 words
Dataset abstract element is a minimum of 20 words
...others not yet implemented
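The two length checks at the end of this list reduce to counting words. A minimal sketch follows, assuming words are separated by whitespace. The function names are hypothetical and the real checks may count words differently.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def title_long_enough(title: str, minimum: int = 5) -> bool:
    """Dataset title should have at least 5 words."""
    return word_count(title) >= minimum


def abstract_long_enough(abstract: str, minimum: int = 20) -> bool:
    """Dataset abstract should have at least 20 words."""
    return word_count(abstract) >= minimum


print(title_long_enough("North Inlet Meteorological Air Temperature"))
print(title_long_enough("Air Temperature"))
```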
Display downloaded data
Display first insert row
coverage element is present
temporalCoverage element is present
geographicCoverage element is present
taxonomicCoverage element is present
...others not yet implemented
North Inlet Meteorological – Air Temperature
Yearly aggregation of data
Down-sample Hourly to Daily and Monthly
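The down-sampling step can be sketched with the standard library alone. This is not the NIN workflow itself. It assumes the aggregate is an arithmetic mean (the slide does not name the statistic), uses synthetic temperature values, and the function name `downsample` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample(readings, key_format):
    """Group (timestamp, value) readings by a strftime key and average each group."""
    groups = defaultdict(list)
    for timestamp, value in readings:
        groups[timestamp.strftime(key_format)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


# Synthetic hourly air temperatures spanning two days (made-up values).
start = datetime(2012, 1, 31)
hourly = [(start + timedelta(hours=h), 10.0 + (h % 24) * 0.5) for h in range(48)]

daily = downsample(hourly, "%Y-%m-%d")    # one value per calendar day
monthly = downsample(hourly, "%Y-%m")     # one value per calendar month

print(daily)
print(monthly)
```

Changing the key format to `"%Y"` would give the yearly aggregation in the same way.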
[Diagram sequence: PASTA and the NIN Workflow. Labeled steps, in order: Source Data, Notify, Request Data, Source Data, Derived Data]
Subscribe to a Data Package event
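The event-driven pattern behind the subscription (source data arrives, the workflow is notified, it requests the data, and it returns derived data) can be sketched in memory as follows. This is not the PASTA Web Service API. The class `PackageStore`, its methods, the "-derived" scope suffix, and the averaging step are all hypothetical stand-ins used to show the flow.

```python
class PackageStore:
    """Hypothetical in-memory stand-in for PASTA: stores packages and notifies subscribers."""

    def __init__(self):
        self.packages = {}
        self.subscribers = {}

    def subscribe(self, scope, identifier, callback):
        """Register a callback for insert/update events on one data package."""
        self.subscribers.setdefault((scope, identifier), []).append(callback)

    def insert(self, scope, identifier, revision, data):
        """Store a package revision, then notify every subscriber ("Notify")."""
        self.packages[(scope, identifier, revision)] = data
        for callback in self.subscribers.get((scope, identifier), []):
            callback(scope, identifier, revision)

    def read(self, scope, identifier, revision):
        """Return stored data for a package revision ("Request Data")."""
        return self.packages[(scope, identifier, revision)]


def make_workflow(store):
    """Build a callback that derives a new package from the one that changed."""
    def on_event(scope, identifier, revision):
        source = store.read(scope, identifier, revision)
        derived = [sum(source) / len(source)]  # placeholder derivation: the mean
        store.insert(scope + "-derived", identifier, revision, derived)
    return on_event


store = PackageStore()
store.subscribe("example", 1, make_workflow(store))
store.insert("example", 1, 1, [1.0, 2.0, 3.0])   # triggers the workflow
print(store.read("example-derived", 1, 1))
```

Because the derived package is inserted through the same store, other workflows could subscribe to it in turn.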
Source Data Package
Derived Data Package
Workflow Description
Provenance Metadata
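A provenance record ties together the three items named on the preceding slide: the source data package, the derived data package, and a description of the workflow. The sketch below shows one way to hold that information. It is an assumption-laden illustration and not PASTA's provenance format. The class `ProvenanceRecord`, its fields, and the package ids are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical link between a derived package, its sources, and the workflow."""
    derived_package_id: str
    source_package_ids: List[str]
    workflow_description: str

    def summary(self) -> str:
        """One-line, human-readable statement of where the derived data came from."""
        sources = ", ".join(self.source_package_ids)
        return f"{self.derived_package_id} derived from {sources} via: {self.workflow_description}"


record = ProvenanceRecord(
    derived_package_id="example-derived.1.1",
    source_package_ids=["example-source.1.1"],
    workflow_description="Down-sample hourly air temperature to daily and monthly",
)
print(record.summary())
```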
December 2012
  Support DOI assignment to metadata and data objects
  Refine NIS Data Portal
  Complete metadata rendering
  Improve catalog browsing
  Hang out shingle
Summer 2013
  Stand up DataONE member node