Published by June Hunt. Modified over 8 years ago.
1
1 of 53 Lecture 3: Metadata. Steve Burian, Hydroinformatics, Fall 2013. This work was funded by National Science Foundation Grants EPS 1135482 and EPS 1208732.
2
2 of 53 Objectives
– Define science metadata
– Identify the types of information included in metadata records for environmental datasets
– Determine the dimensionality of a dataset, including the scale triplet of support, spacing, and extent for both space and time
– Generate metadata and describe datasets to support data sharing
3
3 of 53 The Data Life Cycle: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
4
4 of 53 What is Metadata?
Metadata is “Information about Data”: content, quality, condition, and other characteristics
– WHO created the data?
– WHAT is the content of the data?
– WHEN were the data created?
– WHERE is it geographically?
– WHY were the data developed?
– HOW were the data developed?
5
5 of 53 The Purpose of Metadata
– Support discovery of scientific data
– Facilitate acquisition, comprehension, and use of data by HUMANS
– Enable automated discovery, ingestion, processing, and analysis by MACHINES
6
6 of 53 Metadata is All Around CC image by USDAgov on Flickr Details about the songs in your MP3 library. Details about the cereal you ate for breakfast this morning.
7
7 of 53 Data vs. Metadata
Data: 15.9
Metadata:
– Site: Little Bear River at Mendon Road
– Latitude = 43.0000, Longitude = -111.0000
– Variable: Water temperature
– Units: Degrees Celsius
– Date and time: 9/30/2011 5:00 PM
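One way to see the split between data and metadata on this slide is to store them together in a single record. The sketch below is only an illustration: the class name `Observation`, its field names, and the `describe` helper are assumptions, while the values are the ones shown on the slide.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One data value plus the metadata needed to interpret it."""
    value: float          # the data
    site_name: str        # everything below is metadata
    latitude: float
    longitude: float
    variable: str
    units: str
    timestamp: datetime


# Values taken from the slide; the field names are illustrative.
obs = Observation(
    value=15.9,
    site_name="Little Bear River at Mendon Road",
    latitude=43.0000,
    longitude=-111.0000,
    variable="Water temperature",
    units="Degrees Celsius",
    timestamp=datetime(2011, 9, 30, 17, 0),
)


def describe(o: Observation) -> str:
    """Render the value together with its context."""
    return (
        f"{o.variable} = {o.value} {o.units} at {o.site_name} "
        f"({o.latitude:.4f}, {o.longitude:.4f}) on {o.timestamp:%Y-%m-%d %H:%M}"
    )


print(describe(obs))
```

Without the metadata fields, the number 15.9 alone says nothing about what was measured, where, when, or in which units.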
8
8 of 53 Sharing Data
– When you provide data to someone else, what types of information would you want to include with the data?
– When you receive a dataset from an external source, what types of details do you want to know about the data?
9
9 of 53 Sharing Data
Providing data:
– Why were the data created?
– What limitations do the data have?
– What do the data mean?
– How should the data be cited if they are re-used in a new study?
Receiving data:
– What are the data gaps?
– What processes were used for creating the data?
– Are there any fees associated with the data?
– At what scale were the data created?
– What do the values in the tables mean?
– What software do I need in order to read the data?
– What projection are the data in?
– Can I give these data to someone else?
10
10 of 53 Necessary Meta/data Structure The degree of metadata format and structure necessary for different levels of projected secondary data utilization. (adapted from Michener et al., 1997).
11
11 of 53 Information Entropy Example of the normal degradation in information content associated with data and metadata over time (“information entropy”). (Figure taken from Michener, 2006).
12
12 of 53 What if instead? (Figure: information content of data and metadata plotted against time, with these events marked along the time axis: paper using data is published; curated data published in a data repository; data annotated by additional users; data synthesized, leading to another publication.)
13
13 of 53 Metadata to Support Understanding and Using Data
14
14 of 53 Metadata for Data Use
– Research context: hypotheses, site characteristics, experimental design, research methods
– Status of the dataset (e.g., raw? processed?)
– Spatial and temporal domain of the dataset
– Physical structure of the data
15
15 of 53 Scale Issues in Interpretation of Measurements and Modeling Results
The Scale Triplet of Measurements: support, spacing, and extent (adapted from Blöschl, 1996)
Interpretation for Geospatial Data:
– Extent: spatial extent represented by the grid
– Spacing: grid cell size
– Support: average over grid cell? Sample value at grid cell center?
16
16 of 53 Issues in Data Interpretation Adapted from: Blöschl (1996) Spacing too large - aliasing
17
17 of 53 Issues in Data Interpretation Adapted from: Blöschl (1996) Extent too small - trend
18
18 of 53 Issues in Data Interpretation Adapted from: Blöschl (1996) Support too large - smoothing
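The three issues above (spacing too large, extent too small, support too large) can be reproduced with a synthetic signal. The sketch below is a minimal illustration and is not taken from Blöschl (1996): the 24-hour sine wave, the function names, and the one-week record length are all assumptions chosen to make each effect easy to see.

```python
import math

PERIOD_H = 24  # hypothetical diurnal cycle, in hours


def signal(t_hours):
    """Synthetic 'true' process: a sine wave with a 24-hour period."""
    return math.sin(2 * math.pi * t_hours / PERIOD_H)


def sample(spacing_h, extent_h):
    """Point samples (zero support) every spacing_h hours over extent_h hours."""
    count = extent_h // spacing_h + 1
    return [signal(i * spacing_h) for i in range(count)]


def block_average(values, support):
    """Average consecutive, non-overlapping blocks of `support` values."""
    return [
        sum(values[i:i + support]) / support
        for i in range(0, len(values) - support + 1, support)
    ]


# Reference: hourly spacing over a one-week extent resolves the cycle.
hourly_week = sample(spacing_h=1, extent_h=24 * 7)

# Spacing too large: daily samples always hit the same phase (aliasing).
aliased = sample(spacing_h=24, extent_h=24 * 7)

# Extent too small: six hours of record looks like a steady upward trend.
short_record = sample(spacing_h=1, extent_h=6)

# Support too large: 24-hour averages smooth the cycle away.
smoothed = block_average(hourly_week, support=24)

print("range of hourly values:", min(hourly_week), max(hourly_week))
print("daily samples:", aliased)
print("six-hour record:", short_record)
print("daily averages:", smoothed)
```

In all three degraded cases the diurnal cycle is invisible in the stored values, which is why a metadata record needs to state support, spacing, and extent explicitly.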
19
19 of 53 Another Example: Raster Spatial Resolution (slide from David Tarboton)
Decreasing cell size (higher resolution): higher spatial accuracy, slower display, slower processing, larger file size
Increasing cell size (lower resolution): lower spatial accuracy, faster display, faster processing, smaller file size
Example cell sizes: 100 m, 30 m, 10 m, 1 m
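The trade-off between cell size and file size follows from simple arithmetic: halving the cell size roughly quadruples the number of cells. The sketch below counts cells for the four cell sizes on the slide. The 10 km by 10 km study area and the function name `cell_count` are assumptions made for illustration.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular extent."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


# Hypothetical 10 km x 10 km study area; cell sizes are from the slide.
WIDTH_M = HEIGHT_M = 10_000
for size in (100, 30, 10, 1):
    print(f"{size:>3} m cells: {cell_count(WIDTH_M, HEIGHT_M, size):,}")
```

Storage and processing cost scale with the cell count, so a metadata record that states the cell size tells a user both the spatial detail available and the likely size of the dataset.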
20
20 of 53 Data Use Case – NRCS SNOTEL
21
21 of 53 Data Use Case – NRCS SNOTEL “I am trying to obtain a record of hourly precipitation. Your sites have data for precipitation accumulation, but I cannot use that to back calculate precipitation for each hour because there are frequently losses in precipitation. For example, the accumulated precipitation might go from 7.5 to 7.4 to 7.3 and then increase again to 7.5. Can you offer any advice on obtaining the incremental rather than accumulated precipitation?”
22
22 of 53 An Interesting Response from NRCS “The short answer to your question is: don't use it. its all crap. The long answer is: the sensor used to detect precipitation and snow water equivalent is a pressure transducer modified from reading pressures of up to 2000 psi down to reading pressures of 0 to 5 psi and does not have the necessary stability in either accuracy or precision to measure hourly values of 0.1 inches especially within the environment of its setting with varying temperatures, barometric pressure, expansion and contraction of the precip gage itself, frost heave and the list goes on. These data on an hourly incremental basis are not stable. On a daily basis measured from midnight to midnight are sufficiently accurate for our purposes of water supply forecasting and hydrologic modeling - but even then can be plus/minus 0.5 inches at some of the more variable sites. As the time increment increases, the accuracy of the gage increases as well. daily values are decent, weekly more so and monthly pretty close as we edit the variability out. Hourly are crap. Bottom line - don't use hourly pcp data for anything other than amusement or instructional value of knowing/teaching what your data are, how they are collected and what they represent.”
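The problem described in this exchange can be reproduced with the four accumulated values quoted in the inquiry. The sketch below is only an illustration of differencing an accumulated series at two time steps. It is not NRCS's processing procedure, and the function name `increments` is an assumption.

```python
def increments(accumulated):
    """Difference consecutive accumulated values to get per-step amounts."""
    return [round(b - a, 2) for a, b in zip(accumulated, accumulated[1:])]


# Accumulated precipitation values quoted in the inquiry (hourly readings).
hourly_accum = [7.5, 7.4, 7.3, 7.5]

hourly_inc = increments(hourly_accum)
negative_steps = [x for x in hourly_inc if x < 0]

# Differencing only the first and last reading (a coarser time step)
# removes the physically impossible negative amounts in this example.
coarse_inc = increments(hourly_accum[::3])

print("hourly increments:", hourly_inc)
print("negative hourly steps:", len(negative_steps))
print("three-hour increment:", coarse_inc)
```

Negative "precipitation" at the hourly step is sensor noise, not rainfall. Metadata that states the sensor's usable temporal resolution would have warned the user before any analysis was attempted.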
23
23 of 53 Metadata Format and Standards
24
24 of 53 Web Quest…the good, the bad, the ugly
– Work in teams of 2-3
– Seek two examples of scientific metadata, one you find good and one you find not so good, and identify specifically why
– Make a list of the elements/information you find in the metadata
– You have 10 minutes; send me the link to your best example for each category (good and bad) at steve.burian@utah.edu
25
25 of 53 What Does Scientific Metadata Look Like?
26
26 of 53 What is a Metadata Standard?
A structure to describe data with:
– Common terms to allow consistency between records
– Common definitions for easier interpretation
– Common language for ease of communication
– Common structure to quickly locate information
Encoding: structured text or Extensible Markup Language (XML)
In search and retrieval, standards provide:
– Documentation structure in a reliable and predictable format for computer interpretation
– A uniform summary description of the dataset
27
27 of 53 General Metadata Organization
Information for data discovery
– Title, keywords, spatial and temporal domain, abstract
Information for interpretation and appropriate use
– Research objectives, experimental design, sampling procedures, site selection, variables and units, data processing
Information for automated use
– Structural attributes of the data (schema) and format of the data (syntax)
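The three groups on this slide can serve as a checklist for a draft metadata record. The sketch below is one possible arrangement, not a published standard: the group keys, the field names, the `missing_fields` helper, and the draft record are all hypothetical.

```python
# Field names follow the slide; the dictionary layout is an assumption.
REQUIRED = {
    "discovery": [
        "title", "keywords", "spatial_domain", "temporal_domain", "abstract",
    ],
    "interpretation": [
        "research_objectives", "experimental_design", "sampling_procedures",
        "site_selection", "variables_and_units", "data_processing",
    ],
    "automated_use": ["schema", "syntax"],
}


def missing_fields(record):
    """Return {group: [field, ...]} for every group that has empty fields."""
    gaps = {}
    for group, fields in REQUIRED.items():
        present = record.get(group, {})
        absent = [name for name in fields if not present.get(name)]
        if absent:
            gaps[group] = absent
    return gaps


# Hypothetical draft record with placeholder values.
draft = {
    "discovery": {
        "title": "Hypothetical stream temperature dataset",
        "keywords": ["water temperature"],
        "abstract": "Placeholder abstract.",
    },
    "interpretation": {
        "variables_and_units": "Water temperature, degrees Celsius",
    },
    "automated_use": {"schema": "timestamp,value", "syntax": "CSV"},
}

gaps = missing_fields(draft)
for group, names in gaps.items():
    print(f"{group}: missing {', '.join(names)}")
```

A check like this only confirms that fields are filled in. Whether the content is accurate and useful still needs human review.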
28
28 of 53 Examples of Metadata Standards
Dublin Core Element Set
– Emphasis on web resources, publications
– http://dublincore.org/documents/dces/
Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
– Emphasis on geospatial data
– Commonly used by federal agencies
– http://www.fgdc.gov/metadata/geospatial-metadata-standards
International Organization for Standardization (ISO) 19115/19139 Geographic information: Metadata
– Emphasis on geospatial data and services
– http://www.fgdc.gov/metadata/geospatial-metadata-standards#fgdcendorsedisostandards
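To make "structured text or XML" concrete, the sketch below builds a small Dublin Core style record with Python's standard `xml.etree.ElementTree` module. The namespace URI is the Dublin Core elements namespace and the element names come from the Dublin Core Element Set. The `metadata` wrapper element, the function name, and the record values are assumptions: the site name and coordinates are borrowed from the Data vs. Metadata slide, and the creator is a placeholder.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a {element_name: text} mapping as Dublin Core XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; the creator is a placeholder.
xml_text = dublin_core_record({
    "title": "Water temperature, Little Bear River at Mendon Road",
    "creator": "Hypothetical Data Provider",
    "subject": "water temperature",
    "date": "2011-09-30",
    "coverage": "Latitude 43.0000, Longitude -111.0000",
    "format": "text/csv",
})

print(xml_text)

# Any other program can now locate a field by its standard name.
parsed = ET.fromstring(xml_text)
print(parsed.find(f"{{{DC_NS}}}title").text)
```

Because every record uses the same element names, a catalog can index thousands of records without knowing anything about the individual datasets.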
29
29 of 53 Examples of Metadata Standards
Ecological Metadata Language (EML)
– Focus on ecological data
– http://knb.ecoinformatics.org/eml_metadata_guide.html
Water Markup Language (WaterML)
– Emphasis on time series of hydrologic observations
– More of a data encoding language
– https://portal.opengeospatial.org/files/?artifact_id=48531
30
30 of 53 Scientific Metadata Comparison
31
31 of 53 Choosing a Metadata Standard
Many standards collect similar information
Factors to consider:
– Your data type
– Are you working mainly with GIS data (raster, vector, or point data)? Do you work for a federal agency? Consider the FGDC Content Standard for Digital Geospatial Metadata.
– Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling? Consider the ISO 19115-2 standard.
– Are you mainly working with ecological data? Consider Ecological Metadata Language (EML).
32
32 of 53 Choosing a Metadata Standard
More factors to consider:
– Your organization’s policies
– What resources ($$$) are available to create metadata?
– What tools are available?
– Availability of human support
– Instructional materials
– Use of controlled vocabularies
– How many standards or output formats?
33
33 of 53 Tools for Creating Metadata
FGDC CSDGM:
– Mermaid (NOAA) http://www.ncddc.noaa.gov/metadata-standards/mermaid/
– Metavist (Forest Service) http://ncrs.fs.fed.us/pubs/viewpub.asp?key=2737
– TKME (USGS) http://geology.usgs.gov/tools/metadata/tools/doc/tkme.html
– ESRI ArcCatalog
ISO:
– ESRI ArcCatalog
– XML Spy, Oxygen, CatMD
– http://www.fgdc.gov/metadata/iso-metadata-editor-review
EML:
– Morpho (KNB) http://knb.ecoinformatics.org/morphoportal.jsp
34
34 of 53 Metadata Editor Example – ArcCatalog Customizable metadata styles ISO 19139 ISO 19115 FGDC CSDGM INSPIRE
35
35 of 53 Metadata Standards for Non-Data Objects Community Surface Dynamics Modeling System (CSDMS) Metadata for model components Under development Image from Jon Goodall and Mostafa Elag, Phyllis Mbewe, University of South Carolina
36
36 of 53 Concerns About Creating Metadata
Concern: Workload required to capture accurate, robust metadata. Solution: Incorporate metadata creation into the data development process (distribute the effort).
Concern: Time and resources to create, manage, and maintain metadata. Solution: Include in grant budget and schedule.
Concern: Readability / usability of metadata. Solution: Use a standardized metadata format.
Concern: Discipline-specific information and vocabularies. Solution: ‘Profile’ the standard to require specific information and use specific values.
37
37 of 53 The Value of Metadata
38
38 of 53 Data Discovery and Reuse
The descriptive content of the metadata file can be used to identify, assess, and access available data resources
– Identify: keywords, geographic location, time period, attributes
– Assess: use constraints, access constraints, data quality, availability/pricing
– Access: online access, order process, contacts
39
39 of 53 Data Discovery and Reuse
Find data by:
– themes / attributes
– geographic location
– time ranges
– analytical methods used
– sources and contributors
– data quality
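Searching by theme, location, and time range works only because those fields are recorded consistently in the metadata. The sketch below shows a minimal in-memory search. The `Record` class, the `find` function, and every catalog entry (titles, bounding boxes, dates) are hypothetical and do not describe any real portal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    title: str
    keywords: set
    bbox: tuple   # (west, south, east, north) in decimal degrees
    start: date
    end: date


def bbox_overlaps(a, b):
    """True when two (west, south, east, north) boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find(records, keyword=None, bbox=None, start=None, end=None):
    """Return records matching every search criterion that was supplied."""
    hits = []
    for r in records:
        if keyword and keyword.lower() not in {k.lower() for k in r.keywords}:
            continue
        if bbox and not bbox_overlaps(r.bbox, bbox):
            continue
        if start and r.end < start:
            continue
        if end and r.start > end:
            continue
        hits.append(r)
    return hits


# Hypothetical catalog entries.
catalog = [
    Record("Hypothetical stream temperature series",
           {"water temperature", "hydrology"},
           (-112.0, 41.5, -111.5, 42.0), date(2005, 1, 1), date(2011, 12, 31)),
    Record("Hypothetical land cover grid",
           {"land cover"},
           (-125.0, 24.0, -66.0, 50.0), date(2006, 1, 1), date(2006, 12, 31)),
    Record("Hypothetical snowpack observations",
           {"snow water equivalent", "hydrology"},
           (-111.0, 40.0, -110.0, 41.0), date(1990, 1, 1), date(2000, 12, 31)),
]

hits = find(catalog, keyword="hydrology",
            bbox=(-112.5, 41.0, -111.0, 42.5),
            start=date(2002, 1, 1), end=date(2012, 12, 31))
for r in hits:
    print(r.title)
```

The snowpack record matches the keyword and location but is excluded because its time period ends before the requested range begins.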
40
40 of 53 Example of How Metadata is Used http://www.arcgis.com/home/index.html
41
41 of 53 Other Data Portals
Data.gov
– Federal e-gov geospatial data portal
– http://geo.data.gov
DataONE
– Repository for data and metadata
– http://cn.dataone.org
US Geological Survey
– USGS Core Science Metadata Clearinghouse
– http://mercury.ornl.gov/clearinghouse
42
42 of 53 Data Accountability (figure labels: INPUT, RESULTS)
Metadata allows you to repeat a scientific process if:
– methodologies are defined
– variables are defined
– analytical parameters are defined
Metadata allows you to defend your scientific process:
– demonstrate process
– an increasingly GIS/data-savvy public requires metadata for consumer information
43
43 of 53 Data Liability
Metadata can be a declaration of:
– Purpose: the originator’s intended application of the data
– Use Constraints: inappropriate applications of the data
– Completeness: features or geographies excluded from the data
– Distribution liability: explicit liability of the data producer and assumed liability of the consumer
44
44 of 53 Project Coordination
Metadata can be a means to improve communications among project participants using common:
– descriptions & parameters
– keywords, vocabularies, thesauri
– contact information
– attributes
– distribution information
If reviewed regularly by all participants, metadata created early and updated during the project improves the opportunity for coordinating:
– source data
– analytical methods
– new information
45
45 of 53 Value of Metadata to Data Producers Avoid data duplication Share reliable information Publicize efforts – promote the work of a scientist and his/her contributions to a field of study
46
46 of 53 Value of Metadata to Data Users Search, retrieve, and evaluate data set information from both inside and outside an organization Find data: Determine what data exists for a geographic location and/or topic Determine applicability: Decide if a data set meets a particular need Discover how to acquire the dataset you identified Process and use the dataset
47
47 of 53 Value of Metadata to Organizations Metadata helps ensure an organization’s investment in data – Documentation of data processing steps, quality control, definitions, data uses, and restrictions – Ability to use data after initial intended purpose Transcends people and time – Offers data permanence – Creates institutional memory Advertises an organization’s research – Creates possible new partnerships and collaborations through data sharing
48
48 of 53 Summary (1) Metadata is documentation of data A metadata record captures critical information about the content of a dataset – e.g., spatial and temporal support, spacing, extent Metadata allows data to be discovered, accessed, and re-used
49
49 of 53 Summary (2) Metadata standards provide structure and consistency to data documentation Standards and tools vary – Select according to defined criteria such as data type, organizational guidance, and available resources Metadata is of critical importance to data developers, data users, and organizations
50
50 of 53 References
Michener, W.K. (2006). Meta-information concepts for ecological data management, Ecological Informatics, 1(1), 3-7, http://dx.doi.org/10.1016/j.ecoinf.2005.08.004
Michener, W.K., J.W. Brunt, J.J. Helly, T.B. Kirchner, S.G. Stafford (1997). Nongeospatial metadata for the ecological sciences, Ecological Applications, 7(1), 330-342, http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2
Blöschl, G. (1996). Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
Credits: Many ideas and some slides in this presentation were taken from: Henkel, H., V. Hutchison, S. Strasser, S. Rebich Hespanha, K. Vanderbilt, L. Wayne (2012). DataONE education modules, DataONE Project, University of New Mexico, Albuquerque, NM. Available at: http://www.dataone.org/education-modules (last accessed 9-4-2012)
51
51 of 53 Assignment 1. Metadata and the Data Life Cycle Your employer is developing a hydrologic model for the Little Bear River in Cache Valley and wants to model the impact of changes in land cover on hydrology in this watershed between 2002 and 2012. Your boss has asked you whether s/he can use the United States Geological Survey (USGS) National Land Cover Dataset (available for 1992, 2001, and 2006) in the study.
52
52 of 53 National Land Cover Dataset GIS gridded data product Nation-wide coverage Data available for 1992, 2001, 2006 Vegetation/land cover types Used for model inputs and parameterization
53
53 of 53 For your recommendation, consider:
1. What do the data represent?
2. How were the data created, collected, and/or observed?
3. What was the source of the data?
4. What is the format or syntax of the data?
5. What manipulations, transformations, or derivations have been performed to produce the data?
6. What are the spatial and temporal support, spacing, and extent for these datasets?
7. What are appropriate uses for the dataset that you have selected?
8. What are the limitations to the data?
9. Are there differences in the way the data for the different years were produced that make them incompatible?
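As a starting point for the temporal part of question 6, the sketch below compares the dataset years stated in the assignment (1992, 2001, 2006) with the 2002 to 2012 study period. It covers only temporal spacing and extent, and the function names are assumptions. The remaining questions still have to be answered from the dataset's actual metadata.

```python
def snapshots_in_period(snapshot_years, start_year, end_year):
    """Dataset years that fall inside the study period."""
    return [y for y in snapshot_years if start_year <= y <= end_year]


def nearest_snapshot(snapshot_years, year):
    """Dataset year closest to a given study year."""
    return min(snapshot_years, key=lambda y: abs(y - year))


# Years and study period as stated in the assignment.
nlcd_years = [1992, 2001, 2006]
study_start, study_end = 2002, 2012

inside = snapshots_in_period(nlcd_years, study_start, study_end)
start_match = nearest_snapshot(nlcd_years, study_start)
end_match = nearest_snapshot(nlcd_years, study_end)

print("snapshots inside study period:", inside)
print("nearest to start:", start_match, "offset", abs(start_match - study_start))
print("nearest to end:", end_match, "offset", abs(end_match - study_end))
```

The offsets show how far the available snapshots are from the start and end of the study period, which is one input to the recommendation alongside questions 8 and 9.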