1 of 53 Lecture 3 Metadata Steve Burian Hydroinformatics Fall 2013 This work was funded by National Science Foundation Grants EPS 1135482 and EPS 1208732.

2 2 of 53 Objectives – Define science metadata – Identify the types of information included in metadata records for environmental datasets – Determine the dimensionality of a dataset, including the scale triplet of support, spacing, and extent for both space and time – Generate metadata and describe datasets to support data sharing

3 3 of 53 The Data Life Cycle: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze

4 4 of 53 What is Metadata? Metadata is “Information about Data”: content, quality, condition, and other characteristics – WHO created the data? – WHAT is the content of the data? – WHEN were the data created? – WHERE are the data located geographically? – WHY were the data developed? – HOW were the data developed?

5 5 of 53 The Purpose of Metadata Support discovery of scientific data Facilitate acquisition, comprehension, and use of data by HUMANS Enable automated discovery, ingestion, processing and analysis by MACHINES

6 6 of 53 Metadata is All Around – Details about the songs in your MP3 library. – Details about the cereal you ate for breakfast this morning. (CC image by USDAgov on Flickr)

7 7 of 53 Data vs. Metadata – Data: 15.9 – Metadata: Little Bear River at Mendon Road; Latitude = 43.0000; Longitude = -111.0000; Water temperature; Degrees Celsius; 9/30/2011 5:00 PM
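
The data/metadata split on this slide can be sketched in code. The following is a minimal illustration, not part of the lecture: the `Observation` class and the `describe` helper are hypothetical names, and only the field values come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One measured value plus the metadata needed to interpret it."""
    value: float           # the data
    variable: str          # everything below is metadata
    units: str
    site: str
    latitude: float
    longitude: float
    observed_at: datetime


def describe(obs: Observation) -> str:
    """Render the value together with its metadata as one readable line."""
    return (
        f"{obs.variable}: {obs.value} {obs.units} at {obs.site} "
        f"({obs.latitude:.4f}, {obs.longitude:.4f}), "
        f"{obs.observed_at:%Y-%m-%d %H:%M}"
    )


# Values taken from the slide; the structure itself is an assumption.
obs = Observation(
    value=15.9,
    variable="Water temperature",
    units="Degrees Celsius",
    site="Little Bear River at Mendon Road",
    latitude=43.0,
    longitude=-111.0,
    observed_at=datetime(2011, 9, 30, 17, 0),
)
print(describe(obs))
```

Without the metadata fields, the bare number 15.9 could be a temperature, a depth, or a flow, in any units and at any place or time.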

8 8 of 53 Sharing Data – When you provide data to someone else, what types of information would you want to include with the data? – When you receive a dataset from an external source, what types of details do you want to know about the data?

9 9 of 53 Sharing Data
Providing data: – Why were the data created? – What limitations do the data have? – What do the data mean? – How should the data be cited if they are reused in a new study?
Receiving data: – What are the data gaps? – What processes were used for creating the data? – Are there any fees associated with the data? – At what scale were the data created? – What do the values in the tables mean? – What software do I need in order to read the data? – What projection are the data in? – Can I give these data to someone else?

10 10 of 53 Necessary Meta/data Structure The degree of metadata format and structure necessary for different levels of projected secondary data utilization. (adapted from Michener et al., 1997).

11 11 of 53 Information Entropy Example of the normal degradation in information content associated with data and metadata over time (“information entropy”). (Figure taken from Michener, 2006).

12 12 of 53 What if instead? [Figure: information content of data and metadata plotted against time, with events marked along the curve: paper using data is published; curated data published in a data repository; data annotated by additional users; data synthesized and leads to another publication]

13 13 of 53 Metadata to Support Understanding and Using Data

14 14 of 53 Metadata for Data Use Research context – Hypotheses, site characteristics, experimental design, research methods Status of the dataset (e.g., raw? processed?) Spatial and temporal domain of the dataset Physical structure of the data

15 15 of 53 Scale Issues in Interpretation of Measurements and Modeling Results – The Scale Triplet of Measurements (support, spacing, extent). Adapted from: Blöschl (1996) – Interpretation for Geospatial Data: spatial extent represented by grid; grid cell size; average over grid cell, or sample value at grid cell center?

16 16 of 53 Issues in Data Interpretation Adapted from: Blöschl (1996) Spacing too large - aliasing

17 17 of 53 Issues in Data Interpretation Adapted from: Blöschl (1996) Extent too small - trend

18 18 of 53 Issues in Data Interpretation Adapted from: Blöschl (1996) Support too large - smoothing
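
The three interpretation issues on the preceding slides (spacing too large, extent too small, support too large) can be reproduced numerically. The sketch below is an illustration only: it assumes a hypothetical signal, a pure sine wave with a 24-hour period, and all function names are invented for this example.

```python
import math


def diurnal(t_hours: float) -> float:
    """Hypothetical signal: a sine wave with a 24-hour period."""
    return math.sin(2 * math.pi * t_hours / 24.0)


def scale_triplet(times_hours, support_hours):
    """Support, spacing, and extent (hours) of a regularly spaced series."""
    return {
        "support": support_hours,
        "spacing": times_hours[1] - times_hours[0],
        "extent": times_hours[-1] - times_hours[0],
    }


def window_mean(center_hours, support_hours, n=240):
    """Average the signal over a window of the given support (midpoint rule)."""
    start = center_hours - support_hours / 2
    step = support_hours / n
    return sum(diurnal(start + (i + 0.5) * step) for i in range(n)) / n


# Spacing too large: one instantaneous sample every 24 h always lands on the
# same phase of the daily cycle, so the cycle vanishes from the record.
coarse_times = [6 + 24 * k for k in range(5)]
coarse_values = [diurnal(t) for t in coarse_times]

# Extent too small: four hourly samples see only a rising limb,
# which looks like a trend rather than part of a cycle.
short_values = [diurnal(t) for t in range(4)]

# Support too large: averaging over a full 24 h window smooths the cycle away.
smoothed = window_mean(center_hours=12, support_hours=24)

triplet = scale_triplet(coarse_times, support_hours=0)
print(triplet, coarse_values, short_values, smoothed)
```

Recording the triplet in the metadata lets a data user judge whether a feature in the record, or its absence, is real or a product of how the measurements were taken.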

19 19 of 53 Another Example: Raster Spatial Resolution – Decreasing cell size (100 m, 30 m, 10 m, 1 m): higher resolution, higher spatial accuracy, slower display, slower processing, larger file size – Increasing cell size: lower resolution, lower spatial accuracy, faster display, faster processing, smaller file size. Slide from David Tarboton
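
The file-size side of this trade-off follows from simple arithmetic: the number of cells grows with the inverse square of the cell size. The sketch below assumes a hypothetical 3 km by 3 km study area and 4 bytes per cell with no compression; neither figure comes from the slide.

```python
import math


def cell_count(width_m: float, height_m: float, cell_size_m: float) -> int:
    """Number of square cells needed to cover a rectangular area."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


def raw_size_bytes(cells: int, bytes_per_cell: int = 4) -> int:
    """Uncompressed raster size, assuming a fixed number of bytes per cell."""
    return cells * bytes_per_cell


# The four cell sizes shown on the slide, applied to the assumed area.
for size in (100, 30, 10, 1):
    cells = cell_count(3000, 3000, size)
    print(f"{size:>4} m cells: {cells:>9,} cells, {raw_size_bytes(cells):>10,} bytes")
```

Going from 100 m to 1 m cells multiplies the cell count by 10,000, which is why cell size belongs in the metadata record for any gridded product.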

20 20 of 53 Data Use Case – NRCS SNOTEL

21 21 of 53 Data Use Case – NRCS SNOTEL “I am trying to obtain a record of hourly precipitation. Your sites have data for precipitation accumulation, but I cannot use that to back calculate precipitation for each hour because there are frequently losses in precipitation. For example, the accumulated precipitation might go from 7.5 to 7.4 to 7.3 and then increase again to 7.5. Can you offer any advice on obtaining the incremental rather than accumulated precipitation?”
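
The problem described in this question can be reproduced with the four values it quotes. This is a minimal sketch with hypothetical function names; rounding to two decimals is an assumption made only to suppress floating-point noise.

```python
def increments(accumulated):
    """Differences between consecutive accumulated readings."""
    return [round(b - a, 2) for a, b in zip(accumulated, accumulated[1:])]


# Accumulated precipitation values quoted in the question.
accumulated = [7.5, 7.4, 7.3, 7.5]
steps = increments(accumulated)

# Positions where the "incremental precipitation" would be negative.
negative_steps = [i for i, step in enumerate(steps) if step < 0]
print(steps, negative_steps)
```

Two of the three differences are negative, which is physically impossible for precipitation and signals that the accumulated series cannot simply be differenced at this time step.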

22 22 of 53 An Interesting Response from NRCS “The short answer to your question is: don't use it. its all crap. The long answer is: the sensor used to detect precipitation and snow water equivalent is a pressure transducer modified from reading pressures of up to 2000 psi down to reading pressures of 0 to 5 psi and does not have the necessary stability in either accuracy or precision to measure hourly values of 0.1 inches especially within the environment of its setting with varying temperatures, barometric pressure, expansion and contraction of the precip gage itself, frost heave and the list goes on. These data on an hourly incremental basis are not stable. On a daily basis measured from midnight to midnight are sufficiently accurate for our purposes of water supply forecasting and hydrologic modeling - but even then can be plus/minus 0.5 inches at some of the more variable sites. As the time increment increases, the accuracy of the gage increases as well. daily values are decent, weekly more so and monthly pretty close as we edit the variability out. Hourly are crap. Bottom line - don't use hourly pcp data for anything other than amusement or instructional value of knowing/teaching what your data are, how they are collected and what they represent.”

23 23 of 53 Metadata Format and Standards

24 24 of 53 Web Quest…the good, the bad, the ugly – Work in teams of 2-3 – Seek two examples of scientific metadata, one you find good and one you find not so good, and identify specifically why – Make a list of the elements/information you find in the metadata – You have 10 minutes; send the link to your best example for each category, good and bad, to steve.burian@utah.edu

25 25 of 53 What Does Scientific Metadata Look Like?

26 26 of 53 What is a Metadata Standard?
A structure to describe data with: – Common terms to allow consistency between records – Common definitions for easier interpretation – Common language for ease of communication – Common structure to quickly locate information
Encoding: structured text or Extensible Markup Language (XML)
In search and retrieval, standards provide: – Documentation structure in a reliable and predictable format for computer interpretation – A uniform summary description of the dataset

27 27 of 53 General Metadata Organization Information for data discovery – Title, keywords, spatial and temporal domain, abstract Information for interpretation and appropriate use – Research objectives, experimental design, sampling procedures, site selection, variables and units, data processing Information for automated use – Structural attributes of the data (schema) and format of the data (syntax)

28 28 of 53 Examples of Metadata Standards
Dublin Core Element Set – Emphasis on web resources, publications – http://dublincore.org/documents/dces/
Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) – Emphasis on geospatial data – Commonly used by federal agencies – http://www.fgdc.gov/metadata/geospatial-metadata-standards
International Organization for Standardization (ISO) 19115/19139 Geographic information: Metadata – Emphasis on geospatial data and services – http://www.fgdc.gov/metadata/geospatial-metadata-standards#fgdcendorsedisostandards
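
To give a sense of what a simple standardized record looks like, the sketch below writes a few Dublin Core elements as XML using Python's standard library. It is an illustration under stated assumptions: the wrapper element name `metadata`, the choice of elements, the `format` value, and the `dublin_core_record` helper are all hypothetical, and the field values reuse the Little Bear River example from earlier in the lecture.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per field."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


record = dublin_core_record({
    "title": "Water temperature, Little Bear River at Mendon Road",
    "subject": "Water temperature",
    "coverage": "Latitude 43.0000, Longitude -111.0000",
    "date": "2011-09-30",
    "format": "text/csv",  # hypothetical file format
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every record uses the same element names, a catalog can index `dc:title` or `dc:coverage` without knowing anything else about the dataset.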

29 29 of 53 Examples of Metadata Standards
Ecological Metadata Language (EML) – Focus on ecological data – http://knb.ecoinformatics.org/eml_metadata_guide.html
Water Markup Language (WaterML) – Emphasis on time series of hydrologic observations – More of a data encoding language – https://portal.opengeospatial.org/files/?artifact_id=48531

30 30 of 53 Scientific Metadata Comparison

31 31 of 53 Choosing a Metadata Standard
Many standards collect similar information. Factors to consider:
– Your data type: Are you working mainly with GIS data (raster, vector, or point data)? Do you work for a federal agency? If so, consider the FGDC Content Standard for Digital Geospatial Metadata.
– Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling? If so, consider using the ISO 19115-2 standard.
– Are you mainly working with ecological data? If so, consider Ecological Metadata Language (EML).

32 32 of 53 More Factors to consider: – Your organization’s policies – What resources ($$$) are available to create metadata? – What tools are available? – Availability of human support – Instructional materials – Use of controlled vocabularies – How many standards or output formats? Choosing a Metadata Standard

33 33 of 53 Tools for Creating Metadata
FGDC CSDGM: – Mermaid (NOAA) http://www.ncddc.noaa.gov/metadata-standards/mermaid/ – Metavist (Forest Service) http://ncrs.fs.fed.us/pubs/viewpub.asp?key=2737 – TKME (USGS) http://geology.usgs.gov/tools/metadata/tools/doc/tkme.html – ESRI ArcCatalog
ISO: – ESRI ArcCatalog – XML Spy, Oxygen, CatMD – http://www.fgdc.gov/metadata/iso-metadata-editor-review
EML: – Morpho (KNB) http://knb.ecoinformatics.org/morphoportal.jsp

34 34 of 53 Metadata Editor Example – ArcCatalog. Customizable metadata styles: ISO 19139, ISO 19115, FGDC CSDGM, INSPIRE

35 35 of 53 Metadata Standards for Non-Data Objects – Community Surface Dynamics Modeling System (CSDMS) – Metadata for model components – Under development. Image from Jon Goodall, Mostafa Elag, and Phyllis Mbewe, University of South Carolina

36 36 of 53 Concerns About Creating Metadata
Concern: Workload required to capture accurate, robust metadata. Solution: Incorporate metadata creation into data development process – distribute the effort
Concern: Time and resources to create, manage, and maintain metadata. Solution: Include in grant budget and schedule
Concern: Readability / usability of metadata. Solution: Use a standardized metadata format
Concern: Discipline specific information and vocabularies. Solution: ‘Profile’ standard to require specific information and use specific values

37 37 of 53 The Value of Metadata

38 38 of 53 Data Discovery and Reuse – The descriptive content of the metadata file can be used to identify, assess, and access available data resources: online access, order process, contacts, use constraints, access constraints, data quality, availability/pricing, keywords, geographic location, time period, attributes

39 39 of 53 Find data by: – themes / attributes – geographic location – time ranges – analytical methods used – sources and contributors – data quality Data Discovery and Reuse
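
Searching by theme, location, and time range amounts to filtering metadata records, never the data themselves. The sketch below is a minimal, hypothetical illustration: the `Record` fields, the `find` function, and both sample records (including their dates and the second record's coordinates) are invented for this example and do not describe any real catalog.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """A small subset of the fields a metadata record might carry."""
    title: str
    keywords: set
    lat: float
    lon: float
    start: date
    end: date


def find(records, keyword=None, bbox=None, period=None):
    """Return records matching every criterion that was supplied.

    bbox is (west, south, east, north); period is (first_day, last_day).
    """
    hits = []
    for rec in records:
        if keyword is not None and keyword not in rec.keywords:
            continue
        if bbox is not None:
            west, south, east, north = bbox
            if not (west <= rec.lon <= east and south <= rec.lat <= north):
                continue
        if period is not None:
            first_day, last_day = period
            # Keep the record only if its time range overlaps the query range.
            if rec.end < first_day or rec.start > last_day:
                continue
        hits.append(rec)
    return hits


# Hypothetical catalog entries.
catalog = [
    Record("Water temperature, Little Bear River", {"water temperature"},
           43.0, -111.0, date(2011, 1, 1), date(2011, 12, 31)),
    Record("Land cover grid", {"land cover"},
           41.6, -111.9, date(2006, 1, 1), date(2006, 12, 31)),
]

by_theme = find(catalog, keyword="land cover")
by_place = find(catalog, bbox=(-112.0, 42.0, -110.0, 44.0))
by_time = find(catalog, period=(date(2011, 9, 1), date(2011, 9, 30)))
print([r.title for r in by_theme + by_place + by_time])
```

A dataset whose metadata omit keywords, coordinates, or dates cannot be returned by any of these queries, however good the data are.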

40 40 of 53 Example of How Metadata is Used http://www.arcgis.com/home/index.html

41 41 of 53 Other Data Portals
Data.gov – Federal e-gov geospatial data portal – http://geo.data.gov
DataONE – Repository for data and metadata – http://cn.dataone.org
US Geological Survey – USGS Core Science Metadata Clearinghouse – http://mercury.ornl.gov/clearinghouse

42 42 of 53 Data Accountability
Metadata allows you to repeat a scientific process if: – methodologies are defined – variables are defined – analytical parameters are defined
Metadata allows you to defend your scientific process: – demonstrate process – an increasingly GIS/data-savvy public requires metadata for consumer information
(Diagram labels: INPUT, RESULTS)

43 43 of 53 Metadata can be a declaration of: – Purpose: the originator’s intended application of the data – Use Constraints: inappropriate applications of the data – Completeness: features or geographies excluded from the data – Distribution liability: explicit liability of the data producer and assumed liability of the consumer Data Liability

44 44 of 53 Metadata can be a means to improve communications among project participants using common: – descriptions & parameters – keywords, vocabularies, thesauri – contact information – attributes – distribution information If reviewed regularly by all participants, metadata created early and updated during the project improves opportunity for coordinating: – source data – analytical methods – new information Project Coordination

45 45 of 53 Value of Metadata to Data Producers Avoid data duplication Share reliable information Publicize efforts – promote the work of a scientist and his/her contributions to a field of study

46 46 of 53 Value of Metadata to Data Users Search, retrieve, and evaluate data set information from both inside and outside an organization Find data: Determine what data exists for a geographic location and/or topic Determine applicability: Decide if a data set meets a particular need Discover how to acquire the dataset you identified Process and use the dataset

47 47 of 53 Value of Metadata to Organizations Metadata helps ensure an organization’s investment in data – Documentation of data processing steps, quality control, definitions, data uses, and restrictions – Ability to use data after initial intended purpose Transcends people and time – Offers data permanence – Creates institutional memory Advertises an organization’s research – Creates possible new partnerships and collaborations through data sharing

48 48 of 53 Summary (1) Metadata is documentation of data A metadata record captures critical information about the content of a dataset – e.g., spatial and temporal support, spacing, extent Metadata allows data to be discovered, accessed, and re-used

49 49 of 53 Summary (2) Metadata standards provide structure and consistency to data documentation Standards and tools vary – Select according to defined criteria such as data type, organizational guidance, and available resources Metadata is of critical importance to data developers, data users, and organizations

50 50 of 53 References
Michener, W.K. (2006). Meta-information concepts for ecological data management, Ecological Informatics, 1(1), 3-7, http://dx.doi.org/10.1016/j.ecoinf.2005.08.004
Michener, W.K., J.W. Brunt, J.J. Helly, T.B. Kirchner, S.G. Stafford (1997). Nongeospatial metadata for the ecological sciences, Ecological Applications, 7(1), 330-342, http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2
Blöschl, G. (1996). Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
Credits: Many ideas and some slides in this presentation were taken from: Henkel, H., V. Hutchison, S. Strasser, S. Rebich Hespanha, K. Vanderbilt, L. Wayne (2012). DataONE education modules, DataONE Project, University of New Mexico, Albuquerque, NM. Available at: http://www.dataone.org/education-modules (last accessed 9-4-2012)

51 51 of 53 Assignment 1. Metadata and the Data Life Cycle Your employer is developing a hydrologic model for the Little Bear River in Cache Valley and wants to model the impact of changes in land cover on hydrology in this watershed between 2002 and 2012. Your boss has asked you whether s/he can use the United States Geological Survey (USGS) National Land Cover Dataset (available for 1992, 2001, and 2006) in the study.

52 52 of 53 National Land Cover Dataset – GIS gridded data product – Nation-wide coverage – Data available for 1992, 2001, 2006 – Vegetation/land cover types – Used for model inputs and parameterization

53 53 of 53 For your recommendation, consider:
1. What do the data represent?
2. How were the data created, collected, and/or observed?
3. What was the source of the data?
4. What is the format or syntax of the data?
5. What manipulations, transformations, or derivations have been performed to produce the data?
6. What are the spatial and temporal support, spacing, and extent for these datasets?
7. What are appropriate uses for the dataset that you have selected?
8. What are the limitations to the data?
9. Are there differences in the way the data for the different years were produced that make them incompatible?

