Lecture 4 Data Management & Metadata


1 Lecture 4 Data Management & Metadata
Steve Burian, Hydroinformatics, Fall 2014
This work was funded by National Science Foundation Grants EPS and EPS

2 Objectives
Describe the data life cycle and data management
Develop data management techniques that improve organization, facilitate analysis, improve reproducibility, and improve capacity for data re-use
Identify the types of information included in metadata records for environmental datasets
Determine the dimensionality of a dataset, including the scale triplet of support, spacing, and extent for both space and time

3 Quiz You have 5 minutes to show us what you know/learned.

4 The Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze

5 Activity 1 Work in teams of 2 or 3 to share ideas, but you are required to submit your own DMP with Assignment 1. Send me your draft plan in 20 minutes.

6 The Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze

7 What is Metadata?
Metadata is “information about data”
WHO created the data?
WHAT is the content of the data?
WHEN were the data created?
WHERE is it geographically?
WHY were the data developed?
HOW were the data developed?
The prefix “meta” is Greek for with, about, between, or among; it is typically used to mean “one level of description higher”
Metadata describe the content, quality, condition, and other characteristics of the data

8 The Purpose of Metadata
Support discovery of scientific data
Facilitate acquisition, comprehension, and use of data by HUMANS
Enable automated discovery, ingestion, processing, and analysis by MACHINES
There is a saying: “order is for the feeble minded only, while the genius masters the chaos.” That is not operationally practical.

9 Data vs. Metadata
Data: 15.9

10 Data vs. Metadata
Data: 15.9
Metadata:
  Site: Little Bear River at Mendon Road
  Latitude =
  Longitude =
  Variable: Water temperature
  Units: Degrees Celsius
  Date and time: 9/30/2011 5:00 PM
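
The contrast on this slide can be sketched in Python: the bare value 15.9 only becomes interpretable once the descriptive fields travel with it. This is a minimal illustration, not part of the lecture; the class and field names are assumptions, and the coordinates are left out because the transcript does not preserve them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One data value bundled with the metadata needed to interpret it."""
    value: float          # the data itself
    site: str             # WHERE the value was observed
    variable: str         # WHAT was measured
    units: str            # how to read the number
    timestamp: datetime   # WHEN it was observed

    def describe(self) -> str:
        when = self.timestamp.strftime("%Y-%m-%d %H:%M")
        return f"{self.variable} at {self.site}: {self.value} {self.units} ({when})"


# Values taken from the slide; the field names are illustrative assumptions.
obs = Observation(
    value=15.9,
    site="Little Bear River at Mendon Road",
    variable="Water temperature",
    units="degrees Celsius",
    timestamp=datetime(2011, 9, 30, 17, 0),
)
print(obs.describe())
```

Without the last four fields, a consumer of the record would have no way to tell whether 15.9 is a temperature, a depth, or a coordinate.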

11 Sharing Data
Providing data:
  Why were the data created?
  What limitations do the data have?
  What do the data mean?
  How should the data be cited if they are re-used in a new study?
Receiving data:
  What are the data gaps?
  What processes were used for creating the data?
  Are there any fees associated with the data?
  In what scale were the data created?
  What do the values in the tables mean?
  What software do I need in order to read the data?
  What projection are the data in?
  Can I give these data to someone else?

12 Necessary Meta/data Structure
Figure: The degree of metadata format and structure necessary for different levels of projected secondary data utilization (adapted from Michener et al., 1997).

13 Metadata to Support Understanding and Using Data

14 Metadata for Data Use
Research context: hypotheses, site characteristics, experimental design, research methods
Status of the dataset (e.g., raw or processed)
Spatial and temporal domain of the dataset
Physical structure of the data

15 Scale Issues in Interpretation of Measurements and Modeling Results
The scale triplet of measurements: support, spacing, and extent (adapted from Blöschl, 1996)
Interpretation for geospatial data:
  Extent: the spatial extent represented by the grid
  Support: is each value an average over the grid cell, or a sample value at the grid cell center?
  Spacing: the grid cell size
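
The same triplet applies in time. The sketch below derives spacing and extent from a regularly spaced series of timestamps; support has to be supplied from the metadata because it cannot be recovered from timestamps alone. The function name, the `ScaleTriplet` class, and the 30-minute series are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScaleTriplet:
    support: timedelta  # time each value integrates over (zero = instantaneous)
    spacing: timedelta  # time between consecutive values
    extent: timedelta   # total span from first to last value


def temporal_triplet(timestamps, support):
    """Derive spacing and extent from sorted, regularly spaced timestamps.

    Support is passed in because it is a property of the measurement method
    (e.g. instantaneous reading vs. 30-minute average), not of the timestamps.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    spacing = timestamps[1] - timestamps[0]
    extent = timestamps[-1] - timestamps[0]
    return ScaleTriplet(support=support, spacing=spacing, extent=extent)


# Hypothetical series: instantaneous readings every 30 minutes for one day.
start = datetime(2011, 9, 30, 0, 0)
times = [start + timedelta(minutes=30 * i) for i in range(49)]
triplet = temporal_triplet(times, support=timedelta(0))
print(triplet)
```

Recording all three values in the metadata tells a later user what processes the series can and cannot resolve.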

16 Issues in Data Interpretation
Spacing too large - aliasing Adapted from: Blöschl (1996)

17 Issues in Data Interpretation
Extent too small - trend Adapted from: Blöschl (1996)

18 Issues in Data Interpretation
Support too large - smoothing Adapted from: Blöschl (1996)
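
The three issues on the preceding slides can be reproduced with a synthetic signal. The sketch below assumes a hypothetical sine wave with a 24-step period (for example, an hourly diurnal cycle) and shows how each choice of spacing, extent, or support distorts what the data appear to say. It is an illustration only, not taken from Blöschl (1996).

```python
import math

PERIOD = 24  # hypothetical cycle length, e.g. hours in a diurnal cycle
signal = [math.sin(2 * math.pi * t / PERIOD) for t in range(PERIOD * 10)]


def subsample(values, spacing):
    """Keep every `spacing`-th value, i.e. increase the spacing."""
    return values[::spacing]


def block_average(values, support):
    """Average over non-overlapping windows of length `support`."""
    return [
        sum(values[i:i + support]) / support
        for i in range(0, len(values) - support + 1, support)
    ]


def amplitude(values):
    """Half the range of the values."""
    return (max(values) - min(values)) / 2


# Spacing too large: sampling once per cycle always hits the same phase,
# so the cycle vanishes from the record (aliasing).
aliased = subsample(signal, spacing=PERIOD)

# Extent too small: the first few steps of a cycle look like a rising trend.
window = signal[:4]

# Support too large: averaging over a full cycle removes the variation (smoothing).
smoothed = block_average(signal, support=PERIOD)

print(amplitude(signal), amplitude(aliased), amplitude(smoothed))
```

In each case the values themselves are valid; only the metadata on support, spacing, and extent reveals that the cycle is missing from the record.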

19 Metadata Format and Standards

20 What Does Scientific Metadata Look Like?

21 What is a Metadata Standard?
A structure to describe data with:
  Common terms to allow consistency between records
  Common definitions for easier interpretation
  Common language for ease of communication
  Common structure to quickly locate information
  Encoding: structured text or Extensible Markup Language (XML)
In search and retrieval, standards provide:
  Documentation structure in a reliable and predictable format for computer interpretation
  A uniform summary description of the dataset

22 General Metadata Organization
Information for data discovery
  Title, keywords, spatial and temporal domain, abstract
Information for interpretation and appropriate use
  Research objectives, experimental design, sampling procedures, site selection, variables and units, data processing
Information for automated use
  Structural attributes of the data (schema) and format of the data (syntax)
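
This three-part organization can be sketched as a small XML record built with Python's standard library. The element names, the `text/csv` format, and the column list are hypothetical and do not follow any particular standard; the descriptive values come from the earlier water temperature example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record grouped by the three purposes listed above.
record = {
    "discovery": {
        "title": "Water temperature, Little Bear River at Mendon Road",
        "keywords": "water temperature; Little Bear River",
    },
    "interpretation": {
        "variable": "Water temperature",
        "units": "degrees Celsius",
    },
    "automated_use": {
        "format": "text/csv",          # syntax of the data file (assumed)
        "columns": "timestamp,value",  # schema of the data file (assumed)
    },
}


def to_xml(record):
    """Encode the nested dictionary as one XML element per section and field."""
    root = ET.Element("metadata")
    for section, fields in record.items():
        section_el = ET.SubElement(root, section)
        for name, text in fields.items():
            ET.SubElement(section_el, name).text = text
    return root


metadata_root = to_xml(record)
print(ET.tostring(metadata_root, encoding="unicode"))
```

Because the structure is predictable, a program can pull out a single field, such as `metadata_root.find("interpretation/units").text`, without a human reading the record.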

23 Examples of Metadata Standards
Dublin Core Element Set
  Emphasis on web resources, publications
Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
  Emphasis on geospatial data
  Commonly used by federal agencies
International Organization for Standardization (ISO) 19115/ Geographic information: Metadata
  Emphasis on geospatial data and services
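
As an example of what a standard fixes, the sketch below writes a few Dublin Core elements for the water temperature observation used earlier. The namespace URI and the element names (`title`, `subject`, `coverage`, `date`) belong to the Dublin Core element set; the `metadata` wrapper element and the choice of which fields to fill are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat record with one Dublin Core element per field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, text in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = text
    return root


dc_root = dublin_core_record({
    "title": "Water temperature, Little Bear River at Mendon Road",
    "subject": "Water temperature",
    "coverage": "Little Bear River at Mendon Road",
    "date": "2011-09-30",
})
dc_xml = ET.tostring(dc_root, encoding="unicode")
print(dc_xml)
```

Dublin Core has no dedicated element for units or measurement support, which is one reason geospatial and hydrologic datasets usually need one of the more specialized standards listed here.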

24 Examples of Metadata Standards
Ecological Metadata Language (EML)
  Focus on ecological data
Water Markup Language (WaterML)
  Emphasis on time series of hydrologic observations
  More of a data encoding language
There are many standards available to document data. Each has a different focus, yet they all ask for similar information about the data set.

25 The Value of Metadata

26 Data Discovery and Reuse
The descriptive content of the metadata file can be used to identify, assess, and access available data resources
IDENTIFY: keywords, geographic location, time period, attributes
ASSESS: use constraints, access constraints, data quality, availability/pricing
ACCESS: online access, order process, contacts
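
The identify step can be sketched as a filter over metadata records. The catalog below is hypothetical: the titles echo datasets mentioned in this lecture, but the keywords, date ranges, and use constraints are placeholder assumptions rather than values from any real metadata record.

```python
from datetime import date

# Hypothetical catalog; every field value here is a placeholder assumption.
catalog = [
    {
        "title": "Water temperature, Little Bear River at Mendon Road",
        "keywords": {"water temperature", "little bear river"},
        "start": date(2011, 9, 30),
        "end": date(2011, 9, 30),
        "use_constraints": "not recorded",
    },
    {
        "title": "National Land Cover Dataset 2006",
        "keywords": {"land cover", "vegetation"},
        "start": date(2006, 1, 1),
        "end": date(2006, 12, 31),
        "use_constraints": "not recorded",
    },
]


def identify(catalog, keyword, start, end):
    """Return records tagged with `keyword` whose time period overlaps [start, end]."""
    keyword = keyword.lower()
    return [
        record for record in catalog
        if keyword in record["keywords"]
        and record["start"] <= end
        and record["end"] >= start
    ]


hits = identify(catalog, "Land cover", date(2002, 1, 1), date(2010, 12, 31))
for record in hits:
    # ASSESS: inspect constraints before deciding whether to ACCESS the data.
    print(record["title"], "| use constraints:", record["use_constraints"])
```

The search only works because every record carries the same fields, which is the consistency a metadata standard is meant to guarantee.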

27 Data Accountability
Metadata allows you to repeat a scientific process if:
  methodologies are defined
  variables are defined
  analytical parameters are defined
Metadata allows you to defend your scientific process:
  demonstrate the process (input to results)
  an increasingly GIS/data-savvy public requires metadata for consumer information

28 Project Coordination
Metadata can be a means to improve communications among project participants using common:
  descriptions & parameters
  keywords, vocabularies, thesauri
  contact information
  attributes
  distribution information
If reviewed regularly by all participants, metadata created early and updated during the project improves the opportunity for coordinating:
  source data
  analytical methods
  new information

29 Value of Metadata to Data Producers
Avoid data duplication
Share reliable information
Publicize efforts: promote the work of a scientist and his/her contributions to a field of study

30 Value of Metadata to Data Users
Search, retrieve, and evaluate data set information from both inside and outside an organization
Find data: determine what data exist for a geographic location and/or topic
Determine applicability: decide if a data set meets a particular need
Discover how to acquire the dataset you identified
Process and use the dataset

31 Value of Metadata to Organizations
Metadata helps protect an organization’s investment in data
  Documentation of data processing steps, quality control, definitions, data uses, and restrictions
  Ability to use data after the initial intended purpose
Transcends people and time
  Offers data permanence
  Creates institutional memory
Advertises an organization’s research
  Creates possible new partnerships and collaborations through data sharing

32 Summary (1)
Metadata is documentation of data
A metadata record captures critical information about the content of a dataset, e.g., spatial and temporal support, spacing, and extent
Metadata allows data to be discovered, accessed, and re-used

33 Summary (2)
Metadata standards provide structure and consistency to data documentation
Standards and tools vary; select according to defined criteria such as data type, organizational guidance, and available resources
Metadata is of critical importance to data developers, data users, and organizations

34 References
Michener, W.K. (2006). Meta-information concepts for ecological data management. Ecological Informatics, 1(1), 3-7.
Michener, W.K., J.W. Brunt, J.J. Helly, T.B. Kirchner, S.G. Stafford (1997). Nongeospatial metadata for the ecological sciences. Ecological Applications, 7(1).
Blöschl, G. (1996). Scale and Scaling in Hydrology. Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
Credits: Many ideas and some slides in this presentation were taken from: Henkel, H., V. Hutchison, S. Strasser, S. Rebich Hespanha, K. Vanderbilt, L. Wayne (2012). DataONE education modules, DataONE Project, University of New Mexico, Albuquerque, NM. Available at: (last accessed )

35 Assignment 1. Metadata and the Data Life Cycle
Your employer is developing a hydrologic model for the Little Bear River in Cache Valley and wants to model the impact of changes in land cover on hydrology in this watershed between 2002 and Your boss has asked you whether s/he can use the United States Geological Survey (USGS) National Land Cover Dataset (available for 1992, 2001, and 2006) in the study.

36 National Land Cover Dataset
GIS gridded data product
Nation-wide coverage
Data available for 1992, 2001, 2006
Vegetation/land cover types
Used for model inputs and parameterization

37 For your recommendation, consider:
What do the data represent?
How were the data created, collected, and/or observed?
What was the source of the data?
What is the format or syntax of the data?
What manipulations, transformations, or derivations have been performed to produce the data?
What are the spatial and temporal support, spacing, and extent for these datasets?
What are appropriate uses for the dataset that you have selected?
What are the limitations of the data?
Are there differences in the way the data for the different years were produced that make them incompatible?

