FGDC Metadata and the Biological Data Profile Anchorage, AK January 25, 2006 Terry Giles USGS-FORT (970)
Metadata Workshop Topics Metadata: what it is, why you need it, and how to write good metadata. U.S. Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) and the Biological Data Profile. Implementation: decisions, challenges, and resources. Tools and resources for metadata creation and management.
Trainer’s Goals Everyone learn / meet your goals for the class Experience that metadata isn’t that scary Have fun!
Introduction: What are Metadata? Definitions Examples Types of information included
Introduction: What Are Metadata? In your own words – what does “metadata” mean to you? Metadata are literally “data about data” - they describe the content, quality, condition, and other characteristics of the data.
Metadata in the Real World What are some everyday examples of “metadata”? Examples: food product labels, library records, information on a video or DVD, published maps, etc., etc., etc.
Working With Data When you provide data to someone else, what types of information would you want to include with the data? When you receive a dataset from an external source, what types of information do you want to know about the data?
Metadata describes the Who, What, Where, Why, and How Who created and maintains the data? Why were the data created? What is the content and structure of the data? When collected? When published? Where is the geographic location? Storage location? How were the data produced?
Mining Existing Resources Metadata is not a new or alien concept. We all have a strong history of documenting methodology, describing appropriate uses of the data, and writing summaries about data completeness and currentness. What existing documentation (i.e. metadata) materials do you have in your program / office?
Examples of existing “data about data” materials: Methodology documentation Database help records BASIS+ project/task entries Data sharing and licensing agreements Project agreement, documentation and reports Data requests – data use guidelines
The Key Point: All of us have personal experience with creating metadata.
The Value of Metadata Data developers Data users Organizations
Value to Data Developers? Avoid duplication Share reliable information Publicize efforts Reduce workload Documenting data is critical to preserving its usefulness over time; without proper documentation, no data set is complete
Value to Data Users? Search, retrieve, and evaluate data set information both inside and outside organizations Finding data - determine which data exist for a geographic location and/or topic Applicability - determine if a dataset meets your needs Access and transfer - acquire the dataset you identified, process and use the dataset
Value to Organizations? Organizes and maintains an organization’s investment in data Documentation of data processing steps, quality control, definitions, data uses and restrictions, etc. Transcends people and time; offers data permanence and creates institutional memory Saves time, money, frustration
Value to Organizations? “Advertising”: Provide information about datasets to data catalogs and clearinghouses External data sharing and data transfer: Provide information that is critical for others to understand and correctly use your data Helps share data with other agencies, which can lead to potential partnerships
Value to All: Data developers Data users Organizations Metadata helps…
What’s new about metadata (i.e. why are we here today)? Creating and managing metadata in a standardized format using a common set of terms.
Why Have a Metadata Standard?
Why Have a Standard? Helps you determine: If a set of data is available and fit for your use How to access and transfer the data set
Why Have a Standard? Helps to create: Common terms Common definitions Common language Common structure
Why Have a Standard? Establishes names of metadata elements and compound elements Defines information about values provided for metadata elements The standard serves as a uniform summary description of the data set Online systems rely on documentation being predictable in form and content
The Key to Using the Standard… If you’re creating metadata for the first time, it may seem complex - stick with it Don’t create your own version of the standard - you’ll only confuse people Find the fields that are pertinent to your data and your organization’s needs Build a template; use the template Ask questions!
Establishment of U.S. Metadata Standards
Executive Order (1994) Defines the responsibilities of the Federal Geographic Data Committee (FGDC) Requires that metadata be available to the public Requires creation of metadata for data sets from 1995 forward
FGDC’s Responsibilities Federal Geographic Data Committee (FGDC) is responsible for coordinating: development of National Spatial Data Infrastructure establishment of National Geospatial Data Clearinghouse development of standards cooperative efforts with State, Local, and tribal governments, and private sector implementation of digital geospatial data framework
Executive Order Federal Agencies responsible for: standardized documentation of all new data collected or produced beginning in January 1995 plans to document data previously collected or produced (legacy data) to the “extent practicable” making metadata and data available to the public utilize Clearinghouse to determine if data has already been collected or cooperative efforts are possible
FGDC Profiles and Extensions Extension: extended elements to the standard are elements outside the standard but needed by the data set producer Profile: document that describes the application of the Standard to a specific community Examples: Biological Data Profile, Shoreline Data Profile, Remote Sensing Extensions
Biological Data Profile: Defines Additional Elements Taxonomy Methodology Analytical tools
Biological Data Profile: Documents three types of data sets Explicitly biological Biological and geospatial Explicitly geospatial
National Research Council recommended in 1994 the establishment of a National Biotic Resource Information System to coordinate distributed databases and disseminate new data and information ~ NBII NBII established a federation of biological information sources and tools to help users find biological information and to combine information from various sources. NBII has a biological information focus, on both geospatial and non-geospatial data.
International Organization for Standardization (ISO) Metadata Standard ISO has been approved - an abstract standard that specifies general content for the metadata, but does not specify the format. ISO is under development - XML implementation schema specifying the metadata record format. The FGDC is developing metadata content for the U.S. National Profile of ISO
International Organization for Standardization (ISO) Metadata Standard ISO is not yet the official U.S. metadata standard (important if you need to provide FGDC-compliant metadata!). Software tools are under development. Metadata created before the release of the ISO standard will not need to be altered. Updates and more information:
Other Metadata Standards Ecological Metadata Language (EML) Used for the Long-term Studies Section (LTSS) publicly accessible registry describing scientific data sets on ecology and the environment. Darwin Core Used for the Global Biodiversity Information Facility (GBIF) portal of collection and observation data.
The FGDC CSDGM Standard and the Biological Data Profile What the CSDGM Standard and the Biological Data Profile are Details about the Sections and Terms of the Standard
Biological Data Profile Workbook
FGDC metadata standard: overview Seven Major Metadata Sections: Section 1 - Identification Information* Section 2 - Data Quality Information Section 3 - Spatial Data Organization Information Section 4 - Spatial Reference Information Section 5 - Entity and Attribute Information Section 6 - Distribution Information Section 7 - Metadata Information* Three Supporting Sections: Section 8 - Citation Information* Section 9 - Time Period Information* Section 10 - Contact Information* * Minimum required metadata
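The starred sections can be checked mechanically. A minimal sketch, assuming a record is held as a plain Python dictionary keyed by section number; that layout, the sample `draft`, and the function name `missing_required_sections` are illustrative and not part of the standard:

```python
# Section numbers and names, following the section-by-section slides.
SECTIONS = {
    1: "Identification Information",
    2: "Data Quality Information",
    3: "Spatial Data Organization Information",
    4: "Spatial Reference Information",
    5: "Entity and Attribute Information",
    6: "Distribution Information",
    7: "Metadata Information",
    8: "Citation Information",
    9: "Time Period Information",
    10: "Contact Information",
}

# Sections starred on the overview slide as minimum required metadata.
REQUIRED = (1, 7, 8, 9, 10)


def missing_required_sections(record):
    """Return the names of starred sections that are absent or empty.

    `record` is a hypothetical dict mapping section number -> content.
    """
    return [SECTIONS[n] for n in REQUIRED if not record.get(n)]


# Hypothetical draft with three of the five starred sections filled in.
draft = {
    1: {"title": "Example data set"},
    7: {"metadata_date": "20060125"},
    8: {"originator": "Example Agency"},
}
print(missing_required_sections(draft))
```

A check like this only confirms that something was entered; it says nothing about whether the content is good metadata.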
FGDC Metadata Standard: All the Details FGDC Metadata Standards: Content Standard for Digital Geospatial Metadata (CSDGM) (version 2.0), FGDC-STD Content Standard for Digital Geospatial Metadata, Part 1: Biological Data Profile, FGDC-STD (Note: The FGDC biological data profile is sometimes also referred to as the “NBII extension”)
FGDC Metadata Workbook & Graphic Representations Primary FGDC digital geospatial metadata standard FGDC metadata including the Biological Data Profile workbook.doc
FGDC Metadata Element Example: 1.2.1 Abstract - a brief narrative summary of the data set. Type: Text Domain: free text. Parts of the entry: Element number (1.2.1); Data Element name (Abstract); Definition; Type (choice of integer, real, text, date); Domain (valid values that can be assigned, or “free text”, “free date”, or “free time”).
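The parts of an element entry (number, name, definition, type, domain) map naturally onto a small record type. A minimal sketch in Python; the class `DataElement`, the field `value_type`, and the idea of modelling a domain as a predicate function are assumptions made for illustration, not anything the standard prescribes:

```python
from dataclasses import dataclass
from typing import Callable

# Types named on the slide.
ALLOWED_TYPES = {"integer", "real", "text", "date"}


def free_text(value):
    """'free text' domain: accept any non-blank string."""
    return isinstance(value, str) and bool(value.strip())


@dataclass
class DataElement:
    number: str                       # element number, e.g. "1.2.1"
    name: str                         # element name
    definition: str                   # what the element means
    value_type: str                   # one of ALLOWED_TYPES
    domain: Callable[[object], bool]  # hypothetical: predicate for valid values

    def __post_init__(self):
        if self.value_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown type: {self.value_type}")

    def accepts(self, value):
        """True if `value` falls inside this element's domain."""
        return self.domain(value)


abstract = DataElement(
    number="1.2.1",
    name="Abstract",
    definition="a brief narrative summary of the data set.",
    value_type="text",
    domain=free_text,
)
print(abstract.accepts("Point counts of breeding birds, 2004-2005."))  # True
print(abstract.accepts("   "))  # False
```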
FGDC Graphic Representation color symbology A tool to visually describe the structure of the metadata standard; depicting information, organization, reporting requirements, and structure of the standard through the use of color and the relationship of information through the use of symbology.
Graphical Representation of the Elements: Section; Data Elements (raised 3-d boxes); Compound Elements (not raised)
How Are Elements Grouped? Compound elements are composed of other compound or data elements. The composition is represented by nested boxes. [Figure: nested boxes showing Compound Element 1, Compound Element 1.1, and three Data Elements]
What Can Repeat? How Many Times? If an element can be repeated independently from other elements, a label below the element name states how many times the element may be repeated. If there is no label, the element does not repeat independently from other elements. [Figure: nested boxes showing Compound Element 1 (can be repeated unlimited times), Compound Element 1.1, and three Data Elements]
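The nested boxes and repeat labels amount to a tree. A minimal sketch, assuming one possible arrangement of the generic elements named on these slides (the exact nesting in the original figure is not recoverable, so the tree below is a guess, and `Element` and `outline` are illustrative names):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)  # empty for a data element
    repeat: Optional[str] = None  # e.g. "unlimited"; None means no repeat label

    @property
    def is_compound(self):
        """Compound elements are the ones that contain other elements."""
        return bool(self.children)


def outline(element, depth=0):
    """Return indented text lines that mirror the nested boxes."""
    label = element.name
    if element.repeat:
        label += f" (can be repeated {element.repeat} times)"
    lines = ["  " * depth + label]
    for child in element.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Assumed arrangement of the generic example elements.
tree = Element(
    "Compound Element 1",
    repeat="unlimited",
    children=[
        Element(
            "Compound Element 1.1",
            children=[Element("Data Element"), Element("Data Element")],
        ),
        Element("Data Element"),
    ],
)
print("\n".join(outline(tree)))
```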
What’s Mandatory? What’s Not? The legend gives each meaning for both data elements and compound elements. Mandatory: must be provided. Mandatory if applicable: must be provided if the data set exhibits the defined characteristic. Optional: provided at the discretion of the data producer. Biological Data Profile elements: yellow, green, or blue with red outline and text.
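The three obligation levels reduce to one small decision rule. A minimal sketch; the names `Obligation` and `must_provide` are hypothetical, and the `applies` flag stands for "the data set exhibits the defined characteristic", which a person has to judge:

```python
from enum import Enum


class Obligation(Enum):
    MANDATORY = "mandatory"
    MANDATORY_IF_APPLICABLE = "mandatory if applicable"
    OPTIONAL = "optional"


def must_provide(obligation, applies):
    """Return True when the element has to appear in the record.

    `applies` says whether the data set exhibits the characteristic
    that the element describes.
    """
    if obligation is Obligation.MANDATORY:
        return True
    if obligation is Obligation.MANDATORY_IF_APPLICABLE:
        return applies
    return False  # optional: left to the data producer


print(must_provide(Obligation.MANDATORY_IF_APPLICABLE, applies=True))   # True
print(must_provide(Obligation.MANDATORY_IF_APPLICABLE, applies=False))  # False
```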
Navigating the FGDC Standard [Figure: example element diagrams with a legend for Mandatory if Applicable and Optional. Status: Progress, Maintenance and Update Frequency. Attribute Accuracy: Attribute Accuracy Report, Quantitative Attribute Accuracy Assessment (Attribute Accuracy Value, Attribute Accuracy Explanation). Keywords: Theme (can be repeated unlimited times) with Theme Keyword Thesaurus and Theme Keyword (can be repeated...); Place (can be repeated unlimited times) with Place Keyword Thesaurus and Place Keyword (can be repeated...)]
How Do You Write Good Metadata? Rules and Tips for Creating Quality Metadata Files
Good Metadata: Steps to Quality Metadata Organize your information Write your metadata Review for accuracy and completeness Have someone else read your file Revise it, based on comments from your reviewer Review it once more before you publish it
Good Metadata: Keep your readers in mind Write simply but completely Document for a general audience Be consistent in style and terminology
Good Metadata: Think about the long-term effects Don’t use jargon Define technical terms and acronyms: CA, LA, GPS, GIS Clearly state data limitations Don’t use ALL CAPITAL LETTERS Use subheadings and/or bulleted lists Cite examples Use “none” or “unknown” meaningfully
Good Metadata: The Title Critical in helping readers find your data. A complete title includes: What, Where, When, Scale, Who An informative title includes: Topic, Timeliness of the data, Specific information about place and geography
Good Metadata: The Title If the data are officially published, in the title include: Series name Issue number Name of publisher Location of publisher
Good Metadata: The Title Which is better? “Rivers” or “Greater Yellowstone Rivers from 1:126,700 Forest Visitor Maps ( )”? Examples of useful titles: “Near Real Time Advanced Very High Resolution Radiometer (AVHRR) Data-Satellite Imagery from NOAA CSC Coastal Remote Sensing Division”; “Ace Basin, South Carolina National Estuarine Research Reserve Digital Line Boundary”
Good Metadata: Be Specific, Quantify when you can Vague: We checked our work and it looks complete. Specific: We checked our work using 3 separate sets of check plots reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.
Good Metadata: Select keywords wisely Use unambiguous words Use descriptive words Fully qualify geographic locations – where is ‘Portland’?
Good Metadata: Remember, a computer will read your metadata Don’t use symbols that might be misinterpreted: # % { } | / \ ~ Don’t use characters with dual interpretations Don’t use tabs or indents Be careful with the use of carriage returns Use “none” or “unknown” meaningfully
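These formatting cautions can be turned into a quick pre-check before a record goes to a parser. A minimal sketch that flags the symbols listed on the slide, plus tabs, leading spaces, and stray carriage returns; the function name `find_problems` and the sample text are made up for illustration:

```python
# Symbols the slide warns about.
PROBLEM_SYMBOLS = set("#%{}|/\\~")


def find_problems(text):
    """Return (line number, problem) pairs for characters worth a second look."""
    problems = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        if "\r" in line:
            problems.append((line_no, "carriage return"))
        if "\t" in line:
            problems.append((line_no, "tab"))
        if line.startswith(" "):
            problems.append((line_no, "indent"))
        for symbol in sorted(set(line) & PROBLEM_SYMBOLS):
            problems.append((line_no, symbol))
    return problems


# Hypothetical free-text entry with a percent sign and a tab.
sample = "Abstract: 95% complete\n\tsee notes\nPurpose: baseline survey"
print(find_problems(sample))
```

A hit is a prompt to review, not an error: a slash in a URL or a percent sign in a completeness statement may be entirely appropriate.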
Good Metadata: Review your final product Have someone else read it If you’re the only reviewer, put it away and read it again later Check for clarity and omissions Can a novice understand what you wrote? Are your data properly documented for posterity?
Good Metadata: Review your final product Does the documentation present all the information needed to use or reuse the data? Are any pieces missing?
Taking a Closer Look: FGDC Metadata – Section by Section
Supporting Section 8: Citation Information 8.1 Originator: name of an organization / individual that developed data set; 8.2 Publication Date; 8.3 Publication Time; 8.4 Title; 8.5 Edition; BDP 8.6 Geospatial Presentation Form
Supporting Section 8: Citation Information 8.7 Series Information; 8.8 Publication Information: Place and Publisher; 8.9 Other Citation Details; 8.10 Online Linkage: online resource for the dataset – a URL; 8.11 Larger Work Citation
Supporting Section 9: Time Period Information 9.1 Single Date OR 9.2 Multiple Date(s) OR 9.3 Range of Date(s). BDP allows use of Geologic Age information for the Time Period
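The three alternatives for a time period (single date, multiple dates, or a range) are mutually exclusive, which suggests one small type per choice. A minimal sketch, assuming dates are written as eight-digit YYYYMMDD strings; the class names and `describe` are illustrative only:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SingleDate:
    date: str  # assumed YYYYMMDD


@dataclass
class MultipleDates:
    dates: List[str]  # assumed YYYYMMDD, one entry per date


@dataclass
class DateRange:
    begin: str
    end: str

    def __post_init__(self):
        # String comparison orders correctly for equal-length YYYYMMDD values.
        if self.begin > self.end:
            raise ValueError("range begins after it ends")


def describe(period):
    """Return a one-line description of whichever alternative was used."""
    if isinstance(period, SingleDate):
        return period.date
    if isinstance(period, MultipleDates):
        return ", ".join(period.dates)
    if isinstance(period, DateRange):
        return f"{period.begin} to {period.end}"
    raise TypeError("not a time period")


print(describe(DateRange("20050601", "20050930")))
```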
Supporting Section 10: Contact Information 10.1 Contact Person Primary OR 10.2 Contact Organization Primary; 10.3 Contact Position: title of individual; 10.4 Contact Address: minimal entry is type, city, state / province, postal code; 10.5 Contact Voice Telephone
Supporting Section 10: Contact Information 10.6 Contact TDD/TTY Telephone; 10.7 Contact Facsimile Telephone; 10.8 Contact Electronic Mail Address; 10.9 Hours of Service; Contact Instructions: supplemental instructions on how or when to contact the Contact Person or Organization
Section 1: Identification Information 1.1 Citation: Originator, Publication Date, Title; 1.2 Description: Abstract and Purpose; 1.3 Time Period of Content: Date(s) and Currentness Reference; 1.4 Status: Progress and Maintenance; 1.5 Spatial Domain: Description of Geographic Extent; N, S, E, and W bounds
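The N, S, E, and W bounds of the Spatial Domain are easy to get wrong (swapped values, a dropped minus sign), so a sanity check helps. A minimal sketch, assuming bounds in decimal degrees with longitudes from -180 to 180 and latitudes from -90 to 90; the function name `check_bounds` and the sample coordinates are made up for illustration:

```python
def check_bounds(west, east, north, south):
    """Return a list of problems found in a bounding rectangle (empty if none)."""
    errors = []
    for name, value in (("west", west), ("east", east)):
        if not -180.0 <= value <= 180.0:
            errors.append(f"{name} longitude out of range: {value}")
    for name, value in (("north", north), ("south", south)):
        if not -90.0 <= value <= 90.0:
            errors.append(f"{name} latitude out of range: {value}")
    if north < south:
        errors.append("north bound lies south of the south bound")
    # West greater than east is not flagged: it is legitimate for an
    # extent that crosses the 180-degree meridian.
    return errors


# Hypothetical extent in south-central Alaska.
print(check_bounds(west=-150.3, east=-149.1, north=61.5, south=60.9))
```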
Section 1: Identification Information 1.6 Keywords: Theme Thesaurus and Keyword; BDP 1.7 Taxonomy: Thesaurus, Keywords, Classification System, Procedures, Taxonomic Classification (rank and value); 1.7 Access Constraints: restrictions and legal prerequisites for accessing the data, including protection of privacy or intellectual property; 1.8 Use Constraints: restrictions for using the data set after access is granted; 1.9 Point of Contact
Section 1: Identification Information 1.10 Browse Graphic; 1.11 Data Set Credit: recognition of those who contributed to the data set; 1.12 Security Information; 1.13 Native Data Set Environment: software used to create / analyze / export the dataset; 1.14 Cross Reference: other, related data sets that are likely to be of interest; BDP 1.15 Analytical Tool: tools, models, or statistical procedures
Section 2: Data Quality Information 2.1 Attribute Accuracy; 2.2 Logical Consistency Report; 2.3 Completeness Report; 2.4 Positional Accuracy: Horizontal and Vertical; 2.5 Lineage: Methods used, Sources used, and Process Step; 2.6 Cloud Cover
Section 2.5 Lineage: Taking a closer look The lineage and process step metadata elements
What do these have in common?
What do the terms “lineage” and “process step” mean to you?
Section 2.5 Lineage: Definition “Information about the events, parameters, and source data which constructed the data set, and information about the responsible parties.” What were the ingredients (source data)? How were they combined (process steps)? Who did the work (contact)?
What were the ingredients (source data)? [Figure legend: Mandatory, Mandatory if Applicable, Optional]
How were they combined (process steps)? [Figure legend: Mandatory, Mandatory if Applicable, Optional]
Who did the work (contact)? [Figure legend: Mandatory, Mandatory if Applicable, Optional]
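The three lineage questions (ingredients, how they were combined, who did the work) can be captured in a simple structure. A minimal sketch; the class and field names are illustrative rather than the standard's element names, and the sample sources, steps, dates, and contacts are invented:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceData:
    title: str         # what the ingredient was
    contribution: str  # what it contributed to the data set


@dataclass
class ProcessStep:
    description: str  # how the ingredients were combined
    date: str         # when the step was done
    contact: str      # who did the work


@dataclass
class Lineage:
    sources: List[SourceData] = field(default_factory=list)
    steps: List[ProcessStep] = field(default_factory=list)

    def summary(self):
        """Return a short text recipe: ingredients first, then numbered steps."""
        lines = ["Ingredients (source data):"]
        lines += [f"  - {s.title}: {s.contribution}" for s in self.sources]
        lines.append("How they were combined (process steps):")
        lines += [
            f"  {i}. {p.description} ({p.date}, {p.contact})"
            for i, p in enumerate(self.steps, start=1)
        ]
        return "\n".join(lines)


# Hypothetical lineage for a digitized river layer.
lineage = Lineage(
    sources=[SourceData("Forest visitor map", "river centerlines")],
    steps=[
        ProcessStep("Digitized rivers from scanned map", "20050601", "GIS Specialist"),
        ProcessStep("Checked lines against check plots", "20050615", "Project Manager"),
    ],
)
print(lineage.summary())
```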
Section 2.5 Data Lineage and Process Who would use this type of information? What is its value?
Section 3: Spatial Data Organization Information 3.1 Indirect Spatial Reference; 3.2 Direct Spatial Reference Method; 3.3 Point and Vector Object Information: Type, Topology level, Count OR 3.4 Raster Object Information: Type, Count
Section 4: Spatial Reference Information 4.1 Horizontal Coordinate System Definition: 4.1.1 Geographic OR 4.1.2 Planar (Map Projection OR Grid Coordinate System OR Local Planar) OR Local; 4.1.4 Geodetic Model; 4.2 Vertical Coordinate System: 4.2.1 Altitude System Definition, 4.2.2 Depth System Definition
Section 5: Entity and Attribute Information 5.1 Detailed Description: Entity Type: label, description & source for dataset features; Attributes: label (field name), field definition, definition source (authority ~ USGS, FIPS, etc.), and domain values / valid values AND/OR
Section 5: Entity and Attribute Information AND/OR 5.2 Overview Description: 5.2.1 Entity and Attribute Overview: detailed summary of the information contained in a data set; 5.2.2 Entity and Attribute Detail Citation: reference to the complete description of the entity types, attributes, and attribute values for the data set
Section 5: Attribute Domain Values Enumerated Domain: list of possible values (conservation status ranks, domain tables) Range Domain: numeric values between limits (lat / long) Codeset Domain: values defined by a set of codes (FIPS codes, Quad codes, HUC codes, USESA values) Unrepresentable Domain: values not in a known predefined set (any text or comment field, common names)
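The four domain types differ in how a value can be checked against them. A minimal sketch, assuming each domain is written as a small tuple whose first item names the kind; that tuple layout, the function `validate`, and the sample rank and code values are illustrative choices:

```python
def validate(value, domain):
    """Check a value against one of the four attribute domain kinds."""
    kind = domain[0]
    if kind == "enumerated":       # ("enumerated", set of allowed values)
        return value in domain[1]
    if kind == "range":            # ("range", minimum, maximum)
        return domain[1] <= value <= domain[2]
    if kind == "codeset":          # ("codeset", mapping of code -> meaning)
        return value in domain[1]
    if kind == "unrepresentable":  # ("unrepresentable", description)
        return True                # nothing to check: any value is possible
    raise ValueError(f"unknown domain kind: {kind}")


# Hypothetical domains for four attributes of one table.
status_rank = ("enumerated", {"G1", "G2", "G3", "G4", "G5"})
latitude = ("range", -90.0, 90.0)
state_fips = ("codeset", {"02": "Alaska", "30": "Montana"})
comments = ("unrepresentable", "free-text field notes")

print(validate("G3", status_rank))   # True
print(validate(61.2, latitude))      # True
print(validate("99", state_fips))    # False
print(validate("anything", comments))  # True
```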
Section 6: Distribution Information 6.1 Distributor: contact information for obtaining the data set; 6.2 Resource Description: internal identifier such as a dataset name or code; 6.3 Distribution Liability: liability statement regarding the data set; 6.4 Standard Order Process: general data request process, instructions, and fees; 6.5 Custom Order Process; 6.6 Technical Prerequisites; 6.7 Available Time Period
Section 7: Metadata Information 7.1 Metadata Date: date metadata created or last reviewed; 7.2 Metadata Review Date; 7.3 Metadata Future Review Date; 7.4 Metadata Contact: person or organization responsible for metadata content; 7.5 Metadata Standard Name; 7.6 Metadata Standard Version
Section 7: Metadata Information 7.7 Metadata Time Convention; 7.8 Metadata Access Constraints; 7.9 Metadata Use Constraints; 7.10 Metadata Security Information; 7.11 Metadata Extensions (example: Biological Data Profile)
Implementation – Where Do You Go From Here?
Implementation Decisions: Overview Getting support for metadata development? What is a “data set?” What needs metadata? When is the best time to collect metadata? When should you document legacy data? Who is the metadata being created for? For what purpose? External - Clearinghouse, Projects, Internet Internal - Data Library, Archives Who should create the metadata? Single individual or team approach What metadata creation tools?
Getting Support: The Value of Metadata is Organization Wide Saves time, money, frustration Preserve institutional memory and investment in data; written documentation rather than in one person’s brain Partnerships and “advertising” data collections, help share reliable information Efficiency – identify and use existing datasets, avoid duplication of effort Gives the data set creator(s) credit
Funding Metadata Development Internal funding options? External - include as project deliverable / scope of work, therefore included in the project budget!
Deciding what data sets need metadata? Dataset definitions: EO 12906, FGDC, NBII Historical data: NSDI “Guide for Federal Agencies” Prioritize data sets: Mission critical, High demand, Projects, Web downloadable, Legacy, Others? Data inventory: Format, Resolution, Contacts, Geographic locations, etc General rule of thumb: The “what if Joe Scientist gets hit by a bus tomorrow?” test
Who contributes to creating a metadata file? Single individual or team approach? Team Leader / Project Manager GIS Specialist Field Personnel Database Manager Science Staff Data Analysis Lead
More Metadata Training & Info FGDC metadata trainer directory FGDC metadata training calendar NBII metadata training program Other Training Materials list
Metadata Resources on the Web Federal Geographic Data Committee (FGDC): National Biological Information Infrastructure (NBII) / Biological Data Profile: USGS metadata resource page (includes factsheets, FAQs, MP validation tool):
Metadata Tools and Resources
Metadata Creation Tools
Software Name | Spatial Capture | Templates | BDP | Price
ArcCatalog (ArcGIS 8.x) | YES | Import existing metadata, Advanced Synchronization tools | NO | Part of ArcGIS 8.x
ArcView 3.x Extensions | YES | Stores as .dbf files; can be reused as a “template” | NO | Extensions are free
Metavist | NO | Stores as XML files; can start with existing metadata to use as a template | YES | Free
SMMS | YES | YES – creates whole record templates, and citations & contacts | YES | ~ $600
TKME | NO | Start with existing metadata record to use as a template | YES | Free
Metadata Creation Tools Summary and review of a variety of metadata software tools: Check back periodically – these review sites are updated on an ongoing basis!
Metadata Validation Tools CNS - “Chew-n-Spit” Pre-parser to prepare for using “MP” (probably not needed if using a tool such as ArcCatalog) MP - “Metadata Parser” Validation that metadata are FGDC compliant; critical if posting metadata to an online clearinghouse
CNS: “Chew-n-Spit” Pre-parser Metadata formatting tool Pre-parser for formal metadata to convert records that cannot be parsed by MP into records that can be parsed by MP Uses ASCII text files from metadata creation tool Not all ASCII text files require CNS prior to using MP [Flow diagram: Metadata Tool, CNS, review, to MP]
MP: Metadata Parser Metadata quality control and output configuration tool Compiler to parse formal metadata Checks the syntax against the FGDC standard (including the Biological Data Profile) Checks structure and values of elements Generates output suitable for viewing with a web browser or text editor (can create .html, .sgml, .diff, and .xml files for serving on the clearinghouse or the Internet) Reads ASCII text, XML, or SGML files
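To give a feel for what an XML metadata file looks like, here is a minimal sketch that builds a small fragment of Section 1 with Python's standard library. The tag names (`metadata`, `idinfo`, `citation`, `citeinfo`, `origin`, `pubdate`, `title`, `descript`, `abstract`, `purpose`) are the short names the CSDGM assigns to these elements as best recalled here, so check them against the standard before relying on them; the function `build_record` and all sample values are hypothetical. The fragment is far from complete and would not pass validation as it stands.

```python
import xml.etree.ElementTree as ET


def build_record(origin, pubdate, title, abstract, purpose):
    """Build a partial identification section as an XML element tree."""
    metadata = ET.Element("metadata")
    idinfo = ET.SubElement(metadata, "idinfo")

    # 1.1 Citation: originator, publication date, title.
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "origin").text = origin
    ET.SubElement(citeinfo, "pubdate").text = pubdate
    ET.SubElement(citeinfo, "title").text = title

    # 1.2 Description: abstract and purpose.
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract
    ET.SubElement(descript, "purpose").text = purpose
    return metadata


# Hypothetical values for illustration only.
record = build_record(
    origin="Example Agency",
    pubdate="20060125",
    title="Example River Centerlines",
    abstract="River centerlines digitized from visitor maps.",
    purpose="Base layer for habitat mapping.",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```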
MP: Correcting Errors All errors must be fixed in the original metadata record (and then re-run through MP) Errors are labeled with numbers - how can you find out where they are? Open a DOS window and edit the file c:\metanbii\cns_out.txt (the edit feature contains line numbers) OR View CNS output “cns_out.txt” in WordPad and use the “Find” feature of WordPad to locate the error
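If neither editor is handy, a few lines of script can show the text around a reported line. A minimal sketch, assuming the number attached to an error is a line number in the file that was parsed; the function name `show_context` and the sample lines are made up:

```python
def show_context(lines, line_no, radius=2):
    """Return the lines around `line_no` (1-based), each prefixed with its number."""
    start = max(1, line_no - radius)
    end = min(len(lines), line_no + radius)
    return [f"{n:>5}: {lines[n - 1]}" for n in range(start, end + 1)]


# Hypothetical file contents; a real run would read them from disk, e.g.
#   with open(r"c:\metanbii\cns_out.txt") as handle:
#       lines = handle.read().splitlines()
lines = [
    "Identification_Information:",
    "  Citation:",
    "    Citation_Information:",
    "      Originator: Example Agency",
    "      Title: Example River Centerlines",
]
print("\n".join(show_context(lines, line_no=4, radius=1)))
```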
Metadata Process Summary [Flow diagram: Metadata Tool, Output to MP, MP, Correct Errors, Final Output, Clearinghouse]
Metadata software – a closer look
ArcCatalog (ArcGIS 8.x or 9.x)
ArcCatalog: Metadata Options
ArcCatalog (ArcGIS 8.x or 9.x) Advanced tools and customizations (such as specifying which parts of the metadata record to automatically update) can be downloaded from the ESRI ArcObjects website: Select “Samples”, “Metadata”, “Tools”, see “Advanced Synchronization” or “Set Synchronized Properties”
ArcCatalog: Advanced Synchronization
ArcCatalog: Set Synchronized Properties
ArcCatalog with NPS NBII Extension
ArcView 3.x Extension: Metadata Collector v2.0
Metavist
Metavist 2005 Information Author: David Rugg, USFS Vendor: USDA Forest Service, North Central Research Station Platform needed: Microsoft Windows 2000 or XP, with the Microsoft .NET Framework version 1.1. Metadata stored as: XML files Software license: NOT required Software updates: Software new, not known Cost: FREE Request a copy:
Metavist 2005 Software Features Creates FGDC compliant metadata, including the Biological Data Profile for the FGDC standard. Covers all FGDC elements plus Biological Data Profile elements (Taxonomy, Methodology, and Analytical Tools). Geospatial metadata elements NOT automatically collected from GIS data layers. Metadata stored / output in XML format. Import existing metadata file (must be XML file with proper formatting). Template – create a partial record with template components, import to start a new metadata record.
Spatial Metadata Management System (SMMS)
TKME “Another editor for formal metadata”
Metadata Clearinghouses: Sharing and Discovery
Metadata clearinghouses A metadata clearinghouse is a location — typically accessed through the Internet — to search for spatial data sets Clearinghouses make metadata records easy to find
Clearinghouse examples FGDC Geospatial Data Clearinghouse Montana State Library Natural Resource Information System GIS Nebraska Geospatial Data Clearinghouse Wisconsin Land Information Clearinghouse
National Geospatial Data Clearinghouse (NGDC) Has the people and infrastructure to help you find out who has what geographic information. Set of information services that use hardware, software, and telecommunications networks to provide searchable access to information. Includes federal, state, university, and vendor participants in the United States and abroad. User can search all or part of the community in a single session
NBII Clearinghouse