Worked example: Global Change Information System Peter Fox, and … others Xinformatics 4400/6400 Week 11, April 19, 2016
We are still not done..
Assignment 3
Reading – long ago
Epistemology: the theory of knowledge. To apply it effectively you need to be concerned with: –Truth, belief, and justification –Means of production of knowledge –Skepticism about different knowledge claims Recall the data-information-knowledge ecosystem? Understanding what part it plays in your modeling and architecture can be critical.
Classical view of knowledge
Toward verifiable science assessment reporting: Cognition and the Global Change Information System (GCIS)
Overview Global Change Research and the U.S. National Climate Assessment Global Change Information System –What is it and what does it contain? –Bias and traceable science assessments –What did we do and why? –Underlying methods and technologies –Examples of implementation and encodings Sneak peek of more verifiable science…
The Program: U.S. Global Change Research Program –Coordinates Federal research to better understand and prepare the nation for global change –Prioritizes and supports cutting edge scientific work in global change –Assesses the state of scientific knowledge and the Nation’s readiness to respond to global change –Communicates research findings to inform, educate, and engage the global community
Global Change Research Act (1990), Section 106: …not less frequently than every 4 years, the Council… shall prepare… an assessment which– integrates, evaluates, and interprets the findings of the Program and discusses the scientific uncertainties associated with such findings; analyzes the effects of global change on the natural environment, agriculture, energy production and use, land and water resources, transportation, human health and welfare, human social systems, and biological diversity; and analyzes current trends in global change, both human-induced and natural, and projects major trends for the subsequent 25 to 100 years.
National Climate Assessments –Climate Change Impacts on the United States (2000) –Global Climate Change Impacts in the United States (2009) –Climate Change Impacts in the United States (2014) See:
NCA
Outline for Third NCA Report Letter to the American People Executive Summary: Report Findings Introduction Our Changing Climate Sectors & Sectoral Cross-cuts Regions & Biogeographical Cross-cuts Responses –Decision support –Mitigation –Adaptation Agenda for Climate Change Science The NCA Long-term Process Appendices –Commonly Asked Questions –Expanded Climate Science Info
Regions & Biogeographical Cross-Cuts Coasts, Development, and Ecosystems Oceans and Marine Resources
Sectors Water Resources Energy Supply and Use Transportation Agriculture Forestry Ecosystems and Biodiversity Human Health
Sectoral Cross-Cuts Water, Energy, and Land Use Urban Systems, Infrastructure, and Vulnerability Impacts of Climate Change on Tribal, Indigenous, and Native Lands and Resources Land Use and Land Cover Change Rural Communities Biogeochemical Cycles
globalchange.gov - v2.0
National Climate Assessment
Data and The National Climate Assessment: The Challenge –More than 250 named authors (>1000 contributing!) –827 pages –43 Chapters and Appendices –284 Figures –More than 600 Images –3395 References –Approximately 83 data sources used across as many as 235 instances
Recall: Data quality needs: fitness for use Measuring Climate Change: –Model validation: gridded contiguous data with uncertainties –Long-term time series: bias assessment is a must, especially for sensor degradation and for changes in orbit and spatial sampling Studying phenomena using multi-sensor data: –Cross-sensor bias assessment is needed Realizing Societal Benefits through Applications: –Near-Real Time for transport/event monitoring - in some cases, coverage and timeliness might be more important than accuracy –Pollution monitoring (e.g., air quality exceedance levels) – accuracy Educational (users generally not well-versed in the intricacies of quality; just taking all the data as usable can impair educational lessons) – only the best products
Definitions – for a climate scientist Bias has two aspects: –Systematic error resulting in the distortion of measurement data caused by prejudice or faulty measurement technique –A vested interest, or strongly held paradigm or condition that may skew the results of sampling, measuring, or reporting the findings of a quality assessment: Psychological: for example, when data providers audit their own data, they usually have a bias to overstate its quality. Sampling: Sampling procedures that result in a sample that is not truly representative of the population sampled. (Larry English)
Global Change Information System (GCIS) Long Term Vision: The Global Change Information System (GCIS) is intended to eventually become a unified web based source of authoritative, accessible, usable and timely information about climate and global change for use by scientists, decision makers, and the public.
Initial Prototype: Coincident with the release of the Third National Climate Assessment (NCA) in May 2014, the GCIS supports the distribution, presentation and documentation needs of the NCA, integrating that content into the USGCRP web site and demonstrating the potential for GCIS to support the long term vision.
Information Quality Act: Reproducibility means that the information is capable of being substantially reproduced, subject to an acceptable degree of imprecision. For information judged to have more (less) important impacts, the degree of imprecision that is tolerated is reduced (increased). With respect to analytic results, "capable of being substantially reproduced" means that independent analysis of the original or supporting data using identical methods would generate similar analytic results, subject to an acceptable degree of imprecision or error. Transparency is not defined in the OMB Guidelines, but the Supplementary Information to the OMB Guidelines indicates (p. 8456) that "transparency" is at the heart of the reproducibility standard. The Guidelines state that "The purpose of the reproducibility standard is to cultivate a consistent agency commitment to transparency about how analytic results are generated: the specific data used, the various assumptions employed, the specific analytic methods applied, and the statistical procedures employed. If sufficient transparency is achieved on each of these matters, then an analytic result should meet the reproducibility standard." In other words, transparency - and ultimately reproducibility - is a matter of showing how you got the results you got.
Complete Traceability for NCA Content (easier to harder; Transparency, Reproducibility) Traceable Sources –References –Image sources –Data sources Traceable Data –Link to datasets –Complete metadata Traceable Processes –Description of methods –Access to process info & review Traceable Tools –Access to computer code –Description of systems and platforms
Traceable accounts = assembled provenance as evidence “… prepare a summary ‘traceable account’ (a few sentences to a paragraph) that describes the main factors that contributed to the conclusion and level of confidence” “In addition to providing a summary traceable account, use the appropriate term below in a parenthetical phrase following the finding to convey to readers the level of confidence associated with the finding”
Global Change Content Elements Reports, Figures, Images, Research Papers, Journals, Measurements, Datasets, Instruments, Agencies, Projects, People, Models, Algorithms, … Findings – “Climate is changing.” “Sea Level is Rising.” Concepts: “Impacts of Climate Change on Human Health” “Adaptation”
GCIS – choices and direction Create an entity from the structured metadata about each thing – tag with related concepts. Identify it with a persistent, controlled identifier. Present with a human readable web page and a machine interface. Represent all relationships between items.
Linked Open Data
Identifier Resolution doi: /MEASURES/GSSTF/DATA308 A common, persistent, citable reference to that dataset. We build GCIS-specific identifiers from those: Then we can resolve it (with content negotiation) on our site, and link it with identifiers for our other resources, including asserting equivalence and linking with the data center responsible for stewardship and distribution of the actual data. We can also refer and link to other repositories of information about those resources.
Content Negotiation The server response from the URI depends on what you ask for: A traditional browser will ask for HTML, and receive and render a human readable description of the resource. Web services can request formal, structured XML or RDF metadata about the resource. The goal is to provide a curated collection of authoritative global change information, but always link back to the data center or publisher responsible for the long term stewardship of the resource.
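A minimal sketch of content negotiation from the client side, assuming a hypothetical resource path (the URI below is illustrative and not taken from the slides). It only builds the requests; nothing is sent over the network. The same URI is requested twice, and only the Accept header differs.

```python
from urllib.request import Request

# Hypothetical resource URI; the path is an assumption for illustration only.
RESOURCE_URI = "https://data.globalchange.gov/report/example-report"

# Media types a client might ask for, keyed by the kind of consumer.
ACCEPT_BY_CLIENT = {
    "browser": "text/html",
    "rdf_turtle": "text/turtle",
    "rdf_xml": "application/rdf+xml",
}


def build_request(uri: str, client: str) -> Request:
    """Return a GET request whose Accept header states the wanted format."""
    return Request(uri, headers={"Accept": ACCEPT_BY_CLIENT[client]})


html_request = build_request(RESOURCE_URI, "browser")
turtle_request = build_request(RESOURCE_URI, "rdf_turtle")

# Same URI, different Accept header: the server decides what to return.
print(html_request.full_url, html_request.get_header("Accept"))
print(turtle_request.full_url, turtle_request.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the actual fetch; whether a given server honors each media type is up to that server.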
GCIS Structured Data Server data.globalchange.gov
GCIS Structured Data Server Capture – Obtain from a variety of sources: manual input by trusted parties – support staff, agency partners, data centers; automated harvesting from publishers, agency data centers, etc. Identify – Assign persistent, resolvable, controlled identifiers to each element. Organize – Capture, discover and represent relationships between elements, including across various types of elements; across data centers; and across agency boundaries. Present – Provide machine accessible interfaces to retrieve structured metadata, and to search/data mine it. Maintain – Develop tools and processes to ensure quality and integrity of database contents over time.
The use case-driven iterative approach More details at: Fox and McGuinness 2008
Ontology via use case: the first use case Title: Find data used to generate a report figure Actor and system: A reader of the National Climate Assessment Flow of interactions: A reader wishes to identify the source of the data used to produce a particular figure in the NCA. A reference to the paper in which the image contained in this figure was originally published appears in the figure caption. Clicking that reference displays a page of metadata information about the paper, including links to the datasets used in that paper. Pursuing each of those links presents a page of metadata information about the dataset, including a link back to the agency/data center web page describing the dataset in more detail and making the actual data available for order or download.
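The flow of interactions in this use case can be sketched as a walk over linked metadata records. This is a toy illustration only: every identifier and URL below is made up, and the three dictionaries stand in for whatever store actually holds the links.

```python
# Toy metadata store: every identifier and URL here is made up for illustration.
FIGURE_TO_PAPERS = {"figure/example-figure": ["paper/example-paper"]}
PAPER_TO_DATASETS = {
    "paper/example-paper": ["dataset/example-a", "dataset/example-b"],
}
DATASET_TO_DATA_CENTER = {
    "dataset/example-a": "https://datacenter.example/a",
    "dataset/example-b": "https://datacenter.example/b",
}


def trace_figure(figure_id):
    """Follow figure -> paper -> dataset -> data center landing page."""
    trail = []
    for paper in FIGURE_TO_PAPERS.get(figure_id, []):
        for dataset in PAPER_TO_DATASETS.get(paper, []):
            landing_page = DATASET_TO_DATA_CENTER.get(dataset)
            trail.append((paper, dataset, landing_page))
    return trail


for paper, dataset, landing_page in trace_figure("figure/example-figure"):
    print(f"{paper} -> {dataset} -> {landing_page}")
```

Each hop corresponds to one click in the use case: caption reference, paper metadata page, dataset metadata page, then the data center page.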
Ontology Development Criteria Following Gruber (1993) –Clarity –Coherence –Extendibility –Minimal encoding bias –Minimal ontological commitment Following Fox and Lynnes (2007) –Contextual relevance –Maturity –Intention for use –Fitness for use
Evaluation Vocabulary Readiness Level - VRL 1 to 9 scale (a scale modeled on the Technology Readiness Level, TRL) Principles to assign a VRL –VRL little to no implementation in application or service –VRL demonstrated application –VRL widely available and used via application or service
Example: VSTO = Virtual Solar-Terrestrial Observatory
An intuitive concept map of the 1st use case (Ma et al. 2014)
Classes and properties recognized from the use case; an intuitive concept map of the use case (Ma et al. 2014) From an intuitive model to an ontology: (1) A defined class or property should be meaningful and robust enough to meet the requirements of various use cases (2) An ontology can be extended by adding classes and properties recognized from new use cases through the iterative approach
NCA links to GCIS entities
Dataset metadata from a figure
Dataset metadata from an image in a figure
The second use case Title: Identify roles of people in the generation of a chapter in the draft NCA3 Actor and system: a viewer of the GCIS website Flow of interactions: A viewer sees that Chapter 6 (Agriculture) in the draft NCA3 was written by a group of authors mentioned in a list. On the title page of that chapter the reader can view the role of each author, e.g., convening lead author, lead author or contributing author, in the generation of this report chapter. We decided to use the PROV-O ontology to describe this use case.
The three Starting Point classes in the PROV-O ontology and the properties that relate them Source:
Mapping the use case into PROV-O (figure; nodes linked by isA: Writing of Chapter 6 in NCA3; Chapter 6 in NCA3; Author of Chapter 6)
Roles of agents in an activity in PROV-O Source:
Mapping roles of chapter authors into PROV-O (figure; nodes linked by isA: Writing of Chapter 6 in NCA3; Author of Chapter 6; Convening lead author; Lead author; Contributing author)
Here only three of the eight authors of this chapter are shown. Each author had a specific role for this chapter. Roles of people in the activity ‘Writing of Chapter 6’ Ma et al. 2014
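A minimal sketch of how author roles could be expressed with the PROV-O qualified association pattern, using plain Python tuples as triples. The PROV-O terms (prov:Entity, prov:Activity, prov:Agent, prov:wasGeneratedBy, prov:wasAttributedTo, prov:wasAssociatedWith, prov:qualifiedAssociation, prov:Association, prov:agent, prov:hadRole) are from the W3C vocabulary; the example.org namespace, the author identifiers, and the role identifiers are placeholders and are not the URIs used in the GCIS.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, not the GCIS one

chapter = EX + "nca3-chapter-6"
writing = EX + "writing-of-chapter-6"

# Entity and activity, linked by generation.
triples = [
    (chapter, RDF_TYPE, PROV + "Entity"),
    (writing, RDF_TYPE, PROV + "Activity"),
    (chapter, PROV + "wasGeneratedBy", writing),
]

# Placeholder authors; real names and role URIs are not given here.
authors = [
    ("author-1", "convening-lead-author"),
    ("author-2", "lead-author"),
    ("author-3", "contributing-author"),
]

for index, (author_id, role_id) in enumerate(authors, start=1):
    agent = EX + author_id
    association = EX + f"association-{index}"
    triples += [
        (agent, RDF_TYPE, PROV + "Agent"),
        (chapter, PROV + "wasAttributedTo", agent),
        (writing, PROV + "wasAssociatedWith", agent),
        # The qualified association carries the role of the agent.
        (writing, PROV + "qualifiedAssociation", association),
        (association, RDF_TYPE, PROV + "Association"),
        (association, PROV + "agent", agent),
        (association, PROV + "hadRole", EX + role_id),
    ]


def roles_in(activity, graph):
    """Map each agent to the role it played in the given activity."""
    roles = {}
    for s, p, association in graph:
        if s == activity and p == PROV + "qualifiedAssociation":
            agent = next(o for s2, p2, o in graph
                         if s2 == association and p2 == PROV + "agent")
            role = next(o for s2, p2, o in graph
                        if s2 == association and p2 == PROV + "hadRole")
            roles[agent] = role
    return roles


for agent, role in roles_in(writing, triples).items():
    print(agent, "->", role)
```

The plain prov:wasAssociatedWith link says only that an agent took part; the prov:Association node is what lets each author carry a distinct role for this one activity.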
Sample finding: Global Change Keywords (GCMD) Certain types of extreme weather events have become more frequent and intense, including heat waves, floods, and droughts in some regions. The increased intensity of heat waves has been most prevalent in the western parts of the country, while the intensity of flooding events has been more prevalent over the eastern parts. Droughts in the Southwest and heat waves everywhere are projected to become more intense in the future. GCMD v8.0 keywords: ATMOSPHERIC/OCEAN INDICATORS > EXTREME WEATHER; EXTREME WEATHER > EXTREME PRECIPITATION; PRECIPITATION > PRECIPITATION RATE; EXTREME WEATHER > HEAT/COLD WAVE FREQUENCY/INTENSITY; NATURAL HAZARDS > HEAT; NATURAL HAZARDS > FLOODS; PRECIPITATION > PRECIPITATION AMOUNT; PRECIPITATION > RAIN; SURFACE WATER > FLOODS; ATMOSPHERIC PHENOMENA > DROUGHT; EXTREME WEATHER > EXTREME DROUGHT; NATURAL HAZARDS > DROUGHTS
(a) Classes and properties representing a brief structure of the NCA3 GCIS Ontology (version 1.2) Ma et al. 2014
(b) Classes and properties relevant to the findings of the NCA3 and each chapter in it Ma et al. 2014
Epistemology
(c) Classes and properties about sensors, instruments, platforms, and algorithms, etc. through which datasets are generated Ma et al. 2014
(part of) GCIS Ontology For more info, see
Traceable accounts…
Key Message vs. “General” Message
Re-using existing ontologies for the GCIS ontology By such mappings we can use reasoners that are suitable for the PROV-O ontology, and thus retrieve provenance graphs from the established GCIS. Full GCIS Ontology documents are available at: imsap/GCISOntology Ma et al. 2014
Final output of the GCIS ontology Ontology documentation – http://escience.rpi.edu/ontology/GCIS-IMSAP/2/GCISOntology_v_1_2.htm Concept map – http://cmapspublic3.ihmc.us/rid=1MCJMLST0-1G0CSWH-2YH4/GCIS_Ontology_v1_2.cmap Ontology RDF serialized in Turtle format – http://escience.rpi.edu/ontology/GCIS-IMSAP/2/GCISOntology_v_1_2.ttl
GCIS Database/API –RESTful API at data.globalchange.gov –URLs correspond to ontology URIs –Primary storage: RDBMS (PostgreSQL) –Representation is serialized (for JSON) or used in templates (for Turtle) –Turtle representation is exported into a triple store (Virtuoso) which provides a SPARQL endpoint.
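A minimal sketch of the "serialize for JSON, template for Turtle" idea, assuming one relational row has already been fetched as a dictionary. The base URL, the ex: vocabulary (ex:Figure, ex:isFigureOf), and the row contents are hypothetical stand-ins; only the dcterms namespace is a real, published vocabulary. Note how the resource path doubles as the subject URI in the Turtle output.

```python
import json
from string import Template

# Hypothetical base URL and vocabulary; neither is taken from the GCIS itself.
BASE = "https://data.example"
TURTLE_TEMPLATE = Template(
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix ex: <http://example.org/terms#> .\n"
    "\n"
    "<$base/figure/$identifier>\n"
    "    a ex:Figure ;\n"
    '    dcterms:title "$title" ;\n'
    "    ex:isFigureOf <$base/chapter/$chapter> .\n"
)


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_json(row):
    """Serialize the relational row directly."""
    return json.dumps(row, sort_keys=True)


def to_turtle(row):
    """Fill the Turtle template from the same row."""
    return TURTLE_TEMPLATE.substitute(
        base=BASE,
        identifier=row["identifier"],
        title=escape_literal(row["title"]),
        chapter=row["chapter"],
    )


# Stand-in for a row fetched from the relational store.
row = {
    "identifier": "example-figure",
    "title": "An example figure",
    "chapter": "example-chapter",
}

print(to_json(row))
print(to_turtle(row))
```

Keeping one row as the single source for both representations is the design point: the Turtle text can then be bulk-loaded into a triple store without a second copy of the data being maintained by hand.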
Two Parallel Paths Traceable Sources Traceable Data References Image sources Data sources Link to datasets Complete metadata Traceable Processes Description of methods Access to process info & review Traceable Tools 1. Third National Climate Assessment (NCA3) Access to computer code Description of systems and platforms 2. GCIS
Two Parallel Paths Traceable Sources Traceable Data References Image sources Data sources Link to datasets Complete metadata Traceable Processes Description of methods Access to process info & review Traceable Tools 1. NCA3 release Access to computer code Description of systems and platforms 2. Populate GCIS
Interagency Information Integration GCIS can use relationships between all relevant information about global change across the agencies: o From observations to datasets to research papers to models to analyses to organizations to people to synthesized reports to human impacts... o Determine agency interdependencies -- An EPA analysis uses a NOAA model dependent on observations from a NASA satellite. o Can present unique interagency metrics "How many papers referenced datasets from a specific satellite?" o Direct users back to agency data centers for more detailed information and the actual content and data.
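The agency interdependency example above (an EPA analysis using a NOAA model that depends on NASA satellite observations) amounts to following dependency links transitively. A minimal sketch, with made-up item identifiers and a hand-written dependency table standing in for the real relationships:

```python
# Made-up identifiers mirroring the EPA / NOAA / NASA example in the text.
DEPENDS_ON = {
    "epa-analysis": ["noaa-model"],
    "noaa-model": ["nasa-satellite-observations"],
    "nasa-satellite-observations": [],
}
AGENCY_OF = {
    "epa-analysis": "EPA",
    "noaa-model": "NOAA",
    "nasa-satellite-observations": "NASA",
}


def upstream_agencies(item):
    """Return the other agencies that an item depends on, directly or not."""
    seen = set()
    stack = list(DEPENDS_ON.get(item, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(DEPENDS_ON.get(current, []))
    return {AGENCY_OF[dep] for dep in seen} - {AGENCY_OF[item]}


print(sorted(upstream_agencies("epa-analysis")))
```

The same traversal, run over every item and counted per agency, would give the kind of interagency metric the slide mentions.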
GCIS Data Mining Structured information with relationships allows integrated data mining, searching, metrics. o What projects provided data used to produce figures that were referenced in the 2014 NCA section about coastal sea level rise impacts? o Which data centers hold data referenced by papers related to forests in the Midwest? o Which agencies have people working on projects related to societal impacts of extreme weather events? o Show me the latest papers about health impacts of air quality in California. Which datasets were used in the analysis of air quality in California?
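Questions like these could be posed to a SPARQL endpoint. The sketch below only composes a query and the GET URL for it, following the SPARQL 1.1 Protocol convention of passing the query text in a "query" parameter; nothing is sent. The endpoint URL and the ex:isFigureOf property are hypothetical, while prov:wasDerivedFrom is a real PROV-O property used here on the assumption that figures are linked to datasets that way.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; substitute the real SPARQL endpoint URL.
SPARQL_ENDPOINT = "https://data.example/sparql"

# ex: terms are placeholders; prov:wasDerivedFrom is from the W3C PROV-O vocabulary.
QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX ex: <http://example.org/terms#>
SELECT DISTINCT ?dataset WHERE {
  ?figure ex:isFigureOf ?chapter .
  ?figure prov:wasDerivedFrom ?dataset .
}
""".strip()


def sparql_get_url(endpoint, query):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(SPARQL_ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

A client would typically fetch this URL with an Accept header such as application/sparql-results+json to receive the result bindings in JSON.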
iPython meets NCA NCA=National Climate Assessment Stace Beaulieu
Staff (some of many contributors)
U.S. Global Change Research Program (USGCRP), National Coordination Office (NCO): Robert Wolfe (1), Curt Tilmes (1), Steve Aulenbach (2), Brian Duggan (2), Justin Goldstein (2), Amanda McQueen (2), Julie Morris (2), Glynis Lough (2)
National Climate Assessment (NCA) Technical Support Unit (TSU): David Easterling (3), Paula Hennon (4), Angel Li (4), April Sides (6), Mark Phillips (5), Sarah Champion (4), Andrew Buddenberg (4), Devin Thomas (6)
Habitat Seven (NCA Web Design and Development): Jamie Herring, Phil Evans, Aires Almeida, Graham Blair
Rensselaer Polytechnic Institute (RPI) Tetherless World Constellation (TWC) (Semantic Web Information Modeling): Peter Fox, Xiaogang Ma, Patrick West, Stephan Zednik, Jin Zheng
Forum One (globalchange.gov Web Design, Development and Integration): Michael Rader, John Schneider, Keenan Holloway, Sarah LeNguyen
Affiliations: (1) NASA; (2) University Corporation for Atmospheric Research; (3) NOAA/NCDC; (4) The Cooperative Institute for Climate and Satellites (CICS), North Carolina State University; (5) National Environmental Modeling and Analysis Center (NEMAC), UNC Asheville; (6) ERT, Inc.
See also
Ma, X., Fox, P., Tilmes, C., Jacobs, K., Waple, A., Capturing and presenting provenance of global change information. Nature Climate Change, 4, 409–413. doi: /nclimate2141
Tilmes, C., Fox, P., Ma, X., McGuinness, D., Privette, A.P., Smith, A., Waple, A., Zednik, S., Zheng, J., Provenance representation for the National Climate Assessment in the Global Change Information System. IEEE Transactions on Geoscience and Remote Sensing, 51 (11),
Ma, X., Zheng, J.G., Goldstein, J.C., Zednik, S., Fu, L., Duggan, B., Aulenbach, S.M., West, P., Tilmes, C., Fox, P., 2014. Ontology engineering in provenance enablement for the National Climate Assessment. Environmental Modelling and Software, 16, doi: /j.envsoft –Open access (until October 17, 2014):