Foundations VI: Provenance

1 Foundations VI: Provenance
Deborah McGuinness
TA: Weijing Chen
Semantic eScience, Week 9, October 31, 2011

2 References
PML: McGuinness, Ding, Pinheiro da Silva, Chang. PML 2: A Modular Explanation Interlingua. AAAI 2007 Workshop on Explanation-aware Computing, Vancouver, Canada, July 2007. Stanford Tech report KSL
Inference Web: McGuinness and Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Web Semantics: Science, Services and Agents on the World Wide Web, Special issue: International Semantic Web Conference, edited by K. Sycara and J. Mylopoulos. Volume 1, Issue 4. Journal published Fall, 2004.
Trust: McGuinness, D.L.; Zeng, H.; Pinheiro da Silva, P.; Ding, L.; Narayanan, D.; Bhaowal, M. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. The Workshop on the Models of Trust for the Web (MTW'06), Edinburgh, Scotland, May 22, 2006.
More from
Note: the LOGD converter generates PML 2.

3 Semantic Web Methodology and Technology Development Process
Establish and improve a well-defined methodology vision for Semantic Technology based application development
Process diagram labels: Leverage controlled vocabularies, etc.; Adopt Technology Approach; Leverage Technology Infrastructure; Science/Expert Review & Iteration; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Evaluation; Analysis; Use Case; Develop model/ontology; Small Team, mixed skills

4 Ingest/pipelines: problem definition
Data is coming in faster and in greater volumes, outstripping our ability to perform adequate quality control
Data is being used in new ways, and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision
We often fail to capture, represent and propagate manually generated information that needs to go with the data flows
Each time we develop a new instrument, we develop a new data ingest procedure, collect different metadata and organize it differently. It is then hard to use with previous projects
The task of event determination and feature classification is onerous and we don't do it until after we get the data

5 Fox VSTO et al.

6 Use cases
Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January, 26, :09UT taken by the ACOS Mark IV polarimeter?
What was the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
Find all good images on March 21, 2008.
Why are the quick look images from March 21, 2008, 1900UT missing?
Why does this image look bad?

7 Fox VSTO et al.

8 Fox VSTO et al.

9 Provenance
Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
Knowledge provenance; enrich with ontologies and ontology-aware tools

10 Semantic Technology Foundations
PML (Proof Markup Language): used for knowledge provenance interlingua
Inference Web Toolkit: used to manipulate and access knowledge provenance
OWL-DL ontologies (including SWEET and VSTO ontologies)
PML: McGuinness, Ding, Pinheiro da Silva, Chang. PML 2: A Modular Explanation Interlingua. AAAI 2007 Workshop on Explanation-aware Computing, Vancouver, Canada, July 2007. Stanford Tech report KSL
Inference Web: McGuinness and Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Web Semantics: Science, Services and Agents on the World Wide Web, Special issue: International Semantic Web Conference, edited by K. Sycara and J. Mylopoulos. Volume 1, Issue 4. Journal published Fall, 2004.

11 Inference Web Explanation Architecture
Architecture diagram labels: WWW; Toolkit; IWTrust (trust computation); IW Explainer/Abstractor (end-user friendly visualization); IWBrowser (expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration); Proof Markup Language (PML): Trust, Justification, Provenance; Semantic Web based infrastructure
Traced systems in the diagram: OWL-S/BPEL SDS (trace of web service discovery); JTP/CWM, KIF/N3 (theorem prover/rules); SPARK, SPARK-L (trace of task execution); UIMA, text analytics (trace of information extraction); Learners (learning conclusions)
PML is an explanation interlingua
  Represent knowledge provenance (who, where, when…)
  Represent justifications and workflow traces across system boundaries
Inference Web provides a toolkit for data management and visualization

12 Global View and More
Views of Explanation (diagram): filtered, focused, global; Explanation (in PML); abstraction, discourse, trust, provenance
Explanation as a graph
Customizable browser options: proof style, sentence format, lens magnitude, lens width
More information: provenance metadata, source PML, proof statistics, variable bindings, link to tabulator

13 Provenance View
Views of Explanation (diagram): filtered, focused, global; Explanation (in PML); abstraction, discourse, trust, provenance
Source metadata: name, description, …
Source-Usage metadata: which fragment of a source has been used when

14 Trust View
Views of Explanation (diagram): filtered, focused, global; Explanation (in PML); abstraction, discourse, trust, provenance
Figure labels: Trust Tab; Detailed trust explanation; Fragment colored by trust value
(preliminary) simple trust representation
Provides colored (mouseable) view based on trust values
Enables sharing and collaborative computation and propagation of trust values

15 Discourse View
Views of Explanation (diagram): filtered, focused, global; Explanation (in PML); abstraction, discourse, trust, provenance
(Limited) natural language interface
Mixed initiative dialogue
Exemplified in CALO domain
Explains task execution component powered by learned and human generated procedures

16 Selected IW and PML Applications
Portable proofs across reasoners: JTP (with temporal and context reasoners) (Stanford), CWM (W3C), SNARK (SRI), …
Explaining web service composition and discovery (SNRC)
Explaining information extraction (more emphasis on provenance: KANI, UIMA)
Explaining intelligence analysts’ tools (NIMD/KANI)
Explaining tasks processing (SPARK / CALO)
Explaining learned procedures (TAILOR, LAPDOG / CALO)
Explaining privacy policy law validation (TAMI)
Explaining decision making and machine learning (GILA)
Explaining trust in social collaborative networks (TrustTab)
Registered knowledge provenance: IW Registrar (Explainable Knowledge Aggregation)
Explaining natural science provenance: VSTO, SPCDIS, …

17 PML1 vs. PML2
PML1 was introduced in 2002
  It has been used in multiple contexts ranging from explaining theorem provers to text analytics to machine learning.
  It was specified as a single ontology
PML2 improves PML1 by
  Adopting a modular design: splitting the original ontology into three pieces: provenance, justification, and trust
    This improves reusability, particularly for applications that only need certain explanation aspects, such as provenance or trust.
  Enhancing explanation vocabulary and structure
    Adding new concepts, e.g. information
    Refining explanation structure

18 PML Provenance Ontology
Scope: annotating provenance metadata
Highlights
  Information
  Source Hierarchy
  Source Usage

19 Referencing, Encoding and Annotating a Piece of Information
Referencing a piece of information using URI
Encoding the content of information
  Complete Quote: <hasRawString>(type TonysSpecialty SHELLFISH) </hasRawString>
  Obtained from URL: <hasURL>
Annotations
  For human consumption: <hasPrettyString>Tonys’ Specialty is ShellFish</hasPrettyString>
  For machine consumption
    Language: <hasLanguage rdf:resource=" />
    Format: <hasFormat " />
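
The encoding and annotation steps above can be sketched in Python. This is a minimal illustration, not the Inference Web toolkit: the namespace URI and the function name `make_information` are assumptions made for the example, and only the two property names that appear on the slide (hasRawString, hasPrettyString) are used.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; the slide does not give the real PML-P namespace URI.
PMLP = "http://example.org/pmlp#"
ET.register_namespace("pmlp", PMLP)


def make_information(raw: str, pretty: str) -> ET.Element:
    """Build an Information element holding a raw encoding plus a
    human-readable annotation (hypothetical helper, not a toolkit API)."""
    info = ET.Element(f"{{{PMLP}}}Information")
    ET.SubElement(info, f"{{{PMLP}}}hasRawString").text = raw
    ET.SubElement(info, f"{{{PMLP}}}hasPrettyString").text = pretty
    return info


info = make_information(
    "(type TonysSpecialty SHELLFISH)",
    "Tonys' Specialty is ShellFish",
)
xml_text = ET.tostring(info, encoding="unicode")
print(xml_text)
```

The raw string is what a machine consumer would parse; the pretty string is only for display, so the two are kept as separate properties.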

20 Source Hierarchy
Source is the container of information
Our source hierarchy offers
  Many well-known sources such as
    Sensor (e.g. geo-science)
    InferenceEngine (e.g. reasoner)
    WebService (e.g. workflow)
  Finer granularity of source than just document
    DocumentFragment (for text analytics)
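
One way to picture the hierarchy is as a small class tree. The sketch below is an illustration in Python, not the PML ontology itself: the `Document` class and the offset fields on `DocumentFragment` are assumptions added so that a fragment has something to point into.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A container of information."""
    name: str


class Sensor(Source):
    """E.g. a geo-science instrument."""


class InferenceEngine(Source):
    """E.g. a reasoner."""


class WebService(Source):
    """E.g. a workflow step."""


class Document(Source):
    """A whole document (assumed class, implied by the slide)."""


@dataclass
class DocumentFragment(Source):
    """A finer-grained source: a span inside a document.
    The offset fields are hypothetical."""
    document: Document
    start_offset: int
    end_offset: int


report = Document("observing-log")
fragment = DocumentFragment("cloud-cover-note", report, 120, 180)
sources = [Sensor("polarimeter"), InferenceEngine("reasoner"), fragment]
print([type(s).__name__ for s in sources])
```

Because every kind of source shares the `Source` base, provenance code can treat a sensor, a reasoner and a text span uniformly.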

21 Source Usage
Source Usage
  logs the action that accesses a source at a certain dateTime to retrieve information
  is part of PML1
Example: Source #ST was accessed on certain date
  <pmlp:SourceUsage rdf:about="#usage1">
    <pmlp:hasUsageDateTime> T10:30:00Z</pmlp:hasUsageDateTime>
    <pmlp:hasSource rdf:resource="#ST"/>
  </pmlp:SourceUsage>
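
A record like the one above could be generated as follows. This is a sketch under stated assumptions: the class and method names are invented for the example, and the timestamp is an arbitrary illustrative value, not the date elided from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceUsage:
    """One access to a source at a given time (hypothetical helper class)."""
    source_id: str
    used_at: datetime

    def to_rdfxml(self, usage_id: str) -> str:
        # Normalise to UTC and format as an xsd:dateTime with a trailing Z.
        stamp = self.used_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return (
            f'<pmlp:SourceUsage rdf:about="#{usage_id}">\n'
            f"  <pmlp:hasUsageDateTime>{stamp}</pmlp:hasUsageDateTime>\n"
            f'  <pmlp:hasSource rdf:resource="#{self.source_id}"/>\n'
            f"</pmlp:SourceUsage>"
        )


# Arbitrary illustrative timestamp.
usage = SourceUsage("ST", datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc))
usage_xml = usage.to_rdfxml("usage1")
print(usage_xml)
```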

22 PML Justification Ontology
Scope: annotating justification process
Highlights
  Template for question-answer/justification
  Four types of justification

23 Four Types of Justification
Goal: conclusion without justification
Assumption: conclusion assumed (using Assumption rule), asserted by an InferenceEngine, no antecedent
Direct Assertion: conclusion directly asserted (using DirectAssertion rule) by an InferenceEngine, no antecedent
Regular: conclusion derived from antecedent conclusions
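
The four cases can be told apart from just the rule name and whether antecedents exist. The Python sketch below encodes that decision; the class names `InferenceStep` and `Node` and the rule-name strings are assumptions made for the example, not identifiers confirmed by the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceStep:
    rule: str                       # e.g. "Assumption", "DirectAssertion"
    engine: str                     # the InferenceEngine that applied the rule
    antecedents: List[str] = field(default_factory=list)


@dataclass
class Node:
    conclusion: str
    step: Optional[InferenceStep] = None   # None means no justification yet


def justification_type(node: Node) -> str:
    """Classify a node into one of the four justification types."""
    if node.step is None:
        return "goal"                       # conclusion without justification
    if node.step.antecedents:
        return "regular"                    # derived from antecedent conclusions
    if node.step.rule == "Assumption":
        return "assumption"
    if node.step.rule == "DirectAssertion":
        return "direct assertion"
    raise ValueError(f"no antecedents and unknown rule: {node.step.rule}")


goal = Node("(type TonysSpecialty ?x)")
told = Node("(type TonysSpecialty SHELLFISH)", InferenceStep("DirectAssertion", "JTP"))
derived = Node("(goesWith SHELLFISH WhiteWine)", InferenceStep("ModusPonens", "JTP", ["n1", "n2"]))
print([justification_type(n) for n in (goal, told, derived)])
```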

24 PML Trust Ontology
Scope: annotate trust and belief assertions
Highlights
  Extensible trust representation (user may plug in their quantitative metrics using OWL class inheritance feature)
  Has been used to provide a trust tab filter for Wikipedia; see McGuinness, Zeng, Pinheiro da Silva, Ding, Narayanan, and Bhaowal. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. WWW2006 Workshop on the Models of Trust for the Web (MTW'06), Edinburgh, Scotland, May 22, 2006.
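
The "plug in your own metric through inheritance" idea, together with the trust-colored fragments from the Trust View slide, can be mimicked in Python. Everything here is an assumption for illustration: the class names, the two example metrics, and the color thresholds are invented and do not come from the PML trust ontology or the Wikipedia study.

```python
from dataclasses import dataclass


class TrustValue:
    """Base class; subclasses supply their own quantitative metric."""

    def score(self) -> float:
        """Return a trust score in the range 0.0 to 1.0."""
        raise NotImplementedError


@dataclass
class FloatTrust(TrustValue):
    value: float

    def score(self) -> float:
        return max(0.0, min(1.0, self.value))   # clamp into [0, 1]


@dataclass
class VoteTrust(TrustValue):
    up: int
    down: int

    def score(self) -> float:
        total = self.up + self.down
        return self.up / total if total else 0.5   # neutral when no votes


def color_for(trust: TrustValue) -> str:
    """Map a score to a display color (thresholds are hypothetical)."""
    s = trust.score()
    if s >= 0.66:
        return "green"
    if s >= 0.33:
        return "yellow"
    return "red"


fragments = {"intro": FloatTrust(0.9), "history": VoteTrust(1, 3), "stub": VoteTrust(0, 0)}
colors = {name: color_for(t) for name, t in fragments.items()}
print(colors)
```

The display code only calls `score()`, so a new metric can be added as another subclass without touching `color_for`.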

25

26 Fox VSTO et al.

27 Quick look browse Fox VSTO et al.

28

29 Visual browse

30

31

32 Search and structured query

33 Search Fox VSTO et al.

34 Provenance within SemantAqua

35 SemantAqua System Architecture
Virtuoso
We are using regulation data from 4 states (MASS, CA, RI, NY) and 1 set of regulation data from EPA (5 in total).
Preprocessing regulation data: identify the correct limit for each contaminant (some of the data contain English words, not just numbers) and write ad hoc code to convert them into a format that our converter is able to process.
Some links to regulation data: page 100 (RI) (EPA) (MA)
Data range: the echo data range: 10/31/ /30/2010 the usgs date range: to access

36 Provenance
Preserves provenance in the Proof Markup Language (PML).
Data source level provenance: the captured provenance data are used to support provenance-based queries.
Reasoning level provenance: when a water source has been marked as polluted, the user can access supporting provenance data for the explanations, including the URLs of the source data, intermediate data, the converted data, and regulatory data.
The user can select the data organizations he/she trusts and the portal will use only data from the selected organizations.
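
The trusted-organization filter described above amounts to keeping only records whose provenance names a selected organization. A minimal Python sketch, assuming a hypothetical `Measurement` record; the site names, values, and example.org URLs are made up for illustration and are not SemantAqua data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Measurement:
    site: str
    contaminant: str
    value: float
    source_org: str    # organization recorded in the provenance
    source_url: str    # URL of the source data, kept for explanations


def from_trusted(measurements: Iterable[Measurement], trusted: Set[str]) -> List[Measurement]:
    """Keep only measurements whose source organization the user trusts."""
    return [m for m in measurements if m.source_org in trusted]


# Invented sample records.
data = [
    Measurement("site-1", "arsenic", 0.012, "EPA", "http://example.org/epa/1"),
    Measurement("site-1", "arsenic", 0.009, "USGS", "http://example.org/usgs/1"),
    Measurement("site-2", "lead", 0.020, "EPA", "http://example.org/epa/2"),
]
kept = from_trusted(data, {"USGS"})
print([(m.site, m.source_org, m.source_url) for m in kept])
```

Because each record carries its source URL, the same structure can back both the filter and the "show me the source data" explanation.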

37 Visualization
Presents analyzed results with Google Map
Presents explanation of water source pollution
Presents possible health effect of contaminant
Presents “Facet” type filter to select type of data
Presents link to the authority, where the user can report problems.
Report Problem: depending on the state, the problem report link changes to show the correct link where the user should report a problem in that state (currently encoded for 4 states: NY, RI, MASS, CA, plus EPA).
Currently, we only model health effects for water sites, and haven’t implemented health effect based reasoning.
Note: these are new features and were discussed as “future work” in the paper.

42 Visualization
Time series visualization: presents data in a time series visualization for the user to explore and analyze the data
Please make sure all parameters are selected as shown in this image: facility permit, characteristic, test type, and then click “click”.
This link explains what the test types are (get it from the “?” next to test type in the image): EPA conducts up to 5 test types for each characteristic: C1, C2, C3, Q1 and Q2. Three test types (C1, C2, C3) use concentration-based limits, while the other two (Q1, Q2) use quantity- or mass-based limits. More information can be found in the data dictionary of ECHO.
Violation, measured value: 971; limit value: 400
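
The test-type grouping and the violation shown on this slide can be expressed in a few lines of Python. This is a sketch, assuming that a limit is an upper bound (some regulated characteristics may instead have minimum limits); the function names are invented for the example.

```python
CONCENTRATION_TESTS = {"C1", "C2", "C3"}   # concentration-based limits
QUANTITY_TESTS = {"Q1", "Q2"}              # quantity- or mass-based limits


def limit_basis(test_type: str) -> str:
    """Return which kind of limit a test type uses."""
    if test_type in CONCENTRATION_TESTS:
        return "concentration"
    if test_type in QUANTITY_TESTS:
        return "quantity"
    raise ValueError(f"unknown test type: {test_type}")


def exceeds_limit(measured: float, limit: float) -> bool:
    """Assumes the limit is a maximum allowed value."""
    return measured > limit


# Values from the slide: measured 971 against a limit of 400.
print(limit_basis("C2"), exceeds_limit(971, 400))
```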

43 Selected Results
Provenance information encoded using semantic web technology supports transparency and trust.
SemantAqua provides detailed provenance information:
  Original data, intermediate data, data source
  “What if” scenario: the user can apply a stricter regulation from another state to a local water source.
  The user may be interested only in certain sources and can use the interface to control queries
SWQP responses may not be trusted by some users if there is no mechanism that provides the option to examine how the responses are obtained.
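
The “what if” scenario boils down to re-running the same violation check with a different set of limits. A minimal Python sketch; the contaminant names, measured values, and both limit tables are hypothetical numbers chosen for the example, not real regulations.

```python
from typing import Dict, List


def violations(measured: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the contaminants whose measured value exceeds the given limit.
    Contaminants without a limit in the table are ignored."""
    return sorted(c for c, v in measured.items() if c in limits and v > limits[c])


# Hypothetical measurements for one water source.
site = {"arsenic": 0.008, "lead": 0.020}

# Hypothetical limit tables: the local state and a stricter neighbor.
local_limits = {"arsenic": 0.010, "lead": 0.015}
stricter_limits = {"arsenic": 0.005, "lead": 0.015}

under_local = violations(site, local_limits)
under_stricter = violations(site, stricter_limits)
print(under_local, under_stricter)
```

Keeping the regulation as data rather than code is what makes the swap possible: the measurement records and the check stay the same, only the limit table changes.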

44 Aim at providing at least as much provenance as SemantAqua
Questions?

