Emerging Trends in Provenance Deborah L. McGuinness Tetherless World Constellation Chair Rensselaer Polytechnic Institute SWPM Workshop at ISWC November.

1 Emerging Trends in Provenance Deborah L. McGuinness Tetherless World Constellation Chair Rensselaer Polytechnic Institute SWPM Workshop at ISWC November 7, 2010 Shanghai, China

2 Outline
–Some historical explanation & provenance settings
–Selected current provenance settings: Virtual Observatory, Open Data
–Discussion topics

3 Selected Background
Bell Labs: designing description logics & environments aimed at supporting applications such as configuration.
–led to research on making DL-based systems useful, with a focus on explanation
Stanford: focus on ontology-enabled xx, large hybrid systems, later x informatics
–led to ontology evolution and diagnostic environments, and to renewed work on explanation, now from a broader perspective that expands beyond FOL and adds emphasis on provenance

4 Background cont.
Rensselaer Polytechnic Institute / TWC: next generation web, web science research center, open data, next generation semantic eScience
–led to more connections with social platforms, empowering collections (of users, data, etc.)

5 Inference Web (IW)
[Diagram labels: Explanation via Graph; Explanation via Customized Summary; Explanation via Annotation; End Users; End-User Interaction Services; Data Access & Data Analysis Services; Distributed PML data; Validate PML data; Access published PML data]
Inference Web is a semantic web-based knowledge provenance management infrastructure:
–Uses a provenance interlingua (PML) for encoding and interchange of provenance metadata in distributed environments
–Provides interactive explanation services for end users
–Provides data access and analysis services for enriching the value of knowledge provenance
It has been used in a wide range of applications

6 Proof/Provenance Markup Language (PML)
A kind of linked data on the Web
Modularized & extensible
–Provenance: annotates provenance properties
–Justification: encodes provenance relations (including support for multiple justifications)
–Trust: adds trust annotations
Semantic Web based
[Diagram: PML data distributed across the Enterprise Web and the World Wide Web]

7 Making Systems Actionable using Knowledge Provenance
Mobile Wine Agent
GILA
Combining Proofs in TPTP
CALO
Knowledge Provenance in Virtual Observatories
Intelligence Analyst Tools
NOW including Data-gov

8 Users Require Provenance!
Users demand it! If users (humans and agents) are to use, reuse, and integrate system answers, they must trust them.
Intelligence analysts (from DTO/IARPA’s NIMD): Andrew Cowell, Deborah McGuinness, Carrie Varley, and David A. Thurman. Knowledge-Worker Requirements for Next Generation Query Answering and Explanation Systems. Proc. of Intelligent User Interfaces for Intelligence Analysis Workshop, Intl Conf. on Intelligent User Interfaces (IUI 2006), Sydney, Australia.
Intelligent assistant users (from DARPA’s PAL/CALO): Alyssa Glass, Deborah L. McGuinness, Paulo Pinheiro da Silva, and Michael Wolverton. Trustable Task Processing Systems. In Roth-Berghofer, T., and Richter, M.M., editors, KI Journal, Special Issue on Explanation, Künstliche Intelligenz, 2008.
Virtual observatory users (from NSF’s VSTO): Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07), Vancouver, British Columbia, Canada.
And as systems become more diverse, distributed, and embedded, and depend on more varied data and communities, more provenance and more types of provenance are needed.

9 Two Application Scenarios:
1. Interdisciplinary next generation virtual observatories
2. Open Linked Data

10 CHIP Pipeline (Chromospheric Helium Image Photometer)
[Pipeline diagram labels]
Sites: Mauna Loa Solar Observatory (MLSO), Hawaii; National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO
Steps: Raw Data Capture (raw image data captured by CHIP, the Chromospheric Helium-I Image Photometer); Follow-up Processing on Raw Data (e.g., Flat Field Calibration); Quality Checking (images graded GOOD, BAD, UGLY); Publishes
Products: Intensity Images (GIF); Velocity Images (GIF)

11 Semantic Provenance Capture for Data Ingest Systems (SPCDIS)
Fact: Scientific data services are increasing in usage and scope, and with these increases comes a growing need for access to provenance information.
Provenance Project Goal: to design a reusable, interoperable provenance infrastructure.
Science Project Goal: to design and implement an extensible provenance solution that is deployed at science data ingest / product generation time.
Outcome: an implemented provenance solution in one science setting AND an operational specification for other scientific data applications.
Extends vsto.org

12 ACOS Data Ingest
Typical science data processing pipelines
Distributed
Some metadata in silos
Much metadata lost
Many human-in-loop decisions, events
No metadata infrastructure for any user
Community is broadening
Chromospheric Helium Imaging Photometer (CHIP) Data Ingest
ACOS: Advanced Coronal Observing System

13 The Advanced Coronal Observing System case for Provenance
[Diagram: Source, Processing, Product, each marked with a question mark]
Provenance metadata currently not propagated with or linked to the data products
–Processing metadata
–Origin (observation) metadata
Data products are the result of “black box” systems
Most users do not know what calibrations, transformations, and QA processing have been applied to the data product

14 Advanced Coronal Observing System (ACOS) Provenance Use Cases
What were the cloud cover and seeing conditions during the observation period of this image?
What calibrations have been applied to this image?
Why does this image look bad?

15 PML Usage in SPCDIS
Justification
–Explanation
–Causality graph
Provenance
–Conclusion
–Source
–Engine
–Rule
Trust
–Trust/Belief metrics
[Diagram: chained NodeSet / Justification / Conclusion nodes; properties hasAntecedentList, hasSourceUsage, hasInferenceRule, hasInferenceEngine; linked nodes Engine, Rule, SourceUsage, Source, DateTime]

16 20080602 Fox VSTO et al.

17 Tools

18 PML in Action
This is the PML provenance encoding for a “quick look” gif file, which is generated from two image datasets
Node set for the quicklook gif file
–hasConclusion: a reference to the gif file itself
–InferenceStep: how the gif file was derived (hasAntecedents, hasInferenceRule, hasInferenceEngine)
The “antecedents” of the quicklook gif file are other node sets
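As a rough illustration of the node set structure described on this slide, the Python sketch below models a node set with a conclusion and an inference step, then walks the antecedents back to the original inputs. It is a simplification and not the PML schema: the class names, field names, rule and engine strings, and dataset labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceStep:
    """How a conclusion was derived (loosely mirrors a PML inference step)."""
    rule: str      # stands in for hasInferenceRule (hypothetical value)
    engine: str    # stands in for hasInferenceEngine (hypothetical value)
    antecedents: List["NodeSet"] = field(default_factory=list)  # hasAntecedents


@dataclass
class NodeSet:
    """A conclusion plus, optionally, the step that produced it."""
    conclusion: str                       # stands in for hasConclusion
    step: Optional[InferenceStep] = None  # None means the node set is an original input


def sources(node: NodeSet) -> List[str]:
    """Return the conclusions of all input node sets reachable via antecedents."""
    if node.step is None or not node.step.antecedents:
        return [node.conclusion]
    found: List[str] = []
    for antecedent in node.step.antecedents:
        found.extend(sources(antecedent))
    return found


# Two illustrative image datasets feeding one quick-look gif (labels are made up).
intensity = NodeSet("intensity_dataset")
velocity = NodeSet("velocity_dataset")
quicklook = NodeSet(
    "quicklook.gif",
    InferenceStep(rule="MakeQuickLook", engine="chip_pipeline",
                  antecedents=[intensity, velocity]),
)

print(sources(quicklook))
```

Because antecedents are themselves node sets, the same walk works for deeper pipelines where an input was itself derived from earlier products.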

19 A PML-Enhanced Image Provenance
CHIP Quick-Look
CHIP PML-Enhanced Quick-Look

20 Integrated View Observer log’s information added into quicklook image’s provenance

21 Provenance-aware faceted search

22 Current Issues
Successful interdisciplinary VO; needed provenance
Successful provenance integration for experts; needs to support a more diverse audience
–As the user base diversifies, what updates are needed?
–Will a domain ontology for MLSO/NCAR-affiliated staff be understandable by citizen scientists? No
–How can our representational infrastructure be extended with contextual information relevant to user needs? E.g., linking data products from one part of the CHIP pipeline to specific solar events or events at MLSO (such as reports of bad weather)
–Should provenance ontologies provide extension capabilities to include domain-informed extensions? Yes
–[1] Stephan Zednik, Peter Fox and Deborah L. McGuinness, “System Transparency, or How I Learned to Worry about Meaning and Love Provenance!” Proceedings of IPAW 2010
–[2] James R. Michaelis, Li Ding, Zhenning Shangguan, Stephan Zednik, Rui Huang, Paulo Pinheiro da Silva, Nicholas Del Rio and Deborah L. McGuinness, “Towards Usable and Interoperable Workflow Provenance: Empirical Case Studies Using PML” Proceedings of SWPM 2009
–[3] AGU 2010, with papers by Fox et al., McGuinness et al., Zednik et al., West et al., Michaelis et al., …

23 User Annotations (James Michaelis)
Allowing users to annotate provenance elements is a potential solution
Allow a user community to reply to questions from individuals
–E.g., citizen scientists can get information extensions with the help of project staff
Additionally, allow the user community to assert information on provenance elements
Vision: to incrementally aggregate information attached to provenance traces through these annotations.

25 User Annotations
Can expand information attached to provenance records in two ways:
–Clarification: providing an answer to a question about a provenance element (such as an expanded definition of its purpose).
–Context Extension: providing supplemental information outside the scope of a provenance record, which may aid in provenance understanding.

26 User Annotations
Types of annotations
–Assertion: a user directly asserts a clarification or context extension.
–Clarification Request: a user makes a request for a clarification on a provenance element.
–Context Extension Request: a user makes a request for a context extension.
–Reply: a user replies to a clarification request or context extension request.
Discussions may feature participants with different backgrounds. At a high level, such users can be distinguished by Roles (e.g., Staff, Citizen Scientist)
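The four annotation types and the role distinction above can be sketched as a small data model. This is a minimal Python illustration under stated assumptions, not the draft PMLA module: the class and field names are invented here, and the rule that only a Reply may point at another annotation is an assumption read off the type definitions. Alice's role is not stated in the slides, so Citizen Scientist is assumed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AnnotationType(Enum):
    ASSERTION = "Assertion"
    CLARIFICATION_REQUEST = "Clarification Request"
    CONTEXT_EXTENSION_REQUEST = "Context Extension Request"
    REPLY = "Reply"


class Role(Enum):
    STAFF = "Staff"
    CITIZEN_SCIENTIST = "Citizen Scientist"


# The two kinds of annotation that a Reply is allowed to answer.
REQUEST_TYPES = {AnnotationType.CLARIFICATION_REQUEST,
                 AnnotationType.CONTEXT_EXTENSION_REQUEST}


@dataclass
class Annotation:
    ident: str
    kind: AnnotationType
    topic: str          # the provenance element being discussed
    text: str
    author: str
    role: Role
    reply_to: Optional["Annotation"] = None

    def __post_init__(self) -> None:
        # Assumed rule: a Reply must answer a request; nothing else may reply.
        if self.kind is AnnotationType.REPLY:
            if self.reply_to is None or self.reply_to.kind not in REQUEST_TYPES:
                raise ValueError("a Reply must point at a request annotation")
        elif self.reply_to is not None:
            raise ValueError("only a Reply may set reply_to")


request = Annotation("Annotation_1", AnnotationType.CLARIFICATION_REQUEST, "Flatten",
                     "Could someone provide a definition of Flat Field Calibration?",
                     "Alice", Role.CITIZEN_SCIENTIST)  # Alice's role is an assumption
reply = Annotation("Annotation_2", AnnotationType.REPLY, "Flatten",
                   "A definition is given at the provided link.",
                   "Bob", Role.STAFF, reply_to=request)
print(reply.reply_to.ident)
```

Validating at construction time keeps malformed discussion threads out of the store, at the cost of rejecting annotations that a looser model might accept.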

27 Use Case 1A (Alice, Web Service)
Request Processing Details for Intensity Image 20101007.232213.chp.hsh.gif
Server Response: Intensity Image: 20101007.232213.chp.hsh.gif
–ACTIVITY ID | PERFORMED BY FUNCTION
–ID:1 | Flatten
–ID:2 | CenterImage
Definition for function Flatten
Server Response: Flatten: Apply flat field calibration to an image, using averaged bias and flat files for the corresponding processing day.
Annotation Submission
–Type: Clarification Request
–Topic: Flatten (Function Definition)
–Text: Could someone provide a definition of “Flat Field Calibration”?

28 Use Case 1B (Bob, Web Service)
Request Details for Annotation: Annotation_1
Server Response
–Type: Clarification Request
–Topic: Flatten
–Text: Could someone provide a definition of “Flat Field Calibration”?
Annotation Submission
–Type: Reply
–Reply To: Annotation_1
–Clarification On: Flatten
–Author: Bob
–Role: Staff
–Reply: A definition of Flat Field Calibration is given at the provided link.
–Link: http://www.phys.vt.edu/~jhs/SIP/processing.html

29 Annotation Structure – Use Cases 1A, 1B
Annotation_1
–Type: Clarification Request
–Topic: Flatten
–Has Text: Could someone provide a definition of “Flat Field Calibration”?
–Has Author: Alice
Annotation_2
–Type: Reply
–Reply To: Annotation_1
–Clarification For: Flatten
–Has Text: A definition of Flat Field Calibration is given at the provided link.
–Has Link: http://www.phys.vt.edu/~jhs/SIP/processing.html
–Has Author: Bob (Role: Staff)
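The annotation graph for Use Cases 1A and 1B can also be written down as plain subject/predicate/object triples, which makes simple lookups easy. The Python sketch below is illustrative only: the predicate names (hasText, replyTo, and so on) are informal stand-ins for the labels on the slide, not terms from a published vocabulary.

```python
# Each tuple is (subject, predicate, object); predicate names are assumptions.
triples = [
    ("Annotation_1", "type", "Clarification Request"),
    ("Annotation_1", "topic", "Flatten"),
    ("Annotation_1", "hasText",
     "Could someone provide a definition of Flat Field Calibration?"),
    ("Annotation_1", "hasAuthor", "Alice"),
    ("Annotation_2", "type", "Reply"),
    ("Annotation_2", "replyTo", "Annotation_1"),
    ("Annotation_2", "clarificationFor", "Flatten"),
    ("Annotation_2", "hasText",
     "A definition of Flat Field Calibration is given at the provided link."),
    ("Annotation_2", "hasLink", "http://www.phys.vt.edu/~jhs/SIP/processing.html"),
    ("Annotation_2", "hasAuthor", "Bob"),
    ("Bob", "role", "Staff"),
]


def objects(subject, predicate):
    """All objects recorded for a given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def replies_to(annotation_id):
    """Identifiers of annotations that reply to the given annotation."""
    return [s for (s, p, o) in triples if p == "replyTo" and o == annotation_id]


def annotations_about(element):
    """Annotations attached to a provenance element, by topic or clarification."""
    return sorted({s for (s, p, o) in triples
                   if p in ("topic", "clarificationFor") and o == element})


print(replies_to("Annotation_1"))
print(annotations_about("Flatten"))
```

Collecting every annotation about one element, as `annotations_about` does, is one simple reading of the stated vision of aggregating information on provenance traces.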

30 Use Case 2 (Bob, Web Service)
Initial Server Response: List of Intensity Images for 2010-08-01 – 2010-08-04
For each listed image i = {0 … n}
–Request Visualization of listed image i
–Server Response: Visualization of image i (ID: image_i)
–Bob inspects each image to see if it has visual evidence of Coronal Mass Ejection related activity
Annotation Submission
–Type: Assertion
–Author: Bob
–Topic: (all applicable images viewed)
–Text: CME Event observed in referenced images.

31 Related Work & Status
myExperiment [1]
–Social networking site for exchanging workflow-centric materials
–Support primarily for annotation on workflow scripts, as opposed to provenance-based information
Tupelo [2]
–Semantic content repository, designed to facilitate provenance storage/querying
–Uses the Open Provenance Model (OPM)
–User annotations/discussions supported for URI-based content, but no specific focus on aggregating content directly on provenance elements
Status: draft PMLA module. Implementation and evaluation with SPCDIS
[1] http://www.myexperiment.org/
[2] http://tupeloproject.ncsa.uiuc.edu/

32 Example Population Science Issues (with NIH)
Do policies (taxation, smoking bans, etc.) impact health and health care costs?
What data should we display to help scientists and lay people evaluate related questions?
What data might be presented so that people choose to make (positive) behavior changes?
What does the following data show?
What are appropriate follow-ups?

33 PopSciGrid (Alpha)

34 PopSciGrid

35 PopSciGrid II

36 PopSciGrid III

37 Drill Down Questions
Should we focus on prevalence?
What is prevalence (definition)?
How is it measured (overall / in this data set)?
Under what conditions was the data obtained (date, sample set, extenuating conditions, …)?
Do we need more data, more inference, more xxx…

38 Our Position
System transparency supports user understanding and trust
Our Research Goal: Provide interoperable infrastructure that supports explanations of sources, assumptions, and answers as an enabler for trust

39 Mashup Provenance from data-gov
Critical for making demos useful, understandable, and actionable
[Diagram labels: Dataset, Demo, Agency]

40 Provenance Events
[Diagram labels: CSV2RDF, SemDiff, Archive, Enhance, visualize, derive, create, derive, revision]

41 Sample Application Domain (with Xian Li)
Study of Supreme Court Justices needs data from different sources
–Judicial databases, e.g. SCDB (Spaeth, 1999)
–Newspaper comments, e.g. The New York Times
–Biographical directories, e.g. Who's Who in America
–Public opinions (Tate and Handberg, 1991)
–Court cases, votes (Segal and Spaeth, 1993; Schubert, 1965; Pritchett, 1948; Rohde and Spaeth, 1976)
–Personal attributes: education, nominator, … (Segal and Spaeth, 1993, 2002)

42 Sample Use Case (with Li and Lebo)
Surprise: the application reports that Robert H. Jackson was nominated by a Green Party President
There hasn't been a Green Party President

43 Use Case: Green Party President?
o User believes that the system is incorrect
o Look at the provenance of the information to identify whether the source is incorrect or the application interpreted the source incorrectly.

44 Provenance Encoding
[Diagram labels]
ns:subject http://dbpedia.org/resources/Robert_H._Jackson
ns:query_template http://dbpedia.org/sparql?query=select... %JUSTICE%...
Query Creation: pmlj:InferenceStep, pmlj:isConsequenceOf
Query Execution: pmlj:InferenceStep, pmlj:isConsequenceOf
ns:query_uri, ns:query_result, ns:output_format, ns:service_uri, …
“Green”, “DBpedia”
Attribution located! Distrust event
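To make the debugging idea concrete, the Python sketch below walks a chain of recorded steps backwards from the surprising value "Green" until it reaches a step with a recorded source. It is a deliberately simplified stand-in for the PML justification graph: the dictionary layout, the step identifiers, and the key names are assumptions for illustration, and the query text itself is left as a placeholder because the slide elides it.

```python
# Hypothetical, flattened record of the two steps named on the slide.
steps = {
    "party_value": {
        "label": "Query Execution",
        "conclusion": "Green",
        "source": "DBpedia",            # where the value was retrieved from
        "is_consequence_of": "query",   # simplified stand-in for pmlj:isConsequenceOf
    },
    "query": {
        "label": "Query Creation",
        "conclusion": "<query instantiated from the template>",  # placeholder
        "source": None,                 # produced by the application itself
        "is_consequence_of": None,
    },
}


def trace(step_id):
    """Labels of the steps from the given conclusion back to the start."""
    chain = []
    while step_id is not None:
        step = steps[step_id]
        chain.append(step["label"])
        step_id = step["is_consequence_of"]
    return chain


def attribution(step_id):
    """First external source found while walking back, or None."""
    while step_id is not None:
        step = steps[step_id]
        if step["source"] is not None:
            return step["source"]
        step_id = step["is_consequence_of"]
    return None


print(trace("party_value"))
print(attribution("party_value"))
```

With the chain in hand, a user can check the source's own record and the application's query separately to see which one introduced the error.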

45 Challenges for Data Aggregators (with Tim Lebo, Greg Williams)

46 Challenges for Data Aggregators

47 Assumptions and Objectives
Most data are from third-party sources
Data are updated regularly and irregularly
Complete interpretation is not immediately possible
Subsequent interpretations should be backward-compatible
Distinguishing among sources
Minimizing manual modifications
Tracing to source data
Attributing data authors and curators

48 Approach
Capturing conversion provenance, exposed as linked data:
1 – Following redirects
2 – Retrieving data file
3 – Unzipping
4 – Manual tweaks
5 – Converter invocation
6 – Predicate lineage
7 – Tracing triple to table cell
8 – Populating endpoint
Parameterized interpretation parameters
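Step 7 above, tracing a triple back to a table cell, can be illustrated with a toy converter that records the row and column each triple came from. This Python sketch is not the converter used in the project: the base URI, the URI patterns, and the sample table are all made up for illustration, and only the standard library `csv` module is used.

```python
import csv
import io


def convert(csv_text, base="http://example.org/"):
    """Turn CSV rows into triples and remember each triple's source cell.

    Returns (triples, cell_of), where cell_of maps a triple to its
    (row number, column number) in the original table, both 1-based.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                      # row 1 holds the column names
    triples = []
    cell_of = {}
    for row_no, row in enumerate(reader, start=2):
        subject = f"{base}row/{row_no}"        # hypothetical subject URI pattern
        for col_no, (name, value) in enumerate(zip(header, row), start=1):
            triple = (subject, f"{base}vocab/{name}", value)
            triples.append(triple)
            cell_of[triple] = (row_no, col_no)
    return triples, cell_of


# Made-up sample table with a header row and two data rows.
table = "agency,amount\nA,10\nB,20\n"
triples, cell_of = convert(table)

suspect = ("http://example.org/row/3", "http://example.org/vocab/amount", "20")
print(cell_of[suspect])
```

Keeping the cell mapping beside the triples means a surprising value in the endpoint can be checked against the exact spot in the downloaded file.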

49 Future Directions
Presenting provenance information in LOGD dataset description pages
Extending visualization APIs to incorporate provenance within the interface
Leveraging provenance connectivity to investigate latent associations among datasets and presentations
[Figure labels: US-UK Foreign Aid Comparison; Queried as RDF; Providing direct link to original data]

50 Discussion
Provenance is growing in acceptance, need, and type
Some interlinguas have emerged that have significant usage and have shown significant value
Interdisciplinary eScience and open data are increasing the need and potentially the pace.
A few trends we have observed:
–Domain-specific extensions can be of value
–Techniques for supporting interaction with large, diverse communities are needed (we believe user annotation is one such critical technique)
–Data aggregators face additional challenges if provenance is not available, and may accelerate the demand for provenance and provenance standards
–Getting back to the portion of the source used is critical for some
–Tracking manipulations is critical for some
–Providing and creating provenance as part of a larger ecosystem is key
Open (govt, science, etc.) data, along with semantic web applications with embedded information about knowledge provenance and term meaning, is providing many new opportunities and will continue to change our lives.
Questions? dlm cs rpi edu

