Application of Provenance for Automated and Research Driven Workflows
Tara Gibson, June 17, 2008


Motivation
- Identify provenance models and architectures that will support a variety of real-world scientific research
- Promote collaboration and interoperability

Methods
- Review requirements identified by the community
- Identify new requirements from our own use case studies, which span a number of domains

Use case studies
Encountered two types of workflow:
- Automated (e.g. pipelines)
- User-driven, research oriented (e.g. digital libraries, data lineage)

Use case type comparison

Sensor Analysis
- SOA-based runtime intrusion detection system to prevent attacks on sensitive systems
- Large-scale data streaming (~30 TB per day)
- Recording provenance for everything would quickly overwhelm the system, so only significant events are recorded
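The "record only significant events" idea for high-volume streams can be sketched as a filter placed in front of the provenance log. This is a minimal illustration, not the system described in the slides: the `StreamEvent` fields, the set of significant event kinds, and the severity threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class StreamEvent:
    source: str
    kind: str
    severity: int  # hypothetical 0-10 scale


# Hypothetical policy: which events are worth a provenance record.
SIGNIFICANT_KINDS = {"alert", "config_change"}
SEVERITY_THRESHOLD = 7


def is_significant(event: StreamEvent) -> bool:
    """Return True if the event should produce a provenance record."""
    return event.kind in SIGNIFICANT_KINDS or event.severity >= SEVERITY_THRESHOLD


def record_significant(events: Iterable[StreamEvent]) -> Iterator[dict]:
    """Yield provenance records only for significant events; drop the rest."""
    for event in events:
        if is_significant(event):
            yield {
                "source": event.source,
                "kind": event.kind,
                "severity": event.severity,
            }


events = [
    StreamEvent("sensor-1", "packet", 1),
    StreamEvent("sensor-1", "alert", 9),
    StreamEvent("sensor-2", "packet", 2),
    StreamEvent("sensor-2", "config_change", 3),
]
provenance_log = list(record_significant(events))
```

Because the filter is a generator, it processes the stream one event at a time and never holds the unfiltered data in memory, which matters at the data rates mentioned above.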

Subsurface Modelling
- Understand how contaminants react and move through environments by simulating experiments that would not otherwise be feasible
- Research often follows many branches of investigation, with complex relationships between simulations

Archive, Data Mining
- Document data context and relationships to improve the effectiveness of the facility
- Use data extraction and harvesting to capture provenance and metadata
- Track relationships between experiments and computations
- Allows for better collaboration and understanding

Requirements Summary
- Record provenance about process, data, and relationships
- Group items together for comparison
- Record arbitrary metadata
- Standards-based search capability
- Examine the process and data that led to a result
- Identify the overall impact on a workflow of changes in process or data
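Several of these requirements (recording relationships, attaching arbitrary metadata, examining what led to a result, and assessing the impact of a change) can be illustrated with a small derivation graph. The sketch below is an assumption-laden illustration rather than the architecture from the talk: the `ProvenanceGraph` class, its method names, and the example item names are all hypothetical.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy derivation graph: each item remembers what it was derived from."""

    def __init__(self):
        self.inputs_of = defaultdict(set)   # item -> items it was derived from
        self.outputs_of = defaultdict(set)  # item -> items derived from it
        self.metadata = defaultdict(dict)   # item -> arbitrary key/value metadata

    def record(self, output, inputs, **metadata):
        """Record that `output` was derived from `inputs`, with optional metadata."""
        for item in inputs:
            self.inputs_of[output].add(item)
            self.outputs_of[item].add(output)
        self.metadata[output].update(metadata)

    @staticmethod
    def _reach(start, edges):
        """Collect every node reachable from `start` by following `edges`."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def lineage(self, item):
        """Everything that contributed to `item` (process and data that led to it)."""
        return self._reach(item, self.inputs_of)

    def impact(self, item):
        """Everything downstream that is affected if `item` changes."""
        return self._reach(item, self.outputs_of)


graph = ProvenanceGraph()
graph.record("mesh", ["raw_survey"], process="gridding")
graph.record("sim_a", ["mesh", "params_a"], process="simulate")
graph.record("sim_b", ["mesh", "params_b"], process="simulate")
graph.record("report", ["sim_a"], process="plot")

report_lineage = graph.lineage("report")
mesh_impact = graph.impact("mesh")
```

Here `lineage("report")` walks backwards through the derivations, while `impact("mesh")` walks forwards, so the two branches of investigation (`sim_a` and `sim_b`) both show up as affected by a change to the shared mesh.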

Influences on Architecture

Challenges
- Multiple language bindings
- Information overload
- Scalability: should scale to billions of triples
- Augmentation: user annotation
- Filtering: user/application-specific views
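To make the triple-based challenges concrete, the following sketch stores provenance as subject/predicate/object triples, lets a user augment a record with an annotation, and returns filtered views by pattern matching. It is an in-memory toy under stated assumptions: the `TripleStore` class, the `ex:` and `user:` predicate names, and the example identifiers are hypothetical, and a real store that must scale to billions of triples would need indexing and persistent storage.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """In-memory toy store; illustrative only, not built for scale."""

    def __init__(self):
        self.triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add(Triple(subject, predicate, obj))

    def annotate(self, subject: str, note: str) -> None:
        """Augmentation: attach a free-text user annotation to a subject."""
        self.add(subject, "user:annotation", note)

    def view(self, subject: Optional[str] = None,
             predicate: Optional[str] = None,
             obj: Optional[str] = None) -> list:
        """Filtering: return triples matching the pattern (None matches anything)."""
        return sorted(
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
store.add("sim_a", "ex:derivedFrom", "mesh")
store.add("sim_a", "ex:generatedBy", "run_42")
store.add("sim_b", "ex:derivedFrom", "mesh")
store.annotate("sim_a", "rerun with finer mesh")

sim_a_view = store.view(subject="sim_a")                 # everything about one item
annotation_view = store.view(predicate="user:annotation")  # only user annotations
```

Treating annotations as ordinary triples means the same `view` filter serves both application-specific and user-specific views without a separate storage path.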

Questions...