Capturing and Supporting Contexts for Scientific Data Sharing via the Biological Sciences Collaboratory, by George Chin Jr. and Carina S. Lansing (PNNL)


Capturing and Supporting Contexts for Scientific Data Sharing via the Biological Sciences Collaboratory
George Chin Jr. and Carina S. Lansing (PNNL)
Appeared in ACM CSCW 2004: Computer Supported Cooperative Work (Conference)
Slides by Paulo Shakarian, CMSC828R

Outline
- Motivation
- Pilot Experiment
- Basic data sharing
- External database access
- Metadata
- Data organization
- Data provenance
- Collaborative analysis
- Task management
- Implementation
- Related work
- Comparison with SIBDATA

Motivation
- Early work took a tool-centric approach to scientific collaboration
- A panel held around the time of publication concluded that “Collaboration is driven both by the need to share data and to share knowledge about the data”

Pilot Experiment
- The authors presented biologists with a Web-based collaboratory prototype that let users place data files into, and retrieve them from, a common repository
- The prototype was analogous to a distributed file system with a graphical user interface
- Biologists provided feedback
- Lessons learned from the pilot are on the next three slides
- The Biological Sciences Collaboratory (BSC) was developed to address the lessons learned

Lessons Learned from Pilot (1/3)
1. General data set properties: basic data set properties such as owner, creation date, size, format, etc.
2. Experimental properties: conditions and properties of the scientific experiment that generated or is to be applied to the data
3. Data provenance: relationship of data to previous versions and other data sources
4. Integration: relationship of data subsets within a full data set

Lessons Learned from Pilot (2/3)
5. Analysis and interpretation: notes, experiences, interpretations, and knowledge generated from analysis of data
6. Physical organization: mapping of data sets to a physical storage structure such as a file system, database, or some other data repository
7. Project organization: mapping of data sets to a project hierarchy or organization

Lessons Learned from Pilot (3/3)
8. Scientific organization: mapping of data sets to some scientific classification, hierarchy, or organization
9. Task: research task(s) that generated or apply the data set
10. Experimental process: relationship of data and tasks to the overall experimental process
11. User community: application of data sets to different organizations of users
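The eleven context types above could be gathered into one record attached to each data set. The following is a minimal sketch in Python, not BSC's actual schema: the class name, field names, and the `missing_context` helper are all hypothetical, chosen only to mirror the numbered list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSetContext:
    """Hypothetical record grouping the context types from the pilot."""

    # 1. General data set properties
    owner: str
    created: str          # ISO date string, e.g. "2004-03-01"
    size_bytes: int
    data_format: str
    # 2. Experimental properties (free-form key/value pairs)
    experiment: dict = field(default_factory=dict)
    # 3. Data provenance: id of the version this data set derives from
    derived_from: Optional[str] = None
    # 4. Integration: id of the full data set this one is a subset of
    subset_of: Optional[str] = None
    # 5. Analysis and interpretation notes
    notes: list = field(default_factory=list)
    # 6-8. Physical, project, and scientific organization
    storage_path: str = ""
    project_path: str = ""
    classification: str = ""
    # 9-10. Task and experimental process
    task: str = ""
    process_step: str = ""
    # 11. User communities the data set applies to
    communities: list = field(default_factory=list)

    def missing_context(self):
        """Return the names of context fields that are still empty."""
        optional = [
            "experiment", "derived_from", "subset_of", "notes",
            "storage_path", "project_path", "classification",
            "task", "process_step", "communities",
        ]
        return [name for name in optional if not getattr(self, name)]


ctx = DataSetContext(owner="alice", created="2004-03-01",
                     size_bytes=1024, data_format="csv",
                     task="growth assay")
print(ctx.missing_context())
```

A helper like `missing_context` is one way a portal could prompt a biologist for the context that is still absent before a data set is shared.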

Basic Data Sharing with BSC
- User interface through a web-based portal
- Supports a variety of formats, including various instrument data, spreadsheets, images, and publications
- Supports standard formats, schemas, and ontologies in biological science
  - Microarray Gene Expression Data Society (MGED)
  - Ensures interoperability with MGED-compliant archives

Basic Data Sharing with BSC
- BSC provides data-translation tools
  - BSC maintains a repository of such translation tools, including user-defined tools
  - BSC can also identify translation paths between known formats and semi-automatically apply them
- Biologists can delineate projects in BSC using the tabbed interface
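Identifying a translation path between known formats can be viewed as a shortest-path search over a graph whose nodes are formats and whose edges are translation tools. The slides do not say how BSC does this; the sketch below assumes a breadth-first search, and the format names, tool names, and the `find_translation_path` function are hypothetical.

```python
from collections import deque


def find_translation_path(translators, source, target):
    """Breadth-first search for the shortest chain of translation tools.

    translators maps (from_format, to_format) -> tool name.
    Returns a list of tool names, [] if no translation is needed,
    or None if the target format cannot be reached.
    """
    if source == target:
        return []
    graph = {}
    for (src, dst), tool in translators.items():
        graph.setdefault(src, []).append((dst, tool))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, path = queue.popleft()
        for nxt, tool in graph.get(fmt, []):
            if nxt in visited:
                continue
            new_path = path + [tool]
            if nxt == target:
                return new_path
            visited.add(nxt)
            queue.append((nxt, new_path))
    return None


# Hypothetical tool repository, including a user-defined tool.
TRANSLATORS = {
    ("instrument_raw", "csv"): "raw_to_csv",
    ("csv", "mage_ml"): "csv_to_mage_ml",
    ("csv", "xls"): "user_csv_to_xls",
}

print(find_translation_path(TRANSLATORS, "instrument_raw", "mage_ml"))
```

Returning the tool chain rather than running it matches the "semi-automatic" wording: the user can inspect the proposed path before it is applied.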

External Database Access
- BSC has the capability to access external databases
  - GenBank, TIGR, KEGG, PubMed, etc.
  - Provides standard database access tools
  - When a database is accessed, the data query is executed and the data are transferred from the database to a local copy in BSC
  - Biologists can treat the result of a query as
    - either an isolated version
    - or a copy that maintains links back to the database
  - Updates to linked data can be delivered via notification or applied automatically
  - Service subscription capabilities: securely place data into and retrieve data from BSC
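The choice between an isolated version and a linked copy, and between notified and automatic updates, can be illustrated with a small sketch. The class below is hypothetical and involves no real database; it only models how a local copy might react when the external source changes.

```python
from dataclasses import dataclass, field


@dataclass
class LocalCopy:
    """Hypothetical local copy of an external query result."""

    query: str
    rows: list
    linked: bool = False        # False: isolated snapshot, never updated
    auto_update: bool = False   # only meaningful when linked is True
    pending_notices: list = field(default_factory=list)

    def on_source_changed(self, new_rows):
        """React to a change in the external database."""
        if not self.linked:
            return  # isolated version: ignore upstream changes
        if self.auto_update:
            self.rows = list(new_rows)  # apply the update automatically
        else:
            # notify only; the biologist decides whether to refresh
            self.pending_notices.append(
                f"source changed for query: {self.query}"
            )


snapshot = LocalCopy("organism=Shewanella", ["gene_a"])
notified = LocalCopy("organism=Shewanella", ["gene_a"], linked=True)
automatic = LocalCopy("organism=Shewanella", ["gene_a"],
                      linked=True, auto_update=True)

for copy in (snapshot, notified, automatic):
    copy.on_source_changed(["gene_a", "gene_b"])

print(snapshot.rows, notified.pending_notices, automatic.rows)
```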

Metadata
- Metadata associated with a data set as a whole (generally constant; see figure on the right)
- Metadata associated with particular attributes (changes from experiment to experiment)
- No mention of metadata standardization (e.g., Dublin Core)

Data Organization
- BSC allows collaborative access to and manipulation of shared data, regardless of where the data sets reside (flat files, database, etc.)
- Provides active links to data sources
- Viewer used to partition data based on different data sets, sub-theories, or tasks assigned to team members (see example, next slide)

Additional data-viewing tools
- The file system view is just one type of view (top)
- Biologists may need other views
  - Based on divisions of the overall project (middle)
  - Based on scientific organization (bottom), e.g., by taxonomy of the organism under study
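The three views can be thought of as the same data sets grouped by different attributes. A minimal sketch, assuming each data set carries slash-separated `folder`, `project`, and `taxonomy` attributes (hypothetical names and sample data, not taken from BSC):

```python
DATASETS = [
    {"name": "growth_curve.csv", "folder": "/data/2004",
     "project": "Stress response/Heat shock",
     "taxonomy": "Bacteria/Shewanella"},
    {"name": "array_scan.tif", "folder": "/data/2004",
     "project": "Stress response/Cold shock",
     "taxonomy": "Bacteria/Shewanella"},
    {"name": "control.csv", "folder": "/data/2003",
     "project": "Stress response/Heat shock",
     "taxonomy": "Bacteria/Escherichia"},
]


def build_view(datasets, key):
    """Nest data set names into a tree along one slash-separated attribute."""
    tree = {}
    for ds in datasets:
        node = tree
        for part in ds[key].strip("/").split("/"):
            node = node.setdefault(part, {})
        # data sets at this level are listed under a reserved key
        node.setdefault("_datasets", []).append(ds["name"])
    return tree


file_view = build_view(DATASETS, "folder")        # physical organization
project_view = build_view(DATASETS, "project")    # project organization
science_view = build_view(DATASETS, "taxonomy")   # scientific organization
print(project_view)
```

Because every view is derived from the same records, a data set never has to be copied to appear in more than one hierarchy.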

Data Provenance
- As more experiments are run over a data set, historical version management becomes an issue
- The data provenance tool depicts a tree showing the historical lineage of a data set
- Allows comparison of different versions and branches of the tree
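A lineage tree of this kind can be represented by storing, for each version, the version it was derived from. The sketch below is an illustration only; the `VersionTree` class and the version names are hypothetical. It shows how a lineage can be listed and how the point where two branches diverged can be found, which is the starting point for comparing them.

```python
class VersionTree:
    """Hypothetical lineage tree: each version records its parent."""

    def __init__(self):
        self._parent = {}

    def add(self, version, parent=None):
        """Register a version derived from parent (None for the root)."""
        if version in self._parent:
            raise ValueError(f"version already exists: {version}")
        if parent is not None and parent not in self._parent:
            raise KeyError(f"unknown parent: {parent}")
        self._parent[version] = parent

    def lineage(self, version):
        """Return the chain from version back to the root."""
        chain = []
        while version is not None:
            chain.append(version)
            version = self._parent[version]
        return chain

    def common_ancestor(self, a, b):
        """Return the most recent version shared by both lineages."""
        seen = set(self.lineage(a))
        for version in self.lineage(b):
            if version in seen:
                return version
        return None


tree = VersionTree()
tree.add("raw")
tree.add("normalized", parent="raw")
tree.add("filtered_a", parent="normalized")
tree.add("filtered_b", parent="normalized")

print(tree.lineage("filtered_a"))
print(tree.common_ancestor("filtered_a", "filtered_b"))
```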

Collaborative Analysis
- Collaborative analysis is a process of brainstorming in which researchers share their individual interpretations, understanding, and insights, which build upon one another to form cogent findings and conclusions
- Facilitated in BSC by allowing electronic notes to be attached to data
  - Verbal
  - Textual
  - Markings on drawings/figures via different overlays

Collaborative Analysis
- Also supported via a free-form electronic notebook
- BSC also supports collaborative analysis by allowing researchers to share analyses and tools
  - Analysis results can be stored just like any other data set
- Also supports integration with teleconferencing packages

Task Management
- BSC allows biologists to define and track experimental tasks
- Project managers (PMs) can query the task list in different ways

Task Management
- BSC also provides workflow-management capabilities
- Captures, manages, and supplies standard paths for analysis
- Synchronized with the task list

Task Management
- The workflow tool allows biologists to work with and link combinations of analysis and visualization tools in useful and novel ways, e.g., repetitively applying tools in a particular analysis or experiment
- The execution history viewer allows biologists to highlight and re-instantiate particular paths of past workflow executions
- Various authorization levels are used to give scientists cross-project access
- Publication of data to the larger scientific community is also supported
  - Automatic notification of updates
  - General bulletin board service
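Recording the path of each workflow execution and re-instantiating it later can be sketched as follows. This is a toy illustration under stated assumptions, not BSC's workflow engine: tools are plain Python functions in a hypothetical `TOOLS` registry, and the history is a list of tool-name sequences.

```python
def run_workflow(steps, data, history):
    """Apply each (name, function) step in order and record the path."""
    for _name, func in steps:
        data = func(data)
    history.append([name for name, _func in steps])
    return data


def reinstantiate(path, registry):
    """Rebuild runnable steps from a recorded path of tool names."""
    return [(name, registry[name]) for name in path]


# Hypothetical analysis tools operating on a list of measurements.
TOOLS = {
    "drop_missing": lambda values: [v for v in values if v is not None],
    "normalize": lambda values: [v / max(values) for v in values],
}

history = []
first = run_workflow(reinstantiate(["drop_missing", "normalize"], TOOLS),
                     [2, None, 4], history)

# Re-apply the recorded path from the first run to a new data set.
second = run_workflow(reinstantiate(history[0], TOOLS),
                      [1, 5, None], history)
print(first, second)
```

Storing tool names rather than function objects keeps the history serializable, so a past path can be reapplied in a later session.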

Implementation
- Based on the Collaboratory for Multi-scale Chemical Science (CMCS)
- Written in Java, using Apache Jetspeed
- Collaboration tools through the University of Michigan’s CHEF
- Content management uses Scientific Annotation Middleware (SAM), which is based on Jakarta Slide, an open source implementation of the WebDAV protocol
- Testbed deployed to a group of biologists at PNNL and external biologists from the Shewanella Federation
- One result of the testbed: biologists need an organizing context when working with shared data sets
  - i.e., biologists need to see and understand the relationships among data sets before the data can be effectively shared
  - Supported in BSC through free-form text

Related PNNL Publication Abstract
The Collaboratory for Multi-scale Chemical Science (CMCS) is developing a powerful informatics-based approach to synthesizing multi-scale information in support of systems-based research and is applying it within combustion science. An open source multi-scale informatics toolkit is being developed that addresses a number of issues core to the emerging concept of knowledge grids, including provenance tracking and lightweight federation of data and application resources into cross-scale information flows. The CMCS portal is currently in use by a number of high-profile pilot groups and is playing a significant role in enabling their efforts to improve and extend community-maintained chemical reference information.

Comparison with SIBDATA
- The results of the pilot experiment provide interesting insights into scientific collaboration
- The workspace, and the differing options for links to external sources, were discussed earlier for SIBDATA
- Data provenance may be worth looking into for SIBDATA
- Workflow management and repetitive application of tools may also be useful for SIBDATA

Questions