SDM center: Supporting Heterogeneous Data Access in Genomics. Terence Critchlow; Ling Liu, Calton Pu (GT); Reagan Moore, Bertram Ludaescher, Amarnath Gupta (SDSC).

Presentation transcript:

SDM center
Supporting Heterogeneous Data Access in Genomics
Terence Critchlow
Ling Liu, Calton Pu (GT)
Reagan Moore, Bertram Ludaescher, Amarnath Gupta (SDSC)
Mladen Vouk (NCSU)
Tom Potok (ORNL)
Matt Coleman (LLNL)
September 2002
UCRL-PRES-???????

Outline
- Motivation
- System architecture
- Status

Motivated by current state of the art in genomics data access
- The user is required to perform all data management tasks.
- Different users end up doing the same thing.
[Diagram labels: User applications; sources dbEST, SCoP, SWISS-PROT, PDB; Source Specific Schema. Tasks shown: Access the data; Parse input/output; Transform data format; Map similar concepts.]

What is a realistic environment?
A single location that provides effective access to data and tools from many sources through an intuitive and useful interface.
[Diagram labels: User applications. Tasks shown: Access the data; Parse input/output; Transform data format; Map similar concepts.]
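The four tasks named on this slide (access, parse, transform, map) can be sketched as one shared layer, so that each user application no longer repeats them. The following is only an illustration in Python: the two sources, their record formats, and every name in the code are hypothetical and do not come from the SDM Center system or from the real databases.

```python
from typing import Callable, Dict, List

# Hypothetical raw records standing in for two sources with different formats.
RAW_SOURCES: Dict[str, str] = {
    "source_a": "ID=P1;SEQ=MKVA\nID=P2;SEQ=GTEQ",
    "source_b": "P3|LLWIL",
}


def parse_source_a(text: str) -> List[dict]:
    """Parse 'KEY=VALUE;KEY=VALUE' lines into source-specific records."""
    return [
        dict(field.split("=", 1) for field in line.split(";"))
        for line in text.splitlines()
        if line
    ]


def parse_source_b(text: str) -> List[dict]:
    """Parse 'accession|sequence' lines into source-specific records."""
    records = []
    for line in text.splitlines():
        accession, residues = line.split("|", 1)
        records.append({"acc": accession, "residues": residues})
    return records


PARSERS: Dict[str, Callable[[str], List[dict]]] = {
    "source_a": parse_source_a,
    "source_b": parse_source_b,
}

# Map step: source-specific field names -> shared concept names (assumed).
CONCEPT_MAP: Dict[str, Dict[str, str]] = {
    "source_a": {"ID": "accession", "SEQ": "sequence"},
    "source_b": {"acc": "accession", "residues": "sequence"},
}


def fetch_all() -> List[dict]:
    """Access, parse, transform, and map every source into one record shape."""
    unified = []
    for name, raw in RAW_SOURCES.items():            # access the data
        for record in PARSERS[name](raw):            # parse the source format
            # transform the record and map field names to shared concepts
            mapped = {CONCEPT_MAP[name][key]: value for key, value in record.items()}
            mapped["source"] = name
            unified.append(mapped)
    return unified


records = fetch_all()
```

A user application would then read `records` and see only `accession`, `sequence`, and `source`, regardless of which source a record came from.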

Motivating use case: Identifying model sequences
[Workflow diagram. Labels in slide order: Matt; example sequences MILLAFSSGRRLDFVHRSGVF and FFQTLLWILCATVCGTEQYFN; Hundreds of sequences; Clusfavor; Gene name / accession #; Genbank; Sequence; Blast against HTGS; Model builder; Homologs; Filter; Subseq to 2000bp; Accession #; Transfac; Sequence; Model sequence.]

SDM Center Data Integration Infrastructure
[Diagram components: User (Matt); Program; DB Interface; Data Source; Data Sources.]

SDM Center Data Integration Infrastructure
[Diagram components: User (Matt); Workflow Agent; Service registry and brokering; Data Integration Agent(s); Communication Protocol Gateway; Program; DB Interface; Data Source; Data Sources.]

SDM Center Data Integration Infrastructure
[Diagram components: User (Matt); Workflow Agent; Service registry and brokering; Data Integration Agent(s); Other Agents (e.g., VIPAR); Database Access; Communication Protocol Gateway; Program Interfacing; Other I/O Agents; Program; DB Interface; Data Source; Data Sources.]

SDM Center Data Integration Infrastructure
[Diagram components: User (Matt); Workflow Agent; Service registry and brokering; Data Integration Agent(s); Wrapper based Agent; Other Agents (e.g., VIPAR); Database Access; Communication Protocol Gateway; Program Interfacing; Other I/O Agents; XML Wrapper; Extraction Rules; Human Knowledge; GUI; Code Generator; Program; DB Interface; Data Source; Data Sources.]
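The wrapper part of this diagram (extraction rules feeding an XML wrapper around a data source) can be illustrated with a small sketch. This is not XWrap's rule format or generated code: the rule syntax (one regular expression per output tag), the sample page text, and all names below are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical extraction rules: output tag -> regex with one capture group.
EXTRACTION_RULES: Dict[str, str] = {
    "accession": r"Accession:\s*(\S+)",
    "organism": r"Organism:\s*(.+)",
}


def wrap(page_text: str, rules: Dict[str, str]) -> str:
    """Apply each extraction rule to the source text and emit one XML record."""
    root = ET.Element("record")
    for tag, pattern in rules.items():
        match = re.search(pattern, page_text)
        if match:  # fields the rule cannot find are simply left out
            ET.SubElement(root, tag).text = match.group(1).strip()
    return ET.tostring(root, encoding="unicode")


# Made-up source text standing in for a page returned by a data source.
sample_page = "Accession: X0001\nOrganism: Homo sapiens\n"
xml_text = wrap(sample_page, EXTRACTION_RULES)
```

Keeping the rules as data rather than code is what would let a GUI or code generator produce a wrapper for a new source without hand-written parsing.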

SDM Center Data Integration Infrastructure
[Diagram components: User (Matt); Workflow Agent; Executable Workflow Plan: "Matt's WF"; Service registry and brokering; Data Integration Agent(s); Data Mediation; Wrapper based Agent; Other Agents (e.g., VIPAR); Database Access; Communication Protocol Gateway; Program Interfacing; Other I/O Agents; XML Wrapper; Extraction Rules; Human Knowledge; GUI; Code Generator; Program; DB Interface; Data Source; Data Sources.]

SDM Center Data Integration Infrastructure
[Diagram components: User (Matt); User Agent; User constraints & parameters; Parameterized Workflow Specification (PWS); Source Capabilities (SC); Binding Patterns; Domain Map/Ontology; Workflow Resolution Service (WRS); Workflow Instantiation Service (WIS); outcomes "WF feasible" and "WF infeasible: report reason"; Executable Workflow Plan: "Matt's WF"; Workflow Agent; Service registry and brokering; Data Integration Agent(s); Data Mediation; Wrapper based Agent; Other Agents (e.g., VIPAR); Database Access; Communication Protocol Gateway; Program Interfacing; Other I/O Agents; XML Wrapper; Extraction Rules; Human Knowledge; GUI; Code Generator; Data Registration; Services Registration; DB; Program; DB Interface; Data Source; Data Sources.]
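The slide names a resolution step that takes a workflow specification, source capabilities with binding patterns, and user parameters, and reports either "WF feasible" or "WF infeasible: report reason". One way such a check could work is sketched below. It is an assumption for illustration, not the actual Workflow Resolution Service: the `BindingPattern` class, the `resolve` function, and the step and attribute names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class BindingPattern:
    """One way to call a source: attributes that must be bound, attributes returned."""
    source: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def resolve(
    steps: List[str],
    capabilities: List[BindingPattern],
    user_params: Set[str],
) -> Tuple[bool, str]:
    """Walk the steps in order; a step is callable if some pattern's inputs are bound."""
    bound = set(user_params)
    for step in steps:
        usable = [p for p in capabilities if p.source == step and p.inputs <= bound]
        if not usable:
            return False, f"step '{step}' has no binding pattern satisfied by {sorted(bound)}"
        for pattern in usable:
            bound |= pattern.outputs  # results become available to later steps
    return True, "feasible"


# Hypothetical capabilities for a two-step workflow.
CAPABILITIES = [
    BindingPattern("sequence_lookup", frozenset({"accession"}), frozenset({"sequence"})),
    BindingPattern("homology_search", frozenset({"sequence"}), frozenset({"homologs"})),
]
STEPS = ["sequence_lookup", "homology_search"]

ok_result = resolve(STEPS, CAPABILITIES, {"accession"})
bad_result = resolve(STEPS, CAPABILITIES, {"gene_name"})
```

With only `gene_name` supplied, the first step cannot be called, so the check returns `False` together with a reason that names the failing step.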

Status
- Focus has been on developing a prototype of Matt's workflow
  - Demonstrate basic infrastructure functionality
  - Provide a useful tool for Matt to use in his research efforts
- Fleshed out the details of the architecture
  - Interconnections between components better defined
- We have a prototype of that system in place
  - Wrappers generated from XWrap by GT
  - Combined into a coherent workflow by SDSC
  - Workflow-based interface completed by NCSU
- The following presentations will go into more detail about what has been accomplished and what our current tasks are

Questions?

People
LLNL
- Terence Critchlow (lead)
Georgia Tech
- Calton Pu
- Ling Liu
- David Buttler
- Dan Rocco
- Henrique Paques
- Wei Han
Target Users
- Matt Coleman (LLNL)
- Allen Christian (LLNL)
- Phil Bourne (PDB)
SDSC
- Bertram Ludaescher
- Amarnath Gupta
- Ilkay Altintas
Agent Technology
- Tom Potok (ORNL)
- Joel Reid (ORNL)
- Mladen Vouk (NCSU)
- Munindar Singh (NCSU)
- Sandeep Chandra (NCSU)
- Zhengang Cheng (NCSU)
- Sangeeta Bhagwanani (NCSU)

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W ENG-48.