SDM center Supporting Heterogeneous Data Access in Genomics Terence Critchlow Center for Applied Scientific Computing Lawrence Livermore National Laboratory.

Presentation transcript:

SDM center Supporting Heterogeneous Data Access in Genomics Terence Critchlow Center for Applied Scientific Computing Lawrence Livermore National Laboratory March 2002

SDM center Outline
- Motivation
- Approach
- Specific use cases
- Introduction to others

SDM center Motivated by current state of the art in genomics data access.
- Different users end up doing the same thing.
- The user is required to perform all data management tasks.
- Sources, each with a source-specific schema: dbEST, SCOP, SWISS-PROT, PDB
- Tasks left to user applications: access the data; parse input/output; transform data format; map similar concepts

SDM center What is the ideal environment?
A single location that provides effective access to a consistent view of data and tools from many sources through an intuitive and useful interface.
Diagram labels: access the data; parse input/output; transform data format; map similar concepts; user applications

SDM center What is the ideal environment? (overlay text on this build of the slide: "a realistic")
A single location that provides effective access to a consistent view of data and tools from many sources through an intuitive and useful interface.
Diagram labels: access the data; parse input/output; transform data format; map similar concepts; user applications

SDM center SDM Center Data Integration Infrastructure
Diagram components: Matt (user); GUI; Query Dispatch and Collection (QDaC); Model-Based Mediator; Semantic Wrappers; XPath Wrappers; VIPAR Wrapper; XWrap; Metadata Registry; External Tools; DF; Medline; PDB
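The slide names a Query Dispatch and Collection (QDaC) component sitting above per-source wrappers but gives no interface details. The following is a minimal sketch of that general idea only, assuming each wrapper can be treated as a function from a query string to a list of records. Every name here (`Hit`, `make_stub_wrapper`, `dispatch_and_collect`, the record fields, and the sample records) is hypothetical and is not taken from the SDM Center software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Hit:
    """One result, tagged with the source it came from (hypothetical type)."""
    source: str
    accession: str
    description: str

# Assumption: a wrapper hides a source's access method and format behind
# one call that takes a query string and returns a list of record dicts.
Wrapper = Callable[[str], List[dict]]

def make_stub_wrapper(records: List[dict]) -> Wrapper:
    """Build a stand-in wrapper over in-memory records (no network access)."""
    def wrapper(query: str) -> List[dict]:
        needle = query.lower()
        return [r for r in records if needle in r["description"].lower()]
    return wrapper

def dispatch_and_collect(query: str, wrappers: Dict[str, Wrapper]) -> List[Hit]:
    """Send one query to every wrapper and merge the results into one list."""
    hits: List[Hit] = []
    for source, wrapper in wrappers.items():
        try:
            records = wrapper(query)
        except Exception:
            # A failing source is skipped so the other sources still answer.
            continue
        for record in records:
            hits.append(Hit(source, record["id"], record["description"]))
    return hits

# Made-up sample records, for illustration only.
WRAPPERS: Dict[str, Wrapper] = {
    "Medline": make_stub_wrapper(
        [{"id": "M1", "description": "Kinase regulation in yeast"}]
    ),
    "PDB": make_stub_wrapper(
        [
            {"id": "P1", "description": "Crystal structure of a kinase"},
            {"id": "P2", "description": "Lysozyme"},
        ]
    ),
}

results = dispatch_and_collect("kinase", WRAPPERS)
```

In this sketch, adding a source means registering one more wrapper; the dispatching code and the caller stay unchanged, which is the property the "single location" goal on the earlier slide asks for.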

SDM center Use case 1: Find everything related to a sequence
- Matt submits a sequence (example: MILLAFSSGRRLDFVHRSGVF FFQTLLWILCATVCGTEQYFN) to Blast.
- The more sources queried, the more valuable the results.
- Unfortunately, Matt cannot query all of the relevant data sources.
- Provide access to many more sources than Matt currently has.

SDM center Use case 1: Find everything related to a sequence
Additional Desired Capabilities (beyond Matt's Blast query):
- Handle hundreds of sequences
- Search using other tools
- Preprocess sequence(s)
- Use results as input to other tools and queries

SDM center Use case 2: Identifying model sequences
Example sequence (Matt): MILLAFSSGRRLDFVHRSGVF FFQTLLWILCATVCGTEQYFN
Workflow diagram labels, in extraction order: Hundreds of sequences; Clusfavor; Gene name / accession #; Genbank; Sequence; Blast against HTGS; Model builder; Homologs; Filter; Subseq to 2000bp; Accession #; Transfac; Sequence; Model sequence
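The workflow on this slide chains tools so that each step's output feeds the next. The sketch below illustrates only that chaining pattern, with two stand-in steps. It makes assumptions the slide does not confirm: that "Subseq to 2000bp" means keeping at most the first 2000 bases, and that the "Filter" step can be represented by a simple length check. The names `compose`, `drop_short` and `truncate` are hypothetical, and no real tool (Clusfavor, Blast, Transfac) is called.

```python
from typing import Callable, List

# Assumption: every step maps a list of sequences to a list of sequences.
Step = Callable[[List[str]], List[str]]

def compose(*steps: Step) -> Step:
    """Chain steps so each one receives the previous step's output."""
    def run(sequences: List[str]) -> List[str]:
        for step in steps:
            sequences = step(sequences)
        return sequences
    return run

def drop_short(min_len: int) -> Step:
    """Placeholder filter: the slide does not state the real criterion."""
    def step(sequences: List[str]) -> List[str]:
        return [s for s in sequences if len(s) >= min_len]
    return step

def truncate(max_len: int = 2000) -> Step:
    """Keep at most the first max_len bases (one reading of the slide)."""
    def step(sequences: List[str]) -> List[str]:
        return [s[:max_len] for s in sequences]
    return step

pipeline = compose(drop_short(10), truncate(2000))

# Made-up input: one 4000-base sequence and one that the filter discards.
trimmed = pipeline(["ACGT" * 1000, "ACG"])
```

Because every step shares one signature, a real tool invocation could replace a placeholder step without changing how the pipeline is assembled.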

SDM center Summary
- Matt’s current research objectives focus on Use Case 2
  - That is our current target
- Details of current status in following talks
  - Context-sensitive Service Composition for Support of Scientific Workflows (Mladen A. Vouk)
  - XWRAPComposer: A wrapper generation system for Integrating Bioinformatics Data Sources (Ling Liu)
  - Constructing Workflows by Integrating Interactive Information Sources (Amarnath Gupta)

SDM center Questions?

SDM center People
- LLNL: Terence Critchlow (lead)
- Georgia Tech: Calton Pu, Ling Liu, David Buttler, Dan Rocco, Henrique Paques, Wei Han
- SDSC: Bertram Ludaescher, Amarnath Gupta, Ilkay Altintas
- Agent Technology: Tom Potok (ORNL), Mladen Vouk (NCSU)
- Target Users: Matt Coleman (LLNL), Allen Christian (LLNL), Phil Bourne (PDB)

SDM center This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W ENG-48.