iGAP: Integrative Grid-enabled Genome Annotation Pipeline

Wilfred Li, Ph.D.
Integrative Biosciences Program
San Diego Supercomputer Center
University of California, San Diego
http://eol.sdsc.edu

Encyclopedia Of Life Project
- High-quality functional and 3-D structure assignment using iGAP
- Grid-enabled bioinformatics applications
- Optimization
- Dedicated and grid resources
- Integrative biological data warehouse
- Web services consumer
- Distributed database and data mining
- Advanced query environment
- Open Notebook
- Web services provider

iGAP Workflow
- Proteome: 1000+ genomes
- PAT-NR: only unique sequences are processed
- Proteome-specific benchmarking
- iGAP: prestaging, execution, monitoring
- iGAP WMS, DBMS
- Reassemble proteome, data replication
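The slide states that only unique sequences are processed and that proteomes are reassembled afterwards. A minimal sketch of that idea follows, assuming that sequences are deduplicated by hashing a normalized (trimmed, upper-cased) string and that results are copied back to every protein sharing that sequence. The function names and data layout are hypothetical illustrations, not iGAP's actual code.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical types: proteome name -> {protein id -> amino-acid sequence}
Proteomes = Dict[str, Dict[str, str]]
Members = Dict[str, List[Tuple[str, str]]]


def sequence_key(seq: str) -> str:
    """Hash a normalized sequence so identical sequences share one key."""
    normalized = seq.strip().upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def collapse_proteomes(proteomes: Proteomes) -> Tuple[Dict[str, str], Members]:
    """Return the unique sequences plus, for each, the (proteome, id) pairs sharing it."""
    unique: Dict[str, str] = {}
    members: Members = {}
    for proteome, proteins in proteomes.items():
        for protein_id, seq in proteins.items():
            key = sequence_key(seq)
            unique.setdefault(key, seq.strip().upper())
            members.setdefault(key, []).append((proteome, protein_id))
    return unique, members


def reassemble(proteome: str, members: Members, results: Dict[str, str]) -> Dict[str, str]:
    """Copy per-unique-sequence results back onto one proteome's protein ids."""
    out: Dict[str, str] = {}
    for key, owners in members.items():
        for owner, protein_id in owners:
            if owner == proteome:
                out[protein_id] = results[key]
    return out


if __name__ == "__main__":
    proteomes = {
        "genomeA": {"a1": "MKTAYIAK", "a2": "MSTNPKPQ"},
        "genomeB": {"b1": "mktayiak"},  # same as a1 after normalization
    }
    unique, members = collapse_proteomes(proteomes)
    print(len(unique))  # 2: only two distinct sequences need annotating
    # Stand-in for the expensive annotation run on unique sequences only.
    results = {key: f"annotated({len(seq)} aa)" for key, seq in unique.items()}
    print(reassemble("genomeB", members, results))
```

The saving grows with redundancy across genomes: each distinct sequence is annotated once, and the membership map is the only extra state needed to rebuild per-proteome results.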

iGAP annotation pipeline (Steps 1-6)
Input: protein sequences
- Prediction of signal peptides (SignalP, PSORT), transmembrane regions (TMHMM, PSORT), coiled coils (COILS) and low-complexity regions (SEG)
- Structural assignment of domains by PSI-BLAST profiles on FOLDLIB
- Structural assignment of domains by 123D on FOLDLIB
- Structural assignment of domains by WU-BLAST
- Functional assignment by PFAM and NR assignments
- Domain location prediction by sequence
Assigned regions are stored in the DB.
Reference data: NR, PFAM, SCOP, PDB, FOLDLIB
Structure info; sequence info
Building FOLDLIB:
- Sources: PDB chains, SCOP domains, PDP domains, CE matches of PDB vs. SCOP
- 90% sequence non-identical
- Minimum size 25 aa
- Coverage 90%, gaps < 30, ends < 30
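The FOLDLIB construction criteria on this slide (minimum size 25 aa; coverage 90%, gaps < 30, ends < 30) can be read as an acceptance test for candidate domains. The sketch below is written under an assumption: that coverage is aligned residues divided by domain length, "gaps" is the longest internal alignment gap, and "ends" is the number of unaligned residues at each terminus. The `DomainMatch` record and function names are hypothetical. The 90% sequence non-identity (redundancy) criterion is not modelled here.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as printed on the slide; how each is measured is an assumption.
MIN_DOMAIN_LENGTH = 25   # minimum size 25 aa
MIN_COVERAGE = 0.90      # coverage 90%
MAX_INTERNAL_GAP = 30    # gaps < 30
MAX_UNALIGNED_END = 30   # ends < 30


@dataclass
class DomainMatch:
    """Hypothetical record: a candidate domain aligned to a reference domain."""
    name: str
    length: int            # domain length in residues
    aligned: int           # residues covered by the alignment
    longest_gap: int       # longest internal gap, in residues
    n_term_unaligned: int  # unaligned residues at the N-terminus
    c_term_unaligned: int  # unaligned residues at the C-terminus


def accept_for_foldlib(m: DomainMatch) -> bool:
    """Apply the size, coverage, gap and end thresholds in turn."""
    if m.length < MIN_DOMAIN_LENGTH:
        return False
    if m.aligned / m.length < MIN_COVERAGE:
        return False
    if m.longest_gap >= MAX_INTERNAL_GAP:
        return False
    return m.n_term_unaligned < MAX_UNALIGNED_END and m.c_term_unaligned < MAX_UNALIGNED_END


def filter_candidates(matches: List[DomainMatch]) -> List[str]:
    """Names of the candidates that pass every threshold."""
    return [m.name for m in matches if accept_for_foldlib(m)]


if __name__ == "__main__":
    candidates = [
        DomainMatch("d1", 120, 115, 5, 2, 3),   # passes all checks
        DomainMatch("d2", 20, 20, 0, 0, 0),     # shorter than 25 aa
        DomainMatch("d3", 200, 170, 10, 0, 0),  # 85% coverage
        DomainMatch("d4", 300, 290, 35, 0, 0),  # internal gap of 35
    ]
    print(filter_candidates(candidates))  # ['d1']
```

Keeping the thresholds as named constants makes it easy to compare this reading of the slide against whatever definitions the real FOLDLIB build used.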

Workflow Management System for iGAP
- WMS, SRB
- Grid resources: Blue Horizon, SDSC, BII, Japst, others
- Work stations, anywhere

EOL and APST
The AppLeS Parameter Sweep Template (APST) provides EOL with transparent access to Grid resources and smart scheduling via Grid middleware.
- Inputs to APST: EOL software/data and an application description
- APST reaches Grid resources through:
  - Globus GRAM/GASS, SSH/SCP
  - SRB/SFTP
  - PBS/LoadLeveler/Condor
  - Grid metadata: Globus MDS/NWS/Ganglia
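The "smart scheduling" on this slide concerns placing many independent annotation tasks on heterogeneous resources. One classic heuristic for such parameter-sweep workloads is MinMin. On each round, it finds every pending task's earliest possible completion time and commits the task that can finish soonest. The sketch below is a generic illustration of MinMin, not APST's implementation. It assumes that task costs and relative host speeds are known up front. In practice, such estimates would come from monitoring services like the NWS named on the slide. The task and host names are hypothetical.

```python
from typing import Dict, List, Tuple


def min_min_schedule(
    task_cost: Dict[str, float], host_speed: Dict[str, float]
) -> Tuple[List[Tuple[str, str]], float]:
    """Greedy MinMin over independent tasks; returns (plan, makespan)."""
    ready = {host: 0.0 for host in host_speed}  # time each host becomes free
    pending = dict(task_cost)
    plan: List[Tuple[str, str]] = []
    while pending:
        best = None  # (task, host, finish time) with the smallest finish time
        for task, cost in pending.items():
            # Best host for this task = the one giving the earliest completion.
            host = min(host_speed, key=lambda h: ready[h] + cost / host_speed[h])
            finish = ready[host] + cost / host_speed[host]
            if best is None or finish < best[2]:
                best = (task, host, finish)
        task, host, finish = best
        ready[host] = finish  # commit the task and advance that host's clock
        plan.append((task, host))
        del pending[task]
    return plan, max(ready.values())


if __name__ == "__main__":
    tasks = {"t1": 6.0, "t2": 6.0, "t3": 6.0}  # hypothetical work units
    hosts = {"fast": 3.0, "slow": 2.0}         # hypothetical relative speeds
    plan, makespan = min_min_schedule(tasks, hosts)
    print(plan)      # [('t1', 'fast'), ('t2', 'slow'), ('t3', 'fast')]
    print(makespan)  # 4.0
```

MinMin favours short completions first, which keeps fast hosts busy. Its known weakness is that long tasks can be left until late in the schedule, which is why other heuristics (for example MaxMin or Sufferage) are also used for parameter sweeps.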

Acknowledgement
SDSC: Fran Berman (SDSC Director), Philip E. Bourne (IBS Director), Mark Miller (Project Coordinator), Ilya N. Shindyalov (CE), Greg Quinn (Web service), Coleman Mosley, Vicente Reyes, Robert Byrnes, Kim Baldridge (iCC Director), Jerry Greenberg (CE portal), Philip Papadopoulos (Rocks), Mason Katz, Greg Bruno, Chaitan Baru, David Archbell, Jerry Rowley
UCSD: Peter Arzberger (PRAGMA), Henri Casanova, Jim Hayes
Ceres Inc.: Nickolai Alexandrov (123D), Richard Flavell
BII, Singapore: Larry Ang, Kishore Sakharkar, Arun Krishnan, Atif Shahab, other BII members
Everyone else; additional partner institutions