Bioinformatics Data and the Grid: The GeneGrid Data Manager


Bioinformatics Data and the Grid: The GeneGrid Data Manager
Noel Kelly

GeneGrid Architecture
[Architecture diagram. Labelled components: GeneGrid Environment; GeneGrid Portal; GeneGrid Workflow Manager Service; GeneGrid Process Manager Service; GeneGrid Application Management Registry; GeneGrid Data Manager Registry; Workflow Definition, GeneGrid Workflow Status and GeneGrid Input & Results Parameters stores, each behind a GDM Service; GAM Services and GDM Services under the labels BeSC, iGAP, EBI and SDSC; applications Blast, mpiBlast, TMHMM and SignalP; EMBL and SwissProt databases.]

GeneGrid Data Manager Objectives
- Integrate specialised public biological data into the Grid
- Integrate proprietary data into the Grid
- Access and storage of user input parameters
- Experiment tracking
- Access and storage of experiment results

GeneGrid Databases
- GeneGrid Workflow Definition Database
- GeneGrid Workflow Status Database
- GeneGrid Results & Input Parameter Database
[Other diagram labels: Xindice 1.0 Collection, File System]

Biological Databases
- Structured file: EMBL Bank, SwissProt, TrEMBL, TrEMBL_new, GenBank, DDBJ, ENSEMBL
- Fusion: proprietary
- Amtec: proprietary
[Other labels on the slide: Structured File, MySQL, Oracle, T.B.C.]

Public Biological Data Integration
[Diagram labels: GeneGrid Data Manager Service, using BioPerl modules, JDBC driver, Perl scripts, SwissProt]

Public Biological Data Integration
[Diagram labels: GeneGrid Data Manager Service, BeSC, Perl Script, Record, SwissProt, EBI]
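The slide pairs a Perl script with a SwissProt record. As a rough illustration of what handling such a record involves, here is a minimal Python sketch that reads one entry in the Swiss-Prot flat-file layout (two-letter line codes, sequence lines after `SQ`, `//` as terminator). It is not the GeneGrid code, which the slides say is built on BioPerl; the function name `parse_swissprot_record` and the sample entry (accessions `X00001`, `X00002`) are made up for the example.

```python
def parse_swissprot_record(text):
    """Parse one Swiss-Prot flat-file entry into a small dict.

    Only the ID, AC, DE and sequence lines are read; all other
    line types are ignored in this sketch.
    """
    record = {"id": None, "accessions": [], "description": "", "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines are indented and split into blocks of ten residues.
            record["sequence"] += "".join(line.split())
            continue
        code = line[:2]
        if code == "ID":
            record["id"] = line.split()[1]
        elif code == "AC":
            # Accession numbers are separated by semicolons.
            record["accessions"] += line[2:].replace(";", " ").split()
        elif code == "DE":
            record["description"] = (record["description"] + " " + line[2:].strip()).strip()
        elif code == "SQ":
            in_sequence = True
    return record


# Hypothetical entry, invented for this example only.
SAMPLE = """ID   EXAMPLE_HUMAN   STANDARD;      PRT;    20 AA.
AC   X00001; X00002;
DE   Made-up example protein.
SQ   SEQUENCE   20 AA;  0 MW;  0 CRC64;
     MKTAYIAKQR QISFVKSHFS
//
"""

record = parse_swissprot_record(SAMPLE)
print(record["accessions"], record["sequence"])
```

In practice a maintained parser (BioPerl, as on the slide, or an equivalent library) would be preferable to hand-written code like this.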

Fusion Antibodies Commercial Use Case
[Workflow diagram]
- Input: Fasta file, e.g. MQNSHSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRY…………
- Steps: BlastP, SwissProt Query, Blast Formatter, TMHMM, SignalP, Eliminator, Bl2Seq, Eliminator
- Data passed between steps: Blast Format, Accession Numbers, Multiple Fasta Records, Multiple TMHMM Format, Multiple SignalP Format, Fasta Records
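In the use case, the Blast Formatter step sits between "Blast Format" and "Accession Numbers". The slides do not say which BLAST output format is used, so the following Python sketch rests on two assumptions: that the results are in BLAST's tab-separated tabular format (subject identifier in the second column), and that subject identifiers look like `sp|ACCESSION|NAME`. The function name and sample hits are hypothetical.

```python
def accessions_from_blast_tabular(text):
    """Return the distinct subject accessions from tabular BLAST output.

    Assumes tab-separated columns with the subject identifier in
    column 2. Identifiers of the form db|ACCESSION|NAME are reduced to
    ACCESSION; anything else is returned unchanged.
    """
    seen = set()
    ordered = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        subject_id = line.split("\t")[1]
        parts = subject_id.split("|")
        accession = parts[1] if len(parts) >= 2 else subject_id
        if accession not in seen:         # keep first-hit order, drop repeats
            seen.add(accession)
            ordered.append(accession)
    return ordered


# Made-up hits: query, subject, % identity, length, mismatches, gap opens,
# query start/end, subject start/end, e-value, bit score.
SAMPLE = "\n".join([
    "query1\tsp|X00001|EXA1_HUMAN\t98.5\t60\t1\t0\t1\t60\t1\t60\t1e-30\t120",
    "query1\tsp|X00002|EXA2_MOUSE\t75.0\t60\t15\t0\t1\t60\t5\t64\t1e-12\t70",
    "query1\tsp|X00001|EXA1_HUMAN\t60.0\t20\t8\t0\t30\t49\t80\t99\t0.01\t30",
])

print(accessions_from_blast_tabular(SAMPLE))
```

The resulting list is the kind of input the SwissProt Query step would need in order to return multiple Fasta records.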

Fusion Use Case – GDM Perspective
Steps: BlastP, SwissProt Query, Blast Formatter, TMHMM, SignalP, Eliminator, Bl2Seq, Eliminator

Querying SwissProt
[Diagram labels: Multiple Accession Numbers, Task Params, Fasta Record, GeneGrid Data Manager Service (for SwissProt), GeneGrid Data Manager Service (for GRIP), SwissProt, GRIP]
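The exchange on this slide, accession numbers in and Fasta records out, can be sketched in a few lines of Python. This is only an illustration: a plain dictionary stands in for SwissProt, and `to_fasta` and `fetch_fasta` are hypothetical names, not part of the GeneGrid Data Manager interface.

```python
def to_fasta(accession, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + accession + "\n" + "\n".join(lines) + "\n"


def fetch_fasta(accessions, database):
    """Look up each accession and return (multi-FASTA text, missing accessions).

    `database` is any mapping from accession number to sequence; here it is
    a stand-in for the real SwissProt query.
    """
    records = []
    missing = []
    for accession in accessions:
        sequence = database.get(accession)
        if sequence is None:
            missing.append(accession)     # report rather than fail on unknown IDs
        else:
            records.append(to_fasta(accession, sequence))
    return "".join(records), missing


# Made-up data for the example.
toy_database = {"X00001": "MKTAYIAKQRQISFVKSHFS", "X00002": "MSTNPKPQRKTKRNTNRRPQ"}
fasta_text, not_found = fetch_fasta(["X00001", "X00003", "X00002"], toy_database)
print(fasta_text)
print("not found:", not_found)
```

Returning the missing accessions separately lets a caller decide whether a partial result is acceptable for the next workflow step.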

Fusion Use Case – GeneGrid Perspective
Steps: BlastP, SwissProt Query, Blast Formatter, TMHMM, SignalP, Eliminator, Bl2Seq, Eliminator

Executing Bioinformatics Applications
[Diagram labels: Task Params, Input File, Result File, Multiple Fasta Records, GeneGrid Application Manager Service (for SignalP), GeneGrid Data Manager Service (for GRIP), GRIP]
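The labels on this slide suggest a common pattern: stage Fasta records as an input file, run an application with its task parameters, and keep the output as a result file. A minimal Python sketch of that pattern follows. It does not call SignalP or any GeneGrid service; `run_application` is a hypothetical helper, and the "application" is a stand-in one-liner that merely counts Fasta records.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_application(command, fasta_text, workdir):
    """Write the input file, run `command` on it, and save stdout as the result.

    `command` is the program plus its task parameters as a list; the path of
    the staged input file is appended as the final argument.
    """
    workdir = Path(workdir)
    input_file = workdir / "input.fasta"
    result_file = workdir / "result.txt"
    input_file.write_text(fasta_text)
    completed = subprocess.run(
        command + [str(input_file)],
        capture_output=True,
        text=True,
        check=True,                       # raise if the application fails
    )
    result_file.write_text(completed.stdout)
    return result_file


# Stand-in for a real tool: prints the number of FASTA records in the file.
STAND_IN = [sys.executable, "-c",
            "import sys; print(open(sys.argv[1]).read().count('>'))"]

with tempfile.TemporaryDirectory() as tmp:
    result = run_application(STAND_IN, ">X00001\nMKTAYIAKQR\n>X00002\nQISFVKSHFS\n", tmp)
    print(result.read_text().strip())
```

Keeping the input and result as files in a working directory mirrors the Input File and Result File labels on the slide; how GeneGrid actually moves them between services is not shown in the transcript.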

GeneGrid Landmarks
- One year into a two-year project
- Successfully integrated a number of bioinformatics applications
- Successfully integrated a number of bioinformatics data sets
- A number of papers accepted at various conferences (Computing & Bioinformatics)
- International collaboration with the EOL project (SDSC)

GeneGrid at All Hands
- "A Practical Workflow Implementation for a Grid Based Virtual Bioinformatics Laboratory": Session 4.4, Thur 2nd Sep, 14:10 – 15:50
- "Bioinformatics Application Integration and Management in GeneGrid: Experiments and Experiences": Session 6.4, Fri 3rd Sep, 11:05 – 13:10

GeneGrid Demonstrations
- Tuesday, 1st September: 18:15 – 20:15
- Thursday, 2nd September: 10:00 – 11:30 and 17:30 – 19:30
- Friday, 3rd September: 13:00 – 14:30