Published by Dina Byrd. Modified over 9 years ago.
1
Genomes to Grids: Thoughts on Building Data Grids for Biology
Biologists have discovered many millions of genes and genome features, which are now part of the bio-data "library" distributed on computers around the world. This talk discusses grid computing methods for finding and using interesting genome knowledge in this mountain of data: their promise, and the practical concerns of building usable bioinformatics grids.
Don Gilbert, gilbertd@bio.indiana.edu
2
Bio Databanks (EBI, Sept. 2002)
Many data objects; data sets are updated frequently (daily) --> keeping data current is a problem
3
Constellation of Bio-Data (SRS, Lion Bioscience)
Many databanks, variously structured, widely distributed and loosely federated: finding the “best data” is a problem
4
Genome database & info system components
Any genome database relies on, and feeds into, many other databases
5
BioGrid Schematic
Grid-aware client software; data and software directories; grid of processing computers
6
Moving Bio-Data on the Grid
1. @virtualdata = biodirectory("find protein coding sequences for Drosophila and Anopheles species");
2. @realdata = biodirectory("get locators for @virtualdata split $n ways");
3. for my $i (0 .. $n - 1) { copydata($realdata[$i], $gridcpu[$i]); runapp($gridcpu[$i]); }
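Steps 2 and 3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the BioGrid implementation: `split_n_ways` and `dispatch` are hypothetical helpers, and the `copy_data`/`run_app` callbacks stand in for the slide's `copydata()` and `runapp()`, which would stage data on a grid node and launch the application there. Each CPU receives one contiguous chunk, and chunk sizes differ by at most one.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_n_ways(items: Sequence[T], n: int) -> List[List[T]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(items), n)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def dispatch(
    locators: Sequence[str],
    grid_cpus: Sequence[str],
    copy_data: Callable[[List[str], str], None],  # hypothetical stand-in for copydata()
    run_app: Callable[[str], None],               # hypothetical stand-in for runapp()
) -> List[List[str]]:
    """Give each grid CPU one chunk of data locators, stage it there, then start the app."""
    chunks = split_n_ways(locators, len(grid_cpus))
    for chunk, cpu in zip(chunks, grid_cpus):
        copy_data(chunk, cpu)
        run_app(cpu)
    return chunks


if __name__ == "__main__":
    log = []
    dispatch(
        [f"seq{i}" for i in range(7)],
        ["cpu0", "cpu1", "cpu2"],
        copy_data=lambda chunk, cpu: log.append(("copy", cpu, len(chunk))),
        run_app=lambda cpu: log.append(("run", cpu)),
    )
    for step in log:
        print(step)
```

With 7 locators and 3 CPUs the chunks hold 3, 2 and 2 items. The loop handles the CPUs one after another for clarity; a real grid client would submit the staging and run steps concurrently through its job manager.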
7
Directories of Bio-Data
Directories are a necessary step toward usable grids of bio-data
–"Broad and shallow" directories federate the "narrow and deep" databases
Bio-Data Access Tools
SRS (Sequence Retrieval System); Entrez; AceDB; genome relational databases (Ensembl, FlyBase, WormBase); IBM DiscoveryLink; BioDAS; BioMoby
Directory services for data access tools
–Layer onto access tools for common query/retrieval of important data
–LDAP: mature, efficient for high volumes, supports queries over distributed directories; works well with bio-access tools
–Web Services: XML messages over the Web; wide industry support, but standards are still in progress
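To make the "broad and shallow" versus "narrow and deep" split concrete, here is a minimal Python sketch, assuming a directory that stores only a few summary fields per object plus a locator into the source database. `DirectoryEntry`, `BioDirectory`, `fetch_records`, the field names and the database labels are all hypothetical; a real deployment would keep the entries in LDAP or a Web Service and resolve locators through access tools such as SRS or Entrez.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DirectoryEntry:
    """A 'broad and shallow' record: a few summary fields plus a pointer to the full data."""
    object_id: str
    organism: str
    feature_type: str
    source_db: str  # the "narrow and deep" database that holds the full record
    locator: str    # hypothetical retrieval handle understood by that database's access tool


class BioDirectory:
    """Hypothetical in-memory directory federating several source databases."""

    def __init__(self) -> None:
        self._entries: List[DirectoryEntry] = []

    def register(self, entries: List[DirectoryEntry]) -> None:
        self._entries.extend(entries)

    def find(self, **criteria: str) -> List[DirectoryEntry]:
        """Return entries whose named fields equal every given value."""
        return [e for e in self._entries
                if all(getattr(e, field) == value for field, value in criteria.items())]


def fetch_records(entries: List[DirectoryEntry],
                  fetchers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Resolve each locator with the access tool registered for its source database."""
    return [fetchers[e.source_db](e.locator) for e in entries]


if __name__ == "__main__":
    directory = BioDirectory()
    directory.register([
        DirectoryEntry("g1", "Drosophila melanogaster", "CDS", "flydb", "flydb:g1"),
        DirectoryEntry("g2", "Anopheles gambiae", "CDS", "mosqdb", "mosqdb:g2"),
    ])
    hits = directory.find(feature_type="CDS")
    print(fetch_records(hits, {"flydb": str.upper, "mosqdb": str.upper}))
```

The directory answers the discovery query cheaply from its summary fields; only the selected entries touch the deep databases.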
8
Bio-data Directory Needs
–Build on existing technology for finding distributed objects
–Efficient for millions of objects, by the gigabyte and terabyte
–Queries distributed across directories of collaborating services
–Support existing and new bioinformatics data access (relational dbs, object and XML dbs, SRS, Entrez, AceDB)
–Simple client program methods for computable use of directories
–Flexible, common schema for describing objects
–Replicate directories and objects among bioinformatics centers
–Peer-to-peer directories for collaborative projects
–Strong authentication and security for data access
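One simple way to combine directory replicas held at different bioinformatics centers is to merge them by object ID and keep the most recently modified entry. The sketch below assumes that rule (last-writer-wins, a common simple policy that the slides do not specify) and integer modification timestamps; `merge_replicas` and the data layout are hypothetical. Production directory servers provide their own replication mechanisms.

```python
from typing import Dict, Tuple

# A replica maps object id -> (modification timestamp, attribute dict).
Replica = Dict[str, Tuple[int, Dict[str, str]]]


def merge_replicas(local: Replica, remote: Replica) -> Replica:
    """Last-writer-wins merge: for each object id keep the entry with the newer timestamp.

    On equal timestamps the local entry is kept, so the result can depend on argument order.
    Neither input is modified.
    """
    merged: Replica = dict(local)
    for object_id, (stamp, attrs) in remote.items():
        if object_id not in merged or stamp > merged[object_id][0]:
            merged[object_id] = (stamp, attrs)
    return merged


if __name__ == "__main__":
    center_a: Replica = {
        "seq1": (100, {"organism": "Drosophila melanogaster"}),
        "seq2": (200, {"organism": "Anopheles gambiae"}),
    }
    center_b: Replica = {
        "seq1": (150, {"organism": "Drosophila melanogaster", "note": "revised"}),
        "seq3": (120, {"organism": "Anopheles gambiae"}),
    }
    for object_id, entry in sorted(merge_replicas(center_a, center_b).items()):
        print(object_id, entry)
```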
9
Directory technology: LDAP, Web Services, and/or?
LDAP
–Object-centric, optimized for efficient read operations
–Hierarchical, network-able, distributed and replicated in nature
–Has many features needed for bio-data access
Web/XML
–SOAP+: SOAP for directory requests, WSDL to describe the directory repository's interface, UDDI to locate the service (some assembly still required…)
–UDDI is a potential match to LDAP as a directory technology
–DSML: a layer on top of LDAP for Web/XML interoperability
Peer-to-peer (JXTA)? Grid SQL? XML-query systems?
–Possible future directory technology
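LDAP queries are written as search filters. The sketch below builds an AND filter from attribute/value pairs and escapes the characters that RFC 4515 reserves in filter values (`*`, `(`, `)`, `\` and NUL) as a backslash plus two hex digits. The helper names and the attribute names in the example (`organism`, `featureType`) are hypothetical, not a published bio-data schema.

```python
from typing import Dict

_RESERVED = "\\*()\x00"  # characters RFC 4515 requires to be escaped in filter values


def escape_filter_value(value: str) -> str:
    """Replace each reserved character with a backslash and its two-digit hex code."""
    return "".join("\\%02x" % ord(ch) if ch in _RESERVED else ch for ch in value)


def and_filter(criteria: Dict[str, str]) -> str:
    """Build (&(attr=value)...) requiring an exact match on every attribute."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    parts = "".join(f"({attr}={escape_filter_value(value)})" for attr, value in criteria.items())
    return f"(&{parts})"


def prefix_filter(attr: str, prefix: str) -> str:
    """Build a substring filter matching values that start with prefix.

    The wildcard is appended after escaping, so a '*' inside prefix stays literal.
    """
    return f"({attr}={escape_filter_value(prefix)}*)"


if __name__ == "__main__":
    print(and_filter({"organism": "Drosophila melanogaster", "featureType": "CDS"}))
    print(and_filter({"description": "kinase (putative)"}))
    print(prefix_filter("organism", "Anopheles"))
```

Escaping before adding the wildcard keeps user-supplied text from changing the structure of the query.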
10
BioDirectory Tests
–SRS bioinformatics data retrieval system, for efficient retrieval of millions of bio-objects
–OpenLDAP for high performance, and JavaLDAP for an easy-to-configure directory transport
–GLUE and Jakarta/Tomcat for Web Services tests
–DSML (Directory Services Markup Language) for XML/LDAP conversion
–Test queries: 20,000 to 1.2 million biosequence objects from GenBank, SwissProt and related dbs
IUBio SRS Server + LDAP, Web Services --> bio-object directory search/retrieval
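DSML represents LDAP directory entries as XML. The sketch below converts (DN, attributes) pairs into a document using the element names of DSML version 1 (`directory-entries`, `entry`, `objectclass`/`oc-value`, `attr`/`value`) via Python's standard `xml.etree.ElementTree`; verify the element names against the DSML 1.0 specification before relying on it. `entries_to_dsml`, the example DN and the `bioSequence` object class are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

DSML_NS = "http://www.dsml.org/DSML"  # DSML version 1 namespace
ET.register_namespace("dsml", DSML_NS)

Entry = Tuple[str, Dict[str, List[str]]]  # (distinguished name, attribute -> values)


def _q(tag: str) -> str:
    """Qualify a tag name with the DSML namespace for ElementTree."""
    return f"{{{DSML_NS}}}{tag}"


def entries_to_dsml(entries: List[Entry]) -> str:
    """Serialize LDAP-style entries as a DSML directory-entries document."""
    root = ET.Element(_q("dsml"))
    container = ET.SubElement(root, _q("directory-entries"))
    for dn, attrs in entries:
        entry = ET.SubElement(container, _q("entry"), {"dn": dn})
        for name, values in attrs.items():
            if name.lower() == "objectclass":
                # DSML v1 gives object classes their own element
                holder = ET.SubElement(entry, _q("objectclass"))
                child_tag = "oc-value"
            else:
                holder = ET.SubElement(entry, _q("attr"), {"name": name})
                child_tag = "value"
            for value in values:
                ET.SubElement(holder, _q(child_tag)).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(entries_to_dsml([
        ("seqid=seq1,ou=sequences,o=example", {
            "objectClass": ["top", "bioSequence"],  # hypothetical object class
            "organism": ["Drosophila melanogaster"],
        }),
    ]))
```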
12
Using Bio Directories
Simple client software, for automated use and for people
–Discovery: search by many criteria
–Retrieve bulk subsets
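A client that serves both discovery and automated bulk retrieval might look like the sketch below. One function filters records by several criteria, with a trailing `*` meaning prefix match (an assumption borrowed from LDAP-style wildcards). Another returns the matches in fixed-size batches so that large subsets can be retrieved without holding them all at once. `search`, `in_batches` and the record fields are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
Record = Dict[str, str]


def _matches(have: str, want: str) -> bool:
    """Exact match, or prefix match when the wanted value ends in '*'."""
    if want.endswith("*"):
        return have.startswith(want[:-1])
    return have == want


def search(records: Iterable[Record], **criteria: str) -> Iterator[Record]:
    """Lazily yield records that satisfy every criterion; missing fields never match."""
    for rec in records:
        if all(field in rec and _matches(rec[field], want)
               for field, want in criteria.items()):
            yield rec


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items until the input is exhausted."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    data = [{"id": f"s{i}",
             "organism": "Drosophila melanogaster" if i % 2 else "Anopheles gambiae",
             "type": "CDS"} for i in range(5)]
    for batch in in_batches(search(data, organism="Dros*", type="CDS"), 2):
        print([r["id"] for r in batch])
```

Because `search` is a generator, batching pulls only as many matches as each batch needs, which suits automated clients walking through very large result sets.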
13
BioGridRunner
A Globus CoG Kit application for bioinformatics
http://iubio.bio.indiana.edu/biogrid/runner/
14
Wrap up
Future of Bio-data on Grids
–Globus Toolkit is useful for data- and compute-intensive bio-grid tools (BLAST, HMMER, MEME, others)
–High-volume, complex, changing, distributed data
–Add methods to find and move data around the grid, using directories of objects
–LDAP works well; Web/XML is usable but still being defined
Bio. Community needs and uses
–Common data descriptions, schema, ontologies
–Simple, practical, flexible grid methods; use existing dbs
See http://iubio.bio.indiana.edu/biogrid/