Genome Data Directories Don Gilbert, May 2003.

Slides:



Advertisements
Similar presentations
The Quantum Chromodynamics Grid James Perry, Andrew Jackson, Matthew Egbert, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
Advertisements

Abstraction Layers Why do we need them? –Protection against change Where in the hourglass do we put them? –Computer Scientist perspective Expose low-level.
Architectural Constraints on Current Bioinformatics Integration Systems Norman Paton Department of Computer Science University of Manchester Manchester,
Welcome to Middleware Joseph Amrithraj
High Performance Computing Course Notes Grid Computing.
Argos & Genome Directories & Lucegene (‘Lucy Jean’) A Replicable Genome infOrmation System of Common Components GMOD Meeting, Sept Don Gilbert,
Active Directory: Final Solution to Enterprise System Integration
Introduction to Web services MSc on Bioinformatics for Health Sciences May 2006 Arnaud Kerhornou Iván Párraga García INB.
Distributed Heterogeneous Data Warehouse For Grid Analysis
June 15 SRS - A Backbone for Genome Information and Data Grid Systems Don Gilbert Indiana University
Chapter Goals Describe client/server and multi-tier application architecture and discuss their advantages compared to centralized applications Explain.
Workshop on Cyber Infrastructure in Combustion Science April 19-20, 2006 Subrata Bhattacharjee and Christopher Paolini Mechanical.
Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information PI: Joseph JaJa Co-PIs: Allison Druin and Doug Oard Major.
Systems Architecture, Fourth Edition1 Internet and Distributed Application Services Chapter 13.
2 Systems Architecture, Fifth Edition Chapter Goals Describe client/server and multi-tier application architecture and discuss their advantages compared.
Nikolay Tomitov Technical Trainer SoftAcad.bg.  What are Amazon Web services (AWS) ?  What’s cool when developing with AWS ?  Architecture of AWS 
By Karan Oberoi.  A directory service (DS) is a software application- or a set of applications - that stores and organizes information about a computer.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Genome database & information system for Daphnia Don Gilbert, October 2002 Talk doc at
Course Module: Introduction to Bioinformatics – CS 2001 July CS Databases.
Argos & Genome Directories & Lucegene (‘Lucy Jean’) A Replicable Genome infOrmation System of Common Components GMOD Meeting, Sept Don Gilbert,
WFleaBase Daphnia Genome Database from Common Components Daphnia Genomic Consortium Meeting, Sept Don Gilbert,
A Close Look Inside the SharePoint Engine Randy Williams, MVP MOSS Synergy Corporate Technologies
Genomes to Grids Bio Data Distribution for Grid Computing Biologists have discovered many millions of genes and genome features, now part of the bio-data.
© Geodise Project, University of Southampton, Data Management in Geodise Jasmin Wason, Zhuoan Jiao and Marc Molinari Engineering.
UDDI ebXML(?) and such Essential Web Services Directory and Discovery.
A Replicable Model Organism Information System FlyBase next-generation Don Gilbert, May 2003.
EMBRACE Web Services Taavi Hupponen CSC – Center for Scientific Computing, Finland BOSC 2007.
© Geodise Project, University of Southampton, Data Management in Geodise Zhuoan Jiao, Jasmin Wason and Marc Molinari
A. Cavalli - F. Semeria INFN Experience With Globus GIS 1 A. Cavalli - F. Semeria INFN First INFN Grid Workshop Catania, 9-11 April 2001 INFN Experience.
San Diego Supercomputer Center National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center National Partnership for Advanced.
From the World Directory of Collections of Cultures of Microorganisms towards Mash-up of Biodiversity and Sequence Databases Director, WFCC-MIRCEN World.
1 A National Virtual Specimen Database for Early Cancer Detection June 26, 2003 Daniel Crichton NASA Jet Propulsion Laboratory Sean Kelly NASA Jet Propulsion.
OEI’s Services Portfolio December 13, 2007 Draft / Working Concepts.
DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENNTS Leon Guzenda Chief Technology Officer.
UKQCD QCDgrid Richard Kenway. UKQCD Nov 2001QCDgrid2 why build a QCD grid? the computational problem is too big for current computers –configuration generation.
Implementing LDAP Client/Server System for Directory Service By Maochun Sun Project Advisor: Dr. Chung-E Wang Department of Computer Science California.
SEEK EcoGrid l Integrate diverse data networks from ecology, biodiversity, and environmental sciences l Metacat, DiGIR, SRB, Xanthoria,... l EML is the.
1 Windows 2008 Configuring Server Roles and Services.
The Anatomy of the Grid Introduction The Nature of Grid Architecture Grid Architecture Description Grid Architecture in Practice Relationships with Other.
10/25/20151 Single Sign-On Web Service Supervisors: Viktor Kulikov Alexander Sherman Liana Lipstov Pavel Bilenko.
LDAP (Lightweight Directory Access Protocol ) Speaker: Chang-Yu Wu Adviser: Quincy Wu Date:2007/08/22.
Genomes to Grids Thoughts on Building Data Grids for Biology Biologists have discovered many millions of genes and genome features, now part of the bio-data.
The HEP White Pages Project Ray Jackson CERN / IT - Internet Services Group 23rd April HEPiX/HEPNT Conference, LAL-Orsay, France.
Cole David Ronnie Julio. Introduction Globus is A community of users and developers who collaborate on the use and development of open source software,
ARGOS (A Replicable Genome InfOrmation System) for FlyBase and wFleaBase Don Gilbert, Hardik Sheth, Vasanth Singan { gilbertd, hsheth, vsingan
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America gLite Information System Claudio Cherubino.
Internet Search Tools Understand Internet search tools and methods.
An approach to Web services Management in OGSA environment By Shobhana Kirtane.
Brian Matthews, euroCRIS, 18/09/03 CRIS architecture to support an ERA Brian Matthews.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
Directory Services CS5493/7493. Directory Services Directory services represent a technological breakthrough by integrating into a single management tool:
A Research Collaboratory for Open Source Software Research Yongqin Gao, Matt van Antwerp, Scott Christley, Greg Madey Computer Science & Engineering University.
Life Science Identifiers Chris Wroe (based on material from myGrid team and IBM Life Sciences)
1 CEG 2400 Fall 2012 Directory Services Directory Services eDirLDAP Active Directory.
5/29/2001Y. D. Wu & M. Liu1 Content Management for Digital Library May 29, 2001.
Internet and Distributed Application Services
The LIBI Federated database
SuperComputing 2003 “The Great Academia / Industry Grid Debate” ?
High Performance Computing Lab.
Bioinformatics Tools for Comparative Genomics of Vectors
Understand Internet Search Tools
Wsdl.
1.01- Understand Internet search tools and methods.
1.01- Understand Internet search tools and methods.
Lesson 3 Bioinformatics Laboratory
ACTIVE DIRECTORY An Overview.. By Karan Oberoi.
1.01- Understand Internet search tools and methods.
1.01- Understand Internet search tools and methods.
Presentation transcript:

Genome Data Directories Don Gilbert, May 2003

Data Access Needs Computable genome data access -- Page scraping and bulk files not enough -- Internet search & retrieval of all genome objects distributed among many sources -- Simple, flexible client program model -- Efficient for high volumes (10 5 objects; >1 GB sizes)

Directories of Genome Data Directories are a necessary step for bio grids "broad and shallow" directories federate the "narrow and deep" databases Bio-Data Access Tools SRS, Sequence Retrieval System; Entrez ; AceDB; Genome relational databases (Ensembl, FlyBase, WormBase) ; IBM DiscoveryLink; BioDAS ; BioMoby Directory services for data access Layer onto access tools for common query/retrieval LDAP: mature, efficient for high volumes, query distributed directories ; works well with bio-access tools Web Services: XML messages over Web ; wide industry support, standards are in progress

Directory components

Genome Directory Needs Build on existing technology for finding distributed objects Efficient for millions of objects, by the gigabyte and terabyte Queries distributed across directories of collaborating services Support existing and new bioinformatics data access (relational dbs, object and XML dbs, SRS, Entrez, AceDB) Simple client program methods for computable use of directories Flexible, common schema for describing objects Replicate directories and objects among bioinformatics centers Peer-to-peer directories for collaborative projects Strong authentication and security for data access

Directory Standards Open Grid Services Architechture (OGSA) SOAP based; query support for XML-SQL and Xquery. Data Access project: Lightweight Directory Access (LDAP) Robust system for distributed search and retrieval Object-centric, optimized for efficient read operations Hierarchical, distributed and replicated in nature Life Sciences ID (LSID) new standard for bio-object naming, with LDAP and WebServices implementations Moby project web services repository system

Directory Tests Design and test distributed access with LDAP and Web Services SRS backend for efficient search/retrieval from GenBank, SwissProt/TrEMBL, LocusLink, Medline, many others Find & fetch 20,000 to 1.2 million objects LDAP is ~10x faster than WebServices Test in progress for FlyBase

Directory Tests

Genome Directory Issues Directory tests at