Dr Richard Sinnott Technical Director National e-Science Centre ||| Deputy Director Technical Bioinformatics Research Centre University of Glasgow

Presentation transcript:

BRIDGES Status Report
Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director Technical, Bioinformatics Research Centre
University of Glasgow
18th March 2004

Overview
- Review goals of the BRIDGES project
- Briefly summarise technical approach
- Outline achievements thus far
- Demonstration
- Plans for the future

BRIDGES Goals
- High blood pressure affects 25% of adults in western societies
- The Cardiovascular Functional Genomics (CFG) project is investigating this through physiological models of hypertension in the rat
- BRIDGES is a supporting project to CFG and will provide Grid infrastructure to facilitate scientific research
- CFG project partners are distributed but need to access and integrate various software and, especially, data resources
- Main aims of BRIDGES are to develop re-useable infrastructure to provide data federation incorporating appropriate security concerns

CFG Partner Distribution
[Diagram: six partner sites (Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands), each labelled with private data, alongside shared data and public curated data.]

Problems to be addressed
BRIDGES will address the following problems facing CFG biologists:
- How to integrate data with multiple levels of security, including public data, project-only data and private data?
- How to search multiple distributed databases through single optimised queries?
- How to use multiple tools in a coordinated (and automated) manner, e.g. how to develop re-useable workflows for the CFG scientists?
- How to integrate a range of bioinformatics analysis and visualisation tools, e.g. BLAST, genome browsers, etc.?
- How to deal with inconsistencies of online databases and possible “dirty data”?
- How to get more “up to date” data?
- How to make it all user friendly?
  - portals
  - hidden infrastructure, e.g. security authorisation
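The first problem above (public, project-only and private data) can be illustrated with a small filter. This is only a sketch under stated assumptions: the `Visibility` levels, the `Record` type and the `visible_records` function are hypothetical names invented here, and the rule that private data is visible only to its owning site is an assumption, not something the slides specify.

```python
from dataclasses import dataclass
from enum import IntEnum


class Visibility(IntEnum):
    PUBLIC = 0   # public curated data
    PROJECT = 1  # shared within the project only
    PRIVATE = 2  # assumed here to be visible only to the owning site


@dataclass(frozen=True)
class Record:
    identifier: str
    visibility: Visibility
    owner_site: str


def visible_records(records, user_site, is_project_member):
    """Return the records a user may see under the three-level scheme."""
    allowed = []
    for record in records:
        if record.visibility == Visibility.PUBLIC:
            allowed.append(record)
        elif record.visibility == Visibility.PROJECT and is_project_member:
            allowed.append(record)
        elif record.visibility == Visibility.PRIVATE and record.owner_site == user_site:
            allowed.append(record)
    return allowed


# Made-up records, one per security level.
records = [
    Record("public_extract", Visibility.PUBLIC, "Glasgow"),
    Record("shared_set", Visibility.PROJECT, "Edinburgh"),
    Record("site_only_set", Visibility.PRIVATE, "Oxford"),
]
```

A project member at Oxford would see all three records in this toy data, while an outside user would see only the public extract.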

Planned Approach
BRIDGES will address these problems through:
- Development of re-useable Grid services based upon GT3 technologies
- Virtualisation of multiple distributed data sets to provide a single virtual data set for use by the biologists, exploiting IBM’s DiscoveryLink
- Developing a collection of data on a well-managed platform, including copies of extracts of relevant public data, all project data, and the required software tools (administered using DB2 and DiscoveryLink)
- Access to and integration of multiple distributed data sets in a Grid environment using results from the OGSA-DAI/DAIT projects
- A secure environment offering authentication and authorisation
  - will build on results of the PERMIS security authorisation project
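The idea of a single virtual data set over several distributed sources can be sketched as follows. This is not the DiscoveryLink or OGSA-DAI interface; `FederatedView`, the source names and the sample rows are all invented for illustration, and real sources would be database wrappers rather than in-memory dictionaries.

```python
class FederatedView:
    """Present several data sources as one virtual data set (illustrative only)."""

    def __init__(self, sources):
        # sources: mapping of source name -> callable(gene_id) -> list of dict rows
        self._sources = dict(sources)

    def query(self, gene_id):
        """Ask every source for rows about gene_id and merge the answers."""
        merged = []
        for name, lookup in self._sources.items():
            for row in lookup(gene_id):
                # Tag each row with its origin so users can judge provenance.
                merged.append({**row, "source": name})
        return merged


# Two stand-in sources with made-up content: a local copy and a remote site.
local_copy = {"gene1": [{"gene": "gene1", "symbol": "Abc1"}]}
remote_db = {"gene1": [{"gene": "gene1", "qtl": "example_qtl"}]}

view = FederatedView({
    "local_repository": lambda gene_id: local_copy.get(gene_id, []),
    "remote_site": lambda gene_id: remote_db.get(gene_id, []),
})
```

Calling `view.query("gene1")` returns one merged list with a row from each source, which is the behaviour a biologist would see from a virtual data set without needing to know where each row lives.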

BRIDGES team
- Project Management: Richard Sinnott, Dave Berry
- Database Design/Development: Derek Houghton
- Grid Services Developers: Micha Bayer, Magnus Ferrier
- Technical Input: David White, Jean-Christophe Mestres, Andy Knox, Emmanuel Guyonnet (IBM), Ela Hunt (Glasgow), Neil Hanlon (Glasgow)
- Profs David Gilbert, Malcolm Atkinson, Anna Dominiczak

Achievements
- Web site and project portal established
- Engaged with CFG consortia
- Staff trained in relevant technologies
  - GT3, DiscoveryLink, Condor
- Initial version of local repository developed
  - Populated with data that cannot be federated, e.g. public data sets with no programmatic interface: Ensembl/EMBL-EBI, NCBI GENBANK, REFSEQ, Gene Expression Omnibus, UCSC, SwissProt/TrEMBL, UniSTS/dbSTS, UNIGENE, LOCUSLINK, GENMAPP, OMIM, Sanger, dbSNP, dbEST, InterPro, Pfam, Prints, Cath, SCOP, ProSite, Weizmann Institute, PDB, RIKEN, Rat Genome DB, Mouse Atlas, Affymetrix, …
  - Includes shared data sets of CFG scientists: QTL DB, …

Achievements …ctd
- GT3-based Grid services offered that make use of these local data sets
  - Grid-enabled BLAST services produced, offering access to large e-Science infrastructures at Glasgow (ScotGrid)
  - SyntenyVista tool extended to allow Grid-enabled visual navigation of genomic data sets
  - Planned front end for many other tools
- Externally
  - Poster at AHM 2003
  - Tutorial submitted to ISMB/ECCB (the major bioinformatics conference)
  - Liaising with other projects: eDIKT, myGrid, GeneGrid, PERMIS, ...

Achievements …ctd
- Demonstration of some of the achievements

Plans
- Refine/extend requirements
  - Further refinement of use cases & scenarios
  - More data sets (public, shared, private, …)
- Implementation and realisation of further use cases
  - e.g. extended query services for microarray data interpretation, workflows for probe set mapping, …
- Security realisation and roll-out
  - We can only help share CFG data sets if we can get SECURE access to them; following up with CFG sites
  - Authorisation with PERMIS coming
  - GSI-based authentication
- Investigate application of replication manager (RLS)
  - Should support illusion of data from each site being available to all other sites
- Further Grid-based data visualisation services accessible via SyntenyVista
- Ensure that we keep track of relevant developments (WSRF, GT4, …)

Future Vision of Tools via Portal
[Figure: drill-down functions to sequence, to multiple alignment and to tabular summaries.]

Questions?

Other Scenarios to be Realised
Manual microarray data interpretation:
- For each probe, visit Affymetrix web pages
- For each probe, follow links to OMIM, HUGO, Ensembl, PubMed, GeneCards, …
- For each probe, look up map positions
- For each probe, select papers from PubMed and print out the papers
- Examine the data sets at RIKEN, ArrayExpress, or other expression databases
- Examine any other data
- Correlate other results with our data
Use of BRIDGES technologies will make it possible to:
- link multiple remote/local data sets and run queries over those data sets
- automate processes for dealing with responses, e.g. workflows…

Other Scenarios …ctd
Design PCR probes for a large number of microarray hits:
- query Affymetrix database
  - for each probe name -> find target sequence (sequence from which probes were designed)
  - for each probe, BLAST at www.ensembl.org for rat, human, mouse genes, and rat, human, mouse genomic sequence
- record map locations
- record real gene sequences
Takes 5-10 minutes per probe, human intensive
Relies on shared resources: Ensembl, Affymetrix
BRIDGES offers:
- Local resources (ScotGrid, others) available for BLAST’ing
- Local data repository for most up to date/relevant CFG data sets
- Automated processes to realise these (and other) scenarios
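The per-probe steps above lend themselves to automation. The sketch below shows the loop structure only: `run_probe_workflow` and `manual_effort_minutes` are hypothetical names, the lookup and BLAST steps are passed in as functions because the real Affymetrix and Ensembl interfaces are not described in the slides, and the split between gene and genomic searches is simplified to one search per species. The second function simply scales the quoted 5-10 minutes per probe.

```python
def run_probe_workflow(probe_names, find_target_sequence, blast,
                       species=("rat", "human", "mouse")):
    """For each probe: look up its target sequence, then search each species."""
    results = {}
    for probe in probe_names:
        sequence = find_target_sequence(probe)
        results[probe] = {
            "target_sequence": sequence,
            "hits": {name: blast(sequence, name) for name in species},
        }
    return results


def manual_effort_minutes(probe_count, low=5, high=10):
    """Range of manual effort, using the quoted 5-10 minutes per probe."""
    return probe_count * low, probe_count * high


# Stand-in lookup and search functions with made-up data.
fake_sequences = {"probe_a": "ACGT"}
results = run_probe_workflow(
    ["probe_a"],
    find_target_sequence=fake_sequences.__getitem__,
    blast=lambda sequence, species: f"{species}:{len(sequence)}",
)
```

As a worked figure, `manual_effort_minutes(100)` gives 500 to 1000 minutes, i.e. roughly 8 to 17 hours of manual work for 100 probes, which is the cost the automated processes are meant to remove.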

Initial Deadlines and Deliverables
Started October 2003 (when full team on board!)
WP1 - ends M3
- Hold 2-day workshop with all participants (CFG leaders, IBM specialists, bioinformaticians, team members)
  - Agree on team training
  - Schedule installation of software infrastructure
  - Develop an architecture
  - Choose initial set of use cases and identify test data sets/analysis tools used to establish that the system is functional
- Outcome of WP1 is an initiated UP with initial architecture, set of use cases and initial system design
  - D1.1 List of documented use cases
  - D1.2 Architecture Definition
  - D1.3 Plan and prototypes for UP cycle

Initial Deadlines and Deliverables …ctd
WP2 - starts M3, ends M6
- Develop collection of data on well-managed platform, including copies of extracts of relevant public data, all project data, and required software tools
  - Administer this using DB2 and DiscoveryLink
- Use Grid technology to farm out workloads
  - Should make use of large e-Science computational infrastructures (ScotGrid, …)
  - Data manager will work with the CFG researchers to migrate the relevant subset of their data into required form
- Outcome of WP2: enlarged set of refined use cases; operational system; subset of research data organised so it can be used by bioinformaticians
  - D2.1 Updated list of documented use cases
  - D2.2 Working system at Glasgow and Edinburgh
  - D2.3 Report on cycle 1: experience, lessons & issues
  - D2.4 Plan and base system for UP cycle 2
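Farming out workloads usually starts by splitting a large set of inputs into independent batches, one per job. The following is a minimal sketch of that step only; `split_into_batches` and the query names are invented here, and how each batch is submitted to ScotGrid or another scheduler is deliberately left out because the slides do not describe it.

```python
def split_into_batches(items, batch_size):
    """Split a list of work items into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Seven made-up query names split into jobs of at most three queries each.
queries = [f"query_{n}" for n in range(1, 8)]
batches = split_into_batches(queries, 3)
```

Each batch can then be handed to a separate job, so the batch size becomes the knob that trades per-job overhead against parallelism.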

System Usage Scenario
Usage of extended SyntenyVista and BLAST service
[Diagram labels: BRIDGES Portal; Data Repository; Client; Site X; Secure access for CFG VO; Shared/Private Data Sets; Personalised Services; BLAST; Smith W; SV; DL; OGSA-DAI; QTL DB; CONDOR POOL???]
- Authorisation: per user, per site
- Data remote?
- Browser-based clients…
- Java App downloaded (via WebStart)
- Relevant data sets copied onto ScotGrid and correctly formatted
- Export interesting data

Security
- Authentication via X.509 certificate-based PKI
  - Embedded in browsers
- Authorisation via PERMIS
  - PERMIS working with GLOBUS team to define Security Assertion Markup Language (SAML) interface to GT3
  - PERMIS SAML interface already implemented; now waiting for GT3 to support this interface
  - Likely early April (von Welch)
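The split between authentication (who the caller is) and authorisation (what the caller may do) can be sketched with a simple role-based check. This is not the PERMIS API or policy format; the subject names, roles, targets and the `is_authorised` function are all hypothetical, and authentication is reduced to a boolean that a real system would derive from certificate validation.

```python
# Hypothetical policy data: subject names, roles and targets are invented.
ROLE_ASSIGNMENTS = {
    "CN=Example Biologist,O=Example Site": {"cfg_member"},
    "CN=Example Curator,O=Example Site": {"cfg_member", "data_manager"},
}
ROLE_PERMISSIONS = {
    "cfg_member": {("read", "shared_data")},
    "data_manager": {("read", "shared_data"), ("write", "shared_data")},
}


def is_authorised(subject, action, target, authenticated):
    """Grant access only if the caller is authenticated and one of the
    caller's roles permits the (action, target) pair."""
    if not authenticated:
        return False
    roles = ROLE_ASSIGNMENTS.get(subject, set())
    return any(
        (action, target) in ROLE_PERMISSIONS.get(role, set())
        for role in roles
    )
```

Keeping the two tables separate means a site can change who holds a role without touching what the role permits, which is the main appeal of a role-based policy.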

Where we are today!
- Web site and project portal established
- DiscoveryLink (Information Integrator) DB repository established and being populated
  - … with public data sets (data warehousing)
  - … links to Ensembl (federated data)
  - … with local CFG VO shared data sets (QTL DB)
- Grid services developed (BLAST, …)
  - Through the pain barrier of GT3!
  - General usage of ScotGrid
  - OpenPBS job submission from client with data staging
- Extended SyntenyVista to work with remote data sets
- Gaining experience with security technologies
  - Setting up policies with PERMIS etc.