NGS induction --- case study: the BRIDGES project
Micha Bayer, Grid Services Developer, BRIDGES project, National e-Science Centre, Glasgow Hub

The BRIDGES project
- Biomedical Research Informatics Delivered by Grid-Enabled Services
- 2-year e-Science project, started 1 October 2003
- aim: provide data integration and grid-based compute power for the Cardiovascular Functional Genomics (CFG) project
- the CFG project investigates genetic predisposition to hypertensive heart disease
- my role on the project: develop grid applications for end users

BRIDGES requirements and the NGS
functional:
- high-throughput compute tasks, e.g. large BLAST jobs
non-functional:
- interfaces to applications should be targeted at the less computer-literate --- users range in computer literacy from fairly advanced to mildly technophobic
- security requirements should not cause users any extra work or inconvenience, as this may put them off altogether
- resources provided by BRIDGES compete with familiar, similar resources already on offer at established bioinformatics institutions (EBI, NCBI, EMBL), so the service needs to be palatable enough that people actually use it

How to get your job onto the NGS
[Diagram: routes onto the NGS clusters at Leeds, Oxford, RAL and Manchester. Standard solutions: the NGS portal and GSI-SSH. Custom solutions: a project portal or a standalone GUI client.]

Custom grid applications
- if possible/appropriate, get a developer to write a bespoke interface to a grid application running on the NGS
- only worthwhile if the application is used frequently and/or by many users, and is relatively unchanging/simple
- best to hide the complexity of the grid from users altogether
- users should not even have to choose between resources
- automatic scheduling of jobs to resources that currently have spare capacity is desirable
- best option for delivery is a portlet in a project-specific web portal --- users then need only a web browser for access

Project web portals
- portals are configurable, personalised collections of web applications delivered to a web browser as a single page
- the NGS encourages projects to maintain their own web portals to deliver applications to their users
- applications can then be provided through user-friendly, application-specific portlet interfaces
- this allows the complexity of the grid to be hidden from users
- requires developer time
- the BRIDGES portal currently uses IBM WebSphere (free to academia)

More on portals
- an increasingly important technology --- not just for grid computing (cf. Yahoo)
- gives end users a customised view of software and hardware resources specific to their particular application domain
- also provides a single point of access to grid-based resources following user authentication ("single sign-on")
- content is provided by portlets (an extension of Java servlets); the JSR 168 standard provides for exchangeability between portal containers (a minimal example follows)
- some portal packages currently available: IBM WebSphere, GridSphere, Jetspeed, uPortal, jPortlet, Apache Pluto
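For orientation, a minimal JSR 168 portlet looks like the sketch below. The class name and markup are illustrative, not BRIDGES code; only the GenericPortlet/doView structure is mandated by the standard.

```java
import java.io.IOException;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

// Minimal JSR 168 portlet: the portal container calls doView() to render
// this portlet's fragment of the aggregated portal page.
public class HelloGridPortlet extends GenericPortlet {
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        response.setContentType("text/html");
        response.getWriter().println("<p>Welcome to the BRIDGES portal.</p>");
    }
}
```

Because the interface is standardised, a portlet written this way can in principle be dropped into any of the containers listed above.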

Authentication and User Management (1)
The model adopted in BRIDGES:
- the requirement was for users not to have to obtain and manage certificates
- we applied for a single project account at the NGS --- users do not need individual NGS accounts
- this account maps to a single user ("BRIDGES") on the NGS with home directories on all nodes (like normal users)
- authentication for this user on the NGS is by means of the host certificate of the machine the jobs are submitted from (under the control of the BRIDGES project)
- users authenticate via the BRIDGES web portal using standard username/password pairs

Authentication and User Management (2)
- users can create accounts for themselves in the BRIDGES WebSphere portal ("self-care")
- alternatively, one could of course issue users with usernames and passwords
- the information gathered is kept in WebSphere's secure user database
- the current information is very basic but will be extended to include more detail (e.g. the URL of the user's project or departmental website where the user is listed)
- provides at least a basic means of accounting for user activity
- no need to physically visit a Registration Authority or present ID
- we may need to resort to stricter security if the system is abused, e.g. if impersonation takes place

Authorisation with PERMIS
- PERMIS is grid authorisation software developed at Salford University
- BRIDGES uses PERMIS to grant users differential access to resources
- typical use is with a GT3.3 service, but lookup-style use is also possible with other services (in our case GT3.0.2)
- code in our service calls a PERMIS authorisation service running on a machine at NeSC
- the user's roles are queried and access to the resource is permitted or denied accordingly
- gives BRIDGES staff full control over who is allowed to use NGS resources through our applications
[Diagram: end user -> BRIDGES service -> PERMIS authorisation service at NeSC, controlling access to the NGS clusters (Leeds, Oxford, RAL, Manchester), ScotGRID and the NeSC Condor pool.]
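To make the call-out step concrete: the sketch below shows only the control flow the slide describes (query the user's roles via the remote service, then permit or deny). `PermisClient` and its method are hypothetical stand-ins, not the actual PERMIS API.

```java
// Hypothetical sketch of the authorisation step described above.
// PermisClient is a made-up stand-in for the client side of the PERMIS
// authorisation service; only the deny-by-default control flow is real.
public class BlastServiceGuard {

    interface PermisClient {
        /** Ask the remote PERMIS service whether this user may perform the action. */
        boolean isAuthorised(String userDn, String action, String resource);
    }

    private final PermisClient permis;

    public BlastServiceGuard(PermisClient permis) {
        this.permis = permis;
    }

    public void submitBlastJob(String userDn, Runnable job) {
        // Deny by default: run the job only if PERMIS grants the role-based right.
        if (!permis.isAuthorised(userDn, "submit", "GridBLAST")) {
            throw new SecurityException("User " + userDn + " is not authorised for GridBLAST");
        }
        job.run();
    }
}
```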

Security in BRIDGES --- summary
[Diagram: the end user authenticates at the BRIDGES web portal with username and password only; the job request is passed on securely with the username; the NeSC machine running the PERMIS authorisation service (GT3.3) supplies the user's authorisations; the NeSC grid server, which holds host credentials, makes a host proxy, authenticates with the NGS and submits the job to the NGS clusters (Leeds, Oxford, RAL, Manchester).]

Host authentication for job submission
- allows us to submit jobs to the NGS as user "BRIDGES"
- apply for a host certificate for the grid server machine as normal (UK e-Science Certification Authority)
- this results in a passwordless private key and a host certificate for the machine
- Java CoG Kit code can then be used to generate a host proxy locally
- this proxy is used for job submission
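A rough illustration of the proxy-generation step follows. The class and method names are from the JGlobus/Java CoG Kit 1.x line as best recalled (CertUtil, BouncyCastleOpenSSLKey, BouncyCastleCertProcessingFactory); treat the exact signatures, paths and parameter values as assumptions to verify against your CoG Kit version.

```java
import java.security.PrivateKey;
import java.security.cert.X509Certificate;
import org.globus.gsi.CertUtil;
import org.globus.gsi.GSIConstants;
import org.globus.gsi.GlobusCredential;
import org.globus.gsi.OpenSSLKey;
import org.globus.gsi.bc.BouncyCastleCertProcessingFactory;
import org.globus.gsi.bc.BouncyCastleOpenSSLKey;

// Sketch: generate a proxy from a passwordless host key/certificate, as on the
// BRIDGES job-submission server. API names follow JGlobus 1.x as best recalled;
// the certificate/key locations are the conventional Globus defaults.
public class HostProxy {
    public static GlobusCredential create() throws Exception {
        X509Certificate hostCert =
                CertUtil.loadCertificate("/etc/grid-security/hostcert.pem");
        OpenSSLKey key =
                new BouncyCastleOpenSSLKey("/etc/grid-security/hostkey.pem");
        PrivateKey hostKey = key.getPrivateKey(); // passwordless, so no decryption needed

        BouncyCastleCertProcessingFactory factory =
                BouncyCastleCertProcessingFactory.getDefault();
        // 512-bit proxy key, 12-hour lifetime, full delegation (illustrative values)
        return factory.createCredential(
                new X509Certificate[] { hostCert }, hostKey,
                512, 12 * 3600, GSIConstants.DELEGATION_FULL);
    }
}
```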

Use case: microarray reporter sequence BLAST jobs
- microarray chips contain up to 400,000 reporter sequences
- these need to be compared to existing annotated sequence databases
- this takes approximately 3 weeks to compute against the human genome on an average desktop machine
- "Job processing --- please wait...." (and wait.... and wait....)

BLAST
- Basic Local Alignment Search Tool
- used for comparing biological sequences (DNA, protein) against a set of target sequences
- returns a sorted list of matches
- the most widely used algorithm for sequence similarity searching
- compute-intensive
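For concreteness, a service-side job wrapper ultimately shells out to the BLAST executable. The sketch below runs NCBI's legacy blastall with its classic flags (-p program, -d database, -i query, -o output), current in the era of this talk; the binary and database paths are placeholders to check against your installation.

```java
import java.io.File;
import java.io.IOException;

// Sketch: invoke the legacy NCBI blastall binary, as a grid job wrapper might.
// Paths are placeholders; the -p/-d/-i/-o flags are from the classic blastall CLI.
public class RunBlast {
    public static int run(File query, File output) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "/usr/local/bin/blastall",
                "-p", "blastn",                    // nucleotide-nucleotide search
                "-d", "/db/blast/human_genomic",   // pre-staged target database
                "-i", query.getAbsolutePath(),     // staged-in query file
                "-o", output.getAbsolutePath());   // result file to return
        pb.redirectErrorStream(true);              // fold stderr into stdout
        return pb.start().waitFor();               // BLAST exit code
    }
}
```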

How do I get my application to run efficiently on a grid?
- applications deployed on a compute grid need to be parallelised to really benefit (though they can of course also be run as single jobs)
- for this, one must be able to partition a job into several subjobs (sketched below for the BLAST case)
- these are then processed separately, at the same time, on multiple processors
- the results of the individual subjobs need to be combined at the end
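For BLAST, partitioning means cutting the multi-sequence FASTA query file into chunks of whole sequences, one chunk per subjob. A minimal sketch in plain Java I/O; the file names and chunk size are illustrative:

```java
import java.io.*;

// Split a multi-sequence FASTA file into chunks of up to SEQS_PER_CHUNK
// sequences each; each chunk becomes the input file of one subjob.
// Assumes well-formed FASTA, i.e. the first line is a ">" header.
public class FastaSplitter {
    static final int SEQS_PER_CHUNK = 100;

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("queries.fasta"));
        PrintWriter out = null;
        int seqCount = 0, chunk = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {                // FASTA header: new sequence
                if (seqCount % SEQS_PER_CHUNK == 0) {  // start a new chunk file
                    if (out != null) out.close();
                    out = new PrintWriter(new FileWriter("chunk_" + (chunk++) + ".fasta"));
                }
                seqCount++;
            }
            out.println(line);
        }
        if (out != null) out.close();
        in.close();
    }
}
```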

Parallel BLAST --- grid style
- partition your job by putting one or several query sequences into each of a number of separate input files (one input file = one subjob)
- distribute all input files, the executable and the target data onto your grid clusters ("stage-in")
- results are returned to the server and combined there
- if 100 free processors are available and 100 subjobs are to be run, the time taken is roughly 1/100th of the time it would have taken to run the whole job on a single machine (plus overheads for scheduling, data transfer and result combining)
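For output formats where simple concatenation is valid (e.g. tabular BLAST output), the recombination step can be as simple as the sketch below; the result file naming is illustrative:

```java
import java.io.*;

// Combine the per-subjob result files back into one report.
// Valid for output formats where concatenation preserves meaning (e.g. tabular).
public class ResultMerger {
    public static void main(String[] args) throws IOException {
        PrintWriter combined = new PrintWriter(new FileWriter("blast_results.txt"));
        int i = 0;
        File part;
        while ((part = new File("result_" + i + ".txt")).exists()) {
            BufferedReader in = new BufferedReader(new FileReader(part));
            String line;
            while ((line = in.readLine()) != null) combined.println(line);
            in.close();
            i++;
        }
        combined.close();
    }
}
```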

To stage or not to stage?
- file staging is the copying, at runtime, of files onto the remote resource
- example: for BLAST jobs we need an input file, a target data file (the "database" --- really a flat text file) and the executable (BLAST)
- the target files and the executable are unchanging components for this kind of job
- it is best to store these locally on the remote resources to avoid the staging overhead (target data are in the region of several GB in size and growing exponentially)
- rather than individual users keeping multiple copies of publicly available data in their home directories, get the system administrators to put up copies visible to all
- input files must be staged in, since these vary from job to job

BRIDGES GridBLAST job submission
[Diagram: the GridBLAST client on the end user's machine sends a job request to a GT3 core grid service running under Apache Tomcat on the NeSC grid server (Titania). The BRIDGES meta-scheduler farms jobs out through wrappers: a PBS wrapper to the ScotGRID master node (PBS + BLAST on the worker nodes), a Condor wrapper to the NeSC Condor pool (Condor central manager, Condor + BLAST on the execution hosts), and a GT2.4 wrapper to the NGS head nodes at Leeds and Oxford (GT2.4 + BLAST on the execution hosts). Results are returned to the client.]

Current status of our system
- the software is still at the prototype stage --- we haven't benchmarked any really big jobs yet
- a Java Web Start client (launched from the portal) connects to the service; this needs to be changed to a portlet
- user registration needs to be revised and users re-registered
- happy to share the portlet code etc. with others once it is finished

How we worked with the NGS
- BRIDGES was one of the first projects running bioinformatics work on the NGS
- in collaboration with NGS user support, we established the basic infrastructure needed for BLAST on the NGS clusters
- good collaboration on our security requirements --- very helpful and accommodating
- our project account is the first of its kind, and we jointly tailored a solution that would fit BRIDGES
- ask for what you need! things are not cast in stone, and it is supposed to be a public service

Public bioinformatics infrastructure on the NGS --- current status
- we are in the process of establishing an infrastructure for BLAST jobs that can be used by all
- this includes making the BLAST and mpiBLAST executables publicly available and mirroring the entire NCBI BLAST databases repository
- currently trialling this on the Leeds node --- it will be replicated at the other nodes eventually
- data replication on all nodes is necessary to avoid severe performance hits
- input from others is needed and welcome!

Contact details
- BRIDGES website:
- Code repository (available soon):
- BRIDGES web portal:
- Contacts: Micha Bayer at NeSC in Glasgow -- Richard Sinnott at NeSC in Glasgow --