Cloud Computing for e-Science with CARMEN
Paul Watson, Newcastle University

e-Science
“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.” John Taylor, former Director General of the UK Research Councils

Two strands to this talk...

Research Challenge
Understanding the brain is the greatest informatics challenge, with enormous implications for science:
– medicine
– biology
– computer science

Collecting the Evidence
100,000 neuroscientists generate huge quantities of data:
– molecular (genomic/proteomic)
– neurophysiological (time-series activity)
– anatomical (spatial)
– behavioural

Neuroinformatics Problems
Data is:
– expensive to collect but rarely shared
– in proprietary formats and locally described
The result is:
– a shortage of analysis techniques that can be applied across neuronal systems
– limited interaction between research centres with complementary expertise

Data in Science
Bowker’s “Standard Scientific Model”:
1. Collect data
2. Publish papers
3. Gradually lose the original data
(G.C. Bowker, “The New Knowledge Economy and Science and Technology Policy”)
Problems:
– papers often draw conclusions from data that is not published
– experiments cannot be replicated
– data cannot be re-used

Codes in Science
Three stages for codes:
1. Write code and apply it to data
2. Publish papers
3. Gradually lose the original codes
Problems:
– papers often draw conclusions from codes that are not published
– experiments cannot be replicated
– codes cannot be re-used

CARMEN enables sharing and collaborative exploitation of data, analysis code and expertise that are not physically collocated

CARMEN Project
UK EPSRC e-Science Pilot, £5M
20 investigators at Stirling, St. Andrews, Newcastle, York, Sheffield, Cambridge, Imperial, Plymouth, Warwick, Leicester and Manchester

CARMEN Consortium
Newcastle: Colin Ingram, Paul Watson, Stuart Baker, Marcus Kaiser, Phil Lord, Evelyne Sernagor, Tom Smulders, Miles Whittington
York: Jim Austin, Tom Jackson
Stirling: Leslie Smith
Plymouth: Roman Borisyuk
Cambridge: Stephen Eglen
Warwick: Jianfeng Feng
Sheffield: Kevin Gurney, Paul Overton
Manchester: Stefano Panzeri
Leicester: Rodrigo Quian Quiroga
Imperial: Simon Schultz
St. Andrews: Anne Smith

Industry & Associates

Focus on Neural Activity
Cracking the neural code: raw voltage signal data (figure: traces from neurone 1, neurone 2 and neurone 3), typically collected using single- or multi-electrode array recording

Epilepsy Exemplar
Data analysis guides the surgeon removing brain tissue.
WARNING! The next two slides show an exposed brain.

Epilepsy Exemplar
– Data analysis guides the surgeon removing brain tissue
– Recording from removed tissue (up to 20 GB/h)
– On-line analysis by distributed collaborators will enable the experiment to be defined during data collection
– The repository will enable integration of rare case types from different labs
– Advances in treatment

e-Science Requirements Summary
Sharing:
– data
– code
Capacity:
– vast data storage (100TB+ in CARMEN)
– support for data-intensive analysis
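To give a feel for the scale behind these capacity requirements, the sketch below works through the arithmetic implied by two figures on the slides: recording at up to 20 GB/h and a repository of 100TB+. It assumes decimal units (1 TB = 1000 GB) and continuous recording at the quoted peak rate; both are simplifying assumptions, not statements about how CARMEN was actually provisioned.

```python
# Back-of-envelope storage arithmetic for figures quoted on the slides.
# Assumptions (not from the slides): decimal units, continuous peak-rate recording.

PEAK_RATE_GB_PER_HOUR = 20     # "up to 20 GB/h" (epilepsy exemplar)
REPOSITORY_CAPACITY_TB = 100   # "100TB+ in CARMEN"


def hours_to_fill(capacity_tb: float, rate_gb_per_hour: float) -> float:
    """Hours of continuous recording needed to fill the given capacity."""
    capacity_gb = capacity_tb * 1000  # assumed decimal units
    return capacity_gb / rate_gb_per_hour


def gb_per_day(rate_gb_per_hour: float) -> float:
    """Data produced by one recording rig running for 24 hours."""
    return rate_gb_per_hour * 24


if __name__ == "__main__":
    hours = hours_to_fill(REPOSITORY_CAPACITY_TB, PEAK_RATE_GB_PER_HOUR)
    print(f"{gb_per_day(PEAK_RATE_GB_PER_HOUR):.0f} GB per day at peak rate")
    print(f"{hours:.0f} hours (~{hours / 24:.0f} days) to fill {REPOSITORY_CAPACITY_TB} TB")
```

At the peak rate a single rig produces about 480 GB per day, so 100 TB corresponds to roughly 5,000 hours of recording, before counting any derived data.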

CARMEN Cloud Architecture
– Data storage and analysis
– User access over the Internet (typically via a browser)
– Users upload data & services
– Users run analyses

e-Science Cloud Services
Amazon (& Google) offer cloud computing:
– basic storage & compute services
– e.g. Amazon S3 & EC2
e-Science needs a set of higher-level services to support user needs. Which services?

CARMEN Cloud (CAIRN)
– Search for Data & Analysis Code
– Raw & Derived Data Store
– Structured Metadata Store Enabling Search & Annotation
– Analysis Code Store
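A structured metadata store is what makes the raw and derived data searchable. The following is a minimal in-memory sketch, assuming each dataset carries a flat set of key/value metadata fields plus free-text annotations; the class and field names (`DatasetRecord`, `MetadataStore`, `species`, `recording`) are hypothetical and do not describe CAIRN's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One entry in the (hypothetical) metadata store."""
    dataset_id: str
    owner: str
    metadata: dict[str, str]                              # structured key/value fields
    annotations: list[str] = field(default_factory=list)  # tags added by users


class MetadataStore:
    """Toy in-memory metadata index supporting search and annotation."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def add(self, record: DatasetRecord) -> None:
        self._records[record.dataset_id] = record

    def annotate(self, dataset_id: str, note: str) -> None:
        self._records[dataset_id].annotations.append(note)

    def search(self, tag: str | None = None, **criteria: str) -> list[str]:
        """IDs of datasets whose metadata matches every criterion and,
        if a tag is given, that also carry that annotation."""
        hits = [
            rec.dataset_id
            for rec in self._records.values()
            if all(rec.metadata.get(k) == v for k, v in criteria.items())
            and (tag is None or tag in rec.annotations)
        ]
        return sorted(hits)


store = MetadataStore()
store.add(DatasetRecord("d1", "alice", {"species": "mouse", "recording": "MEA"}))
store.add(DatasetRecord("d2", "bob", {"species": "rat", "recording": "MEA"}))
store.annotate("d2", "epileptiform")

print(store.search(recording="MEA"))                      # ['d1', 'd2']
print(store.search(tag="epileptiform", recording="MEA"))  # ['d2']
```

Keeping structured fields separate from user annotations lets exact-match queries and community tagging coexist, which is the combination the slide's "Search & Annotation" label suggests.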

Dynasoar
Code Repository and Deployment:
– long-term storage
Code factored as Web Services:
– standard (WS-I) interface
– internals not important: Java, MATLAB, C, C#, C++, ...
Deployers for a variety of service types:
– .war files (Tomcat), Virtual Machines (VMware, Virtual PC), .NET assemblies, database stored procedures
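Because services are packaged in different ways, a deployer has to be chosen according to the package type. The sketch below illustrates that dispatch with a registry keyed on package kind; the kind strings and the deployer functions are hypothetical placeholders that only describe what a real deployer would do, not Dynasoar's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ServicePackage:
    name: str
    kind: str  # hypothetical labels: "war", "vm-image", "dotnet-assembly", "stored-procedure"


# Placeholder deployers: a real deployer would install the package on a host;
# these only return a description of what would happen.
def deploy_war(pkg: ServicePackage) -> str:
    return f"{pkg.name}: installed in a servlet container such as Tomcat"


def deploy_vm(pkg: ServicePackage) -> str:
    return f"{pkg.name}: booted as a virtual machine image"


def deploy_assembly(pkg: ServicePackage) -> str:
    return f"{pkg.name}: loaded as a .NET assembly"


def deploy_stored_procedure(pkg: ServicePackage) -> str:
    return f"{pkg.name}: registered as a database stored procedure"


DEPLOYERS: dict[str, Callable[[ServicePackage], str]] = {
    "war": deploy_war,
    "vm-image": deploy_vm,
    "dotnet-assembly": deploy_assembly,
    "stored-procedure": deploy_stored_procedure,
}


def deploy(pkg: ServicePackage) -> str:
    """Dispatch to the deployer registered for the package's kind."""
    deployer = DEPLOYERS.get(pkg.kind)
    if deployer is None:
        raise ValueError(f"no deployer registered for kind {pkg.kind!r}")
    return deployer(pkg)


print(deploy(ServicePackage("spike-sorter", "war")))
```

Since the service is only seen through its Web Service interface, adding support for a new packaging format means registering one more deployer rather than changing clients.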

Dynasoar: Dynamic Deployment
(Diagram: a request to s4.) The deployed service remains in place and can be re-used, unlike job scheduling.

Dynasoar
A request for s2 is routed to an existing deployment of the service.
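The contrast with job scheduling can be illustrated with a toy router: the first request for a service triggers a deployment from the code repository onto a host, and later requests are routed to that existing deployment. This is a simplified sketch that assumes round-robin host selection and never undeploys; Dynasoar's real placement policy may differ.

```python
class DynamicDeploymentRouter:
    """Route requests to deployed services, deploying on first use."""

    def __init__(self, repository: set[str], hosts: list[str]) -> None:
        self.repository = repository            # services whose code is stored
        self.hosts = hosts                      # hosts available for deployment
        self.deployments: dict[str, str] = {}   # service name -> host it runs on
        self.deploy_count = 0
        self._next_host = 0

    def route(self, service: str) -> str:
        """Return the host that should handle a request for `service`."""
        if service in self.deployments:         # already deployed: reuse it
            return self.deployments[service]
        if service not in self.repository:
            raise KeyError(f"no code stored for service {service!r}")
        # Not yet deployed: take the code from the repository and deploy it
        # on the next host (assumed round-robin placement).
        host = self.hosts[self._next_host % len(self.hosts)]
        self._next_host += 1
        self.deployments[service] = host
        self.deploy_count += 1
        return host


router = DynamicDeploymentRouter({"s2", "s4"}, ["hostA", "hostB"])
print(router.route("s4"))   # hostA: first request triggers a deployment
print(router.route("s2"))   # hostB: another new deployment
print(router.route("s4"))   # hostA: routed to the existing deployment
print(router.deploy_count)  # 2
```

Unlike a scheduled job, which would ship code with every request, the deployment cost here is paid once per service and amortised over all later requests.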

Performance Gains

Scalability

CARMEN Cloud (CAIRN)
– Search for Data & Analysis Code
– Raw Signal Data Search & Visualisation
– Enactment of scientific analysis processes
– Raw & Derived Data Store
– Security Policies Controlling Access to Data & Code
– Structured Metadata Store Enabling Search & Annotation
– Analysis Code Store

Controlled Sharing
A scientist progressively opens up access to data:
– “Only I am allowed to see this data”
– “My collaborators can now see it”
– “Everyone can see it”

Security Solution
– XACML: a standard way to encode rules as (subject, action, resource) triples
– Rules are checked on each access
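The controlled-sharing stages map naturally onto permit rules checked on every access. The sketch below is a simplified stand-in for XACML, not the XACML language or any XACML engine: each rule is a (subject, action, resource) triple, a subject may be a user, a named group or `*` for everyone, and any request with no matching rule is denied. The group name `collaborators` and the user IDs are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    subject: str   # user ID, group name, or "*" for everyone
    action: str    # e.g. "read"
    resource: str  # dataset ID


# Hypothetical group membership used to resolve group subjects.
GROUPS: dict[str, set[str]] = {"collaborators": {"bob", "carol"}}


def subject_matches(rule_subject: str, user: str) -> bool:
    return (
        rule_subject == "*"
        or rule_subject == user
        or user in GROUPS.get(rule_subject, set())
    )


def is_permitted(rules: list[Rule], user: str, action: str, resource: str) -> bool:
    """Evaluated on every access: permit if any rule matches, else deny."""
    return any(
        subject_matches(r.subject, user) and r.action == action and r.resource == resource
        for r in rules
    )


# Stage 1: "Only I am allowed to see this data"
rules = [Rule("alice", "read", "d1")]
print(is_permitted(rules, "bob", "read", "d1"))   # False
# Stage 2: "My collaborators can now see it"
rules.append(Rule("collaborators", "read", "d1"))
print(is_permitted(rules, "bob", "read", "d1"))   # True
print(is_permitted(rules, "dave", "read", "d1"))  # False
# Stage 3: "Everyone can see it"
rules.append(Rule("*", "read", "d1"))
print(is_permitted(rules, "dave", "read", "d1"))  # True
```

Each stage of sharing only adds a rule; nothing about the data itself changes, which is why checking rules on each access is enough to enforce the scientist's current choice.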

Controlled Sharing: Conflicts
– Scientist: “Only I am allowed to see this data”, then “My collaborators can now see it”
– Funder: “All data must be accessible to everyone after the end of the project”

Addressing Conflicts
– Each party expresses its policy as XACML rules
– Rules are converted to a formal language (XACML -> VDM++)
– The formal model is run to detect conflicts
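The kind of conflict the formal model is meant to catch can be shown by brute force over a small, finite domain. This is not the XACML -> VDM++ translation itself; it simply encodes the scientist's and funder's policies from the conflict slide as Python predicates and enumerates (user, month) requests, looking for one that the funder requires to succeed but the scientist's rules deny. The owner ID and the project end month are assumed values.

```python
from typing import Callable, Iterable

OWNER = "alice"    # assumed data owner
PROJECT_END = 48   # assumed: the project ends after month 48

Policy = Callable[[str, int], bool]  # (user, month) -> decision


def scientist_permits(user: str, month: int) -> bool:
    # "Only I am allowed to see this data"
    return user == OWNER


def funder_requires(user: str, month: int) -> bool:
    # "All data must be accessible to everyone after the end of the project"
    return month > PROJECT_END


def find_conflicts(permits: Policy, requires: Policy,
                   users: Iterable[str], months: Iterable[int]) -> list[tuple[str, int]]:
    """Requests that one policy obliges to succeed but the other denies."""
    month_list = list(months)
    return [(u, m) for u in users for m in month_list
            if requires(u, m) and not permits(u, m)]


print(find_conflicts(scientist_permits, funder_requires, ["alice", "bob"], range(47, 51)))
# [('bob', 49), ('bob', 50)]
```

Exhaustive enumeration only works for tiny domains; translating the rules into a formal model, as the slide describes, is what allows conflicts to be found without listing every possible request.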

CARMEN CAIRN builds on:
– OMII: Grimoire
– DAME: Signal Data Explorer
– OMII/myGrid: Taverna
– OGSA-DAI, SRB, DAME
– Gold: Role & Task based Security
– myGrid & CISBAN
– Dynasoar

Using CARMEN for a typical scenario
1. Data collection from a multi-electrode array
2. Data visualisation and exploration
3. Spike detection
4. Spike sorting
5. Analysis
6. Visualisation of analysis results
Currently, this is a semi-manual process; CARMEN has automated it.
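Spike detection (step 3) is often done by amplitude thresholding. As one illustration, the sketch below uses the median-based noise estimate of Quian Quiroga et al. (2004), threshold = k × median(|x|) / 0.6745 with k = 4. The slides do not say which detector CARMEN's services use, and the `dead_time` parameter is an added assumption that stops one spike being counted several times.

```python
from statistics import median


def detect_spikes(signal: list[float], k: float = 4.0, dead_time: int = 3) -> list[int]:
    """Return sample indices where |signal| exceeds k times a robust noise
    estimate, ignoring crossings within `dead_time` samples of the previous
    detection.

    The noise standard deviation is estimated as median(|x|) / 0.6745, so
    large spikes barely affect the threshold.
    """
    noise_sigma = median(abs(x) for x in signal) / 0.6745
    threshold = k * noise_sigma
    spikes: list[int] = []
    last = -dead_time - 1  # index of the most recent detection
    for i, x in enumerate(signal):
        if abs(x) > threshold and i - last > dead_time:
            spikes.append(i)
            last = i
    return spikes


trace = [0.1, -0.2, 0.15, -0.1, 5.0, 4.0, 0.1, -0.12, 0.2, -6.0, 0.1, 0.05]
print(detect_spikes(trace))  # [4, 9]
```

Using the median rather than the standard deviation matters on recordings with high firing rates, where the spikes themselves would otherwise inflate the noise estimate and raise the threshold.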

Web Portal

Raw Data Exploration with Signal Data Explorer

Defining the process with Workflow

Running a Workflow

Running the Workflow
(Diagram components: external client; Taverna workflow engine; registry of available services; security; query; repository for input data, output data and metadata, backed by SRB, a file system and an RDBMS; dynamically deployed services in Dynasoar, e.g. a spike sorting service and reporting.)
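The workflow engine's role in this picture is to call each service in turn, feed one step's output into the next and record metadata about what was produced. The sketch below captures only that chaining-plus-provenance idea; it is not Taverna's API, and the two example services are hypothetical stand-ins for the detection and reporting services.

```python
from typing import Any, Callable

Service = Callable[[Any], Any]


def run_workflow(steps: list[tuple[str, Service]], data: Any) -> tuple[Any, list[dict[str, str]]]:
    """Run services in order, feeding each output to the next step,
    and record simple provenance metadata for every step."""
    provenance: list[dict[str, str]] = []
    for name, service in steps:
        data = service(data)
        provenance.append({"step": name, "output_type": type(data).__name__})
    return data, provenance


# Hypothetical stand-in services.
def find_large_samples(signal: list[float]) -> list[int]:
    return [i for i, x in enumerate(signal) if abs(x) > 1.0]


def report(indices: list[int]) -> str:
    return f"{len(indices)} events at samples {indices}"


result, meta = run_workflow([("detect", find_large_samples), ("report", report)],
                            [0.0, 2.5, 0.1, -3.0])
print(result)  # 2 events at samples [1, 3]
print(meta)    # [{'step': 'detect', 'output_type': 'list'}, {'step': 'report', 'output_type': 'str'}]
```

Recording metadata alongside each output is what lets derived data be stored and searched later, rather than only the final result.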

Graphical Output

Movie Output

CARMEN
– is delivering an e-Science infrastructure that can be applied across a diverse range of applications
– uses a Cloud/Software as a Service architecture
– enables cooperation and interdisciplinary working
– aims to deliver new results in neuroscience, computer science and medicine