Recap: introduction to e-science
- New IT is coming closer to medical practice
- New IT is already used in “Big science”: e-Science
- The e-Science approach is used for medical research
- e-Science research topics: infrastructure, data management, distributed processing, user interfaces, collaboration
MIK 2.1 DBNS - Computing on e-infrastructures, 2013
Course Overview
- Lectures: Introduction to e-Science; Computing on e-infrastructures
- Practice: Introduction to WS-PGRADE; Exercises
- Wrap-up
Computing on e-infrastructures
MIK 2.1 Databases and Networksystems, guest lecture “e-science”
Silvia Delgado Olabarriaga, Bioinformatics Laboratory, KEBB
http://bioinformaticslaboratory.nl/twiki/bin/view/BioLab/EducationMIKDB2013
Computing on e-infrastructures
- Grid computing
- Workflow management
- Science gateway
Types of e-infrastructures
- Computing: high-performance computing (HPC), virtual computing, high-throughput computing (HTC)
- Not only computing, also: network, data services, visualization, collaboration, support
e-science infrastructures in NL
https://www.sara.nl/systems/hpc-cloud
https://www.surfsara.nl
High Performance Computing (HPC)
- Cluster: a set of loosely connected computers that work together so that, in many respects, they can be viewed as a single system. Homogeneous vs. heterogeneous nodes, fast network, shared data space. http://mooc-inst.sara.cloudlet.sara.nl/mooc/cluster.html
- GPU (Graphics Processing Unit): massively parallel processing units. Stream processing, SIMD (single instruction, multiple data).
- Supercomputer: a computer at the frontline of current processing capacity, particularly speed of calculation. Memory, processors, special architecture.
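Clusters and GPUs both exploit data parallelism: split the data, apply the same operation to every piece concurrently, then combine the partial results. A minimal Python sketch of that pattern follows. The function names are hypothetical, and threads stand in for cluster nodes or GPU cores only to show the decomposition; in standard CPython they give no real speedup for pure-Python arithmetic like this.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Cut data into n_parts contiguous chunks whose sizes differ by at most one."""
    size, rest = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rest else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    # The same operation is applied to every element of the chunk
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    chunks = split(data, n_workers)
    # Each chunk is processed independently of the others
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    # Combine the partial results into the final answer
    return sum(partial_results)


print(parallel_sum_of_squares(list(range(1, 101))))  # 338350
```

On a real cluster the chunks would go to separate machines, and the combine step would require network communication.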
Cloud Computing
- Internet-based computing: shared resources, software and information are provided to computers and other devices on demand, similar to a public utility such as the electricity grid
- Refers to both the applications delivered as services over the Internet and the hardware and systems software in the data centers that provide those services
- Virtualization
http://en.wikipedia.org/wiki/Cloud_computing
http://www.infoworld.com/d/cloud-computing/what-cloud-computing-really-means-031
Cloud Computing
- Types: public cloud, private cloud
- Utility computing: the service is being sold
- Novelties: the appearance of infinite computing resources available on demand; the ability to pay for use
Michael Armbrust et al. “A View of Cloud Computing”. Communications of the ACM, Vol. 53, No. 4 (2010), pp. 50-58. http://cacm.acm.org/magazines/2010/4/81493-a-view-of-cloud-computing/fulltext
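The two novelties combine in a simple piece of arithmetic. Assume a perfectly divisible job of 1000 machine-hours and a hypothetical price of 10 cents per machine-hour. Renting more machines then shortens the wall-clock time while the bill stays the same, because you pay only for the machine-hours actually used:

```python
def elastic_run(work_machine_hours, machines, price_cents_per_machine_hour):
    """Return (wall-clock hours, cost in cents) for a job split evenly over machines.

    Assumes the job is perfectly divisible and machines are billed per hour of use.
    """
    wall_clock_hours = work_machine_hours / machines
    # Pay for use: the bill depends only on the machine-hours consumed
    cost_cents = work_machine_hours * price_cents_per_machine_hour
    return wall_clock_hours, cost_cents


PRICE_CENTS = 10  # hypothetical price per machine-hour, not a real tariff

for machines in (1, 10, 1000):
    hours, cost = elastic_run(1000, machines, PRICE_CENTS)
    print(f"{machines:>4} machines: {hours:g} h, cost {cost} cents")
```

All three runs cost the same; only the waiting time changes. Real jobs are rarely perfectly divisible, so this is the best case.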
XXX as a Service (on demand, pay as you go)
- Infrastructure as a service (IaaS)
- Platform as a service (PaaS)
- Software as a service (SaaS)
- Storage as a service (STaaS)
- …
http://en.wikipedia.org/wiki/Cloud_computing
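One common, simplified way to tell the service models apart is by which layers of the stack the provider manages and which are left to the user. The layer names and the split below follow that widely used simplification rather than a formal standard; STaaS is left out.

```python
# Layers of a computing stack, from hardware up to the application.
STACK = [
    "networking", "storage", "servers", "virtualization",
    "operating system", "middleware", "runtime", "data", "applications",
]

# Simplified view: the provider manages the bottom part of the stack.
PROVIDER_MANAGES = {
    "IaaS": set(STACK[:4]),  # up to and including virtualization
    "PaaS": set(STACK[:7]),  # also operating system, middleware and runtime
    "SaaS": set(STACK),      # everything, including the application
}


def user_manages(model):
    """Return the layers the user still manages under a service model."""
    return [layer for layer in STACK if layer not in PROVIDER_MANAGES[model]]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "-> user manages:", user_manages(model) or "nothing")
```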
High Throughput Computing: Grid
- “a system that uses open, general-purpose protocols to federate distributed resources…”
- Resources: computing, data, storage, services, software, equipment, expertise
Ian Foster, “What is the Grid? A Three Point Checklist”, 2002. http://esc.dl.ac.uk/StarterKit/100136.html
- Key: FEDERATION, an “organization or group within which smaller divisions have some degree of internal autonomy”
http://mooc-inst.sara.cloudlet.sara.nl/mooc/wms.html
Virtual Organizations (VO)
Grid certificate
Example: Life Science Grid
Various VOs: VLEMED, LSGRID, BBMRI, …
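In a federation every site keeps its own policy, for example which VOs it accepts, and a job can run only where the user's VO is accepted. The sketch below shows this matchmaking step in simplified form. The site names, slot counts and the rule of preferring the site with the most free slots are hypothetical; a real grid workload management system also deals with certificates and many more job requirements.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    accepted_vos: set  # each site decides autonomously which VOs it serves
    free_slots: int


@dataclass
class Job:
    name: str
    vo: str  # VO taken from the user's (already validated) grid credentials


def match(job, sites):
    """Return sites that accept the job's VO and have capacity, most free slots first."""
    candidates = [s for s in sites if job.vo in s.accepted_vos and s.free_slots > 0]
    return sorted(candidates, key=lambda s: s.free_slots, reverse=True)


sites = [
    Site("site-a", {"vlemed", "lsgrid"}, free_slots=2),
    Site("site-b", {"lsgrid"}, free_slots=10),
    Site("site-c", {"bbmri"}, free_slots=50),
]
print([s.name for s in match(Job("segmentation", "vlemed"), sites)])  # ['site-a']
print([s.name for s in match(Job("analysis", "lsgrid"), sites)])  # ['site-b', 'site-a']
```

Note that site-c, with the most free capacity, is never chosen for VLEMED or LSGRID jobs: site autonomy takes precedence over load.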
Overview: e-infrastructures
Discussion in groups (5 min)
What is the difference/similarity/relationship between:
- cluster and supercomputer
- cluster and grid
- grid and cloud computing
- high-performance and high-throughput computing
- parallel and distributed computing?
What could be the role of such computing infrastructures for the information infrastructure in healthcare?
Recap: Computing on e-infrastructures
- Grid computing
- Workflow management
- Science gateway
Workflow is… (adapted from Wikipedia)
- a sequence of connected steps
- a depiction of a sequence of operations designed to achieve processing intents of some sort, such as physical transformation, service provision, or information processing
Workflow Reference Model
http://www.wfmc.org/reference-model.html
Basic workflow concepts
- Components, processes, activities
- Inputs, outputs (ports)
(Figure: a process with an input port and an output port)
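The component-and-port idea maps naturally onto a small data structure. This is a minimal sketch with a hypothetical `Process` class: a process declares its input and output ports, refuses to run when an input port has no data, and checks that it fills exactly its declared output ports.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Process:
    """A workflow component: a named process with input and output ports."""
    name: str
    input_ports: List[str]
    output_ports: List[str]
    run: Callable[[Dict[str, object]], Dict[str, object]]

    def execute(self, inputs):
        # A process can only start when every input port has data
        missing = [p for p in self.input_ports if p not in inputs]
        if missing:
            raise ValueError(f"{self.name}: no data on input port(s) {missing}")
        outputs = self.run(inputs)
        # The process must fill exactly the output ports it declared
        if set(outputs) != set(self.output_ports):
            raise ValueError(f"{self.name}: unexpected output ports {sorted(outputs)}")
        return outputs


# Hypothetical example component: counts the words arriving on its "text" port
count_words = Process(
    name="count_words",
    input_ports=["text"],
    output_ports=["count"],
    run=lambda ins: {"count": len(ins["text"].split())},
)
print(count_words.execute({"text": "grid cluster cloud"}))  # {'count': 3}
```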
Workflow
(Figure: linked steps step1, step2, step3, step4)
- Linked components
- Data passed around
- Processes started when data is available
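The rule that a process starts as soon as its data is available is the core of dataflow execution. Below is a minimal Python sketch that assumes a hypothetical topology for step1 to step4 (step1 feeds step2 and step3, which both feed step4) and uses toy arithmetic in place of real programs. Steps that become ready in the same round ("wave") are independent of one another and could run in parallel.

```python
def run_dataflow(steps, initial_data):
    """Start each step as soon as all of its input data items are available.

    steps maps a step name to (input names, output names, function); the
    function receives the input values in order and returns one value per output.
    Returns the final data and the rounds ("waves") in which steps were started.
    """
    data = dict(initial_data)
    done, waves = set(), []
    while len(done) < len(steps):
        ready = [name for name, (ins, _, _) in steps.items()
                 if name not in done and all(i in data for i in ins)]
        if not ready:
            raise RuntimeError("some steps wait for data that is never produced")
        # Steps in the same wave do not depend on each other
        for name in ready:
            ins, outs, func = steps[name]
            data.update(zip(outs, func(*(data[i] for i in ins))))
            done.add(name)
        waves.append(ready)
    return data, waves


# Declaration order does not matter: only data availability decides when a step runs
steps = {
    "step4": (["b", "c"], ["result"], lambda b, c: (b + c,)),
    "step1": (["x"], ["a"], lambda x: (x * 2,)),
    "step2": (["a"], ["b"], lambda a: (a + 1,)),
    "step3": (["a"], ["c"], lambda a: (a * a,)),
}
data, waves = run_dataflow(steps, {"x": 3})
print(waves)           # [['step1'], ['step2', 'step3'], ['step4']]
print(data["result"])  # 43
```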
Workflow Management System
- Start programs to execute processes
- Pass data between processes
- Manage intermediate data
- Distribute computation (parallel processes)
- Retry failed processes
- Keep track of which processes were executed where, and which data was generated by which process (provenance)
http://mooc-inst.sara.cloudlet.sara.nl/mooc/wfms.html
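Two of these duties, retrying failed processes and recording provenance, fit in a few lines. This is a minimal sketch, not how any particular workflow management system implements them: the function name, the provenance record fields and the failing example process are hypothetical. `socket.gethostname()` records only the local machine, whereas a real system would record the remote site that ran the job.

```python
import socket
from datetime import datetime, timezone


def run_with_retry(name, func, inputs, provenance, max_attempts=3):
    """Run one process, retrying on failure; append a provenance record per attempt."""
    for attempt in range(1, max_attempts + 1):
        record = {
            "process": name,
            "attempt": attempt,
            "host": socket.gethostname(),  # where it ran (here: the local machine)
            "inputs": dict(inputs),
            "started": datetime.now(timezone.utc).isoformat(),
        }
        try:
            output = func(**inputs)
        except Exception as exc:  # any failure triggers a retry
            record["status"] = f"failed: {exc}"
            provenance.append(record)
            continue
        record["status"] = "ok"
        record["output"] = output  # which data was produced by which process
        provenance.append(record)
        return output
    raise RuntimeError(f"{name} failed after {max_attempts} attempts")


attempts = {"n": 0}


def flaky_segment(image):
    # Hypothetical process that fails on its first call, e.g. a job lost on a remote site
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("job lost")
    return f"{image}.seg"


provenance = []
result = run_with_retry("segment", flaky_segment, {"image": "scan01"}, provenance)
print(result)  # scan01.seg
print([(r["attempt"], r["status"]) for r in provenance])  # [(1, 'failed: job lost'), (2, 'ok')]
```

Keeping a record of the failed attempt as well as the successful one is deliberate: provenance should explain how a result came about, not only what it is.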
Which workflow system?
SHIWA: SHaring Interoperable Workflows for large-scale scientific simulations on Available DCIs (distributed computing infrastructures)
http://shiwa-workflow.eu/
Example: WS-PGRADE
- Components: web services, grid jobs, other workflows
- Web portal
http://www.guse.hu/
Recap: Computing on e-infrastructures
- Grid computing
- Workflow management
- Science gateway
What is a Science Gateway?
- An interface to an e-infrastructure: a community-developed set of tools, applications, and data integrated via a portal, usually in a graphical user interface customized for a specific community
- Also known by other terms: portal, virtual research environment, collaboratory, virtual laboratory, problem solving environment, …
https://www.xsede.org/gateways-overview
Types of science gateways
- Depending on the functionality offered:
  - Generic: run jobs, access files, grid authentication, status, …
  - Dedicated to a scientific area: neuroscience, protein docking, molecular chemistry
- Depending on the technology used:
  - Custom: scripts, PHP, web applications
  - Based on a framework: portal framework, grid portal framework
AMC generic gateway
- Workflows
- Grid, cluster, server
- Monitoring
- Collaboration
AMC Computational Neuroscience Gateway
http://bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/NSGUserDoc
http://www.youtube.com/watch?feature=player_embedded&v=ACDfK9Xt7ss
Browse data, run applications
Monitor processing
(Figure: application, input data, output data)
Under the hood
Commit/
Recap: Computing on e-infrastructures
- Grid computing
- Workflow management
- Science gateway
Recap
- There are various types of e-infrastructures, with different characteristics and different usages
- Grid computing is one of them: FEDERATION is the keyword
- Coordinating processing and data on distributed infrastructures is difficult
- Workflow management systems help with this coordination
- Science gateways provide high-level (web) interfaces
Questions?