E-Science and Grid The VL-e approach L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit.

Slides:



Advertisements
Similar presentations
Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY
Advertisements

GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Kensington Oracle Edition: Open Discovery Workflow Meets Oracle 10g Professor Yike Guo.
R. Belleman, 22 juni 2004VL-e technical overview VL-e toolkit development cycle Robert Belleman
BTB or not to B(TB) – That’s the question BTB GRID Marianne Heling Mark de Groot Michelle Niekoop Bioinformatics – A. van Kampen.
Programming Languages for End-User Personalization of Cyber-Physical Systems Presented by, Swathi Krishna Kilari.
From Digital Libraries and Multimedia Archives Towards Virtual Information and Knowledge Environments supporting Collective Memories Technology Platforms.
0 General information Rate of acceptance 37% Papers from 15 Countries and 5 Geographical Areas –North America 5 –South America 2 –Europe 20 –Asia 2 –Australia.
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
Virtual Laboratory for e-Science (VL-e) Henri Bal Department of Computer Science Vrije Universiteit Amsterdam vrije Universiteit.
VL-e PoC Architecture and the VL-e Integration Team David Groep VL-e work shop, April 7 th, 2006.
VL-e PoC: What it is and what it isn’t Jan Just Keijser VL-e P4 Scaling and Validation Team TU Delft Grid Meeting, December 11th, 2008.
UvA, Amsterdam June 2007WS-VLAM Introduction presentation WS-VLAM Requirements list known as the WS-VLAM wishlist System and Network Engineering group.
Virtual Lab AMsterdam VLAM-G Project VLAM-G developers team Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
June Amsterdam A Workflow Bus for e-Science Applications Dr Zhiming Zhao Faculty of Science, University of Amsterdam VL-e SP 2.5.
Developing Reusable Software Infrastructure – Middleware – for Multiscale Modeling Wilfred W. Li, Ph.D. National Biomedical Computation Resource Center.
From Bio-Informatics towards e-BioScience L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit.
ICP ICT and Company Practise College 1 Dinsdag 3 april 2007 Geleyn Meijer.
A long tradition. e-science, Data Centres, and the Virtual Observatory why is e-science important ? what is the structure of the VO ? what then must we.
INFSO-SSA International Collaboration to Extend and Advance Grid Education ICEAGE Forum Meeting at EGEE Conference, Geneva Malcolm Atkinson & David.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 1 Slide 1 An Introduction to Software Engineering.
Storage and data services eIRG Workshop Amsterdam Dr. ir. A. Osseyran Managing director SARA
San Diego Supercomputer Center SDSC Storage Resource Broker Data Grid Automation Arun Jagatheesan et al., San Diego Supercomputer Center University of.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Privacy issues in integrating R environment in scientific workflows Dr. Zhiming Zhao University of Amsterdam Virtual Laboratory for e-Science Privacy issues.
Peter Bajcsy, Rob Kooper, Luigi Marini, Barbara Minsker and Jim Myers National Center for Supercomputing Applications (NCSA) University of Illinois at.
E-science in the Netherlands Maria Heijne TU Delft Library Director / Chair Consortium of University Libraries and National Library.
© DATAMAT S.p.A. – Giuseppe Avellino, Stefano Beco, Barbara Cantalupo, Andrea Cavallini A Semantic Workflow Authoring Tool for Programming Grids.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
Using the VL-E Proof of Concept Environment Connecting Users to the e-Science Infrastructure David Groep, NIKHEF.
August , Elsevier, Amsterdam Scientific Workflows in e-Science Dr Zhiming Zhao System and Network.
Service - Oriented Middleware for Distributed Data Mining on the Grid ,劉妘鑏 Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed.
Scenarios for a Learning GRID Online Educa Nov 30 – Dec 2, 2005, Berlin, Germany Nicola Capuano, Agathe Merceron, PierLuigi Ritrovato
Anil Wipat University of Newcastle upon Tyne, UK A Grid based System for Microbial Genome Comparison and analysis.
A Context Model based on Ontological Languages: a Proposal for Information Visualization School of Informatics Castilla-La Mancha University Ramón Hervás.
Grids - the near future Mark Hayes NIEeS Summer School 2003.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
The Scaling and Validation Programme PoC David Groep & vle-pfour-team VL-e Workshop NIKHEF SARA LogicaCMG IBM.
Experts in numerical algorithms and High Performance Computing services Challenges of the exponential increase in data Andrew Jones March 2010 SOS14.
ICT infrastructure for Science: e-Science developments Henri Bal Vrije Universiteit Amsterdam.
Unit 2 Architectural Styles and Case Studies | Website for Students | VTU NOTES | QUESTION PAPERS | NEWS | RESULTS 1.
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
Applications and Requirements for Scientific Workflow Introduction May NSF Geoffrey Fox Indiana University.
Virtual Lab for e-Science Towards a new Science Paradigm.
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
Scaling and Validation Programme David Groep & vle-pfour-team VL-e SP Meeting NIKHEF SARA LogicaCMG IBM.
Enabling e-Research in Combustion Research Community T.V Pham 1, P.M. Dew 1, L.M.S. Lau 1 and M.J. Pilling 2 1 School of Computing 2 School of Chemistry.
Grid and VOs. Grid from feet The GRID: networked data processing centres and ”middleware” software as the “glue” of resources. Researchers perform.
NeuroLOG ANR-06-TLOG-024 Software technologies for integration of process and data in medical imaging A transitional.
Applications and Requirements for Scientific Workflow May NSF Geoffrey Fox Indiana University.
Processes of the Information Value Chain Informing Knowledge ActionProductive Knowledge Information Organizing Grouping Classifying Formatting Geo-referencing.
Virtual Lab AMsterdam VLAMsterdam Abstract Machine Toolbox A.S.Z. Belloum, Z.W. Hendrikse, E.C. Kaletas, H. Afsarmanesh and L.O. Hertzberger Computer Architecture.
Data and storage services on the NGS.
Applications and Requirements for Scientific Workflow May NSF Geoffrey Fox Indiana University.
NeOn Components for Ontology Sharing and Reuse Mathieu d’Aquin (and the NeOn Consortium) KMi, the Open Univeristy, UK
CIMA and Semantic Interoperability for Networked Instruments and Sensors Donald F. (Rick) McMullen Pervasive Technology Labs at Indiana University
High Risk 1. Ensure productive use of GRID computing through participation of biologists to shape the development of the GRID. 2. Develop user-friendly.
Visual Knowledge ® Software Inc. Visual Knowledge BioCAD Case Study Parallels to Other Domains VK Semantic Web Server.
Cyberinfrastructure Overview of Demos Townsville, AU 28 – 31 March 2006 CREON/GLEON.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
Virtual Laboratory Amsterdam L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam.
Clouds , Grids and Clusters
Grid Computing.
University of Technology
VL-e PoC Architecture and the VL-e Integration Team
The Anatomy and The Physiology of the Grid
The Anatomy and The Physiology of the Grid
Presentation transcript:

e-Science and Grid The VL-e approach L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam

Content Developments in Grid Developments in e-Science Objectives Virtual Lab for e-Science Research philosophy Conclusions

Background ICTpush developments Processing power doubles every 18 month Memory size doubles every 12 month Network speed doubles every 9 month Something has to be done to harness this development Virtualization of ICT resources  Internet  Web  Grid

Internal versus external bandwidth Mbit/s Computer busses networks

Web& Grid & Web/Grid Services Web services is a paradigm/way of using/accessing information Web resources are mostly human-centered (understood by humans, computers can read but can’t understand) Grid is about accessing & sharing computing resources by virtualization Data & Information repositories Experimental facilities OGSA: Service oriented Grid standard based on Web services

Web & Grid & Semantic Web/Grid Semantic Web resources should also be understandable by computers This way, many complex tasks formulated and assigned by humans can be automated and executed by agents working on the Semantic Web But, both Grid and Web Services only focus on single services. Semantic Web/Grid should be able to describe a single service as well as the relationship between services e.g. aggregate service using a number of services so making knowledge explicit

Levels of Grid abstraction Computational Grid Data Grid Information Web/Grid Knowledge Web/Grid

Background information experimental sciences There is a tendency to look ever deeper in: Matter e.g. Physics Universe e.g. Astronomy Life e.g. Life sciences Therefore experiments become increasingly more complex Instrumental consequences are increase in detector: Resolution & sensitivity Automation & robotization Results for instance in life science in:  So called high throughput methods  Omics experimentation  genome ===> genomics

New technologies in Life Sciences research University of Amsterdam cell GenomicsTranscriptomicsProteomicsMetabolomics RNA protein metabolites DNA Methodology/ Technology

Paradigm shift in Life science Past experiments where hypothesis driven Evaluate hypothesis Complement existing knowledge Present experiments are data driven Discover knowledge from large amounts of data  Apply statistical techniques

Background information experimental sciences Experiments become increasingly more complex Driven by detector developments  Resolution increases  Automation & robotization increases Results in an increase in amount and complexity of data

The Application data crisis Scientific experiments start to generate lots of data medical imaging (fMRI): ~ 1 GByte per measurement (day) Bio-informatics queries:500 GByte per database Satellite world imagery: ~ 5 TByte/year Current particle physics: 1 PByte per year LHC physics (2007): PByte per year Data is often very distributed

Background information experimental sciences Experiments become increasingly more complex Driven by detector developments  Resolution increases  Automation & robotization increases Results in an increase in amount and complexity of data Something has to be done to harness this development Virtualization of experimental resources: e-Science

The what of e-Science e-Science is the application domain “Science” of Grid & Web More than only coping with data explosion A multi-disciplinary activity combining human expertise & knowledge between:  A particular domain scientist  ICT scientist e-Science demands a different approach to experimentation because computer is integrated part of experiment  Consequence is a radical change in design for experimentation e-Science should apply and integrate Web/Grid methods where and whenever possible

Grid and Web Services Convergence Grid Definition of Web Service Resource Framework(WSRF) makes explicit distinction between “service” and stateful entities acting upon service i.e. the resources Means that Grid and Web communities can move forward on a common base!!! WSRF Started far apart in apps & tech OGSI GT2 GT1 HTTP WSDL, WS-* WSDL 2, WSDM Have been converging Ref: Foster Web

e-Science Objectives It should enhance the scientific process by: Stimulating collaboration by sharing data & information Result is re-use of data & information

The data sharing potential for Cognition Collaborative scientific research Information sharing Metadata modeling Allows for experiment validation Independent confirmation of results Statistical methodologies Access to large collections of data and metadata Training Train the next generation using peer reviewed publications and the associated data

Acquisition Alignment Reconstruction Segmentation Interpretation Electron tomography data pipeline

Cell Centred Database NCIMR (San Diego) Maryann Martone and Mark Ellisman

e-Science Objectives It should enhance the scientific process by: Stimulating collaboration by sharing data & information Improve re-use of data & information Combing data and information from different modalities Sensor data & information fusion

e-Science Objectives It should enhance the scientific process by: Stimulating collaboration by sharing data & information Improve re-use of data & information Combing data and information from different modalities  Sensor data & information fusion Realize the combination of real life & (model based) simulation experiments

An example of the combination of real life & (model based) simulation experiments patient specific vascular geometry (from CT) Segmentation blood flow simulation (Latice Bolzmann) Pre-operative planning (interaction) Suitable for parallelization through functional decomposition Simulated “Fem-Fem” bypass Patient’s vascular geometry (CTA) Grid resources Simulated Vascular Reconstruction in a Virtual Operating Theatre

e-Science Objectives It should enhance the scientific process by: Stimulating collaboration by sharing data & information Improve re-use of data & information Combing data and information from different modalities  Sensor data & information fusion Realize the combination of real life & (model based) simulation experiments Modeling of dynamic systems

Dynamic bird behaviour MODELS Bird distributions Ensembles Calibration and Data assimilation Predictions and on-line warnings RADAR Bird behaviour in relation to weather and landscape

e-Science Objectives It should result in : Computer aided support for rapid prototyping of ideas Stimulate the creativity process It should realize that by creating & applying: New ICT methodologies and a computing infrastructure stimulating this From this ICT point of view it should support the following application steps: Design Development & realization Execution Analysis & interpretation We try to realize e-Science and their applications via the Virtual Lab for e-Science (VL-e) project

Virtual Lab for e-Science research Philosophy Multidisciplinary research & the development of related ICT infrastructure Generic application support Application cases are drivers for computer & computational science and engineering research

Grid Services Harness multi-domain distributed resources Management of comm. & computing VL-e Application Oriented Services Food Informatics Dutch Telescience Medical Diagnosis & Imaging VL-e project Bio- Informatics Data Insive Science/ LOFAR Bio- Diversity Data intensive sciences

Two sides of Bioinformatics as an e-Science The scientific responsibility to develop the underlying computational concepts and models to convert complex biological data into useful biological and chemical knowledge Technological responsibility to manage and integrate huge amounts of heterogeneous data sources from high throughput experimentation

Role of bioinformatics cell Data generation/validation Data integration/fusion Data usage/user interfacing GenomicsTranscriptomicsProteomicsMetabolomics Integrative/System Biology RNA protein metabolites DNA methodology bioinformatics

Virtual Lab for e-Science research Philosophy Multidisciplinary research and development of related ICT infrastructure Generic application support Application cases are drivers for computer & computational science and engineering research Problem solving partly generic and partly specific Re-use of components via generic solutions whenever possible

Grid/ Web Services Harness multi-domain distributed resources Management of comm. & computing Management of comm. & computing Management of comm. & computing Potential Generic part Potential Generic part Potential Generic part Application Specific Part Application Specific Part Application Specific Part Virtual Laboratory Application Oriented Services Application pull

Generic e-Science aspects Virtual Reality Visualization & user interfaces Modeling & Simulation Interactive Problem Solving Data & information management Data modeling dynamic work flow management Content (knowledge) management Semantic aspects Meta data modeling  Ontologies Wrapper technology Design for Experimentation

Virtual Lab for e-Science research Philosophy Multidisciplinary research and development of related ICT infrastructure Generic application support Application cases are drivers for computer & computational science and engineering research Problem solving partly generic and partly specific Re-use of components via generic solutions whenever possible Rationalization of experimental process Reproducible & comparable

Issues for a reproducible scientific experiment interpretation Rationalization of the experiment and processes via protocols processing processed data conversion, filtering, analyses, simulation, … experiment parameters/settings, algorithms, intermediate results, … Parameter settings, Calibrations, Protocols … software packages, algorithms … raw data acquisition sensors,amplifiers imaging devices,, … presentation visualization, animation interactive exploration, … Metadata Much of this is lost when an experiment is completed.

Scientific experiments & e-Science Step1: designing an experiment Step2: performing the experiment Step3: analyzing the experiment results success For complex experiments:  contain complex processes  require interdisciplinary expertise  need large scale resource Grid & high level support

Experiment Topology –Graphical representation of self-contained data processing modules attached to each other in a workflow Process-Flow Templates(PFT) – Derived from ontologies – Graphical representation of data elements and processing steps in an experimental procedure – Information to support context-sensitive assistance (semantics) Study – Descriptions of experimental steps represented as an instance of a PFT with references to experiment topologies Components in a VL-e experiment

Step1: designing an experiment Step2: performing the experiment Step3: analyzing the experiment results success Taverna, Kepler, and Triana: model processes for computing tasks. In VL-e: model both computing tasks and human activity based processes, and model them from the perspective of an entire lifecycle. It tries to support Collaboration in different stages Information sharing Reuse of experiment PFT in VLAMGPFT instances in VLAMG Process Flow Template e-Science environment Scientific Workflow Management Systems Process Flow Template

Definition of experiment protocols Workflow definitions Recreate complex experiments into process flows Workflow execution Maintain control over the experiment Data processing execution Topologies of data processing modules Interpretation Visualization of processed results to help intuition Ontology definitions can help in obtaining a well-structured definition of experiment, data and metadata.

Virtual Lab for e-Science research Philosophy Multidisciplinary research and development of related ICT infrastructure Generic application support Application cases are drivers for computer & computational science and engineering research Problem solving partly generic and partly specific Re-use of components via generic solutions whenever possible Rationalization of experimental process Reproducible & comparable Two research experimentation environments Proof of concept for application experimentation Rapid prototyping for computer & computational science experimentation

The VL-e infrastructure Grid Middleware Surfnet Application specific service Application Potential Generic service & Virtual Lab. services Grid & Network Services Virtual Laboratory VL-e Proof of Concept Environment Telescience Medical Application Bio Informatics Applications VL-e Experimental Environment Virtual Lab. rapid prototyping (interactive simulation) Additional Grid Services (OGSA services) Network Service (lambda networking) VL-e Certification Environment Test & Cert. Compatibility Test & Cert. Grid Middleware Test & Cert. VL-software

Infrastructure for Applications Applications are a driving force of the PoC Experience shows applications value stability Foster two-way interaction to make this happen

Taverna, Kepler, Triana and VLAM. Step1: designing an experiment Step2: performing the experiment Step3: analyzing the experiment results success PFT e-Science environment Scientific Workflow Management Systems SWMS Research activity to be developed in the Rapid Prototyping environment Stable developments to be used in VL-e in the Proof of Concept environment Research activity to be developed in the Rapid Prototyping environment PFT

VL-e PoC environment Latest certified stable software environment of core grid and VL-e services Core infrastructure built around clusters and storage at SARA and NIKHEF (‘production’ quality) Controlled extension to other platforms and distributions On the user end: install needed servers: user interface systems, storage elements for data disclosure, grid-secured DB access Focus on stability and scalability

Hosted services for VL-e Key services and resources are offered centrally for all applications in VL-e Mass data and number crunching on the large resources at SARA Storage for data replication & distribution Persistent ‘strategic’ storage on tape Resource brokers, resource discovery, user group management

Why such a complex scheme? “software is part of the infrastructure” stability of core software needed to develop the new scientific applications enable distributed systems management (who runs what version when?) “the grid is one big error amplifier” “computers make mistakes like humans, only much, much faster”

What did we learn It is not enough to just transport current applications to Grid or e-Science infrastructures To fully exploit the potential of e-Science infrastructures one has to learn what is possible Therefore the full lifecycle of an experiment has to be taken into account Workflow management is a first step We add semantic information via Process Flow Templates Application innovation such as for instance bio- banking should be the aim Grids should be transparent for the end-user

Conclusions e-Science is a lot more more than trying to cope with data explosion alone Implementation of e-Science systems requires further rationalization and standardization of experimentation process e-Science success demands the realization of an environment allowing application driven experimentation & rapid dissemination of feed back of these new methods We try to do that via development of Proof of Concept based on Grid

Electron tomography data pipeline development TOM (Matlab) acquisition and data storage with the SRB SRB Storage With the CCDB database for retrieval of experimental data sets Grid-computing 3D reconstruction refinements Currently for EMAN and later also for TOM (Matlab) SARA - Amsterdam 1 Gbit/s