Panel: Data Intensive Science, HPCS 2015 – The International Conference on High Performance Computing & Simulation, 22 July 2015

Marian Bubak, AGH University of Science and Technology, Kraków, Poland, and University of Amsterdam, Amsterdam, The Netherlands

DICE Team
Academic Computer Centre CYFRONET AGH (1973): 120 employees
Department of Computer Science AGH (1980): 800 students, 70 employees
Faculty of Computer Science, Electronics and Telecommunication (2012): 2000 students, 200 employees
AGH University of Science and Technology (1919): 16 faculties, students; 4000 employees
Other 15 faculties
Distributed Computing Environments (DICE) Team:
– investigation of methods for building complex scientific collaborative applications
– elaboration of environments and tools for e-Science
– integration of large-scale distributed computing infrastructures
– knowledge-based approach to services, components, and their semantic composition

From the Workshop on Cloud Services for File Synchronisation and Sharing, CERN, Nov 17-18, 2014:
– protocols for file sharing and synchronization
– reliability and consistency of file synchronization services
– efficiency and scalability of file synchronization services
– file-sharing semantics
– data analysis workflows
– backend storage technologies
– federated access to cloud storage
– integration of large data repositories
– mobile access to data

Storage federation and scalable data access
In service orchestration, all data is passed to the workflow engine, and data transfers are made through SOAP, which is unfit for large data transfers.
Spiros Koulouzis, Reggie Cushing, Kostas Karasavvas, Adam Belloum, and Marian Bubak. Enabling web services to consume and produce large datasets. IEEE Internet Computing, 16(1):52–60, 2012.
Spiros Koulouzis, Dmitry Vasyunin, Reginald Cushing, Adam Belloum, and Marian Bubak. Cloud data federation for scientific applications. In Euro-Par 2013: Parallel Processing Workshops, LNCS 8374, pp. 13–22. Springer, 2014.
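The bottleneck described above comes from payloads travelling through the orchestrating engine. One way to avoid it, in the spirit of storage federation, is to pass only references (URIs) between services and let each service read from and write to shared storage directly. The sketch below is a minimal illustration under stated assumptions: an in-memory dict stands in for a real storage federation, and the names FederatedStore, orchestrate, uppercase_service and reverse_service are hypothetical, not taken from the cited papers.

```python
"""Sketch: passing data references instead of payloads through a workflow engine.

Assumption (illustrative only): all services can reach a shared federated store,
modelled here as an in-memory dict keyed by URI. FederatedStore, orchestrate and
the two services are hypothetical names, not part of any system cited above.
"""
import hashlib


class FederatedStore:
    """Toy stand-in for a storage federation reachable by every service."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Content-addressed URI: identical data always maps to the same reference.
        uri = "store://" + hashlib.sha256(data).hexdigest()
        self._objects[uri] = data
        return uri

    def get(self, uri: str) -> bytes:
        return self._objects[uri]


def uppercase_service(store: FederatedStore, in_uri: str) -> str:
    # The service pulls its input straight from storage and returns only a URI.
    return store.put(store.get(in_uri).upper())


def reverse_service(store: FederatedStore, in_uri: str) -> str:
    return store.put(store.get(in_uri)[::-1])


def orchestrate(store: FederatedStore, input_uri: str, services) -> tuple[str, int]:
    """Run services in sequence; the engine only ever handles short URIs.

    Returns the final URI and the number of bytes that passed through the engine.
    """
    uri = input_uri
    bytes_through_engine = 0
    for service in services:
        uri = service(store, uri)
        bytes_through_engine += len(uri)
    return uri, bytes_through_engine


if __name__ == "__main__":
    store = FederatedStore()
    payload = b"sensor-reading " * 100_000  # illustrative payload of 1.5 MB
    out_uri, moved = orchestrate(store, store.put(payload), [uppercase_service, reverse_service])
    print(len(store.get(out_uri)), moved)
```

Here the engine handles 72 bytes per step (an 8-character scheme prefix plus a 64-character SHA-256 digest) regardless of payload size. In a real deployment the store would be a federated storage layer reached over an efficient transfer protocol rather than a local dictionary.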

Cloud and Big Data Benchmarking and Verification Methodology
Methodology of evaluation of systems and applications:
– qualitative metrics (architectures, functionality)
– quantitative metrics (performance, stability, cost)
– test scenarios, test cases and parameters
– experiment planning, analysis of results
Selection of benchmarks:
– portfolio of standard benchmarks
– design of application-specific scenarios
Target platforms:
– IaaS clouds (public, private)
– hybrid clouds with cloud bursting
– real-time Big Data processing systems (Hadoop, Spark, ElasticSearch)
Collaboration with Samsung R&D Polska:
– methodology applied to cloud infrastructure at the industrial partner
– consultancy on the analysis of results and development of a Testing-as-a-Service (TaaS) system
K. Zieliński, M. Malawski, M. Jarząb, S. Zieliński, K. Grzegorczyk, T. Szepieniec, and M. Zyśk: Evaluation Methodology of Converged Cloud Environments. In: K. Wiatr, J. Kitowski, M. Bubak (Eds.) Proceedings of the Seventh ACC Cyfronet AGH Users' Conference, ACC CYFRONET AGH, Kraków (2014)
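The quantitative metrics listed above (performance, stability, cost) can be made concrete with a small calculation over repeated benchmark runs. This is a minimal sketch under assumed definitions: performance as mean throughput, stability as the coefficient of variation (standard deviation divided by mean) of throughput, and cost as price per million operations at a flat hourly instance price. These definitions and the names RunResult, Summary and summarise are illustrative, not the ones used in the cited evaluation methodology.

```python
"""Sketch: summarising repeated benchmark runs into quantitative metrics.

Assumed definitions (illustrative, not the cited methodology's):
- performance = mean throughput over runs,
- stability   = coefficient of variation (stdev / mean) of throughput,
- cost        = price per million operations at a flat hourly price.
"""
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RunResult:
    ops_completed: int   # work done in one run
    duration_s: float    # wall-clock time of the run, in seconds


@dataclass
class Summary:
    mean_throughput: float       # operations per second
    cv_throughput: float         # lower means a more stable platform
    cost_per_million_ops: float  # in the currency of price_per_hour


def summarise(runs: list[RunResult], price_per_hour: float) -> Summary:
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate stability")
    throughputs = [r.ops_completed / r.duration_s for r in runs]
    avg = mean(throughputs)
    cv = stdev(throughputs) / avg
    total_ops = sum(r.ops_completed for r in runs)
    total_hours = sum(r.duration_s for r in runs) / 3600
    cost = price_per_hour * total_hours / total_ops * 1_000_000
    return Summary(avg, cv, cost)
```

For example, two illustrative (not measured) runs of 10 and 20 operations per second, RunResult(10, 1.0) and RunResult(20, 1.0), give a mean of 15 ops/s and a coefficient of variation of about 0.47. Keeping stability separate from raw performance makes it possible to compare, say, a fast but noisy public cloud with a slower but steadier private one.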

Data security in clouds
To ensure the security of data in transit:
– modern applications use secure transport protocols (e.g. TLS)
– for legacy unencrypted protocols, if absolutely needed, or as an additional security measure: a site-to-site VPN, e.g. between cloud sites (outside of the instance), or remote access for individual users connecting e.g. from their laptops
Data should be securely stored and reliably deleted when no longer needed. Clouds are not secure enough in this respect: data optimisations prevent ensuring that data were actually deleted.
A solution:
– end-to-end encryption (the decryption key stays in a protected/private zone)
– data dispersal (portions of the data are dispersed between nodes so that it is non-trivial or impossible to recover the whole message)
J. Meizner, M. Bubak, M. Malawski, P. Nowakowski: Secure Storage and Processing of Confidential Data on Public Clouds. In: PPAM 2013, LNCS 8384, Springer, 2014.
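Data dispersal can be illustrated with the simplest possible scheme, n-of-n XOR splitting: each node stores one share, any n-1 shares together are indistinguishable from random bytes, and only all n shares combined restore the message. This is a minimal sketch of one dispersal technique chosen for illustration; it is an assumption, not necessarily the scheme used in the cited PPAM 2013 paper, and split and combine are hypothetical helper names.

```python
"""Sketch: dispersing data across nodes with n-of-n XOR secret splitting.

Each of the first n-1 shares is a uniformly random pad; the last share is the
message XORed with all pads. Fewer than n shares reveal nothing about the
message. Illustrative only; not necessarily the cited paper's scheme.
"""
import secrets
from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    # Shares always have equal length, so zip never truncates here.
    return bytes(x ^ y for x, y in zip(a, b))


def split(message: bytes, n: int) -> list[bytes]:
    """Return n shares: n-1 random pads plus one share that closes the XOR."""
    if n < 2:
        raise ValueError("need at least two shares")
    pads = [secrets.token_bytes(len(message)) for _ in range(n - 1)]
    last = reduce(_xor, pads, message)  # message ^ pad_1 ^ ... ^ pad_(n-1)
    return pads + [last]


def combine(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original message."""
    return reduce(_xor, shares)


if __name__ == "__main__":
    shares = split(b"confidential record", 3)  # e.g. one share per storage node
    print(combine(shares))
```

Because this scheme is n-of-n, losing any single share loses the data. Threshold schemes such as Shamir's secret sharing, or Rabin's information dispersal algorithm, tolerate missing shares at the cost of extra complexity. Dispersal can also be combined with end-to-end encryption by splitting ciphertext instead of plaintext.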

Collaborative metadata management
Competences:
– exploitation of PaaS-based solutions with in-house installations
– handling heterogeneous data in diverse scientific disciplines
– building multi-layer and multi-protocol software stacks
Objectives:
– ad-hoc metadata model creation and deployment of corresponding storage facilities
– a research space for metadata model exchange and discovery, with associated data repositories and access restrictions in place
– different types of storage sites and data transfer protocols
Architecture:
– web-interface-based metadata model management
– PaaS-based repositories over REST
– site-specific storage infrastructure for file persistence
D. Harężlak, M. Kasztelnik, M. Pawlik, B. Wilk, and M. Bubak: A Lightweight Method of Metadata and Data Management with DataNet. In: M. Bubak, J. Kitowski, K. Wiatr (Eds.): eScience on Distributed Computing Infrastructure, LNCS, Springer, 2014.
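The idea of ad-hoc metadata model creation can be sketched as a named set of typed fields against which repository entries are validated before being stored. This is a hypothetical in-process illustration: MetadataModel and Repository are invented names, the example model is made up, and the code does not reproduce DataNet's interface, which the slide describes as PaaS-based repositories accessed over REST.

```python
"""Sketch: an ad-hoc metadata model and a repository that validates against it.

Hypothetical illustration only; MetadataModel and Repository are invented names
and this is not DataNet's API.
"""
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetadataModel:
    name: str
    fields: dict[str, type]  # field name -> required Python type

    def validate(self, record: dict[str, Any]) -> list[str]:
        """Return a list of problems; an empty list means the record is valid."""
        problems = []
        for fname, ftype in self.fields.items():
            if fname not in record:
                problems.append(f"missing field: {fname}")
            elif not isinstance(record[fname], ftype):
                problems.append(f"{fname}: expected {ftype.__name__}")
        for extra in record.keys() - self.fields.keys():
            problems.append(f"unknown field: {extra}")
        return problems


@dataclass
class Repository:
    """One repository per metadata model, created when the model is defined."""
    model: MetadataModel
    entries: list[dict[str, Any]] = field(default_factory=list)

    def add(self, record: dict[str, Any]) -> None:
        problems = self.model.validate(record)
        if problems:
            raise ValueError("; ".join(problems))
        self.entries.append(record)


if __name__ == "__main__":
    # Hypothetical model, defined at run time rather than in a fixed schema.
    sensor = MetadataModel("sensor", {"id": str, "depth_m": float})
    repo = Repository(sensor)
    repo.add({"id": "s1", "depth_m": 1.5})
    print(sensor.validate({"id": "s2"}))
```

Defining the model as data rather than as code is what makes it "ad hoc": a new discipline can declare its own fields and obtain a validating repository without redeploying software.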

Levee Monitoring Application (ISMOP project)
– levee breach threat due to a passing wave
– high water levels lasting for up to 2 weeks
– large areas of levees affected (100+ km)

Flood threat assessment platform
Bartosz Balis, Marek Kasztelnik, Maciej Malawski, Piotr Nowakowski, Bartosz Wilk, Maciej Pawlik, Marian Bubak: Execution Management and Efficient Resource Provisioning for Flood Decision Support. ICCS 2015, Procedia Computer Science 51, Elsevier, 2015.

Collage: executable e-Science publications
Goal: extending the traditional scientific publishing model with computational access and interactivity mechanisms, enabling readers (including reviewers) to replicate and verify experimentation results and browse large-scale result spaces.
Challenges:
– Scientific: a common description schema for primary data (experimental data, algorithms, software, workflows, scripts) as part of publications; deployment mechanisms for on-demand re-enactment of experiments in e-Science.
– Technological: an integrated architecture for storing, annotating, publishing, referencing and reusing primary data sources.
– Organizational: provisioning of executable paper services to a large community of users representing various branches of computational science; fostering further uptake through involvement of major players in the field of scientific publishing.
P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Proceedings of the International Conference on Computational Science, ICCS 2011 (2011). Winner of the Elsevier/ICCS Executable Paper Grand Challenge.
E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Service. Procedia Computer Science, vol. 18, 2013.

Simulating a city, citizen science
Diagram: Sensors, Simulating, Open Data, Data Analytics, Decision
Understanding a city (mobility, crime, flood, health, evacuation, etc.) through computation: a set of simulations combined together and reacting to changes.
Key challenges:
– open data (Tomek Gubała's initiative)
– a distributed environment with auto-scaling capability (e.g. Atmosphere, AWS Auto Scaling)
– a simulation repository
– a decision support system
Proof-of-concept projects using Open Data (work in progress): https://plankrk.herokuapp.com

Automata-based dynamic data processing
A data processing schema can be considered as a state transformation graph. The graph facilitates data processing in many ways:
– data state can be easily tracked
– using the graph as a protocol header, a virtual data processing network layer is achieved
– data becomes self-routable to processing nodes
– collaboration can be achieved by joining the virtual network
Figure: state graph describing a filtering state machine for tweets, mapped to 11 VMs.
Reginald Cushing, Adam Belloum, Marian Bubak, and Cees de Laat. Automata-based dynamic data processing for clouds. In Euro-Par 2014: Parallel Processing Workshops, LNCS 8805, pp. 93–104, 2014.
Reginald Cushing, Adam Belloum, Marian Bubak, and Cees de Laat. Towards Computing Without Borders: Data Processing Plane. In review: Future Generation Computer Systems, 2015.
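The idea of data carrying its own state graph as a header can be sketched in a few lines: each node looks up the message's current state in the header, applies the corresponding transformation, and the message moves to the next state without any central engine deciding the route. This is a minimal in-process sketch under stated assumptions: the header format (state mapped to a node name and a next state), the NODES registry, the three example nodes and the route function are all hypothetical, and routing between VMs is simulated by function calls.

```python
"""Sketch: self-routing data items that carry their state graph as a header.

Illustrative assumptions (not the cited system's design or API): the header maps
each state to (node name, next state); nodes are plain functions registered in
NODES; network routing between VMs is simulated in-process.
"""
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical processing nodes; a node returns None to filter the item out.
NODES: dict[str, Callable[[str], Optional[str]]] = {
    "strip": lambda text: text.strip(),
    "lowercase": lambda text: text.lower(),
    "drop_if_no_hashtag": lambda text: text if "#" in text else None,
}


@dataclass
class Message:
    payload: str
    state: str
    graph: dict[str, tuple[str, str]]  # header: state -> (node name, next state)
    trace: list[str] = field(default_factory=list)  # states visited so far


def route(msg: Message, final_state: str = "done") -> Optional[Message]:
    """Forward the message node by node, as dictated by its own header.

    Returns None if a node filters the message out.
    """
    while msg.state != final_state:
        node_name, next_state = msg.graph[msg.state]
        result = NODES[node_name](msg.payload)
        msg.trace.append(msg.state)  # the data's state is always trackable
        if result is None:
            return None
        msg.payload, msg.state = result, next_state
    return msg


if __name__ == "__main__":
    # Hypothetical three-state tweet filter.
    graph = {
        "raw": ("strip", "clean"),
        "clean": ("lowercase", "normal"),
        "normal": ("drop_if_no_hashtag", "done"),
    }
    kept = route(Message("  Flood Alert #Krakow ", "raw", graph))
    dropped = route(Message("no tags here", "raw", graph))
    print(kept.payload if kept else None, kept.trace if kept else None, dropped)
```

A tweet containing a hashtag reaches the state done as "flood alert #krakow" after visiting raw, clean and normal, while one without a hashtag is filtered out. Registering another function in NODES and referencing it from a header is a rough in-process analogue of a collaborator joining the virtual network.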