Privacy issues in integrating R environment in scientific workflows
Dr. Zhiming Zhao, University of Amsterdam, Virtual Laboratory for e-Science


1 Privacy issues in integrating R environment in scientific workflows
Dr. Zhiming Zhao, University of Amsterdam, Virtual Laboratory for e-Science
Privacy issues in integrating Legacy Experiment Environment to Scientific Workflows
Zhiming Zhao, Dmitry A. Vasunin, Adianto Wibisono, Adam Belloum, Cees de Laat, Pieter Adriaans, Bob Hertzberger

2 Outline
–Scientific experiments and R
–Problem description
–Optional solutions
–Experimental results
–Summarizing discussion
–Future work

3 Scientific experiments and support systems
[Diagram of the experiment lifecycle. Labels: Experiment: on full data scale; Define goal; Data analysis; Prototype the algorithm; Computing (Test with small data); Vis./Int. (Validation); Finding & Dissemination; Apply to full size data; Refine; Prototype: on small data scale.]
In such scenarios:
–Existing experiment environments, such as R, are widely used by domain scientists
–Human-in-the-loop computing is important for testing and validating prototypes
–Scientific workflows are used to manage different processes and the experiment lifecycle

4 R and workflow support in VL-e
R provides rich functionality for statistics and data visualisation, and has been used as an important experimental environment in bio-sciences.
–R needs scientific workflow support
  Accessing different e-Science resources
  Being coordinated with the other components in a large scale experiment
–E-Science workflows in certain domains also need R
  Reuse the advanced results from legacy systems
  Support experiments developed on legacy systems
Workflow support in VL-e
–Four systems are recommended
  Taverna, Kepler and VLAM have support for R
–A generic solution is under construction

5 R in scientific workflows: current solutions
Three types of solutions
Local (L): local installation of R, through the command line interface of R
–Simple configuration
–Performance bottleneck
Web Service (W): SOAP to pass R script and objects
–Standard interface, distributed computing
–High latency
TCP Socket (S): socket interface (RServe)
–Distributed computing
–Maintain states
–Poor security
[Diagram labels: Wf system; User Desktop; Local R Env.; Remote node; Remote R Env.; WS; Socket]
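As a rough illustration of the "Local" option on this slide, the sketch below shows how a workflow task might call a local R installation through its command line interface. It is not taken from the presentation: the function names `local_r_command` and `run_local_r` are hypothetical, and it assumes the standard `Rscript` front end is on the PATH. Every call starts a fresh R process, which is one plausible reading of the "performance bottleneck" noted above, and no state survives between calls.

```python
import shutil
import subprocess
from typing import List, Optional


def local_r_command(expression: str) -> List[str]:
    """Build the argv for evaluating one R expression with Rscript.

    Hypothetical helper for illustration only.
    """
    # --vanilla skips saved workspaces and profile files, so a run does
    # not pick up state left behind by an earlier session.
    return ["Rscript", "--vanilla", "-e", expression]


def run_local_r(expression: str, timeout_s: float = 60.0) -> Optional[str]:
    """Evaluate the expression in a local R process and return its stdout.

    Returns None when no Rscript executable is found on PATH.
    """
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        local_r_command(expression),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise if R exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    print(local_r_command("cat(mean(c(1, 2, 3)))"))
    print(run_local_r("cat(mean(c(1, 2, 3)))"))
```

Because each invocation is a separate process, intermediate results stay on the user's own desktop, in contrast to the shared remote engine of the socket option.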

6 Typical scenario of RServe and requirements on privacy
Different levels of privacy issues
Data level
–Intermediate results not to be seen by the other users
Communication level: graphical display
–Remote X display and interaction between multiple users
[Diagram labels: WF1; WF2; R; Display]

7 Problem description and desired solution
Problem description
–Most legacy experiment environments do not have strong security management
–Workflow systems provide integration without considering security issues
–The deployment of the remote environment is required to be secure
Desire
–Use existing technologies
–Provide solutions to privacy issues at the workflow level, preferably in a transparent way

8 Experiments
–Review optional solutions
–Investigate the overhead of security enhancement on the workflow execution

9 Different configurations and their level of security
Data management
–Static (R engine): shared engine. Secure: No. Easy to set up.
–Dynamic (R engine): different user account. Secure: Yes. The endpoint is unknown at workflow design stage.
Display management
–Static (X server): local X. Secure: No. Individual X server, bound to the user's desktop; X is not protected.
–Dynamic (X server) {Job+VNC}: remote X + VNC. Secure: Yes. Management overhead of VNC.
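The data management half of this slide contrasts a static, shared R engine with a dynamic engine per user. The toy model below illustrates why the shared case exposes intermediate results to other users. It is only a sketch under stated assumptions: `ToyEngine`, `EnginePool` and `leaks` are hypothetical names, the "engine" is a plain dictionary rather than a real R process, and nothing here reflects how VL-e or RServe is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyEngine:
    """Stand-in for one R engine process: just a variable workspace."""
    workspace: Dict[str, Any] = field(default_factory=dict)


class EnginePool:
    """Hands out engines under a 'static' (shared) or 'dynamic' (per-user) policy."""

    def __init__(self, policy: str) -> None:
        if policy not in ("static", "dynamic"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self._shared = ToyEngine()
        self._per_user: Dict[str, ToyEngine] = {}

    def engine_for(self, user: str) -> ToyEngine:
        if self.policy == "static":
            # Every workflow talks to the same engine and workspace.
            return self._shared
        # One engine per user, created on first request.
        if user not in self._per_user:
            self._per_user[user] = ToyEngine()
        return self._per_user[user]


def leaks(policy: str) -> bool:
    """Return True if user 'wf2' can see a value stored by user 'wf1'."""
    pool = EnginePool(policy)
    pool.engine_for("wf1").workspace["intermediate"] = [1, 2, 3]
    return "intermediate" in pool.engine_for("wf2").workspace


if __name__ == "__main__":
    print("static leaks:", leaks("static"))
    print("dynamic leaks:", leaks("dynamic"))
```

The dynamic policy also shows the cost named on the slide: the engine a workflow will talk to does not exist until run time, so its endpoint cannot be fixed at workflow design stage.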

10 An experiment: Taverna, RServe and security tunnel
Experiment
–Adding security enhancement in Taverna
–Protect the data channels between Taverna and RServe
Overhead
–Setting up security tunnels
–Runtime data transfer
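The slide does not say which tunnelling technology was used. As one possible reading, the sketch below assumes an SSH local port forward in front of the RServe socket and shows how the two overhead components could be compared. The helper names `tunnel_command` and `relative_overhead`, the host `node.example.org` and the user `alice` are all hypothetical; port 6311 is the default RServe port.

```python
from typing import List

RSERVE_DEFAULT_PORT = 6311  # default TCP port of RServe


def tunnel_command(
    remote_host: str,
    user: str,
    local_port: int = RSERVE_DEFAULT_PORT,
    remote_port: int = RSERVE_DEFAULT_PORT,
) -> List[str]:
    """Build an ssh argv that forwards a local port to RServe on the remote node.

    Assumption: SSH is the tunnel; the presentation only says "security tunnel".
    """
    return [
        "ssh",
        "-N",  # no remote command, forwarding only
        "-L",  # local_port on this machine -> remote_port on the remote loopback
        f"{local_port}:localhost:{remote_port}",
        f"{user}@{remote_host}",
    ]


def relative_overhead(plain_seconds: float, tunneled_seconds: float) -> float:
    """Fraction of extra time the tunneled run needs compared with the plain run."""
    if plain_seconds <= 0:
        raise ValueError("plain_seconds must be positive")
    return (tunneled_seconds - plain_seconds) / plain_seconds


if __name__ == "__main__":
    print(" ".join(tunnel_command("node.example.org", "alice")))
    # Made-up timings, only to show how the ratio is read.
    print(relative_overhead(2.0, 2.5))
```

With such a tunnel in place, the workflow client would connect to `localhost` on the forwarded port instead of the remote node, so RServe itself would only need to accept loopback connections.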

11 Summarizing discussion
–Integrating existing experiment environments with workflow systems is important for rapid prototyping
–Privacy protection is demanded by both users and the e-Science infrastructure, and can be viewed as a generic issue when integrating a legacy component that supports user interaction into a workflow
–Privacy protection can be achieved to a certain level by customizing the workflow execution
–Adding security enhancement to workflow execution does not necessarily impose a high penalty on execution

12 Future work
–In the VL-e project, we are developing a bus-style generic solution for different workflow systems
–Take data privacy into account when realizing interoperability between different workflow systems

13 Activities
Int'l workshop on "Workflow systems in e-Science", organized by Zhiming Zhao and Adam Belloum, in the context of ICCS 2006 (Reading University) and ICCS 2007 (Beijing, China).
–Proceedings are in LNCS, Springer Verlag.
–A special issue will be published in Scientific Programming Journal.
–http://staff.science.uva.nl/~zhiming/iccs-wses
Workshop on "Scientific workflows and industrial workflow standards in e-Science", organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid computing conference in Amsterdam, December 2006.
–Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California)
–BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory)
–Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis)
–Taverna, Prof. Peter Rice (European Bioinformatics Institute)
–WS and Semantic issues, Dr. Steve Ross-Talbot (CEO and co-founder of Pi4 Technologies)
–Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University)
–http://staff.science.uva.nl/~adam/workshop/VL-e-workshop.htm

