Toward an open-data infrastructure for WeNMR and structural biology-related services Alexandre M.J.J. Bonvin WeNMR Project coordinator Bijvoet Center for.

Slides:



Advertisements
Similar presentations
E-NMR (RI ) is funded by the European Commission under the Research Infrastructure Programme The e-NMR project: deploying and unifying.
Advertisements

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI AAI in EGI Status and Evolution Peter Solagna Senior Operations Manager
A Worldwide e-Infrastucture for NMR and structural biology Project Coordinator: Prof. Alexandre M.J.J. Bonvin, Utrecht University, NL
Blind Testing of Routine, Fully Automated Determination of Protein Structures from NMR Data Antonio Rosato, James M. Aramini, Cheryl Arrowsmith, Anurag.
MTA SZTAKI Hungarian Academy of Sciences Grid Computing Course Porto, January Introduction to Grid portals Gergely Sipos
Portals and Credentials David Groep Physics Data Processing group NIKHEF.
08/11/908 WP2 e-NMR Grid deployment and operations Technical Review in Brussels, 8 th of December 2008 Marco Verlato.
Anna Scipioni Stefano Nativi Jerome Bequignon Mirco Mazzucato Lorenzo Bigagli Vincenzo Cuomo CYCLOPS Project.
SICSA student induction day, 2009Slide 1 Social Simulation Tutorial Session 6: Introduction to grids and cloud computing International Symposium on Grid.
Open Source Grid Computing in the Finance Industry Alex Efimov STFC Kite Club Knowledge Exchange Advisor UK CERN Technology Transfer Officer
Open Science Grid For CI-Days Internet2: Fall Member Meeting, 2007 John McGee – OSG Engagement Manager Renaissance Computing Institute.
EGI-Engage EGI-Engage Engaging the EGI Community towards an Open Science Commons Project Overview 9/14/2015 EGI-Engage: a project.
What is OMII-Europe? Qin Li Beihang University. EU project: RIO31844-OMII-EUROPE 1 What is OMII-Europe? Open Middleware Infrastructure Institute for Europe.
User requirements for and concerns about a European e-Infrastructure Steven Newhouse, Director.
INFSO-RI Enabling Grids for E-sciencE GLOBAL GRID USER SUPPORT THE MODEL AND EXPERIENCE IN LCG/EGEE Gilles Mathieu(1), Torsten Antoni(2),
Open Science Grid For CI-Days Elizabeth City State University Jan-2008 John McGee – OSG Engagement Manager Manager, Cyberinfrastructure.
A short introduction to the Worldwide LHC Computing Grid Maarten Litmaath (CERN)
Group 1 : Grid Computing Laboratory of Information Technology Supervisors: Alexander Ujhinsky Nikolay Kutovskiy.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ?? Athens, May 5-6th 2009 Community Support.
10/24/2015OSG at CANS1 Open Science Grid Ruth Pordes Fermilab
The ILC And the Grid Andreas Gellrich DESY LCWS2007 DESY, Hamburg, Germany
GridPP Deployment & Operations GridPP has built a Computing Grid of more than 5,000 CPUs, with equipment based at many of the particle physics centres.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Next steps with EGEE EGEE training community.
MTA SZTAKI Hungarian Academy of Sciences Introduction to Grid portals Gergely Sipos
 Our mission Deploying and unifying the NMR e-Infrastructure in System Biology is to make bio-NMR available to the scientific community in.
ESFRI & e-Infrastructure Collaborations, EGEE’09 Krzysztof Wrona September 21 st, 2009 European XFEL.
Terena conference, June 2004, Rhodes, Greece Norbert Meyer The effective integration of scientific instruments in the Grid.
Ruth Pordes November 2004TeraGrid GIG Site Review1 TeraGrid and Open Science Grid Ruth Pordes, Fermilab representing the Open Science.
GCRC Meeting 2004 BIRN Coordinating Center Software Development Vicky Rowley.
A worldwide e-Infrastructure for NMR and structural biology A worldwide e-Infrastructure for NMR and structural biology Introduction Structural biology.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The GILDA t-Infrastructure Roberto Barbera.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives‘ FP INFRASTRUCTURES-71 DIMMI Project a DI gital M ulti M edia I nfrastructure.
Structural Biology on the GRID Dr. Tsjerk A. Wassenaar Biomolecular NMR - Utrecht University (NL)
Università di Perugia Enabling Grids for E-sciencE Status of and requirements for Computational Chemistry NA4 – SA1 Meeting – 6 th April.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE08 conference, Istambul Life sciences cluster perspective on EGI V. Breton, CNRS On.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI strategy and Grand Vision Ludek Matyska EGI Council Chair EGI InSPIRE.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI User Support services and activities Gergely Sipos User Community Support.
Extending WeNMR e-Infrastructure outside Europe Marco Verlato INFN (National Institute of Nuclear Physics) Division of Padova Italy
The WeNMR HADDOCK portal goes crowd computing Alexandre M.J.J. Bonvin Project coordinator Bijvoet Center for Biomolecular Research Faculty of Science,
All Hands Meeting 2005 BIRN-CC: Building, Maintaining and Maturing a National Information Infrastructure to Enable and Advance Biomedical Research.
An Open Education Cloud Alexandre Bonvin, Utrecht University WeNMR VRC / WestLife VRE /MoBrain CC But today mainly wearing a lecturer’s hat Open Science.
Update report Marco Verlato INFN (National Institute of Nuclear Physics) Division of Padova Italy CHAIN Workshop, Taipei, Taiwan,
The WeNMR VRC: operating a worldwide e-Infrastructure for structural biology Marco Verlato INFN (National Institute of Nuclear Physics) Division of Padova.
EGEE is a project funded by the European Union under contract IST Compchem VO's user support EGEE Workshop for VOs Karlsruhe (Germany) March.
A worldwide e-Infrastructure and Virtual Research Community for NMR and structural biology Alexandre M.J.J. Bonvin Project coordinator Bijvoet Center for.
ENEA GRID & JPNM WEB PORTAL to create a collaborative development environment Dr. Simonetta Pagnutti JPNM – SP4 Meeting Edinburgh – June 3rd, 2013 Italian.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Services for Distributed e-Infrastructure Access Tiziana Ferrari on behalf.
Structural Biology on the Grid Christophe Schmitz Bijvoet Center for Biomolecular Research Faculty of Science, Utrecht University The Netherlands
DutchGrid KNMI KUN Delft Leiden VU ASTRON WCW Utrecht Telin Amsterdam Many organizations in the Netherlands are very active in Grid usage and development,
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Requirements from the WeNMR VRC User Forum 2011, Vilnius Workshop.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI A pan-European Research Infrastructure supporting the digital European Research.
EGI-InSPIRE RI EGI Compute and Data Services for Open Access in H2020 Tiziana Ferrari Technical Director, EGI.eu
An short overview of INSTRUCT and its computational requirements Alexandre M.J.J. Bonvin Project coordinator Bijvoet Center for Biomolecular Research Faculty.
EGI-InSPIRE RI An Introduction to European Grid Infrastructure (EGI) March An Introduction to the European Grid Infrastructure.
A Competence Center to Serve Translational Research from Molecule to Brain Alexandre M.J.J. Bonvin MoBrain CC coordinator Bijvoet Center for Biomolecular.
WeNMR Operations Report Marco Verlato INFN (National Institute of Nuclear Physics) Division of Padova Italy EGI Community Forum.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI solution for high throughput data analysis Peter Solagna EGI.eu Operations.
Accessing the VI-SEEM infrastructure
INFN Grid - Referee meeting Bologna, 25 July 2008
September 19th-23th, Lyon, France
Marc van Dijk | WeNMR | StickyBits
Alexandre M.J.J. Bonvin MoBrain CC coordinator
DIRAC 2014, Helsinki, May 21th, 2014 Alexandre M.J.J. Bonvin
UK GridPP Tier-1/A Centre at CLRC
Supporting user communities across EGI and OSG: the WeNMR use case
Leveraging GPGPU Computing in Grid and Cloud Environments: First Results from the Mobrain CC Alexandre Bonvin Utrecht University
EGI – Organisation overview and outreach
Volume 20, Issue 2, Pages (February 2012)
Presentation transcript:

Toward an open-data infrastructure for WeNMR and structural biology-related services Alexandre M.J.J. Bonvin WeNMR Project coordinator Bijvoet Center for Biomolecular Research Faculty of Science, Utrecht University the Netherlands With contributions from Dr. Antonio Rosato (RDA structural biology interest group) Dr. Yannick Legre (Life Sciences VRC and bioinformatics) Dr. Martin Walsh (INSTRUCT)

Research in Structural Biology and Life sciences Global Multidisciplinary Integrative Researchers often not experts in e-Science solutions Data often tightly linked to specific projects

Open data access What is the data lifecycle in Structural Biology? Should we store/preserve everything?

WeNMR VRC (Nov. 2013) One of the largest (#users) VO in life sciences > 580 VO registered users (35% outside EU) > 1000 VRC members ~ CPU cores > 4.7M CPU hours over the last 12 months > 1.8 million jobs over the last 12 months User-friendly access to Grid via web portals NMR SAXS A worldwide e-Infrastructure for NMR and structural biology A worldwide e-Infrastructure for NMR and structural biology

Our policy is to shield as much as possible the end user from the grid and all middleware related issues and commands We chose to develop mainly web portals providing “protocolized” access to the grid To facilitate operation, we use when possible robot certificates 33 portals (6 developed by external teams) are currently embedded in the WeNMR gateway and provide services for NMR and SAXS spectroscopists 11 of these run calculations on the grid The WeNMR service portfolio

The WeNMR Virtual Research CommunityThe WeNMR Virtual Research Community Knowledge Help Centers Tutorials Wiki docs Movies Consultancy Services Portals VRC Third-party aggregation Grid Exposure Marketplace Blogs, news, events… (often picked up from eScienceTalk/GridCast) Users SSO Facebook, Linkedin But where are the data?But where are the data?

Haddock web portal

User friendly easy interface > 3400 registered users (Since. Nov all with grid access via robot) > served runs since June 2008 ~ 12% on the GRID

Grid daemons beyond a portal Data transfer / jobs User level – data transfer via web interface: Data input: ~10 kB to 10 MB Data output: ~10MB to a few GB Grid level – data/jobs transfer (UI WMS CE WN) : Job submission: ~250 to 2500 single jobs Data transfer: ~1 to 10MB Output: ~1-20MB per job Local cluster– data/jobs transfer: Job submission: ~10 to 50 single jobs Data: a few GB Some portals make use of storage elements for the data

HADDOCK Results

Data volumes vary Some WeNMR portal generate 0.5 – 5GB of compressed data per run Translates into GB per day Users only typically see a fraction of those data Currently data from WeNMR portals are not stored: – Usually deleted within a few weeks – Responsibility of user to save data Some experimental facilities are generating in the order of 1- 6 TB a day (X-ray beamline at DIAMOND synchrotron)

Current data model

Data challenges and opportunities for a federated data infrastructure Link data generation (at instruments) with data processing and analysis (at remote locations and on e- Infrastructure) Persistent storage for (original) data and derived data (results) Persistent? storage for intermediate results Fast access – dedicated light paths?

Data challenges and opportunities for a federated data infrastructure Provide collaborative environments to share / process / analyze data (“à la Dropbox) Provide simple mechanisms to manage the data: sharing / protection / deletion Various access levels depending on security requirements – Personal X.509 certificated – not user friendly – robot certificates – open access Provide mechanisms for citation / recognition

Add value to the data! Provide additional analysis of the data (post-processing / analysis as a service to users Involve software developers and help them incorporate their solutions into the data repositories (software is often a limiting factor) This however calls for data storage close to processing power Need for innovative tools for enhanced visualization of structural data – E.g. to enable comparison of data from different sources and technique

The data web from a user perspective Bad model from a user perspective: N sites N connections / SLAs Bad model from a user perspective: N sites N connections / SLAs

The data web from a user perspective Organiza tion X Better model:

Utrecht University, Bijvoet Center for Biomolecular Research, NL Johann Wolfgang Goethe Universität Frankfurt a.M., Center for Biomolecular Magnetic Resonance DE University of Florence, Magnetic Resonance Center, IT Istituto Nazionale di Fisica Nucleare, Padova, IT Raboud University, Nijmegen, NL University of Cambridge UK European Molecular Biology Laboratory, Hamburg, DE Spronk NMR Consultancy, LT Academia Sinica, TW The partners The team Alexandre M.J.J. Bonvin Andrea Giachetti Antonio Rosato Anurag Bagaria Christophe Schmitz Eric Frizziero Gijs van der Schot Harald Schwalbe Hendrik R. A. Jonker Ivano Bertini Johan van der Zwan Lucio Ferella Marc van Dijk Marco Verlato Mikael Trellet Mirco Mazzucato Nuno Loureiro-Ferreira Peter Güntert Rolf Boelens Sjoerd J. de Vries Stefano Dal Pra Torsten Herrmann Tsjerk A. Wassenaar Victor Jaravine Wim F. Vranken Danny Hsu Simon Lin Acknowledgments Contract n°: RI

wordle.net