The EMBL-European Bioinformatics Institute

Presentation transcript:

The EMBL-European Bioinformatics Institute Manuela Menchi, EBI Systems Group

Goal of this presentation: To look at the flow of data to and from EBI from the Systems group's point of view, and to introduce the EBI Systems group in order to establish connections for further discussion and collaboration on the problem of data transfers over very long distances. October 2009

Outline of this presentation: Introduce the Systems team; EBI IT resources; EBI network resources; Data/computing growth over the years; Flow of data to/from EBI by example (ftp, Aspera fasp, network); Conclusion.

The EBI Systems Group: The EBI Systems team plans, implements and maintains IT infrastructure and services. There are ~450 users on site, and all bioinformatics projects and services rely on EBI Systems support for planning, implementing and maintaining their IT infrastructure. LSF (Load Sharing Facility) is a job scheduler that executes batch jobs on networked Unix (or Windows) systems across many different platforms.

The EBI Systems Group: core and desktop support. 16 people in the Systems team: 1 + 9 core, 6 desktop. Core hardware infrastructure: datacentre, disaster recovery sites, computing servers, storage, network (LAN/WAN). Core services infrastructure: network (yp, ldap, ftp, fasp, rsync, svn, cvs, email), Platform LSF, database operations, backups. Core user support: support of EMBL-EBI users in their daily activities. In addition to the central EBI infrastructure, the group works closely with all project groups, planning and maintaining any new project infrastructure (hw/os/system) to meet their specific needs.

Areas in the EBI Systems Group (core). Head of Systems: Mr Pete Jokinen. Storage/Data (Technical Coordinator: Jonathan Barker): Database Operations (Oracle, MySQL); SAN Infrastructure; SAN Storage; NAS (Scale-out, Traditional 2-way); Email; FTP/Aspera; Backup/Mirror; Disaster Recovery strategies. Datacentre/Networking/Computing (Technical Coordinator: Manuela Menchi): Datacentre Operations (Campus, Disaster Recovery Sites); Networking (LAN, WAN, DNS, YP, LDAP); Computing Farms; LSF.

The EBI facilities over the WAN: Hinxton (Wellcome Trust Genome Campus) is linked to the Duxford datacentre and disaster recovery site at 10 Gbps, and to the Internet (via JANET) through London at 10 Gbps and through Cambridge at 1 Gbps. EMBL-EBI manages the 10 Gbps connection to JANET (Joint Academic NETwork), which it shares with the rest of the Wellcome Trust Genome Campus, Hinxton. Our primary Internet connection is via London. We have an active backup link to Cambridge. Duxford: Virtual Ethernet Circuit (a private circuit over Ethernet), 40 km from Campus.
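To give a feel for what these link speeds mean in practice, the following minimal sketch estimates the ideal transfer time for a dataset over the London (10 Gbps) and Cambridge (1 Gbps) links. The 1 TB dataset size, the function name and the `efficiency` parameter are assumptions for illustration only; real transfers will be slower because of protocol overhead, latency and shared bandwidth.

```python
def transfer_time_hours(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Ideal time in hours to move size_bytes over a link of link_bps bits per second.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for the
    fraction of the nominal link rate that is actually achieved.
    """
    if link_bps <= 0:
        raise ValueError("link_bps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bps * efficiency)
    return seconds / 3600


TB = 10**12  # decimal terabyte, in bytes (assumed dataset size for the example)

# Link rates taken from the slide: London 10 Gbps, Cambridge 1 Gbps.
for name, bps in (("London 10 Gbps", 10e9), ("Cambridge 1 Gbps", 1e9)):
    print(f"1 TB via {name}: {transfer_time_hours(TB, bps):.2f} h")
```

Passing a lower `efficiency` (for example 0.5) gives a rougher but more cautious estimate for a shared or long-distance path.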

EBI Datacenters network connectivity. [Diagram: links to Cambridge (1 Gbps), London (10 Gbps) and Duxford (10 Gbps); sites shown: Sanger Institute Sulston Building, Sanger Research Support Facility, Data center, EMBL-EBI, Sanger Labs/informatics, Cairns Pavilion (shared).] Thanks to Don Powell, Wellcome Trust Sanger Institute, for providing this image.

EBI IT Resources: computation. More than 380 servers on PC farms. More than 7000 server CPU cores. EBI's biggest server farms are: External Services (180 servers) and Ebi (~200 servers). Operating systems: Linux CentOS, Linux RedHat Enterprise, Solaris, Windows, TRU64.

How the number of cores has grown

EBI IT Resources: storage. EBI has ~5 petabytes of central storage available: <1 PB on SAN (*) and >4 PB on NAS (**). (*) HP EVA, Hitachi Data Systems, NexSan. (**) NetApp, BlueArc, Onstor, Isilon, Panasas, DataDomain.

How storage has grown

Important factors in data growth, 2008 and 2009: Biological data explosion. New projects, for example European Nucleotide Archive Reads (1.5 PB from first half 2008), http://www.ebi.ac.uk/embl/Documentation/ENA-Reads.html. Mirroring of primary storage to the Duxford DR centre as part of the backup strategy and for disaster recovery.

Data transfers in/out of EBI: Aspera service; ftp service; rsync service; courier service (disk deliveries). Aspera is a file transport protocol using standard UDP, “Designed to transfer files over any IP network (LAN, WAN, satellite, and wireless)”. EBI introduced the Aspera service during the second half of 2008.

ascp client: The Aspera ascp command-line file copying client can be downloaded from http://www.asperasoft.com/downloads/connect
Example upload to EBI:
ascp -QT -m 10M -l 50M SRC EBIaccount@fasp.era.ebi.ac.uk:DST
“Please be aware that you will be using a large amount of bandwidth. In some cases it may be necessary to reduce the max-rate (-l). Also, you should never use max-rate (-l) larger than 400Mbps because EBI network limitations.”
The licence allows up to 400 Mbps. The firewall needs to have port 22/tcp and port 33001/udp open for incoming traffic.
Options: -m minimum rate; -l max rate; -Q fair transfer policy enabled; -T disable encryption (for max throughput).
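As an illustration of how the 400 Mbps limit could be enforced before a transfer is launched, here is a minimal Python sketch that assembles the ascp upload command from the slide. The helper name `build_ascp_upload` and its parameters are hypothetical; only the flags (-Q, -T, -m, -l), the host name and the 400 Mbps cap come from the slide. The sketch only builds and prints the command; it does not run ascp.

```python
import shlex

EBI_MAX_RATE_MBPS = 400  # upper limit for -l stated on the slide


def build_ascp_upload(src, account, dst, min_rate_mbps=10, max_rate_mbps=50,
                      host="fasp.era.ebi.ac.uk"):
    """Return the ascp argument list for an upload (hypothetical helper).

    Raises ValueError if the requested rates break the limits on the slide.
    """
    if max_rate_mbps > EBI_MAX_RATE_MBPS:
        raise ValueError(f"max rate must not exceed {EBI_MAX_RATE_MBPS} Mbps")
    if not 0 < min_rate_mbps <= max_rate_mbps:
        raise ValueError("min rate must be positive and not above max rate")
    return [
        "ascp",
        "-QT",                        # -Q fair transfer policy, -T no encryption
        "-m", f"{min_rate_mbps}M",    # minimum rate
        "-l", f"{max_rate_mbps}M",    # maximum rate
        src,
        f"{account}@{host}:{dst}",
    ]


cmd = build_ascp_upload("SRC", "EBIaccount", "DST")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where the ascp client is installed.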

Statistics: example of ftp.ebi.ac.uk
/pub/databases/embl/: Sub-directories related to the EMBL database
/pub/databases/embl/genomes/: Finished genomes, chromosomes and contigs
/pub/databases/embl/release/: Complete latest full release of the EMBL Nucleotide Sequence Database
/pub/databases/embl/cds/: Nucleotide sequences of the CDS (coding sequence) features, as annotated in the EMBL database (EMBLCDS dataset)
/pub/databases/embl/wgs/: Whole genome shotgun sequences
/pub/databases/embl/align/: Complete list of sequence alignment data
Old Format Alignments: List of old format alignments on FTP server (ds prefix)
/pub/software/: Freely available molecular biology software
/pub/databases/embl/patent/: Sequences from the patent literature, EMBL database

Statistics: example of ftp.1000genomes.ebi.ac.uk (downloads only)

Example of preferred use of Aspera vs ftp for a specific project, European Nucleotide Archive Reads: ftp.era.ebi.ac.uk and fasp.era.ebi.ac.uk.

A view of traffic composition
Aspera, cumulative (from July 2009): uploads ~20TB, downloads ~40TB
ftp, cumulative (from January 2009): uploads ~8TB, downloads ~243TB
rsync: -
courier (from January 2009): ~25TB
Many projects involve uploading data to EBI, and uploads are increasing. Much more data is downloaded from EBI than uploaded.
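The imbalance between downloads and uploads can be made concrete with a small calculation. The sketch below uses the approximate Aspera and ftp figures from the slide; the dictionary layout and function name are assumptions for illustration. The two services are measured over different periods (from July 2009 and from January 2009), so the ratios are computed per service rather than summed.

```python
# Approximate cumulative traffic in TB, as quoted on the slide.
traffic_tb = {
    "aspera": {"uploads": 20, "downloads": 40},   # from July 2009
    "ftp": {"uploads": 8, "downloads": 243},      # from January 2009
}


def download_to_upload_ratio(service: str) -> float:
    """Return downloads divided by uploads for one service."""
    figures = traffic_tb[service]
    return figures["downloads"] / figures["uploads"]


for service in traffic_tb:
    ratio = download_to_upload_ratio(service)
    print(f"{service}: about {ratio:.1f}x more downloaded than uploaded")
```

On these figures ftp is dominated by downloads, while Aspera carries a much larger share of uploads relative to its downloads.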

10GigE London weekly usage statistics

10GigE London yearly usage statistics

10GigE Duxford weekly usage statistics

Conclusion: Thank you for inviting EBI Systems to this workshop. We are looking forward to further discussions in the near future: from troubleshooting to developing new ideas. We are keen to collaborate with all the network and bioinformatics experts in this workshop to meet the goal of transferring large datasets over the network between China and the EBI in realistic timeframes. Manuela Menchi manuela@ebi.ac.uk

Thank you! Some background: the EBI is based on the Wellcome Trust Genome Campus in Hinxton, which is near Cambridge in the UK. We share the campus with the Sanger Institute. The EBI is part of the European Molecular Biology Laboratory and, as part of that, we're a non-profit organisation. 30.06.2018