The EMBL-European Bioinformatics Institute


Presentation on theme: "The EMBL-European Bioinformatics Institute"— Presentation transcript:

1 The EMBL-European Bioinformatics Institute
Manuela Menchi, EBI Systems Group

2 Goal of this presentation
A view of the flow of data to and from EBI from the Systems group's point of view.
Introducing the EBI Systems group in order to establish connections for further discussion and collaboration towards solving the problem of data transfers over very long distances.
October 2009

3 Outline of this presentation
Introduce the Systems team
EBI IT resources
EBI network resources
Data/computing growth over the years
Flow of data to/from EBI by examples: ftp, Aspera fasp, network
Conclusion

4 The EBI Systems Group
The EBI Systems team plans, implements and maintains IT infrastructure and services.
~450 users on-site.
All bioinformatics projects and services rely on EBI Systems support for planning, implementing and maintaining their IT infrastructure.
LSF (Load Sharing Facility): a job scheduler that executes batch jobs on networked Unix (or Windows) systems across many different platforms.

5 The EBI Systems Group core and desktop support
16 people in Systems team: core, 6 desktop
Core hardware infrastructure: datacentre, disaster recovery sites, computing servers, storage, network (LAN/WAN)
Core services infrastructure: network (yp, ldap, ftp, fasp, rsync, svn, cvs, ), Platform LSF, database operations, backups
Core user support of EMBL-EBI users in their daily activities.
In addition to the central EBI infrastructure, the group works closely with all project groups, planning and maintaining any new project infrastructure (hw/os/system) to meet their specific needs.

6 Areas in the EBI Systems Group (core)
Head of Systems: Mr Pete Jokinen
Storage/Data (Technical Coordinator: Jonathan Barker)
  Database Operations: Oracle, MySQL
  SAN Infrastructure, SAN Storage
  NAS: Scale-out, Traditional 2-way
  FTP/Aspera
  Backup/Mirror
  Disaster Recovery strategies
Datacentre/Networking/Computing (Technical Coordinator: Manuela Menchi)
  Datacentre Operations: Campus, Disaster Recovery Sites
  Networking: LAN, WAN, DNS, YP, LDAP
  Computing: Farms, LSF

7 The EBI facilities over the WAN
[Diagram labels: Hinxton (Wellcome Trust Genome Campus); Duxford Datacentre, Disaster Recovery (Duxford 10Gbps); Internet (via JANET) through London (10Gbps) and Cambridge (1Gbps)]
EMBL-EBI manages the 10Gbps connection to JANET (Joint Academic NETwork), which it shares with the rest of the Wellcome Trust Genome Campus, Hinxton.
Our primary Internet connection is via London. We have an active backup link to Cambridge.
Duxford: Virtual Ethernet Circuit (a private circuit over Ethernet), 40 km from Campus.

8 EBI Datacenters network connectivity
[Diagram labels: Cambridge 1Gbps; London 10Gbps; Duxford 10Gbps; Sanger Institute Sulston Building; Sanger Research Support Facility; Data center; EMBL-EBI; Sanger Labs/informatics; Cairns Pavilion (shared)]
Thanks to Don Powell, Wellcome Trust Sanger Institute, for providing this image.

9 EBI IT Resources: computation
More than 380 servers on PC farms
More than 7000 server CPU cores
EBI’s biggest server farms are: External Services (180 servers), Ebi (~200 servers)
Operating systems: Linux CentOS, Linux RedHat Enterprise, Solaris, Windows, TRU64

10 How has the number of cores grown

11 EBI IT Resources: storage
EBI has ~5 petabytes of central storage available: <1PB on SAN (*) and >4PB on NAS (**)
(*) HP EVA, Hitachi Data Systems, NexSan
(**) NetApp, BlueArc, Onstor, Isilon, Panasas, DataDomain

12 How has storage grown

13 Important factors in data growth 2008, 2009
Biological data explosion
New projects. Example: European Nucleotide Archive Reads (1.5 PB from first half 2008)
Mirroring of primary storage to Duxford DR center as part of backup strategy and for disaster recovery
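To give a feel for what volumes like these mean on the wire, here is a rough sketch that estimates how long a bulk transfer would take at the link rates mentioned elsewhere in this presentation (10Gbps, 1Gbps, and the 400Mbps ascp limit). It treats the 1.5 PB figure as a single dataset, assumes decimal units (1 PB = 10^15 bytes), and assumes the link is fully and exclusively used with no protocol overhead, so real transfers would be slower. The function and constant names are illustrative only.

```python
def transfer_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move size_bytes at a sustained rate, ignoring overhead."""
    if rate_bits_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_bytes * 8 / rate_bits_per_s
    return seconds / 86_400  # seconds per day


PB = 10**15  # assumption: decimal petabyte
DATASET_BYTES = 1.5 * PB  # the 1.5 PB figure from the slide

# Rates quoted in the presentation, in bits per second.
LINK_RATES = {
    "10 Gbps link": 10e9,
    "1 Gbps link": 1e9,
    "400 Mbps (ascp limit)": 400e6,
}

for label, rate in LINK_RATES.items():
    print(f"{label}: {transfer_days(DATASET_BYTES, rate):.1f} days")
```

Even under these idealised assumptions, the slower rates put a petabyte-scale transfer in the range of months, which is why the achievable rate over long distances matters so much.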

14 Data Transfers in / out EBI
Aspera service
ftp service
rsync service
courier service: disk deliveries
Aspera: file transport protocol using standard UDP. “Designed to transfer files over any IP network (LAN, WAN, satellite, and wireless)”.
EBI introduced the Aspera service during the second half of 2008.

15 ascp client
The Aspera ascp command line file copying client can be downloaded from
Example upload to EBI: ascp -QT -m 10M -l 50M SRC EBIaccount@fasp.era.ebi.ac.uk:DST
“Please be aware that you will be using a large amount of bandwidth. In some cases it may be necessary to reduce the max-rate (-l). Also, you should never use max-rate (-l) larger than 400Mbps because EBI network limitations.”
License: up to 400Mbps
Firewall needs to have port 22/tcp and port 33001/udp open for incoming traffic
-m minimum rate
-l max rate
-Q fair transfer policy enabled
-T disable encryption (for max throughput)
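As an illustration of the rate guidance above, the following sketch assembles the same ascp upload command and refuses a max-rate above the 400Mbps limit quoted on this slide. Only the flags, host and limit shown on the slide are used; the helper name `build_ascp_command` and its parameters are hypothetical and not part of any Aspera tooling.

```python
import shlex

EBI_MAX_RATE_MBPS = 400           # upper limit quoted on the slide
FASP_HOST = "fasp.era.ebi.ac.uk"  # host from the slide's example command


def build_ascp_command(src, dst, account, min_rate_mbps=10, max_rate_mbps=50):
    """Return the ascp argument list for an upload (hypothetical helper)."""
    if min_rate_mbps <= 0 or max_rate_mbps <= 0:
        raise ValueError("rates must be positive")
    if min_rate_mbps > max_rate_mbps:
        raise ValueError("minimum rate cannot exceed maximum rate")
    if max_rate_mbps > EBI_MAX_RATE_MBPS:
        raise ValueError(f"max rate must not exceed {EBI_MAX_RATE_MBPS}Mbps")
    return [
        "ascp",
        "-QT",                       # -Q fair transfer policy, -T disable encryption
        "-m", f"{min_rate_mbps}M",   # minimum rate
        "-l", f"{max_rate_mbps}M",   # maximum rate
        src,
        f"{account}@{FASP_HOST}:{dst}",
    ]


# Reproduces the example command from the slide as a single shell string.
print(shlex.join(build_ascp_command("SRC", "DST", "EBIaccount")))
```

Returning an argument list rather than a string means the command can be handed directly to `subprocess.run` without shell quoting issues.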

16 Statistics: example of ftp.ebi.ac.uk
/pub/databases/embl/ : Sub-directories related to the EMBL database
/pub/databases/embl/genomes/ : Finished genomes, chromosomes and contigs
/pub/databases/embl/release/ : Complete latest full release of the EMBL Nucleotide Sequence Database
/pub/databases/embl/cds/ : Nucleotide sequences of the CDS (coding sequence) features, as annotated in EMBL database (EMBLCDS dataset)
/pub/databases/embl/wgs/ : Whole genome shotgun sequences
/pub/databases/embl/align/ : Complete list of sequence alignment data
Old Format Alignments : List of old format alignments on FTP server (ds prefix)
/pub/software/ : Freely available molecular biology software
/pub/databases/embl/patent/ : Sequences from the patent literature EMBL database

17 Statistics: example of ftp.1000genomes.ebi.ac.uk
Downloads only

18 ftp.era.ebi.ac.uk fasp.era.ebi.ac.uk
Example of preferred use of Aspera vs ftp for a specific project: European Nucleotide Archive Reads
ftp.era.ebi.ac.uk, fasp.era.ebi.ac.uk

19 A view at traffic composition
aspera cumulative (from July 2009): uploads ~20TB, downloads ~40TB
ftp cumulative (from January 2009): uploads ~8TB, downloads ~243TB
rsync: -
courier (from January 2009): ~25TB
Many projects involve uploading data to EBI, and uploads are increasing.
Much more data is downloaded from EBI than uploaded.
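The imbalance between downloads and uploads can be made concrete with a little arithmetic on the figures above. This sketch sums only the aspera and ftp rows, since those are the only ones with both an upload and a download value. The two services were measured over different periods (from July 2009 and from January 2009), so the combined ratio is indicative only. The variable names are illustrative.

```python
# Approximate cumulative traffic in TB, as quoted on the slide.
TRAFFIC_TB = {
    "aspera": {"uploads": 20, "downloads": 40},   # from July 2009
    "ftp": {"uploads": 8, "downloads": 243},      # from January 2009
}


def totals(traffic):
    """Sum uploads and downloads across all services."""
    up = sum(row["uploads"] for row in traffic.values())
    down = sum(row["downloads"] for row in traffic.values())
    return up, down


uploads, downloads = totals(TRAFFIC_TB)
print(f"uploads: ~{uploads}TB, downloads: ~{downloads}TB")
print(f"downloads are roughly {downloads / uploads:.1f}x uploads")

# Per-service ratio shows how differently the two services are used.
for name, row in TRAFFIC_TB.items():
    print(f"{name}: {row['downloads'] / row['uploads']:.1f}x")
```

On these numbers ftp is overwhelmingly a download channel, while Aspera traffic is much closer to balanced.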

20 10GigE London usage statistic
[Charts: London 10GigE, weekly usage]

21 10GigE London year usage statistic
[Chart: London 10GigE, yearly usage]

22 10GigE Duxford weekly usage statistic
[Chart: Duxford 10GigE, weekly usage]

23 Conclusion
Thank you for inviting EBI Systems to this workshop.
We are looking forward to further discussions in the near future: from troubleshooting to developing new ideas.
We are keen to collaborate with all the network and bioinformatics experts in this workshop to meet the goal of transferring large datasets over the network between China and the EBI in realistic timeframes.
Manuela Menchi

24 Thank you! Manuela Menchi manuela@ebi.ac.uk
Some background: the EBI is based on the Wellcome Trust Genome Campus in Hinxton, which is near Cambridge in the UK. We share the campus with the Sanger Institute. The EBI is part of the European Molecular Biology Laboratory and, as part of that, we’re a non-profit organisation.

