BGI IT Infrastructure
  Lin Fang, Director, Computer Platform
  BGI, Shenzhen
  Oct. 10th, 2009
Summary
  Data generation
  Computer capacity
  Current networks
  Future
Data generation
  Shenzhen: 20 GAII, 76bp PE reads & 100bp PE reads, 1.2 TB RTA data/day
  Hongkong: 8 GAII, 76bp PE reads, 500 GB RTA data/day
  Beijing: 1 GA, 36bp SE reads, 200 GB raw data/day
  A 4-month queue of jobs is waiting for sequencing!
Computer Capacity
  300 nodes
  4000 CPU cores
  4 TB memory
  2.5 PB storage
  30 TFLOPS peak
Current Networks
  [Network diagram: Beijing Center, Hangzhou Center, Shenzhen Center, Hongkong Center. Link labels, in the order they appear: CSTNET 20M; CNC SDH 2M; CNC SDH 8M; CNC 5M; CSTNET 20M; China Telecom Green 10M; China Telecom 10M; CSTNET 20M; PCCW 2M]
Data Distributing
  [Diagram: EBI, NCBI, Beijing Center, Shenzhen Center. Link labels, in the order they appear: FTP 10 Mb/s; Aspera 9 Mb/s; FTP 10 Mb/s]
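To put the link speeds on this slide in perspective, the following is a minimal sketch of how long one terabyte would take at those rates. It assumes that "Mb/S" means megabits per second, that a terabyte is 10^12 bytes, and that the link runs at its full rate with no protocol overhead; the function name and the 1 TB payload are illustrative, not from the slides.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12    # assumption: decimal terabyte, in bytes
MBIT = 10**6   # assumption: "Mb/S" on the slide means megabits per second


def transfer_days(data_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to move data_bytes over a link running at its full rate."""
    seconds = data_bytes * 8 / link_bits_per_s
    return seconds / SECONDS_PER_DAY


# Hypothetical payload of 1 TB over the two rates quoted on the slide.
for label, rate in [("FTP 10 Mb/s", 10 * MBIT), ("Aspera 9 Mb/s", 9 * MBIT)]:
    print(f"{label}: {transfer_days(1 * TB, rate):.1f} days per TB")
```

Under these assumptions a single terabyte occupies either link for more than a week, which is why the daily output cannot simply be pushed over the existing connections.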
By the end of 2009
  150bp PE reads
  50 Gbp/run, 500 Gbp/day
  5 TB/day FASTQ data
  15 TB RTA data/day
  Triple the sequencing machines, and what will be…
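As a rough check on what these daily volumes mean for a network, the sketch below converts a volume per day into the sustained bit rate needed to move it within 24 hours. It reads the slide's "T" as decimal terabytes (10^12 bytes) and ignores protocol overhead and idle time; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def sustained_mbps(tb_per_day: float) -> float:
    """Sustained rate in megabits per second needed to move tb_per_day in one day.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 Mb = 10**6 bits.
    """
    bits_per_day = tb_per_day * 10**12 * 8
    return bits_per_day / SECONDS_PER_DAY / 10**6


for label, volume_tb in [("FASTQ data", 5), ("RTA data", 15)]:
    print(f"{label}: {volume_tb} TB/day needs about {sustained_mbps(volume_tb):.0f} Mb/s sustained")
```

Under these assumptions the 15 TB/day of RTA data corresponds to roughly 1.4 Gb/s around the clock, and the 5 TB/day of FASTQ data to roughly 460 Mb/s.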
Expand Computer Capacity
  1000 nodes
  12000 CPU cores
  10 PB storage
  100 TFLOPS peak
Networks to be…
  [Planned network diagram: Beijing Center, Hangzhou Center, Shenzhen Center, Hongkong Center. Link labels, in the order they appear: CSTNET 20M; 20M VPN sharing CSTNET; 5M VPN between CNC & CSTNET; CSTNET >20M for service & data transfer; 20M VPN sharing CSTNET; China Telecom Green 10M for office; CSTNET 20M]
Data Distributing to be…
  [Diagram: EBI, NCBI, Shenzhen Center. Link labels, in the order they appear: FTP 10 Mb/s; Aspera 9 Mb/s; FTP 10 Mb/s; Aspera 9 Mb/s]
Cost and Efficiency
  100K RMB/month for internet connection, transmits 240 GB/day
  400 RMB/HD to transport 1 TB data
  How to distribute 15 TB data every day?
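The figures on this slide can be compared per terabyte with a short calculation. This is a sketch under stated assumptions: a 30-day month, decimal units (1 TB = 1000 GB), and the 400 RMB treated as the whole cost of moving 1 TB by hard disk. The constant and function names are illustrative.

```python
LINK_COST_RMB_PER_MONTH = 100_000   # from the slide: 100K RMB/month
LINK_GB_PER_DAY = 240               # from the slide: transmits 240G/day
DISK_COST_RMB_PER_TB = 400          # from the slide: 400 RMB/HD for 1T
TARGET_TB_PER_DAY = 15              # from the slide: 15T every day
DAYS_PER_MONTH = 30                 # assumption


def link_cost_per_tb() -> float:
    """RMB per terabyte actually moved over the internet connection."""
    tb_per_month = LINK_GB_PER_DAY * DAYS_PER_MONTH / 1000
    return LINK_COST_RMB_PER_MONTH / tb_per_month


def disk_cost_per_month(tb_per_day: float) -> float:
    """RMB per month to ship tb_per_day on hard disks at the quoted price."""
    return tb_per_day * DISK_COST_RMB_PER_TB * DAYS_PER_MONTH


def scale_factor(tb_per_day: float) -> float:
    """How many times the current daily network volume tb_per_day represents."""
    return tb_per_day * 1000 / LINK_GB_PER_DAY


print(f"Network: {link_cost_per_tb():,.0f} RMB per TB")
print(f"Hard disk: {DISK_COST_RMB_PER_TB} RMB per TB")
print(f"15 TB/day by disk: {disk_cost_per_month(TARGET_TB_PER_DAY):,.0f} RMB/month")
print(f"15 TB/day is {scale_factor(TARGET_TB_PER_DAY)} times the current network volume")
```

On these assumptions the network costs roughly 13,900 RMB per terabyte against 400 RMB by disk, and 15 TB/day is 62.5 times what the current connection carries.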
BGI Effort
  Mirroring first-class biological databases
  Managing data generated by BGI
  Building a bioinformatics “cloud” center
Mirrors
  EnsEMBL, 6 releases now!
    http://ensembl.genomics.org.cn
    http://ensembl.genomics.org.cn:8050
    http://ensembl.genomics.org.cn:8051
    http://ensembl.genomics.org.cn:8052
    http://ensembl.genomics.org.cn:8053
    http://ensembl.genomics.org.cn:8054
  EnsEMBL Bacteria Browser
    http://bacteria.genomics.org.cn
  EnsEMBL Protists Browser
    http://protists.genomics.org.cn
  UCSC Genome Browser, processing…
    http://ucsc.genomics.org.cn
Plan
  CLOUD: http://cloud.genomics.org.cn
  Integrated biological data and bioinformatics tools in a single interface
  Click-and-go professional pipelines for Digital Gene Expression, RNA analysis, etc.
  Flexible workflow design
  Knowledge management and mining
  It’s FREE!
Thanks!