Published by Allison George. Modified over 6 years ago.
1
NGS data transmission: a point of view from a user
LU Gang, Chinese Human Genome Center of Shanghai, 2010/10/7
2
Outline
Data challenge in the NGS era
Experience in submitting the Schistosoma japonicum project
Peer-to-peer network
Conclusion
3
Challenge in NGS era: massive data generation
Solexa GAIIX 100G
Solexa HiSeq ~300G
SOLiD G
SOLiD4 HQ 300G
* More than 1,500 NGS sequencers installed.
Huge volumes of raw data for genome projects
4
Experience on Schistosoma japonicum (Sj) project
Summary of Sj draft project
Asian blood fluke
Whole-genome shotgun, ~4 million capillary reads
~397 Mb genome size
Nature 460, (16 July 2009) | doi: /nature08140; The Schistosoma japonicum genome reveals features of host–parasite interplay. The Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium
5
Data submitted
95,265 contigs, 25,048 scaffolds
13,469 protein-coding genes
No raw reads
6
Submitting this project to EBI was a pleasant journey.
However, several bottlenecks exist:
Site bandwidth limit on the user side
Bandwidth utilization
Data transfer protocol
7
P2P: a new way to transfer data?
Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the application. They are said to form a peer-to-peer network of nodes. Peers make a portion of their resources, such as processing power, disk storage or network bandwidth, directly available to other network participants, without the need for central coordination by servers or stable hosts. Peers are both suppliers and consumers of resources, in contrast to the traditional client/server model, where only servers supply and clients consume.
8
If P2P:
Bandwidth limit: a different route. A user may not have a fast connection to an international center such as NCBI or EBI, but may have a neighbor on the network with a faster link to Europe.
Bandwidth utilization: much higher. In my personal experience, FTP transfer speed is normally no higher than 200 kB/s and most of the time around 100 kB/s. With a P2P download, if the file is popular, the transfer usually runs at the full speed of the link.
Data transfer protocol: advanced data compression algorithms can be applied to shorten transfer time.
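To put these bandwidth figures in perspective, the arithmetic can be sketched in Python. The payload is the 100G figure listed for the Solexa GAIIX, read here as 100 GB (an assumption). The 10 Mbit/s site link and the 5 Mbit/s neighbor link are hypothetical values chosen only for illustration, and the helper names are not from the slides.

```python
def transfer_days(size_bytes, rate_bytes_per_s):
    """Days needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 86_400

def path_rate(hop_rates):
    """A multi-hop route can go no faster than its slowest hop."""
    return min(hop_rates)

RUN_SIZE = 100e9         # ~100 GB (assumed reading of "100G" on the slides)
FTP_RATE = 100e3         # ~100 kB/s, the typical FTP rate reported on the slide
SITE_LINK = 10e6 / 8     # hypothetical 10 Mbit/s site link, in bytes/s
NEIGHBOR_LINK = 5e6 / 8  # hypothetical 5 Mbit/s link from a neighbor to Europe

direct = path_rate([FTP_RATE])
# Relay route: local hop to the neighbor, then the neighbor's link abroad
via_neighbor = path_rate([SITE_LINK, NEIGHBOR_LINK])

print(f"direct FTP:   {transfer_days(RUN_SIZE, direct):.1f} days")
print(f"via neighbor: {transfer_days(RUN_SIZE, via_neighbor):.1f} days")
```

Under these assumptions the direct FTP transfer takes roughly 11.6 days, while the relayed route takes under two days; the real gain depends entirely on the links actually available.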
9
We need two-way BitTorrent
Chop the data into pieces.
Send each piece directly to the destination, or ask a neighbor close to you in the network to send it for you.
Collect all data chunks at the receiver side, then unpack and assemble them into the original files.
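The chop, send and reassemble steps might look roughly like the following Python sketch. It is only an illustration, not the BitTorrent wire protocol: the piece size, the use of SHA-1 checksums, and the function names `split_into_pieces` and `reassemble` are all assumptions made here.

```python
import hashlib
import random

CHUNK_SIZE = 4  # bytes; tiny for demonstration, real pieces would be far larger

def split_into_pieces(data, chunk_size=CHUNK_SIZE):
    """Chop data into (index, sha1_hex, payload) pieces."""
    pieces = []
    for offset in range(0, len(data), chunk_size):
        payload = data[offset:offset + chunk_size]
        digest = hashlib.sha1(payload).hexdigest()
        pieces.append((offset // chunk_size, digest, payload))
    return pieces

def reassemble(pieces, total):
    """Verify every piece and rebuild the original bytes, whatever the arrival order."""
    verified = {}
    for index, digest, payload in pieces:
        if hashlib.sha1(payload).hexdigest() != digest:
            raise ValueError(f"piece {index} is corrupted")
        verified[index] = payload
    missing = [i for i in range(total) if i not in verified]
    if missing:
        raise ValueError(f"missing pieces: {missing}")
    return b"".join(verified[i] for i in range(total))

original = b"ACGTACGTTTGACCA"
pieces = split_into_pieces(original)
total = len(pieces)
random.shuffle(pieces)  # pieces may travel by different routes and arrive out of order
restored = reassemble(pieces, total)
print(restored == original)
```

Tagging each piece with its index and checksum is what lets the receiver accept pieces from any neighbor in any order and still detect damage or gaps.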
10
Network structure
Similar to Skype and KaZaA.
Normal node (normal user)
Super node (any node with a good connection and computing resources)
Cluster (a super node and the normal nodes close to it make a cluster)
Network over the Internet (cluster-to-cluster links make up the whole network)
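One way to picture this structure is the small Python sketch below. The slides do not say how super nodes are chosen or how closeness is measured, so the uplink threshold, the use of round-trip latency, and all node names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    uplink_mbps: float  # hypothetical measure of a "good connection"

SUPER_NODE_MIN_MBPS = 100.0  # assumed threshold, not taken from the slides

def build_clusters(nodes, latency_ms):
    """Group each normal node with its lowest-latency super node.

    latency_ms maps (normal_name, super_name) to a round-trip time in ms.
    """
    supers = [n for n in nodes if n.uplink_mbps >= SUPER_NODE_MIN_MBPS]
    clusters = {s.name: [] for s in supers}
    for node in nodes:
        if node.uplink_mbps >= SUPER_NODE_MIN_MBPS:
            continue  # super nodes anchor clusters, they do not join one
        nearest = min(supers, key=lambda s: latency_ms[(node.name, s.name)])
        clusters[nearest.name].append(node.name)
    return clusters

nodes = [Node("hub-east", 1000), Node("hub-west", 1000),
         Node("lab-a", 10), Node("lab-b", 4)]
latency_ms = {
    ("lab-a", "hub-east"): 5, ("lab-a", "hub-west"): 250,
    ("lab-b", "hub-east"): 300, ("lab-b", "hub-west"): 20,
}
clusters = build_clusters(nodes, latency_ms)
print(clusters)
```

Here each lab ends up in the cluster of the hub it can reach fastest, and traffic between the two clusters would flow hub to hub.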
11
User experience will be
Easy to use: a single transfer program that can pack and encrypt multiple data files at a time.
Flexible: the user can choose how much bandwidth it uses.
Comfortable: the program runs automatically in the background as a service.
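A user-selectable bandwidth cap is commonly implemented with a token bucket. The Python sketch below shows the idea under assumptions of my own: the class name, the 50 kB/s cap and the 16 KiB chunk size are illustrative, and a simulated clock stands in for real sleeping so the example runs instantly.

```python
import time

class TokenBucket:
    """Cap the average send rate at rate_bytes_per_s, allowing short bursts."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.clock = clock
        self.last = clock()

    def wait_time(self, nbytes):
        """Charge nbytes to the budget; return seconds to sleep before sending."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate

fake_now = [0.0]  # simulated clock so the demo does not really sleep
bucket = TokenBucket(50_000, 16_384, clock=lambda: fake_now[0])
delays = []
for _ in range(3):
    delay = bucket.wait_time(16_384)  # one 16 KiB chunk
    delays.append(delay)
    fake_now[0] += delay              # stand-in for time.sleep(delay)
print(delays)
```

In a real program the default `time.monotonic` clock would be kept and each returned delay passed to `time.sleep` before the chunk is sent. The first chunk goes out immediately from the burst allowance; later chunks are spaced so the average stays at the chosen cap.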
12
Ranking board
The more people participate in this network, the faster the total data transfer speed will be. To encourage more people to use it, a ranking board can be set up. People who help more in uploading data get higher priority in later data exchanges. Uploading raises the rank level.
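A minimal Python sketch of such a ranking board follows. The slides give no scoring rule, so ranking by total bytes uploaded, the class and method names, and the user names are all assumptions for illustration.

```python
import heapq

class RankingBoard:
    """Track bytes uploaded per user and order pending requests by rank."""

    def __init__(self):
        self.uploaded = {}

    def record_upload(self, user, nbytes):
        self.uploaded[user] = self.uploaded.get(user, 0) + nbytes

    def ranking(self):
        """Users from most to least uploaded; ties broken by name."""
        return sorted(self.uploaded, key=lambda u: (-self.uploaded[u], u))

    def service_order(self, requesters):
        """Serve bigger uploaders first; equal ranks keep their arrival order."""
        heap = [(-self.uploaded.get(u, 0), i, u) for i, u in enumerate(requesters)]
        heapq.heapify(heap)
        return [heapq.heappop(heap)[2] for _ in range(len(heap))]

board = RankingBoard()
board.record_upload("lab-a", 5_000_000_000)
board.record_upload("lab-b", 1_000_000_000)
board.record_upload("lab-a", 2_000_000_000)

print(board.ranking())
print(board.service_order(["lab-c", "lab-b", "lab-a"]))
```

A user who has uploaded nothing, such as `lab-c` here, is still served, only after the contributors.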
13
Conclusion
Improve the speed of hardware links
P2P can help improve aggregate bandwidth
Build a community network to share data, covering both upload and download
14
Thanks