Enabling Genomic BIG DATA with Content Centric Networking J.J. Garcia-Luna-Aceves UC Santa Cruz


1 Enabling Genomic BIG DATA with Content Centric Networking J.J. Garcia-Luna-Aceves UC Santa Cruz jj@soe.ucsc.edu

2 Example Today
- Cancer Genomics Hub (CGHub)
- CGHub's purpose was to store the genomes sequenced as part of The Cancer Genome Atlas (TCGA) project.
- The archive grew to about 17,000 genomes, at about 300 GB per genome, over the 44-month lifetime of the project.
- The transmission requirements of the archiving effort reached a sustained rate of 17 Gbps by the end of the 44-month project.
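The figures on this slide can be checked with a little arithmetic. The sketch below is not from the talk: it assumes decimal units (1 PB = 10^6 GB) and an average month of 30.44 days, and the function names are made up for illustration. It computes the total archive size implied by 17,000 genomes at 300 GB each, and the mean ingest rate needed to accumulate that volume over 44 months. The mean ingest rate is a different quantity from the 17 Gbps sustained transmission rate quoted on the slide.

```python
# Figures taken from the slide.
GENOMES = 17_000
GB_PER_GENOME = 300
PROJECT_MONTHS = 44

# Assumption (not from the slide): average month length in seconds.
SECONDS_PER_MONTH = 30.44 * 24 * 3600


def archive_size_pb(genomes=GENOMES, gb_per_genome=GB_PER_GENOME):
    """Total archive size in decimal petabytes (1 PB = 1e6 GB)."""
    return genomes * gb_per_genome / 1e6


def average_ingest_gbps(genomes=GENOMES, gb_per_genome=GB_PER_GENOME,
                        months=PROJECT_MONTHS):
    """Mean rate, in Gbps, needed to store the whole archive over the project."""
    total_bits = genomes * gb_per_genome * 1e9 * 8
    seconds = months * SECONDS_PER_MONTH
    return total_bits / seconds / 1e9


if __name__ == "__main__":
    print(f"Archive size: {archive_size_pb():.1f} PB")
    print(f"Mean ingest rate: {average_ingest_gbps():.2f} Gbps")
```

Under these assumptions the archive comes to roughly 5 PB, with a mean ingest rate well below 1 Gbps.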

3 Cancer Genomics Hub (UCSC) is Housed in SDSC CoLo: Large Data Flows to End Users
[Figure: cumulative TBs of CGH files downloaded; chart annotations: 1G, 8G, 15G, 30 PB. Data source: David Haussler, Brad Smith, UCSC]

4 Example Today
CGHub had to use current technology:
- Data organization and search: XML schema definitions
- Security: Existing techniques, HTTPS
- Big data transfer: Modified BitTorrent (GeneTorrent or GT)
HTTPS and BitTorrent are problematic:
- No caching with HTTPS
- TCP limitations percolate to multiple connections under BT or GT
- A potential playground for DDoS?

5 The Future of Genomic BIG DATA
- Is the Internet ready to support personalized medicine?
- Is the future of genomic data really different?
- If not, what technology would be limiting progress?
First:
- Genomic data are really BIG DATA.
- Personalized medicine will make genomic data volumes explode, and many other applications of genomic data will develop.
- Even if one site or a few mirrors are used for a personal genome, it has to be uploaded.

6 Is Technology Ready in 5-10 Years?
- Communication, storage and computing technologies are not the problem:
  - Production optical transport @ 1 Tbps (http://www.lightreading.com/document.asp?doc_id=188442)
  - Individual hosts able to transmit at 100 Gbps
  - I/O throughput can keep up with the network speeds (i.e., disk will be able to handle 100 Gbps = 12.5 GB/s).
  - Memory and processing costs will continue to decline.

7 Networking is The BIG PROBLEM for Genomic BIG DATA
- The speed of light will not increase, but the number of genomic data repositories or the distance between them will.
- The Internet protocol stack was not designed for BIG DATA transfer over paths with large bandwidth-delay products:
  - TCP throughput
  - DDoS vulnerabilities (e.g., SYN flooding)
  - Caching vs privacy (e.g., HTTPS)
  - Static directory services (e.g., DNS vs content directories).
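To make the bandwidth-delay product concern concrete, here is a minimal sketch of two standard relations: the bandwidth-delay product (link rate times round-trip time) and the ceiling a fixed window places on a single connection (window divided by round-trip time). The 100 Gbps rate echoes the host speed on the previous slide. The 50 ms round-trip time and the 65,535-byte window (the largest TCP window without window scaling) are illustrative assumptions, not values from the talk, and the function names are made up.

```python
def bdp_bytes(link_gbps, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return link_gbps * 1e9 * (rtt_ms / 1e3) / 8


def window_limited_gbps(window_bytes, rtt_ms):
    """Upper bound on one connection's throughput for a fixed window size."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e9


if __name__ == "__main__":
    link_gbps = 100      # host transmit rate quoted in the slides
    rtt_ms = 50          # assumed round-trip time, for illustration only
    window = 65_535      # largest TCP window without window scaling

    print(f"BDP: {bdp_bytes(link_gbps, rtt_ms) / 1e6:.0f} MB in flight")
    print(f"Window-limited rate: {window_limited_gbps(window, rtt_ms) * 1e3:.1f} Mbps")
```

With these assumed numbers, filling the path takes hundreds of megabytes in flight, while an unscaled window caps a single connection at roughly ten megabits per second. Shortening the round-trip time, for example by serving content from nearby storage, raises that ceiling proportionally.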

8 Sobering Results for Today’s Internet
- TCP and variations (e.g., BT) cannot be the baseline to support big data genomics
- Storage must be used to reduce bandwidth-delay products
Simulation results:
- 4-day simulation
- 20 locations
- 40 Gbps links with 5 to 25 ms latency
- Average degree of 5
[Figure panels: TCP (client/server); Content centric approach]

9 Internetworking BeND
- TCP/IP architecture must change for BIG DATA, but how?
- Content Centric Network (CCN) architectures such as NDN and CCNx have been proposed.
- The main advantage of CCN solutions is caching.
- But NDN and CCNx are still at early stages of development.
- Big Data Networking is all about the bandwidth-delay product, not replacing IP addresses with names.
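The caching advantage mentioned above can be illustrated with a toy name-keyed content store. This is only a sketch under stated assumptions: it is not the NDN or CCNx API, the class and parameter names (`ContentStore`, `fetch_from_origin`) are hypothetical, and least-recently-used eviction is one possible policy chosen here for simplicity. A request for a named object is answered from local storage when possible, so only misses travel the long path to the origin.

```python
from collections import OrderedDict


class ContentStore:
    """Toy cache keyed by content name, with least-recently-used eviction."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity                    # max number of cached objects
        self.fetch_from_origin = fetch_from_origin  # callable: name -> data
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name):
        if name in self._cache:
            # Served locally: no round trip to the origin.
            self._cache.move_to_end(name)
            self.hits += 1
            return self._cache[name]
        # Miss: fetch over the long path, then keep a copy for later requests.
        self.misses += 1
        data = self.fetch_from_origin(name)
        self._cache[name] = data
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used name
        return data


if __name__ == "__main__":
    store = ContentStore(capacity=2, fetch_from_origin=lambda n: f"<data for {n}>")
    for name in ["/genome/a", "/genome/a", "/genome/b"]:
        store.get(name)
    print(store.hits, store.misses)
```

Keying the store by content name rather than by connection is what lets any requester reuse a cached copy, which is not possible when each transfer is an opaque end-to-end HTTPS session.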


