High Performance Data Management and Computational Architectures for Genomics Research at National and International Scales
Bio-IT World Asia, June 7, 2012


Slide 1: High Performance Data Management and Computational Architectures for Genomics Research at National and International Scales
William K. Barnett, Ph.D., and Richard LeDuc, Ph.D., National Center for Genome Analysis Support

Slide 2: Summary
- Changing genomics analytical needs
- NCGAS and its mission
- NCGAS cyberinfrastructure
- The 100 Gigabit demonstration
- Scaling genomics analysis
- The NCGAS research model
- Outcomes for life sciences research
National Center for Genome Analysis Support: http://ncgas.org

Slide 3: Changing genomics analytical needs
- Next-generation sequencers are generating more data and getting cheaper.
- Sequencing is:
  - becoming commoditized at large centers, and
  - multiplying at individual labs.
- Analytical capacity has not kept up:
  - bioinformatics support
  - computational support (thousand points solution)
  - storage support

Slide 4: NCGAS: widening the analytical bottleneck
- Funded by the National Science Foundation (grant ABI-1062432)
- Large-memory clusters for assembly
- Bioinformatics consulting for biologists
- Optimized software for better efficiency
- Services provided at http://ncgas.org
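Why assembly calls for large-memory clusters: many short-read assemblers build a de Bruijn graph, which means holding a table of every distinct k-mer (length-k substring) seen in the reads, so memory grows with the number of distinct k-mers. A minimal sketch of that counting step follows; the function names and the 16-byte per-entry cost are illustrative assumptions, not NCGAS code or the real footprint of any particular assembler.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads.

    Simplification: a k-mer and its reverse complement are counted
    separately here; real assemblers usually merge them.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_table_bytes(distinct_kmers, bytes_per_entry=16):
    """Rough size of an in-memory k-mer table.

    bytes_per_entry is an assumed, illustrative cost (packed k-mer plus
    count); real assemblers differ.
    """
    return distinct_kmers * bytes_per_entry


if __name__ == "__main__":
    counts = count_kmers(["ACGTACGT", "CGTACGTT"], k=4)
    # 5 distinct k-mers -> 80 bytes under the assumed per-entry cost
    print(len(counts), "distinct k-mers;", estimate_table_bytes(len(counts)), "bytes")
```

At the assumed 16 bytes per entry, 10 billion distinct k-mers would need about 160 GB, far beyond a desktop machine but within a single 512 GB node of the kind listed for Mason.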

Slide 5: Making it easier for biologists
- The Galaxy interface provides a "user-friendly" window to NCGAS resources.
- It supports many bioinformatics tools.
- It is available for both research and instruction.
[Figure: axes labeled Computational Skills (LOW to HIGH) and Common to Rare]

Slide 6: NCGAS service model
Layers and what is provided at each:
- Hardware layer: public cloud providers; NCGAS Mason (512 GB/node)
- OS layer: systems administration
- Services layer: Galaxy, parallelization
- Applications: hardened applications and workflows
- Bioinformatics: expert consulting
- Network layer: 100 Gbps Internet2 (I2)
Annotation on slide: "NEED APIs"

Slide 7: NCGAS Galaxy applications model
- Galaxy.Indiana.edu (virtual box hosting): the host for each tool is configured to meet IU needs.
- Galaxy.NCGAS.org (virtual box hosting): the host for each tool is configured to meet national needs.
- Galaxy.YourSite.??? (custom site hosting): the host for each tool is configured to meet your needs.
Back-end resources shown: Quarry, Mason, Data Capacitor, RFS.
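In this model each Galaxy site decides which host runs each tool. A minimal sketch of that idea as a per-site lookup table, assuming hypothetical tool IDs (assembler, aligner) and host assignments; in a real deployment, Galaxy's own job configuration, which maps tools to execution destinations, plays this role.

```python
# Hypothetical per-site routing tables: each Galaxy instance maps a tool ID to
# the back-end host that runs it. Tool IDs and assignments are illustrative only.
SITE_ROUTES = {
    "galaxy.indiana.edu": {"assembler": "Mason", "aligner": "Quarry"},
    "galaxy.ncgas.org": {"assembler": "Mason", "aligner": "Mason"},
}

# Fallback host when a site or tool has no explicit entry (also hypothetical).
DEFAULT_HOST = "Quarry"


def host_for(site, tool_id, routes=SITE_ROUTES, default=DEFAULT_HOST):
    """Return the host configured for tool_id on site, or the default host.

    Site names are lower-cased because DNS names are case-insensitive.
    """
    return routes.get(site.lower(), {}).get(tool_id, default)


if __name__ == "__main__":
    print(host_for("Galaxy.Indiana.edu", "aligner"))  # Quarry
    print(host_for("Galaxy.NCGAS.org", "aligner"))    # Mason
```

Keeping the table per site is what lets the same tool run on a large-memory node at one site and on a general-purpose cluster at another.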

Slide 8: NCGAS workflow demo at SC11 (Bloomington, IN to Seattle, WA)
- Step 1: data pre-processing, to evaluate and improve the quality of the input sequences
- Step 2: sequence alignment to a known reference genome
- Step 3: SNP detection, scanning the alignment results for new polymorphisms
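The three demo steps can be sketched end to end on toy data. This is a minimal illustration, not the SC11 pipeline: trimming assumes Phred+33 quality strings, alignment is a brute-force ungapped search standing in for a dedicated aligner, and SNP calling is a simple majority vote over the pileup. All function names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def trim_3prime(seq, qual, min_q=20):
    """Step 1: drop trailing bases whose Phred+33 quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    return seq[:end], qual[:end]


def align(read, ref, max_mm=2):
    """Step 2: best ungapped placement of read on ref (fewest mismatches).

    Brute force over every offset; returns None if no placement has at
    most max_mm mismatches. Ties keep the leftmost position.
    """
    best_pos, best_mm = None, max_mm + 1
    for pos in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def call_snps(reads, ref, min_depth=2, min_frac=0.8):
    """Step 3: pile up aligned bases and report positions where a
    non-reference base dominates, as (position, ref_base, alt_base)."""
    pileup = defaultdict(Counter)
    for read in reads:
        pos = align(read, ref)
        if pos is None:
            continue  # unaligned reads are skipped
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    snps = []
    for p in sorted(pileup):
        depth = sum(pileup[p].values())
        base, n = pileup[p].most_common(1)[0]
        if base != ref[p] and depth >= min_depth and n / depth >= min_frac:
            snps.append((p, ref[p], base))
    return snps


def run_workflow(records, ref, min_q=20, min_len=5):
    """Chain the three steps over (sequence, quality) records."""
    reads = []
    for seq, qual in records:
        trimmed, _ = trim_3prime(seq, qual, min_q)
        if len(trimmed) >= min_len:  # discard reads trimmed too short
            reads.append(trimmed)
    return call_snps(reads, ref)


if __name__ == "__main__":
    ref = "GATTACAGCTTGCCATGACG"
    records = [
        ("CAGCTAGC", "IIIIIIII"),
        ("CTAGCCAT", "IIIIIIII"),
        ("TACAGCTAGG", "IIIIIIII##"),  # low-quality 3' tail gets trimmed
        ("ACGT", "####"),              # trimmed to nothing and dropped
    ]
    print(run_workflow(records, ref))  # [(10, 'T', 'A')]
```

All three reads carry an A where the reference has a T at position 10, so that position is the only one reported; every other covered position agrees with the reference.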

Slide 9: [Diagram: NCGAS Virtual Genomics Science Instrument. Components: Mason, IU POD, Data Capacitor, NCBI reference data, and the Lustre WAN file system, connected to a large sequencing center, smaller sequencing centers (FTP), and international collaborators via TransPAC and GÉANT. Link speeds shown: 10 Gbps and 100 Gbps.]

Slide 10: Bandwidth comparison (scale 0 to 100 Gbps)
- Commodity Internet: 1 Gbps, but highly variable
- Internet2: 100 Gbps
- NLR to sequencing centers: 10 Gbps per link
- IU Data Capacitor: 20 Gbps throughput
- Ultra SCSI 160 disk: 1.28 Gbps (160 MBps)
- DDR3 SDRAM: 51.2 Gbps (6.4 GBps)
This architecture scales!
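These bandwidth figures translate directly into transfer times. A minimal sketch, assuming a hypothetical 1 TB (10^12 bytes) sequencing run and ideal line-rate transfers with no protocol overhead or contention, which real networks, and especially the commodity Internet, will not achieve:

```python
# Rates from the slide, in gigabits per second. The disk figure is derived
# from 160 MBps x 8 bits/byte = 1.28 Gbps.
RATES_GBPS = {
    "Commodity Internet (nominal)": 1.0,
    "Ultra SCSI 160 disk": 160e6 * 8 / 1e9,
    "NLR link to a sequencing center": 10.0,
    "IU Data Capacitor": 20.0,
    "DDR3 SDRAM": 51.2,
    "Internet2": 100.0,
}


def transfer_seconds(size_bytes, rate_gbps):
    """Ideal transfer time: payload bits divided by the line rate."""
    return size_bytes * 8 / (rate_gbps * 1e9)


if __name__ == "__main__":
    run_bytes = 10**12  # hypothetical 1 TB run (decimal terabyte)
    for name, rate in RATES_GBPS.items():
        secs = transfer_seconds(run_bytes, rate)
        print(f"{name:32s} {secs:9.0f} s  ({secs / 3600:5.2f} h)")
```

Under these assumptions the same run takes about 80 seconds over Internet2 at 100 Gbps but roughly 8,000 seconds (over two hours) at a nominal 1 Gbps, before accounting for the variability of the commodity Internet.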

Slide 11: How would this work at scale?
1. Biologists anywhere use Galaxy.
2. Sequence data is transferred over research networks.
3. Lustre WAN flows data into the Data Capacitor.
4. The Data Capacitor mounts reference data.
5. Results are available on the Data Capacitor for subsequent analyses (secured to HIPAA standards).

Slide 12: Outcomes for life sciences research
- National and international networks have the capacity to handle genomics data.
- Distributed workflow tools lower the bar for biologists to accomplish genomic science.
- NCGAS is an extensible model of a scaled and integrated infrastructure for biological research.
- This model can extend internationally.

Slide 13: Thank you. Questions?
Bill Barnett (barnettw@iu.edu)
Rich LeDuc (rleduc@iu.edu)

