Presentation is loading. Please wait.

Presentation is loading. Please wait.

Richard LeDuc, Ph.D. (Manager)

Similar presentations


Presentation on theme: "Richard LeDuc, Ph.D. (Manager)"— Presentation transcript:

1 Richard LeDuc, Ph.D. (Manager)
National Center for Genome Analysis Support Leverages XSEDE Resources to Support Life Scientists William K. Barnett, Ph.D. (Director) Richard LeDuc, Ph.D. (Manager) National Center for Genome Analysis Support XSEDE 2013, San Diego CA, 7/23/2013

2 Summary What is NCGAS? What do we do? How do we do it?
I will assume you know more about HPC than biology. National Center for Genome Analysis Support:

3 Funded by National Science Foundation
Large memory clusters for assembly Bioinformatics consulting for biologists Optimized software for better efficiency Collaboration across IU, TACC, SDSC, and PSC. Open for business at:

4 Making it easier for Biologists
Common Rare Computational Skills LOW HIGH Web interface to NCGAS resources Supports many bioinformatics tools Available for both research and instruction.

5 We Provide: Large RAM computational resources Appropriate storage
Data transport assistance IT (help-desk-like) Support Bioinformatic Consultation and support

6 The services announced today include:
Storage of up to 50 terabytes of research data on IU's Scientific Data Archive tape storage system. Services for curation and long-term storage of data sets and final results from genome research in the IUScholarWorks… • NCGAS will write letters of commitment for consulting, computation, and data storage resources to include with grant proposals …

7 Service Continuum Focus

8 Staffing 3.7 FTE Direct Staff
I’m a biologist with 15 years software engineering experience PhD Computer Scientist Full-time Bioinformatic Analyst 50% of a PhD Genomicist 20% of people above me But we have direct access to the rest of our partner supercomputing centers Customize footer: View menu/Header and Footer 11/22/2018

9 NCGAS Cyberinfrastructure at IU
Rockhopper: 11 servers with 48 cores and 128 GB RAM. Mason large memory cluster: 16 nodes with 32 cores each and 512 GB RAM per node. Data Capacitor: 1 PB at 20 Gbps throughput. SDA – 17(+) PB hierarchical tape archive Additional resources through our XSEDE partners XRAC allocation National Center for Genome Analysis Support:

10 Rockhopper Penguin Computing's Penguin-On-Demand (POD) supercomputing cloud appliance hosted by Indiana University. A collaborative effort between Penguin Computing, IU, the University of Virginia, the University of California Berkeley, and the University of Michigan. Provides supercomputing cloud services in a secure US facility. Researchers at US institutions of higher education and Federally Funded Research and Development Centers (FFRDCs) can purchase computing time from Penguin Computing, and receive access via high-speed national research networks operated by IU. National Center for Genome Analysis Support:

11 Standardized Trinity Analyses
National Center for Genome Analysis Support:

12 Who do we serve?

13 Zhao et al. BMC Bioinformatics 2011, 12(Suppl 14):S2
Haas, B., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P., Bowden, J., Couger, M., Eccles, D., Li, B., Lieber, M., MacManes, M., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C., Henschel, R., LeDuc, R., Friedman, N., and Regev, A. (2013) De novo transcript sequence reconstruction from RNA-Seq using the Trinity platform for reference generation and analysis, Nature Protocols, in press.

14 The Project A project represents a single NSF grant, or similar for Server-on-Demand services. Projects are inherent organic organizational structures used in biology. Researchers know what projects they are on, who else is on the project, what the project is trying to accomplish etc. Projects are frequently widely distributed.

15 Projects “SUGAR”: Schaack lab Undergraduate Genome Analyses at Reed

16 Projects as of Early July 2013

17 How do we support Projects

18 Support varies by Project need
Most projects just need access to the resources, and some technical support. Other projects require more staff interaction; up to and including intellectual contributions to the project. Several projects have utilized “private” Galaxy instances.

19

20 GALAXY.NCGAS.ORG Model Quarry Mason
NCGAS establishes tools, hardens them, and moves them into production. Virtual box hosting Galaxy.ncgas.org The host for each tool is configured individually Individual projects can get duplicate VMs. Quarry Mason Each project can get 50 TB of archive space for raw data. Policies guarantee that untouched data is removed with time. Data Capacitor Archive

21 Current System Globus On-line and other tools Lustre WAN File System
NCGAS Mason (Free for NSF users) 100 Gbps Your Friendly Neighborhood Sequencing Center Optimized Software Lustre WAN File System Globus On-line and other tools IU POD (12 cents per core hour) Data Capacitor NO data storage Charges Your Friendly Neighborhood Sequencing Center Other NCGAS XSEDE Resources… How this works at scale: Biologists use Galaxy to execute workflows Sequence data mounted via Lustre WAN or automatically transferred using Internet2 Data Capacitor flows data into Mason or other computational clusters Data Capacitor mounts or mirrors reference data from NCBI or other sources Results delivered through web interfaces and to visualization or other science tools 10 Gbps Your Friendly Neighborhood Sequencing Center

22 Future Direction

23 NCGAS gives back to XSEDE
NCGAS is a Tier 2 XSEDE Partner. XSEDE allocations are available on Mason (our large RAM cluster). 300,000 SU’s of 0.5 TB RAM nodes with NCGAS support software.

24 Thank You Questions? Bill Barnett Rich LeDuc Le-Shin Wu Carrie Ganote
Tom Doak


Download ppt "Richard LeDuc, Ph.D. (Manager)"

Similar presentations


Ads by Google