iPlant: Cyberinfrastructure for Plant Sciences (and Beyond) Your Name Here
One Big Problem...
Transfer Storage Analysis Visualization Metadata Mark-up Search and Discover Share/Collaborate Publish
High-throughput Data Acquisition: in 11 days, generates 4TB of raw data; 600,000,000,000 bases of DNA sequence (200 human genomes)
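The figures on this slide can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check: the genome size of roughly 3 billion bases and the use of decimal units (1 TB = 1000 GB) are assumptions that do not appear on the slide, and the function names are made up for illustration.

```python
# Back-of-envelope check of the sequencing figures quoted on the slide.
RUN_DAYS = 11                      # from the slide
RAW_DATA_TB = 4                    # from the slide
BASES_SEQUENCED = 600_000_000_000  # from the slide

# Assumption (not on the slide): one human genome is roughly 3 billion bases.
HUMAN_GENOME_BASES = 3_000_000_000


def genome_equivalents(bases, genome_size=HUMAN_GENOME_BASES):
    """Return how many genomes of the given size the base count corresponds to."""
    return bases / genome_size


def daily_rate(total, days=RUN_DAYS):
    """Return the average amount produced per day over the run."""
    return total / days


if __name__ == "__main__":
    # Assumption: decimal units, 1 TB = 1000 GB.
    raw_data_gb = RAW_DATA_TB * 1000
    print(f"{genome_equivalents(BASES_SEQUENCED):.0f} human-genome equivalents")
    print(f"{daily_rate(raw_data_gb):.0f} GB of raw data per day")
    print(f"{daily_rate(BASES_SEQUENCED) / 1e9:.1f} billion bases per day")
```

Under these assumptions the "200 human genomes" figure follows directly from 600 billion bases, and the run averages a few hundred gigabytes of raw data per day.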
High-throughput Phenotyping (Watching Grass Grow): $70K for ~30 camera sets; ~200 movies of plants undergoing a dynamic growth process; “only” 4GB a day
Big Data in Ecology Global Multidimensional Data
[Slide image: a full-slide wall of raw DNA sequence (GGGTGCCCAAAAGCCCGGTTTGTTAGCC...)]
Transforming genomes of information into knowledge
[Diagram labels: Biologist; IS5 gnd wbbK wbbJ wbbL; ~1kb]
Transfer Storage Analysis Visualization Metadata Mark-up Search and Discover Share/Collaborate Publish
Cyberinfrastructure for Plant Sciences (and Beyond): Scalable Capable Extensible
Cyberinfrastructure Philosophy
iPlant’s CI uses the pillars of CIF21
– High Performance Computing
– Data and Data Analysis
– Virtual Organization (VO)
– Learning and Workforce
iPlant is a shining example of a VO for CI creation, delivery and support.
What is Cyberinfrastructure?
Connecting Scientists and Computation
– Experimental Verifiability, Reproducibility, and Provenance
– User Identity Management (Collaborative trust)
– Connections Between Resources
– Multiple Access Levels
– Facilitate Collaborations
– Democratizing Supercomputing for Research
– Liberating High-Throughput Data Acquisition
Integrate Software Data Scalable Data Management Computing Resources
End Users Computational Users iPlant Layered Services and Access TeraGrid XSEDE
iPlant Discovery Environment Managing and Integrating: Data, Tools, Analysis
Tool Integration and Publication
iPlant Data Store Free Your Data Different Users, Different Access Needs: One Data Store
iPlant Data Store Free Your Data
WebDAV, DE, i-commands, iDrop, API
iPlant Data Store: Desktop Folder, Discovery Environment, Command Line (HPT), i-Drop (HPT), API
Atmosphere: Servers and Software on Demand Use Your iPlant Credentials
Atmosphere
Atmosphere Plus VNC
Atmosphere New Images

Dear iPlant staff,

I have a running instance that has a few mainstream genome assemblers and gene prediction pipeline installed. The details are as below. Would you please create an image from this running instance? I'd like to talk about this during the upcoming iPlant workshop at PAG. Thanks ~

1. Username: tanghaibao
2. IP Address of the instance:
3. Name of the Image: TwigToGenome
4. Description of the Image: This image includes three genome assemblers, a gene annotation pipeline and several visualization tools. The VM image is intended for researchers who start with raw reads, and follow a standard pipeline of preparation, assembly, annotation and visualization for genomic data.

List installed software: CABOG (v7.0-prerelease)/ALLPATHS-LG (v40173)/SOAPdenovo (v1.05) installed in /usr/local/packages and symlinked in /usr/local/bin; MAKER (v2.11) in /opt/maker; SNAP/BLAST/REPEATMASKER/EXONERATE installed as part of MAKER; GENEMARK (v2.8a)/AUGUSTUS (v2.5.5) installed in /usr/local/packages and symlinked in /usr/local/bin; Python modules in /usr/lib/python26/site-packages (on default search path of python26). The image is built on top of image "New NGS viewers".
Unmount EBS Volumes: I had vol-9B mounted in /data/ during testing. Currently unmounted.
iPlant-managed System Files: n/a
Image Tags: assembly, annotation, visualization, pipeline, twig to genome, jcvi, allpaths, soapdenovo, cabog, maker
Do you want the visibility of the image to be private, public, or select users?: Publicly visible.

--
Haibao Tang, Ph. D.
Senior Bioinformatics Engineer
J. Craig Venter Institute
9704 Medical Center Dr.
Rockville, MD
Office Phone: ; Fax:
How to get access:
iPlant APIs Resources Cyberinfrastructure for Life Sciences Scalable Capable Extensible
iPlant’s Building Blocks 74 Metadata Data Tools Workflows Viz
Executive Team: Steve Goff, Dan Stanzione

Faculty Advisors & Collaborators: Ali Akoglu, Greg Andrews, Kobus Barnard, Sue Brown, Thomas Brutnell, Michael Donoghue, Casey Dunn, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, Dan Kliebenstein, Jim Leebens-Mack, David Lowenthal, Robert Martienssen, B.S. Manjunath, Nirav Merchant, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Ann Stapleton, Lincoln Stein, Val Tannen, Todd Vision, Doreen Ware, Steve Welch, Mark Westneat

Staff: Greg Abram, Sonali Aditya, Roger Barthelson, Brad Boyle, Todd Bryan, Gordon Burleigh, John Cazes, Mike Conway, Karen Cranston, Rion Doodey, Andy Edmonds, Dmitry Fedorov, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Michael Gonzales, Hariolf Häfele, Matthew Hanlon, Anthony Heath, Barbara Heath, Matthew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Eun-Sook Jeong, Logan Johnson, Chris Jordan, B.D. Kim, Kathleen Kennedy, Mohammed Khalfan, Seung-jin Kim, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Aruna Lakshmanan, Sue Lauter, Tina Lee, Andrew Lenards, Zhenyuan Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Robert McLay, Angel Mercer, Dave Micklos, Nathan Miller, Steve Mock, Martha Narro, Praveen Nuthulapati, Shannon Oliver, Shiran Pasternak, William Peil, Titus Purdin, J.A. Raygoza Garay, Dennis Roberts, Jerry Schneider, Bruce Schumaker, Sriramu Singaram, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Kris Urie, Peter Van Buren, Hans Vasquez-Gross, Matthew Vaughn, Fusheng Wei, Jason Williams, John Wregglesworth, Weijia Xu, Jill Yarmchuk

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, Ya-Di Chen, John Donoghue, Steven Gregory, Yekatarina Khartianova, Monica Lent, Amgad Madkour, Aniruddha Marathe, Kurt Michaels, Dhanesh Prasad, Andrew Predoehl, Jose Salcedo, Shalini Sasidharan, Gregory Striemer, Jason Vandeventer, Kuan Yang

Postdocs: Barbara Banbury, Jamie Estill, Bindu Joseph, Christos Noutsos, Brad Ruhfel, Stephen A. Smith, Chunlao Tang, Lin Wang, Liya Wang, Norman Wickett
Comments
Emphasize:
– Scalable
– Extensible
Getting introduction
– Quick start guide
– wiki
– forums
Last slide: link to resources
Emphasize what CI gets us
Dress up the list-of-stuff slides
– Fewer words, more animation/images
More bullet points so other people know the walk-away message
– Work with Dan