Meta’omic Analysis with MetaPhlAn, HUMAnN, and LEfSe Curtis Huttenhower 08-08-13 Harvard School of Public Health Department of Biostatistics.

1 Meta’omic Analysis with MetaPhlAn, HUMAnN, and LEfSe Curtis Huttenhower 08-08-13 Harvard School of Public Health Department of Biostatistics

2 Some setup notes
–Slides with green titles or text include instructions not needed today, but useful for your own analyses
–Keep an eye out for red warnings of particular importance
–Command lines and program/file names appear in a monospaced font

3 Getting some HMP data Go to http://hmpdacc.org and click “Get Data”

4 Getting some HMP data Check out what’s available: click “HMIWGS”

5 Getting some HMP data Check out what’s available: click on your favorite body site

6 Getting some HMP data Check out what’s available, but don’t click on anything!

7 Getting some (prepped) HMP data
Connect to the server instead and run:
–for S in `ls /class/stamps-shared/chuttenh/7*.fasta`; do ln -s $S; done
These are subsamples of six HMP files:
–SRS014459.tar.bz2 → 763577454-SRS014459-Stool.fasta
–SRS014464.tar.bz2 → 763577454-SRS014464-Anterior_nares.fasta
–SRS014470.tar.bz2 → 763577454-SRS014470-Tongue_dorsum.fasta
–SRS014472.tar.bz2 → 763577454-SRS014472-Buccal_mucosa.fasta
–SRS014476.tar.bz2 → 763577454-SRS014476-Supragingival_plaque.fasta
–SRS014494.tar.bz2 → 763577454-SRS014494-Posterior_fornix.fasta
All six shotgunned body sites are from:
–One subject, first visit
–Subsampled to 20,000 reads
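The slide says each file was subsampled to 20,000 reads but does not show how. The following is a minimal sketch of one way to draw such a subsample, assuming plain FASTA input with one `>` header per record. The names `read_fasta` and `subsample` are illustrative only and are not part of any HMP or MetaPhlAn tool.

```python
import random

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)

def subsample(records, n, seed=0):
    """Return n records chosen uniformly without replacement (all of them if fewer)."""
    records = list(records)
    if len(records) <= n:
        return records
    # A fixed seed makes the subsample reproducible; this is an assumption, not the HMP procedure.
    return random.Random(seed).sample(records, n)
```

With a real file this might be used as `subsample(read_fasta(open("reads.fasta")), 20000)`, where `reads.fasta` is a placeholder name.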

8 Who’s there: MetaPhlAn Gene X: X is a core gene for clade Y; X is a unique marker gene for clade Y
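The distinction between a core gene and a unique marker gene can be illustrated with set operations. This is a conceptual sketch only, assuming each genome is represented as a set of gene identifiers; it is not MetaPhlAn's marker-selection code, and the function names are hypothetical.

```python
def core_genes(genomes):
    """Genes present in every genome of a clade (genomes: dict of name -> set of genes)."""
    sets = list(genomes.values())
    return set.intersection(*sets) if sets else set()

def unique_markers(clade_genomes, other_genomes):
    """Core genes of the clade that appear in no genome outside the clade."""
    outside = set().union(*other_genomes.values()) if other_genomes else set()
    return core_genes(clade_genomes) - outside
```

In this toy model, a gene shared by all genomes in clade Y is core, and it only counts as a unique marker if it is also absent from every genome outside clade Y.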

9 Who’s there: MetaPhlAn Go to http://huttenhower.sph.harvard.edu/metaphlan and scroll down

10 Who’s there: MetaPhlAn You could download MetaPhlAn by clicking here

11 Who’s there: MetaPhlAn But don’t! Instead, we’ve downloaded MetaPhlAn already for you by clicking here

12 Who’s there: MetaPhlAn And saving just the file metaphlan.py in stamps-software/bin

13 Who’s there: MetaPhlAn You could download the bowtie2 database here

14 From the command line...
But don’t! Instead, get it by:
–for S in `ls /class/stamps-shared/chuttenh/*.bt2`; do ln -s $S; done
To see what you can do, run:
–module load stamps
–metaphlan.py -h | less
–Use the arrow keys to move up and down, q to quit back to the prompt

15 Who’s there: MetaPhlAn 15

16 Who’s there: MetaPhlAn
For future reference: these extra options aren’t necessary if you download the whole “default” MetaPhlAn package
–metaphlan.py your_input.fasta > your_output.txt
To launch your first analysis, run:
–metaphlan.py --bowtie2db mpa 763577454-SRS014459-Stool.fasta > 763577454-SRS014459-Stool.txt

17 Who’s there: MetaPhlAn
What did you just do? Two new output files:
–763577454-SRS014459-Stool.fasta.bowtie2out.txt
  Contains a mapping of reads to MetaPhlAn markers
–763577454-SRS014459-Stool.txt
  Contains taxonomic abundances as percentages
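To work with the abundance file programmatically, a small parser is enough. The sketch below assumes each data line holds a clade name and a percentage separated by a tab, with taxonomic levels joined by `|` and prefixed `k__` through `s__`; check your own output before relying on it. `parse_profile` and `species_only` are hypothetical helper names.

```python
def parse_profile(lines):
    """Map clade name -> relative abundance (percent) from tab-delimited lines."""
    profile = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        try:
            profile[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip a header row, if the file has one
    return profile

def species_only(profile):
    """Keep clades whose deepest taxonomic level is a species (prefix s__)."""
    return {c: v for c, v in profile.items()
            if c.split("|")[-1].startswith("s__")}
```

Calling `species_only(parse_profile(open("763577454-SRS014459-Stool.txt")))` would then give a species-to-percentage dictionary for the stool sample.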

18 Who’s there: MetaPhlAn
–less 763577454-SRS014459-Stool.fasta.bowtie2out.txt

19 Who’s there: MetaPhlAn
–less 763577454-SRS014459-Stool.txt

20 Who’s there: MetaPhlAn
Now finish the job:
–metaphlan.py --bowtie2db mpa 763577454-SRS014464-Anterior_nares.fasta > 763577454-SRS014464-Anterior_nares.txt
–...
Note that you can use the up arrow key to make your life easier!
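Instead of editing the command by hand for each of the remaining samples, the command lines can be generated. This sketch only builds the strings, using exactly the invocation shown on the slide; `metaphlan_commands` is an illustrative name, and running the commands is left to you.

```python
import os

def metaphlan_commands(fasta_files, bowtie2db="mpa"):
    """Build one shell command per FASTA file, mirroring the invocation on the slide."""
    commands = []
    for fasta in fasta_files:
        # Output name: same as the input with .fasta replaced by .txt
        out = os.path.splitext(fasta)[0] + ".txt"
        commands.append(f"metaphlan.py --bowtie2db {bowtie2db} {fasta} > {out}")
    return commands
```

For example, `print("\n".join(metaphlan_commands(sorted(glob.glob("7*.fasta")))))` (after `import glob`) prints all six commands ready to paste into the shell.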

21 Who’s there: MetaPhlAn
Let’s make a single table containing all six samples:
–mkdir tmp
–mv *.bowtie2out.txt tmp
–merge_tables.py -l -d *.txt | zero.py > 763577454.tsv
You can look at this file using less
–Note 1: The arguments less -x4 -S will help
–Note 2: You can set this “permanently” using export LESS="-x4 -S"
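Conceptually, the merge step joins the per-sample profiles on clade name and fills in zeros where a clade was not detected. The sketch below shows that idea under the assumption that this is what merge_tables.py and zero.py do together; it is not their source code, and the `ID` header label is a placeholder.

```python
def merge_profiles(profiles):
    """Combine {sample: {clade: value}} into rows with one column per sample.

    Clades missing from a sample are reported as 0.0.
    """
    samples = sorted(profiles)
    clades = sorted({clade for p in profiles.values() for clade in p})
    rows = [["ID"] + samples]  # header row; "ID" is an arbitrary label
    for clade in clades:
        rows.append([clade] + [profiles[s].get(clade, 0.0) for s in samples])
    return rows

def to_tsv(rows):
    """Render rows as tab-delimited text."""
    return "\n".join("\t".join(str(x) for x in row) for row in rows) + "\n"
```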

22 Who’s there: MetaPhlAn But it’s easier using MeV; go to http://www.tm4.org/mev.html and click “Download”

23 An interlude: MeV
–Don’t forget to transfer your 763577454.tsv file locally for viewing using scp
–Unzip, launch MeV, and select File/Load data

24 An interlude: MeV
Click “Browse” to your TSV file, then
–Tell MeV it’s a two-color array
–Uncheck “Load annotation”
–Click on the upper-leftmost data value

25 An interlude: MeV
“Load” your data, then make it visible by:
–Display/Set Color Scale Limits
–Choose Single Gradient, min 0, max 10

26 An interlude: MeV
Finally, to play around a bit:
–Display/Set Element Size/whatever you’d like
–Clustering/Hierarchical Clustering
–Optimize both gene and sample order
–And select Manhattan Distance (imperfect!)
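For reference, the Manhattan distance selected above is the sum of absolute differences between two abundance profiles. Here is a minimal sketch with hypothetical function names, independent of MeV's implementation.

```python
def manhattan(a, b):
    """Sum of absolute differences between two equal-length abundance vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(columns):
    """Pairwise Manhattan distances for {sample: vector}, as {(s1, s2): distance}."""
    names = sorted(columns)
    return {(s, t): manhattan(columns[s], columns[t]) for s in names for t in names}
```

Hierarchical clustering then repeatedly joins the closest samples (or rows) according to these distances.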

27 An interlude: MeV
If you’d like, you can
–Display/Sample-Column Labels/Abbr. Names

28 An interlude: MeV
MeV is a tool; imperfect, but convenient
–You should likely include just “leaf” nodes
  Species, whose names include “s__”
  You can filter your file using: cat 763577454.tsv | grep -E '(Stool)|(s__)' > 763577454_species.tsv
–You can, but might not want to, z-score normalize
  Adjust Data/Gene-Row Adjustments/Normalize Genes-Rows
–Many other tools built in; experiment!
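The grep filter and the row normalization can both be expressed in a few lines. This sketch mirrors the grep pattern from the slide (keep the header line containing "Stool" plus any line containing "s__") and shows a row-wise z-score using the sample standard deviation; MeV's exact normalization may differ, so treat the second function as an assumption.

```python
import statistics

def keep_species(lines, header_token="Stool"):
    """Keep the header line (it contains header_token) and lines containing s__."""
    return [ln for ln in lines if header_token in ln or "s__" in ln]

def zscore_row(values):
    """Center a row on its mean and scale by its sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    if sd == 0:
        return [0.0 for _ in values]  # a constant row carries no contrast
    return [(v - mean) / sd for v in values]
```

Z-scoring makes rare and abundant species comparable in a heat map, at the cost of hiding the absolute abundances, which is one reason you might not want it.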

29 What they’re doing: HUMAnN Back to the task at hand; you could download HUMAnN at http://huttenhower.sph.harvard.edu/humann

30 What they’re doing: HUMAnN
...but instead we’ve already downloaded it
Expand HUMAnN (no install!)
–tar -xzf /class/stamps-shared/chuttenh/sources/humann-0.98.tar.gz
Set up a link to the KEGG reference DB:
–ln -s /class/stamps-shared/chuttenh/kegg.reduced.udb
And although you would normally download USEARCH from here:
–http://www.drive5.com/usearch/download.html
We’re going to use it preinstalled instead

31 What they’re doing: HUMAnN
If we weren’t all running this, you’d need to:
–Get KEGG; it used to be free, now it’s not!
  Fortunately, we have a HUMAnN-compatible distributable version; contact me...
–Index it for USEARCH:
  usearch6 -makeudb_usearch kegg.reduced.fasta -output kegg.reduced.udb
This takes a minute or two, so we’ve precomputed it; thus, forge ahead...

32 What they’re doing: HUMAnN
Did you notice that we didn’t QC our data at all?
–MetaPhlAn is very robust to junk sequence
–HUMAnN is pretty robust, but not quite as much
We’ve already run a standard metagenomic QC:
–Quality trim by removing bad bases (typically Q ~15)
–Length filter to remove short sequences (typically those under 75% of the original length)

33 What they’re doing: HUMAnN
Must start from FASTQ files to do this
Quality trim by removing bad bases:
–TrimBWAstyle.py your_trimmed_data.fastq
Length filter by removing short sequences:
–75% of original length is standard (thus 75nt from 100nt reads)
–remove_bad_seqs.py 75 your_filtered_data.fastq
Now convert your FASTQ to a FASTA:
–fastq2fasta.py your_filtered_data.fasta
Some final caveats:
–If you’re using paired end reads, match filters!
–See my course homeworks at http://huttenhower.sph.harvard.edu/bio508
–Aren’t you glad you’re not doing this today?
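The two QC steps can be sketched as follows. The trimming function follows the BWA-style 3' trimming rule as it is commonly described (walk in from the 3' end accumulating threshold minus quality, and cut where that running sum peaks); it is not the source of TrimBWAstyle.py, and both function names are hypothetical. Quality scores are assumed to be already decoded to integers.

```python
def bwa_trim_length(quals, threshold=15):
    """Return how many leading bases to keep after BWA-style 3' quality trimming."""
    running, best, keep = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break  # good bases now outweigh the bad tail; stop scanning
        if running > best:
            best, keep = running, i  # cutting here removes the worst tail so far
    return keep

def passes_length_filter(trimmed_len, original_len, min_fraction=0.75):
    """True if the trimmed read retains at least min_fraction of its original length."""
    return trimmed_len >= min_fraction * original_len
```

For a 100 nt read, the second function reproduces the "75nt from 100nt reads" rule from the slide.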

34 What they’re doing: HUMAnN
Enter the humann directory
–cd humann-0.98
Run your first translated BLAST search:
–usearch6.0.192_i86linux32 -usearch_local ../763577454-SRS014459-Stool.fasta -db ../kegg.reduced.udb -id 0.8 -blast6out input/763577454-SRS014459-Stool.txt
What did you just do?
–less input/763577454-SRS014459-Stool.txt
–Recall BLAST’s tab-delimited output headers:
  qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
Rinse and repeat for the remaining samples
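The twelve column names listed on the slide map directly onto a record type. The sketch below parses one line of tab-delimited output into named fields; it assumes every column holds a numeric value where one is expected, which may not hold for every search mode, and `parse_blast6` is an illustrative name.

```python
from collections import namedtuple

BLAST6_FIELDS = ("qseqid sseqid pident length mismatch gapopen "
                 "qstart qend sstart send evalue bitscore").split()
Hit = namedtuple("Hit", BLAST6_FIELDS)

def parse_blast6(line):
    """Parse one tab-delimited BLAST-style hit line into a Hit record."""
    f = line.rstrip("\n").split("\t")
    if len(f) != len(BLAST6_FIELDS):
        raise ValueError(f"expected 12 columns, got {len(f)}")
    return Hit(f[0], f[1], float(f[2]),
               int(f[3]), int(f[4]), int(f[5]),
               int(f[6]), int(f[7]), int(f[8]), int(f[9]),
               float(f[10]), float(f[11]))
```

With this, `hit.sseqid` names the matched reference sequence and `hit.pident` gives the percent identity, which the `-id 0.8` option on the slide thresholds at 80%.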

35 What they’re doing: HUMAnN
Normally you’d need to install SCons from:
–http://www.scons.org
Instead, we’ll use it preinstalled as well, so... GO!
–/class/stamps-software/scons/bin/scons
You should see a bunch of text scroll by
–Note: you can run scons -j8 to parallelize tasks

36 What they’re doing: HUMAnN After a minute or two, you should see: 36

37 What they’re doing: HUMAnN
This has created four main files:
–Two each for pathways (big) and modules (small)
–Two each for coverage and relative abundance
Each is tab-delimited text with one column per sample
All four are in the output directory:
–output/04a-hit-keg-mpt-cop-nul-nve-nve-xpe.txt
  Coverage (a) of pathways (t)
–output/04a-hit-keg-mpm-cop-nul-nve-nve-xpe.txt
  Coverage (a) of modules (m)
–output/04b-hit-keg-mpt-cop-nul-nve-nve.txt
  Abundance (b) of pathways (t)
–output/04b-hit-keg-mpm-cop-nul-nve-nve.txt
  Abundance (b) of modules (m)
I almost always just use 04b-mpm (module abundances)
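The naming scheme above can be decoded mechanically. This small sketch relies only on the four codes explained on the slide (04a, 04b, mpt, mpm) and makes no claim about the meaning of the other name components; `describe_humann_output` is a hypothetical helper.

```python
def describe_humann_output(filename):
    """Return (measure, unit) for a HUMAnN output file name, per the codes on the slide."""
    base = filename.rsplit("/", 1)[-1]
    parts = base.split("-")
    measures = {"04a": "coverage", "04b": "abundance"}
    units = {"mpt": "pathways", "mpm": "modules"}
    measure = measures.get(parts[0])
    unit = next((units[p] for p in parts if p in units), None)
    if measure is None or unit is None:
        raise ValueError(f"unrecognized HUMAnN output name: {filename}")
    return measure, unit
```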

38 What they’re doing: HUMAnN
Let’s take a look:
–less output/04b-hit-keg-mpm-cop-nul-nve-nve.txt

39 What they’re doing: HUMAnN
That’s ugly; it gets much better in Excel
–Note: this is very sparse since we’re using a small subset of KEGG
–Note: the mock community demo data is included on the right

40 What they’re doing: HUMAnN
And there’s nothing stopping us from using MeV
–Or R, or QIIME, or anything that’ll read tab-delimited text

41 What matters: LEfSe
All of these analyses give you tables
–16S → OTUs
–MetaPhlAn → Species
–HUMAnN → Modules (or pathways or genes)
If you know something about your samples...
–Case/control
–Different habitats
–Different time points
How can you identify features that change?

42 What matters: LEfSe Let’s get all of the HMP species data: go to http://hmpdacc.org/resources/data_browser.php and click “HMSMCP”

43 What matters: LEfSe Download the MetaPhlAn table for all 700 samples: right click this irritatingly tiny icon

44 Downloading from the command line
Instead of saving this, download it by:
–Right-click to copy the URL
–Run wget
–Note: curl -O works just as well

45 What matters: LEfSe
Make sure this file is in your home directory, and expand it:
–bunzip2 HMP.ab.txt.bz2
Look at the result
–less HMP.ab.txt
IMPORTANT!!!
–This file’s too big to analyze directly today
–ln -s /class/stamps-shared/chuttenh/HMP.ab.filtered.txt
This is great: tons of data, but no metadata
–HUMAnN to the rescue
–From the humann-0.98 directory:
  python src/metadata.py input/hmp_metadata.dat../HMP.ab.filtered.metadata.tsv
NOW take a look again

46 What matters: LEfSe
Let’s modify this file to be LEfSe-compatible
–Open it up in Excel

47 What matters: LEfSe
Delete all of the metadata rows except:
–RANDSID and STSite
–Save it as tab-delimited text: HMP.ab.filtered.metadata.txt
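The same edit can be scripted rather than done in Excel. The sketch below keeps only the RANDSID and STSite metadata rows plus the abundance rows, under the assumption that abundance rows are the ones whose first column starts with `k__` (as in MetaPhlAn clade names); verify that against your own table first. `lefse_rows` is a hypothetical helper.

```python
def lefse_rows(lines, keep_metadata=("RANDSID", "STSite"), feature_prefix="k__"):
    """Keep the named metadata rows and all feature rows; drop every other row."""
    kept = []
    for line in lines:
        name = line.split("\t", 1)[0]
        # Assumption: feature (clade) rows are identified by their k__ prefix.
        if name in keep_metadata or name.startswith(feature_prefix):
            kept.append(line)
    return kept
```

Writing the returned lines to HMP.ab.filtered.metadata.txt gives the tab-delimited file described on the slide.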

48 What matters: LEfSe Visit LEfSe at http://huttenhower.sph.harvard.edu/lefse

49 What matters: LEfSe
Then upload your formatted table
–After you upload, wait for the progress meter to turn green!
On-slide steps: 1. Click here 2. Then here 3. Then here 4. Then watch here

50 What matters: LEfSe
Then tell LEfSe about your metadata:
On-slide steps: 1. Click here 2. Then select STSite 3. Then select RANDSID 4. Then here

51 What matters: LEfSe
Then select LDA=4, “One-against-all,” and run LEfSe!
–You can change other default statistical parameters if desired
On-slide steps: 1. Click here 2. Then here (finds only very extreme differences) 3. Then here (finds differences in at least one condition rather than in all conditions) 4. Then GO!

52 What matters: LEfSe
You can plot the results as a bar plot
–Again, lots of graphical parameters to modify if desired
On-slide steps: 1. Click here 2. Then here

53 What matters: LEfSe In Galaxy, view a result by clicking on its “eye”

54 What matters: LEfSe 54

55 What matters: LEfSe
You can plot the results as a cladogram
–Lots and lots of graphical parameters to modify if desired
On-slide steps: 1. Click here 2. Then here

56 What matters: LEfSe 56

57 What matters: LEfSe
Finally, you can see the raw data for individual biomarkers
–These are generated as a zip file of individual plots
On-slide steps: 1. Click here 2. Then here

58 What matters: LEfSe In Galaxy, download a result by clicking on its “disk”

59 What matters: LEfSe Plot labels: Strep. mitis, Veillonellaceae, Actinobacteria

60 Summary
MetaPhlAn
–Raw metagenomic reads in
–Tab-delimited species relative abundances out
HUMAnN
–Quality-controlled metagenomic reads in
–Tab-delimited gene, module, and pathway relative abundances out
LEfSe
–Tab-delimited, stratified relative abundances in
–Significantly differentially abundant features out

61 Thanks!
Ramnik Xavier, Harry Sokol, Dan Knights, Moran Yassour, Nicola Segata, Levi Waldron, Human Microbiome Project, Owen White, Joe Petrosino, George Weinstock, Karen Nelson, Lita Proctor, Dirk Gevers, Kat Huang, Bruce Birren, Mark Daly, Doyle Ward, Ashlee Earl, Joseph Moon, Felix Wong, Tim Tickle, Xochi Morgan, Daniela Boernigen, Rob Knight, Jesse Zaneveld, Greg Caporaso, Mark Silverberg, Boyko Kabakchiev, Andrea Tyler, Emma Schwager, Jim Kaminski, Brian Palmer, Eric Franzosa, Boyu Ren, Ren Lu, Koji Yasuda, Sahar Abubucker, Brandi Cantarel, Alyx Schubert, Mathangi Thiagarajan, Beltran Rodriguez-Mueller, Erica Sodergren, Anthony Fodor, Marty Blaser, Jacques Ravel, Pat Schloss, Makedonka Mitreva, Yuzhen Ye, Mihai Pop, Larry Forney, Barbara Methe, Jacques Izard, Katherine Lemon, Wendy Garrett, Michelle Rooks, Bruce Sands, Ruth Ley, Omry Koren, Rob Beiko, Morgan Langille, Jeroen Raes, Karoline Faust
http://huttenhower.sph.harvard.edu
Interested? We’re recruiting just about everything!
