
The Biosphere (18-21 August 2009)


1 18-21 August 2009 The Biosphere

2 18-21 August 2009 Secondary structure of small subunit ribosomal RNA. Figure labels: 5' end, 3' end. Image adapted from R. Gutell, http://www.rna.ccbb.utexas.edu/

3 18-21 August 2009 Unaligned rRNA sequences in a multiple alignment editor

4 18-21 August 2009 Aligned rRNA sequences in editor

5 18-21 August 2009 Secondary structure of small subunit ribosomal RNA. Figure labels: 5' end, 3' end. Image adapted from R. Gutell, http://www.rna.ccbb.utexas.edu/

6 18-21 August 2009 The 530 Loop of E. coli. Figure labels: stem with canonical Watson-Crick base pairing; bulge; non-canonical G-U base pair; loop.

7 18-21 August 2009 530 loop of E.coli & T.jannaschii

8 18-21 August 2009 The 530 loop structure of six species

9 18-21 August 2009 Six taxa showing aligned 530 loop region of the 16S rRNA

10 18-21 August 2009 Similarity matrices comparing the 530 loop sequences and the full rRNA sequences of the six listed taxa
A. Similarity matrix for 530 loop
B. Similarity matrix for complete 16S rRNA

11 18-21 August 2009 The Biosphere. Taxon labels: E.coli, AqxPyrop, T.jannaschii, P.freundenreichii, M.vannielii, S.solfa

12 18-21 August 2009 References
Acknowledgement of rRNA secondary structure image: Cannone J.J., Subramanian S., Schnare M.N., Collett J.R., D'Souza L.M., Du Y., Feng B., Lin N., Madabusi L.V., Müller K.M., Pande N., Shang Z., Yu N., and Gutell R.R. 2002. The Comparative RNA Web (CRW) Site: An Online Database of Comparative Sequence and Structure Information for Ribosomal, Intron, and Other RNAs. BioMed Central Bioinformatics 3:2. [Correction: BioMed Central Bioinformatics 3:15.]
Smith T.F., Gutell R., Lee J., and Hartman H. 2008. The origin and evolution of the ribosome. Biology Direct 3:16.
Woese C.R. 1987. Bacterial evolution. Microbiol Rev. 51(2):221-71.
Zuckerkandl E. and Pauling L. 1965. Molecules as documents of evolutionary history. J Theor Biol. 8(2):357-66.
Cole J., Wang Q., Cardenas E., Fish J., Chai B., Farris R., Kulam-Syed-Mohideen A., McGarrell D., Marsh T., Garrity G., and Tiedje J. 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research. In press.

13 18-21 August 2009 Sequence Alignment Accuracy, Time, Memory

14 18-21 August 2009 Multiple Sequence Alignment
Pairwise dynamic programming
–Smith-Waterman, Needleman-Wunsch
–Can be transformed into probabilistic framework
Multidimensional dynamic programming
–Not practical
Progressive alignment
–MUSCLE, ClustalW
–Both are progressive, iterative
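
The pairwise dynamic programming named on this slide can be sketched as a minimal Needleman-Wunsch global alignment. The function name and the scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not values from the presentation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The matrix has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the two lengths. Extending the same recurrence to many sequences at once is what makes multidimensional dynamic programming impractical.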

15 18-21 August 2009 BLAST
Heuristic search strategy
Locate high-scoring short matches
–3 aa or 5 to 11 bases
Extend short matches
Determine significance using extreme value distribution statistics
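
The seed-and-extend idea on this slide can be illustrated with a small sketch. It is not BLAST itself: it uses exact word matches as seeds, extends only to the right without gaps, and stops when the score falls too far below the best seen. The function names, the scores, and the `x_drop` threshold are assumptions made for the example.

```python
def find_seeds(query, subject, word=5):
    """Return (query_pos, subject_pos) pairs where an exact word of length `word` matches."""
    index = {}
    for j in range(len(subject) - word + 1):
        index.setdefault(subject[j:j + word], []).append(j)
    return [
        (i, j)
        for i in range(len(query) - word + 1)
        for j in index.get(query[i:i + word], [])
    ]


def extend_right(query, subject, i, j, word, match=1, mismatch=-3, x_drop=5):
    """Ungapped rightward extension of a seed; returns (best_score, best_length)."""
    score = best = word * match
    best_len = word
    qi, sj = i + word, j + word
    while qi < len(query) and sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, best_len = score, qi - i
        elif best - score > x_drop:
            break  # score dropped too far below the best: stop extending
    return best, best_len
```

For example, `find_seeds("TTACGTACGGA", "CCACGTACGTT")` includes the seed `(2, 2)`, and extending it with `extend_right(..., 2, 2, 5)` keeps the two further matching bases before the mismatches end the extension.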

16 18-21 August 2009 BLAST (cont.)
E value
–Database dependent
Bits
–Database independent
% Similarity (identity)
–For aligned segments
–NOT overall % identity
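
A short sketch of why the E value depends on the database while the bit score does not, using the standard Karlin-Altschul relations S' = (λS − ln K) / ln 2 and E = m·n·2^(−S'). The function names are made up for the example, and the λ and K values in the usage note are placeholders rather than parameters of any real scoring system.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score using the scoring system's lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)
```

With placeholder parameters, `bits = bit_score(60, lam=0.3, k=0.1)` is the same number whichever database is searched, but `e_value(bits, 250, 2_000_000)` is twice `e_value(bits, 250, 1_000_000)`: doubling the database doubles the E value for the same hit.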

17 18-21 August 2009 Model Based Alignment
Profile Hidden Markov Models
–Protein and nucleic acid
–Models primary sequence
Stochastic Context-Free Grammars
–Incorporates RNA secondary structure

18 18-21 August 2009 Profile HMM

19 18-21 August 2009 Hidden Markov Model

20 18-21 August 2009 Hidden Markov Model

21 18-21 August 2009 Hidden Markov Model

22 18-21 August 2009

23 2D Structure Conserved from Domain to Family Diagrams from the Gutell Lab Comparative RNA Web Site (http://www.rna.icmb.utexas.edu)

24 18-21 August 2009 SCFG rRNA Model

25 18-21 August 2009 SCFG Limitations
Models primary and secondary structure
–Can’t model pseudoknots or higher-order interactions
Time complexity O(ML^3)
–Solved by Nawrocki et al.
Space complexity O(ML^2)
–Est. 16 GB memory for rRNA
–Solved by Eddy
Partial sequences
–Disrupt internal alignment
–Solved by Nawrocki et al.

26 18-21 August 2009

27

28

29 Aligner References
MUSCLE http://www.drive5.com/muscle/
BLAST http://blast.ncbi.nlm.nih.gov/
HMMER http://hmmer.janelia.org/
INFERNAL http://infernal.janelia.org/

30 18-21 August 2009 Distance Calculation
Phylogenetic methods only score base substitution, not insertion or deletion.
Score comparable positions
–Mask out unaligned regions, insertions
–Ignore positions with deletion
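
As a rough illustration of scoring only comparable positions, the sketch below computes an uncorrected distance between two rows of an alignment. The function name, the gap characters, and the optional boolean column mask are assumptions for the example; the presentation does not specify how masked columns are represented.

```python
def masked_distance(seq_a, seq_b, mask=None, gap_chars="-."):
    """Fraction of mismatches over columns that are unmasked and gap-free in both rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if mask is None:
        mask = [True] * len(seq_a)  # no masking: every column is a candidate
    if len(mask) != len(seq_a):
        raise ValueError("mask must have one entry per alignment column")

    compared = mismatches = 0
    for a, b, keep in zip(seq_a, seq_b, mask):
        if not keep:
            continue  # masked column (unaligned region or insertion)
        if a in gap_chars or b in gap_chars:
            continue  # deletion in either sequence: position is not comparable
        compared += 1
        if a.upper() != b.upper():
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared
```

For the rows `"ACGT-A"` and `"ACCTTA"`, the gapped column is skipped, leaving one substitution in five compared positions.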

31 18-21 August 2009 Other Common Distances
Hamming distance
–No gaps or insertions
–Original BLAST
Edit distance
–Penalize for gaps
–RDP Probe Match
Matching word percentage (q-gram)
–Does not require alignment
–RDP Sequence Match
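
The three distances on this slide can be compared side by side with a small sketch. These are textbook versions, not the RDP implementations; in particular, normalizing the shared-word count by the smaller word set is an assumption for the example, as are the function names and default word length.

```python
def hamming(a, b):
    """Number of differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def qgram_similarity(a, b, q=7):
    """Shared unique words of length q, divided by the smaller word set; no alignment needed."""
    words_a = {a[i:i + q] for i in range(len(a) - q + 1)}
    words_b = {b[i:i + q] for i in range(len(b) - q + 1)}
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))
```

A single deleted base shows the difference: `edit_distance("ACGT", "AGT")` is 1, while Hamming distance is undefined for the two unequal-length strings.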

32 18-21 August 2009 Clustering Accuracy, Time, Memory

33 18-21 August 2009 Unsupervised Classification (Clustering)
Hierarchical Agglomerative
–Single Linkage (Nearest Neighbor)
–Average Linkage (UPGMA)
–Complete Linkage (Furthest Neighbor)
Partitional Clustering
–K-Means
–Not often used in this field
Self Organizing Maps
–Using word frequency

34 18-21 August 2009 Hierarchical Clustering ≤0.03 Complete Linkage Single Linkage

35 18-21 August 2009

36 FastGroupII

37 18-21 August 2009 Supervised Classification
K-Nearest Neighbors
–SeqMatch, Megan, easyTaxon
–Last Common Ancestor
Bayesian
–RDP Classifier
Kernel methods
–Support Vector Machines

38 18-21 August 2009

39

40 RDP-II Screenshots
fast search algorithm, limit searches to sequences spanning specific regions, change depth and edit distance
place sequences into bacterial taxonomy, works well with partial or full-length sequences, bootstrap confidence estimate, prior alignment not required
finds nearest neighbor, more accurate than BLAST, uses “q-gram” matching method

41

42 18-21 August 2009 RDP Pyrosequencing Pipeline Tools for high-throughput analysis

43 18-21 August 2009 Thirty-One Years of rRNA Sequencing

44 Twenty-Eight Years Later Proc. Natl. Acad. Sci., USA Vol. 103, No. 32, pp 12115-12120, August 2006 www.pnas.org/cgi/doi/10.1073/pnas.0605127103

45 18-21 August 2009 Multiplexed Amplicon Pyrosequencing

46

47 18-21 August 2009 RDP Pyrosequencing Pipeline

48 18-21 August 2009 Initial Processing Steps
Sort by barcode (key)
Quality filter
–Forward & (optional) reverse primers
–Ambiguities
–Length
Trim key & primer sequences
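
The initial processing steps can be sketched as below. This is a simplified illustration, not the RDP pipeline code: it requires an exact forward-primer match, skips the optional reverse primer, and counts only `N` as an ambiguity. The function names, the barcode-to-sample mapping, and the default thresholds are all hypothetical.

```python
def process_read(read, barcodes, forward_primer, min_length=50, max_ambiguous=0):
    """Return (sample, trimmed_sequence), or None if the read fails a filter."""
    # Sort by barcode (key): the read must start with a known key.
    for key, sample in barcodes.items():
        if read.startswith(key):
            rest = read[len(key):]
            break
    else:
        return None
    # Quality filter: forward primer must follow the key (exact match assumed here).
    if not rest.startswith(forward_primer):
        return None
    # Trim key and primer sequences.
    trimmed = rest[len(forward_primer):]
    # Quality filter: ambiguities and length.
    if trimmed.count("N") > max_ambiguous:
        return None
    if len(trimmed) < min_length:
        return None
    return sample, trimmed


def sort_reads(reads, barcodes, forward_primer, **filters):
    """Bin the trimmed reads that pass the filters by sample."""
    bins = {}
    for read in reads:
        result = process_read(read, barcodes, forward_primer, **filters)
        if result is not None:
            sample, trimmed = result
            bins.setdefault(sample, []).append(trimmed)
    return bins
```

With a made-up mapping such as `{"ACGT": "gut", "TGCA": "soil"}` and primer `"GG"`, the read `"ACGTGGATATAT"` is assigned to `gut` and trimmed to `"ATATAT"` (given a small enough `min_length`).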

49 18-21 August 2009 Two Analysis Tracks
Taxonomy Independent
–Global Alignment
–Cluster Based OTU Assignment
–Standard Ecological Metrics
–Many 3rd Party Data Formats
Taxonomy Dependent
–RDP Classifier
–Sequence Match
–Many 3rd Party Data Formats

50 18-21 August 2009 Model Based Alignment
Infernal Aligner
–(Nawrocki and Eddy. 2007, PLoS Comput Biol)
Fast - 500/min
Probabilistic Model
–Model describes shared features
Incorporates 2D Structure
–Cannone et al. 2002, BioMed Central Bioinformatics
http://www.rna.icmb.utexas.edu

51 18-21 August 2009 Complete Linkage Clustering (Operational Taxonomic Units)
Distance based method
Guaranteed intra-cluster distance
N^2 algorithm
Current online limit 150,000 unique reads
Memory-efficient version in testing
Figure label: ≤0.03
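
The intra-cluster distance guarantee can be seen in a small agglomerative clustering sketch. This naive version rescans all cluster pairs at every merge, so it is slower than the N^2 algorithm the slide refers to and is not the RDP implementation. The function name and the toy distance matrix are assumptions for the example.

```python
def agglomerative(dist, cutoff, linkage=max):
    """Merge clusters until the closest pair is farther apart than `cutoff`.

    dist    -- symmetric matrix (list of lists) of pairwise distances
    linkage -- max for complete linkage, min for single linkage
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break  # no remaining pair of clusters is within the cutoff
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Three sequences in a chain: 0 and 2 are each 0.02 from 1, but 0.04 from each other.
toy = [
    [0.00, 0.02, 0.04],
    [0.02, 0.00, 0.02],
    [0.04, 0.02, 0.00],
]
complete = agglomerative(toy, 0.03, linkage=max)
single = agglomerative(toy, 0.03, linkage=min)
```

At a 0.03 cutoff, complete linkage keeps sequence 2 apart because joining it would put a 0.04 pair in one cluster, while single linkage chains all three together.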

52 18-21 August 2009 RDP Naive Bayesian Classifier
Fast - 3000/min
Places sequences into bacterial taxonomy
Works well on partial or full-length sequences
Does not require alignment
Easily re-trained to match new taxonomies
Bootstrap confidence estimates
Online GUI - SOAP service - Open source
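
A toy sketch of a word-based naive Bayesian classifier with bootstrap confidence, to show why no alignment is needed and how retraining works. It loosely follows the general scheme in Wang et al. (AEM, 2007), but the class name, the smoothing formula as written here, the word size, and the bootstrap subsample size are assumptions to check against the paper. It is not the RDP Classifier code.

```python
import math
import random
from collections import defaultdict


def words(seq, k):
    """Set of unique overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class WordNaiveBayes:
    def __init__(self, k=8):
        self.k = k
        self.total = 0                        # number of training sequences
        self.word_counts = defaultdict(int)   # sequences containing each word
        self.genus_sizes = defaultdict(int)   # training sequences per genus
        self.genus_words = defaultdict(lambda: defaultdict(int))

    def train(self, seq, genus):
        self.total += 1
        self.genus_sizes[genus] += 1
        for w in words(seq, self.k):
            self.word_counts[w] += 1
            self.genus_words[genus][w] += 1

    def _log_prob(self, word, genus):
        # Word prior smooths the per-genus estimate (assumed form, see lead-in).
        prior = (self.word_counts.get(word, 0) + 0.5) / (self.total + 1)
        m = self.genus_words[genus].get(word, 0)
        return math.log((m + prior) / (self.genus_sizes[genus] + 1))

    def _best_genus(self, word_list):
        return max(self.genus_sizes,
                   key=lambda g: sum(self._log_prob(w, g) for w in word_list))

    def classify(self, seq, trials=100, rng=None):
        """Return (genus, bootstrap confidence between 0 and 1)."""
        rng = rng or random.Random(0)
        all_words = sorted(words(seq, self.k))
        if not all_words or not self.genus_sizes:
            raise ValueError("need a trained model and a query of at least k bases")
        best = self._best_genus(all_words)
        # Bootstrap: reclassify random subsets of the words and count agreement.
        n = max(1, len(all_words) // self.k)
        votes = sum(self._best_genus(rng.choices(all_words, k=n)) == best
                    for _ in range(trials))
        return best, votes / trials
```

Retraining for a new taxonomy amounts to building a fresh model and calling `train` with the new labels. A query is then classified with `classify(seq)`, and assignments below a chosen bootstrap cutoff can be left unclassified.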

53 18-21 August 2009 Classifier Accuracy on 200 bp Regions (from Wang et al., AEM, 2007)

54 18-21 August 2009 RDP Classifier Bootstrap Performance (Genus Level - Short Reads)
                          V3                  V6                  V4
Bootstrap cutoff          0%    50%   80%     0%    50%   80%     0%    50%   80%
Human Gut  % classified   100   92.4  82.3    100   73.5  40.4    100   97.0  87.9
           % matching     92.0  95.0  98.1    79.0  96.5  98.7    92.8  94.5  95.7
Soil       % classified   100   71.3  48.3    100   32.7  16.7    100   74.4  56.3
           % matching     70.0  85.5  94.6    48.0  80.0  84.3    84.1  93.3  96.8

