Published by Alberta Charles. Modified over 9 years ago.
1
18-21 August 2009 The Biosphere
2
18-21 August 2009 Secondary structure of small subunit ribosomal RNA 5' end 3' end Image adapted from R. Gutell http://www.rna.ccbb.utexas.edu/
3
18-21 August 2009 Unaligned rRNA sequences in a multiple alignment editor
4
18-21 August 2009 Aligned rRNA sequences in editor
5
18-21 August 2009 Secondary structure of small subunit ribosomal RNA 5' end 3' end Image adapted from R. Gutell http://www.rna.ccbb.utexas.edu/
6
18-21 August 2009 The 530 Loop of E. coli (figure labels: stem with canonical Watson-Crick base pairing; bulge; non-canonical G-U base pair; loop)
7
18-21 August 2009 530 loop of E. coli and T. jannaschii
8
18-21 August 2009 The 530 loop structure of six species
9
18-21 August 2009 Six taxa showing aligned 530 loop region of the 16S rRNA
10
18-21 August 2009 Similarity matrices comparing the 530 loop sequences and the full rRNA sequences of the six listed taxa
A. Similarity matrix for 530 loop
B. Similarity matrix for complete 16S rRNA
11
18-21 August 2009 The Biosphere (taxon labels: E. coli, AqxPyrop, T. jannaschii, P. freudenreichii, M. vannielii, S.solfa)
12
18-21 August 2009 References
Acknowledgement of rRNA secondary structure image: Cannone J.J., Subramanian S., Schnare M.N., Collett J.R., D'Souza L.M., Du Y., Feng B., Lin N., Madabusi L.V., Müller K.M., Pande N., Shang Z., Yu N., and Gutell R.R. (2002). The Comparative RNA Web (CRW) Site: An Online Database of Comparative Sequence and Structure Information for Ribosomal, Intron, and Other RNAs. BioMed Central Bioinformatics, 3:2. [Correction: BioMed Central Bioinformatics, 3:15.]
Smith T.F., Gutell R., Lee J., and Hartman H. (2008). The origin and evolution of the ribosome. Biology Direct, 3:16.
Woese C.R. (1987). Bacterial evolution. Microbiol Rev. 51(2):221-71.
Zuckerkandl E. and Pauling L. (1965). Molecules as documents of evolutionary history. J Theor Biol. 8(2):357-66.
Cole J., Wang Q., Cardenas E., Fish J., Chai B., Farris R., Kulam-Syed-Mohideen A., McGarrell D., Marsh T., Garrity G., and Tiedje J. (2009). The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research. In press.
13
18-21 August 2009 Sequence Alignment Accuracy, Time, Memory
14
18-21 August 2009 Multiple Sequence Alignment
Pairwise dynamic programming
– Smith-Waterman, Needleman-Wunsch
– Can be transformed into a probabilistic framework
Multidimensional dynamic programming
– Not practical
Progressive alignment
– MUSCLE, ClustalW
– Both are progressive, iterative
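As a rough illustration of the pairwise dynamic programming named on this slide, here is a minimal Needleman-Wunsch (global alignment) sketch. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for the example, not parameters taken from the presentation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1] (illustrative values).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("GCAUG", "GAUG"))
```

The table has (len(a)+1) x (len(b)+1) cells, which is why extending the same idea to many sequences at once (multidimensional dynamic programming) quickly becomes impractical.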
15
18-21 August 2009 BLAST
Heuristic search strategy
Locate high-scoring short matches
– 3 aa or 5 to 11 bases
Extend short matches
Determine significance using extreme value distribution statistics
16
18-21 August 2009 BLAST (cont.)
E value
– Database dependent
Bits
– Database independent
% Similarity (identity)
– For aligned segments
– NOT overall % identity
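The difference between the database-dependent E value and the database-independent bit score can be seen in the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The sketch below uses raw lengths; the function name and example numbers are assumptions for illustration, and real BLAST output uses effective (edge-corrected) lengths, so its values will differ somewhat.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S') with raw lengths (a simplification).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same alignment (same bit score), two hypothetical database sizes.
    for db_len in (1e6, 1e9):
        print(db_len, expect_value(40.0, 250, db_len))
```

The same hit keeps its bit score whichever database is searched, but its E value grows in proportion to the database size.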
17
18-21 August 2009 Model Based Alignment
Profile Hidden Markov Models
– Protein and nucleic acid
– Models primary sequence
Stochastic Context-Free Grammars
– Incorporates RNA secondary structure
18
18-21 August 2009 Profile HMM
19
18-21 August 2009 Hidden Markov Model
20
18-21 August 2009 Hidden Markov Model
21
18-21 August 2009 Hidden Markov Model
22
18-21 August 2009
23
2D Structure Conserved from Domain to Family Diagrams from the Gutell Lab Comparative RNA Web Site (http://www.rna.icmb.utexas.edu)
24
18-21 August 2009 SCFG rRNA Model
25
18-21 August 2009 SCFG Limitations
Model primary and secondary structure
– Can't model pseudoknots or higher-order interactions
Time complexity O(ML^3)
– Solved by Nawrocki et al.
Space complexity O(ML^2)
– Est. 16 GB memory for rRNA
– Solved by Eddy
Partial sequences
– Disrupt internal alignment
– Solved by Nawrocki et al.
26
18-21 August 2009
29
Aligner References
MUSCLE http://www.drive5.com/muscle/
BLAST http://blast.ncbi.nlm.nih.gov/
HMMER http://hmmer.janelia.org/
INFERNAL http://infernal.janelia.org/
30
18-21 August 2009 Distance Calculation
Phylogenetic methods only score base substitution, not insertion or deletion.
Score comparable positions
– Mask out unaligned regions, insertions
– Ignore positions with deletion
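A minimal sketch of scoring only comparable positions, assuming two rows from the same alignment and an optional column mask given as a string of "1" (keep) and "0" (mask out). The mask format, gap characters, and function name are assumptions made for this example; the result is an uncorrected fraction of differing bases, with no substitution model applied.

```python
GAP_CHARS = {"-", "."}


def uncorrected_distance(row_a, row_b, mask=None):
    """Fraction of mismatched bases over comparable alignment columns."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must come from the same alignment")
    if mask is not None and len(mask) != len(row_a):
        raise ValueError("mask must have one character per column")
    compared = mismatches = 0
    for col, (x, y) in enumerate(zip(row_a.upper(), row_b.upper())):
        if mask is not None and mask[col] != "1":
            continue  # column masked out (unaligned region or insertion)
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # deletion in either sequence: not comparable
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


if __name__ == "__main__":
    print(uncorrected_distance("ACGU-ACGU", "ACGA-AC-U"))
```

Only substitutions contribute to the numerator, and gapped or masked columns are dropped from both numerator and denominator.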
31
18-21 August 2009 Other Common Distances
Hamming distance
– No gaps or insertions
– Original BLAST
Edit distance
– Penalizes gaps
– RDP Probe Match
Matching word percentage (q-gram)
– Does not require alignment
– RDP Sequence Match
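The three distances listed on this slide can be sketched in a few lines each. These are generic textbook versions, not the code used by BLAST or the RDP tools; in particular, the q-gram score below (the fraction of the query's distinct words that also occur in the target) and the default word size are assumptions for illustration.

```python
def hamming(a, b):
    """Number of mismatched positions; sequences must have equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions, deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def qgram_match_fraction(query, target, q=7):
    """Share of the query's distinct q-letter words found in the target."""
    query_words = {query[i:i + q] for i in range(len(query) - q + 1)}
    target_words = {target[i:i + q] for i in range(len(target) - q + 1)}
    if not query_words:
        raise ValueError("query is shorter than the word size")
    return len(query_words & target_words) / len(query_words)


if __name__ == "__main__":
    print(hamming("GATTACA", "GACTATA"))
    print(edit_distance("GATTACA", "GATACA"))
    print(qgram_match_fraction("ACGUAC", "ACGAAC", q=3))
```

Only the word-based score works on unaligned sequences of different lengths without any dynamic programming, which is what makes it attractive for fast database searches.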
32
18-21 August 2009 Clustering Accuracy, Time, Memory
33
18-21 August 2009 Unsupervised Classification (Clustering)
Hierarchical Agglomerative
– Single Linkage (Nearest Neighbor)
– Average Linkage (UPGMA)
– Complete Linkage (Furthest Neighbor)
Partitional Clustering
– K-Means
– Not often used in this field
Self Organizing Maps
– Using word frequency
34
18-21 August 2009 Hierarchical Clustering (figure labels: ≤0.03; Complete Linkage; Single Linkage)
35
18-21 August 2009
36
FastGroupII
37
18-21 August 2009 Supervised Classification
K-Nearest Neighbors
– SeqMatch, Megan, easyTaxon
– Last Common Ancestor
Bayesian
– RDP Classifier
Kernel methods
– Support Vector Machines
38
18-21 August 2009
40
RDP-II Screenshots
fast search algorithm, limit searches to sequences spanning specific regions, change depth and edit distance
place sequences into bacterial taxonomy, works well with partial or full-length sequences, bootstrap confidence estimate, prior alignment not required
finds nearest neighbor, more accurate than BLAST, uses “q-gram” matching method
42
18-21 August 2009 RDP Pyrosequencing Pipeline Tools for high-throughput analysis
43
18-21 August 2009 Thirty-One Years of rRNA Sequencing
44
Twenty-Eight Years Later Proc. Natl. Acad. Sci., USA Vol. 103, No. 32, pp 12115-12120, August 2006 www.pnas.org/cgi/doi/10.1073/pnas.0605127103
45
18-21 August 2009 Multiplexed Amplicon Pyrosequencing
47
18-21 August 2009 RDP Pyrosequencing Pipeline
48
18-21 August 2009 Initial Processing Steps
Sort by barcode (key)
Quality filter
– Forward & (optional) reverse primers
– Ambiguities
– Length
Trim key & primer sequences
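The steps on this slide can be sketched as a small read-sorting function. This is an illustration only, not the RDP pipeline's code: the barcode keys, primer sequence, thresholds, and function names are hypothetical, primer matching here is exact (a real pipeline would likely tolerate some mismatches), and the optional reverse-primer check is left out.

```python
from collections import defaultdict


def process_read(read, barcodes, forward_primer, min_length=50, max_ambiguous=0):
    """Return (sample, trimmed_sequence), or None if the read is rejected."""
    read = read.upper()
    for key, sample in barcodes.items():
        prefix = key + forward_primer
        if not read.startswith(prefix):
            continue  # wrong barcode, or forward primer not found
        trimmed = read[len(prefix):]  # trim key and primer
        if trimmed.count("N") > max_ambiguous:
            return None  # too many ambiguous bases
        if len(trimmed) < min_length:
            return None  # too short after trimming
        return sample, trimmed
    return None


def sort_by_barcode(reads, barcodes, forward_primer, **filters):
    """Group reads that pass the filters by the sample their barcode names."""
    bins = defaultdict(list)
    for read in reads:
        result = process_read(read, barcodes, forward_primer, **filters)
        if result is not None:
            sample, trimmed = result
            bins[sample].append(trimmed)
    return dict(bins)


if __name__ == "__main__":
    keys = {"ACGT": "gut", "TGCA": "soil"}  # hypothetical barcodes
    reads = ["ACGTGGATTAGCCCCCAAAAA", "TGCAGGATTAGCCNCCAAAAA"]
    print(sort_by_barcode(reads, keys, "GGATTAG", min_length=5))
```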
49
18-21 August 2009 Two Analysis Tracks
Taxonomy Independent
– Global Alignment
– Cluster Based OTU Assignment
– Standard Ecological Metrics
– Many 3rd Party Data Formats
Taxonomy Dependent
– RDP Classifier
– Sequence Match
– Many 3rd Party Data Formats
50
18-21 August 2009 Model Based Alignment
Infernal Aligner
– (Nawrocki and Eddy. 2007, PLoS Comput Biol)
Fast - 500/min
Probabilistic Model
– Model describes shared features
Incorporates 2D Structure
– Cannone et al. 2002, BioMed Central Bioinformatics
http://www.rna.icmb.utexas.edu
51
18-21 August 2009 Complete Linkage Clustering (Operational Taxonomic Units)
Distance based method
Guaranteed intra-cluster distance
N^2 algorithm
Current online limit 150,000 unique reads
Memory-efficient version in testing
(figure label: ≤0.03)
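To show why complete linkage guarantees the intra-cluster distance while single linkage does not, here is a naive agglomerative clustering sketch over a precomputed distance matrix. It is not the RDP implementation and is slower than the N^2 algorithm mentioned on the slide; the function name, matrix layout, and example distances are assumptions.

```python
def agglomerate(dist, cutoff=0.03, linkage=max):
    """Merge clusters until the closest pair is farther apart than cutoff.

    dist is a symmetric list-of-lists distance matrix.
    linkage=max gives complete linkage (furthest neighbor);
    linkage=min gives single linkage (nearest neighbor).
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break  # no remaining pair is close enough to merge
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Hypothetical distances: 0-1 and 1-2 are close, 0-2 is not, 3 is far away.
    d = [
        [0.00, 0.02, 0.04, 0.20],
        [0.02, 0.00, 0.02, 0.20],
        [0.04, 0.02, 0.00, 0.20],
        [0.20, 0.20, 0.20, 0.00],
    ]
    print(agglomerate(d, 0.03, max))
    print(agglomerate(d, 0.03, min))
```

With complete linkage, every pair of reads inside a cluster is within the cutoff of each other; single linkage can chain sequences together so that the two ends of a cluster exceed the cutoff.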
52
18-21 August 2009 RDP Naive Bayesian Classifier
Fast - 3000/min
Places sequences into bacterial taxonomy
Works well on partial or full-length sequences
Does not require alignment
Easily re-trained to match new taxonomies
Bootstrap confidence estimates
Online GUI - SOAP service - Open source
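A toy sketch of a word-based naive Bayesian classifier with a bootstrap confidence estimate, loosely in the spirit of the approach described on this slide. It is not the RDP Classifier's code: the class name, word size, smoothing, subset size, number of trials, and the two-sequence training set are all assumptions chosen to keep the example small.

```python
import math
import random
from collections import defaultdict


def words(seq, k):
    """Set of distinct overlapping k-letter words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class WordBayes:
    def __init__(self, training, k=4):
        """training is a list of (taxon, sequence) pairs."""
        self.k = k
        self.by_taxon = defaultdict(list)   # taxon -> word sets of its sequences
        self.word_count = defaultdict(int)  # word -> number of sequences with it
        self.total = 0
        for taxon, seq in training:
            ws = words(seq, k)
            self.by_taxon[taxon].append(ws)
            self.total += 1
            for w in ws:
                self.word_count[w] += 1

    def _log_likelihood(self, taxon, query_words):
        seqs = self.by_taxon[taxon]
        ll = 0.0
        for w in query_words:
            # Smoothed overall word frequency, used as a pseudo-count.
            prior = (self.word_count.get(w, 0) + 0.5) / (self.total + 1)
            hits = sum(w in ws for ws in seqs)
            ll += math.log((hits + prior) / (len(seqs) + 1))
        return ll

    def _best(self, query_words):
        return max(self.by_taxon, key=lambda t: self._log_likelihood(t, query_words))

    def classify(self, seq, trials=100, rng=None):
        """Return (taxon, bootstrap support as a fraction of trials)."""
        rng = rng or random.Random(0)
        ws = sorted(words(seq, self.k))
        if not ws:
            raise ValueError("sequence is shorter than the word size")
        best = self._best(ws)
        subset = max(1, len(ws) // 8)  # illustrative subsample size
        agree = 0
        for _ in range(trials):
            sample = [rng.choice(ws) for _ in range(subset)]
            agree += self._best(sample) == best
        return best, agree / trials


if __name__ == "__main__":
    model = WordBayes([("Escherichia", "ACGUACGUACGGACGU"),
                       ("Sulfolobus", "GGGCCCGGGCCCGGGC")])
    print(model.classify("ACGUACGUACGG"))
```

Because the query is reduced to a set of words, no alignment is needed, and partial sequences are handled the same way as full-length ones. Re-training for a new taxonomy amounts to rebuilding the word counts from relabeled training sequences.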
53
18-21 August 2009 Classifier Accuracy on 200 bp Regions (from Wang et al., AEM, 2007)
54
18-21 August 2009 RDP Classifier Bootstrap Performance (Genus Level - Short Reads)
                          V3                   V6                   V4
Bootstrap cutoff     0%    50%    80%     0%    50%    80%     0%    50%    80%
Human Gut
  % classified      100   92.4   82.3    100   73.5   40.4    100   97.0   87.9
  % matching       92.0   95.0   98.1   79.0   96.5   98.7   92.8   94.5   95.7
Soil
  % classified      100   71.3   48.3    100   32.7   16.7    100   74.4   56.3
  % matching       70.0   85.5   94.6   48.0   80.0   84.3   84.1   93.3   96.8