Doug Raiford Lesson 7
RNA World Hypothesis
RNA world evolved into the DNA and protein world
DNA advantage: greater chemical stability
Protein advantage: more flexible and efficient enzymes (biomolecules that catalyze)
▪ 20 amino acids vs. 4 nucleotides
▪ Chemically, more diverse
Remnants remain in ribosomes, nucleases, polymerases, and splicing molecules
Primary: sequence
Secondary: double-stranded regions (reverse complements)
Tertiary: three-dimensional structure
>tRNA. Carries amino acid for Isoleucine
AGGCUUGUAGCUCAGGUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGGUGGUUCAAGUCCACUCAGGCCUACCA
[Figure: tRNA cloverleaf; labels: CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]
How do we find regions of reverse complementation?
What do we have? Sequence
A’s like pairing with U’s and G’s like pairing with C’s
Stronger bond (3 hydrogen bonds) between G’s and C’s
Pairing should result in the lowest free energy (the most stable structure)
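To make the idea of searching a sequence for reverse-complementary regions concrete, here is a minimal sketch. It assumes strict A-U and G-C pairing only (no G-U wobble pairs), and the names `COMPLEMENT`, `reverse_complement`, `find_stems`, and `min_len` are illustrative, not taken from the lecture. It reports runs where base i+k pairs with base j-k, which is the pattern a stem leaves in the sequence.

```python
# Hypothetical helper names; strict Watson-Crick pairing is an assumption.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_stems(rna, min_len=3):
    """Return (i, j, length) for each maximal run in which rna[i+k]
    pairs with rna[j-k], for k = 0 .. length-1."""
    n = len(rna)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip positions that merely continue a run starting further out.
            if i > 0 and j + 1 < n and COMPLEMENT[rna[i - 1]] == rna[j + 1]:
                continue
            k = 0
            while i + k < j - k and COMPLEMENT[rna[i + k]] == rna[j - k]:
                k += 1
            if k >= min_len:
                stems.append((i, j, k))
    return stems


print(reverse_complement("GGGAAAUCC"))
print(find_stems("GGGAAACCC"))
```

This brute-force scan only finds candidate stems; it does not decide which of several competing stems the molecule actually adopts. That decision needs the energy argument above.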
tRNA
Transports amino acid to the ribosome
[Figure: tRNA cloverleaf; labels: CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]
Visualization
Good at finding longer base-pairings (stacked base-pairs)
Need to find the conformation that provides the minimal total free energy
RNA often has many alternate conformations at different temperatures
Stacked base-pairs add stability
Loops/bulges introduce positive free energy and are destabilizing
Recurrence relations
Case 1: first nucleotide base-pairs with last; recurse on the rest
Case 2: first nucleotide base-pairs with some other nucleotide (other than the last), or with none; recurse on every possible split into two substrings
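The two cases can be written as one recurrence. This is a sketch under assumptions: E(i,j) denotes the minimum energy of the substring from position i to position j, r_i is the base at position i, and α is a pairing-energy table taken here to be -1 for a complementary pair and 0 otherwise. The symbol names are chosen for illustration.

```latex
\begin{align*}
E(i,i) &= 0, \qquad E(i,i-1) = 0 \\
E(i,j) &= \min
\begin{cases}
E(i+1,\,j-1) + \alpha(r_i, r_j) & \text{first pairs with last} \\
\min\limits_{i \le k < j} \bigl[ E(i,k) + E(k+1,j) \bigr] & \text{split into two substrings}
\end{cases}
\end{align*}
```

The split case also covers an unpaired first or last base, because a single-base substring contributes E(i,i) = 0.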
As luck would have it… Zuker came up with a dynamic programming solution
[Figure: DP matrix for the sequence GGGAAAUCC, rows indexed by i and columns by j, with zeros on the diagonal]
Start with zeros on diagonal
Populate diagonally
[Figure: DP matrix for GGGAAAUCC, rows i and columns j, with zeros filled in along the diagonal]
Will look at last value to illustrate
Match first and last character, recurse on rest
[Figure: partially filled DP matrix for GGGAAAUCC, rows i and columns j, alongside the pairing-energy table α over the bases A, C, U, G]
Min of all pairs of substrings
[Figure: DP matrix for GGGAAAUCC, rows i and columns j, with final value -3, shown next to drawings of the corresponding folded structures]
n² cells, plus up to 2n steps for each visited cell
So O(n³)
Populate matrix plus traverse row/column for each cell
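The matrix fill can be sketched in a few lines. This is a simplified illustration, not the full Zuker energy model: it assumes a flat energy of -1 per Watson-Crick pair and no loop penalties, which makes it closer to Nussinov-style pair counting. The names `fold_energy`, `pair_energy`, and `min_loop` are assumptions introduced here. The default of `min_loop=0` (no minimum hairpin size) is chosen so that the result for GGGAAAUCC matches the -3 from the worked example.

```python
# Simplified energy model (assumption): each Watson-Crick pair contributes
# pair_energy, and loops cost nothing.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def fold_energy(seq, pair_energy=-1, min_loop=0):
    """Return (minimum energy, filled DP matrix) for an RNA string."""
    n = len(seq)
    if n == 0:
        return 0, []
    energy = [[0] * n for _ in range(n)]  # zeros on the diagonal
    for span in range(1, n):              # populate diagonal by diagonal
        for i in range(n - span):
            j = i + span
            # Case 2: best split of seq[i..j] into two substrings.
            best = min(energy[i][k] + energy[k + 1][j] for k in range(i, j))
            # Case 1: first base pairs with last, recurse on the rest.
            if (seq[i], seq[j]) in WATSON_CRICK and j - i - 1 >= min_loop:
                inner = energy[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = min(best, inner + pair_energy)
            energy[i][j] = best
    return energy[0][n - 1], energy


best, matrix = fold_energy("GGGAAAUCC")
print(best)  # -3 under this simplified model
```

There are n² cells and each one scans up to n split points, which is where the O(n³) running time comes from. Passing `min_loop=3` forbids hairpins with fewer than three unpaired bases, a common constraint in practice.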
Any prediction method must account for these
Now O(n⁴)
Interior loops most expensive
Can exploit the fact that along diagonals, loops have same size
Can calculate once
Limits search space
Back to O(n³)
Zuker’s site
1 gccgaggtgg tggaattggt agacacgcta ccttgaggtg gtagtgccca atagggctta
61 cgggttcaag tcccgtcctc ggtacca
tRNA for Leucine in E. coli, a prototypical organism
Codon: uua Anti-codon: aat
[Figure: tRNA cloverleaf; labels: CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]
Just like proteins: conformation
What if a U-A base pair mutates to a G-C pair?
Still same function
What would this do to a search or sequence alignment?
GCAGGACCAUAUA
|||||||||||||
CGUCCUGGUAUAU

GCAGGACCAGAUA
|||||||||||||
CGUCCUGGUCUAU
Phenomenon known as covariance (not to be confused with statistical covariance)
How might we locate covariant pairs?
MSA, then compare all pairwise combinations of columns
High degree of agreement in two columns (G’s match with C’s, A’s match with U’s) is an indication of base-pairing
χ² test
Compare to expected number of pairings given sequence composition
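The column-comparison idea can be sketched as follows. The code counts complementary pairs in two alignment columns and compares that count with the number expected from each column's base composition, using a one-degree-of-freedom χ² statistic. The function names, the toy alignment, and the decision to score only column pairs with more complementary pairs than expected are assumptions made for illustration. The threshold 3.84 is the usual 5% critical value for one degree of freedom.

```python
from collections import Counter
from itertools import combinations

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pair_chi2(col_i, col_j):
    """Chi-square score for an excess of complementary pairs in two columns."""
    n = len(col_i)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    # Probability of a complementary pair if the columns were independent.
    p = sum(freq_i[b] / n * freq_j[COMPLEMENT[b]] / n for b in COMPLEMENT)
    expected = p * n
    observed = sum(1 for a, b in zip(col_i, col_j) if COMPLEMENT.get(a) == b)
    if expected == 0 or expected == n or observed <= expected:
        return 0.0  # fully conserved, never complementary, or no excess
    return ((observed - expected) ** 2 / expected
            + (observed - expected) ** 2 / (n - expected))


def covariant_columns(alignment, threshold=3.84):
    """Return (i, j, score) for column pairs scoring above the threshold."""
    columns = list(zip(*alignment))
    hits = []
    for i, j in combinations(range(len(columns)), 2):
        score = pair_chi2(columns[i], columns[j])
        if score > threshold:
            hits.append((i, j, score))
    return hits


# Toy alignment (hypothetical data): columns 0 and 4 change together.
toy = ["GAAAC", "GAAAC", "AAAAU", "AAAAU"]
print(covariant_columns(toy))
```

Fully conserved columns score zero, which is intended: a G column facing a C column is consistent with pairing but carries no covariation signal. A four-sequence alignment is far too small for the χ² approximation to be trustworthy, so the toy data only illustrates the mechanics.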
Pairing depicted with nested parentheses AAGACUUCGGUCUGGCGACAUUC ((( ))) (( ( )))
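Nested parentheses can be turned back into an explicit list of base pairs with a stack. The sketch below assumes the common dot-bracket convention, where `.` marks an unpaired base. The function name `parse_dot_bracket` and the example string are illustrative and not taken from the slide.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)               # remember the opening base
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # closes the most recent '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Hypothetical example: a three-pair stem closing a three-base loop.
print(parse_dot_bracket("(((...)))"))
```

Because each closing parenthesis matches the most recent unmatched opening one, the notation can only express nested (non-crossing) pairings.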
Mountain plots
A mountain plot represents a secondary structure in a plot of height versus position, where the height m(k) is given by the number of base pairs enclosing the base at position k. That is, loops correspond to plateaus (hairpin loops are peaks) and helices to slopes.
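The height definition translates directly into a single pass over a dot-bracket string. This is a minimal sketch: the name `mountain_heights` and the `.` convention for unpaired bases are assumptions, and a paired base is not counted as enclosed by its own pair.

```python
def mountain_heights(structure):
    """Return m(k), the number of base pairs enclosing each position k."""
    heights = []
    depth = 0  # number of currently open base pairs
    for ch in structure:
        if ch == "(":
            heights.append(depth)  # enclosed by the pairs opened earlier
            depth += 1
        elif ch == ")":
            depth -= 1
            heights.append(depth)  # its own pair does not enclose it
        else:
            heights.append(depth)  # unpaired base sits on a plateau
    return heights


print(mountain_heights("((...))"))  # [0, 1, 2, 2, 2, 1, 0]
```

In the output the helix appears as the rising and falling slopes, and the hairpin loop as the plateau at the peak.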
Circle plot
Data structure capable of capturing secondary structure
Ordered Binary Tree
Productions
S → aSu | uSa | cSg | gSc
S → aS | cS | gS | uS
S → Sa | Sc | Sg | Su
S → SS
S → ε
Derivation
S ⇒ aS
⇒ aSc
⇒ aScc
⇒ acSgcc
⇒ acgScgcc
⇒ acggSccgcc
⇒ acgggScccgcc
⇒ acggggSccccgcc
⇒ acgggguSccccgcc
⇒ acgggguuSccccgcc
⇒ acgggguucSccccgcc
⇒ acgggguucgSccccgcc
⇒ acgggguucgaSccccgcc
⇒ acgggguucgaaSccccgcc
⇒ acgggguucgaauSccccgcc
⇒ acgggguucgaauccccgcc
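The derivation above can be reproduced mechanically from a sequence plus its dot-bracket structure, which shows how each production corresponds to a structural feature. The sketch below is limited to unbranched structures, so it never uses S → SS. The function names and the structure string `.(((((.......)))))..` are assumptions; the structure was inferred from the derivation steps, not given in the source.

```python
def partner_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opener = stack.pop()
            partner[opener], partner[pos] = pos, opener
    return partner


def derive(seq, structure):
    """Return the sentential forms of a derivation of seq from S."""
    partner = partner_table(structure)
    i, j = 0, len(seq) - 1
    left, right, steps = "", "", []
    while i <= j:
        if partner.get(i) == j:            # S -> xSy, a base pair
            left, right = left + seq[i], seq[j] + right
            i, j = i + 1, j - 1
        elif structure[i] == ".":          # S -> xS, unpaired on the left
            left += seq[i]
            i += 1
        elif structure[j] == ".":          # S -> Sx, unpaired on the right
            right = seq[j] + right
            j -= 1
        else:
            raise ValueError("branched structures (S -> SS) are not handled")
        steps.append(left + "S" + right)
    steps.append(left + right)             # S -> epsilon
    return steps


for form in derive("acgggguucgaauccccgcc", ".(((((.......)))))..") :
    print(form)
```

Paired positions use the S → xSy productions, unpaired positions use S → xS or S → Sx, and the final step erases S. Supporting multiloops would require a recursive version that applies S → SS at each branch point.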
Parse tree
a←S
  |
  S→c
  |
  S→c
  |
c←S→g
  |
g←S→c
  |
g←S→c
  |
g←S→c
  |
g←S→c  S→u
  |    |
u←S    S→a
   \  /
u←S    S→a
   \  /
  c←S-S→g
Conformation of RNA dictates function
Determining secondary structure can help determine tertiary structure
Dynamic programming approach to identifying minimum energy conformations
Zuker MFOLD
View using dot plots, nested parens, mountain or circular plots
Covariance: base-pairs mutate but still form pairs, exploit to find pairings