Protein Evolution
Jean Yeh, SoCalBSI
Mike Thompson, UCLA
Summer 2005
How do proteins evolve?
Point mutations: exchange of one nucleotide for another
  Silent: the codon still encodes the same amino acid
  Missense: the codon encodes a different amino acid
  Nonsense: the codon becomes a stop codon
Insertions and deletions (indels): addition or removal of one or more nucleotides
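The three kinds of point mutation can be told apart by translating the codon before and after the change. The following is a minimal sketch, assuming the standard genetic code; the function name classify_point_mutation and the example codon are illustrative and do not come from the project.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-nucleotide substitution within one codon.

    position is 0, 1, or 2. A change that turns a stop codon into a
    sense codon is reported as "missense" in this simplified sketch.
    """
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


# GAA encodes glutamate (E).
print(classify_point_mutation("GAA", 2, "G"))  # GAG is also E: silent
print(classify_point_mutation("GAA", 1, "T"))  # GTA is V: missense
print(classify_point_mutation("GAA", 0, "T"))  # TAA is a stop: nonsense
```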
Frameshift Mutations
Frameshift Mutations (cont.)
An insertion or deletion of a number of nucleotides that is not divisible by three
Leads to a shift in the reading frame
Generally renders the original protein nonfunctional, for example by introducing a premature stop codon (as in a nonsense mutation)
But what if it led to a functional protein?
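A short example may make the effect of a frameshift concrete. The sketch below assumes the standard genetic code and uses a made-up 15-nucleotide sequence: inserting one nucleotide changes every codon downstream of the insertion, while inserting three nucleotides adds one residue and leaves the rest of the protein intact.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna starting at offset frame; a trailing partial codon is dropped."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


original = "ATGGCCAAGTTAGGC"                   # hypothetical coding sequence
plus_one = original[:4] + "C" + original[4:]    # 1-nt insertion: frameshift
plus_three = original[:4] + "CCC" + original[4:]  # 3-nt insertion: frame kept

print(translate(original))    # MAKLG
print(translate(plus_one))    # MAQVR  (everything after the insertion differs)
print(translate(plus_three))  # MAPKLG (one extra residue, rest unchanged)
```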
Frameshift Errors
Pellegrini, M. and Yeates, T.O. (1999) Searching for frameshift evolutionary relationships between protein sequence families. Proteins, 37, 278–283.
Goal
To see whether frameshift mutations can account for the evolution of some proteins
Analysis is based on the amino acid scoring matrices created by Drs. Pellegrini and Yeates in a previously published paper (“Searching for frameshift evolutionary relationships between protein sequence families”. Proteins, 37, 278–283; mbi.ucla.edu/~yeates/frameshift/)
Methods
Using a database of closely related genomes, pull out genes matching the following pattern:
  Genome 1: Gene A, Gene X, Gene B
  Genome 2: Gene A, Gene Y, Gene B
If the genes on either side of X and Y are conserved, one of X and Y probably arose from the other
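The pattern search can be sketched as follows. This is an illustration only, not the project's Perl code: it assumes each genome is given as a list of gene identifiers in chromosomal order, and that orthologs maps Genome 1 genes to their conserved counterparts in Genome 2 (for example, bidirectional best hits). It ignores contig boundaries and strand, and it keeps only pairs in which neither X nor Y has a detected counterpart; all names are hypothetical.

```python
def find_flanked_pairs(genome1, genome2, orthologs):
    """Return (X, Y) pairs whose left and right neighbours are conserved.

    genome1, genome2: lists of gene IDs in chromosomal order.
    orthologs: dict mapping a Genome 1 gene to its Genome 2 counterpart.
    """
    position2 = {gene: i for i, gene in enumerate(genome2)}
    matched2 = set(orthologs.values())
    pairs = []
    # Slide a window of three adjacent genes (A, X, B) along Genome 1.
    for left, x, right in zip(genome1, genome1[1:], genome1[2:]):
        i = position2.get(orthologs.get(left))
        j = position2.get(orthologs.get(right))
        if i is None or j is None:
            continue  # one of the flanking genes is not conserved
        if abs(i - j) != 2:
            continue  # counterparts do not flank exactly one gene
        y = genome2[(i + j) // 2]
        if x in orthologs or y in matched2:
            continue  # X or Y already has a recognisable counterpart
        pairs.append((x, y))
    return pairs


# Toy data: A/B/C are conserved, X and Y are the unmatched middle genes.
genome1 = ["g1_A", "g1_X", "g1_B", "g1_C"]
genome2 = ["g2_A", "g2_Y", "g2_B", "g2_C"]
orthologs = {"g1_A": "g2_A", "g1_B": "g2_B", "g1_C": "g2_C"}
print(find_flanked_pairs(genome1, genome2, orthologs))  # [('g1_X', 'g2_Y')]
```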
Methods (cont.)
Compile a list of ‘X and Y’ gene pairs
Compare the underlying amino acid sequences, using amino acid scoring tables that take frameshift mutations into account
See whether relationships in fact exist between the seemingly unrelated genes
Database
Peter Bowers had two databases (prokaryotic and fungal) culled from various internet sources
Started with the prokaryotic database because it was more complete
Dr. Yeates felt the sequences had diverged too much
Switched to the fungal database: less complete, but its genomes are more closely related
Coding
Wrote programs in Perl to update the fungal database
  Nucleotide stop and start positions
  Contig numbering
Started with complete genomes and pulled lists of bidirectional best hits
  Too few to be of use
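Bidirectional Best Hit
Gene 1 (Genome 1) searched against Genome 2 hits Gene 5, Gene 10, and Gene 13
  Gene 1 -> Gene 13 gives the best e-score
Gene 13 (Genome 2) searched against Genome 1 hits Gene 1 and Gene 4
  Gene 13 -> Gene 1 gives the best e-score
Gene 1 and Gene 13 are each other's best hit, so they form a bidirectional best hit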
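The bidirectional best hit test can be sketched in a few lines. The example assumes the search results are available as (query, subject, e-score) tuples, one list per search direction; the function names and the e-score values are made up for illustration, and only the gene numbers follow the slide.

```python
def best_hits(hits):
    """Map each query gene to the subject with the lowest e-score."""
    best = {}
    for query, subject, escore in hits:
        if query not in best or escore < best[query][1]:
            best[query] = (subject, escore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(hits_1_to_2, hits_2_to_1):
    """Return (gene in Genome 1, gene in Genome 2) pairs that are mutual best hits."""
    forward = best_hits(hits_1_to_2)
    reverse = best_hits(hits_2_to_1)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical e-scores; lower is better.
hits_1_to_2 = [
    ("gene1", "gene5", 1e-3),
    ("gene1", "gene10", 1e-10),
    ("gene1", "gene13", 1e-50),
]
hits_2_to_1 = [
    ("gene13", "gene1", 1e-48),
    ("gene13", "gene4", 1e-5),
]
print(bidirectional_best_hits(hits_1_to_2, hits_2_to_1))  # [('gene1', 'gene13')]
```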
Coding (cont.)
Compiled lists of all alignments between two genomes, then took any bidirectional hits
Filtered for the alignments that match the desired pattern
Have sequences for eight pairs of genomes (ranging from 4 to 82 sequences per pair)
Analysis
Ran local alignments on the obtained sequences, using the scoring matrices from the website
Used different gap penalties
Also tried test sequences that had been shifted by one or two frames
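For readers unfamiliar with local alignment, here is a minimal Smith-Waterman scoring sketch. It is not the program used in the project: the toy_score function (+2 match, -1 mismatch) is only a stand-in for the frameshift scoring matrices, which are not reproduced here, and the linear gap penalty is an assumption. It does show how the choice of gap penalty changes the best local score.

```python
def smith_waterman(seq_a, seq_b, score, gap_penalty):
    """Return the best local alignment score (Smith-Waterman, linear gaps).

    score(a, b) gives the substitution score for residues a and b;
    gap_penalty is a positive cost charged per gap position.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])
            up = H[i - 1][j] - gap_penalty
            left = H[i][j - 1] - gap_penalty
            # Scores never drop below zero, which is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


def toy_score(a, b):
    """Placeholder substitution scores, NOT the frameshift matrices."""
    return 2 if a == b else -1


print(smith_waterman("MAQVR", "MAVR", toy_score, gap_penalty=2))  # 6: gap is worth opening
print(smith_waterman("MAQVR", "MAVR", toy_score, gap_penalty=5))  # 4: gap is too costly
```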
Future Work
So far the results have been inconclusive
Would probably need a full statistical estimate of alignment score significance according to the extreme value distribution
Could also work with the underlying nucleotide sequences
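One way such an estimate could be set up is sketched below, under stated assumptions: a set of null scores is obtained separately (for example, by aligning shuffled sequences), a Gumbel (extreme value) distribution is fitted to them by the method of moments, and the fitted distribution gives the probability of seeing a score at least as high by chance. The function names and the null scores are hypothetical, and a maximum-likelihood fit may be preferable in practice.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(null_scores):
    """Method-of-moments fit of a Gumbel distribution; returns (mu, beta)."""
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    beta = sd * math.sqrt(6) / math.pi   # scale, from variance = (pi * beta)^2 / 6
    mu = mean - EULER_GAMMA * beta       # location, from mean = mu + gamma * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """P(S >= score) under the fitted Gumbel distribution."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


# Hypothetical null scores, e.g. from alignments of shuffled sequences.
null_scores = [21, 24, 19, 30, 26, 22, 25, 28, 23, 27]
mu, beta = fit_gumbel(null_scores)
for observed in (25, 35, 45):
    print(observed, gumbel_p_value(observed, mu, beta))
```

Higher observed scores give smaller p-values; a pair of sequences would be considered related only if its score is unlikely under the null distribution.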
Acknowledgements
Peter Bowers
Mike Thompson
Todd Yeates
Nam Tonthat
SoCalBSI