Lecture 7: Gen(om)e duplications 9/23/09
Homework
1. Clustal and trees
2. Ensembl links
3. OMIM
HW #1 GNAT1
FASTA file
>Human_GNAT1
MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLEECLEFIAIIY
GNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMSDIIQRLWKDSGIQACFERAS
EYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGIIETQFSFKDLNFRMFDVGGQRSERKKWIHC
FEGVTCIIFIAALSAYDMVLVEDDEVNRMHESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLS
ICFPDYDGPNTYEDAGNYIKVQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF
>Chimp_GNAT1
MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLEECLEFIAIIY
GNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMSDIIQRLWKDSGIQACFERAS
EYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGIIETQFSFKDLNFRMFDVGGQRSERKKWIHC
FEGVTCIIFIAALSAYDMVLVEDDEVNRMHESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLS
ICFPDYDGPNTYEDAGNYIKVQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF
>Dog_GNAT1
MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLEECLEFIAIIY
GNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMSDIIQRLWKDSGIQACFERAS
EYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGIIETQFSFKDLNFRMFDVGGQRSERKKWIHC
FEGVTCIIFIAALSAYDMVLVEDDEVNRMHESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLS
ICFPDYDGPNTYEDAGNYIKVQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF
Note: programs use whatever is in the identifier line up to the first space as the label. If you don't like GenBank numbers as labels, you can change them to species names.
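The labeling rule in the note (label = identifier up to the first space) is easy to see in a small parser. This is a minimal sketch, not any particular program's reader; the function name `read_fasta` and the description text after the first label in the example are hypothetical, added only to show that they get dropped.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping label -> sequence.

    The label is everything after '>' up to the first whitespace, the same
    convention alignment and tree programs use. Wrapped sequence lines are
    joined. Duplicate labels would overwrite each other.
    """
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            label = words[0] if words else ""
            records[label] = []
        elif label is not None:
            # Remove any internal spaces from wrapped sequence lines.
            records[label].append("".join(line.split()))
    return {name: "".join(parts) for name, parts in records.items()}


example = """>Human_GNAT1 hypothetical description text
MGAGASAEEKHSRELEKKLK
EDAEKDARTV
>Dog_GNAT1
MGAGASAEEKHSRELEKKLKEDAEKDARTV
"""
seqs = read_fasta(example)
print(list(seqs))                 # ['Human_GNAT1', 'Dog_GNAT1']
print(len(seqs["Human_GNAT1"]))   # 30
```

Everything after the first space ("hypothetical description text") is discarded, which is why renaming the identifier itself is what changes the label on your tree.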
CLUSTAL multiple sequence alignment

Human_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Chimp_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Dog_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Cow_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Rat_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Mouse_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Zfish_GNAT1     MGAGASAEEKHSRELEKKLKEDADKDARTVKLLLLGAGESGKSTIVKQMKIIHKDGYSLE 60
                ***********************:*****************************:******

Human_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Chimp_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Dog_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Cow_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Rat_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Mouse_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Zfish_GNAT1     ECLEFIVIIYSNTMQSILAVVRAMTTLNIGYGDAAAQDDARKLMHLADTIEEGTMPKELS 120
                ******.***.**:*****:********* ***:* *********:************:*

Human_GNAT1     DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Chimp_GNAT1     DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Dog_GNAT1       DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Cow_GNAT1       DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Rat_GNAT1       DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Mouse_GNAT1     DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Zfish_GNAT1     DIILRLWKDSGIQACFDRASEYQLNDSAGYYLNDLERLIQPGYVPTEQDVLRSRVKTTGI 180
                *** ************:***************.*****: ********************

Human_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Chimp_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Dog_GNAT1       IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Cow_GNAT1       IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Rat_GNAT1       IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Mouse_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Zfish_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
                ************************************************************

Human_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Chimp_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Dog_GNAT1       ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Cow_GNAT1       ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYNGPNTYEDAGNYIK 300
Rat_GNAT1       ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYDGPNTYDDAGNYIK 300
Mouse_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Zfish_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFVEKIKKAHLSMCFPEYDGPNTFEDAGNYIK 300
                ****************************** *********:***:*:****::*******

Human_GNAT1     VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Chimp_GNAT1     VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Dog_GNAT1       VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Cow_GNAT1       VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Rat_GNAT1       VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Mouse_GNAT1     VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Zfish_GNAT1     VQFLDLNLRRDIKEIYSHMTCATDTENVKFVFDAVTDIIIKENLKDCGLF 350
                ****:**:***:*************:************************

350 sites; * = fixed: 324/350 = 92.6%
HW #1 GNGT1
Fixed = 49/74 = 66%

Human_GNGT1     MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEVRDYVEERSGEDPLVKGIPED 60
Chimp_GNGT1     MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEVRDYVEERSGEDPLVKGIPED 60
Dog_GNGT1       MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEVRDYVEERSGEDPLVKGIPED 60
Cow_GNGT1       MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEFRDYVEERSGEDPLVKGIPED 60
Mouse_GNGT1     MPVINIEDLTEKDKLKMEVDQLKKEVTLERMMVSKCCEEVRDYIEERSGEDPLVKGIPED 60
Rat_GNGT1       MPVINIEDLTEKDKLKMEVDQLKKEVTLERVMVSKCCEEVRDYIEERSREDPLVKGIPED 60
Zfish_GNGT1     MPIIDVENMTDLDKAKMEVTQLKTEVKLERAKVSKCCEEITEYIQGGADEDPLVKGIPEE 60
                **:*::*::*: ** **** ***.**.***  *******. :*::  : **********:

Human_GNGT1     KNPFKELKGGCVIS 74
Chimp_GNGT1     KNPFKELKGGCVIS 74
Dog_GNGT1       KNPFKELKGGCVIS 74
Cow_GNGT1       KNPFKELKGGCVIS 74
Mouse_GNGT1     KNPFKELKGGCVIS 74
Rat_GNGT1       KNPFKELKGGCVIS 74
Zfish_GNGT1     KNPFKE-KGGCVIC 73
                ****** ******.
Protein interactions (figure: rhodopsin, GNAT1, GNB1, GNGT1)
Relative constraint (% of fixed sites)
GNAT1: 324 / 350 = 92.6%
GNB1: 306 / 340 = 90%
GNGT1: 49 / 74 = 66%
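The "% fixed" numbers count alignment columns where every species has the same residue, the columns Clustal marks with `*`. A minimal sketch of that count, assuming equal-length aligned sequences with `-` for gaps; `fraction_fixed` is a hypothetical helper, and the toy input is columns 61-74 of the GNGT1 alignment for three of the species.

```python
def fraction_fixed(aligned):
    """Count fully conserved columns in a multiple alignment.

    aligned -- list of equal-length aligned sequences ('-' marks a gap)
    Returns (fixed, total): the number of columns where every sequence has
    the same residue and none has a gap, and the number of columns overall.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    total = lengths.pop()
    fixed = 0
    for column in zip(*aligned):
        first = column[0]
        if first != "-" and all(residue == first for residue in column):
            fixed += 1
    return fixed, total


block = ["KNPFKELKGGCVIS",   # Human_GNGT1, columns 61-74
         "KNPFKELKGGCVIS",   # Chimp_GNGT1
         "KNPFKE-KGGCVIC"]   # Zfish_GNGT1
fixed, total = fraction_fixed(block)
print(f"{fixed}/{total} = {100 * fixed / total:.1f}%")   # 12/14 = 85.7%
```

The denominator is the number of alignment columns, which is why GNGT1 is scored out of 74 even though the zebrafish sequence itself is only 73 residues long.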
Trees
An Ensembl search finds lots of groups:
InterPro domain: identifies and groups proteins by protein signatures
Ensembl families: proteins grouped by phylogenetic relationship
Vega / HAVANA: the hand-curated part of the Ensembl database. They confirm each predicted gene in different genomes.
Results include proteins, pseudogenes, and processed pseudogenes.
We want the Ensembl protein_coding gene. Check that it is rhodopsin and not some rhodopsin-related gene.
Transcript and protein info are useful
Protein - use links at left to look at the sequence
Protein sequence
Exons: shows the sequences of the exons as well as those of the UTRs and introns (figure labels: Start, 5' UTR, Intron)
cDNA sequence: includes known SNPs (variation in the human population)
Can export sequence
Ensembl: there is a dizzying array of data and information on this web site. We will try to use it as a “helpful” tool to gather more sequences. Often we just want to get all the homologs from all the species where Ensembl has made that link.
At the bottom of the sequence list is a link to the sequence display.
Go back to the gene page and scroll down to find orthologs. This shows pairwise comparisons in ClustalW format.
OMIM
Q4. Making trees
ClustalW is a bit limited:
Sequences are compared using distances
Trees are drawn by neighbor joining
It is nice to have more options: maximum likelihood, distance, parsimony
PHYLIP: a set of modules that you can mix and match to make trees
Web sites: Phylemon, Pasteur Institute
Methods
Parsimony: alignment → input characters to parsimony tree program
Distance: alignment → calculate distances → input distances to tree program
Maximum likelihood: alignment → input characters to ML program
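Parsimony scores a tree by the fewest character changes it requires. One standard way to count them for a single alignment column is Fitch's algorithm, sketched below as an illustration (not the code inside PHYLIP's parsimony programs). The tree is this lecture's GNAT1 relationships tree, arbitrarily resolved into a two-way split at the root; the site is GNAT1 position 271 (F in human and chimp, S in dog, cow, rat and mouse, V in zebrafish).

```python
def fitch(tree, states):
    """Fitch small-parsimony count for one alignment column.

    tree   -- nested 2-tuples of leaf names (assumes a strictly bifurcating tree)
    states -- dict mapping each leaf name to its residue at this site
    Returns (set of possible residues at the root, minimum number of changes).
    """
    if isinstance(tree, str):              # a leaf: its observed residue
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                             # children can agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


tree = (
    ((("Cow", "Rat"), "Mouse"), (("Human", "Chimp"), "Dog")),
    "Zfish",
)
site_271 = {"Human": "F", "Chimp": "F", "Dog": "S", "Cow": "S",
            "Rat": "S", "Mouse": "S", "Zfish": "V"}
root_states, changes = fitch(tree, site_271)
print(changes)   # 2
```

A full parsimony score sums this count over every column, and the program searches for the tree with the lowest total.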
Steps to make a distance tree
Step                            Program
Align sequences                 Clustalw-multialign
Calculate distances             DNAdist or Protdist
Use distances to make a tree    Neighbor
Display tree                    External program
Steps to make a distance tree: 1. Align sequences. You can do this with ClustalW at the EBI web site or at the Pasteur web site.
Pasteur Institute - Phylogenetics
ClustalW2 at Pasteur is listed under Alignment, then Multiple. Either paste in the sequences, or select a FASTA file and upload it.
Leave defaults and hit Run
Save files to keep results. Clustal also makes a dendrogram, which you can save.
You can pass the results of this step to the next program from here.
Calculate distances: if DNA, use DNAdist; if protein (AA), use Protdist.
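A distance program turns each pair of aligned sequences into one number. The simplest is the p-distance (fraction of differing sites), which is then corrected for multiple substitutions at the same site. Below is a sketch of three textbook formulas: p-distance, the Jukes-Cantor correction for DNA, and Kimura's approximate protein distance. These are illustrations only; DNAdist and Protdist offer several models, and their defaults are more elaborate than these. The function names are hypothetical.

```python
import math


def p_distance(a, b):
    """Fraction of differing residues between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance for DNA: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_protein(p):
    """Kimura's approximate protein distance: d = -ln(1 - p - 0.2 * p**2)."""
    inner = 1 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("p too large for Kimura's formula")
    return -math.log(inner)


human = "KNPFKELKGGCVIS"   # GNGT1 columns 61-74
zfish = "KNPFKE-KGGCVIC"
p = p_distance(human, zfish)
print(round(p, 4))         # 0.0769  (1 difference in 13 ungapped columns)
print(round(kimura_protein(p), 4))
```

The corrected distance is always a little larger than p, because some sites have changed more than once and those hidden changes are invisible in a simple count.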
Pass alignment to protdist
Use Protdist (under Distance). Upload or paste the data and click Run.
Save the distance matrix, then send it to the neighbor-joining program to make the tree.
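The Neighbor program builds the tree by neighbor joining. Here is a minimal sketch of that algorithm in Python, assuming the distances are held in a dict of dicts (not PHYLIP's file format); the function name is hypothetical. At each step it joins the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of i's distances to all remaining nodes, and it finishes by joining the last three nodes at one unrooted node.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor-joining tree from a distance matrix, returned as a Newick string.

    names -- taxon labels (at least three; no spaces, commas, colons or parentheses)
    dist  -- symmetric distances as a dict of dicts: dist[a][b]
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Copy the matrix so the caller's data is not modified.
    d = {a: {b: dist[a][b] for b in names if b != a} for a in names}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r(i): total distance from i to every other remaining node.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j); ties go to the earlier pair.
        i, j = min(combinations(nodes, 2),
                   key=lambda pr: (n - 2) * d[pr[0]][pr[1]] - r[pr[0]] - r[pr[1]])
        # Branch lengths from the new internal node u to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # Distance from u to each remaining node k.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: join them at a single unrooted (three-way) node.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    return f"({a}:{la:.4f},{b}:{d[a][b] - la:.4f},{c}:{d[a][c] - la:.4f});"


taxa = ["a", "b", "c", "d", "e"]
upper = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
matrix = {t: {} for t in taxa}
for (x, y), value in upper.items():
    matrix[x][y] = matrix[y][x] = value
print(neighbor_joining(taxa, matrix))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

The result is unrooted; where the root goes is a separate decision, which is what choosing an outgroup does.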
Tell it which taxon number is the outgroup; this will root your tree! Here the outgroup is taxon 7.
Note: Zebrafish is taxon #7 (the seventh sequence in the alignment order: Human, Chimp, Dog, Cow, Rat, Mouse, Zfish).
Save tree
What does this tree mean???
The tree shows relationships and branch lengths in Newick format (PHYLIP truncates names to 10 characters):
(((Cow_GNAT1: ,Rat_GNAT1: ): ,Mouse_GNAT: ): , ((Human_GNAT: ,Chimp_GNAT: ): ,Dog_GNAT1: ): ,Zfish_GNAT: );
Just the relationships:
(((Cow,Rat),Mouse),((Human,Chimp),Dog),Zfish);
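Each pair of parentheses in a Newick string is one group (clade) that the tree proposes. A minimal sketch that parses Newick into nested tuples, ignoring branch lengths, and lists those groups; `parse_newick`, `leaves` and `clades` are hypothetical helpers, and the parser assumes well-formed input without quoted labels.

```python
def parse_newick(text):
    """Parse a Newick tree into nested tuples of leaf names.

    Branch lengths (':0.123') and internal-node labels are skipped, so only
    the relationships are kept. Assumes well-formed input, no quoted labels.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def skip_to_delimiter():
        # Skip a branch length or label up to the next ',', '(' or ')'.
        nonlocal pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1

    def read_node():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if text[pos] == "(":
            pos += 1                          # consume '('
            children = [read_node()]
            while text[pos] == ",":
                pos += 1                      # consume ','
                children.append(read_node())
            pos += 1                          # consume ')'
            skip_to_delimiter()
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        skip_to_delimiter()
        return name

    return read_node()


def leaves(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def clades(node):
    """Leaf sets of every internal node: the groups the tree proposes."""
    if isinstance(node, str):
        return []
    groups = [frozenset(leaves(node))]
    for child in node:
        groups.extend(clades(child))
    return groups


tree = parse_newick("(((Cow,Rat),Mouse),((Human,Chimp),Dog),Zfish);")
print(tree)   # ((('Cow', 'Rat'), 'Mouse'), (('Human', 'Chimp'), 'Dog'), 'Zfish')
for group in clades(tree):
    print(sorted(group))
```

Listing the groups this way makes it easy to spot one such as {Cow, Rat} that conflicts with the known species relationships (rat and mouse are both rodents).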
You can download FigTree for drawing trees (Mac and PC versions).
Tree - does this make sense?
What is the difference between homologs, orthologs and paralogs?????
Orthologs: genes in different species that have a common ancestor, derived by descent (speciation)
Paralogs: gene duplicates within the same organism
Homologs = orthologs + paralogs
(Figure: opsin classes LWS, RH2, SWS2, SWS1, RH1, with Lamprey LWS, Lamprey RHB, Lamprey RHA, Lamprey S2, Lamprey S1)
How do gen(ome)s evolve? What can change?
DNA mutation
DNA deletions / insertions (indels)
Recombination
Selection: change in gene frequency
Gene transfer
Duplications
Gene duplication (figure: tree with two groups, each containing Human, Chicken, Frog, Zebrafish and Dog, plus Lamprey)
Ohno, Evolution by Gene Duplication, 1970
Gene duplication is the primary way that you get new genes to work with.
Genome duplications:
Double the number of chromosomes
Keep the balance in the biochemical machinery
Duplicate the regulatory structure
New genes can evolve to do new jobs!
Gene vs. genome duplications: how do you know what has duplicated?
Mechanisms for duplication
1. Tandem duplication
2. Insertion of retrotransposed gene
3. Genome / chromosome duplication
1. Mismatched recombination (unequal crossover): leads to extra genes inserted right next to the original gene
Normal DNA recombination: switches genes from one chromosome to the other, leading to new gene combinations.
Mismatched recombination: if the chromosomes misalign, recombination leads to the gain of a gene on one chromosome and the loss of that gene on the other. Result: tandem arrays of genes.
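A toy model of this diagram treats each chromosome as a list of genes and swaps the ends at a crossover point. When the two copies pair in register, the cut is at the same position on both; when they misalign, the cuts fall at different positions. `unequal_crossover` is a hypothetical helper and the gene names are just labels.

```python
def unequal_crossover(chrom_a, chrom_b, cut_a, cut_b):
    """Swap chromosome ends after position cut_a on A and cut_b on B.

    chrom_a, chrom_b -- lists of gene names in order along each chromosome
    Equal cuts model normal, aligned recombination. Unequal cuts model
    misaligned pairing: one product gains genes and the other loses them.
    """
    product_1 = chrom_a[:cut_a] + chrom_b[cut_b:]
    product_2 = chrom_b[:cut_b] + chrom_a[cut_a:]
    return product_1, product_2


x = ["red", "green"]
# Aligned crossover: nothing gained or lost.
print(unequal_crossover(x, x, 1, 1))   # (['red', 'green'], ['red', 'green'])
# Misaligned: A is cut after its green gene, B after its red gene.
print(unequal_crossover(x, x, 2, 1))   # (['red', 'green', 'green'], ['red'])
```

The total number of genes is unchanged (four before, four after); they are just redistributed, which is how extra green genes can pile up in the X-chromosome opsin arrays.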
Opsin gene tandem arrays on the X chromosome: only the first two genes are expressed, so it doesn't matter if there are extra green genes. They are just along for the ride.
Misaligned recombination: if recombination happens within a gene, you get a chimera with an intermediate phenotype (changed pigment light sensitivity). Opsin genes on the X chromosome.
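Human red and green opsins (figure: green pigment at 530 nm, red pigment at 560 nm, and a 554 nm pigment)
Spectral shifts from the green-type to the red-type residue: A164S = +2 nm; F261Y = +10 nm; A269T = +14 nm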
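If the three substitutions acted independently and simply added up (a simplifying assumption), you could estimate a chimera's absorbance maximum by adding shifts to the green pigment's 530 nm. A small sketch of that arithmetic; the function name is hypothetical.

```python
GREEN_LAMBDA_MAX = 530                              # nm, green pigment (from the slide)
SHIFTS = {"A164S": 2, "F261Y": 10, "A269T": 14}     # nm, green-type -> red-type residue


def predicted_lambda_max(red_type_sites):
    """Green lambda-max plus the shift for each site carrying the red-type residue.

    Assumes the shifts are strictly additive, which is only an approximation.
    """
    return GREEN_LAMBDA_MAX + sum(SHIFTS[site] for site in red_type_sites)


print(predicted_lambda_max([]))                           # 530
print(predicted_lambda_max(["A269T"]))                    # 544
print(predicted_lambda_max(["A164S", "F261Y", "A269T"]))  # 556
```

With all three red-type residues the sum is 556 nm, close to the 560 nm red pigment; this simple model does not account for the remaining few nm.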
Normal human visual pigments: λmax = 420, 535, 565 nm
Deuteranomaly: green pigment shifted towards red; λmax = 420, 550, 565 nm. Affects 5% of males, 0.04% of females.
2. Insertion of a retrotransposed gene: a gene can be transcribed to mRNA, and the mRNA then gets reverse transcribed and inserted into the DNA. Clue that a gene was retrotransposed? No introns; it is all coding sequence.
Comparison of rhodopsin genes (figure: vertebrate rhodopsin gene vs. fish rhodopsin gene)
Possibilities: the gene lost its introns and stayed in place, or the mRNA sequence was reinserted somewhere else in the genome.
Fugu-human comparison (figure labels: Rh1, Human chr 3, Fugu scaffold 830, Human chr Z). The Fugu Rh gene has been inserted into a chromosome.