Comparative genomics Joachim Bargsten February 2012
Comparative genomics: the study of the relationship between genome structure and function across different biological species or strains. Why should we do this? How are we going to do this?
Study evolution Resolve Differences Mechanism Tree of life
Motivation: transfer knowledge to and from simpler model organisms (e.g. human and C. elegans).
Overview: Molecular phylogenetics (multiple sequence alignment, phylogenetic tree estimation, ortholog prediction); Genome rearrangements (large-scale inversions, deletions and translocations); Synteny & collinearity; Structural variations (presented by Lin Ke).
Molecular phylogenetics: the use of molecular data to establish the relationships between species, organisms or gene families. Homology: sequences that share common ancestry. This is an all-or-nothing relation; sequences are never “a bit” homologous. Orthologs: homologs in different species derived by a speciation event. Paralogs: homologs in the same or different species derived by a duplication event.
[Figure: Homology. (Co-)orthologs and their last common ancestor]
[Figure: Homology. Inparalogs and their last common ancestor]
[Figure: Homology. Outparalogs and their last common ancestor]
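The ortholog/paralog definitions above can be made concrete with a small sketch. It assumes a toy gene tree whose internal nodes are already labelled as speciation or duplication events; the `Node` class, the gene names (`human_A`, `worm_B`, ...) and the tree itself are made up for illustration and are not from the course material. Two genes are called orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Node of a toy gene tree (hypothetical structure for illustration)."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        # Link children back to this node so we can walk towards the root.
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the path from a node up to the root, the node itself first."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    """Return the deepest node that is an ancestor of both a and b."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def relation(a, b):
    """Classify two different genes by the event at their last common ancestor."""
    if a is b:
        raise ValueError("need two different genes")
    lca = last_common_ancestor(a, b)
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Toy tree: one duplication (copies A and B) followed by a speciation in each copy.
human_a, worm_a = Node("human_A"), Node("worm_A")
human_b, worm_b = Node("human_B"), Node("worm_B")
copy_a = Node("copy_A", "speciation", [human_a, worm_a])
copy_b = Node("copy_B", "speciation", [human_b, worm_b])
root = Node("root", "duplication", [copy_a, copy_b])

print(relation(human_a, worm_a))   # orthologs
print(relation(human_a, human_b))  # paralogs (same species)
print(relation(human_a, worm_b))   # paralogs (different species)
```

The last call illustrates why paralogs can sit in different species: the duplication predates the speciation, so `human_A` and `worm_B` trace back to a duplication node.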
Phylogenetic tree estimation: How do we estimate a phylogenetic tree? 1. Identify evolutionarily conserved regions with a multiple sequence alignment (MAFFT). 2. Estimate the phylogenetic tree (PhyML).
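To show the second step in miniature, here is a sketch that builds a tree from an already aligned toy data set. It does not call MAFFT or PhyML, and it does not use maximum likelihood as PhyML does; it swaps in a much simpler distance-based method (UPGMA on p-distances) purely for illustration. The sequences and species labels are invented.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment):
    """Cluster aligned sequences with UPGMA; return the topology as a Newick string."""
    sizes = {name: 1 for name in alignment}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        new_dist = {}
        for other in sizes:
            if other in pair:
                continue
            # Size-weighted average of the distances to the two merged clusters.
            d = (dist[frozenset((a, other))] * sizes[a]
                 + dist[frozenset((b, other))] * sizes[b]) / (sizes[a] + sizes[b])
            new_dist[frozenset((merged, other))] = d
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        dist.update(new_dist)
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Invented, pre-aligned toy sequences (not real data).
toy_alignment = {
    "human":   "ACGTACGTAC",
    "chimp":   "ACGTACGTAT",
    "mouse":   "ACGAACTTAT",
    "chicken": "TCGAAGTTCT",
}

print(upgma(toy_alignment))  # (((chimp,human),mouse),chicken);
```

UPGMA assumes a constant rate of evolution and returns a rooted topology; for real data, a model-based method such as the one PhyML implements is generally preferable.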
Phylogenetic tree estimation Multiple sequence alignment
Infer evolutionary relationships between species and genes/proteins. Rooted tree: shows the order of evolutionary events. Unrooted tree: shows the evolutionary relationships between descendants.
Non-coding regions. Phylogenetic footprinting: distantly related species. Phylogenetic shadowing: closely related species. Use sequence comparison and multiple alignment to find exons and non-coding functional regions, e.g. transcription factor binding sites.
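The idea behind footprinting can be sketched as a scan for conserved columns in a multiple alignment. The example below is a simplified illustration, not a published footprinting method: the scoring (fraction of sequences sharing the most common residue), the window width and the threshold are all assumptions, and the three sequences with an embedded shared motif are invented.

```python
def column_identity(alignment):
    """Per column: fraction of sequences sharing the most common residue.

    Gaps never count towards the most common residue.
    """
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        top = max((residues.count(r) for r in set(residues)), default=0)
        scores.append(top / len(column))
    return scores


def conserved_windows(alignment, width=6, threshold=1.0):
    """Return merged half-open (start, end) intervals whose mean identity >= threshold."""
    scores = column_identity(alignment)
    hits = []
    for start in range(len(scores) - width + 1):
        if sum(scores[start:start + width]) / width >= threshold:
            if hits and start <= hits[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                hits[-1] = (hits[-1][0], start + width)
            else:
                hits.append((start, start + width))
    return hits


# Invented sequences: diverged flanks around a shared 8-base motif.
toy = [
    "ACGTTGACGTCAGGAT",
    "TTGATGACGTCACCTA",
    "GGACTGACGTCATTCG",
]

for start, end in conserved_windows(toy):
    print(start, end, toy[0][start:end])  # 4 12 TGACGTCA
```

With distantly related species the flanks have diverged, so the conserved window stands out; with closely related species most columns would pass the threshold, which is why shadowing needs many genomes and a different statistic.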
What can we do with it? Gene annotation; gene or protein function prediction; identification of non-coding elements in the genome; species phylogeny; genome evolution.
Genome alignment. Pairwise alignment: match the chromosome sequence of species A to that of species B.
Genome alignment – dot plot
[Figure: dot plot of chromosome 2L, tomato vs. potato]
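A dot plot marks every position pair where the two sequences share a short identical word. The following is a minimal sketch using exact k-mer matches on short invented strings; real genome-scale dot plots are produced by dedicated aligners, and the function names and the choice of `k` here are assumptions for illustration.

```python
from collections import defaultdict


def dot_plot(seq_a, seq_b, k=3):
    """Return sorted (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    return sorted(
        (i, j)
        for i in range(len(seq_a) - k + 1)
        for j in index.get(seq_a[i:i + k], [])
    )


def render(dots, len_a, len_b):
    """Draw the dots as text: columns are positions in A, rows positions in B."""
    grid = [["." for _ in range(len_a)] for _ in range(len_b)]
    for i, j in dots:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)


# Invented toy sequence compared with itself: matches fall on the main diagonal.
seq = "ACGTTGCA"
dots = dot_plot(seq, seq, k=3)
print(render(dots, len(seq), len(seq)))
```

Collinear regions appear as diagonal runs of dots, insertions and deletions shift the diagonal, and repeated words produce off-diagonal dots. This sketch matches the forward strand only, so an inverted segment would need a second pass against the reverse complement.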
Synteny & collinearity. Synteny: gene loci are on the same chromosome. Conserved synteny: gene loci are on the same chromosome in different species. Collinearity: the order of the gene loci is preserved across species.
[Figure: inverted region]
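The three definitions (synteny, conserved synteny, collinearity) can be checked mechanically once each gene has a chromosome and a position in both species. The sketch below assumes that orthologous genes share one identifier across species and that an exactly reversed order is reported as an inversion; the gene names, chromosome names and coordinates are invented.

```python
def conserved_synteny(genes, loc_a, loc_b):
    """True if the genes share one chromosome in species A and one in species B."""
    chrom_a = {loc_a[g][0] for g in genes}
    chrom_b = {loc_b[g][0] for g in genes}
    return len(chrom_a) == 1 and len(chrom_b) == 1


def classify(genes, loc_a, loc_b):
    """Label a gene set by how well its arrangement is preserved between species."""
    if not conserved_synteny(genes, loc_a, loc_b):
        return "no conserved synteny"
    order_a = sorted(genes, key=lambda g: loc_a[g][1])
    order_b = sorted(genes, key=lambda g: loc_b[g][1])
    if order_a == order_b:
        return "collinear"
    if order_a == order_b[::-1]:
        return "inverted"  # same neighbours, opposite orientation
    return "conserved synteny only"


# Invented locations: gene -> (chromosome, position).
genes = ["g1", "g2", "g3", "g4"]
species_a = {"g1": ("chr2", 100), "g2": ("chr2", 200), "g3": ("chr2", 300), "g4": ("chr2", 400)}
same_order = {"g1": ("chr5", 10), "g2": ("chr5", 25), "g3": ("chr5", 40), "g4": ("chr5", 90)}
reversed_order = {"g1": ("chr5", 90), "g2": ("chr5", 40), "g3": ("chr5", 25), "g4": ("chr5", 10)}
shuffled = {"g1": ("chr5", 10), "g2": ("chr5", 40), "g3": ("chr5", 25), "g4": ("chr5", 90)}
split = {"g1": ("chr5", 10), "g2": ("chr5", 25), "g3": ("chr5", 40), "g4": ("chr7", 90)}

for label, loc_b in [("same", same_order), ("reversed", reversed_order),
                     ("shuffled", shuffled), ("split", split)]:
    print(label, "->", classify(genes, species_a, loc_b))
```

Collinearity is the stricter property: every collinear block shows conserved synteny, but a shuffled gene order on the same chromosome is still syntenic without being collinear.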
Resources for comparative genomics in plants: Plant Genome Duplication Database; PLAZA.
Exercise
ssh -X
cd /mnt/geninf15/work/bif_course_2012/comparative_genomics_jwb
less assignment.txt
kwrite assignment.txt