Repetitive elements
Significance
- Evolutionary ‘signposts’
- Passive markers for mutation assays
- Actively reshape gene organisation by creating, shuffling or modifying existing genes
- Chromosome structure and dynamics
- Provide tools for medical, forensic and genetic analysis
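Repetitive sequences
AAA, ATATATAT, CGTCGTCGT etc.
5 main classes in the HGP analysis (Nature 2001); simple sequence repeats are grouped with tandem repeats here, giving 4 headings:
1) Tandem repeats
2) Transposon-derived repeats
3) Segmental duplications
4) Processed pseudogenes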
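The example sequences above can each be written as (unit)n. A minimal sketch of that decomposition, assuming a perfect repeat with no interruptions; the function name `smallest_repeat_unit` is made up for illustration and is not from any genomics library:

```python
def smallest_repeat_unit(seq):
    """Return (unit, copies) for the shortest unit that tiles seq exactly."""
    n = len(seq)
    for size in range(1, n + 1):
        # A unit of this size works only if it divides the length
        # and repeating it rebuilds the whole sequence.
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size], n // size
    raise ValueError("empty sequence")  # reached only when seq is empty


for example in ("AAA", "ATATATAT", "CGTCGTCGT"):
    unit, copies = smallest_repeat_unit(example)
    print(f"{example}: ({unit}){copies}")
```

Real repeat arrays are rarely perfect, so this only illustrates the (unit)n notation used on the later slides.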
1) Tandem repeats
Blocks of tandem repeats at
- subtelomeres
- pericentromeres
- short arms of acrocentric chromosomes
- ribosomal gene clusters
Tandem / clustered repeats
Broadly divided into 3 types based on size

Class            Size of repeat   Repeat block   Major chromosomal location
Satellite        5-171 bp         > 100 kb       Centromeric heterochromatin
Minisatellite    9-64 bp          0.1-20 kb      Telomeres
Microsatellite   1-13 bp          < 150 bp       Dispersed

HMG3 by Strachan and Read pp 265-268
Satellites
Large arrays of repeats
Some examples
- Satellite 1, 2 & 3 - found in all chromosomes
- α satellite (alphoid DNA)
- β satellite
HMG3 by Strachan and Read pp 265-268
Minisatellites
Moderate-sized arrays of repeats
Some examples
Hypervariable minisatellite DNA
- core of GGGCAGGAXG
- found in telomeric regions
- used in the original DNA fingerprinting technique by Alec Jeffreys
HMG3 by Strachan and Read pp 265-268
Microsatellites
- 1-13 bp repeats, e.g. (A)n; (AC)n
- VNTRs (variable number of tandem repeats), SSRs (simple sequence repeats)
- 2% of genome (dinucleotides - 0.5%)
- Used as genetic markers (especially for disease mapping)
Individual genotype: a father might have alleles of 12 and 19 repeats, a mother alleles of 18 and 15 repeats, and their first-born alleles of 12 and 15 repeats (one allele inherited from each parent).
HMG3 by Strachan and Read pp 265-268
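The family example above follows simple Mendelian inheritance: the child carries one allele from each parent. A small sketch of that check, using the repeat counts from the slide; the function names are hypothetical, and the model ignores new mutations such as strand slippage:

```python
from itertools import product


def consistent_with_parents(child, father, mother):
    """True if one child allele can come from the father and the other from the mother."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def possible_offspring(father, mother):
    """All genotypes a child could inherit, as sorted (allele, allele) pairs."""
    return sorted({tuple(sorted(pair)) for pair in product(father, mother)})


father = (12, 19)  # repeat counts from the slide
mother = (18, 15)
child = (12, 15)

print(consistent_with_parents(child, father, mother))  # True
print(possible_offspring(father, mother))
# [(12, 15), (12, 18), (15, 19), (18, 19)]
```

A genotype such as (12, 19) would fail the check, because both alleles would have to come from the father.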
Microsatellite genotyping
The most common way to detect microsatellites is to design PCR primers that are unique to one locus in the genome and that base-pair on either side of the repeated portion. A single pair of PCR primers will therefore work for every individual in the species and produce a different-sized product for each microsatellite allele of different length.
Fig 7.7 HMG3 by Strachan and Read pp 190
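Because the primers sit in the unique flanking sequence, product length depends only on the number of repeat units between them. A hedged sketch of that arithmetic: the 100 bp flanking length (primers included) and the 2 bp unit are illustrative assumptions, not values from the slide, and the function names are hypothetical:

```python
def product_size(repeat_count, unit_length=2, flanking_bp=100):
    """PCR product length in bp: fixed flanks plus the repeat tract."""
    return flanking_bp + unit_length * repeat_count


def repeats_from_size(size_bp, unit_length=2, flanking_bp=100):
    """Invert product_size; raise if the size does not fit a whole number of units."""
    count, remainder = divmod(size_bp - flanking_bp, unit_length)
    if count < 0 or remainder != 0:
        raise ValueError(f"{size_bp} bp is not a whole number of repeat units")
    return count


# Alleles with 12 and 19 dinucleotide repeats give bands 14 bp apart.
print(product_size(12), product_size(19))  # 124 138
print(repeats_from_size(130))              # 15
```

In practice the band sizes are read off a gel or capillary trace against a size standard, and the repeat count is inferred in this way.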
CA repeat genotyping
Marker D17S800
Allele types: A (3,6), B (1,5), C (3,5), D (2,5), E (3,6)
N.B. ‘stutters’ or shadow bands, caused by strand slippage
On rare occasions, microsatellites can cause the DNA polymerase to make an extra copy of CA, much as we find it difficult to say "toy boat" several times in a row with consistent accuracy. If an individual's DNA polymerase adds to the repeated sequence, this slightly larger version can be passed on to offspring, who will usually replicate it accurately. Over time, as animals in a population breed, they recombine their microsatellites during sexual reproduction, and the population maintains a variety of microsatellites that is characteristic of that population and distinct from other populations with which it does not interbreed.
Fig 7.8 HMG3
strand slippage during replication Fig 11.5 HMG3 by Strachan and Read pp 330
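Strand slippage can be pictured as a rare one-unit gain or loss each time a repeat tract is copied. The following is a toy simulation under stated assumptions, not a calibrated model: the slip rate, the equal chance of gain and loss, and the function names are all illustrative choices:

```python
import random


def replicate(repeat_count, slip_rate, rng):
    """Copy a repeat tract once; occasionally gain or lose one repeat unit."""
    if rng.random() < slip_rate:
        step = rng.choice((-1, 1))          # assumed: gain and loss equally likely
        return max(1, repeat_count + step)  # keep at least one unit
    return repeat_count


def simulate(start, generations, slip_rate=0.01, seed=0):
    """Follow one allele through successive replications; returns the repeat count history."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    lengths = [start]
    for _ in range(generations):
        lengths.append(replicate(lengths[-1], slip_rate, rng))
    return lengths


history = simulate(start=15, generations=1000)
print(sorted(set(history)))  # the distinct allele lengths that arose
```

Most copies are faithful, so the allele length changes only occasionally, which is why a slipped allele is usually inherited stably.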
2) Transposon-derived repeats
- A.k.a. interspersed repeats
- 45% of genome
- Arise mainly as a result of transposition, through either a DNA or an RNA intermediate
- 4 main types: LINEs, SINEs, LTRs and DNA transposons
Transposon-derived repeats: LINEs (long interspersed elements)
- Among the most ancient elements of eukaryotic genomes
- Autonomous transposition (encode reverse transcriptase)
- ~6-8 kb long
- Internal polymerase II promoter and 2 ORFs
- 3 related LINE families in humans: LINE-1, LINE-2, LINE-3
- Believed to be responsible for retrotransposition of SINEs and creation of processed pseudogenes
Nature (2001) pp879-880; HMG3 by Strachan & Read pp268-272
Transposon-derived repeats: SINEs (short interspersed elements)
- Non-autonomous (successful freeloaders! ‘borrow’ RT from other sources such as LINEs)
- ~100-300 bp long
- Internal polymerase III promoter
- No proteins
- Share 3’ ends with LINEs
- 3 related SINE families in humans: active Alu, inactive MIR and Ther2/MIR3
Nature (2001) pp879-880; HMG3 by Strachan & Read pp268-272
LINEs and SINEs have preferred insertion sites
In this example, yellow shows the distribution of mys (a type of LINE) across a mouse genome whose chromosomes are coloured orange. More mys elements are inserted in the X (sex) chromosome.
Try the link below to do an online experiment which shows how an Alu insertion polymorphism has been used as a tool to reconstruct the human lineage http://www.geneticorigins.org/geneticorigins/pv92/intro.html
Transposon-derived repeats: Long Terminal Repeats (LTR)
- Repeats in the same orientation on both sides of the element, e.g. ATATATNNNNNNNATATAT
- Autonomous or non-autonomous
- Autonomous retroposons carry gag and pol genes, which encode the protease, reverse transcriptase, RNase H and integrase
Nature (2001) pp879-880; HMG3 by Strachan & Read pp268-272
Transposon-derived repeats: DNA transposons (lateral transfer?)
- Inverted repeats on both sides of the element, e.g. ATGCNNNNNNNNNNNCGTA
From Genes VII by Lewin; Nature (2001) pp879-880
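The two slide examples (ATATAT...ATATAT for LTR elements, ATGC...CGTA for DNA transposons) differ in how the two ends relate to each other. A minimal sketch that classifies the ends of an element; the function names are hypothetical. One caveat: the slide's ATGC...CGTA example is the same letters read backwards on one strand (a mirror), whereas on double-stranded DNA an inverted repeat is normally the reverse complement (ATGC...GCAT), so the sketch reports the two cases separately:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_repeat_type(element, k):
    """Compare the first and last k bases of an element."""
    left, right = element[:k], element[-k:]
    if left == right:
        return "direct"
    if right == reverse_complement(left):
        return "inverted (reverse complement)"
    if right == left[::-1]:
        return "mirror"
    return "none"


print(terminal_repeat_type("ATATATNNNNNNNATATAT", 6))  # direct
print(terminal_repeat_type("ATGCNNNNNNNNNNNCGTA", 4))  # mirror
print(terminal_repeat_type("ATGCNNNNNNNNNNNGCAT", 4))  # inverted (reverse complement)
```

The third call uses an example that is not on the slide, added only to show the reverse-complement case.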
Transposon-derived repeats: major types

Class            Family                Size       Copies*      % genome*
LINE             LINE-1 (Kpn family)   ~6.4 kb    0.8 x 10^6   15.4
SINE             Alu                   ~0.3 kb    1.3 x 10^6   10.7
LTR              e.g. HERV             ~1.3 kb    0.7 x 10^6   7.9
DNA transposon   mariner               ~0.25 kb   0.4 x 10^6   2.7

* Updated from HGP publications
HMG3 by Strachan & Read pp268-272
3) Segmental duplications
- Closely related sequence blocks at different genomic loci
- Transfer of 1-200 kb blocks of genomic sequence
- Can occur within a chromosome (intrachromosomal) or between non-homologous chromosomes (interchromosomal)
- Not always tandemly arranged
- Relatively recent
Segmental duplications
- Interchromosomal: segments duplicated among non-homologous chromosomes
- Intrachromosomal: duplications occur within a chromosome / arm
Nature Reviews Genetics 2, 791-800 (2001)
Segmental duplications in chromosome 22
Segmental duplications - chromosome 7.
Nature Reviews Genetics 2, 791-800 (2001)
4) Pseudogenes - processed
Insights from the HGP … 7) Repeat content
a) Age distribution
b) Comparison with other genomes
c) Variation in distribution of repeats
d) Distribution by GC content
e) Y chromosome
Nature (2001) 409: pp 879-891
Repeat content: a) Age distribution
- Most interspersed repeats (IR) predate the eutherian radiation (confirms the slow rate of clearance of nonfunctional sequence from vertebrate genomes)
- LINEs and SINEs have extremely long lives
- 2 major peaks of transposon activity
- No DNA transposition in the past 50 MYr
- LTR retroposons teetering on the brink of extinction
a) Age distribution (continued)
- Overall decline in interspersed repeat activity in the hominid lineage over the past 35-40 MYr
- By comparison, the mouse genome is younger and more dynamic
b) Comparison with other genomes
- Higher density of transposable elements in the euchromatic portion of the genome
- Higher abundance of ancient transposons
- 60% of IR is made up of LINE1 and Alu repeats, whereas DNA transposons represent only 6%
(a few human genes appear likely to have resulted from horizontal transfer from bacteria!)
c) Variation in distribution of repeats
Some regions show either
- High repeat density, e.g. chromosome Xp11: a 525 kb region shows 89% repeat density
- Low repeat density, e.g. HOX homeobox gene cluster (<2% repeats), indicative of regulatory elements that have low tolerance for insertions
d) Distribution by GC content
- High GC: gene rich; high AT: gene poor
- LINEs abundant in AT-rich regions
- SINEs lower in AT-rich regions
- Alu repeats in particular retained in actively transcribed GC-rich regions
- E.g. chromosome 19 has 5% Alus compared to Y chromosome
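GC content, the quantity the repeat distributions above are compared against, is simply the fraction of G and C bases in a stretch of sequence. A minimal sketch, assuming plain sequence strings and non-overlapping windows; the function names and the window size are illustrative choices:

```python
def gc_fraction(seq):
    """Fraction of G/C among called bases (A, C, G, T); 0.0 if there are none."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def windowed_gc(seq, window):
    """GC fraction of consecutive non-overlapping windows; a short final window is dropped."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


print(gc_fraction("ATGC"))           # 0.5
print(windowed_gc("GGCCATAT", 4))    # [1.0, 0.0]
```

Plotting such window values alongside LINE or SINE density along a chromosome is one way to see the AT-rich versus GC-rich preferences listed above.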
Repeat content: e) The Y chromosome!
- Unusually young genome (high tolerance to gaining insertions)
- Mutation rate is 2.1x higher in the male germline
- Possibly due to cell division rates or different repair mechanisms
Working draft published: Feb 2001
Finished sequence: April 2003
Annotation of genes ongoing
References
Text:
1) Human Molecular Genetics 3 (HMG3) by Strachan and Read, Chapter 9, pp 265-268
Optional reading:
- Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nature Rev Genet 3 (5): 370-379, May 2002
- Emanuel BS, Shaikh TH. Segmental duplications: an 'expanding' role in genomic instability and disease. Nature Reviews Genetics 2, 791-800 (2001)
- Nature (2001) 409: pp 879-891