
Precise Identification of Structural Variations in the Human Genome by Splitting Shotgun Reads
Zemin Ning¹, Anthony Cox¹, David Adams¹, Paul Flicek², Charles Shaw-Smith¹, Mark Griffiths¹, Adam Spargo¹, Jane Rogers¹ and Richard Durbin¹
¹The Wellcome Trust Sanger Institute, ²EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

[Poster panel: Target Site Duplications and Length Distribution; see Figures 1 and 2]

INTRODUCTION

Extensive structural variation exists between individual human genomes [1,2], and disease and disease susceptibility may be associated with this type of genetic variation. Current experimental and computational methods for studying human diversity include detection of copy-number variation by array CGH [3,4], identification of insertions and deletions from sequencing traces [5], and fine-scale mapping with fosmid paired ends [6]. Together these have identified hundreds of submicroscopic copy-number variants and inversions. The sequences involved were reported to sometimes contain entire genes and their regulatory regions, and to reach millions of DNA bases in size.

However, the comparative microarray studies reported in the literature lack sequence-level precision at breakpoints, and they surveyed only a small fraction of the sequence. The in silico strategy using fosmid ends [6] achieved higher resolution, but in most cases it still cannot provide exact breakpoint loci, nor can it detect variants shorter than 5 kb. Short indels (<50 bp) can be identified by aligning shotgun reads against the genome assembly. Much progress is still needed before all types of structural variation can be detected accurately across the different size ranges.

We have developed a computational method for the precise identification of structural variants across the genome by aligning shotgun reads against the reference sequence.
Because individual reads that cover the boundaries of a variant region are split in the alignment, we can pinpoint the exact breakpoint loci and, where applicable, extract the sequence between the boundaries. The DNA samples used in this analysis came from 10 human individuals and one male chimpanzee, with a total of 74 million shotgun reads, providing a rich and diverse resource for studying structural variation in the human genome.

Figure 1. Length distribution of structural variants with Chimp ancestral data included.
Figure 2. Length distribution of target site duplications.

Detection of Structural Variants

[Schematic: sample reads aligned against the reference, with breakpoints a and b. Panels: (a) Deletion; (b) Insertion; (c) Insertion with sequence; (d) VNTRs.]

Experimental validation – PCR Tests

1. Insertion Chr1:
2. Deletion Chr1:
3. Deletion Chr6:
4. Insertion Chr13:

Results and Conclusion

[Panel titles: DNA Sources and Reads; Exonic, Intronic and Noncoding Mapping. Venn diagram with sets A, B and C; extracted counts: 2549, 145, 236, 285, 1831, 1281.]

A total of 7,293 structural variants were identified (2,500 deletions, 2,358 insertions and 2,435 VNTRs) using 44 million shotgun reads from 10 human individuals. To assess the ancestral state of each variant against the chimpanzee genome, we also used 30 million chimpanzee reads. Compared with one existing database of structural variations, dbRIP [7], there are 545 exact matches among 2,095 retrotransposon insertion polymorphisms (L1, Alu and SVA).

66% of the structural variant sequences can be masked as retrotransposons.
28% of human variants share the same location with the chimp, i.e. they represent ancestral states.
89% of ancestral deletions are retrotransposons, as are 66% of ancestral VNTRs.
38% of variants are located in exonic or intronic regions.

Conclusion: Mobile transposons are significantly more active in intragenic regions, and this might lead to phenotypic differences among human individuals.
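The split-read idea described above, where a read spanning a variant boundary aligns as two pieces, can be illustrated with a toy sketch. It is not the method used on the poster: it replaces a real aligner with exact string matching, assumes each anchor occurs only once in the reference, and does not handle VNTRs or sequencing errors. The names `SplitCall`, `classify_split_read` and `min_anchor`, and the short sequences, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplitCall:
    kind: str      # "deletion" or "insertion"
    ref_pos: int   # 0-based reference coordinate of the breakpoint
    length: int    # number of deleted or inserted bases
    sequence: str  # deleted reference bases, or inserted read bases


def classify_split_read(ref: str, read: str, min_anchor: int = 6) -> Optional[SplitCall]:
    """Toy split-read caller based on exact matching (an assumption, not a real aligner)."""
    if read in ref:
        return None  # the read aligns in one piece, so there is nothing to split
    n = len(read)
    # Try the longest left anchor first; both anchors must be >= min_anchor bases.
    for k in range(n - min_anchor, min_anchor - 1, -1):
        a = ref.find(read[:k])
        if a < 0:
            continue
        bp = a + k  # reference position just after the left anchor
        # Deletion: the rest of the read matches further downstream in the reference.
        b = ref.find(read[k:], bp)
        if b > bp:
            return SplitCall("deletion", bp, b - bp, ref[bp:b])
        # Insertion: a read suffix resumes exactly at the breakpoint,
        # leaving unmatched read bases between the two anchors.
        for m in range(n - k - 1, min_anchor - 1, -1):
            if ref.startswith(read[n - m:], bp):
                return SplitCall("insertion", bp, n - k - m, read[k:n - m])
    return None


# Hypothetical 28-base reference and two reads built from it.
REF = "ACGTACCTGATTTGGGCCGATCAGTCCA"
DEL_READ = "ACGTACCTGA" + "GATCAGTCCA"             # skips REF[10:18]
INS_READ = "ACGTACCTGA" + "CACACA" + "TTTGGGCCGA"  # extra bases at position 10

print(classify_split_read(REF, DEL_READ))
# SplitCall(kind='deletion', ref_pos=10, length=8, sequence='TTTGGGCC')
print(classify_split_read(REF, INS_READ))
# SplitCall(kind='insertion', ref_pos=10, length=6, sequence='CACACA')
```

Taking the longest left anchor first places the breakpoint as far right as the matching allows. A real pipeline would also have to handle repeated anchors, mismatches and microhomology at the breakpoint, none of which this sketch attempts.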
Type of Variation   Coding_Exonic   Coding_Intronic   Noncoding   Total
SV_deletion         17              892               1591        2500
SV_insertion        2               897               1459        2358
SV_VNTRs            8               966               1461        2435

Species       Cell line       Number of reads
Human         HAPMAP 17109    1,841,054
Human         HAPMAP 17119    5,977,374
Human         HAPMAP 11321    4,488,765
Human         HAPMAP 07340    3,728,821
Human         HAPMAP 10470    557,845
Human         Celera HuAA     2,788,046
Human         Celera HuBB     19,397,599
Human         Celera HuCC     1,745,337
Human         Celera HuDD     2,011,152
Human         Celera HuFF     1,507,522
Total Human                   44,043,515
Chimpanzee    Clint           30,838,333
Total Reads                   74,881,848

Genes Affected by Detected Variants

Columns: Type of Variation, Chr, Name of the gene, offset_start, offset_end (chromosome and offset cells are incomplete in the source; entries are in source order).
SV_deletion: 1, SEC22L1, 10, ENSG, SFMBT2, 11, ENK17_HUMAN, FAM55A, 16, UBN1, 18, DHFR, 21, BAGE4, 22, ARVCF, GSTT2, Q8N7Q6_HUMAN, XP_, 3, PFN2, Q96EG4_HUMAN, 4, Q9UN78_HUMAN, 6, ENSG, KIAA1949
SV_insertion: 12, KRT4, 2, ENSG
SV_VNTRs: 13, ENSG, Q8WYY0_HUMAN, NP_, 655266, 655362, 17, KRTAP4-10, ENOSF1, 702428, 702517, CU025_HUMAN, KIF25, X, FMR1NB

Figure 3. Data overlaps among three datasets: A – Devine_lab [5]; B – This Study; C – dbRIP [7]. (a) Deletion; (b) Insertion.
[Venn diagram counts as extracted near panel (a): 230, 331, 903, 658, 296, 1171]

Chromosomes, Reads and Structural Variants

Columns in the source: Chr, No Reads, Chr_Size, Deletion, Deletion/Size, Insertion, Insertion/Size, VNTRs, VNTRs/Size, Total, Total/Size. The Chr_Size and per-chromosome ratio values did not survive; rows with all variant counts present are tabulated below.

Chr   No Reads   Deletion   Insertion   VNTRs   Total
1                194        191         182     567
2                147        146         127     420
3                128        114         123     365
4                139        104         115     358
5                110        80          86      276
8                112        83          74      269
11               152        168         130     450
14    886057     58         48          55      161
15    959438     64         72          71      207
16    858369     70         51          85      206
17    848567     61         50          78      189
22    811301     54         62          88      204

Rows with cells missing in the source (surviving values in source order): 6: 224, 219, 662; 7: 154, 132, 400; 9: 119, 100, 333; 10: 149, 158, 456; 12: 142, 430; 13: 141, 416; 18: 744095, 57, 155; 19: 636170, 41, 35; 20: 82, 81, 245; 21: 399977, 37; X: 105, 69, 255; Y: 96878.

Totals: Deletion 2500 (Deletion/Size 0.813); Insertion 2358 (Insertion/Size 0.767); VNTRs 2435 (VNTRs/Size 0.791); Total 7293 (Total/Size 2.371).

References

1. Inoue, K. & Lupski, J. R. Molecular mechanisms for genomic disorders. Annu. Rev. Genomics Hum. Genet. 3, 199–242 (2002).
2. Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 33 Suppl, 228–237 (2003).
3. Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004).
4. Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951 (2004).
5. Bennett, E. A. et al. Natural genetic variation caused by transposable elements in humans. Genetics 168, (2004).
6. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).
7. Wang, J., Song, L., Grover, D., Azrak, S., Batzer, M. A. & Liang, P. dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans. Hum. Mutat. 27, (2006).

Acknowledgement: The project is funded by the Wellcome Trust.

