1 Genes and MS in Tasmania, cont. Lecture 6, Statistics 246 February 5, 2004.



2 Nature and number of relatives needed to give accurate haplotypes. Exercise. Explain why, when we have both sets of parental genotypes and the markers are reasonably polymorphic, we can reconstruct an individual’s haplotypes with high probability. What are the difficult cases? If we have no parents, or just one parent, and grandparents’, siblings’ or offspring’s genotypes are available, which are most informative for reconstructing an individual’s haplotypes?
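As a starting point for the exercise, here is a minimal sketch of single-marker phasing in a parent-offspring trio. The function name `phase_child` and the genotype values are illustrative assumptions, not part of the lecture; the sketch only shows why polymorphic markers help and which configurations stay ambiguous.

```python
def phase_child(child, father, mother):
    """Phase one marker in a trio (hypothetical helper).

    Each genotype is an unordered pair of allele labels.
    Returns (paternal_allele, maternal_allele) when exactly one
    assignment is compatible with the parents, otherwise None
    (either ambiguous or inconsistent with Mendelian inheritance).
    """
    a, b = child
    options = set()
    # Try both ways of assigning the child's two alleles to the parents.
    for pat, mat in ((a, b), (b, a)):
        if pat in father and mat in mother:
            options.add((pat, mat))
    if len(options) == 1:
        return options.pop()
    return None


# Informative marker: the parents carry different alleles.
print(phase_child((5, 8), (5, 5), (2, 8)))
# Difficult case: child and both parents share the same heterozygous genotype.
print(phase_child((1, 2), (1, 2), (1, 2)))
```

With highly polymorphic markers the second situation is rare at any one marker, and neighbouring informative markers usually resolve it.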

3 Simulation Study. We simulated many different types of pedigrees, 300 times each, to see which constellations of relatives give the best opportunity of reconstructing haplotypes correctly. Ranking the contributions in order of importance (assuming that the proband has been genotyped): 1. Parents; 2. Grandparents & Siblings; 3. Offspring.

4 Genotyping. We used STR (short tandem repeat) markers, also known as microsatellites. …AGCTAGCGCGC….GCGCGGCATTA… …AGCTAGCGCGC….GCGCGGCGCATTA… Eventual plan: 5 cM genome-wide scan (~800 markers) with dinucleotide STRs.

5 Data collected in the Tasmanian MS Study

6 MS study in Tasmania: data. Collected 170 (out of an estimated 300) MS cases and 105 controls, and a constellation of ~4 relatives for each. Created a case/control study with 338 case haplotypes and 208 control haplotypes. Genotyping carried out at the Australian Genome Research Facility: almost 1 million genotypes (the 2nd largest genotyping project ever carried out in Australia).

7 Relatives of cases and controls
              Cases (170)   Controls (105)
Grandparents  29 (4%)       12 (3%)
Parents       174 (22%)     123 (30%)
Siblings      374 (46%)     215 (52%)
Spouse        67 (7%)       14 (3%)*
Offspring     168 (20%)     50 (12%)
Other         17 (1%)       0
Total         809           414*

8 Some issues associated with the data preparation

9 Errors, errors and errors. Marker location errors: allocation to wrong chromosome, wrong order, map distances out; maps: Généthon (Dib et al., 1996), Marshfield (Broman et al., 1998), DeCODE (Kong et al., 2001), which included a physical map. Pedigree (relationship) errors: PREST (McPeek & Sun). Genotyping errors (caused by assay or analysis): ones causing Mendelian inconsistencies and ones which don’t; PEDCHECK (O’Connell), SIBMED (Douglas et al.), MERLIN (Abecasis et al.). Data handling errors (e.g. mixed-up samples). Binning (allele labelling) errors: inconsistencies over time.

10 Error checking: a little detail. With genome-wide genotypes, moderately close relationships can be confirmed or falsified: 7 parentage errors (6 incorrect fathers, 1 incorrect mother). Mix-ups typically stand out: 2 DNA sample swaps, 2 duplicate samples, 1 case of contaminated DNA, 1 adopted child unrelated to anyone else. Mendelian checks picked up many genotyping errors: 1,472 inconsistencies (0.15% genotyping errors); 15 markers removed; using Mendel on the X found 3 data entry errors and 4 cases where the recorded sex was wrong. Multilocus methods can pick up more, in effect identifying close double recombinants: 58 errors inferred by this method and set to missing. Other errors demanded special methods.
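The single-marker Mendelian check mentioned above can be sketched as follows. This is a minimal illustration for autosomal markers in a trio, assuming genotypes are pairs of integer allele labels with 0 meaning missing; the function names and the example genotypes are hypothetical, and this is not the algorithm used by PEDCHECK or any other named package.

```python
def mendel_consistent(child, father, mother):
    """True if the child's genotype at one marker could come from the parents.

    Genotypes are pairs of allele labels; 0 denotes a missing allele and is
    treated as compatible with anything (assumption of this sketch).
    """
    def could_give(parent, allele):
        return allele == 0 or 0 in parent or allele in parent

    a, b = child
    return ((could_give(father, a) and could_give(mother, b)) or
            (could_give(father, b) and could_give(mother, a)))


def find_inconsistencies(child, father, mother):
    """Indices of markers at which the trio shows a Mendelian inconsistency."""
    return [i for i, (c, f, m) in enumerate(zip(child, father, mother))
            if not mendel_consistent(c, f, m)]


# Hypothetical trio genotyped at three markers.
child = [(5, 8), (10, 11), (1, 2)]
father = [(5, 5), (4, 10), (3, 3)]
mother = [(2, 8), (9, 11), (0, 0)]
print(find_inconsistencies(child, father, mother))
```

A check like this only catches errors that break Mendelian rules at a single marker, which is why the multilocus methods on the slide find additional ones.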

11 Unforeseen Problem. Marker binning was not consistent over time: genotyping at 796 markers took over 2 years. Heuristic approach: look at all markers with allele bin differences of 1 bp; seek large frequency differences (χ², allele by box); carry out an allele binning slippage test for pairs of adjacent alleles and boxes (χ²). Markers were flagged if any of the above applied, and examined for systematic trends. A founder is an individual with no parent in the sample.
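The χ² comparison of allele counts across genotyping boxes can be sketched with a plain Pearson statistic on a box-by-allele table. The counts below are made up for illustration, the function name is hypothetical, and the lecture does not say exactly how the test was set up, so treat this as one plausible reading.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    `table` is a list of rows (e.g. one row per genotyping box, one column
    per allele). Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical founder allele counts for two adjacent alleles (columns)
# in three boxes (rows); the last box looks shifted.
counts = [[30, 20],
          [28, 22],
          [5, 45]]
stat, df = chi_square(counts)
print(stat, df)
```

A large statistic relative to its degrees of freedom would flag the marker for the visual inspection of trends described on the slide.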

12 Example output showing partial allele slippage. Absolute frequencies for a given allele (106) in each box are shown in (time) order of genotyping. Alleles in size order. Summary information. Note slippage of allele 104 into allele 106 for Box 7 (yellow). (Time) order of genotyping: Box 1, 2+3, 5, 6, 7, 21-23, 24-27. Numbers indicate number of individuals in each box.

13 Isolated bin (probably slippage)

14 Example of highly polymorphic marker

15 Box 1 - all alleles shifted +1bp

16 Box 1 Alleles 150, 154 shifted +1bp

17 Fixing allele calls. Need to track changes carefully.

18 Obtaining haplotypes. Haplotypes were reconstructed using the Lander-Green-Kruglyak algorithm (Genehunter/Merlin/Allegro). We’ll go into the details of the algorithm later this lecture, or in the next. Appropriate case and control datasets with these haplotypes were then prepared. Here’s how, from Genehunter output.

19 Genehunter Output
The genotype data for family MS003 (input):
MS003 301 303 302 2 2 5 8 10 11 5 7 3 4 3 5 7 9 1 8 1 6 2 3 5 6 5 5
MS003 302 0 0 2 1 2 8 9 11 0 0 7 4 0 0 0 0 0 0 1 6 0 0 0 0 5 6
MS003 303 0 0 1 1 5 5 4 10 0 0 7 3 0 0 0 0 0 0 1 1 0 0 0 0 5 8
MS003 304 303 302 2 0 2 5 4 9 5 5 7 7 1 4 4 6 4 5 1 1 3 4 4 6 6 8
MS003 305 303 302 1 0 5 8 10 11 5 7 3 4 3 5 7 9 1 8 1 6 2 3 5 6 5 5
The haplotype reconstruction for family MS003 (output):
***** MS003 0.000
302 0 0 1
  8 11 7 4 5 9 8 6 3 6 5   (mother’s transmitted haplotype)
  2 9 5 7 4 6 5 1 4 6 6    (untransmitted haplotype)
303 0 0 1
  5 10 5 3 3 7 1 1 2 5 5   (father’s transmitted haplotype)
  5 4 5 7 1 4 4 1 3 4 8    (untransmitted haplotype)
301 303 302 2              (proband)
  5 10 5 3 3 7 1 1 2 5 5
  8 11 7 4 5 9 8 6 3 6 5
304 303 302 0
  5 4 5 7 1 4 4 1 3 4 8
  2 9 5 7 4 6 5 1 4 6 6
305 303 302 0
  5 10 5 3 3 7 1 1 2 5 5
  8 11 7 4 5 9 8 6 3 6 5

20 Extracting untransmitted haplotypes from GENEHUNTER. Three types of controls: untransmitted haplotypes (akin to controls in TDT); haplotypes from controls matched to the affecteds; random controls. To derive the untransmitted haplotypes: use GENEHUNTER to generate the haplotypes (creates haplo.dump file); extract the untransmitted haplotype: use reconstructed haplotypes of the parents to find the untransmitted haplotype of the affected by negation.
Example:
Affected’s haplotypes:
  2 5 2 12 10 6 8
  1 2 3 13 12 9 7
Haplotypes of parents of affected:
  2 5 2 12 10 6 8
  2 3 0 0 1 0 2
  1 2 2 1 1 7 7
  1 2 3 13 12 9 8
Untransmitted haplotypes are: 2 3 0 0 1 0 2, 1 2 2 1 1 7 8
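The "negation" step can be sketched as follows: at each marker, the untransmitted allele is whichever parental allele was not passed on. The function name is hypothetical, and the pairing of the slide's numbers into parents and transmitted haplotypes is an assumption (it is the grouping consistent with the untransmitted haplotypes stated on the slide). Parsing of haplo.dump is not shown.

```python
def untransmitted_haplotype(parent_haps, transmitted):
    """Derive a parent's untransmitted haplotype by per-marker negation.

    parent_haps: the parent's two reconstructed haplotypes (lists of alleles).
    transmitted: the haplotype the affected child received from this parent.
    Returns 0 (missing) at any marker where the transmitted allele matches
    neither parental allele.
    """
    hap1, hap2 = parent_haps
    result = []
    for a, b, t in zip(hap1, hap2, transmitted):
        if t == a:
            result.append(b)
        elif t == b:
            result.append(a)
        else:
            result.append(0)
    return result


# Numbers from the slide's example, grouped under the assumption stated above.
parent_a = ([2, 5, 2, 12, 10, 6, 8], [2, 3, 0, 0, 1, 0, 2])
parent_b = ([1, 2, 2, 1, 1, 7, 7], [1, 2, 3, 13, 12, 9, 8])
affected = ([2, 5, 2, 12, 10, 6, 8], [1, 2, 3, 13, 12, 9, 7])

print(untransmitted_haplotype(parent_a, affected[0]))
print(untransmitted_haplotype(parent_b, affected[1]))
```

Working marker by marker rather than picking a whole parental haplotype matters when the transmitted haplotype is a recombinant, as appears to be the case at the last marker of the second parent.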

21 Assessing haplotype sharing

22 Nonparametric haplotype sharing analysis. Why nonparametric, rather than likelihood-based methods? Likelihood methods make many assumptions regarding the genealogy of the population. We don’t know how many of these assumptions are robust to violations. Likelihood methods are computationally intensive, perhaps prohibitively so, especially for genome-wide scans, where there is a need to maximize over the very large state space of possible ancestral haplotypes (MCMC). Likelihood methods have a hard time at HLA because the LD there is extremely high and non-uniform (block-like structure). Simpler statistics will do better here unless we can model background LD.

23 Haplotype sharing statistics for genome-wide scan data, cf. fine mapping. Previous statistics (mainly likelihood-based) concentrated on fine mapping and the exact localization of the variant. They assume a signal exists. For us, localization is not the primary interest; detection is the main interest in a genome-wide scan.

24 Towards a sharing statistic. Our aim was to come up with a statistic that describes haplotype sharing effectively. At the markers closest to the disease locus the sharing statistic should be particularly large, as there the haplotype sharing should extend the furthest, and the association of disease with particular haplotypes (alleles in the single-marker case) should be strongest. We needed something that was not as computationally intensive as e.g. DHSMAP or BLADE.

25 Testing for shared haplotypes. Example haplotypes from cases and controls (13 markers each, in the order they appear on the slide):
3 5 9 8 7 6 10 1 5 4 3 2 5
3 2 1 3 7 6 10 1 5 4 1 3 2
1 2 1 3 5 6 10 1 5 2 1 3 4
2 3 7 3 1 6 10 9 1 1 2 5 6
5 9 1 1 4 1 3 1 2 3 1 9 8
7 6 5 3 1 3 2 1 5 9 7 9 1
7 1 2 1 1 3 5 7 1 5 1 3 2
9 3 9 2 1 2 7 5 3 4 2 2 5
[Plot: score for haplotype sharing (-log p) along the chromosome, pter to qter.]

26 Sharing drop-off & allelic heterogeneity. [Figure: proportions of cases and proportions of controls at markers 1 to 4, coloured by cluster 1 haplotypes, cluster 2 haplotypes, and neither cluster 1 nor 2 haplotypes.]

27 Haplo_clusters (Melanie Bahlo). Calculates a sharing statistic at every marker. Assigns significance using a permutation test. Allows for several clusters of ancestral haplotypes (allelic heterogeneity).
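To make the idea of a per-marker sharing statistic with permutation-based significance concrete, here is a minimal sketch. It is not the Haplo_clusters statistic, whose definition is not given in these slides: the statistic used here (difference in mean pairwise identical-by-state run length around a marker, cases minus controls) is a stand-in chosen for simplicity, and all names and data are hypothetical. Missing alleles and multiple haplotype clusters are ignored.

```python
import random
from itertools import combinations


def shared_length(h1, h2, m):
    """Number of consecutive markers around m at which h1 and h2 match.

    Returns 0 if the two haplotypes differ at marker m itself.
    """
    if h1[m] != h2[m]:
        return 0
    left = m
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = m
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1


def mean_sharing(haps, m):
    """Mean shared length at marker m over all pairs (needs >= 2 haplotypes)."""
    pairs = list(combinations(haps, 2))
    return sum(shared_length(a, b, m) for a, b in pairs) / len(pairs)


def sharing_stat(cases, controls, m):
    """Excess sharing among case haplotypes relative to controls at marker m."""
    return mean_sharing(cases, m) - mean_sharing(controls, m)


def permutation_pvalue(cases, controls, m, n_perm=1000, seed=0):
    """One-sided permutation p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = sharing_stat(cases, controls, m)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if sharing_stat(pooled[:n_cases], pooled[n_cases:], m) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: cases share a segment around marker index 3.
cases = [[3, 2, 6, 10, 1, 5, 4], [1, 3, 6, 10, 1, 5, 2],
         [2, 1, 6, 10, 1, 5, 3], [4, 4, 6, 10, 1, 9, 1]]
controls = [[7, 1, 2, 1, 3, 5, 7], [9, 3, 9, 2, 1, 2, 7],
            [5, 3, 4, 2, 2, 5, 1], [1, 5, 1, 3, 9, 9, 2]]
for m in range(7):
    print(m, sharing_stat(cases, controls, m),
          permutation_pvalue(cases, controls, m, n_perm=200))
```

Permuting labels keeps the background LD of the pooled haplotypes intact, which is one reason a permutation test suits regions where LD is hard to model.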

