ANALYSIS OF SELECTIVE DNA POOLING DATA IN FOX
Joanna Szyda, Magdalena Zatoń-Dobrowolska, Heliodor Wierzbicki, Anna Rząsa
MAIN OBJECTIVES:
- ASSESS POLYMORPHISM OF MICROSATELLITES
- IDENTIFY MARKER-TRAIT ASSOCIATIONS
METHODOLOGICAL OBJECTIVES:
- TOOLS FOR THE ANALYSIS OF SPARSE DATA
SELECTIVE (INDIVIDUAL) GENOTYPING
MATERIAL | METHODS | RESULTS | CONCLUSIONS
[Figure: groups labelled qq and QQ]
- MORE POWER
- STANDARD (LINEAR) MODELS NOT VALID
SELECTIVE DNA POOLING
[Figure: qq and QQ pools; marker loci M1 to M4 (alleles M, m) and a QTL]
SELECTIVE DNA POOLING
- CHEAP: ~18%-60% more efficient (Barratt et al. 02)
- MORE POWERFUL: ~10%-70% fewer individuals
- HIGH TECHNICAL ERROR:
  - DNA pool formation (DNA quantification)
  - DNA amplification (differential amplification, shadow bands)
- POOLING POPULATIONS: no relationship information, testing for association
- POOLING HALFSIBS: partial relationship information, testing for linkage
ANIMALS
POLAR FOX (Alopex lagopus)
- NORWEGIAN TYPE “LARGE”
- FINNISH TYPE “SMALL”
63 77
MARKERS

MARKER      DOG GENOME   FOX GENOME   HET.
REN112I02   01           ?            0.76
C                        ?            0.77
C                        ?            0.76
FH                       ?            0.86
C                        ?            0.81
FH                       ?            0.82
C                        ?            0.79
G                        ?            0.64
REN153O12   12           ?            0.76
REN227M12   13           ?            0.74
FH                       ?            0.70
REN275L19   16           ?            0.82
FH                       ?            0.77
REN100J13   20           ?            0.83
REN128E21   22           ?            0.70
LEI002      27           ?            0.70
REN248F14   30           ?            0.70
REN43H24    31           ?            0.66
REN106I07   36           ?            0.78
REN67C18    37           ?            0.83
MARKERS
MARKER SELECTION CRITERIA:
- POLYMORPHISM: number of alleles, allele lengths
- AMPLIFICATION PROPERTIES: temperature
- ?
MARKER ALLELE FREQUENCY IN POOLS
MARKER ALLELE FREQUENCY IN POOLS
- LOW POLYMORPHISM WITHIN EACH POOL
- “POOL-SPECIFIC” ALLELES
- POOR CORRESPONDENCE BETWEEN REPLICATES
BINOMIAL DISTRIBUTION
BINOMIAL DISTRIBUTION: Odds Ratio, Logistic Regression

allele   pool 1   pool 2
1        n_11     n_12
2        n_21     n_22
3        n_31     n_32
4        n_41     n_42
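ODDS RATIO
For a 2x2 table of counts (two alleles by two pools):
- ln(OR): $\ln(OR) = \ln \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$
- distribution: $\ln(OR) / \sqrt{\mathrm{var}[\ln(OR)]} \sim N(0,1)$ under no association
- variance: $\mathrm{var}[\ln(OR)] = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$
- confidence intervals: $\ln(OR) \pm z_{\alpha/2} \sqrt{\mathrm{var}[\ln(OR)]}$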
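The odds ratio slide can be illustrated with a minimal Python sketch. It assumes the usual Wald formulas for a 2x2 table of allele counts; the function names and the counts in the example are hypothetical and are not taken from the fox data.

```python
import math
from statistics import NormalDist


def log_odds_ratio(n11, n12, n21, n22):
    """ln(OR) for a 2x2 table (rows: two alleles, columns: two pools)."""
    return math.log((n11 * n22) / (n12 * n21))


def wald_summary(n11, n12, n21, n22, alpha=0.01):
    """Return ln(OR), its Wald confidence interval and a two-sided p-value."""
    ln_or = log_odds_ratio(n11, n12, n21, n22)
    # Large-sample variance of ln(OR): sum of reciprocal cell counts.
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = ln_or / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ln_or, (ln_or - z_crit * se, ln_or + z_crit * se), p_value


# Hypothetical allele counts, for illustration only.
ln_or, ci, p_value = wald_summary(30, 10, 12, 28)
print(ln_or, ci, p_value)
```

All four cells must be non-zero here; tables with empty cells need one of the sparse-data corrections.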
ODDS RATIO IN SPARSE DATA
$\ln(OR) = \ln \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$
SPARSE DATA PROBLEM: add a constant c to each cell
$\ln(OR) = \ln \frac{(n_{11}+c)(n_{22}+c)}{(n_{12}+c)(n_{21}+c)}$
- c = 0: standard
- c = 0.5: Haldane (55)
- $c_{ij} = 2\,(n_{i.}\, n_{.j} / n^2)$: Bishop (75)
Agresti (99):
- c = 0.5 not valid for ln(OR) > 4
- c_ij not valid for ln(OR) > 8
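The three choices of c can be compared with a short Python sketch. It assumes the cell-specific constant c_ij is added to its own cell n_ij; the function name and the example table are hypothetical.

```python
import math


def smoothed_log_or(table, method="haldane"):
    """ln(OR) of a 2x2 table [[n11, n12], [n21, n22]] after adding c to each cell.

    method: "standard" (c = 0), "haldane" (c = 0.5),
            "bishop" (c_ij = 2 * n_i. * n_.j / n^2).
    """
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    if method == "standard":
        c = [[0.0, 0.0], [0.0, 0.0]]
    elif method == "haldane":
        c = [[0.5, 0.5], [0.5, 0.5]]
    elif method == "bishop":
        c = [[2 * rows[i] * cols[j] / n ** 2 for j in range(2)] for i in range(2)]
    else:
        raise ValueError("unknown method: " + method)
    num = (n11 + c[0][0]) * (n22 + c[1][1])
    den = (n12 + c[0][1]) * (n21 + c[1][0])
    # Without smoothing an empty cell makes ln(OR) infinite or undefined.
    if num == 0 and den == 0:
        return math.nan
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log(num / den)


# Hypothetical sparse table with one empty cell.
sparse = [[12, 0], [5, 9]]
results = {m: smoothed_log_or(sparse, m) for m in ("standard", "haldane", "bishop")}
print(results)
```

For this table the uncorrected estimate is infinite, while both corrections give finite values that differ from each other, which is the behaviour the validity limits quoted from Agresti (99) refer to.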
ODDS RATIO: P-values
ODDS RATIO - CI
[Figure panels: 0.01 CI FOR “CONCORDANT” POOLS; 0.01 CI FOR “DISCORDANT” POOLS]
ODDS RATIO - REMARKS
- many 2x2 comparisons (theoretically) possible: from 18 (m4) to 60 (m1, m6)
- significance pattern often inconsistent between alleles (sparse data)
- difficult to summarize ORs with a single value

marker      result
C           association
C           association
C           ?
REN227M12   no association
REN275L19   ? (sparse data)
LEI002      ? (sparse data)
FURTHER WORK
- use all table cells
- account for sparseness in testing
- multivariate logistic models
MULTINOMIAL DISTRIBUTION
MULTINOMIAL DISTRIBUTION: Multivariate Logistic Regression

allele   pool 1   pool 2
1        n_11     n_12
2        n_21     n_22
3        n_31     n_32
4        n_41     n_42

allele   pool 1   pool 2   pool 3   pool 4   pool 5
1        n_11     n_12     n_13     n_14     n_15
2        n_21     n_22     n_23     n_24     n_25
3        n_31     n_32     n_33     n_34     n_35
4        n_41     n_42     n_43     n_44     n_45
MODEL
- GENERAL LOGISTIC MODEL
- CONSIDERED MODELS FOR ALLELE FREQUENCIES
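TEST STATISTIC
MODEL SELECTION: DATA (observed frequencies $n_i$) vs. MODEL (estimated frequencies $m_i$)
POWER DIVERGENCE FAMILY, Cressie, Read (1984):
$D(\lambda) = \frac{2}{\lambda(\lambda+1)} \sum_i n_i \left[ \left( \frac{n_i}{m_i} \right)^{\lambda} - 1 \right]$
- $\lambda = 1$: Pearson’s $X^2$
- $\lambda \to 0$: Likelihood Ratio Test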
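The power divergence family of Cressie and Read (1984) can be sketched in a few lines of Python. The function name and the observed and estimated frequencies below are hypothetical; the sketch covers lambda > -1 only and assumes the observed and estimated frequencies have the same total.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read statistic D(lam) for observed counts n_i and estimated counts m_i."""
    if lam <= -1:
        raise ValueError("this sketch covers lam > -1 only")
    if abs(lam) < 1e-12:
        # Limit lam -> 0: likelihood ratio statistic, empty cells contribute 0.
        return 2 * sum(n * math.log(n / m)
                       for n, m in zip(observed, expected) if n > 0)
    total = sum(n * ((n / m) ** lam - 1)
                for n, m in zip(observed, expected) if n > 0)
    return 2 * total / (lam * (lam + 1))


# Hypothetical frequencies, for illustration only.
obs = [10, 20, 30]
est = [20, 20, 20]
pearson = power_divergence(obs, est, 1)   # equals sum (n - m)^2 / m
lrt = power_divergence(obs, est, 0)       # likelihood ratio statistic
print(pearson, lrt)
```

With equal totals, lambda = 1 reproduces Pearson's statistic exactly, and values of lambda close to 0 approach the likelihood ratio statistic.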
TEST STATISTIC
NORMALISATION
- SPARSE DATA !
- INCREASING CELLS ASYMPTOTICS !
- ?
TEST STATISTIC
ANALYTICAL
- Osius, Rojek (1989): $D(\lambda = 1)$
- Farrington (1996): $D(\lambda = 1)$ +
- Copas (1989): $a \cdot D(\lambda = 1)$
EMPIRICAL
- Bootstrap, Jackknife
EVALUATION OF REAL DATA
NORMAL PROPERTIES - simulation
D ?   D ?
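The empirical route can be sketched as a parametric bootstrap: tables are simulated from the estimated frequencies, D(lambda = 1) is recomputed for each, and the observed value is standardised by the simulated mean and standard deviation. This is one possible scheme under stated assumptions, not necessarily the procedure used by the authors; all names and numbers are hypothetical.

```python
import random
from statistics import mean, stdev


def pearson_d(observed, expected):
    """D(lambda = 1), i.e. Pearson's X^2, for counts with equal totals."""
    return sum((n - m) ** 2 / m for n, m in zip(observed, expected))


def bootstrap_standardised_d(observed, expected, n_boot=2000, seed=1):
    """Standardise D by its bootstrap mean and SD; also return an empirical p-value."""
    rng = random.Random(seed)
    total = sum(observed)
    cells = range(len(expected))
    sims = []
    for _ in range(n_boot):
        # Draw one multinomial table with cell probabilities proportional to `expected`.
        counts = [0] * len(expected)
        for cell in rng.choices(cells, weights=expected, k=total):
            counts[cell] += 1
        sims.append(pearson_d(counts, expected))
    d_obs = pearson_d(observed, expected)
    z = (d_obs - mean(sims)) / stdev(sims)
    p_emp = (1 + sum(s >= d_obs for s in sims)) / (n_boot + 1)
    return d_obs, z, p_emp


# Hypothetical sparse table: 12 observations spread over 8 cells.
obs = [3, 0, 1, 0, 6, 2, 0, 0]
est = [1.5] * 8
d_obs, z, p_emp = bootstrap_standardised_d(obs, est)
print(d_obs, z, p_emp)
```

Because the reference distribution is simulated at the actual sample size, the sketch does not rely on large-cell asymptotics, which is the motivation for empirical normalisation in sparse tables.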
LITERATURE
Agresti, A. (1990) Categorical data analysis. New York, Chichester, Brisbane, Toronto, Singapore. John Wiley & Sons.
Agresti, A. (1999) On logit confidence intervals for the odds ratio with small samples. Biometrics 55:
Barratt, B. J., Payne, F., Rance, H. E., Nutland, S., Todd, J. A., Clayton, D. G. (2002) Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Annals of Human Genetics 66:
Bishop, Y.M.M., Fienberg, S.E., Holland, P. (1975) Discrete multivariate analysis. Cambridge, Massachusetts: MIT Press.
Copas, J.B. (1989) Unweighted Sum of Squares Test for Proportions. Applied Statistics 38:
Cressie, N.A.C., Read, T.R.C. (1984) Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society Ser.B 46:
Farrington, C.P. (1996) On assessing goodness of fit of generalized linear models to sparse data. Journal of the Royal Statistical Society Ser.B 58:
Haldane, J.B.S. (1956) The estimation and significance of the logarithm of a ratio of frequencies. Annals of Human Genetics 20:
Osius, G., Rojek, D. (1989) Normal goodness-of-fit tests for parametric multinomial models with large degrees of freedom. Fachbereich Mathematik/Informatik, Universitaet Bremen. Mathematik Arbeitspapiere 36: