PHOEBE
UK Biobank: how big is “big”?
  Paul Burton
  Dept of Health Sciences, Dept of Genetics
  University of Leicester
Structure of talk
  What is UK Biobank?
  How big is “big”?
  Expected event rates in UK Biobank
  Biobank harmonisation
What is UK Biobank?
Basic design features
  A prospective cohort study
    500,000 adults across the UK
    Middle-aged (40-69 years)
  A population-based biobank
    Not disease or exposure based
    “Broad spectrum”, not “fully representative”
  Individuals, not families
  MRC, Wellcome Trust, DH, Scottish Executive: £61M
Basic design features
  Longitudinal health tracking
  Nested case-control studies
  Long time-horizon
  Owned by the Nation
  Central administration: Manchester
  PI: Prof Rory Collins, Oxford
  6 collaborating groups (RCCs) of university scientists
Justification for UK Biobank
  Primary justifications
    Roles that can best be fulfilled by a new large cohort study of the type represented by UK Biobank
  Secondary justifications
    Roles that could be filled by other types of study but, given that UK Biobank is to go ahead anyway, can be taken on at relatively low marginal cost
A platform for research in biomedical science
  Studies of the joint effects of genes and environment/life-style
  Genotype-based studies***
  The genetics of disease progression
  Direct association of genes with disease***
  Universal controls
  Family-based studies
How big is “big”?
  With Anna Hansell, Imperial College
The statistical power of case-control studies
  Contemporary pre-eminence of genetic association studies rather than genetic linkage studies
  Covers both stand-alone case-control studies and nested case-control studies in large cohorts
  Determining sample size in both settings
Power calculations
  Work with the least powerful setting
    Binary disease, binary genotype, binary environmental exposure
  Logistic regression; interactions = departure from a multiplicative model
  Complexity (arbitrary but realistic)
Summarise power using minimum detectable odds ratios (MDORs) calculated by ‘iterative simulation’
  Estimate the minimum ORs detectable with 80% power at a stated level of statistical significance under a specified scenario
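The idea of estimating an MDOR by iterative simulation can be sketched as follows. This is a minimal illustration, not the calculation behind the talk: it assumes a single binary genotype, no misclassification, no interactions and no other sources of complexity, and it uses a Wald test on the 2x2 table in place of a full logistic regression. The function names, the default number of simulations and the bisection bounds are all hypothetical choices made for the example, so it should not be expected to reproduce the figures on the slides.

```python
import math
import random
from statistics import NormalDist


def exposed_prob_in_cases(p_controls, odds_ratio):
    """Genotype frequency among cases implied by the odds ratio."""
    odds = odds_ratio * p_controls / (1.0 - p_controls)
    return odds / (1.0 + odds)


def simulate_power(n_cases, n_controls, p_controls, odds_ratio,
                   alpha=1e-4, n_sims=200, rng=None):
    """Fraction of simulated case-control studies that reject OR = 1.

    Assumption: two-sided Wald test on the log odds ratio of the 2x2 table.
    """
    rng = rng or random.Random(0)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_cases = exposed_prob_in_cases(p_controls, odds_ratio)
    hits = 0
    for _ in range(n_sims):
        a = sum(rng.random() < p_cases for _ in range(n_cases))        # exposed cases
        c = sum(rng.random() < p_controls for _ in range(n_controls))  # exposed controls
        # Adding 0.5 to every cell keeps the log odds ratio finite.
        a, b = a + 0.5, n_cases - a + 0.5
        c, d = c + 0.5, n_controls - c + 0.5
        log_or = math.log(a * d / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        if abs(log_or / se) > z_crit:
            hits += 1
    return hits / n_sims


def minimum_detectable_or(n_cases, n_controls, p_controls, target_power=0.8,
                          alpha=1e-4, n_sims=200, lo=1.0, hi=3.0, steps=10,
                          seed=0):
    """Bisect on the odds ratio until simulated power reaches the target."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        # Re-using the seed gives every candidate OR the same random stream,
        # which keeps the estimated power close to monotone in the OR.
        power = simulate_power(n_cases, n_controls, p_controls, mid,
                               alpha, n_sims, random.Random(seed))
        if power >= target_power:
            hi = mid
        else:
            lo = mid
    return hi
```

A call such as `minimum_detectable_or(2000, 2000, 0.1)` returns the smallest odds ratio in the search interval whose simulated power reaches 80% at p = 10^-4. Pure-Python sampling is slow for tens of thousands of cases, so a vectorised sampler would be the natural replacement for realistic sample sizes.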
Summary of results
  80% power for genotype frequency = 0.1

  Effect                 Minimum detectable OR   Significance   Cases
  Genetic main effect    1.5                     p = 10^-4      5,000
  Genetic main effect    1.3                     p = 10^-4      10,000
  Genetic main effect    1.2                     p = 10^-4      20,000
  Genetic main effect    1.4                     p = 10^-7      10,000
  Genetic main effect    1.3                     p = 10^-7      20,000
  G:E interaction*       2.0                     p = 10^-4      20,000

  * Environmental exposure prevalence = 0.2
Expected event rates in UK Biobank
  With Anna Hansell, Imperial College
Taking account of
  Age range at recruitment years
  Recruitment over 5 years
  All cause mortality
  Disease incidence (“healthy cohort effect”)
  Migration overseas
  Withdrawal from the study
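How these factors combine into an expected number of incident cases can be sketched with a simple year-by-year accrual model. This is an illustrative simplification, not the model used for the talk: it assumes constant annual rates (so it ignores ageing and the healthy cohort effect), enrols each year's recruits at the start of the year, and folds migration and withdrawal into a single loss rate. The function name and every parameter value a caller supplies are hypothetical.

```python
def expected_cases(n_recruits, incidence, mortality, loss,
                   follow_up_years, recruitment_years=5):
    """Expected cumulative incident cases at the end of each calendar year.

    incidence, mortality and loss are annual probabilities, assumed constant.
    loss stands for migration overseas plus withdrawal from the study.
    """
    per_year = n_recruits / recruitment_years
    at_risk = 0.0
    cumulative = 0.0
    totals = []
    for year in range(follow_up_years):
        if year < recruitment_years:
            at_risk += per_year            # this year's new recruits
        new_cases = at_risk * incidence
        cumulative += new_cases
        at_risk -= new_cases               # cases leave the risk set
        at_risk *= (1.0 - mortality) * (1.0 - loss)
        totals.append(cumulative)
    return totals
```

Reading off the first year in which the returned total passes a target such as 5,000 or 10,000 cases gives the waiting time for a disease with the assumed incidence.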
Smaller sample sizes
Conclusions
  Having taken account of realistic bioclinical complexity, a cohort-based biobank such as UK Biobank needs to be very large if it is to provide a stand-alone infrastructure
  Anything much less than 500,000 recruits will severely curtail the number of diseases that can be studied using that biobank alone
  The value of any biobank will be greatly augmented if it proves possible to set up a coherent and scientifically harmonised international network of biobanks and large cohort studies
Harmonising biobanks internationally
Why harmonise?
  Investigate less common (but not rare!!!) conditions
    UKBB: Ca stomach (stomach cancer), 2,500 cases in 29 years
    6 UKBB equivalents: 10,000 cases in 20 years
  Investigate smaller ORs
    Moving the detectable genetic main effect from 1.5 to 1.2 requires 20,000 cases rather than 5,000: 4 UKBB equivalents
  Analysis based on subsets: homogeneous classes of phenotype, or e.g. by sex
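The arithmetic behind “UKBB equivalents” can be made explicit with a short sketch. The function names are hypothetical. The second function uses the common rule of thumb that, with everything else held fixed, the number of cases needed scales with 1 / (log OR)^2. That rule is an assumption introduced here as a rough cross-check, not the simulation-based method used for the slides.

```python
import math


def biobank_equivalents(required_cases, cases_per_biobank):
    """Number of same-sized biobanks that must be pooled to reach required_cases."""
    return math.ceil(required_cases / cases_per_biobank)


def case_inflation(or_from, or_to):
    """Approximate factor by which cases must grow to detect or_to instead of or_from.

    Assumption: required sample size is proportional to 1 / (log OR)^2.
    """
    return (math.log(or_from) / math.log(or_to)) ** 2


# Going from 5,000 to 20,000 cases at the same point in follow-up.
equivalents = biobank_equivalents(20000, 5000)

# Rule-of-thumb inflation for moving the detectable OR from 1.5 to 1.2.
inflation = case_inflation(1.5, 1.2)
```

The rule of thumb gives an inflation factor a little under 5, the same order as the factor of 4 implied by the rounded MDORs on the slide.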
Why harmonise?
  Earlier analyses
    UKBB: Alzheimer's disease, 10,000 cases in 18 years
    5 UKBB equivalents: 9 years
  Events at younger ages
  Broad range of environmental exposures
  Aim for 4-6 UKBB equivalents: 2M-3M recruits
International biobank harmonisation programs
  Public Population Project in Genomics (P3G)
    Tom Hudson, Bartha Knoppers, Leena Peltonen …
  Population Biobanks FP6 Co-ordination Action (PHOEBE – Promoting Harmonization Of Epidemiological Biobanks in Europe)
    Camilla Stoltenberg, Leena Peltonen, Paul Burton …
  Human Genome Epidemiology Network (HuGENet)
    Muin Khoury, Julian Little …
Extra slides
Examples of some polymorphisms or haplotypes that have shown consistent association with complex disease
  Hattersley AT, McCarthy MI. Lancet 2005;366:
Genetic main effects
Whole genome scan
  Genetic main effect, p < 10^-7
Gene-environment interaction
  20,000 cases
Gene-environment interaction
  10,000 cases
Rarer genotypes
  Genetic main effects
Proposed assessment visit model
  Welcome                     5
  Consent                     5
  Blood/Urine                10
  Touchscreen Questionnaire  25
  Interviewer Questionnaire   5
  Physical Measures          15
  Exit                        5
  TOTAL                      70
Taking account of
  Age range at recruitment years
  Recruitment over 5 years
  All cause mortality
  Disease incidence (“healthy cohort effect”)
  Migration overseas
  Comprehensive withdrawal (max 1/500 p.a.)
  Partial withdrawal (cf. Birth Cohort)
Necessary to contact subjects
Issues that are often ignored in standard power calculations
  Multiple testing/low prior probability of association*
  Interactions*
  Unobserved frailty
  Misclassification*
    Genotype
    Environmental determinant
    Case-control status
  Subgroup analyses*
  Population substructure
Harmonisation
  Prospective
  Retrospective
  Description
  Comparison
  Harmonised synthesis