Big Data in UK Biobank: Opportunities and Challenges


Big Data in UK Biobank: Opportunities and Challenges
Rory Collins
UK Biobank Principal Investigator
BHF Professor of Medicine & Epidemiology
Nuffield Department of Population Health, University of Oxford, UK
Funders: Wellcome Trust and Medical Research Council, with Department of Health, Scottish & Welsh Governments, British Heart Foundation and Diabetes UK

UK Biobank Prospective Cohort
- 500,000 UK men and women aged 40-69 years when recruited and assessed during 2006-2010
- Extensive baseline questions and measurements, with stored biological samples (and opportunities to add enhanced assessments in large subsets)
- Repeat assessments over time in subsets of the participants to allow for sources of variation
- General consent for follow-up through all health records and for all types of health research
- Sufficiently large numbers of people developing different conditions to assess causes reliably

Need for prospective studies to be LARGE: CHD versus SBP for 5K vs 50K vs 500K people in the Prospective Studies Collaboration (PSC)
[Figure: three panels (5000 people; 50,000 people; 500,000 people). Each plots risk on a doubling scale (1 to 256) against usual SBP (120 to 180 mmHg), with one line per age-at-risk group (40-49, 50-59, 60-69, 70-79, 80-89).]
Notes: Data are taken from the Prospective Studies Collaboration meta-analysis co-ordinated by CTSU, Oxford, and illustrate graphically the need for very large sample sizes to study interactions reliably (in this example, the interaction between BP and age). The first graph (All) shows the relationship of systolic BP and age with ischaemic heart disease (IHD, floating absolute risk) in 1 million people, published in the Lancet in 2002, and the other two graphs show the same relationship in randomly selected samples of 500,000 and 50,000. The tight 95% confidence intervals illustrate the reliability of the large sample size in the first two graphs, but the sample of only 50,000 (which is still orders of magnitude larger than many studies) has very wide confidence intervals, particularly in younger people.
Key message: studies at least as large as UK Biobank are required to study gene-environment interactions, and for many interactions reliable evidence will only be generated by combining data from 4-5 biobanks worldwide. This emphasises the need for harmonisation of biobanks at an international level.
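The effect of cohort size on confidence intervals can be illustrated with a rough calculation. The sketch below is not taken from the PSC analysis: it assumes two equal-sized exposure groups, a hypothetical overall event rate of 2%, and the common rare-event approximation that the standard error of a log relative risk is sqrt(1/a + 1/b), where a and b are the numbers of events in each group.

```python
import math

def log_rr_standard_error(n_people, event_rate=0.02):
    """Approximate SE of a log relative risk comparing two equal-sized groups.

    Uses SE = sqrt(1/a + 1/b), where a and b are the event counts in each
    group. event_rate is a hypothetical value chosen for illustration.
    """
    events_per_group = n_people * event_rate / 2
    return math.sqrt(1 / events_per_group + 1 / events_per_group)

def ci_ratio(n_people, event_rate=0.02, z=1.96):
    """Ratio of the upper to the lower 95% confidence limit of the relative risk."""
    return math.exp(2 * z * log_rr_standard_error(n_people, event_rate))

if __name__ == "__main__":
    for n in (5_000, 50_000, 500_000):
        print(f"{n:>7} people: upper/lower CI limit ratio = {ci_ratio(n):.2f}")
```

Under these assumptions each tenfold increase in cohort size shrinks the standard error by a factor of sqrt(10), which is why the intervals within narrow age bands only become tight in the largest panel.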

Locations of UK Biobank assessment centres around the UK (with people recruited from urban and rural areas)

UK Biobank: 500,000 participants aged 40-69 recruited in 2007-10
Characteristic   Group     Participants
Age (years)      40-49     119,000
                 50-59     168,000
                 60-69     213,000
Gender           Male      228,000
                 Female    270,000
Deprivation      More       92,000
                 Average   166,000
                 Less      241,000
Generalisability (not representativeness): heterogeneity of the study population allows associations with disease to be studied reliably

Production line baseline assessment visit (improved throughput; efficient staffing)

Baseline assessment: Questionnaire content
Self-completion topics                        Median time (minutes)
Socio-demographics                             1.7
Ethnicity                                      0.1
Work-employment                                1.4
Physical activity                              4.4
Smoking (non-smokers)                          0.5
Smoking (past/current smokers)                 1.5
Diet (food frequency)*                         4.5
Alcohol                                        1.1
Sleep                                          1.2
Sun exposure                                   1.3
Environmental exposures                        1.0
Early life factors                             0.8
Family history of common diseases              1.6
Reproductive history & screening (women)       2.4
Reproductive history & screening (men)         0.8
Sexual history                                 0.4
General health                                 2.1
Past medical history & medications             1.6
Noise exposure                                 1.0
Psychological status                           4.5
Cognitive function tests                      10.0
Hearing speech-in-noise test                   8.0
Total time                                    52.5
Interview topics                              Median time (minutes)
Medical history/medication                     3.1
Occupation                                     0.4
Other                                          0.6
Total time                                     4.1
*Subset of 200,000 participants: repeated daily diet diaries conducted via the internet
Touchscreen and interview questions (plus extra enhancement questions) available at www.ukbiobank.ac.uk

Baseline assessment: Physical measurements (with enhanced measures in large subsets)
All 500,000 participants:
- Blood pressure & heart rate
- Height (standing/seated)
- Waist/hip circumference
- Weight/impedance
- Spirometry
- Heel ultrasound
Subset of 175,000 participants:
- Hearing test
- Vascular reactivity
Subset of 120,000 participants:
- Visual acuity, refractive index & intraocular pressure
Subset of 85,000 participants:
- Retinal images & optical coherence tomograms
- Fitness test & ECG limb leads

UK Biobank different types of biological sample: allowing a wide range of different assays
Sample collection tube | Fractions collected | Potential assays
Na+ EDTA | Plasma; buffy coat; red cells | Plasma proteome and metabonome; assays of genomic DNA; membrane lipids and heavy metals
Lithium Heparin (PST) | | Plasma proteome and metabonome (without haemolysis)
Silica clot accelerator (SST) | Serum | Serum proteome and metabonome (without haemolysis)
Acid citrate dextrose | Whole blood | Assays of DNA extracted from EBV immortalised cell lines (B-cell transcriptome)
EDTA | | Standard haematological parameters
Tempus RNA stabilisation | Whole blood with lysis reagent | Blood transcriptome; representative transcriptomes of other tissues
Urine | | Urine proteome and metabonome; gut microbiome
Saliva | Mixed saliva sample | Salivary proteome and metabonome; salivary microbiome (mucosal proteome and metabonome)

Further enhancements of the phenotyping of UK Biobank participants currently being conducted
- Web-based assessments of diet completed

Web-based dietary assessment: 24-hr recall
Design considerations:
- Easy and quick: takes only 10-15 minutes
- Automated data collection and coding
- Repeatable (capturing seasonal variation)
- Detailed enough to estimate nutrient intake
Over 200,000 participants completed the questionnaire at least once, and about 90,000 did so more than once

Future web-based assessments for exposures
- Cognitive function
  - Repeat assessment of baseline measures
  - Broaden cognitive phenotyping with new measures
  - Complements enhanced cognitive function assessment that is planned for the imaging assessment visit
- Occupational history
  - Information about all previous occupations (not just latest)
  - Greater detail on type of work and duration
- Physical activity questionnaire (RPAQ)
  - Complement data from activity monitor

Further enhancements of the phenotyping of UK Biobank participants currently being conducted
- Web-based assessments of diet completed; and next to be cognition/mental health (2014)
- Wrist-worn accelerometers to be mailed to all participants who agree to wear one (2013-15)

UK Biobank wrist-worn accelerometer
- ~45% of participants agree to wear one
- Willing participants sent device by mail
- It is to be worn continuously for 7 days
- Returned by mail and data downloaded
- Device cleaned and sent to next participant
- 100K participants from mid-2013 to mid-2015 (50,000 complete data-sets already obtained)

Further enhancements of the phenotyping of UK Biobank participants currently being conducted
- Web-based assessments of diet completed; and next to be cognition/mental health (2014)
- Wrist-worn accelerometers to be mailed to all participants who agree to wear one (2013-15)
- Biobank chip to genotype (GWAS; candidate SNPs; exome) all participants (2013-15)

Genotyping of all UK Biobank participants
- 820K bespoke UK Biobank Affymetrix genotyping chip:
  - 250,000 SNPs in a whole-genome array
  - 200,000 markers for known risk factor or disease associations, copy number variation, loss of function, and insertions/deletions
  - 150,000 exome markers for high proportion of non-synonymous coding variants with allele frequency over 0.02%
- Estimate (“impute”) additional genotypes by combining measured genotypes with reference sequence data
- Researchers can study associations of genotype data with biochemical risk factors and detailed phenotyping from baseline assessment, along with health outcomes
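To give a sense of scale for the 0.02% allele-frequency threshold, the sketch below estimates how many of 500,000 participants would be expected to carry at least one copy of such a variant. It assumes Hardy-Weinberg proportions and independent participants, which is an illustrative simplification and not a statement about how the chip content was chosen.

```python
def expected_carriers(n_people, allele_freq):
    """Expected number of people carrying at least one copy of a variant.

    Assumes Hardy-Weinberg proportions: each person has two independent
    alleles, so P(at least one copy) = 1 - (1 - p)^2.
    """
    carrier_probability = 1 - (1 - allele_freq) ** 2
    return n_people * carrier_probability

if __name__ == "__main__":
    # 0.02% allele frequency expressed as a proportion
    print(round(expected_carriers(500_000, 0.0002)))
```

Under these assumptions a variant at the threshold frequency would be carried by roughly 200 participants, whereas in a cohort of 5,000 the expected number would be about 2.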

Further enhancements of the phenotyping of UK Biobank participants currently being conducted
- Web-based assessments of diet completed; and next to be cognition/mental health (2014)
- Wrist-worn accelerometers to be mailed to all participants who agree to wear one (2013-15)
- Biobank chip to genotype (GWAS; candidate SNPs; exome) all participants (2013-15)
- Standard panel of assays (e.g. lipids; clotting) on samples from all participants (2014-15)

Rationale for assaying many standard markers in baseline samples from all 500,000 participants
- Cost-effective way of increasing the usability of the resource for researchers, by providing data for:
  - Cross-sectional analyses with prevalent disease
  - Identification of subsets based on assay values
- Conducting these assays in all of the participants at the same time should facilitate good quality control
- Lower cost for conducting all of these assays at one time rather than in multiple retrievals and assays
- Facilitates management of depletable samples

Consideration of a proposal to conduct assays of biomarkers of infectious disease in all participants
- Request from the international research community to facilitate studies of the associations of infectious agents with disease (in particular, different types of cancer)
- Plan would be to assay a panel of infectious agents (e.g. HPV, Hepatitis B & C, HBV, EBV, H. pylori) in the baseline sample collected from all 500,000 participants
- As with the biochemical and genetic assays that are being conducted, assays of a wide range of infectious agents would increase the efficient use of the resource
- Detailed proposal for funding is now being developed

Further enhancements of the phenotyping of UK Biobank participants currently being conducted
- Web-based assessments of diet completed; and next to be cognition/mental health (2014)
- Wrist-worn accelerometers to be mailed to all participants who agree to wear one (2013-15)
- Biobank chip to genotype (GWAS; candidate SNPs; exome) all participants (2013-15)
- Standard panel of assays (e.g. lipids; clotting) on samples from all participants (2014-15)
- Information from multiple imaging modalities (e.g. brain/heart/body MRI; bone/joint DEXA)

Imaging of 100,000 UK Biobank participants
- MRI of brain, heart and abdomen
- DEXA of bones, joints and body
- Ultrasound of carotid arteries
- Shortened baseline assessment plus more detailed cognitive function tests and ECG to detect rhythm disturbances
- Pilot phase: 4-6,000 people in 1 centre (2014-15)
- Main phase: 95,000 people in 3 centres (2015-19)
- Opportunities for repeat imaging in sub-sets (e.g. as part of MRC’s focus on dementia)

Body Mass Index (BMI) vs Heart Disease and Stroke (PSC: 1M people followed for 12 years; Lancet 2009)
[Figure: annual deaths per 1000 (floated so mean = PSC rates at age 65-69) against baseline BMI (15 to 50 kg/m2), with separate lines for heart disease (18 237 deaths) and stroke (6122 deaths).]
- At BMI >25: 5 units higher BMI associated with ~40% higher IHD & stroke mortality
- At BMI <25: positive association continues for IHD, but not for stroke
Adjusted for age, sex, smoking & study; first 5 years of follow-up excluded
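The "~40% higher mortality per 5 units" figure can be rescaled to other BMI differences if one assumes the association is log-linear above a BMI of 25. That assumption, and the function name below, are illustrative only and are not part of the PSC results quoted on the slide.

```python
def hazard_ratio(bmi_difference, hr_per_5_units=1.4):
    """Hazard ratio for a given BMI difference (kg/m2) above BMI 25.

    Assumes a log-linear association, so a ratio of 1.4 per 5 units
    corresponds to 1.4 ** (d / 5) for a difference of d units.
    """
    return hr_per_5_units ** (bmi_difference / 5)

if __name__ == "__main__":
    for d in (1, 5, 10):
        print(f"{d:>2} units higher BMI: hazard ratio = {hazard_ratio(d):.2f}")
```

On this reading, a 10-unit difference corresponds to roughly a doubling of mortality (1.4 squared is 1.96), and a single unit to about 7% higher mortality.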

Similar age, gender, BMI & % body fat, but different amounts of INTERNAL FAT
[Figure: two body scans, one with 5.86 litres of internal fat and one with 1.65 litres of internal fat.]

Atrial fibrillation (AF): prevalence and mortality during the period between 1993 and 2007
- Prevalence: increasing
- Mortality: little change
Piccini et al. Circulation: Cardiovascular Quality and Outcomes. 2012

Consideration of prolonged cardiac monitoring
- Cardiac arrhythmias (especially AF):
  - can indicate significant underlying cardiac disease
  - can directly cause significant morbidity and mortality
  - are important risk factors for cardio-embolic events (esp. stroke)
- Detection requires prolonged monitoring:
  - many are intermittent (e.g. paroxysmal AF)
  - substantial under-detection with standard 12-lead ECG
- AF increases with age (<50 years: <1%; >80 years: 10%+)
- No large-scale population-based prospective studies with prolonged monitoring, so the full extent/impact of AF on health outcomes is likely to have been underestimated

Example of device for prolonged arrhythmia detection: iRhythm Zio Patch
- Has been used in 18,000 people
- Non-invasive stick-on patch
- Comfortable (median wear 12 days)
- Can be applied in clinic or at home
- Beat-to-beat ECG recording
- Validated against reference Holter
- Potentially recyclable device chip which stores data for downloading
Planning to pilot feasibility and acceptability during imaging pilot

UK Biobank: Centralised follow-up of health
- Death and cancer registries
- In-patient and out-patient hospital episodes (including psychiatric) and related procedure registries
- Primary care records of health conditions, prescriptions, diagnostic tests and other investigations
- Other health-related: disease registries; dispensing records; imaging; screening; dental records
- Direct to participants: self-reported medical conditions; treatments actually being taken; degree of functional impairment; cognitive and psychological scores

Health outcome data-linkage challenges
- Regulation, bureaucracy, and permissions (despite explicit consent from participants)
- Data transfer, matching and coding queries
- Understanding different data structures
- Mapping between coding systems
- Mapping between different countries
- Presenting outcome data to researchers:
  - Original outcome codes
  - Post-adjudication outcomes
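Mapping between coding systems can be pictured as a lookup from (coding system, code) to a harmonised outcome label, with the original code kept alongside the mapped value. The sketch below is a hypothetical illustration, not UK Biobank's actual mapping: the table is deliberately tiny, the outcome labels and function names are invented, and matching on the three-character category is an assumed simplification. The four codes shown (ICD-10 I21 and I63, ICD-9 410 and 434) are real categories for acute myocardial infarction and cerebral infarction/occlusion.

```python
# Hypothetical, incomplete mapping table for illustration only.
OUTCOME_MAP = {
    ("ICD10", "I21"): "myocardial_infarction",
    ("ICD9", "410"): "myocardial_infarction",
    ("ICD10", "I63"): "ischaemic_stroke",
    ("ICD9", "434"): "ischaemic_stroke",
}

def map_code(system, code):
    """Return the harmonised outcome label for a code, or None if unmapped.

    Matching uses the three-character category, so 'I21.9' maps like 'I21'.
    """
    category = code.replace(".", "").upper()[:3]
    return OUTCOME_MAP.get((system.upper(), category))

def harmonise(record):
    """Keep the original code next to the mapped outcome for later review."""
    system, code = record
    return {"system": system, "original_code": code, "outcome": map_code(system, code)}

if __name__ == "__main__":
    for rec in [("ICD10", "I21.9"), ("ICD9", "434.1"), ("ICD10", "Z99")]:
        print(harmonise(rec))
```

Returning None for unmapped codes, instead of guessing, leaves those records available for manual query while the original code remains visible to researchers.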

Progress with UK-wide linkage to outcome data (both before and after baseline assessment)
Key messages: The slide shows the data types, countries and data providers from the perspective of a UK cohort study with participants recruited in England (89%), Scotland (7%) and Wales (4%). It demonstrates just some of the complexity of the processes required to link to, incorporate and make available for researchers what might seem superficially to be very straightforward data from these major sources.
Deaths and cancers: It has been possible for many years to obtain routine coded data on deaths by cause and cancer registrations using similar systems across England, Scotland and Wales. Currently, flagging of a cohort is done by the Medical Research Information Service at the NHS Information Centre for England and Wales, and by the NHS Central Register in Scotland. Data formats are different for Scotland but the type of information is essentially the same.
Hospital discharge data: It is also possible to obtain nationwide hospital inpatient and outpatient coded data from Scotland, Wales and England from separate sources for each country, as shown. Data formats vary by country but the type of information is similar.
Primary care data: Scotland and Wales are now able to link to coded primary care data for around half of their populations. England is following with the development of the GP extraction system.
In all three countries, national linkage ‘one stop shops’ are being developed to pull all of these data (and a range of other country-specific datasets) together for easier access for research purposes. The most comprehensive and accessible is currently in Wales: SAIL (the Secure Anonymised Information Linkage system, developed collaboratively between the Health Information Research Unit, Swansea University, and NHS Wales). The Scottish system (SHIP, the Scottish Health Informatics Programme) is similar although not so readily accessible. A new English system (CPRD) aims to do something similar to SAIL and SHIP; coverage is currently limited, especially for primary care (currently 10-20%), but there are ambitious plans for wider population coverage.
The whole field is made over-complicated by:
- Frequent developments of new initiatives and systems, not all of which survive
- Frequent relabelling of existing systems
- Different regulatory mechanisms for accessing data for each data provider
- Differences between countries in the structure of the NHS and legislation/regulation
The other datasets that initiatives such as SAIL, SHIP and CPRD either have currently on a country-wide basis or are moving towards include laboratory reports, imaging reports, and disease registry and audit systems. At present, these tend to be patchy in coverage. In general, Wales and Scotland have more country-wide datasets available than England (although patches of England are good for various different types of data), and accessibility of data from a one stop shop is easiest for Wales.

Meaning of coded data from health records
- What do the coded data actually tell us?
- Characteristics of coded data:
  - How accurate?
  - How detailed?
  - How complete?
- Do we need to go beyond the coded data?

UK Biobank: Expected numbers of participants developing diseases during long-term follow-up
Condition        2012     2017     2022
Diabetes         10,000   25,000   40,000
MI/CHD death      7,000   17,000   28,000
Stroke            2,000    5,000    9,000
COPD              3,000    8,000   14,000
Remaining rows (incomplete; surviving values in their original order):
Breast cancer: 2,500; 6,000
Colorectal cancer: 1,500; 3,500
Prostate cancer
Lung cancer: 800; 4,000
Hip fracture
Rh. arthritis
Alzheimer’s

General strategy for outcome adjudication
- Avoid false positive cases (but tolerate some false negatives)
- Geographical generalisability
- Cost-effectiveness
- Future-proofed
- Scalability
Staged approach: Ascertain, Confirm, Classify

Staged approach to outcome adjudication
Stage | Characteristics | Possible data sources
ASCERTAINMENT of suspected cases | Cost-effective; feasible; scalable | Death registers; cancer registers; hospital episodes; primary care records; web-based questionnaires

Staged approach to outcome adjudication
Stage | Characteristics | Possible data sources
ASCERTAINMENT of suspected cases | Cost-effective; feasible; scalable | Death registers; cancer registers; hospital episodes; primary care records; web-based questionnaires
CONFIRMATION of “case-ness” | As above, but greater cost/lower feasibility | Cross-referencing e-records; disease registers

Staged approach to outcome adjudication
Stage | Characteristics | Possible data sources
ASCERTAINMENT of suspected cases | Cost-effective; feasible; scalable | Death registers; cancer registers; hospital episodes; primary care records; web-based questionnaires
CONFIRMATION of “case-ness” | As above, but greater cost/lower feasibility | Cross-referencing e-records; disease registers
CLASSIFICATION of disease cases | More involved and costly per case | Review of clinical records; tumour collections/assays; specialised databases (e.g. imaging)
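The three stages can be sketched as a small pipeline. Everything in the example below is an assumption made for illustration: the source names, the rule that confirmation needs at least two independent sources (one possible way to favour specificity over sensitivity), and the subtype lookup standing in for clinical record review. It does not describe the protocols the Expert Working Groups are developing.

```python
from collections import defaultdict

# Hypothetical source labels for the ascertainment stage.
ASCERTAINMENT_SOURCES = {
    "death_register", "cancer_register", "hospital_episodes",
    "primary_care", "web_questionnaire",
}

def ascertain(records):
    """Stage 1: collect the sources reporting each (participant, outcome) pair."""
    suspected = defaultdict(set)
    for participant_id, source, outcome in records:
        if source in ASCERTAINMENT_SOURCES:
            suspected[(participant_id, outcome)].add(source)
    return dict(suspected)

def confirm(suspected, min_sources=2):
    """Stage 2: keep cases reported by at least min_sources distinct sources.

    This assumed rule avoids false positives at the cost of some false negatives.
    """
    return {case for case, sources in suspected.items() if len(sources) >= min_sources}

def classify(confirmed, subtype_lookup):
    """Stage 3: attach a subtype where detailed review has supplied one."""
    return {case: subtype_lookup.get(case, "unclassified") for case in confirmed}

if __name__ == "__main__":
    records = [
        (1, "hospital_episodes", "stroke"),
        (1, "primary_care", "stroke"),
        (2, "web_questionnaire", "stroke"),
    ]
    confirmed = confirm(ascertain(records))
    print(classify(confirmed, {(1, "stroke"): "ischaemic"}))
```

Each stage handles fewer cases than the one before it, which matches the slide's point that the more costly steps are applied only to cases that have already passed the cheaper ones.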

Expert Working Groups developing protocols for ascertainment, confirmation and classification
Working groups:
- Cancer
- Diabetes
- Cardiac outcomes
- Stroke
- Mental health outcomes
- Ocular outcomes
- Neurodegenerative outcomes
- Respiratory outcomes
- Musculoskeletal outcomes
Status categories:
- Pilots progressing well; preparing for scaling up of algorithms and then for web adjudication
- Pilots commencing
- Pilots being developed

UK Biobank: Principles of Access
- UK Biobank is available to all bona fide researchers for all types of health-related research that is in the public interest
- No preferential or exclusive access (and, in particular, access does not involve “collaboration” with UK Biobank)
- Researchers have to pay for access to the Resource for their proposed research on a cost-recovery basis only
- Access to the biological samples that are limited and depletable will be carefully controlled and coordinated
- Researchers are required to publish their findings and return the data so that other researchers can use them

“Showcase”: e-catalogue of data items currently in the UK Biobank Resource (www.ukbiobank.ac.uk)

Showcase supports search strategies for data items in the UK Biobank Resource

Body Composition: % Body Fat

Preliminary applications subdivided by type of researcher, location and type of research

What makes UK Biobank special?
- PROSPECTIVE: It can assess the full effects of a particular exposure (such as smoking) on all types of health outcome (such as cancer, vascular disease, lung disease, dementia)
- DETAILED: The wide range of questions, measures and samples at baseline allows good assessment of exposures, and outcome adjudication allows good disease classification
- BIG: Inclusion of a large number of participants allows reliable assessment of the causes of a wide range of diseases, and of the combined impact of many different exposures
Unique combination of BREADTH and DEPTH