Finding disease genes: A challenge for Medicine, Mathematics and Computer Science. Andrew Collins, Professor of Genetic Epidemiology and Bioinformatics.

Presentation transcript:

1 Finding disease genes: A challenge for Medicine, Mathematics and Computer Science Andrew Collins, Professor of Genetic Epidemiology and Bioinformatics

2 The human genome
22 autosomes + the sex chromosomes X and Y
Sequence of 3,200 million base pairs (of A, T, G, C)
Codes for ~30,000 genes
1000s (?) of genes contain mutations contributing to disease ‘phenotypes’

3 Single nucleotide polymorphisms (SNPs or ‘snips’)
DNA sequence variation in one nucleotide: A, T, C, or G
15 million+ SNPs, accounting for ~90% of genetic variation
Two forms (alleles): a C/T SNP has ‘genotypes’ C,C or C,T or T,T
How to link genotype(s) with disease phenotype(s)?
Look for shared SNP mutations in families, or compare cases vs controls
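The case vs control comparison on this slide can be illustrated with a minimal sketch. It assumes a simple allele-count chi-square test on a 2x2 table, which is one common choice and not necessarily the method used in the talk; the genotype lists are made-up toy data and the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def genotypes(alleles):
    """All unordered genotypes for a SNP with the given alleles."""
    return [a + "," + b for a, b in combinations_with_replacement(alleles, 2)]


def allele_counts(genotype_list, allele):
    """Return (copies of `allele`, copies of any other allele)."""
    copies = sum(g.split(",").count(allele) for g in genotype_list)
    total = 2 * len(genotype_list)  # each person carries two alleles
    return copies, total - copies


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical toy data: genotypes at one C/T SNP
cases = ["T,T", "C,T", "T,T", "C,T", "C,C", "T,T"]
controls = ["C,C", "C,T", "C,C", "C,C", "C,T", "C,C"]

case_t, case_c = allele_counts(cases, "T")
ctrl_t, ctrl_c = allele_counts(controls, "T")
statistic = chi_square_2x2(case_t, case_c, ctrl_t, ctrl_c)

print(genotypes("CT"))
print("T alleles in cases vs controls:", case_t, ctrl_t)
print("chi-square statistic:", round(statistic, 3))
```

A large statistic suggests the T allele is more frequent in cases than in controls; with ~1 million SNPs tested, a real study would also need to correct for multiple testing.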

4 Disease gene mapping - timeline
: ‘linkage mapping’: rare genes causing severe disease in families (Cystic Fibrosis, Huntington’s disease)
: ‘association mapping’: common genes involved in common disease (asthma, heart disease, diabetes); case-control studies (~1 million SNPs)
2010 onwards: ‘next generation sequencing’: test all 15 million+ SNPs. Low frequency variants with intermediate effect on common disease

5 Human genome project - timeline
1990: Start of ‘Human Genome Project’ (to generate one genome sequence)
2003: One sequence completed: cost $300 million
2010: 3,000 sequences now completed
2011: 30,000 sequences expected: cost ~$5000 each

6

7 Breast cancer genetics
Rare genes (clear inheritance in families): <25% of inherited risk
Common genes (low-risk, association mapping): <5% of inherited risk
~70% of risk is not explained by the breast cancer genes found so far, so many genes are ‘missing’…

8 Susceptibility genes in breast cancer: more is less?
“The large number of anticipated susceptibility factors, their low predictive value and the high frequency of these variants… make these findings of limited use in clinical practice”
Ref: Willems PJ (2007) Clin Genet 72:

9 183,000 samples: found 180 ‘height’ genes
– enriched for genes in shared biological pathways
– and genes involved in skeletal growth defects
Genes found explain only 10% of variation in height
Many genes missing…

10 Linking genes with disease

11 “1000 Genomes” - a deep catalog of human variation
July 2010
– Sequenced 6 people (two families - parents and a daughter)
– Sequenced genomes of 179 people
– Sequenced exons of 700 people (‘exomes’ - the protein-coding genome)
Ongoing
– 2,500 DNA samples from 27 populations around the world
Next generation sequencing

12 Next generation sequence data analysis
Copyright ©2009 American Association for Clinical Chemistry

13 Exome data
Sequence of protein-coding exons: one ‘exome’ contains the coding regions of all ~30,000 genes
Exome contains 30 megabases of DNA (whole genome has 3200 megabases)
Detect all SNP variation in a person
Align ‘short reads’ (millions of sequences of ~100 bases) against the reference genome
Requires 40X ‘depth’ to reliably identify all DNA variation
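The figures on this slide imply a rough read budget, which the following back-of-envelope sketch works out. It assumes depth means average coverage, that coverage is uniform, and that every read aligns on target; none of these holds exactly in practice, so the results are lower bounds. The function name is hypothetical.

```python
def reads_needed(target_bases, depth, read_length):
    """Reads required to cover each target base `depth` times on average."""
    # Integer ceiling division, so a partial read still counts as one read.
    return (depth * target_bases + read_length - 1) // read_length


EXOME_BASES = 30_000_000        # 30 megabases (from the slide)
GENOME_BASES = 3_200_000_000    # 3200 megabases (from the slide)
DEPTH = 40                      # 40X
READ_LENGTH = 100               # ~100 bases per short read

exome_reads = reads_needed(EXOME_BASES, DEPTH, READ_LENGTH)
genome_reads = reads_needed(GENOME_BASES, DEPTH, READ_LENGTH)
exome_fraction = EXOME_BASES / GENOME_BASES

print(f"exome reads at 40X:  {exome_reads:,}")
print(f"genome reads at 40X: {genome_reads:,}")
print(f"exome as a fraction of the genome: {exome_fraction:.2%}")
```

Under these assumptions an exome needs about 12 million reads and a whole genome about 1.28 billion, which is why exome sequencing was the cheaper route to protein-coding variation.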

14 Sequence data - the ‘filtering’ problem
Each person has mutations that could affect protein function and mutations implicated in inherited disorders
Most variants have no effect on health
To find disease gene(s), filter out ‘normal’ variation (reference data: 1000 Genomes, web databases)
Common disease may involve complex interactions between networks of 100s of genes
Machine learning and other mathematical tools are required to interpret complex phenotype/sequence data
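The filtering step can be sketched as follows: drop every patient variant that is common in a reference panel and keep the rest as candidates. This is a minimal illustration, not a real pipeline; the `Variant` record, the 1% frequency threshold, and all positions and frequencies are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_rare(patient_variants, reference_freqs, max_freq=0.01):
    """Keep variants that are absent from the reference panel or rarer than max_freq."""
    return [v for v in patient_variants if reference_freqs.get(v, 0.0) < max_freq]


# Hypothetical variants found in one patient
v1 = Variant("chr1", 1_000_100, "C", "T")
v2 = Variant("chr7", 5_550_000, "G", "A")
v3 = Variant("chr17", 4_120_000, "A", "G")
patient = [v1, v2, v3]

# Hypothetical allele frequencies from a reference panel (e.g. population data)
reference_freqs = {v1: 0.35, v2: 0.002}

candidates = filter_rare(patient, reference_freqs)
for v in candidates:
    print(v)
```

Here the common variant `v1` is discarded as ‘normal’ variation, while the rare `v2` and the previously unseen `v3` remain for follow-up.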

15

16

17 “The production of billions of NGS reads has also challenged the infrastructure of existing information technology systems in terms of data transfer, storage and quality control, computational analysis to align or assemble read data…”
“Advances in bioinformatics are ongoing, and improvements are needed if these systems are to keep pace with the continuing developments in NGS technologies. It is possible that the costs associated with downstream data handling and analysis could match or surpass the data-production costs…”
(Metzker, Nat Rev Genet 2010, 11, )

18 Some applications of DNA sequence data
Disease gene mapping
Disease diagnosis/disease sub-types
Differences between populations, migration patterns
Biotechnology (bacterial genomes, genetic engineering)
Infectious disease control
Evolution/taxonomy/classification
Archaeology
Forensic science

19 Machine learning to identify genetic factors in breast cancer
3000 cases with early-onset breast cancer (Southampton data), genotyped with 1000s of SNPs
Identify new breast cancer genes: integrate phenotypic data (tumour sub-types, survival, response to treatment) with genotypes/sequence and gene functional information (web databases)
Machine learning models: test gene : gene and gene : phenotype interactions
New genes? Groups of genes distinguishing sub-types of disease?
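One simple way to expose gene : gene interactions to a learning model is to encode each genotype as a count of one allele and add the product of every SNP pair as an extra feature. The sketch below shows only that encoding step; it is an assumption about how such features could be built, not the models used on the Southampton data, and the SNP names and genotypes are invented.

```python
from itertools import combinations


def encode(genotype, counted_allele):
    """Additive coding: number of copies (0, 1 or 2) of `counted_allele`."""
    return genotype.split(",").count(counted_allele)


def interaction_features(dosages):
    """Product of allele counts for every unordered pair of SNPs."""
    return {
        f"{a}x{b}": dosages[a] * dosages[b]
        for a, b in combinations(sorted(dosages), 2)
    }


# Hypothetical genotypes for one person at three C/T SNPs
person = {
    "snp1": encode("C,T", "T"),
    "snp2": encode("T,T", "T"),
    "snp3": encode("C,C", "T"),
}
features = interaction_features(person)

print(person)
print(features)
```

The number of pairs grows quadratically with the number of SNPs, so with 1000s of SNPs a real analysis would need to pre-select candidates or use a model that searches interactions implicitly.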

20 Web-based tools to improve diagnosis of ‘dosage’ diseases
‘Dosage’: number of copies of a gene (more or fewer than 2 due to duplication or deletion)
Gene(s) in the duplicated/deleted region might cause disease if the ‘dose’ is abnormal. Which gene(s)? Identification influences patient treatment
Data-mine for known gene function in literature/databases
Prediction of disease-causing genes: machine learning models (integrate gene function, expression, known ‘dosage genes’)
Web-based tools for mining/querying/presentation of data for clinicians to improve diagnosis
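The first step in asking "which gene(s)?" is to list the genes that fall inside a duplicated or deleted region. A minimal sketch of that interval lookup follows; the gene names, coordinates and copy number are hypothetical, and a real tool would draw them from an annotation database.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True if two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_with_abnormal_dose(genes, region, copies):
    """Names of genes inside a region whose copy number differs from the normal 2."""
    if copies == 2:
        return []
    return sorted(name for name, locus in genes.items() if overlaps(locus, region))


# Hypothetical gene annotation and one duplicated region (3 copies)
genes = {
    "GENE_A": Region("chr1", 100, 200),
    "GENE_B": Region("chr1", 500, 600),
    "GENE_C": Region("chr2", 150, 250),
}
duplication = Region("chr1", 150, 550)

affected = genes_with_abnormal_dose(genes, duplication, copies=3)
print(affected)
```

The resulting list is only the candidate set; ranking which of those genes is likely to cause disease is the prediction problem the slide assigns to data mining and machine learning.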

21 Conclusions
Majority of genetic variation underlying human disease is unknown
Next-generation sequencing will, in time, reveal all of these genes
But… finding missing disease genes in DNA sequence presents huge challenges for medicine, mathematics and computer/web science
Exome and whole genome sequence analysis will transform all ‘bioscience’ research fields
NGS is now generating vast data sets: novel and multidisciplinary approaches to management, visualisation, analysis and interpretation are urgently needed