Gil McVean Department of Statistics

Presentation transcript:

Bioinformatics
Gil McVean, Department of Statistics

What is it to be a human?

What is it to be an individual?
Species: Diversity (percent)
Humans: 0.08 - 0.1
Chimpanzees: 0.12 - 0.17
Drosophila simulans: 2
E. coli: 5
HIV1: 30
Photos from UN photo gallery www.un.org/av/photo

Is it your genes?

Is it your transcripts?

Is it your proteins?

Is it your protein interactions?

Is it your systems?

Bioinformatics and genome biology
Bioinformatics is the analytical wing of genome biology.
It concerns itself with large amounts of data (more than you can look at!).
It uses computers and efficient algorithms.
It is data assembly, data summary, data modelling and data analysis.

The raw material

The output

Classical bioinformatics I: DNA and protein sequence alignment

Classical bioinformatics II: Genome assembly

Classical bioinformatics III: Gene finding

Classical bioinformatics IV: Protein structure prediction

Bioinformatics of genetic variation
An area of considerable current attention is human genetic variation.
The aim of current experiments is to map the genetic basis of human phenotypic variation: disease susceptibility and normal variation.
It is challenging because of the scale of the data, the structure of the data, and the underlying processes that shape variation.
Bioinformatics is needed to assemble, collate, check and summarise data; to model the data; and to make inferences.

What does the data look like?
Single Nucleotide Polymorphisms (SNPs) and Insertion-Deletion Polymorphisms (INDELs)
TGCTTGGCAGGGCAGACTGACTGT
TGCTTGGCAGGGCAGACTGACTGT
TGCATGGCAGGGCAG-CTGACTGT
TGCATGGCAGGGCAG-CTGACTGT
TGCATGGCAGGGCAGACTGACTGT
TGCATGGCAGGGCAGACTGACTGT
SNP at position 4 (T/A); INDEL at position 16 (A/-)
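As an illustration of how the two kinds of variant in the alignment above might be picked out automatically, here is a minimal Python sketch. The function name classify_columns is hypothetical, and the rule it applies (any column containing a gap is an INDEL, any other column with more than one base is a SNP) is a simplifying assumption, not a description of a production variant caller.

```python
def classify_columns(aligned):
    """Return (snp_positions, indel_positions), 1-based, for aligned sequences.

    Assumption: a column with a gap character '-' is called an INDEL,
    and any other column with more than one distinct base is called a SNP.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps, indels = [], []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        if "-" in column:
            indels.append(i + 1)
        elif len(column) > 1:
            snps.append(i + 1)
    return snps, indels


# The six aligned sequences shown on the slide.
sequences = [
    "TGCTTGGCAGGGCAGACTGACTGT",
    "TGCTTGGCAGGGCAGACTGACTGT",
    "TGCATGGCAGGGCAG-CTGACTGT",
    "TGCATGGCAGGGCAG-CTGACTGT",
    "TGCATGGCAGGGCAGACTGACTGT",
    "TGCATGGCAGGGCAGACTGACTGT",
]
snps, indels = classify_columns(sequences)
print("SNP columns:", snps)      # SNP columns: [4]
print("INDEL columns:", indels)  # INDEL columns: [16]
```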

Collections of SNPs
[Figure: SNPs in the HCB, JPT, YRI and CEU samples]

Engineering challenges
Identifying SNPs
Working out which SNPs will work on a given platform
Controlling the genotyping work-flow
Controlling the output quality
Performing quality-assurance exercises
Identifying problems, gaps and inconsistencies

A Bioinformatics problem: How small is my P-value?
The basic idea of association studies is to look for genetic differences between groups.
It is easy to ask the question “Is there a significant difference in the frequency of a mutation between groups?”
[Figure: cases (D) and controls (C) compared at a locus of interest]
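The "easy" question can be made concrete with a small sketch. The Python below runs a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts in cases and controls. The slide does not name a particular test, so the choice of test, the function name allele_chi_square and the example counts are all assumptions made for illustration.

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Returns (chi2, p_value). Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 2000 case alleles and 2000 control alleles.
chi2, p_value = allele_chi_square(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

A single test like this answers the question for one mutation only; it says nothing yet about how to interpret the result when many mutations are tested.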

The problems
In a study of several hundred thousand mutations (or even millions), it is unlikely that we have actually typed the causal variant(s).
In a study of several hundred thousand mutations (or even millions), even if NONE of them is causal, many of them will show significance at the 5%, 1% or even 0.01% level.
Differences in the frequency of disease incidence between groups (for example, African Americans and European Americans) will be associated with ANY genetic difference between them.
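The second problem is simple arithmetic, sketched below in Python. The study size of 500,000 mutations is a hypothetical figure chosen to match "several hundred thousand". The Bonferroni threshold is one common correction and is added here as an assumption; the slides do not say which correction, if any, is intended.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing threshold alpha when no test is truly causal."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


n_snps = 500_000  # hypothetical study size
for alpha in (0.05, 0.01, 0.0001):
    hits = expected_false_positives(n_snps, alpha)
    print(f"threshold {alpha:.2%}: about {hits:,.0f} false positives expected")

print(f"Bonferroni per-test threshold: {bonferroni_threshold(n_snps):.1e}")
```

Under these assumptions, even the 0.01% level leaves around 50 spurious hits, which is why a raw per-test P-value is hard to interpret on its own.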

What we really want to ask
“Does any of the genome show an association with disease over and above any effect I might expect from the correlation between genotype and environmental risk?”
“If so, what is the most likely position for the causal mutation(s)?”
Answering these questions is difficult, but a natural way to approach the problem is to model the process.

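Modelling genetic variation
[Figure: evolutionary parameters (selection, mutation, genetic drift, recombination, migration) feed a stochastic evolutionary process that produces the population; a stochastic sampling process produces the sample from the population; the MODEL describes both steps, and inference runs from the sample back to the parameters]
Example sequences shown:
ATGCATGGGCTATTGGACCT
ATGGATGGGCTATTGCACCT
ATGCATGGGCAATTGCACCT
ATGCATGGGCAATTGGACCT
ATGGATGGGCTATTGCACCT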
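To make the two stochastic steps concrete, here is a minimal Python sketch of a Wright-Fisher model. It is deliberately simpler than the slide's scheme and includes genetic drift only (no selection, mutation, recombination or migration). The function names, population size, starting frequency and seed are all illustrative assumptions.

```python
import random


def wright_fisher(n_copies, start_freq, generations, rng):
    """Stochastic evolutionary process: allele frequency under drift alone.

    Each generation, every one of the n_copies gene copies picks its parent
    allele at random according to the current frequency.
    """
    count = round(start_freq * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        freq = count / n_copies
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        trajectory.append(count / n_copies)
    return trajectory


def sample_alleles(pop_freq, sample_size, rng):
    """Stochastic sampling process: copies of the allele seen in a sample."""
    return sum(1 for _ in range(sample_size) if rng.random() < pop_freq)


rng = random.Random(1)  # fixed seed so the run is repeatable
trajectory = wright_fisher(n_copies=200, start_freq=0.5, generations=50, rng=rng)
observed = sample_alleles(trajectory[-1], sample_size=20, rng=rng)
print(f"population frequency after 50 generations: {trajectory[-1]:.3f}")
print(f"copies observed in a sample of 20: {observed}")
```

Inference works in the opposite direction to this simulation: given only the observed sample, it asks which parameter values make that sample most plausible.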

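Genes in populations
[Figure: genes in a population, with time running to the present day]

Ancestry of current population
[Figure: ancestry of the current population, with time running to the present day]

Ancestry of sample
[Figure: ancestry of the sample, with time running to the present day]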

The coalescent: samples in populations
[Figure: ancestral lineages of the sample traced back in time from the present day, through coalescence events, to the most recent common ancestor (MRCA)]
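The process in the figure can be sketched in a few lines of Python under the standard neutral coalescent with constant population size and no recombination; those are assumptions, since the slide does not state the model's details. With k ancestral lineages, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units, so the expected time to the MRCA for n samples is 2(1 - 1/n). The function names and seed are illustrative.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Time to the most recent common ancestor, in coalescent time units."""
    time = 0.0
    lineages = n_samples
    while lineages > 1:
        rate = lineages * (lineages - 1) / 2  # number of pairs that could coalesce
        time += rng.expovariate(rate)
        lineages -= 1  # one coalescence merges two lineages into one
    return time


def expected_tmrca(n_samples):
    """Theoretical mean of simulate_tmrca: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n_samples)


rng = random.Random(42)  # fixed seed so the run is repeatable
replicates = 20_000
mean_tmrca = sum(simulate_tmrca(10, rng) for _ in range(replicates)) / replicates
print(f"simulated mean TMRCA: {mean_tmrca:.3f}")
print(f"theoretical mean:     {expected_tmrca(10):.3f}")
```

Only the ancestry of the sample is simulated, not the whole population, which is what makes the coalescent efficient.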

How does this help us to think about mapping disease?
Individuals are related to each other through their genealogical history.
Two nearby points on the genome will have similar genealogical histories, a result of which is that mutations at these positions will also be correlated.
Understanding how genealogical history changes along the genome (through recombination) and between populations (through historical demography) will allow us to construct more powerful tests for disease association and to localise disease-associated mutations.
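One common way to quantify the correlation between mutations at two positions is the r-squared measure of linkage disequilibrium. The Python sketch below computes it from a list of haplotypes. The slide does not name a statistic, so this choice, the function name r_squared and the toy haplotypes are assumptions.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation between the alleles at sites i and j (0-based).

    Assumes both sites are biallelic and both are variable in the sample.
    """
    n = len(haplotypes)
    allele_a = haplotypes[0][i]  # arbitrary reference allele at site i
    allele_b = haplotypes[0][j]  # arbitrary reference allele at site j
    p_a = sum(1 for h in haplotypes if h[i] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[j] == allele_b) / n
    p_ab = sum(1 for h in haplotypes if h[i] == allele_a and h[j] == allele_b) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be variable in the sample")
    return d * d / denominator


# Toy two-site haplotypes (hypothetical data).
fully_correlated = ["AT", "AT", "GC", "GC"]
uncorrelated = ["AT", "AC", "GT", "GC"]
print(r_squared(fully_correlated, 0, 1))  # 1.0
print(r_squared(uncorrelated, 0, 1))      # 0.0
```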

The bioinformatics module
Genomic technologies
Annotating genomes
Modelling gene evolution
Mapping disease genes
Measuring gene and protein expression
Predicting protein structure