Welcome to CS374! A survey of computer science in genomics today

Presentation transcript:

Welcome to CS374! A survey of computer science in genomics today ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG

CS374 – Course Goals
Survey of current research in computational genomics
Practice giving a stellar presentation
Practice reading literature

CS374 – Course Requirements
Presentation
Critique of one topic
Summaries of two topics
Class attendance

Introduction: DNA sequencing ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG

DNA – what is a genome?
DNA, ~3×10^9 base pairs long in humans
Contains ~22,000 genes
[Figure: transcription to messenger RNA (G A G U C A G C), translation, folding; RNA folding]

Human Genome Project
1990: Start
2000: Bill Clinton: “most important scientific discovery in the 20th century”
2001: Draft
2003: Finished
$3 billion, 3 billion base pairs
Now what?

There is never “enough” sequencing
100 million species
7 billion individuals
Somatic mutations (e.g., HIV, cancer)
Sequencing is a functional assay

Sequencing Growth
Cost of one human genome
2004: $30,000, : $100, : $10, : $4,000 (today) : $1,000 ???: $300
How much would you pay for a smartphone?

DNA Sequencing – Gel Electrophoresis
“Ancient” method, used for the human genome
1. Start at primer (restriction site)
2. Grow DNA chain
3. Include dideoxynucleotides (modified A, C, G, T)
4. Reaction stops at all possible points
5. Separate products by length, using gel electrophoresis
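The five steps above can be sketched as a toy simulation. This is a minimal illustration, not a model of real chemistry: it assumes exactly one terminated fragment per position, and the function names `terminated_fragments` and `read_gel` are hypothetical.

```python
import random


def terminated_fragments(strand):
    """Steps 2-4: chain growth stops once at every possible position.

    Each fragment is a prefix of the synthesized strand; its last base
    stands in for the labeled dideoxynucleotide that ended the reaction.
    """
    fragments = [strand[:i] for i in range(1, len(strand) + 1)]
    random.shuffle(fragments)  # fragments come out of the tube unordered
    return fragments


def read_gel(fragments):
    """Step 5: separate fragments by length, then read the terminal bases."""
    ordered = sorted(fragments, key=len)  # shortest fragment runs farthest
    return "".join(fragment[-1] for fragment in ordered)


if __name__ == "__main__":
    strand = "ACGTGACTGAGG"
    print(read_gel(terminated_fragments(strand)))
```

Sorting by length is what the gel does physically; reading the final base of each fragment in that order recovers the strand.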

DNA Sequencing - Illumina

Uses of Genomes
Medicine
–Mendelian diseases
–Cancer
–Drug dosage (e.g., Warfarin)
–Disease risk
–Diagnosis of infections
–…
Ancestry
Genealogy
Nutrition?
Psychology?
Baby Engineering???
...

Ethical Issues
GINA: Genetic information cannot be used by insurers & employers
–Covers relatives up to 4th degree
–Excludes life & disability insurance
Overdiagnosis
Bad news you’d rather not find out
Paternity testing
Genetic engineering of babies?
…

How soon will we all be sequenced?
Cost
Killer apps
Roadblocks?
[Figure: cost and applications over time; 2013? 2018?]

Introduction: Human Population Genomics ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG

The Hominid Lineage

Human population migrations
Out of Africa, Replacement
–Single mother of all humans (Eve) ~150,000 yr
–Single father of all humans (Adam) ~70,000 yr
–Humans out of Africa ~50,000 years ago replaced others (e.g., Neanderthals)
Multiregional Evolution
–Generally debunked; however,
–~5% of the human genome in Europeans and Asians is Neanderthal or Denisovan

Coalescence Y-chromosome coalescence

Why humans are so similar
A small population that interbred reduced the genetic variation
Out of Africa ~50,000 years ago

Migration of Humans

Migration of Humans

Some Key Definitions
Mary: AGCCCGTACG
John: AGCCCGTACG
Josh: AGCCCGTACG
Kate: AGCCCGTACG
Pete: AGCCCGTACG
Anne: AGCCCGTACG
Mimi: AGCCCGTACG
Mike: AGCCCTTACG
Olga: AGCCCTTACG
Tony: AGCCCTTACG
Alleles: G, T
Major Allele: G
Minor Allele: T
Genotypes: G/G, G/T, G/G, T/T, T/G (one allele from Mom, one from Dad)
Recombinations: at least 1 per chromosome; on average ~1 per 100 Mb
Linkage Disequilibrium: the degree of correlation between two SNP locations
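The definitions above can be checked directly on the ten sequences from the slide. This is a small sketch: the helper names `variable_sites` and `major_minor` are hypothetical, and positions are 0-based.

```python
from collections import Counter

# The ten aligned sequences shown on the slide.
SAMPLES = {
    "Mary": "AGCCCGTACG", "John": "AGCCCGTACG", "Josh": "AGCCCGTACG",
    "Kate": "AGCCCGTACG", "Pete": "AGCCCGTACG", "Anne": "AGCCCGTACG",
    "Mimi": "AGCCCGTACG", "Mike": "AGCCCTTACG", "Olga": "AGCCCTTACG",
    "Tony": "AGCCCTTACG",
}


def variable_sites(sequences):
    """Return 0-based positions where more than one base is observed."""
    length = len(sequences[0])
    return [i for i in range(length) if len({s[i] for s in sequences}) > 1]


def major_minor(sequences, position):
    """Return (major allele, minor allele, minor allele frequency) at a biallelic site."""
    counts = Counter(s[position] for s in sequences).most_common()
    (major, _), (minor, minor_count) = counts[0], counts[-1]
    return major, minor, minor_count / len(sequences)


if __name__ == "__main__":
    seqs = list(SAMPLES.values())
    for site in variable_sites(seqs):
        print(site, major_minor(seqs, site))
```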

Human Genome Variation
SNP: TGCTGAGA / TGCCGAGA
Novel Sequence: TGCTCGGAGA / TGC GAGA
Microdeletion: TGC--AGA / TGCCGAGA
Inversion
Mobile Element or Pseudogene Insertion
Translocation
Tandem Duplication
Transposition
Large Deletion
Novel Sequence at Breakpoint

The Fall in Heterozygosity
$F_{ST} = \frac{H - H_{POP}}{H}$
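As a worked example of the formula, the sketch below computes F_ST for one biallelic site. The slide does not define its symbols, so this assumes the usual reading: H is the expected heterozygosity of the pooled population, H_POP is the average expected heterozygosity within subpopulations, and subpopulations are of equal size. The function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """F_ST = (H - H_POP) / H for equally sized subpopulations (assumption)."""
    p_bar = sum(subpop_freqs) / len(subpop_freqs)
    h_total = heterozygosity(p_bar)  # H: pooled population
    h_pop = sum(heterozygosity(p) for p in subpop_freqs) / len(subpop_freqs)  # H_POP
    if h_total == 0.0:
        return 0.0  # site is monomorphic overall; no differentiation to measure
    return (h_total - h_pop) / h_total


if __name__ == "__main__":
    print(fst([0.5, 0.5]))  # identical subpopulations
    print(fst([0.2, 0.8]))  # partially differentiated
    print(fst([0.0, 1.0]))  # fixed for different alleles
```

Identical subpopulations give 0, and subpopulations fixed for different alleles give 1, which matches the intuition that F_ST measures the fraction of heterozygosity lost to population structure.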

The HapMap Project
Code  Population                                  Samples
ASW   African ancestry in Southwest USA           90
CEU   Northern and Western Europeans (Utah)       180
CHB   Han Chinese in Beijing, China               90
CHD   Chinese in Metropolitan Denver              100
GIH   Gujarati Indians in Houston, Texas          100
JPT   Japanese in Tokyo, Japan                    91
LWK   Luhya in Webuye, Kenya                      100
MXL   Mexican ancestry in Los Angeles             90
MKK   Maasai in Kinyawa, Kenya                    180
TSI   Toscani in Italia                           100
YRI   Yoruba in Ibadan, Nigeria                   100
Genotyping: Probe a limited number (~1M) of known highly variable positions of the human genome

Linkage Disequilibrium & Haplotype Blocks
Linkage Disequilibrium (LD): $D = P(A \text{ and } G) - p_A p_G$
[Figure: allele frequencies p_A and p_G at two sites; minor alleles: A, G]
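The definition of D can be illustrated on a handful of two-site haplotypes. This is a sketch under assumptions: haplotypes are given as two-character strings, the function name `linkage_disequilibrium` is hypothetical, and the r² normalization is a common companion statistic that does not appear on the slide.

```python
def linkage_disequilibrium(haplotypes, a="A", g="G"):
    """Return (D, r2) for allele `a` at site 1 and allele `g` at site 2.

    D = P(a and g) - p_a * p_g, as on the slide.
    r2 = D^2 / (p_a (1 - p_a) p_g (1 - p_g)) is an added normalization.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_g = sum(h[1] == g for h in haplotypes) / n
    p_ag = sum(h[0] == a and h[1] == g for h in haplotypes) / n
    d = p_ag - p_a * p_g
    denom = p_a * (1 - p_a) * p_g * (1 - p_g)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


if __name__ == "__main__":
    # Perfectly correlated sites: A always travels with G.
    print(linkage_disequilibrium(["AG", "AG", "CT", "CT"]))
    # Independent sites: every combination equally common.
    print(linkage_disequilibrium(["AG", "AT", "CG", "CT"]))
```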

Population Sequencing – 1000 Genomes Project
The 1000 Genomes Project Consortium et al. Nature 467, (2010) doi: /nature09534

The Cancer Genome Atlas – TCGA

Association Studies Control Disease

Global Ancestry Inference

Fixation, Positive & Negative Selection
[Figure panels: Neutral Drift, Positive Selection, Negative Selection]
How can we detect negative selection?
How can we detect positive selection?

Conservation and Human SNPs
CNSs have fewer SNPs
SNPs have shifted allele frequency spectra
[Figure: Neutral vs. CNS]

How can we detect positive selection?
Ka/Ks ratio: Ratio of nonsynonymous to synonymous substitutions
Very old, persistent, strong positive selection for a protein that keeps adapting
Examples: immune response, spermatogenesis
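To make the Ka/Ks idea concrete, here is a deliberately simplified sketch in the spirit of the Nei-Gojobori counting method. It is not a full implementation: it assumes two aligned, in-frame coding sequences without gaps, skips codons that differ at more than one position, and applies no multiple-hit correction. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Fraction-weighted count of sites where a point mutation keeps the amino acid."""
    total = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1.0 / 3.0
    return total


def ka_ks(seq1, seq2):
    """Return (nonsyn_diffs, syn_diffs, N_sites, S_sites, ratio or None)."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and in frame")
    syn_d = non_d = 0
    s_sites = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        s_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        if sum(x != y for x, y in zip(c1, c2)) == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_d += 1
            else:
                non_d += 1
        # Simplification: codons with 2 or 3 differences are not counted.
    n_sites = len(seq1) - s_sites
    ratio = None
    if syn_d > 0 and s_sites > 0 and n_sites > 0:
        ratio = (non_d / n_sites) / (syn_d / s_sites)
    return non_d, syn_d, n_sites, s_sites, ratio


if __name__ == "__main__":
    # TTT->TTC is synonymous (Phe); AAA->AGA is nonsynonymous (Lys->Arg).
    print(ka_ks("TTTGCTAAA", "TTCGCTAGA"))
```

A ratio well above 1 would suggest positive selection and a ratio well below 1 would suggest purifying selection; real analyses use longer sequences and corrected estimators.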

How can we detect positive selection?

Long Haplotypes – iHS test
Less time: fewer mutations, fewer recombinations
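The intuition behind long haplotypes can be sketched with extended haplotype homozygosity (EHH), the quantity that iHS-style tests build on. This is an assumption-laden illustration, not the iHS statistic itself: haplotypes are strings of '0' (ancestral) and '1' (derived), the function name `ehh` is hypothetical, and the integration and standardization steps of iHS are omitted.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, site, allele):
    """Probability that two random carriers of `allele` at `core` are
    identical across every position from `core` out to `site`."""
    carriers = [h for h in haplotypes if h[core] == allele]
    if len(carriers) < 2:
        return 0.0
    lo, hi = min(core, site), max(core, site)
    groups = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(comb(count, 2) for count in groups.values())
    return identical_pairs / comb(len(carriers), 2)


if __name__ == "__main__":
    # Toy data: derived-allele carriers share a long haplotype,
    # ancestral-allele carriers have been broken up over time.
    haps = ["1111", "1111", "1111", "1101", "0010", "0100", "0111"]
    for site in range(1, 4):
        print(site, ehh(haps, 0, site, "1"), ehh(haps, 0, site, "0"))
```

In the toy data, homozygosity around the derived allele decays more slowly with distance than around the ancestral allele, which is the pattern expected when an allele rose in frequency too quickly for mutation and recombination to break up its background.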