Welcome to CS374! A survey of computer science in genomics today


1 Welcome to CS374! A survey of computer science in genomics today

2 CS374 – Course Goals: survey of current research in computational genomics; practice giving a stellar presentation; practice reading literature

3 CS374 – Course Requirements: presentation; critique of one topic; summaries of two topics; class attendance

4 Introduction: DNA sequencing

5 DNA – what is a genome? DNA, ~3×10^9 base pairs long in humans. Contains ~22,000 genes. [Figure labels: messenger-RNA (G A G U C A G C); transcription; translation; folding; RNA folding]

6 Human Genome Project
1990: Start
2000: Bill Clinton: “most important scientific discovery in the 20th century”
2001: Draft
2003: Finished
$3 billion, 3 billion basepairs
Now what?

7 There is never “enough” sequencing: 100 million species; 7 billion individuals; somatic mutations (e.g., HIV, cancer); sequencing is a functional assay

8 Sequencing Growth
Cost of one human genome
2004: $30,000,000
2008: $100,000
2010: $10,000
2011: $4,000 (today)
2012-13: $1,000
???: $300
How much would you pay for a smartphone?
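To get a feel for the pace, the dated figures on this slide can be turned into an average fold-reduction per year. This is only a rough sketch: it uses the observed entries from the slide, leaves out the projected 2012-13 and undated figures, and the function name `annual_fold_drop` is made up for illustration.

```python
# Cost of one human genome in US dollars, copied from the slide.
# Projected (2012-13) and undated (???) entries are deliberately left out.
costs = {2004: 30_000_000, 2008: 100_000, 2010: 10_000, 2011: 4_000}

def annual_fold_drop(year_a, year_b):
    """Average factor by which the cost fell per year between two entries."""
    years = year_b - year_a
    return (costs[year_a] / costs[year_b]) ** (1 / years)

for a, b in [(2004, 2008), (2008, 2010), (2010, 2011)]:
    print(f"{a}-{b}: about {annual_fold_drop(a, b):.1f}x cheaper per year")
```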

9 DNA Sequencing – Gel Electrophoresis
“Ancient” method, used for the human genome
1. Start at primer (restriction site)
2. Grow DNA chain
3. Include dideoxynucleoside (modified a, c, g, t)
4. Stops reaction at all possible points
5. Separate products by length, using gel electrophoresis
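The logic of steps 4 and 5 can be sketched in a few lines: if the reaction stops at every possible point, the mixture contains one fragment per length, and sorting the fragments by length while reading each one's final (terminating) base recovers the sequence. This is a toy illustration, not a model of the chemistry; the strand and the function names are invented for the example.

```python
import random

def chain_terminated_fragments(strand):
    """One fragment per possible stop point: every prefix of the new strand."""
    return [strand[:length] for length in range(1, len(strand) + 1)]

def read_by_length(fragments):
    """Order fragments by length, as the gel does, and read each terminal base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

strand = "ACGTTTGACT"                # made-up strand grown from the primer
mixture = chain_terminated_fragments(strand)
random.Random(0).shuffle(mixture)    # fragments sit in the tube in no particular order
print(read_by_length(mixture))       # prints ACGTTTGACT
```

Because every fragment length occurs exactly once, the shuffle has no effect on the result; separation by length alone is enough to order the terminal bases.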

10 DNA Sequencing - Illumina

11 Uses of Genomes
Medicine
–Mendelian diseases
–Cancer
–Drug dosage (e.g., warfarin)
–Disease risk
–Diagnosis of infections
–…
Ancestry
Genealogy
Nutrition?
Psychology?
Baby Engineering???
...

12 Ethical Issues
GINA: Genetic information cannot be used by insurance & employers
–Covers relatives up to 4th degree
–Excludes life & disability insurance
Overdiagnosis
Bad news you’d rather not find out
Paternity testing
Genetic engineering of babies?
…

13 How soon will we all be sequenced? Cost; killer apps; roadblocks? [Figure labels: Time (2013? 2018?); Cost; Applications]

14 Introduction: Human Population Genomics

15 The Hominid Lineage

16 Human population migrations
Out of Africa, Replacement
–Single mother of all humans (Eve) ~150,000 years ago
–Single father of all humans (Adam) ~70,000 years ago
–Humans left Africa ~50,000 years ago and replaced others (e.g., Neanderthals)
Multiregional Evolution
–Generally debunked; however,
–~5% of the genome of Europeans and Asians is Neanderthal or Denisovan

17 Coalescence Y-chromosome coalescence

18 Why humans are so similar: a small population that interbred reduced the genetic variation. Out of Africa ~50,000 years ago

19 Migration of Humans

20 http://info.med.yale.edu/genetics/kkidd/point.html

21 Migration of Humans http://info.med.yale.edu/genetics/kkidd/point.html

22 Some Key Definitions
Mary: AGCCCGTACG
John: AGCCCGTACG
Josh: AGCCCGTACG
Kate: AGCCCGTACG
Pete: AGCCCGTACG
Anne: AGCCCGTACG
Mimi: AGCCCGTACG
Mike: AGCCCTTACG
Olga: AGCCCTTACG
Tony: AGCCCTTACG
Alleles: G, T
Major Allele: G
Minor Allele: T
Example genotypes: G/G, G/T, G/G, T/T, T/G [Figure labels: Mom, Dad]
Recombinations: at least 1 per chromosome; on average ~1 per 100 Mb
Linkage Disequilibrium: the degree of correlation between two SNP locations
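The allele definitions on this slide can be checked directly against its ten example sequences. A minimal sketch, assuming each listed sequence counts as one copy of the locus (the slide does not say whether they are haploid reads or individuals); the helper names are made up for illustration.

```python
from collections import Counter

# The ten sequences shown on the slide.
samples = {
    "Mary": "AGCCCGTACG", "John": "AGCCCGTACG", "Josh": "AGCCCGTACG",
    "Kate": "AGCCCGTACG", "Pete": "AGCCCGTACG", "Anne": "AGCCCGTACG",
    "Mimi": "AGCCCGTACG", "Mike": "AGCCCTTACG", "Olga": "AGCCCTTACG",
    "Tony": "AGCCCTTACG",
}

def variable_sites(seqs):
    """0-based positions where the aligned sequences do not all agree."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def allele_counts(seqs, site):
    """How many sequences carry each base at one position."""
    return Counter(seq[site] for seq in seqs)

seqs = list(samples.values())
site = variable_sites(seqs)[0]
counts = allele_counts(seqs, site)
(major, major_n), (minor, minor_n) = counts.most_common(2)
minor_allele_frequency = minor_n / len(seqs)

print(f"SNP at position {site + 1}: major {major} ({major_n}), minor {minor} ({minor_n})")
print(f"minor allele frequency = {minor_allele_frequency}")
```

On these sequences the only variable position is the sixth base, with G in seven sequences and T in three, which matches the major and minor alleles given on the slide.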

23 Human Genome Variation
SNP: TGCTGAGA / TGCCGAGA
Novel Sequence: TGCTCGGAGA / TGC---GAGA
Microdeletion: TGC--AGA / TGCCGAGA
Inversion
Mobile Element or Pseudogene Insertion
Translocation
Tandem Duplication
Transposition
Large Deletion
Novel Sequence at Breakpoint (TGC)

24 The Fall in Heterozygosity: $F_{ST} = \frac{H - H_{POP}}{H}$
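As a worked example of the F_ST ratio on this slide, the sketch below computes it for one biallelic site from allele frequencies in several subpopulations. The slide does not define its symbols, so the interpretation is an assumption: H is taken as the expected heterozygosity 2p(1-p) of the pooled population, H_POP as the average of the same quantity within subpopulations, and the subpopulations are assumed to be equally sized. Function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)

def fst(subpop_freqs):
    """F_ST = (H - H_POP) / H for equally sized subpopulations (assumption)."""
    pooled_p = sum(subpop_freqs) / len(subpop_freqs)
    h_total = heterozygosity(pooled_p)            # H: pooled population
    h_pop = sum(heterozygosity(p) for p in subpop_freqs) / len(subpop_freqs)
    if h_total == 0:
        raise ValueError("site is not variable in the pooled population")
    return (h_total - h_pop) / h_total

print(fst([0.9, 0.1]))   # strongly differentiated subpopulations
print(fst([0.3, 0.3]))   # identical frequencies: no differentiation
```

With frequencies 0.9 and 0.1 the pooled heterozygosity is 0.5 while the within-population average is 0.18, so most of the heterozygosity "falls" once the population is split; with identical frequencies nothing is lost.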

25 The HapMap Project
Code  Population                               Samples
ASW   African ancestry in Southwest USA        90
CEU   Northern and Western Europeans (Utah)    180
CHB   Han Chinese in Beijing, China            90
CHD   Chinese in Metropolitan Denver           100
GIH   Gujarati Indians in Houston, Texas       100
JPT   Japanese in Tokyo, Japan                 91
LWK   Luhya in Webuye, Kenya                   100
MXL   Mexican ancestry in Los Angeles          90
MKK   Maasai in Kinyawa, Kenya                 180
TSI   Toscani in Italia                        100
YRI   Yoruba in Ibadan, Nigeria                100
Genotyping: probe a limited number (~1M) of known highly variable positions of the human genome

26 Linkage Disequilibrium & Haplotype Blocks. Minor alleles: A, G (frequencies $p_A$, $p_G$). Linkage Disequilibrium (LD): $D = P(A \text{ and } G) - p_A p_G$
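The definition of D can be evaluated directly from a list of phased haplotypes. A minimal sketch, assuming each haplotype is written as a two-letter string (allele at the first SNP, then allele at the second); the haplotype counts below are invented for illustration.

```python
def ld_coefficient(haplotypes, allele_1, allele_2):
    """D = P(allele_1 and allele_2) - p(allele_1) * p(allele_2)."""
    n = len(haplotypes)
    p_1 = sum(h[0] == allele_1 for h in haplotypes) / n
    p_2 = sum(h[1] == allele_2 for h in haplotypes) / n
    p_both = sum(h == allele_1 + allele_2 for h in haplotypes) / n
    return p_both - p_1 * p_2

# Made-up sample of ten haplotypes over two SNPs (A/C and G/T).
haplotypes = ["AG"] * 3 + ["CT"] * 5 + ["AT"] + ["CG"]
print(ld_coefficient(haplotypes, "A", "G"))
```

Here A and G each have frequency 0.4 and occur together on 3 of 10 haplotypes, so D = 0.3 - 0.16 = 0.14: the two minor alleles travel together more often than independence would predict. D = 0 would mean the two SNPs are uncorrelated.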

27 Population Sequencing – 1000 Genomes Project. The 1000 Genomes Project Consortium et al. Nature 467, 1061-1073 (2010) doi:10.1038/nature09534

28 The Cancer Genomes Atlas – TCGA

29 Association Studies [Figure groups: Control, Disease]

30 Global Ancestry Inference

31 Fixation, Positive & Negative Selection. [Figure panels: Neutral Drift, Positive Selection, Negative Selection] How can we detect negative selection? How can we detect positive selection?
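One way to build intuition for drift, selection and fixation is a small simulation. The sketch below uses the standard Wright-Fisher model, which is not named on the slide and is brought in here only as an illustration: each generation, every gene copy is resampled from the previous generation, with the allele's sampling probability weighted by a selection coefficient `s` (s = 0 neutral, s > 0 positive, s < 0 negative). All names and parameter values are assumptions.

```python
import random

def wright_fisher(n_copies, start_count, s=0.0, max_generations=10_000, seed=0):
    """Allele frequency per generation until the allele is lost or fixed."""
    rng = random.Random(seed)
    count = start_count
    trajectory = [count / n_copies]
    for _ in range(max_generations):
        if count == 0 or count == n_copies:   # lost or fixed: nothing changes
            break
        p = count / n_copies
        p_selected = p * (1 + s) / (1 + p * s)  # selection shifts the odds
        # Drift: each copy in the next generation is an independent draw.
        count = sum(rng.random() < p_selected for _ in range(n_copies))
        trajectory.append(count / n_copies)
    return trajectory

for s in (-0.1, 0.0, 0.1):
    path = wright_fisher(n_copies=200, start_count=20, s=s)
    print(f"s={s:+.1f}: {len(path) - 1} generations, final frequency {path[-1]}")
```

Varying the seed shows that neutral alleles can still drift to fixation or loss by chance, while selection mainly changes how likely and how fast each outcome is.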

32 Conservation and Human SNPs: CNSs have fewer SNPs; SNPs in CNSs have shifted allele frequency spectra. [Figure panels: Neutral, CNS]

33 How can we detect positive selection? Ka/Ks ratio: ratio of nonsynonymous to synonymous substitutions. Very old, persistent, strong positive selection for a protein that keeps adapting. Examples: immune response, spermatogenesis
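To make the Ka/Ks idea concrete, here is a simplified sketch for two aligned coding sequences. The slide only defines the ratio, so everything about the counting is an assumption: sites are counted in the style of Nei and Gojobori (each codon position is synonymous in proportion to the single-base changes that keep the amino acid), changes to stop codons count as nonsynonymous, codons that differ at more than one position are rejected, and no multiple-hit correction is applied. The result is therefore an uncorrected pN/pS ratio, not what dedicated tools would report.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons in TCAG order mapped to amino acids.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3
    return total

def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def pn_ps(seq_a, seq_b):
    """Return (pN, pS, pN/pS) for two aligned, gap-free coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("need equal-length sequences of whole codons")
    codons_a, codons_b = split_codons(seq_a), split_codons(seq_b)
    # Average the site counts over both sequences.
    syn_sites = sum(synonymous_sites(c) for c in codons_a + codons_b) / 2
    nonsyn_sites = len(seq_a) - syn_sites
    syn_diffs = nonsyn_diffs = 0
    for a, b in zip(codons_a, codons_b):
        n_diff = sum(x != y for x, y in zip(a, b))
        if n_diff > 1:
            raise ValueError("codons differing at >1 position are not handled")
        if n_diff == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    p_s = syn_diffs / syn_sites if syn_sites else 0.0
    ratio = p_n / p_s if p_s else float("inf")
    return p_n, p_s, ratio

# Made-up example: GCT->GCC keeps alanine, AAA->AGA changes lysine to arginine.
print(pn_ps("ATGGCTAAAGGT", "ATGGCCAGAGGT"))
```

A ratio well below 1 suggests that amino-acid changes are being removed (negative selection), while a ratio above 1 is the signature of the persistent positive selection the slide describes.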

34 How can we detect positive selection?

35 Long Haplotypes – iHS test. Less time: fewer mutations, fewer recombinations

