Lecture 6: Genotype by sequencing

Presentation transcript:

Statistical Genomics. Lecture 6: Genotype by sequencing. Zhiwu Zhang, Washington State University

Outline: Genetic markers; Sequencing; Full vs. reduced; Experiment; Data process and format

Human genome project: Funded by DOE, NIH, and the Wellcome Trust in the UK. Begun in 1990; originally planned to last 15 years. The Institute for Genomic Research and U. of Washington provided over 450K BACs, each tagged and containing 3~4 Kb, across the entire human genome.

Human genome project: Completion date accelerated to 2003. Celera Genomics: Craig Venter was among those sequenced. Identified 20~120K genes. Sequence of 3 billion base pairs. Cost near 3 billion dollars.

Types of genetic markers: RFLP (restriction fragment length polymorphism); SSR (simple sequence repeats); SNP (single nucleotide polymorphism); Chip; Sequencing

RFLP Restriction Enzyme Restriction fragment length polymorphism

SSR

SNP by hybridization http://www.genome.gov/10000533

Frederick Sanger: 1958 Nobel Prize in Chemistry for work on the structure of proteins, especially insulin; 1980 Nobel Prize in Chemistry for DNA sequencing.

Ladder of DNA length. dNTP (deoxynucleotides): extend the chain. ddNTP (dideoxynucleotides): chain terminators.

1st generation DNA sequencing: Fred Sanger and Alan R. Coulson, Nature 265, 687–695 (1977)

2nd generation sequencing. Sequencing-by-synthesis by 454 Life Sciences: Margulies, M. et al. Nature 437, 376–380 (2005). Multiplex polony sequencing by the George M. Church lab at Harvard Medical School: Shendure, J. et al. Science 309, 1728–1732 (2005).

Sequencing-by-synthesis. 454 Life Sciences: Margulies, M. et al. Nature 437, 376–380 (2005). [Figure: cycles 1 to 6, with base rows T G C T A C … and T T T T T T …] http://en.wikipedia.org/wiki/File:Sequencing_by_synthesis_Reversible_terminators.png

Multiplex Polony sequencing George M. Church lab at Harvard Medical School: Shendure, J. et al. Science 309, 1728–1732 (2005). http://wjingpan.blog.sohu.com/140002432.html

Cluster Generation

$1000 Genome
Instrument | Price | Price/unit | $/Genome* | Consumables | $/Gb
HiSeq X Five | $6M | $1.2M | $1,425 | $1,200 | $10.6
HiSeq X Ten | $10M | $1M | $1,000 | $800 | $7
http://blog.genohub.com/illuminas-latest-release-hiseq-3000-4000-nextseq-550-and-hiseq-x5/

DNA/RNA fragmentation
Physical fragmentation: 1) Acoustic shearing 2) Sonication 3) Hydrodynamic shear
Enzymatic methods: 4) DNase I or other restriction endonuclease, non-specific nuclease 5) Transposase
Chemical fragmentation: 6) Heat and divalent metal cation

Reduced Genotyping Sequencing Restriction site

Restriction enzymes: ApeKI. Recognition: 5’ GCWGC 3’ (W: A or T). Expected size: 4 x 4 x 2 x 4 x 4 = 512 bp = 0.5 Kb. Genome coverage: 100 bp read / 512 bp size = 20%.

Restriction enzymes: PstI. Recognition: 5’ CTGCAG 3’. Expected size: 4^6 = 4096 bp = 4 Kb. Genome coverage: 100 bp read / 4096 bp size = 2.5%.
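The expected-size arithmetic on the two enzyme slides can be reproduced with a minimal R sketch. It assumes independent, equally frequent bases, which is the assumption implied by the slides' products of 4s and 2s; the function name `expected_fragment_size` is illustrative only.

```r
# Expected fragment size for a recognition site, assuming every base is
# independent and equally likely (a simplification; real genomes deviate).
expected_fragment_size <- function(site) {
  # Number of bases each symbol can stand for (only the symbols used here).
  choices <- c(A = 1, C = 1, G = 1, T = 1, W = 2)
  symbols <- strsplit(site, "")[[1]]
  # A position matches with probability choices/4, so the expected spacing
  # between sites is the product of 4/choices over all positions.
  prod(4 / choices[symbols])
}

read_length <- 100  # bp, as on the slides

sizes <- sapply(c(ApeKI = "GCWGC", PstI = "CTGCAG"), expected_fragment_size)
coverage <- read_length / sizes  # fraction of the genome covered by one read per fragment

print(data.frame(size_bp = sizes, coverage_pct = round(100 * coverage, 1)))
```

The slides round 19.5% to 20% for ApeKI and 2.4% to 2.5% for PstI.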

Multiplex barcode. Aalborg University, Denmark: Craig et al. Nat. Methods 2008, 5: 887–893. Barcodes of 4~8 bases.

Adapter and Barcode By Sharon Mitchell

Genotyping by sequencing (GBS): 1. Digest DNA 2. Ligate adapters with barcodes 3. Pool DNAs 4. PCR 5. Illumina sequencing. Here, I need to give a brief introduction to genotyping by sequencing. This flowchart shows the GBS protocol. The DNA samples are digested with a restriction enzyme and barcoded. Then we pool the DNA and run the PCR. Finally, the samples are sequenced on an Illumina sequencer. This protocol is simpler than similar methods such as RRL and RAD. Elshire et al. 2011. PLoS One
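Because samples are pooled before sequencing, each read has to be assigned back to its sample by its barcode. The following R sketch shows one simple way this could be done by prefix matching. The barcodes, sample names, and reads are all hypothetical, and this is not the pipeline used by Elshire et al.

```r
# Hypothetical barcodes of varying length mapped to hypothetical sample names.
barcodes <- c(CTCC = "sample1", TGCA = "sample2", ACTAGC = "sample3")

# Assign one read to a sample by matching its first bases against each barcode.
# If several barcodes match, the longest one wins.
demultiplex <- function(read, barcodes) {
  hits <- names(barcodes)[startsWith(read, names(barcodes))]
  if (length(hits) == 0) return(NA_character_)  # no known barcode
  best <- hits[which.max(nchar(hits))]
  barcodes[[best]]
}

# Hypothetical reads: barcode followed by genomic sequence.
reads <- c("CTCCCAGCTTGA", "TGCACTGCGGAT", "ACTAGCCAGCAA", "GGGGCAGCTTTT")

assigned <- vapply(reads, demultiplex, character(1),
                   barcodes = barcodes, USE.NAMES = FALSE)
print(table(assigned, useNA = "ifany"))
```

Exact prefix matching discards any read with a sequencing error inside the barcode; real tools usually tolerate a small number of mismatches.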

Cost reduction by multiplexing. In addition, GBS is cost effective: one sample costs only 9 dollars if the 384-plex protocol is used. Considering the coverage, we used the 96-plex protocol on all the switchgrass samples.

Sequencing depth
Definition: expected number of times each base pair is sequenced
Calculation:
100Mb genome, 100M reads of 100 bp: 100X
3G genome, 1% reduced, 50 multiplex, 6G data (1 byte per base): 6G/(50 x 3G x 1%) = 4X

Genomic coverage and depth
Property | ApeKI | PstI
Recognition bases | 5 | 6
Fragment size | 0.5 Kb | 4 Kb
Genome coverage (100 bp read) | 20% | 2.5%
Number of unique sequences (3G genome) | 3G/0.5Kb = 6M | 3G/4Kb = 0.75M
Sequencing depth (60G data on 3G genome) | 60/(3 x 0.2) = 100X | 60/(3 x 0.025) = 800X
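The depth calculations above all follow one formula: bases sequenced, divided by the number of bases that are actually targeted across all pooled samples. A minimal R sketch, with the hypothetical function name `sequencing_depth`, reproduces the four numbers from the slides.

```r
# Depth = total bases sequenced / (samples x genome size x fraction of genome targeted).
sequencing_depth <- function(data_bp, genome_bp, fraction = 1, samples = 1) {
  data_bp / (samples * genome_bp * fraction)
}

# 100 Mb genome, 100M reads of 100 bp, whole genome, one sample.
whole_genome <- sequencing_depth(100e6 * 100, 100e6)

# 3G genome, 1% reduced representation, 50 samples pooled, 6G of data.
multiplexed <- sequencing_depth(6e9, 3e9, fraction = 0.01, samples = 50)

# 60G of data on a 3G genome, using the coverage figures from the table.
apeki <- sequencing_depth(60e9, 3e9, fraction = 0.20)
psti  <- sequencing_depth(60e9, 3e9, fraction = 0.025)

print(c(whole_genome = whole_genome, multiplexed = multiplexed,
        ApeKI = apeki, PstI = psti))
```

The rarer cutter (PstI) samples less of the genome, so the same amount of data gives a deeper read stack at each sampled site.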

Distribution of length. Expectation of fragment length = genome length / number of cuts. Variance = squared expectation (needs proof).
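One way to sketch the proof mentioned on this slide is given below. It assumes, which the slide does not state, that the n cut sites fall independently and uniformly along a genome of length L, that n is large, and that fragments are short relative to L. X denotes the distance from one cut to the next, which is longer than x only if none of the other n - 1 cuts falls in the next x bases.

```latex
\begin{align*}
P(X > x) &= \left(1 - \frac{x}{L}\right)^{n-1} \approx e^{-\lambda x},
  \qquad \lambda = \frac{n}{L},\\
E[X] &= \int_0^\infty P(X > x)\,dx = \frac{1}{\lambda} = \frac{L}{n},\\
E[X^2] &= \int_0^\infty 2x\,P(X > x)\,dx = \frac{2}{\lambda^2},\\
\operatorname{Var}(X) &= E[X^2] - E[X]^2 = \frac{1}{\lambda^2}
  = \left(\frac{L}{n}\right)^2 = E[X]^2 .
\end{align*}
```

Under these assumptions fragment lengths are approximately exponentially distributed, so the variance equals the squared expectation.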

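Distribution of length
size=300000000              # genome length in bp
n=10000                     # number of cut sites (assumed value; n is not defined on the slide)
x=round(runif(n,1,size))    # random cut positions
y=sort(x)
interval=y[-1]-y[-n]        # fragment lengths between adjacent cuts
hist(interval)
Ex=size/n                   # expected fragment length
Va=Ex*Ex                    # expected variance (squared expectation)
m=mean(interval)            # observed mean
v=var(interval)             # observed variance
m
v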

Distribution of length Beissinger et al, Genetics. 2013, 193(4):1073-81

Number of reads

FASTQ
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
Line 1: starts with @, followed by the sequence description
Line 2: sequence
Line 3: starts with +, followed by the description
Line 4: symbols of sequence quality values (same length as the sequence), with ! the lowest and ~ the highest. There are 94 symbols, with ASCII codes from 33 to 126:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
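The quality line can be turned into numbers with a few lines of R. This is a minimal sketch that assumes Phred+33 encoding (quality score = ASCII code minus 33), which is consistent with the 33 to 126 symbol range above but is not stated explicitly on the slide. The record is the example from the slide, with the quality string built from its 30 leading I symbols followed by 9IG9IC.

```r
# One FASTQ record (four lines), taken from the example on the slide.
record <- c(
  "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36",
  "GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC",
  "+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36",
  paste0(strrep("I", 30), "9IG9IC")
)

sequence <- record[2]
quality_string <- record[4]

# Assumption: Phred+33 encoding, so '!' (ASCII 33) maps to quality 0.
phred <- utf8ToInt(quality_string) - 33

# A Phred score Q corresponds to an error probability of 10^(-Q/10).
error_prob <- 10^(-phred / 10)

# The quality line must have one symbol per base.
stopifnot(nchar(sequence) == length(phred))

print(range(phred))
print(signif(max(error_prob), 2))
```

Here the symbol I (ASCII 73) decodes to quality 40 and the symbol 9 (ASCII 57) to quality 24.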

ASCII code
x   CHAR(x)    x   CHAR(x)    x    CHAR(x)    x    CHAR(x)
33  !          56  8          80   P          103  g
34  "          57  9          81   Q          104  h
35  #          58  :          82   R          105  i
36  $          59  ;          83   S          106  j
37  %          60  <          84   T          107  k
38  &          61  =          85   U          108  l
39  '          62  >          86   V          109  m
40  (          63  ?          87   W          110  n
41  )          64  @          88   X          111  o
42  *          65  A          89   Y          112  p
43  +          66  B          90   Z          113  q
44  ,          67  C          91   [          114  r
45  -          68  D          92   \          115  s
46  .          69  E          93   ]          116  t
47  /          70  F          94   ^          117  u
48  0          71  G          95   _          118  v
49  1          72  H          96   `          119  w
50  2          73  I          97   a          120  x
51  3          74  J          98   b          121  y
52  4          75  K          99   c          122  z
53  5          76  L          100  d          123  {
54  6          77  M          101  e          124  |
55  7          78  N          102  f          125  }
               79  O                          126  ~

Post-sequencing http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0101025

Hapmap format IUPAC code
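As an illustration of how single-letter IUPAC codes can stand for diploid genotypes, the R sketch below expands each code into its two alleles. It covers only the unambiguous bases and the two-base ambiguity codes; treating a single base such as A as the homozygote AA is an assumption about how the HapMap file is encoded, and the vector `calls` is made-up example data.

```r
# IUPAC nucleotide codes for single bases and two-base ambiguities.
iupac <- c(A = "AA", C = "CC", G = "GG", T = "TT",
           R = "AG", Y = "CT", S = "CG", W = "AT", K = "GT", M = "AC")

# Hypothetical genotype calls at one SNP; "N" stands for a missing call.
calls <- c("A", "R", "G", "W", "N")

# Codes that are not in the lookup table (such as N) become NA.
expanded <- unname(iupac[calls])
print(expanded)
```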

Genotype in Numeric format
myGD=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T)
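A numeric genotype file typically stores, for each individual and SNP, a count of one chosen allele. The R sketch below shows how two-letter genotypes could be converted to such counts. It assumes 0/1/2 coding, and the function `to_numeric` and the example genotypes are hypothetical, not taken from the lecture data.

```r
# Count copies of one allele in each two-letter genotype (0, 1, or 2).
to_numeric <- function(genotypes, counted_allele) {
  alleles <- strsplit(genotypes, "")
  vapply(alleles, function(a) sum(a == counted_allele), integer(1))
}

# Hypothetical genotypes of four individuals at one A/G SNP.
geno <- c("AA", "AG", "GG", "AG")

numeric_geno <- to_numeric(geno, counted_allele = "G")
print(numeric_geno)
```

Which allele is counted is a convention; counting the other allele simply swaps 0 and 2.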

Genetic map
myGM=read.table(file="http://zzlab.net/GAPIT/data/mdp_SNP_information.txt",head=T)

Outline: Genetic markers; Sequencing; Full vs. reduced; Experiment; Data process and format