1
Introduction to RAD Acropora millepora
2
First thing: start downloading the course archives
See "Downloading course archives"
3
Plan
Start downloading and unpacking data
Set up our reference genome
Go through some introduction slides
Go through GATK demonstration files
User choice: read preparation, de novo, or GATK for real
4
Genetic differentiation vs. distance
Population Genetics
[Figure: genetic differentiation vs. distance; plot labels N, S, O, M, K, W, A; R² = 0.76]
5
Linkage Mapping: the final map consists of 3816 markers and covers 99.4% (1539 cM) of the scallop genome with a resolution of 0.41 cM.
6
2bRAD-based linkage mapping in scallop
Sex-related chromosomal region: based on the high-density map, we were able to identify a 2-cM chromosomal region that contained ~60 sex-related loci.
7
Which loci are under selection across populations?
Restriction site-Associated DNA (RAD): generate and sequence short tags randomly distributed across the genome.
[Figure: Fst along linkage group III, between freshwater populations (orange) and between freshwater and marine populations (black). Bars: significant (bootstrap); dots: SNPs.]
8
Population genomics: “genome scanning” for signatures of selection
Nucleotide diversity (recent selection)
Tajima’s D (not so recent selection)
Excess LD (very recent selection)
[Figure: statistics plotted against position along genome; panel labels: coalescent, hard sw., LD2hs, neutr, balancing, neutr, soft sw. (recent or continuous selection)]
Hohenlohe et al 2010 Int. J. Plant Sci. 17:
9
Amplification and barcoding
[Workflow figure: adaptor ligation and indexing (index 1 of 12; mix indices by 12), then amplification and barcoding (Barcode2), then gel cleanup]
10
Restriction Digest: a type IIb restriction enzyme is used to cut out fragments of the genome
Genomic DNA with the BcgI cut site:
5’- NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN-3’
3’-NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN -5’
Digested fragments represent a random subset of the genome.
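The site drawn above can be searched for in a sequence with a regular expression. This is a minimal sketch, assuming the fragment boundaries are exactly those shown on the slide (10 N, CGA, 6 N, TGC, 12 N on the top strand); the function name `find_bcgi_fragments` and the toy sequence are hypothetical, and the overhangs left by the enzyme are not modeled.

```python
import re

# Top-strand layout as drawn on the slide: 10 N, CGA, 6 N, TGC, 12 N.
# The lookahead makes overlapping sites visible to finditer.
TOP = re.compile(r"(?=([ACGT]{10}CGA[ACGT]{6}TGC[ACGT]{12}))")
# The same site read from the opposite strand (reverse complement):
# 12 N, GCA, 6 N, TCG, 10 N.
BOTTOM = re.compile(r"(?=([ACGT]{12}GCA[ACGT]{6}TCG[ACGT]{10}))")


def find_bcgi_fragments(seq):
    """Return sorted (start, strand, fragment) tuples for every site in seq."""
    seq = seq.upper()
    hits = []
    for pattern, strand in ((TOP, "+"), (BOTTOM, "-")):
        for match in pattern.finditer(seq):
            hits.append((match.start(1), strand, match.group(1)))
    return sorted(hits)


# Toy sequence (made up for illustration) containing exactly one site.
toy = "A" * 10 + "CGA" + "T" * 6 + "TGC" + "A" * 12
fragments = find_bcgi_fragments(toy)
```

Run over a reference genome, the number of hits gives a rough idea of how many tags to expect per sample.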
11
Ligation Step: adapters are ligated to the restriction fragments produced by the digest
Adapter 1 (contains degenerate bases to ID PCR duplicates): 5Ill-NNRW + anti 5Ill-NNRW
5’-CTACACGACGCTCTTCCGATCTNNRWCCNN-3’
5’-GGWYNNAGATCGGAAGAGC/3InvdT1/-3’
Adapter 2 (contains a 3’ barcode, shown in red on the slide): 3Ill-BC[1-12] + anti 3Ill-BC[1-12]
5’-CAGACGTGTGCTCTTCCGATCTACCANN-3’
5’-TGGTAGATCGGA/3InvdT/
Restriction fragment:
5’- NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN-3’
3’-NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN -5’
Ligated product:
5’CTACACGACGCTCTTCCGATCTNNRWCCNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGANNYWGG NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
12
Using degenerate bases to identify PCR duplicates
Adapter 1: 5Ill-NNRW + anti 5Ill-NNRW
5’-CTACACGACGCTCTTCCGATCTNNRWCCNN-3’
5’-GGWYNNAGATCGGAAGAGC/3InvdT1/-3’
N = A, T, G, or C
R = A or G
W = A or T
Y = C or T
Ligated product:
5’-CTACACGACGCTCTTCCGATCTNNRWCCNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGANNYWGG NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
Degenerate bases are included in the 5Ill-NNRW adapter.
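A rough sketch of how the degenerate tag can be used downstream: reads that share both the same tag and the same insert sequence are treated as PCR duplicates, and only one is kept. The sketch assumes the tag and insert have already been extracted from each read (where exactly the tag sits in the read is not specified here); `expand` and `dedup` are hypothetical helper names, and the example reads are made up.

```python
from itertools import product

# IUPAC codes used by the adapter, as listed on the slide.
IUPAC = {"N": "ACGT", "R": "AG", "W": "AT", "Y": "CT"}


def expand(pattern):
    """List every concrete sequence a degenerate pattern can take."""
    choices = (IUPAC.get(base, base) for base in pattern)
    return ["".join(combo) for combo in product(*choices)]


def dedup(reads):
    """Keep the first read for each (tag, insert) pair, preserving order."""
    seen = set()
    kept = []
    for tag, insert in reads:
        key = (tag, insert)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


n_tags = len(expand("NNRW"))  # 4 * 4 * 2 * 2 possible tags

# Made-up reads: the first two share tag and insert, so one is a duplicate.
reads = [("ACGA", "TTGCA"), ("ACGA", "TTGCA"), ("ACAT", "TTGCA")]
unique_reads = dedup(reads)
```

With 64 possible NNRW tags, two independent molecules of the same locus only rarely carry the same tag, which is what makes identical (tag, insert) pairs a reasonable signal of PCR duplication.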
13
Amplification (perform on pooled samples)
Primers:
Ill-[1-12]-bc: CAAGCAGAAGACGGCATACGAGAT[barcode]GTGACTGGAGTTCAGACGTGTGCTCTTCCGAT
Mpx2N: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
IC1-P5: AATGATACGGCGACCACCGA
IC2-P7: CAAGCAGAAGACGGCATACGA
[Annealing diagram, labels Mpx2N and Ill-bc:]
AATGATACGGCGACCACCGA
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
TAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG[barcode]TAGAGCATACGGCAGAAGACGAAC
AGCATACGGCAGAAGACGAAC
14
Amplification
Primers:
Ill-[1-12]-bc: CAAGCAGAAGACGGCATACGAGAT[2ndbarcode]GTGACTGGAGTTCAGACGTGTGCTCTTCCGAT
Mpx2N: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
IC1-P5: AATGATACGGCGACCACCGA
IC2-P7: CAAGCAGAAGACGGCATACGA
[Annealing diagram, labels P5 and P7:]
AATGATACGGCGACCACCGA
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
TAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG[barcode]TAGAGCATACGGCAGAAGACGAAC
AGCATACGGCAGAAGACGAAC
15
2nd barcode was added during PCR
Final Product
[Diagram labels: p5, p7, read primer, sampled genomic DNA, 1st barcode (ligated onto fragment), 2nd barcode (added during PCR)]
5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT
AATGATACGGCGACCACCGA
TAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG[barcode]TAGAGCATACGGCAGAAGACGAAC
AGCATACGGCAGAAGACGAAC
16
Double Barcoding
[Diagram labels: p5, p7, read primer, random bases for duplicate identification, sampled genomic DNA, 1st barcode (ligated onto fragment; ligation indexes), 2nd barcode (added during PCR; PCR barcodes)]
Setting unique ligation barcodes to columns and PCR barcodes to rows gives a unique combination for each sample (e.g. wells A1, B1).
If groups are using strip tubes, each strip will be laid down as a row in this picture. Then each tube in the strip gets a different adapter barcode, and the whole strip can be pooled downstream.
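The column/row scheme can be written out as a small lookup table. This is only a sketch: the 12 ligation barcodes come from the slides (3Ill-BC[1-12]), but the eight rows labeled A to H are an assumption (a standard 96-well plate), and `barcode_layout` and `pools_by_row` are hypothetical names.

```python
from itertools import product


def barcode_layout(n_ligation=12, pcr_rows="ABCDEFGH"):
    """Map each well (e.g. 'B1') to its (ligation barcode, PCR barcode) pair.

    Columns carry the ligation barcode, rows carry the PCR barcode.
    The A-H row labels are an assumption, not taken from the slides.
    """
    layout = {}
    for row, col in product(pcr_rows, range(1, n_ligation + 1)):
        layout[f"{row}{col}"] = (col, row)
    return layout


def pools_by_row(layout):
    """Group wells that share a PCR barcode; each group can be pooled."""
    pools = {}
    for well, (_ligation, pcr) in layout.items():
        pools.setdefault(pcr, []).append(well)
    return pools


layout = barcode_layout()
pools = pools_by_row(layout)
```

Because every well in a row has a different ligation barcode, the row can be pooled after ligation and still be separated again from the reads.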
17
Pooling Samples
Pooling allows 12 uniquely barcoded samples to be prepared in a single tube. This saves work and pipette tips.
Use a different Adapter 2 for each column (1-12).
Pool samples so that each sample in the pool has a different ligation adapter.
5’-CTACACGACGCTCTTCCGATCTNN NNNNNNNNNNCGANNNNNNTGCNNNNNNNNNNNN TGGTAGATCGGA/3InvdT/
/3InvdT1/CGAGAAGGCTAGA NNNNNNNNNNNNGCTNNNNNNACGNNNNNNNNNN NNACCATCTAGCCTTCTCGTGTGCAGAC-5’
[Figure label: Pooled sample 1]
If groups are using strip tubes, each strip will be laid down as a row in this picture. Then each tube in the strip gets a different adapter barcode, and the whole strip can be pooled downstream.
18
GATK
19
Genome reference pipeline (GATK) (http://www. broadinstitute
Trimming/quality filtering
Mapping to genome
Realign around indels
Primary variant calling
Base quality recalibration
Secondary variant calling
Variant quality recalibration based on genotyping replicates
Final filtering
Assess quality (heterozygote discovery rate)
20
Genetic differentiation vs. distance
Acropora millepora connectivity along the Great Barrier Reef
[Figure: genetic differentiation vs. distance; plot labels N, S, O, M, K; R² = 0.76]
21
SAM File Format: Header Lines
@HD: Header line (first line of the file if present). Gives the version number and sorting information; here version 1, unsorted alignments.
@SQ: These lines give the reference sequence information. There will be as many of these as ‘chromosomes’ in your reference; in our case this is 10. Each gives the name of the sequence and its length.
@PG: Program line. Gives information about the program used to perform the alignment. In our case bowtie2 version
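The header lines described above are tab-separated `TAG:VALUE` fields, so they are easy to read programmatically. Below is a minimal sketch; the function name `parse_sam_header` and the example header (sequence names, lengths) are made up for illustration and are not taken from the course files.

```python
def parse_sam_header(lines):
    """Collect SAM header records into {record type: [field dicts]}.

    Reading stops at the first line that does not start with '@',
    i.e. at the first alignment line.
    """
    header = {}
    for line in lines:
        if not line.startswith("@"):
            break
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0][1:]  # 'HD', 'SQ', 'PG', ...
        if record_type == "CO":
            # Comment lines are free text, not TAG:VALUE pairs.
            header.setdefault("CO", []).append("\t".join(fields[1:]))
            continue
        tags = dict(field.split(":", 1) for field in fields[1:])
        header.setdefault(record_type, []).append(tags)
    return header


# Hypothetical header with two reference sequences.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:2000",
    "@PG\tID:bowtie2\tPN:bowtie2",
    "read1\t0\tchr1\t1\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
]
header = parse_sam_header(example)
n_references = len(header["SQ"])
```

For a reference like the one used here, `n_references` would be expected to equal the number of ‘chromosomes’ (10 in our case).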
22
SAM File Format: Alignment Lines (example from our files)
from ‘Sequence Alignment/Map Format Specification’, The SAM/BAM Format Specification Working Group 2015 (see under additional resources)
23
VCF file Structure VCF = variant call format
from ‘The Variant Call Format Specifications’ 2015
24
VCF file Structure VCF = variant call format
One row for each variant position; columns give individual sample data.
[Figure labels: fields that describe the variant; sample data]
This individual is a heterozygote (G/T), with a GQ score of 21, a read depth of 6, and haplotype qualities of 23 and 27.
Descriptions of what these mean are included in the header (see previous slide).
from ‘The Variant Call Format Specifications’ 2015
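To make the sample column concrete, here is a small sketch that decodes one sample field against the FORMAT column. The REF/ALT alleles and the sample string are illustrative values chosen to agree with the numbers quoted above (G/T, GQ 21, DP 6, HQ 23,27); `decode_sample` is a hypothetical helper, and it assumes GT, GQ, DP and HQ are all present and not missing.

```python
import re


def decode_sample(ref, alt, fmt, sample):
    """Translate one VCF sample field into alleles and numeric values."""
    alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles
    values = dict(zip(fmt.split(":"), sample.split(":")))
    # GT indices are separated by '/' (unphased) or '|' (phased).
    indices = re.split(r"[/|]", values["GT"])
    called = [alleles[int(i)] if i != "." else "." for i in indices]
    return {
        "genotype": "/".join(called),
        "heterozygous": len(set(called)) > 1,
        "GQ": int(values["GQ"]),
        "DP": int(values["DP"]),
        "HQ": [int(x) for x in values["HQ"].split(",")],
    }


# Illustrative record: REF A, ALT G and T, genotype uses both ALT alleles.
call = decode_sample("A", "G,T", "GT:GQ:DP:HQ", "1|2:21:6:23,27")
```

The key point is that the GT numbers index into the allele list (0 for REF, 1 and up for ALT), so `1|2` here means one copy of G and one of T.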
25
Recalibration Plots
26
Recalibration Plots
27
Recalibration Plots
29
Writing your own pipeline