Genotyping-by-Sequencing: What Is It and What Is It Good For? Keith R. Merrill, NCSU – Crop Science


1 Genotyping-by-Sequencing: What Is It and What Is It Good For? Keith R. Merrill, NCSU – Crop Science

2 GBS vs. RAD-Seq: The Ultimate Throw Down! (of Acronyms)
GBS: Genotyping-by-Sequencing
RAD-Seq: Restriction-site associated DNA sequencing

3 GBS vs. RAD-Seq: What's the Difference?

4 The Concept
– Reduce the genome
– Pool samples
– Sequence the combined pool
– Assign sequences to individuals
– Call variants between individuals

5 The Concept
[Figure labels: Ind_1, Ind_2, Ind_3, Ind_4, Ind_5, Ind_n]
It's all about probability

6 The Concept
[Figure labels: Ind_1, Ind_2, Ind_3, Ind_4, Ind_5, Ind_n]
Reduce the genome and increase the probability of overlap
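A rough way to put numbers on the overlap argument is sketched here. It assumes reads land uniformly at random on the sequenceable sites (a Poisson simplification that is not from the slides), and the read count and site counts are hypothetical.

```python
import math

def prob_site_covered(reads_per_individual: float, n_sites: float) -> float:
    """P(a given site gets at least one read) under a Poisson model (assumption)."""
    mean_depth = reads_per_individual / n_sites
    return 1.0 - math.exp(-mean_depth)

def prob_shared_by_all(reads_per_individual: float, n_sites: float,
                       n_individuals: int) -> float:
    """P(the same site is covered in every individual), assuming independence."""
    return prob_site_covered(reads_per_individual, n_sites) ** n_individuals

# Hypothetical numbers: 1 million reads per individual, compared for
# 100 million candidate sites (unreduced) vs. 500 thousand sites (reduced).
reads = 1_000_000
unreduced = prob_shared_by_all(reads, 100_000_000, 2)
reduced = prob_shared_by_all(reads, 500_000, 2)
print(f"unreduced: {unreduced:.6f}  reduced: {reduced:.6f}")
```

With the same sequencing effort, shrinking the set of candidate sites raises the per-site depth, which is what drives up the chance that two individuals are sequenced at the same place.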

7 How It Works
[Figure labels: Ind1, Ind2, Ind3, Ind4, Ind5, IndN; Tag1, Tag2, Tag3, Tag4, Tag5, TagN]
Tags (AKA barcodes, MID barcodes, etc.): CAGATA, GAAGTG, TAGCGGAT, GGATA, CACCA, …
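Assigning pooled reads back to individuals can be sketched as prefix matching on the tag. This is a minimal illustration, not any pipeline's actual method: the tag sequences are the ones on the slide, but pairing them with Ind1 to Ind5 in order is an assumption, the example reads are made up, and real data also needs mismatch tolerance and a check for the restriction-site remnant after the tag.

```python
# Tag sequences from the slide; the tag-to-individual pairing is assumed.
BARCODES = {
    "CAGATA": "Ind1",
    "GAAGTG": "Ind2",
    "TAGCGGAT": "Ind3",
    "GGATA": "Ind4",
    "CACCA": "Ind5",
}

def demultiplex(reads, barcodes):
    """Group reads by exact leading barcode and strip the barcode off."""
    by_sample = {name: [] for name in barcodes.values()}
    unassigned = []
    # Tags vary in length, so try the longest first to avoid a short tag
    # shadowing a longer one that shares its prefix.
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for tag in ordered:
            if read.startswith(tag):
                by_sample[barcodes[tag]].append(read[len(tag):])
                break
        else:
            unassigned.append(read)
    return by_sample, unassigned

# Made-up reads for illustration only.
example_reads = ["CAGATAACGT", "GGATATTTT", "NNNNACGT"]
assigned, leftover = demultiplex(example_reads, BARCODES)
```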

8 How It Works (The One Enzyme Method)
[Figure labels: Ind1, Ind2, Ind3, Ind4, Ind5, IndN; Tag1, Tag2, Tag3, Tag4, Tag5, TagN]

9 How It Works (The Two Enzyme Method)

10 How It Works: Size Selection
Base-pair range selected
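Genome reduction by digestion followed by size selection can be mimicked in silico. The sketch below is a toy under stated assumptions: the recognition site `CTGCAG` with a cut after its fifth base (PstI-like), the size window, and the sequence are all illustrative choices, not values from the slides.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # trailing piece after the last cut
    return fragments

def size_select(fragments, min_bp: int, max_bp: int) -> list:
    """Keep only fragments whose length falls inside the selected range."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]

# Toy sequence and window, for illustration only.
toy = "AAAACTGCAGTTTTTTTTCTGCAGCC"
pieces = digest(toy, "CTGCAG", 5)
kept = size_select(pieces, 5, 10)
```

Only fragments inside the window go on to sequencing, so the chosen range is one of the knobs that sets how much of the genome is sampled.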

11 How It Works: Pooling
[Figure labels: Ind1, Ind2, Ind3, Ind4, Ind5, IndN; Tag1, Tag2, Tag3, Tag4, Tag5, TagN]
Size selection (optional if using two enzymes)

12 Why Pool Samples?
– On the Illumina HiSeq 2000: 8 lanes of sequencing, each capable of giving 374 million reads.
– You can't partition a lane.
– Sequencing is expensive ($1500 - $3000 per lane).
– You don't need/want 374 million reads per individual.
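The arithmetic behind pooling, as a quick sketch. The per-lane read count and lane prices are the slide's figures; the pool sizes are hypothetical, and the calculation assumes reads split evenly across samples and counts sequencing cost only (no library preparation).

```python
READS_PER_LANE = 374_000_000  # figure quoted on the slide

def reads_per_sample(n_samples: int) -> float:
    """Average reads per individual if the lane is shared evenly (assumption)."""
    return READS_PER_LANE / n_samples

def cost_per_sample(lane_cost: float, n_samples: int) -> float:
    """Sequencing cost per individual for one lane; excludes library prep."""
    return lane_cost / n_samples

# Hypothetical pool sizes, priced at both ends of the slide's range.
for n in (48, 96, 384):
    print(f"{n:>3} samples: {reads_per_sample(n):,.0f} reads each, "
          f"${cost_per_sample(1500, n):.2f} to ${cost_per_sample(3000, n):.2f} per sample")
```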

13 A Word About Tags
– Hamming vs. edit distance
– Sequence errors may result from things other than sequencing.
– n-1 errors are the most common error encountered during oligo synthesis.
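The distinction matters because Hamming distance counts only substitutions, while an n-1 synthesis error is a deletion that shifts every following base. A minimal sketch of both metrics follows; the "observed" tag is a made-up example of the first tag losing its first base and picking up one extra base from the read.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]

designed = "CAGATA"
observed = "AGATAC"  # hypothetical n-1 tag: first base lost, next read base shifted in

print(hamming(designed, observed), edit_distance(designed, observed))
```

A single lost base makes the tag look maximally different by Hamming distance yet only two edits away, which is why a tag set designed on Hamming distance alone may not be robust to synthesis deletions.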

14 Analysis: It's About Time… and Money… and Time
Key considerations:
– Time
– Computing power available
– Amount of sequence data (back to time)
– Availability of a reference genome

15 Key Considerations
– Study goals
– Availability of a reference genome
– Expected degree of polymorphism
– Choice of restriction enzyme
– DNA sample preparation
– Adaptor design
– PCR amplification
– Sequencing
– Pooling individuals
– Analysis

16 Analysis: It's About Time… and Money… and Time
A few options:
Stacks
– For use with bi-parental mapping populations
– Takes a lot of time
– Looks at entire reads
– Reference genome optional
– Designed to work nicely with MySQL
– More memory intensive
UNEAK
– For use with species without a reference genome
– Uses only 64 bp of each read
– MUCH faster than Stacks
– Less memory intensive
TASSEL
– For use with species with a reference genome
– Uses only 64 bp of each read
– MUCH faster than Stacks
– Less memory intensive
Custom scripts
– Completely flexible (hence the 'custom')
– Requires significant knowledge about programming (or knowing someone who does and is willing to help)

17 Does It Work?
Note: this is with hexaploid wheat and no reference genome.

18 The Good
– No ascertainment bias
– Random distribution throughout the genome
– May be useful for species without a reference genome
– Useful with genomic selection
– May provide a large number of SNPs
– Relatively low per-sample cost

19 The Good (cont.)
GBS is extremely flexible:
– Number of individuals per lane/flowcell
– Choice of enzymes (cut sites, methylation sensitivity)
– Size of fragments selected

20 The Bad
– Poor reproducibility between runs
– Species without a reference genome *cannot* infer missing data
– Often dealing with large amounts of missing data
– Difficult to filter out false SNPs in non-mapping populations, unless you have a reference genome, and even then…
In my opinion: this would be nigh impossible to use for association studies in species without a reference genome UNLESS you sequence to very high coverage to virtually eliminate missing data (alternatively, you could drastically reduce the genome through your choice of enzymes, but this may be bad if your expected degree of polymorphism is low).

21 Questions?

22 TASSEL-GBS
www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119
GBS_Document: www.maizegenetics.net/tassel/docs/TasselPipelineGBS.pdf

