GENOTYPING-BY-SEQUENCING: WHAT IS IT AND WHAT IS IT GOOD FOR?
KEITH R. MERRILL, NCSU – CROP SCIENCE

GBS VS. RAD-SEQ: THE ULTIMATE THROWDOWN! (OF ACRONYMS)
GBS: Genotyping-by-Sequencing
RAD-Seq: Restriction-site Associated DNA sequencing

GBS VS. RAD-SEQ: WHAT'S THE DIFFERENCE?

THE CONCEPT
Reduce the genome
Pool samples
Sequence the combined pool
Assign sequences to individuals
Call variants between individuals

THE CONCEPT
Ind_1 Ind_2 Ind_3 Ind_4 Ind_5 … Ind_n
It's all about probability: what are the chances that the same loci get sequenced in every individual?

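THE CONCEPT
Ind_1 Ind_2 Ind_3 Ind_4 Ind_5 … Ind_n
Reduce the genome and increase the probability of overlap: the same number of reads spread over fewer loci gives each locus more coverage, so the same loci are more likely to be sequenced in every individual.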
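To put rough numbers on the overlap argument, here is a minimal sketch under a simplifying assumption that is not from the slides: reads land uniformly at random on the retained loci, so the depth at one locus in one individual is Poisson with mean λ = reads ÷ loci. The chance of seeing that locus at least once is then 1 − e^(−λ), and in all n individuals (1 − e^(−λ))^n. The read count, locus counts and sample size are hypothetical.

```python
import math

def prob_locus_in_all(reads_per_individual: float, n_loci: float, n_individuals: int) -> float:
    """P(a given locus is sequenced at least once in every individual).

    Assumption: reads fall uniformly at random across n_loci, so per-locus depth
    in one individual is Poisson with mean lam, independently across individuals.
    """
    lam = reads_per_individual / n_loci        # mean reads per locus per individual
    p_seen = 1.0 - math.exp(-lam)              # P(depth >= 1) under the Poisson model
    return p_seen ** n_individuals             # seen in all individuals

# Hypothetical scenario: 2 million reads per individual, 96 individuals,
# with the same reads concentrated on fewer and fewer loci.
for n_loci in (10_000_000, 1_000_000, 100_000):
    p = prob_locus_in_all(2_000_000, n_loci, 96)
    print(f"{n_loci:>10,} loci: P(seen in all 96) = {p:.3g}")
```

Under these assumptions the probability is essentially zero when the reads are spread over ten million loci and essentially 1 once the genome is reduced to 100,000 loci, which is the point of the slide. Real coverage is far less uniform than this model.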

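HOW IT WORKS
Ind1 Ind2 Ind3 Ind4 Ind5 … IndN
Tag1 Tag2 Tag3 Tag4 Tag5 … TagN
Tags (a.k.a. barcodes, MID barcodes, etc.): CAGATA, GAAGTG, TAGCGGAT, GGATA, CACCA, …
Each individual gets its own tag, so that after pooling every read can be assigned back to the individual it came from.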
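Assigning reads back to individuals (demultiplexing) comes down to matching the start of each read against the tag list. Because the example tags differ in length, this sketch first checks that no tag is a prefix of another, since that would make assignment ambiguous. The tags are the ones listed above; the individual names and reads are hypothetical, and unlike real pipelines this sketch only accepts exact tag matches.

```python
from typing import Dict, List, Optional, Tuple

# Example tags from the slide, mapped to hypothetical individual names.
TAGS: Dict[str, str] = {
    "CAGATA": "Ind1",
    "GAAGTG": "Ind2",
    "TAGCGGAT": "Ind3",
    "GGATA": "Ind4",
    "CACCA": "Ind5",
}

def check_prefix_free(tags: List[str]) -> None:
    """Raise if any tag is a prefix of another (assignment would be ambiguous)."""
    for a in tags:
        for b in tags:
            if a != b and b.startswith(a):
                raise ValueError(f"tag {a} is a prefix of {b}")

def assign_read(read: str, tags: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (individual, read with its tag removed), or None if no tag matches exactly."""
    for tag, individual in tags.items():
        if read.startswith(tag):
            return individual, read[len(tag):]
    return None

def demultiplex(reads: List[str], tags: Dict[str, str]) -> Dict[str, List[str]]:
    """Bin reads by individual; reads without an exact tag match go to 'unassigned'."""
    check_prefix_free(list(tags))
    bins: Dict[str, List[str]] = {ind: [] for ind in tags.values()}
    bins["unassigned"] = []
    for read in reads:
        hit = assign_read(read, tags)
        if hit is None:
            bins["unassigned"].append(read)
        else:
            individual, insert = hit
            bins[individual].append(insert)
    return bins

reads = ["CAGATACTGCAAAT", "TAGCGGATCTGCTTGA", "GGATACAGCAGT", "NNNNNCTGCA"]
for ind, seqs in demultiplex(reads, TAGS).items():
    print(ind, seqs)
```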

HOW IT WORKS (THE ONE-ENZYME METHOD)
Ind1 Ind2 Ind3 Ind4 Ind5 … IndN
Tag1 Tag2 Tag3 Tag4 Tag5 … TagN
Each individual's DNA is cut with a single restriction enzyme, and the resulting fragments are ligated to that individual's tag.
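The genome reduction from a single enzyme can be mimicked in silico: find every recognition site and cut there. This is a simplified sketch that assumes a fixed (non-degenerate) recognition site, cuts the top strand only at a given offset into the site, and ignores sticky-end overhangs. The example uses GAATTC with a cut after the first base (EcoRI's G^AATTC) on a made-up sequence; degenerate sites would need pattern matching instead of `str.find`.

```python
from typing import List

def cut_positions(seq: str, site: str, offset: int) -> List[int]:
    """Top-strand cut positions: start of each site occurrence plus offset."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions

def digest(seq: str, site: str, offset: int = 0) -> List[str]:
    """Split seq at every cut position; empty pieces are dropped."""
    cuts = [0] + cut_positions(seq, site, offset) + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return [f for f in fragments if f]

genome = "TTGAATTCAAACCCGGGAATTCTTTTGAATTCGG"   # made-up sequence
print(digest(genome, "GAATTC", offset=1))
```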

HOW IT WORKS (THE TWO-ENZYME METHOD)
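A two-enzyme digest can be sketched the same way, except that each cut remembers which enzyme made it. The sketch assumes a common library design (not described on the slide) in which only fragments with one end from each enzyme are kept. The example pair, PstI (CTGCA^G) as enzyme A and MspI (C^CGG) as enzyme B, and the sequence are illustrative; as before, only the top strand is modelled.

```python
from typing import List, Tuple

Fragment = Tuple[str, str, str]   # (left-end enzyme, right-end enzyme, sequence)

def labeled_cuts(seq: str, site: str, offset: int, label: str) -> List[Tuple[int, str]]:
    """(cut position, enzyme label) for every occurrence of site; top strand only."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append((start + offset, label))
        start = seq.find(site, start + 1)
    return cuts

def double_digest(seq: str, site_a: str, off_a: int, site_b: str, off_b: int) -> List[Fragment]:
    """Fragments lying between two consecutive cuts, labelled with the enzyme at each end.

    Pieces before the first cut and after the last cut are ignored.
    """
    cuts = sorted(labeled_cuts(seq, site_a, off_a, "A") + labeled_cuts(seq, site_b, off_b, "B"))
    return [(left_lab, right_lab, seq[left:right])
            for (left, left_lab), (right, right_lab) in zip(cuts, cuts[1:])
            if right > left]

def keep_mixed_ends(fragments: List[Fragment]) -> List[Fragment]:
    """Assumed library design: keep only fragments cut by a different enzyme at each end."""
    return [f for f in fragments if f[0] != f[1]]

# Example pair: enzyme A = PstI (CTGCA^G), enzyme B = MspI (C^CGG).
seq = "AACTGCAGTTTTCCGGAAAACCGGTTCTGCAGAA"
frags = double_digest(seq, "CTGCAG", 5, "CCGG", 1)
print(keep_mixed_ends(frags))
```

The B–B fragment between the two MspI sites is discarded, which is how a second enzyme reduces the genome further without a separate size-selection step.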

HOW IT WORKS: SIZE SELECTION
Only fragments within a selected base-pair range are kept.
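In the lab, size selection is a physical step; in silico it is just a length filter on the digested fragments. This minimal sketch keeps fragments inside an inclusive window; the 100 to 500 bp window and the fragment lengths are hypothetical. Narrowing the window keeps fewer fragments, which is another way of reducing the genome.

```python
from typing import List

def size_select(fragments: List[str], min_bp: int, max_bp: int) -> List[str]:
    """Keep fragments whose length lies in the inclusive window [min_bp, max_bp]."""
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    return [f for f in fragments if min_bp <= len(f) <= max_bp]

# Hypothetical fragments of 50, 150, 300 and 1200 bp (sequence content is irrelevant here).
fragments = ["A" * n for n in (50, 150, 300, 1200)]
kept = size_select(fragments, 100, 500)
print([len(f) for f in kept])
```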

HOW IT WORKS: POOLING
Ind1 Ind2 Ind3 Ind4 Ind5 … IndN, each carrying its own tag (Tag1 Tag2 Tag3 Tag4 Tag5 … TagN)
Tagged samples are pooled.
Size selection (optional if using two enzymes)

WHY POOL SAMPLES?
On the Illumina HiSeq 2000: 8 lanes of sequencing, each capable of giving 374 million reads.
You can't partition a lane.
Sequencing is expensive (about $3000 per lane).
You don't need (or want) 374 million reads per individual.
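The pooling trade-off is simple arithmetic on the two figures from the slide (374 million reads and about $3000 per lane). The multiplexing levels below are hypothetical, the sketch assumes reads are split evenly across samples (real pools never are), and it counts sequencing cost only, not library preparation.

```python
READS_PER_LANE = 374_000_000   # HiSeq 2000 figure from the slide
COST_PER_LANE = 3000           # USD per lane, approximate figure from the slide

def per_sample(plex: int, reads_per_lane: int = READS_PER_LANE,
               cost_per_lane: float = COST_PER_LANE) -> tuple:
    """(reads, sequencing cost) per sample when `plex` tagged samples share one lane.

    Assumes an even split of reads across samples.
    """
    return reads_per_lane / plex, cost_per_lane / plex

for plex in (1, 48, 96, 384):
    reads, cost = per_sample(plex)
    print(f"{plex:>3}-plex: {reads / 1e6:6.2f} M reads, ${cost:8.2f} per sample")
```

At 96-plex, for example, each sample still gets roughly 3.9 million reads for about $31 of sequencing.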

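A WORD ABOUT TAGS
Hamming vs. edit distance
Sequence errors may result from things other than sequencing: n-1 errors (oligos one base shorter than intended) are the most common error encountered during oligo synthesis. Hamming distance counts only substitutions between equal-length strings, so a tag missing one base shifts every following base and can look like many mismatches; edit distance counts an insertion or deletion as a single error, so it better reflects how far apart two tags really are.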
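The difference is easy to see by computing both distances. This sketch implements Hamming distance and the standard Levenshtein (edit) distance, then compares the tag TAGCGGAT with a hypothetical read whose tag lost its first base during synthesis, so the first 8 bases of the read are the remaining 7 tag bases plus one genomic base (C, made up here).

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    prev = list(range(len(b) + 1))              # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                              # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca from a
                            curr[j - 1] + 1,             # insert cb into a
                            prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = curr
    return prev[-1]

tag = "TAGCGGAT"
read_start = "AGCGGATC"   # n-1 tag (first base missing) followed by a genomic base
print(hamming(tag, read_start), edit_distance(tag, read_start))
```

A single synthesis error looks like 7 mismatches to Hamming distance but only 2 edits (one deletion plus the extra genomic base) to edit distance, which is why tag sets designed and matched on edit distance tolerate n-1 errors better.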

ANALYSIS: IT'S ABOUT TIME… AND MONEY… AND TIME
Key considerations:
Time
Computing power available
Amount of sequence data (back to time)
Availability of a reference genome

KEY CONSIDERATIONS
Study goals
Availability of a reference genome
Expected degree of polymorphism
Choice of restriction enzyme
DNA sample preparation
Adaptor design
PCR amplification
Sequencing
Pooling individuals
Analysis

ANALYSIS: IT'S ABOUT TIME… AND MONEY… AND TIME
A few options:
Stacks
– For use with bi-parental mapping populations
– Takes a lot of time
– Looks at entire reads
– Reference genome optional
– Designed to work nicely with MySQL
– More memory intensive
UNEAK
– For use with species without a reference genome
– Uses only 64 bp of each read
– MUCH faster than Stacks
– Less memory intensive
TASSEL
– For use with species with a reference genome
– Uses only 64 bp of each read
– MUCH faster than Stacks
– Less memory intensive
Custom scripts
– Completely flexible (hence the 'custom')
– Requires significant knowledge of programming (or knowing someone who has it and is willing to help)
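One reason the 64 bp approach is fast: once reads are truncated to a fixed length, identical sequences collapse into a single "tag" with a count, and without a reference genome, tags that differ at exactly one position are natural SNP candidates. The sketch below illustrates that idea only; it is not the TASSEL or UNEAK code. The tag length is a parameter (64 in the tools, 8 here to keep the made-up reads short), reads shorter than the tag length are dropped by assumption, and the all-pairs comparison is quadratic, so it only suits small examples.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

def count_tags(reads: List[str], tag_len: int = 64) -> Counter:
    """Truncate reads to tag_len bases and count identical tags.

    Reads shorter than tag_len are dropped (an assumption of this sketch).
    """
    return Counter(r[:tag_len] for r in reads if len(r) >= tag_len)

def one_mismatch_pairs(tags: List[str]) -> List[Tuple[str, str, int]]:
    """Pairs of equal-length tags differing at exactly one position, with that position."""
    pairs = []
    for a, b in combinations(sorted(tags), 2):
        if len(a) != len(b):
            continue
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diffs) == 1:
            pairs.append((a, b, diffs[0]))
    return pairs

reads = ["CTGCAAATGGTT", "CTGCAAATGGAA", "CTGCAGATGGTT", "CTGCTTTTGGTT", "CTGC"]
counts = count_tags(reads, tag_len=8)
print(counts)
print(one_mismatch_pairs(list(counts)))
```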

DOES IT WORK?
Note: this is with hexaploid wheat and no reference genome.

THE GOOD
No ascertainment bias
Random distribution throughout the genome
May be useful for species without a reference genome
Useful with genomic selection
May provide a large number of SNPs
Relatively low per-sample cost

THE GOOD (CONT.)
GBS is extremely flexible:
Number of individuals per lane/flowcell
Choice of enzymes
– Cut sites
– Methylation sensitivity
Size of fragments selected

THE BAD
Poor reproducibility between runs
Species without a reference genome *cannot* infer missing data
Often dealing with large amounts of missing data
Difficult to filter out false SNPs in non-mapping populations, unless you have a reference genome, and even then…
In my opinion, this would be nigh impossible to use for association studies in species without a reference genome UNLESS you sequence to very high coverage to virtually eliminate missing data. (Alternatively, you could drastically reduce the genome through your choice of enzymes, but this may be bad if your expected degree of polymorphism is low.)
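Since missing data is the recurring problem, a usual first step is simply to measure it: the fraction of missing calls per site and per individual, then drop sites above a threshold. This sketch assumes genotypes coded as 0/1/2 (copies of the alternate allele) with `None` for missing; the coding, the 0.2 threshold and the small matrix are all hypothetical. It only measures and filters missingness; it does not infer missing calls.

```python
from typing import Dict, List, Optional

Genotypes = Dict[str, List[Optional[int]]]   # site -> one call per individual; None = missing

def missing_rate(calls: List[Optional[int]]) -> float:
    """Fraction of individuals with no call at this site."""
    return sum(c is None for c in calls) / len(calls)

def filter_sites(geno: Genotypes, max_missing: float) -> Genotypes:
    """Keep sites whose fraction of missing calls is at most max_missing."""
    return {site: calls for site, calls in geno.items() if missing_rate(calls) <= max_missing}

def individual_missing(geno: Genotypes) -> List[float]:
    """Fraction of sites missing for each individual (column-wise)."""
    if not geno:
        return []
    n_ind = len(next(iter(geno.values())))
    return [sum(calls[i] is None for calls in geno.values()) / len(geno) for i in range(n_ind)]

geno: Genotypes = {
    "S1": [0, 1, 2, 0, None],        # 1 of 5 missing
    "S2": [None, None, 2, None, 0],  # 3 of 5 missing
    "S3": [0, 0, 0, 1, 1],           # complete
}
print(list(filter_sites(geno, max_missing=0.2)))
print(individual_missing(geno))
```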

Questions?

TASSEL-GBS: m_content&task=view&id=89&Itemid=119
GBS_Document – GBS.pdf