1
DNA sequencing: genes, genomes, and markers
2
Steps in Genetic Analysis
Knowing how many genes determine a phenotype (Mendelian and/or QTL analysis) and where the genes are located (linkage mapping) is a first step in understanding the genetic basis of a phenotype.
A second step is determining the sequence of the gene (or genes).
3
Steps in Genetic Analysis
Subsequent steps involve:
Understanding gene regulation
Understanding the context of the gene in the sequence of the whole genome
Analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways, and how these pathways interact with the environment
4
Genome sequence of a diploid plant (2n = 2x = 14)
Total sequence: 5,300,000,000 base pairs
165% of the human genome
Enough characters for 11,000 large novels
Expressed sequence: 60,000,000 base pairs
~1% of total sequence, like humans
Enough characters for 125 large novels
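The comparisons on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and not part of the original presentation.

```python
# Numbers quoted on the slide.
total_bp = 5_300_000_000      # total genome sequence
expressed_bp = 60_000_000     # expressed sequence
novels_total = 11_000         # "large novels" for the whole genome
novels_expressed = 125        # "large novels" for the expressed part

# Fraction of the genome that is expressed (the slide rounds this to ~1%).
expressed_fraction = expressed_bp / total_bp

# Characters per "large novel" implied by each comparison.
chars_per_novel_total = total_bp / novels_total
chars_per_novel_expressed = expressed_bp / novels_expressed

# Human genome size implied by "165% of human genome".
implied_human_bp = total_bp / 1.65

print(f"Expressed fraction: {expressed_fraction:.2%}")
print(f"Characters per novel (whole genome): {chars_per_novel_total:,.0f}")
print(f"Characters per novel (expressed): {chars_per_novel_expressed:,.0f}")
print(f"Implied human genome size: {implied_human_bp:,.0f} bp")
```

Both novel comparisons imply roughly the same novel length, so the two figures are consistent with each other.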
5
Molecular tools for determining DNA (or RNA) sequence
Genomic DNA (or RNA) extraction (RNA to cDNA)
Manipulate DNA with restriction enzymes to reduce complexity and/or facilitate further manipulation
Manage and/or maintain DNA in vectors and/or libraries
Select DNA targets via amplification and/or hybridization
Determine the nucleotide sequence of the targeted DNA
6
Extracting DNA (or RNA)
Genomic DNA (or RNA): leaf segments or target tissues
Key considerations are:
Concentration
Purity
Fragment size
7
RNA-seq: mRNA to cDNA
Reverse transcriptase
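As a rough illustration of what reverse transcriptase does, the sketch below derives the first-strand cDNA from an mRNA string. It models base pairing only; the function name is hypothetical, and priming, enzyme behavior, and second-strand synthesis are not modeled.

```python
# Base-pairing rules for copying an RNA template into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return first-strand cDNA (written 5'->3') for an mRNA written 5'->3'."""
    # The new strand is antiparallel to its template, so the template is
    # walked from its 3' end and each base is replaced by its DNA partner.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


print(first_strand_cdna("AUGGCC"))
```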
8
Restriction Enzymes
Restriction enzymes make cuts at defined recognition sites in DNA
A defense system for bacteria: the enzymes attack and degrade the DNA of invading bacteriophages
The restriction enzymes are named for the organism from which they were isolated
Harnessed for the task of systematically breaking up DNA into fragments of tractable size and for various polymorphism detection assays
Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence
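A minimal sketch of a single-strand restriction digest follows. It uses EcoRI as the example enzyme (recognition site GAATTC, cut after the first G); the function name, the `cut_offset` parameter, and the test sequence are illustrative assumptions. Only the top strand is modeled, so sticky ends are not represented.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut seq at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments


# EcoRI: G^AATTC (cut one base into the six-base site).
fragments = digest("AAGAATTCTTGAATTCGG", "GAATTC", 1)
print(fragments)
```

Joining the fragments back together recovers the input sequence, which is a quick sanity check on any digest routine.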
9
Restriction Enzymes
Recognition sites and fragment size: a four-base cutter cuts on average once every ~256 bp (4^4), more frequently than a six-base cutter (~4,096 bp, 4^6), which in turn cuts more often than an eight-base cutter (~65,536 bp, 4^8)
Methylation-sensitive restriction enzymes target the epigenome
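The relationship between recognition-site length and cutting frequency can be sketched as follows. This assumes the four bases are equally frequent and randomly ordered, which real genomes only approximate; the function names are illustrative.

```python
def expected_fragment_size(site_length: int, n_bases: int = 4) -> int:
    """Average spacing between sites, assuming random, equal-frequency bases."""
    return n_bases ** site_length


def expected_cut_sites(genome_bp: int, site_length: int) -> float:
    """Approximate number of recognition sites in a genome of genome_bp."""
    return genome_bp / expected_fragment_size(site_length)


for n in (4, 6, 8):
    print(f"{n}-base cutter: one site per ~{expected_fragment_size(n):,} bp")

# Applied to the 5.3 Gb genome mentioned earlier in the deck.
print(f"{expected_cut_sites(5_300_000_000, 6):,.0f} six-base sites expected")
```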
10
Restriction Enzymes
Palindrome recognition sites: the same sequence is specified when each strand of the double helix is read in the opposite direction.
Sit on a potato pan, Otis
Cigar? Toss it in a can, it is so tragic
UFO tofu
Golf? No sir, prefer prison flog
Flee to me, remote elf
Gnu dung
Lager, Sir, is regal
Tuna nut
CRISPR: clustered regularly interspaced short palindromic repeats
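In DNA, "palindrome" means that a site equals its own reverse complement, not that it reads the same backwards on one strand. A minimal check, with illustrative function names, might look like this:

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """True if both strands read the same in the 5'->3' direction."""
    return site.upper() == reverse_complement(site)


for site in ("GAATTC", "GGATCC", "GATTACA"):
    print(site, is_palindromic_site(site))
```

GAATTC (EcoRI) and GGATCC (BamHI) are palindromic in this sense, while an arbitrary sequence such as GATTACA is not.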
11
Vectors
Propagate and maintain DNA fragments generated by the restriction digestion
Efficiency and simplicity of inserting and retrieving the inserted DNA fragments
Key feature of the cloning vector: size of the DNA insert
Plasmid ~1 kb
BAC ~200 kb
12
Libraries
Repositories of DNA fragments cloned in their vectors or attached to platform-specific oligonucleotide adapters
Classified in terms of cloning vector: e.g. plasmid, BAC
In terms of cloned DNA fragment source: e.g. genomic, cDNA
In terms of intended use: e.g. next generation sequencing (NGS)
13
Genomic DNA libraries
Total genomic DNA is digested and the fragments are cloned into an appropriate vector or system
A representative sample of all the genomic DNA present in the organism, including both coding and non-coding sequences
Enrichment strategies target specific types of sequences: unique sequences, methylated sequences
14
cDNA libraries
Generated from mRNA transcripts, using reverse transcriptase
The cDNA library represents only the genes that are expressed in the tissue and/or developmental stage that was sampled
15
DNA Amplification: Polymerase Chain Reaction (PCR)
K.B. Mullis, 1983
In vitro amplification of ANY DNA sequence
16
Synthetic DNA: oligonucleotides
Primers, adapters, and more
~$0.010 per bp
< ~100 bases
17
DNA Amplification: Polymerase Chain Reaction (PCR)
Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.
18
DNA Amplification - PCR
A polymerase extends the 3' end of the primer sequence using the DNA strand as a template.
19
PCR Principles
The PCR reaction consists of:
Buffer
DNA polymerase (thermostable)
Deoxyribonucleotide triphosphates (dNTPs)
Two primers (oligonucleotides)
Template DNA
20
PCR Principles
Each cycle ideally doubles the number of DNA fragments, so identical copies of the original DNA between the two primer binding sites accumulate exponentially.
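The exponential growth of PCR product can be sketched with a simple formula: copies = starting copies × (1 + efficiency)^cycles. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling); plateau effects from reagent depletion are not modeled.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    efficiency is the fraction of templates copied per cycle
    (1.0 = perfect doubling); it is an illustrative assumption.
    """
    return initial_copies * (1 + efficiency) ** cycles


for n in (10, 20, 30):
    print(f"{n} cycles: {pcr_copies(1, n):,.0f} copies from one template")
```

Even a modest drop in per-cycle efficiency compounds over 30 cycles, which is one reason yields in practice fall short of the ideal.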
21
PCR Principles
The choice of what DNA will be amplified by the polymerase is determined by the primers
The DNA between the primers is amplified by the polymerase: in subsequent cycles the original template, plus the newly amplified fragments, serve as templates
Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single-stranded oligonucleotides, hybridization of the primers to the template, and primer extension
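How the two primers define the amplified region can be illustrated with a toy "in silico PCR". This sketch assumes exact primer matches and a single binding site for each primer; the function names and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the amplicon (top strand), or None if a primer does not bind."""
    # The forward primer matches the top strand directly.
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so on the top strand
    # its binding site appears as the primer's reverse complement.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "AAAAGATTACACCCCCCTGGATCATTTT"
amplicon = in_silico_pcr(template, "GATTACA", "TGATCCA")
print(amplicon)
```

Only the sequence between and including the two primer sites is returned, mirroring the point that the primers, not the polymerase, choose the target.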
22
PCR Applications
Amplify a target sequence from a pool of DNA (your favorite gene, forensics, fossil DNA)
Start the process of genome sequencing
Generate abundant molecular markers for linkage map construction
23
DNA Hybridization
Single-stranded nucleic acids find and pair with other single-stranded nucleic acids with a complementary sequence
An application of this affinity is to label one single strand and then to use this probe to find complementary sequences in a population of single-stranded nucleic acids
For example, if you have a cloned gene, either a cDNA or a genomic clone, you could use this as a probe to look for a homologous sequence in another DNA sample
24
DNA Hybridization
The principle of hybridization can be applied to pairing events involving:
DNA:DNA (Southern blot)
DNA:RNA (Northern blot)
Protein:antibody (Western blot)
25
DNA sequencing
Advances in technology have removed the technical obstacles to determining the nucleotide sequence of a gene, a chromosome region, or a whole genome.
26
Sanger DNA Sequencing (classic but still relevant)
Start with a defined fragment of DNA
Based on this template, generate a population of molecules differing in size by one base of known composition
Fractionate the population of molecules based on size
The base at the truncated end of each of the fractionated molecules is determined and used to establish the nucleotide sequence
27
Sanger Sequencing - ddNTPs
A dideoxynucleotide lacks a 3' OH and, once incorporated, terminates strand synthesis.
No free 3' OH
28
Sanger Sequencing
Buffer
DNA polymerase
dNTPs
Labeled primer
Target DNA
ddGTP, ddATP, ddCTP, ddTTP
Deoxynucleotide (dNTP) vs. dideoxynucleotide (ddNTP)
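The logic of chain-termination sequencing can be sketched in a few lines. Each ddNTP reaction yields fragments that end wherever that base occurs in the newly synthesized strand; sorting all fragments by length and reading their terminal bases recovers the sequence. This is a conceptual toy with illustrative function names, not a model of real chemistry or base calling.

```python
def termination_lengths(new_strand: str, dd_base: str) -> list[int]:
    """Fragment lengths produced when synthesis stops at each dd_base."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def read_sequence(ladders: dict[str, list[int]]) -> str:
    """Order all fragments by size and read off their terminal bases."""
    calls = sorted(
        (length, base) for base, lengths in ladders.items() for length in lengths
    )
    return "".join(base for _, base in calls)


new_strand = "GATTACA"  # strand being synthesized from the primer
ladders = {base: termination_lengths(new_strand, base) for base in "ACGT"}
print(ladders)
print(read_sequence(ladders))
```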
29
Next Generation Sequencing - Illumina
30
Sequencing - PAC Bio
31
Sequencing considerations
Read length
Accuracy
Speed
Cost
Assembly
32
Sequencing – up and coming (?)
33
Genome sizes and whole genome sequencing
Plant                  Genome size   # Genes
Arabidopsis thaliana   135 Mb        27,000
Fragaria vesca         240 Mb        35,000
Theobroma cacao        415 Mb        29,000
Zea mays               2,300 Mb      40,000
Pinus taeda            23,200 Mb     50,000
Paris japonica         148,852 Mb    ??
Credit: Karl Kristensen, Denmark
34
Sequencing a plant genome
35
Sequencing a plant genome
Fragaria vesca
Herbaceous, perennial
2n=2x=14
240 Mb
Reference species for Rosaceae
Genetic resources
Fragaria x ananassa: 2n=8x=56. Domesticated 250 years ago
Credit: commons.Wikimedia.org
36
Sequencing a plant genome
Short reads
No physical reference
De novo assembly
Open source
37
Sequencing a plant genome
Roche 454, Illumina
39x coverage (average number of reads including a given nucleotide)
Contigs (overlapping reads) assembled into scaffolds (contigs + gaps)
~3,200 scaffolds
N50 of 1.3 Mb (length-weighted median: half of the assembled sequence lies in scaffolds at least this long)
Over 95% (209.8 Mb) of total sequence is represented in 272 scaffolds
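Two of the assembly statistics on this slide are easy to compute. The sketch below uses the common definition of N50 (the length L such that scaffolds of length L or longer contain at least half of the assembled sequence) and the usual estimate of mean coverage (total sequenced bases divided by genome size). The function names and toy inputs are illustrative, not taken from the Fragaria vesca project.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that scaffolds >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base of the genome."""
    return n_reads * read_length / genome_size


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(mean_coverage(1_000_000, 100, 10_000_000))
```

Because N50 weights scaffolds by their length, a handful of long scaffolds can dominate it even when thousands of short scaffolds are present.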
38
Sequencing a plant genome
Anchoring the genome sequence to the genetic map
94% of scaffolds anchored to the diploid Fragaria reference linkage map using 390 genetic markers
Pseudochromosomes ~ linkage groups
39
Sequencing a plant genome
Synteny
Homologs
Orthologs
Paralogs
Credit: Biology stackexchange.com
40
Sequencing a plant genome
The small genome size (240 Mb)
Absence of large genome duplications
Limited numbers of transposable elements, compared to other angiosperms
41
Sequencing a plant genome: the transcriptome
Fruits and roots – different types of genes
42
Sequencing a plant genome
Gene prediction
34,809 nuclear genes
Flavor, nutritional value, and flowering time
1,616 transcription factors
RNA genes: 569 tRNA, 177 rRNA, 111 spliceosomal RNAs, 168 small nuclear RNAs, 76 micro RNA, and 24 other RNAs
Chloroplast genome: 155,691 bp; encodes 78 proteins, 30 tRNAs, and 4 rRNA genes
Evidence of DNA transfer from plastid genome to the nuclear genome
43
DNA (molecular) markers
Linkage mapping, quantitative trait locus (QTL) mapping, anchoring genome sequences
44
Why use markers rather than whole genome sequences?
A way of addressing plant genetics and breeding challenges:
The large number of genes per genome
Huge genome sizes
Often a subset of the total genome is of interest
45
Applications of Markers
Establish evolutionary relations: homoeology and synteny
46
Applications of Markers
Are trait associations due to linkage or pleiotropy?
Identify markers that can be used in marker-assisted selection
Locate genes for qualitative and quantitative traits
A starting point for map-based cloning strategies
47
Markers are based on polymorphisms
Amplified fragment length polymorphism
Restriction fragment length polymorphism
Single nucleotide polymorphism
The polymorphisms become the alleles at marker loci
The marker locus is not necessarily a gene: the polymorphism may be in the dark matter, in a UTR, in an intron, or in an exon
Non-coding regions may be more polymorphic
Molecular markers are abundant
48
Marker polymorphisms are based on mutations
            DNA alignment                          Translation
Reference   *** CTG GGA GAT TAT GGC TTT AAG ***    Leu Gly Asp Tyr Gly Phe Lys
Silent      *** CTG GGA GAT TAT GGC TTC AAG ***    Leu Gly Asp Tyr Gly Phe Lys
Missense    *** CTG GGA GAT TAT GGC TAT AAG ***    Leu Gly Asp Tyr Gly Tyr Lys
Nonsense    *** CTG GGA GAT TAG GGC TTT AAG ***    Leu Gly Asp STOP
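The three mutation classes on this slide can be reproduced by translating each sequence and comparing the result with the reference. The sketch below builds the standard genetic code from its conventional TCAG ordering; the `classify` function is an illustrative simplification that only handles single-codon substitutions like those shown here.

```python
# Standard genetic code, codons ordered TCAG at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in frame 1 (one-letter codes), stopping at the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def classify(reference: str, mutant: str) -> str:
    """Label a substitution as silent, missense, or nonsense (simplified)."""
    ref_protein, mut_protein = translate(reference), translate(mutant)
    if ref_protein == mut_protein:
        return "silent"
    if "*" in mut_protein and "*" not in ref_protein:
        return "nonsense"
    return "missense"


REFERENCE = "CTGGGAGATTATGGCTTTAAG"
for mutant in ("CTGGGAGATTATGGCTTCAAG",   # TTT -> TTC
               "CTGGGAGATTATGGCTATAAG",   # TTT -> TAT
               "CTGGGAGATTAGGGCTTTAAG"):  # TAT -> TAG
    print(translate(mutant), classify(REFERENCE, mutant))
```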
49
Markers
Polymorphisms can be visualized at the metabolome, proteome, or transcriptome level, but for a number of reasons (both technical and biological) DNA-level polymorphisms are currently the most targeted
Regardless of whether it is a “perfect” or a “linked” DNA marker, there are two key considerations that need to be addressed in order for the researcher/user to visualize the underlying genetic polymorphism
50
DNA Markers
Finding and understanding the genetic basis of the DNA-level polymorphism, which may be as small as a single nucleotide polymorphism (SNP) or as large as an insertion/deletion (INDEL) of thousands of nucleotides
Detecting the polymorphism via a specific assay or "platform". The same DNA polymorphism may be amenable to different detection assays
51
Marker examples: Simple Sequence Repeats (SSRs)
Simple sequence repeats (SSRs) (aka microsatellites) are tandemly repeated mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs
SSR length polymorphisms are caused by differences in the number of repeats
Assayed by PCR amplification using pairs of oligonucleotide primers specific to unique sequences flanking the SSR
Multiple platforms
52
Marker examples: Simple Sequence Repeats (SSRs)
Simple sequence repeats in hazelnut: note the differences in repeat length AND the consistent flanking sequences
Credit: mind42.com
53
Marker examples: Simple Sequence Repeats (SSRs)
Highly polymorphic
Highly abundant and randomly dispersed
Co-dominant
Locus-specific
Amenable to high-throughput assays
54
SSR Concept
Individual 1: (AC)x9, 51 bp
Individual 2: (AC)x11, 55 bp
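The fragment sizes in this example follow from the repeat count: two extra AC repeats add 4 bp. The sketch below makes that arithmetic explicit. The 33 bp of combined flanking sequence is inferred from the slide's numbers (51 − 2×9 = 55 − 2×11 = 33), and the function names and the toy sequence are illustrative.

```python
import re


def ssr_amplicon_length(flank_bp: int, motif: str, repeats: int) -> int:
    """PCR product size: flanking sequence plus the repeat tract."""
    return flank_bp + len(motif) * repeats


def count_repeats(sequence: str, motif: str) -> int:
    """Number of motif copies in the longest uninterrupted tandem run."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


FLANK_BP = 33  # inferred from the slide: 51 - 2*9 == 55 - 2*11 == 33
print(ssr_amplicon_length(FLANK_BP, "AC", 9))
print(ssr_amplicon_length(FLANK_BP, "AC", 11))
print(count_repeats("GGT" + "AC" * 9 + "TTG", "AC"))
```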
55
Marker examples: Single Nucleotide Polymorphisms (SNPs)
DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered
Alleles:
.....ATGCTCTTACTGCTAGCGC......
.....ATGCTCTTCCTGCTAGCGC......
.....ATGCTCTTACTGCAAGCGC......
Consensus:
.....ATGCTCTTNCTGCNAGCGC......
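The consensus line on this slide can be derived mechanically by comparing the aligned alleles column by column. A minimal sketch, assuming the sequences are already aligned and of equal length (the function names are illustrative):

```python
def snp_positions(alleles: list[str]) -> list[int]:
    """Zero-based alignment columns where the alleles differ."""
    return [i for i, column in enumerate(zip(*alleles)) if len(set(column)) > 1]


def consensus(alleles: list[str]) -> str:
    """Consensus sequence with N at every polymorphic column."""
    return "".join(
        column[0] if len(set(column)) == 1 else "N" for column in zip(*alleles)
    )


alleles = [
    "ATGCTCTTACTGCTAGCGC",
    "ATGCTCTTCCTGCTAGCGC",
    "ATGCTCTTACTGCAAGCGC",
]
print(snp_positions(alleles))
print(consensus(alleles))
```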
56
Marker examples: SNPs
Highly abundant (~1 every 200 bp)
Locus-specific
Co-dominant and bi-allelic
Basis for high-throughput and massively parallel genotyping technologies
Connectivity to reference genome sequences
57
SNP Detection Strategies
Locus-specific systems: many samples with few markers; markers for key target characters; example: KASP
Genome-wide systems: fewer samples with many markers; germplasm characterization; genotyping panels for genome-wide association studies; example: Illumina
58
SNPs on KASP and Illumina 9K
59
Abundant markers are available for every plant !!!!!