Structural, functional Genome, Transcriptome, Proteome, Metabolome, Interactome Genomics
Genomics or Genetics?
“What's the Difference? Well, as a rule, genetics is the study of single genes in isolation. Genomics is the study of all the genes in the genome and the interactions among them and their environment(s).
Analogy 1: If genomics is like a garden, genetics is like a single plant. If the plant isn’t flowering, you could study the plant itself (genetics) or look at the surroundings to see if it is too crowded or shady (genomics) – both approaches are probably needed to find out how to make your plant blossom.”
Genomics and Molecular Markers
Structural genomics for plant breeders and applied geneticists = molecular markers
How many genes determine important traits?
Where are these genes located?
How do the genes interact?
What is the role of the environment in the phenotype?
Molecular breeding: gene discovery, characterization, and selection using molecular tools
Molecular markers are a key implement in the molecular breeding toolkit
What is a Molecular Marker?
Markers are based on polymorphisms
–Amplified fragment length polymorphism
–Restriction fragment length polymorphism
–Single nucleotide polymorphism
The polymorphisms become the alleles at marker loci
The marker locus is not necessarily a gene: the polymorphism may be in the dark matter, in a UTR, in an intron, or in an exon
Non-coding regions may be more polymorphic
DNA Mutations & Polymorphisms
Mutations are changes in the nucleotide sequence of genomic DNA that can be transmitted to the descendants.
If the change occurs in the sequence of a gene, the altered form is called a mutant allele. The most frequent allele is called the wild type.
A DNA sequence is polymorphic if it varies among the individuals of the population.
Types of DNA Mutations (1)
5’ – AGCTGAACTCGACCTCGCGATCCGTAGTTAGACTAG -3’ Wildtype
5’ – AGCTGAACTCGGCCTCGCGATCCGTAGTTAGACTAG -3’ Substitution (transition: A → G)
5’ – AGCTCAACTCGACCTCGCGATCCGTAGTTAGACTAG -3’ Substitution (transversion: G → C)
5’ – AGCTAACTCGACCTCGCGATCCGTAGTTAGACTAG -3’ Deletion (single bp: G)
5’ – AGCTTCGCGATCCGTAGTTAGACTAG -3’ Deletion (DNA segment: GAACTCGACC)
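The transition/transversion distinction above can be checked mechanically: a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and anything else is a transversion. The following is a minimal Python sketch, not part of the original slides; the function names `classify_substitution` and `substitutions` are hypothetical, and it assumes the two sequences have equal length (substitutions only, no insertions or deletions).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref == alt or ref not in valid or alt not in valid:
        raise ValueError("expected two different bases from A, C, G, T")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def substitutions(wildtype: str, mutant: str):
    """List (1-based position, ref, alt, class) for equal-length sequences."""
    if len(wildtype) != len(mutant):
        raise ValueError("this sketch handles substitutions only")
    return [
        (i + 1, w, m, classify_substitution(w, m))
        for i, (w, m) in enumerate(zip(wildtype, mutant))
        if w != m
    ]


# Sequences copied from the slide.
wildtype = "AGCTGAACTCGACCTCGCGATCCGTAGTTAGACTAG"
transition_mutant = "AGCTGAACTCGGCCTCGCGATCCGTAGTTAGACTAG"
transversion_mutant = "AGCTCAACTCGACCTCGCGATCCGTAGTTAGACTAG"

print(substitutions(wildtype, transition_mutant))    # position 12, A to G
print(substitutions(wildtype, transversion_mutant))  # position 5, G to C
```

Insertions and deletions change the sequence length, so they would need an alignment step first; that is outside this sketch.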
Types of DNA Mutations (2)
5’ – AGCTGAACTCGACCTCGCGATCCGTAGTTAGACTAG - 3’ Wildtype
5’ – AGCTGAACTACGACCTCGCGATCCGTAGTTAGACTAG - 3’ Insertion (single bp)
5’ – AGCTGAACTAGTCTGCCCGACCTCGCGATCCGTAGTTAGACTAG - 3’ Insertion (DNA segment)
5’ – AGCAGTTGACGACCTCGCGATCCGTAGTTAGACTAG - 3’ Inversion
5’ – AGCTCGACCTCGCGATCCGTAGTTATGAACGACTAG - 3’ Transposition
Why Use Markers?
A way of dealing with:
–Large number of genes per genome
–Huge genome size
–Technical challenges and cost of whole genome sequencing
The search for DNA polymorphisms was not driven by a desire to complicate things, but rather by the low number of naked eye polymorphisms (NEPs)
Markers may be linked to target genes
Markers in target genes are perfect markers
What is a perfect marker for a gene deletion?
DNA Markers
Polymorphisms can be visualized at the metabolome, proteome, or transcriptome level, but for a number of reasons (both technical and biological) DNA-level polymorphisms are currently the most targeted
Regardless of whether it is a “perfect” or a “linked” DNA marker, two key considerations must be addressed for the researcher/user to visualize the underlying genetic polymorphism
Key steps for DNA Markers
1. Finding and understanding the genetic basis of the DNA-level polymorphism, which may be as small as a single nucleotide polymorphism (SNP) or as large as an insertion/deletion (INDEL) of thousands of nucleotides
2. Detecting the polymorphism via a specific assay or "platform". The same DNA polymorphism may be amenable to different detection assays
Applications of Marker Maps
1. Establish evolutionary relations: homoeology, synteny and orthology
–Homoeology: chromosomes, or chromosome segments, that are similar in terms of the order and function of the genetic loci. Homoeologous chromosomes may occur within a single allopolyploid individual (e.g. the A, B, and D genomes in wheat) and may also be found in related species (e.g. the 1A, 1B, 1D series of wheat and the 1H of barley)
–Orthology: refers to genes in different species which are so similar in sequence that they are assumed to have originated from a single ancestral gene
–Synteny: classically refers to linked genes on the same chromosome; also used to refer to conservation of gene order across species
2. Associations due to linkage or pleiotropy
–Identify markers that can be used in marker assisted selection
3. Locate genes for qualitative and quantitative traits
–Map-based cloning strategies
Polymorphism Detection Issues
Polymorphisms vs. assays
An ever-increasing number of technology platforms have been, and are being, developed to deal with these two key considerations
These platforms lead to a bewildering array of acronyms for different types of molecular markers. To add to the complexity, the same type of marker may be assayed on a variety of platforms
The ideal marker is one that targets the causal polymorphism (perfect marker), but such a marker is not always available
Restriction Fragment Length Polymorphism (RFLP)
RFLPs are differences in restriction fragment lengths caused by a SNP or INDEL that create or abolish restriction endonuclease recognition sites.
RFLP assays are based on hybridization of a labeled DNA probe to a Southern blot (Southern 1975) of DNA digested with a restriction endonuclease
Labeled probe: 3’ TGGCTAGCT 5’
Target 1: 5’-CCTAACCGATCGACTGAC-3’
Target 2: 5’-GGATTGGCTAGCTGACTG-3’
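To make the "create or abolish a recognition site" idea concrete, here is a minimal in-silico digestion sketch in Python. It is an illustration under stated assumptions, not the assay itself: the two allele sequences are invented for the example, the function name `fragment_lengths` is hypothetical, and the probe hybridization step is ignored (a real RFLP only reveals fragments that the labeled probe binds). EcoRI recognizes GAATTC and cuts after the first G.

```python
def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Digest a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases of the recognition site that stay
    on the left-hand fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles: identical except for one SNP inside the EcoRI site.
allele_A = "ACGT" * 5 + "GAATTC" + "TTGCA" * 4   # site intact
allele_a = "ACGT" * 5 + "GAGTTC" + "TTGCA" * 4   # A to G abolishes the site

print(fragment_lengths(allele_A, "GAATTC", 1))  # two fragments
print(fragment_lengths(allele_a, "GAATTC", 1))  # one uncut fragment
```

The single-base difference turns two restriction fragments into one longer fragment, which is the length polymorphism that would show up on the blot.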
RFLP Steps
Co-Dominant RFLP Polymorphism (figure: restriction site positions for Allele A and Allele a, with banding patterns scored for Ind 1 to Ind 8)
Dominant RFLP Polymorphisms (figure: restriction site positions for Allele A and Allele a, with banding patterns scored for Ind 1 to Ind 8)
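The practical difference between the two figures is what can be read off the gel. A co-dominant marker shows a distinct pattern for each genotype, while a dominant marker only shows band presence or absence, so heterozygotes cannot be told apart from one homozygote. A minimal Python sketch, assuming a diploid locus and assuming (for illustration only) that allele A is the one producing the visible band; the function name `observed_class` is hypothetical:

```python
def observed_class(genotype: str, codominant: bool) -> str:
    """Return what is scored on the gel for a diploid genotype."""
    # Normalize so that 'aA' and 'Aa' are treated the same
    # (uppercase sorts before lowercase).
    g = "".join(sorted(genotype))
    if g not in {"AA", "Aa", "aa"}:
        raise ValueError("expected AA, Aa, or aa")
    if codominant:
        return g  # all three genotypes are distinguishable
    # Dominant marker: only band presence/absence is visible.
    return "band present" if "A" in g else "band absent"


individuals = ["AA", "Aa", "aa", "aA"]
for g in individuals:
    print(g, observed_class(g, codominant=True), observed_class(g, codominant=False))
```

With the dominant scoring, AA and Aa collapse into the same class, which is why co-dominant markers carry more information per individual in segregating populations.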
Features of RFLPs
Co-dominant, unless probe contains restriction site
Locus-specific
Genes can be mapped directly
Supply of probes and markers is unlimited
Highly reproducible
Requires no special instrumentation
Classically: radioisotope-based detection
Amplified Fragment Length Polymorphism (AFLP)
Fragment genomic DNA with frequent and rare cutters
AFLPs are differences in restriction fragment lengths caused by SNPs or INDELs that create or abolish restriction endonuclease recognition sites.
AFLP assays are performed by selectively amplifying a pool of restriction fragments using PCR.
AFLP Protocol
1. Digestion with 2 restriction enzymes: EcoRI (1/4096) and MseI (1/256)
2. Restriction site adapter ligation
3. Selective preamplification
4. Amplification
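The 1/4096 and 1/256 figures follow from the length of each recognition site: EcoRI recognizes 6 bp and MseI recognizes 4 bp, and a specific n-base sequence is expected once every 4^n positions. The Python sketch below reproduces that arithmetic. It assumes equal base frequencies and independence between positions, which real genomes only approximate, and the 1,000,000 bp genome size is a made-up number for illustration.

```python
def expected_site_frequency(site_length: int) -> float:
    """Probability that a given position starts a specific n-base site,
    assuming all four bases are equally likely and independent."""
    return 0.25 ** site_length


def expected_site_count(genome_bp: int, site_length: int) -> float:
    """Expected number of recognition sites in a genome of `genome_bp` bases."""
    return genome_bp * expected_site_frequency(site_length)


for enzyme, site_length in [("EcoRI", 6), ("MseI", 4)]:
    per_bp = 1 / expected_site_frequency(site_length)
    print(f"{enzyme}: about one site every {per_bp:.0f} bp")

# Hypothetical 1 Mb genome: the frequent cutter yields 16 times more sites.
print(expected_site_count(1_000_000, 6))
print(expected_site_count(1_000_000, 4))
```

This is why the protocol pairs a rare cutter with a frequent cutter: the frequent cutter produces fragments short enough to amplify, and the rare cutter limits how many of them carry both adapter types.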
AFLP Polymorphisms
Polymorphisms between genotypes may arise from:
–Sequence variation in one or both restriction sites
–Sequence variation in the region immediately adjacent to the restriction sites
–Insertions or deletions within an amplified fragment
Band Detection
–Denaturing polyacrylamide gel electrophoresis & autoradiography or silver staining
–Sequencing
Features of AFLPs
Very high multiplex ratio
Very high throughput
Off-the-shelf technology
Fairly reproducible
Dominant and co-dominant
Detection options
Can convert favorite marker to sequence characterized amplified region
Simple Sequence Repeats (SSR)
Simple sequence repeats (SSRs) or microsatellites are tandemly repeated mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs
SSR length polymorphisms are caused by differences in the number of repeats
Assayed by PCR amplification using pairs of oligonucleotide primers specific to unique sequences flanking the SSR
Detection by autoradiography, silver staining, sequencing…
SSR Repeats
Repeat Motifs
AC repeats tend to be more abundant than other di-nucleotide repeat motifs in animals (Beckmann and Weber 1992)
The most abundant di-nucleotide repeat motifs in plants, in descending order, are AT, AG, and AC
Typically, SSRs are developed for di-, tri-, and tetra-nucleotide repeat motifs
CA and GA have been widely used in plants
Tetra-nucleotide repeats have the potential to be very highly polymorphic; however, many are difficult to amplify
Simple sequence repeat in hazelnut Note the difference in repeat length AND the consistent flanking sequence
SSR Protocol
Chloroplast SSRs of pine (Powell et al., Proc Natl Acad Sci U S A. 92(17): 7759–7763)
Individual 1: (AC)x9, 51 bp
Individual 2: (AC)x11, 55 bp
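The two amplicon sizes above are consistent with a simple length model: amplicon length = flanking sequence (including primers) + motif length × number of repeats. A minimal Python sketch of that arithmetic follows. The 33 bp flank is inferred from the slide's numbers rather than stated in the source, and the function names are hypothetical.

```python
def amplicon_length(flank_bp: int, motif: str, repeats: int) -> int:
    """Expected PCR product size for an SSR with the given repeat count."""
    return flank_bp + len(motif) * repeats


def infer_flank(amplicon_bp: int, motif: str, repeats: int) -> int:
    """Back out the non-repeat portion of the amplicon."""
    return amplicon_bp - len(motif) * repeats


# Values from the slide: (AC)x9 gives 51 bp, (AC)x11 gives 55 bp.
flank_1 = infer_flank(51, "AC", 9)
flank_2 = infer_flank(55, "AC", 11)
print(flank_1, flank_2)  # both individuals share the same flanking length

# Two extra AC repeats add 4 bp to the product.
print(amplicon_length(flank_1, "AC", 11) - amplicon_length(flank_1, "AC", 9))
```

Because the flanking sequence is conserved, any size difference between alleles can be attributed to repeat number, which is what makes fragment sizing a usable genotyping assay for SSRs.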
Features of SSRs
Highly polymorphic
Highly abundant and randomly dispersed
Co-dominant
Locus-specific
High throughput
Can be automated
Diversity Arrays Technology - DArT
DArT Analysis
2,500 markers per sample
94 samples: ~$4,500
~2 cents per datapoint
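The per-datapoint figure can be reproduced directly from the other two numbers, treating one datapoint as one marker scored on one sample. A short Python check using only the values on the slide:

```python
markers_per_sample = 2_500
samples = 94
total_cost_usd = 4_500  # approximate figure from the slide

# One datapoint = one marker scored on one sample.
datapoints = markers_per_sample * samples
cost_per_datapoint_cents = 100 * total_cost_usd / datapoints

print(f"{datapoints:,} datapoints")
print(f"{cost_per_datapoint_cents:.2f} cents per datapoint")
```

The result rounds to the "~2 cents" quoted; the figure scales inversely with the number of markers that are actually polymorphic and scorable in a given population.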
Features of DArT
Very high multiplex ratio
Very high throughput
Bi-allelic
Dominant marker system
Requires substantial investment
Fairly reproducible
DArT sequences now available
Single Nucleotide Polymorphisms (SNPs)
DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered
Alleles
…..ATGCTCTTACTGCTAGCGC……
…..ATGCTCTTCCTGCTAGCGC……
…..ATGCTCTTACTGCAAGCGC……
Consensus
…..ATGCTCTTNCTGCNAGCGC……
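The consensus line above marks each variable position with N. The Python sketch below shows one way to derive it from the three aligned alleles: walk the alignment column by column and flag any column with more than one base. It assumes the sequences are already aligned and of equal length; the function name `consensus_with_snps` is hypothetical.

```python
def consensus_with_snps(alleles: list[str]):
    """Return (consensus, snp_sites) for pre-aligned, equal-length alleles.

    Variable columns are written as 'N' in the consensus; snp_sites holds
    (1-based position, sorted list of observed bases) for each of them.
    """
    if len({len(a) for a in alleles}) != 1:
        raise ValueError("alleles must be non-empty and of equal length")
    consensus, snp_sites = [], []
    for position, column in enumerate(zip(*alleles), start=1):
        bases = set(column)
        if len(bases) == 1:
            consensus.append(column[0])
        else:
            consensus.append("N")
            snp_sites.append((position, sorted(bases)))
    return "".join(consensus), snp_sites


# Allele fragments copied from the slide (leading/trailing dots omitted).
alleles = [
    "ATGCTCTTACTGCTAGCGC",
    "ATGCTCTTCCTGCTAGCGC",
    "ATGCTCTTACTGCAAGCGC",
]
consensus, snp_sites = consensus_with_snps(alleles)
print(consensus)
print(snp_sites)
```

Both variable sites here have exactly two observed bases, matching the bi-allelic nature usually assumed for SNP markers.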
Features of SNPs
Highly abundant (1 every 200 bp in barley)
Locus-specific
Co-dominant and bi-allelic
Basis for high-throughput and massively parallel genotyping technologies
Genic rather than anonymous marker
Phenotype due to SNP can be mapped directly
SNP Detection Strategy
Locus specific system
–Many samples with few markers
–Marker assisted selection in commercial breeding programs for key target characters
–Addition of characteristic major genes to e.g. mapping populations and association panels
–KASP: buy master mix and synthesize own primers
Genome wide system
–Fewer samples with many markers
–Germplasm characterization, academic and breeding
–Genotyping panels for GWAS
–Illumina or Affymetrix for higher density arrays, costs↓
Affymetrix Axiom Technology
Two colour ligation based assay
Utilises unique oligonucleotide complementary to flanking genomic sequence
Automated parallel processing
Wheat SNP Arrays
KASP TM Genotyping More Information: /genotyping/#.VCMgyPldWJ0
Sequencing Approaches
RRL – Reduced Representation Library
RAD-Seq – Restriction Site Associated DNA Sequencing
GBS – Genotyping by Sequencing
RADseq: Restriction-site Associated DNA markers
Uses Illumina sequencing technology
Based on digestion with restriction enzymes
An adapter is ligated at the restriction site, and fragments of up to 5 kb around the target site are sequenced
Bioinformatics work is then used to find SNPs in the amplified regions
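The core idea, sampling only the sequence next to restriction sites, can be sketched in a few lines of Python. This is a toy illustration, not a RAD-seq pipeline: real analyses work from sequencing reads rather than a known sequence, and the choice of EcoRI (GAATTC), the 4 bp flank, the example sequence, and the function name `rad_tags` are all assumptions made for the example.

```python
def rad_tags(sequence: str, site: str, flank: int):
    """Return (site_start, left_flank, right_flank) for every occurrence
    of `site`; site_start is a 0-based index into `sequence`."""
    tags = []
    start = sequence.find(site)
    while start != -1:
        end = start + len(site)
        left = sequence[max(0, start - flank):start]
        right = sequence[end:end + flank]
        tags.append((start, left, right))
        start = sequence.find(site, start + 1)
    return tags


# Hypothetical sequence with one EcoRI site in the middle.
toy_sequence = "AAAACCCC" + "GAATTC" + "GGGGTTTT"
print(rad_tags(toy_sequence, "GAATTC", 4))
```

Comparing the flanking tags collected at the same site across individuals is where SNPs would be called; because only a fraction of the genome sits next to a site, many samples can share one sequencing run.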
Genotyping by Sequencing
GP x Morex map
SNPs vs GbS
SNPs
–Minimal input, don’t even have to isolate DNA
–Rapid turn around and data is ready to use
–Markers in known genes and generally mapped
–More useful in GWAS
GbS
–Now quite cheap and potentially many markers
–Rapid generation of sequence output but markers are anonymous
Find an expert bio-informatician to align your data and, if possible, align to reference sequence
–More useful in bi-parental mapping studies
SNPs in Allopolyploids
Marker to Candidate Gene