DNA sequencing and genome architecture


Presentation on theme: "DNA sequencing and genome architecture"— Presentation transcript:

1 DNA sequencing and genome architecture

2 Steps in Genetic Analysis
Knowing how many genes determine a phenotype (Mendelian and/or QTL analysis), and where the genes are located (linkage mapping), is a first step in understanding the genetic basis of a phenotype. A second step is determining the sequence of the gene (or genes).

3 Steps in Genetic Analysis
Subsequent steps involve: understanding gene regulation; understanding the context of the gene in the sequence of the whole genome; analysis of post-transcriptional events; understanding how the genes fit into metabolic pathways and how these pathways interact with the environment.

4 Genome sequence of a diploid plant (2n = 2x = 14)
5,300,000,000 base pairs: 165% of the human genome; enough characters for 11,000 large novels. 60,000,000 base pairs of expressed sequence: ~1% of total sequence, like humans; 125 large novels.
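The scale comparisons on this slide can be checked with a few lines of arithmetic. This is only a sketch: the human genome size of about 3.2 Gb is an assumption that is not stated on the slide, and the characters-per-novel figure is simply back-calculated from the slide's own numbers.

```python
GENOME_BP = 5_300_000_000     # from the slide
EXPRESSED_BP = 60_000_000     # from the slide
HUMAN_BP = 3_200_000_000      # assumption: approximate human haploid genome size


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


pct_of_human = percent(GENOME_BP, HUMAN_BP)        # genome relative to human
pct_expressed = percent(EXPRESSED_BP, GENOME_BP)   # expressed share of the genome
chars_per_novel = GENOME_BP / 11_000               # implied size of a "large novel"

print(f"{pct_of_human:.1f}% of the human genome")
print(f"{pct_expressed:.2f}% of the sequence is expressed")
print(f"{chars_per_novel:,.0f} characters per novel")
```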

5 Molecular tools for determining DNA (or RNA) sequence
Genomic DNA (or RNA) extraction (RNA → cDNA). Manipulate DNA with restriction enzymes to reduce complexity and/or facilitate further manipulation. Manage and/or maintain DNA in vectors and/or libraries. Select DNA targets via amplification and/or hybridization. Determine the nucleotide sequence of the targeted DNA.

6 Extracting DNA (or RNA)
Genomic DNA (or RNA): leaf segments or target tissues. Key considerations are concentration, purity, and fragment size.

7 mRNA to cDNA Reverse transcriptase

8 Restriction Enzymes
Restriction enzymes make cuts at defined recognition sites in DNA. They are a defense system for bacteria, where they attack and degrade the DNA of attacking bacteriophages. The restriction enzymes are named for the organism from which they were isolated. They have been harnessed for the task of systematically breaking up DNA into fragments of tractable size and for various polymorphism detection assays. Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence.

9 Restriction Enzymes
Recognition sites and fragment size: a four-base cutter cuts on average once every ~256 bp (4^4), more frequently than a six-base cutter, which in turn cuts more often than an eight-base cutter. Methylation-sensitive restriction enzymes: target the epigenome.
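The 4^n rule can be made concrete with a short calculation. A minimal sketch, assuming the four bases are equally frequent and independent along the sequence (real genomes deviate from this, so these are rough expectations only):

```python
def expected_fragment_bp(site_length):
    """Expected spacing between recognition sites for an n-base cutter.

    Assumes equal base frequencies and independence between positions,
    so a specific n-base site occurs with probability (1/4)**n.
    """
    return 4 ** site_length


for n in (4, 6, 8):
    print(f"{n}-base cutter: one site every ~{expected_fragment_bp(n):,} bp")
```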

10 Restriction Enzymes
Palindromic recognition sites: the same sequence is specified when each strand of the double helix is read in the opposite direction. English palindromes for comparison: "Sit on a potato pan, Otis"; "Cigar? Toss it in a can, it is so tragic"; "UFO tofu"; "Golf? No sir, prefer prison flog"; "Flee to me, remote elf"; "Gnu dung"; "Lager, Sir, is regal"; "Tuna nut". CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats.
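A DNA palindrome differs from an English one: the site equals its own reverse complement rather than its own reversal. A small sketch of that check follows; the example sites GAATTC and GGATCC (the EcoRI and BamHI recognition sequences) are not from the slides and are used only as familiar illustrations.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(seq):
    """True if both strands read the same in their own 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)


for site in ("GAATTC", "GGATCC", "GATTACA"):
    print(site, is_palindromic_site(site))
```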

11 Vectors
Propagate and maintain DNA fragments generated by the restriction digestion. Efficiency and simplicity of inserting and retrieving the inserted DNA fragments. Key feature of the cloning vector: size of the DNA insert. Plasmid ~1 kb; BAC ~200 kb.

12 Libraries
Repositories of DNA fragments cloned in their vectors or attached to platform-specific oligonucleotide adapters. Classified in terms of cloning vector (e.g. plasmid, BAC); in terms of cloned DNA fragment source (e.g. genomic, cDNA); and in terms of intended use (e.g. next generation sequencing, NGS).

13 Genomic DNA libraries
Total genomic DNA is digested and the fragments cloned into an appropriate vector or system. A representative sample of all the genomic DNA present in the organism, including both coding and non-coding sequences. Enrichment strategies target specific types of sequences: unique sequences; methylated sequences.

14 cDNA libraries
Generated from mRNA transcripts, using reverse transcriptase. The cDNA library represents only the genes that are expressed in the tissue and/or developmental stage that was sampled.

15 DNA Amplification: Polymerase Chain Reaction (PCR)
K. B. Mullis, 1983: in vitro amplification of ANY DNA sequence

16 Synthetic DNA: oligonucleotides
Primers, adapters, and more …~$0.010 per bp... < ~ 100 bases

17 DNA Amplification: Polymerase Chain Reaction (PCR)
Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.

18 DNA Amplification - PCR
A Polymerase extends the 3’ end of the primer sequence using the DNA strand as a template.

19 PCR Principles
The PCR reaction consists of: buffer; DNA polymerase (thermostable); deoxyribonucleotide triphosphates (dNTPs); two primers (oligonucleotides); template DNA.

20 PCR Principles
Each cycle ideally doubles the number of copies, so the number of DNA fragments grows exponentially with the number of cycles. The fragments are identical copies of the original DNA between the two primer binding sites.
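The exponential growth can be written as copies = N0 × (1 + E)^n, where E is the per-cycle efficiency. A minimal sketch: the `efficiency` parameter and the 30-cycle example are illustrative assumptions, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    efficiency = 1.0 means perfect doubling each cycle (an idealization);
    values below 1.0 model incomplete amplification.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


print(f"{pcr_copies(1, 30):,.0f} copies after 30 perfect cycles")
print(f"{pcr_copies(1, 30, efficiency=0.9):,.0f} copies at 90% efficiency")
```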

21 PCR Principles
The choice of what DNA will be amplified by the polymerase is determined by the primers. The DNA between the primers is amplified by the polymerase: in subsequent cycles the original template, plus the newly amplified fragments, serve as templates. Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single-stranded oligonucleotides, hybridization of the primers to the template, and primer extension.
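How the two primers delimit the product can be sketched as an "in silico PCR" on a string. This is a toy under stated assumptions: exact matches only, a single binding site per primer, and a made-up template and primer pair; the function name `in_silico_pcr` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Return the top-strand amplicon bounded by the two primers, or None.

    The forward primer matches the top strand directly; the reverse primer
    anneals to the top strand, so its reverse complement is searched for
    downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    site = template.find(reverse_site, start + len(forward_primer))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


# Made-up template: flank + forward site + insert + reverse site + flank
template = "GGTT" + "ATGCGTAC" + "GGGGCCCCTTTT" + "CAGTCGAT" + "TTGG"
amplicon = in_silico_pcr(template, "ATGCGTAC", "ATCGACTG")
print(amplicon, len(amplicon))
```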

22 PCR Applications
Amplify a target sequence from a pool of DNA (your favorite gene, forensics, fossil DNA). Start the process of genome sequencing. Generate abundant molecular markers for linkage map construction.

23 DNA Hybridization
Single-stranded nucleic acids find and pair with other single-stranded nucleic acids with a complementary sequence. An application of this affinity is to label one single strand and then to use this probe to find complementary sequences in a population of single-stranded nucleic acids. For example, if you have a cloned gene (either a cDNA or a genomic clone) you could use this as a probe to look for a homologous sequence in another DNA sample. The microarray concept:

24 DNA Hybridization
The principle of hybridization can be applied to pairing events involving DNA:DNA (Southern blot); DNA:RNA (Northern blot); and protein:antibody (Western blot).

25 Sanger DNA Sequencing - classic but not obsolete
The gold standard for accurate sequencing of short (~ 650 bp) DNA sequences

26 Next Generation Sequencing - Illumina
A tremendous amount of short read data in a short time, at a good price

27 Sequencing - PacBio
Long reads!

28 Sequencing - Nanopore
Potentially cheap, fast, portable

29 Sequencing considerations
Read length, accuracy, speed, cost, assembly

30 Genome sizes and whole genome sequencing
Plant | Genome size | # Genes
Arabidopsis thaliana | 135 Mb | 27,000
Fragaria vesca | 240 Mb | 35,000
Theobroma cacao | 415 Mb | 29,000
Zea mays | 2,300 Mb | 40,000
Pinus taeda | 23,200 Mb | 50,000
Paris japonica | 148,852 Mb | ??
Credit: Karl Kristensen, Denmark

31 Sequencing a plant genome

32 Sequencing a plant genome
Fragaria vesca: herbaceous, perennial; 2n = 2x = 14; 240 Mb; reference species for Rosaceae; genetic resources. Credit: commons.Wikimedia.org. Fragaria x ananassa: 2n = 8x = 56; domesticated 250 years ago.

33 Sequencing a plant genome
Short reads; no physical reference; de novo assembly; open source

34 Sequencing a plant genome
Roche 454, Illumina. 39x coverage (coverage = number of reads including a given nucleotide). Contigs (overlapping reads) assembled into scaffolds (contigs + gaps). ~3,200 scaffolds; N50 of 1.3 Mb (a length-weighted median: scaffolds of this length or longer contain half of the assembled sequence). Over 95% (209.8 Mb) of total sequence is represented in 272 scaffolds.
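Coverage and N50 are both simple to compute once read and scaffold lengths are known. The sketch below uses invented toy lengths, not the F. vesca assembly data, and the function names are illustrative.

```python
def mean_coverage(total_read_bases, genome_size):
    """Average number of reads covering each base (sequencing depth)."""
    return total_read_bases / genome_size


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half
    of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 of an empty assembly is undefined")


toy_scaffolds = [80, 70, 50, 40, 30, 20]   # made-up lengths
print("N50:", n50(toy_scaffolds))
print("Coverage:", mean_coverage(9_360_000_000, 240_000_000))
```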

35 Anchoring the genome sequence to the genetic map
Sequencing a plant genome: 94% of scaffolds anchored to the diploid Fragaria reference linkage map using 390 genetic markers. Pseudochromosomes ~ linkage groups.

36 Sequencing a plant genome
Synteny; homologs; orthologs; paralogs. Credit: Biology stackexchange.com

37 Sequencing a plant genome
The small genome size (240 Mb). Absence of large genome duplications. Limited numbers of transposable elements, compared to other angiosperms.

38 Sequencing a plant genome: the transcriptome
Fruits and roots – different types of genes

39 Sequencing a plant genome
Gene prediction: 34,809 nuclear genes (flavor, nutritional value, and flowering time); 1,616 transcription factors. RNA genes: 569 tRNA, 177 rRNA, 111 spliceosomal RNAs, 168 small nuclear RNAs, 76 microRNAs, and 24 other RNAs. Chloroplast genome: 155,691 bp; encodes 78 proteins, 30 tRNAs, and 4 rRNA genes. Evidence of DNA transfer from the plastid genome to the nuclear genome.

40 Genome architecture and evolution
Key considerations: genes; chromosomes; C-value paradox; gene regulation; epigenetics; transposable elements

41 Genome architecture and evolution
Plant | # genes (est.) | 2n = _x = _ | Genome size
Arabidopsis thaliana | 27,000 | 2n = 2x = 10 | 135 Mb
Fragaria vesca | 35,000 | 2n = 2x = 14 | 240 Mb
Theobroma cacao | 29,000 | 2n = 2x = 20 | 415 Mb
Zea mays | 40,000 | | 2,300 Mb
Pinus taeda | 50,000 | 2n = 2x = 24 | 23,200 Mb
Paris japonica | ?? | 2n = 8x = 40 | 148,852 Mb
S. Wessler on transposable elements:
Review of retroviruses:
Review of cut-and-paste transposition:

42 Genes
[Figure: DNA → mRNA → protein, with the steps labeled transcription and translation; tRNA and rRNA shown alongside; "Other RNAs?"]
Plant | Estimated # genes
Arabidopsis thaliana | 27,000
Fragaria vesca | 35,000
Theobroma cacao | 29,000
Zea mays | 40,000

43 Genes (classically...)
DNA specifying a protein; 200 to 2,000,000 nt (bp).
[Figure: gene structure diagram. Labels: promoter, basal promoter, +1, 5’UTR, start codon, exons, introns, coding region, stop codon, 3’UTR, termination signal, ORF, CDS, mRNA]

44 F. vesca: 35,000 genes/7 chromosomes = 5,000 genes/chromosome?
Plant | 2n = _x = _
Arabidopsis thaliana | 2n = 2x = 10
Fragaria vesca | 2n = 2x = 14
Theobroma cacao | 2n = 2x = 20
Zea mays |

45 Genes, chromosomes, and genomes
F. vesca: 2n = 2x = 14; genome = 240 Mb; average gene ~3 kb. If the genome were all genes: 240 Mb / 3 kb ≈ 80,000 genes? (~11,400 genes/chromosome?) Predicted: 35,000 genes, i.e. ~5,000 genes/chromosome. What’s the rest of the genome?
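The back-of-the-envelope reasoning on this slide can be reproduced directly. A rough sketch using only the slide's round numbers; treating every gene as exactly 3 kb is the slide's simplification, not a measured value.

```python
GENOME_BP = 240_000_000     # F. vesca genome size from the slide
AVG_GENE_BP = 3_000         # slide's rough average gene length
PREDICTED_GENES = 35_000    # predicted gene count from the slide
CHROMOSOMES = 7             # haploid chromosome number (2n = 2x = 14)

# How many 3 kb genes would fit if the genome were nothing but genes
max_genes = GENOME_BP // AVG_GENE_BP
# Share of the genome actually occupied by the predicted genes
genic_fraction = PREDICTED_GENES * AVG_GENE_BP / GENOME_BP
genes_per_chromosome = PREDICTED_GENES / CHROMOSOMES

print(f"Genes that would fit: {max_genes:,}")
print(f"Genic fraction: {genic_fraction:.1%}")
print(f"Predicted genes per chromosome: {genes_per_chromosome:,.0f}")
```

Under these assumptions less than half of the genome is accounted for by genes, which is the question the slide ends on.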

46 C-value paradox
“Organisms of similar evolutionary complexity differ vastly in DNA content” Fedoroff, N. Science 338:
1 pg = 978 Mb
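C-values are often reported as DNA mass in picograms, so the conversion factor on this slide is handy as code. A minimal sketch; the function names are illustrative.

```python
MB_PER_PG = 978  # conversion factor from the slide: 1 pg = 978 Mb


def pg_to_mb(picograms):
    """Convert a DNA mass in picograms to megabase pairs."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases):
    """Convert a genome size in megabase pairs to picograms."""
    return megabases / MB_PER_PG


print(f"1 pg = {pg_to_mb(1)} Mb")
print(f"240 Mb = {mb_to_pg(240):.3f} pg")
```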

47 The C-value paradox
Fig. 1. The C-value paradox. The range of haploid genome sizes is shown in kilobases for the groups of organisms listed on the left. [Adapted from an image by Steven M. Carr, Memorial University of Newfoundland] Fedoroff Science 2012;338:

48 C-value paradox
Plant | Genome size | # Genes
Arabidopsis thaliana | 135 Mb | 27,000
Fragaria vesca | 240 Mb | 35,000
Theobroma cacao | 415 Mb | 29,000
Zea mays | 2,300 Mb | 40,000
Pinus taeda | 23,200 Mb | 50,000
Paris japonica | 148,852 Mb | ??
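One way to see the paradox in these numbers is to compute gene density. The sketch below uses only the values from the table (Paris japonica is left out because its gene count is unknown); the variable names are illustrative.

```python
# (genome size in Mb, estimated gene count), values from the slide's table
PLANTS = {
    "Arabidopsis thaliana": (135, 27_000),
    "Fragaria vesca": (240, 35_000),
    "Theobroma cacao": (415, 29_000),
    "Zea mays": (2_300, 40_000),
    "Pinus taeda": (23_200, 50_000),
}

densities = {name: genes / mb for name, (mb, genes) in PLANTS.items()}

for name, density in densities.items():
    print(f"{name:22s} {density:7.1f} genes per Mb")

# Genome size spans a far wider range than gene number
size_ratio = PLANTS["Pinus taeda"][0] / PLANTS["Arabidopsis thaliana"][0]
gene_ratio = PLANTS["Pinus taeda"][1] / PLANTS["Arabidopsis thaliana"][1]
print(f"Pine vs. Arabidopsis: {size_ratio:.0f}x the DNA, {gene_ratio:.2f}x the genes")
```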

49 C-value paradox
Junk? “Shining a Light on the Genome’s ‘Dark Matter’”

50 40% of all human disease-related SNPs are OUTSIDE of genes
Gene regulation. Pennisi, E. Science 330:1614. The dark matter is conserved and therefore must have a function. DNA sequences in the dark matter are involved in gene regulation. ~80% of the genome is transcribed, but “genes” account for ~2%. RNAs of all shapes and sizes: RNAi, lncRNA.

51 Epigenetics
Observe changes in phenotype without changes in genotype, due to alternative regulation (0 to 100%) of the gene. Methylation: generally reduced expression. Acetylation: generally increased expression.

52 Epigenetics Facultative heterochromatin:
Holoch and Moazed Nature Genetics

53 Transposable elements
DNA sequences that can move to new sites in the genome. More than half the DNA in many eukaryotes. Two major classes: transposons, which move via a DNA cut-and-paste mechanism; and retrotransposons, which move via an RNA intermediate. Potentially disruptive: can eliminate gene function. Therefore, usually epigenetically silenced.
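The practical difference between the two classes is what happens to genome length. This toy string model is only an illustration under heavy simplification: it ignores target-site duplications, the RNA intermediate and reverse transcription, and imprecise excision; the function names are hypothetical.

```python
def cut_and_paste(genome, start, length, target):
    """DNA transposon model: excise the element and reinsert it elsewhere.

    `target` indexes the genome after excision. Genome length is unchanged.
    """
    element = genome[start:start + length]
    remainder = genome[:start] + genome[start + length:]
    return remainder[:target] + element + remainder[target:]


def copy_and_paste(genome, start, length, target):
    """Retrotransposon model: the original stays put and a new copy is
    inserted at `target`. Genome length grows by the element length."""
    element = genome[start:start + length]
    return genome[:target] + element + genome[target:]


genome = "aaaa<TE>bbbb"           # made-up host sequence with one element
moved = cut_and_paste(genome, 4, 4, 8)
copied = copy_and_paste(genome, 4, 4, 12)
print(moved, len(moved))
print(copied, len(copied))
```

Repeated copy-and-paste events are the kind of process that lets element copies accumulate and genomes expand.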

54 Transposable elements
Fedoroff (2012) argues that TEs, via altering gene regulation, account for the “evolvability” of the “massive and messy genomes” characteristic of higher plants: they create new genes, modify genes, and program and re-program genes. Transposition events lead to genome expansion and help to explain the C-value paradox.

55 Transposable elements
Transposition events lead to genome expansion and help to explain the C value paradox: TEs nested within TEs nested within TEs

56 Transposable elements
Fig. 6. The arrangement of retrotransposons in the maize adh1-F region. The short lines represent retrotransposons, with the internal domains represented in orange and the LTRs in yellow. Younger insertions within older insertions are represented by the successive rows from the bottom to the top of the diagram. Small arrows show the direction of transcription of the genes shown under the long blue line that represents the sequence in the vicinity of the adh1 gene. [Adapted with permission from (102)] N V Fedoroff Science 2012;338: Published by AAAS

57 Transposable elements
Fig. 7. The organization of the sequence adjacent to the bronze (bz) gene in eight different lines (haplotypes) of maize. The genes in this region are shown in the top diagram: bz, stc1, rpl35A, tac6058, hypro1, znf, tac7077, and uce2. The orientation of the gene is indicated by the direction of the green pentagon, pointing in the direction of transcription; exons are represented in dark green and introns in light green. Each haplotype is identified by its name and the size of the cloned NotI fragment. The same symbols are used for gene fragments carried by Helitrons (Hels), which are represented as bidirectional arrows below the line for each haplotype. Vacant sites for HelA and HelB are provided as reference points and marked by short vertical red bars. Dashed lines represent deletions. Retrotransposons are represented by yellow bars. DNA transposons and TAFTs (TA-flanked transposons), which are probably also DNA transposons, are represented by red triangles; small insertions are represented by light blue triangles. [Redrawn with permission from (113)] Published by AAAS N V Fedoroff Science 2012;338:

58 Transposable elements
85% of the maize genome consists of transposons. Transposition events happen in real time: there are differences between maize inbreds. Transposons can move large blocks of intervening DNA. Transposases are the products of the most abundant genes on earth.

59 Transposable elements
~24% of the cacao genome; ~21% of the Fragaria genome. ~68,000 TE-related sequences in cacao. “Gaucho” is a retrotransposon ~11 kb in length and present ~1,000 times. “The lack of highly abundant LTR transposons is likely to be the reason F. vesca has a relatively small-size genome”

60 Transposable elements
Reid and Ross. Mendel’s genes…: an Ac/Ds-like 0.8 kb insertion, round vs. wrinkled peas. Transposon tagging: find genes by mutation due to TE. Holoch and Moazed. RNAi…: transposons held in check by epigenetic mechanisms.

61 Genome architecture and evolution
Plant | # genes (est.) | 2n = _x = _ | Genome size
Arabidopsis thaliana | 27,000 | 2n = 2x = 10 | 135 Mb
Fragaria vesca | 35,000 | 2n = 2x = 14 | 240 Mb
Theobroma cacao | 29,000 | 2n = 2x = 20 | 415 Mb
Zea mays | 40,000 | | 2,300 Mb
Pinus taeda | 50,000 | 2n = 2x = 24 | 23,200 Mb
Paris japonica | ?? | 2n = 8x = 40 | 148,852 Mb
S. Wessler on transposable elements:
Review of retroviruses:
Review of cut-and-paste transposition:

