Techniques of Molecular Biology
Basic molecular biology techniques: isolating nucleic acids; cutting DNA into fragments; ligating DNA fragments; amplifying DNA fragments; hybridization techniques
Genomics: sequencing genomes; analyzing genome sequences
Proteomics: separating proteins; analyzing proteins
Basic molecular biology techniques: isolating nucleic acids
Basic molecular biology techniques: cutting DNA into fragments
DNA can be reproducibly split into fragments by restriction endonucleases
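Why the fragments are reproducible can be sketched in a few lines: a restriction endonuclease cuts at a fixed recognition sequence, so the same input always yields the same fragments. The sketch below uses EcoRI (recognition site GAATTC, cut after the first G) and looks only at the top strand of a linear molecule. The function name `digest` and the example sequence are hypothetical illustrations, not taken from the slides.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear DNA sequence at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    left-hand fragment (EcoRI cuts G^AATTC, so the offset is 1).
    Only the top strand is modelled; sticky ends are ignored.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


# Hypothetical example sequence containing two EcoRI sites.
dna = "ttacgGAATTCaagctagGAATTCcc"
fragments = digest(dna)
print(fragments)
print([len(f) for f in fragments])
```

Running the digest twice on the same sequence gives identical fragment lists, which is the property the slide refers to.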
DNA fragments can be separated by size in agarose or polyacrylamide gels
Because of the phosphates in the sugar-phosphate backbone, nucleic acids are negatively charged. In an electric field, nucleic acids move towards the positive pole. Smaller fragments move faster than larger fragments through the pores of a gel. Very large DNA molecules are separated from each other by special types of electrophoresis, e.g. pulsed-field gel electrophoresis.
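Because smaller fragments migrate further, the size of an unknown band can be estimated from a size marker (ladder) run on the same gel. The sketch below assumes that, within the range covered by the ladder, migration distance is roughly linear in the logarithm of fragment length. This is an empirical approximation that is not stated in the slides, and the ladder values are invented for illustration.

```python
import math


def estimate_size(distance_mm: float, ladder: list[tuple[int, float]]) -> float:
    """Estimate fragment length (bp) from migration distance.

    `ladder` holds (size_bp, distance_mm) pairs for the marker bands.
    The estimate interpolates linearly in log10(size) between the two
    neighbouring marker bands (assumed approximation, see text).
    """
    points = sorted((d, math.log10(size)) for size, d in ladder)
    for (d1, log1), (d2, log2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            fraction = (distance_mm - d1) / (d2 - d1)
            return 10 ** (log1 + fraction * (log2 - log1))
    raise ValueError("distance lies outside the range covered by the ladder")


# Hypothetical ladder: larger fragments have travelled a shorter distance.
ladder = [(10000, 10.0), (1000, 30.0), (100, 50.0)]
print(round(estimate_size(20.0, ladder)))  # band between the 10 kb and 1 kb markers
```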
Basic molecular biology techniques: ligating DNA fragments
Basic molecular biology techniques: amplifying DNA fragments
DNA can be amplified by cloning or by PCR.
DNA cloning and construction of DNA libraries: cloning in a plasmid vector; genomic library; cDNA library
Vectors for DNA cloning
DNA polymerases use dATP, dTTP, dGTP and dCTP as substrates
The polymerase chain reaction (PCR)
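The amplifying power of PCR comes from the fact that each cycle can at most double the number of target molecules. A minimal sketch of that arithmetic follows. The function name `pcr_copies` and the `efficiency` parameter (the fraction of templates copied per cycle, 1.0 meaning perfect doubling) are illustrative assumptions. Real reactions reach a plateau that this model ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after `cycles` PCR cycles.

    Each cycle multiplies the copy number by (1 + efficiency);
    efficiency = 1.0 corresponds to perfect doubling.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))        # perfect doubling for 30 cycles: 2**30 copies
print(pcr_copies(1, 30, 0.9))   # lower yield at 90% efficiency per cycle
```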
Basic molecular biology techniques: hybridization techniques
Single-stranded nucleic acids can bind to each other by base pairing if they contain complementary sequences
A labeled single-stranded probe can therefore detect a specific nucleic acid among many different nucleic acids by complementary base pairing. If the probe is used to detect DNA, the analysis is called DNA blot (Southern) analysis. If RNA is detected, the analysis is called RNA blot (northern) analysis.
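The base-pairing rule behind probe detection can be illustrated computationally: a probe binds where the target strand contains the reverse complement of the probe. The sketch below only finds perfect matches on one strand. Real hybridization tolerates mismatches depending on stringency, which is not modelled here. The function names and the probe are hypothetical, and the target is the start of the coding sequence shown later in the slides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_probe_sites(target: str, probe: str) -> list[int]:
    """0-based positions on `target` where `probe` can base-pair perfectly."""
    site = reverse_complement(probe)
    target = target.upper()
    positions = []
    pos = target.find(site)
    while pos != -1:
        positions.append(pos)
        pos = target.find(site, pos + 1)
    return positions


target = "atgtctaaccaagtatttactact"
probe = "AATACTTGG"  # hypothetical probe, complementary to CCAAGTATT
print(find_probe_sites(target, probe))
```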
Transcriptome analysis using microarrays (example array: 26 × 24 = 624 spots)
Genomics: sequencing genomes
Sequencing techniques: dideoxy sequencing and pyrosequencing
Template preparation (from a genomic library): denature (make single-stranded), anneal primer, extend primer with dATP, dTTP, dGTP and dCTP to copy one of the strands
Dideoxy sequencing: 2’-deoxynucleotide versus 2’,3’-dideoxynucleotide. The dideoxynucleotide lacks the 3’-OH group, so a chain cannot be extended after it has been incorporated.
Dideoxy sequencing: chain-terminated fragments (ddATP, ddCTP, ddGTP, ddTTP) are separated by polyacrylamide gel electrophoresis; ≈ 800 nucleotides can be sequenced in one run
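The logic of reading a dideoxy sequencing gel can be sketched as follows. In the reaction containing a given ddNTP, synthesis stops at positions where that base is incorporated, so each fragment length reports the base at one position of the newly synthesized strand. Sorting all fragments by length and noting which reaction each came from recovers the sequence. This is an idealized model: primer length, incomplete termination and gel resolution are ignored, and all names are hypothetical.

```python
def chain_termination_fragments(new_strand: str) -> dict[str, list[int]]:
    """Fragment lengths produced in each of the four ddNTP reactions.

    `new_strand` is the strand synthesized by the polymerase (A, C, G, T
    only). A fragment of length i ends in the base at position i.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence from shortest to longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_fragments("atgtctaacc")
print(lanes)
print(read_gel(lanes))
```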
Pyrosequencing: ≈ 200 nucleotides can be sequenced in one run
Next-generation sequencing methods: see https://en.wikipedia.org/wiki/DNA_sequencing
Genomics Sequencing genomes (assembling the sequence)
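Assembly means reconstructing a long sequence from overlapping reads. A minimal sketch of the idea is a greedy merge that repeatedly joins the two reads with the longest suffix-prefix overlap. This is a toy illustration only. Real assemblers use overlap or de Bruijn graphs and must handle sequencing errors, both strands and repeats. The function names are hypothetical, and the reads are cut from the start of the coding sequence shown later in the slides.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by longest overlap; returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["accaagtatttac", "tatttactactttacgc", "atgtctaaccaag"]
print(greedy_assemble(reads))
```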
Genomics: analyzing genome sequences
Genomics
1. Sequencing of genomes: split the genome into pieces and sequence all pieces.
2. Assembling the sequence (computer).
3. Sequence analysis (annotation 1): identify genes and other elements in the sequence.
4. Functional analysis (annotation 2): determine the function of the identified elements.
How to find genes in a genome sequence
Protein-coding genes
- Find open reading frames (protein-coding sequences)
- Find sequence with a codon bias
- Find upstream regulatory sequences (e.g. CpG islands)
- Find exon-intron boundaries
Genes coding for functional RNAs
- Find consensus sequences for tRNAs and ribosomal RNAs
- Find specific RNA secondary structures (e.g. stem loops)
- Find upstream regulatory sequences
Genomic sequence
Finding open reading frames
Finding open reading frames gagtccagttgaaaagcaactggaatccccttatagataaattaatatctattttaaaattgaatagtttttattctagtttcgttttaagattaataaaattatgtctaaccaagtatttactactttacgcgcagcaacattagctgttattttaggtatggctggtggcttagcagtaagtccagctcaagcttaccctgtatttgcacaacaaaactacgctaacccacgtgaggctaatggtcgtattgtatgtgcaaactgtcacttagcgcaaaaagcagttgaaatcgaagtaccacaagctgttttacctgatactgtttttgaagctgttattgaacttccatacgataaacaagttaaacaagttttagctaatggtaaaaaaggtgacttaaacgttggtatggttttaattttaccagaaggttttgaattagcaccaccagatcgcgttccggcagaaattaaagaaaaagttggtaacctttactaccaaccatacagtccagaacaaaaaaatattttagttgttggtccagttccaggtaaaaaatacagtgaaatggtagtacctattttatctccagatcctgctaaaaataaaaacgtttcttacttaaaatatcctatttattttggtggtaatcgtggtcgtggtcaagtatatccagatggtaaaaaatcaaacaacactatttacaacgcatcagcagctggtaaaattgtagcaatcacagctctttctgagaaaaaaggtggttttgaagtttcaattgaaaaagcaaacggtgaagttgttgtagacaaaatcccagcaggtcctgatttaattgttaaagaaggtcaaactgtacaagcagatcaaccattaacaaacaaccctaacgttggtggtttcggtcaggctgaaactgaaattgtattacaaaaccctgctcgtattcaaggtttattagtattcttcagttttgttttacttactcaagttttattagttcttaagaaaaaacaattcgaaaaagttcaattagcagaaatgaacttctaatatttaattttttgtagggctgctgtgcagctcctacaaattttagtatgttatttttaaagtttgatatactgaaaacaaagttctacttgaacgatatttagcttttaatgcTATAATATagcggactaagccgttggcaatttagctgccaattaattttattcgaaggatgtaaacctgctaacgatatttatatataagcattttaatactccgagggaggcctctaacctttagcaagtaagtaaacttccccttcggggcagcaaggcagcagatttaaattctccaaaggaggcagttgatatcagtaaaccccttcgatgactctggcattgatgcaaagcatggggaaactaaagttcctccactgcctccttccccttccctttcgggacgtccccttccccttacgggcaagtaaacttagggattttaatgcaataaataaatttgtccccttacgggacgtcagtggcagttgcgaagtattaatattgtatataaatatagaatgtttacatactccgaaggaggacgtcagtggcagtggtaccgccactgctattttaatactccgaaggagcagtggtggtcccactgccactaaaatttatttgcccgaagacgtcctgccaactgccgaggcaaatgaattttagtggacgtcccttacgggacgtcagtggcagttgcctgccaactgcctccttccccttcgggcaagtaaacttgggagtattaacataggcagtggcggtaccacaataaattaatttgtcctccttccccttcgggcaagtaaacttaggagtatgtaaacattctatatttatatactcccatgctttgccccttaagggacaataaataaatttgtccccttcgggcaaataaatcttagtggcagtt
gcaaaatattaatatcgtatataaatttggagtatataaataaatttggagtatataaatataggatgttaatactgcggagcagcagtggtggtaccactgccactaaaatttatttgcccgaaggggacgtcctgccaactgccgatatttatatattccctaagtttacttgccccatatttatatattcctaagtttacttgccccatatttatattaggacgtccccttcgggt Expasy server
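A six-frame open reading frame search, of the kind that translation tools perform on such a sequence, can be sketched as below. The sketch assumes the standard genetic code, defines an ORF as running from the first ATG in a frame to the next in-frame stop codon, and ignores introns and alternative start codons. The function names and the `min_codons` threshold are illustrative assumptions. The example is a short stretch of the sequence above with a stop codon (taa) appended for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[str, int, int, str]]:
    """Return (strand, frame, start, protein) for each ATG...stop ORF.

    `start` is the 0-based position of the ATG on the strand being read.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif CODON_TABLE.get(codon, "X") == "*":
                    protein = "".join(
                        CODON_TABLE.get(s[j:j + 3], "X") for j in range(start, i, 3)
                    )
                    if len(protein) >= min_codons:
                        orfs.append((strand, frame, start, protein))
                    start = None
    return orfs


# Excerpt of the sequence above; the final "taa" is added for illustration.
example = "aaattatgtctaaccaagtatttactactttacgc" + "taa"
print(find_orfs(example))
```

The protein found in the excerpt begins with the same residues as the protein sequence used for the homology search later in the slides.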
Sequence from the E. coli genome
The E. coli genome: high gene density on both strands
Genes = all DNA sequences that are transcribed into RNA
Protein-coding gene = DNA transcribed into mRNA
Layout of a protein-coding gene (5’ to 3’): 5’ UTR, translation start, coding region (= open reading frame), translation stop, 3’ UTR
UTR = untranslated region
Exons and introns in eukaryotic genes
Features that can be used to find genes in eukaryotic sequences: codon bias; exon-intron boundaries; upstream sequences (in vertebrates, CpG islands, found in 40-50% of human genes).
Figure 5.4 Genomes 3 (© Garland Science 2007)
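One way to flag a candidate CpG island is to compare the observed number of CG dinucleotides with the number expected from the base composition. The sketch below computes GC content and the observed/expected ratio, (number of CpG × length) / (number of C × number of G). The thresholds in `looks_like_cpg_island` (at least 200 bp, GC content above 50%, ratio above 0.6) are a commonly used rule of thumb and an assumption here. They are not given in the slides.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed rule-of-thumb thresholds to one sequence window."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_ratio


print(cpg_stats("CGCGCGATAT"))
print(looks_like_cpg_island("CGCGCGATAT" * 20))  # hypothetical CpG-rich window
print(looks_like_cpg_island("ATATATATAT" * 20))  # hypothetical AT-rich window
```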
Figure 5.6b Genomes 3 (© Garland Science 2007)
A typical sequence annotation result
Automatic annotation of a 15 kb sequence of the human genome containing a tissue factor gene. The location of 6 exons is revealed.
Figure 5.10 Genomes 3 (© Garland Science 2007)
Verifying the identity of a gene
Homology search
Experimental techniques: northern hybridization, zoo-blotting
Verifying the identity of a gene: homology search with BLAST (Basic Local Alignment Search Tool)
Example query sequence:
MSNQVFTTLR AATLAVILGM AGGLAVSPAQ AYPVFAQQNY ANPREANGRI VCANCHLAQK AVEIEVPQAV LPDTVFEAVI ELPYDKQVKQ VLANGKKGDL NVGMVLILPE GFELAPPDRV PAEIKEKVGN LYYQPYSPEQ KNILVVGPVP GKKYSEMVVP ILSPDPAKNK NVSYLKYPIY FGGNRGRGQV YPDGKKSNNT IYNASAAGKI VAITALSEKK GGFEVSIEKA NGEVVVDKIP AGPDLIVKEG QTVQADQPLT NNPNVGGFGQ AETEIVLQNP ARIQGLLVFF SFVLLTQVLL VLKKKQFEKV QLAEMNF
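The idea of a local alignment, finding the best-matching region between two sequences rather than aligning them end to end, can be sketched with the Smith-Waterman dynamic programming algorithm. This is not how BLAST itself works: BLAST is a faster heuristic that extends short seed matches and scores them with substitution matrices and statistics. The simple match/mismatch/gap scores below are arbitrary assumptions, and the second sequence in the example is invented.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between `a` and `b`.

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


query = "MSNQVFTTLR"      # first ten residues of the sequence above
subject = "AAANQVFTTAAA"  # hypothetical database sequence sharing NQVFTT
print(local_alignment_score(query, subject))
```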
Case study: yeast genome, 6274 ORFs
Orphans = genes of unknown function; single orphans = unique genes not found in databases.
Additional methods for identifying the function of genes: comparative genomics, cDNA sequencing, transposon tagging.
Figure 5.28 Genomes 3 (© Garland Science 2007)
Finding the function of a gene (product)
Computer-based analysis: homology search
Experimental analysis: gene inactivation, overexpression
Whole genome studies: tiling assays
Proteomics: isolating and separating proteins; identifying and analyzing proteins
Working with proteins: separating proteins; analyzing proteins and their interactions
Separating proteins on polyacrylamide gels
Immunoblot (Western blot)
Proteins can be sequenced
Complex mixtures of proteins can be analyzed by mass spectrometry
MALDI-TOF = matrix-assisted laser desorption ionization – time of flight
Typical workflow in analysis of proteins by mass spectrometry
Liquid chromatography is used to separate peptides before mass spectrometry
Mass spectrum
Mass spectra are compared to theoretical values
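The comparison with theoretical values can be sketched as a peptide mass fingerprint: digest a candidate protein in silico, compute the mass of each peptide, and look for observed masses that agree within a tolerance. The sketch assumes trypsin as the protease (cleavage after K or R, but not before P), neutral monoisotopic masses, and no modifications or missed cleavages. None of these choices is specified in the slides, and the function names and tolerance are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(observed: list[float], protein: str,
                tolerance: float = 0.01) -> dict[float, list[str]]:
    """Map each observed mass to the theoretical peptides within `tolerance`."""
    theoretical = {p: peptide_mass(p) for p in tryptic_peptides(protein)}
    return {
        mass: [p for p, t in theoretical.items() if abs(t - mass) <= tolerance]
        for mass in observed
    }


# First 60 residues of the protein sequence shown in the homology search slide.
protein = "MSNQVFTTLRAATLAVILGMAGGLAVSPAQAYPVFAQQNYANPREANGRIVCANCHLAQK"
for peptide in tryptic_peptides(protein):
    print(peptide, round(peptide_mass(peptide), 4))
```

A spectrum recorded in positive-ion mode usually shows protonated peptides, so observed m/z values would need to be converted to neutral masses before calling `match_peaks`.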
Mouse liver proteins: example of spots on a gel obtained after 2D electrophoresis. Spots of interest are cut out and the proteins are identified by mass spectrometry.
Figure 6.11 Genomes 3 (© Garland Science 2007)
Protein interaction map of yeast: each dot is a protein; red dots are essential proteins.
Figure 6.20a Genomes 3 (© Garland Science 2007)
Nucleic acid-protein interactions: electrophoretic mobility shift assay (EMSA)
Nuclease protection footprinting is used to identify the DNA sequence to which a protein binds
In vitro selection assay uses a combinatorial DNA sequence library to identify DNA sequences to which a protein binds.
Chromatin immunoprecipitation (ChIP) identifies protein-binding sites in vivo.
Chromosome conformation capture (3C assay) identifies DNA sequences that are conformationally linked.