Topics
- DNA sequencing in the last century
- Current technologies (Illumina, Ion Torrent)
- New developments (PacBio, Nanopore)

Sanger sequencing
- Developed by Fred Sanger in the 1970s ( , two-time Nobel laureate: 1958, protein structure of insulin; 1980, sequencing of nucleic acids)
- Sequencing by synthesis: DNA polymerase synthesizes a complementary strand by adding single nucleotides
  growing strand: TTGCACTGAGTCG
  template:       AACGTGACTCAGCATAGGCTCAGATAGAT
- Random incorporation of blocked nucleotides: at any position, the reaction stops in a small fraction of the reads
- A-reaction: add dATP (elongation) and ddATP (block); the C-, G- and T-reactions are analogous
  growing strand: TTGCACTTGAGTCGT
  template:       AACGTGAACTCAGCATAGGCTCAGATAGAT

Sanger sequencing
  growing strand: TTGCACTTGAGTCG
  template:       AACGTGAACTCAGCATAGGCTCAGATAGAT
- A-reaction: TTGCA, TTGCACTTGA
- C-reaction: TTGC, TTGCAC, TTGCACTTGAGTC
- G-reaction: TTG, TTGCACTTG, TTGCACTTGAG, TTGCACTTGAGTCG
- T-reaction: TT, TTGCACT, TTGCACTT, TTGCACTTGAGT
- ddNTP-terminated ladder of DNA fragments -> electrophoresis -> sequence (lanes T, G, C, A)
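The four reactions above can be sketched in a few lines of Python. This is an idealized illustration, not a model of real chemistry: it assumes that termination happens exactly once at every position, and the function names `sanger_fragments` and `read_ladder` are made up for this example.

```python
def sanger_fragments(strand):
    """Group every prefix of the synthesized strand by its terminating base.

    Idealization: each position is terminated by a ddNTP exactly once.
    """
    reactions = {"A": [], "C": [], "G": [], "T": []}
    for end in range(1, len(strand) + 1):
        fragment = strand[:end]
        reactions[fragment[-1]].append(fragment)
    return reactions


def read_ladder(reactions):
    """Sort fragments by length (as electrophoresis does) and read off the lanes."""
    by_length = sorted(
        (len(fragment), base)
        for base, fragments in reactions.items()
        for fragment in fragments
    )
    return "".join(base for _, base in by_length)


strand = "TTGCACTTGAGTCG"
reactions = sanger_fragments(strand)
print(reactions["A"])          # ['TTGCA', 'TTGCACTTGA']
print(read_ladder(reactions))  # TTGCACTTGAGTCG
```

Reading the lanes from the shortest to the longest fragment recovers the synthesized strand, which is what the gel or capillary readout does.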

Sanger sequencing
- Labeled ddNTPs, capillary sequencing
  growing strand: GATTGATAGTTGC
  template:       CTAACTATCAACGTATAGGCTCAGATAGAT
- Fragments: G, GA, GAT, GATT, GATTG, GATTGA, GATTGAT, GATTGATA, GATTGATAG, GATTGATAGT, GATTGATAGTT, GATTGATAGTTG, GATTGATAGTTGC

454 technology
- Pyrosequencing: immobilize DNA on beads, pyrosequencing in microreactors
- dTTP incorporation -> PPi -> ATP -> oxyluciferin + light
  growing strand: TTGCACTGAGTCGT
  template:       AACGTGACTCAGCATAGGCTCAGATAGAT

454 technology
- DNA-loaded beads + primer + polymerase + sulfurylase + luciferase
- Readout: flowgram
  growing strand: TTGCACTGAGTCGT
  template:       AACGTGACTCAGCAAGTCTATTCACCCAC
- Problem: homopolymers are difficult to detect
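A flowgram can be illustrated with a small simulation. The sketch below assumes a noise-free light signal that equals the number of bases incorporated per flow and a cyclic flow order of T, A, C, G; both are assumptions made for this example, and the function name `flowgram` is hypothetical.

```python
def flowgram(synthesized, flow_order="TACG", max_flows=100):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    The signal is the number of bases incorporated in that flow, so a
    homopolymer of length n gives a signal of n in a single flow.
    """
    flows = []
    pos = 0
    for i in range(max_flows):
        if pos >= len(synthesized):
            break
        base = flow_order[i % len(flow_order)]
        run = 0
        # Incorporate as many consecutive matching bases as the strand allows.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
    return flows


flows = flowgram("TTGCACTGAGTCGT")
print(flows[:4])  # [('T', 2), ('A', 0), ('C', 0), ('G', 1)]
```

The first flow shows the homopolymer issue: the run TT yields one signal of height 2 rather than two separate events. With real, noisy signals the relative difference between run lengths n and n+1 shrinks as n grows, which is why long homopolymers are hard to call.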

Parallelisation & Miniaturisation
Increase throughput:
- DNA gel electrophoresis: single genes in a few days
- capillary electrophoresis: 96 capillaries per machine, human genome in a few years
- sequencing on microbeads: 454 technology

Illumina technology
Illumina sequencing:
- sequencing by synthesis
- massive parallelisation and miniaturisation by self-organising DNA microarrays on a glass surface
- several hundred Gb, >10^9 reads per run

Illumina technology
- generate libraries
- grow clusters on a flowcell
- sequence by addition and imaging of blocked, fluorescence-labeled nucleotides

Illumina technology
Library preparation:
- DNA fragments
- blunting by fill-in and exonuclease
- phosphorylation
- addition of A-overhang
- ligation to adapters

Illumina technology
Cluster generation:
1. flowcell
2. hybridize template
3. immobilize template
4. bridge amplification
5. linearisation
6. cleave reverse strand
7. block 3'-ends
8. hybridize primer

Illumina technology
Imaging & sequencing:
- nucleotide + fluorescent dye + terminator

Illumina technology
- reversible terminators

Illumina technology
- fluorescently labelled clusters

Sequencing Applications
What can we do with short reads?
- RNA-seq: identify transcripts, count reads per transcript -> assessment of differential expression
- Problem: reads are too short to establish the connectivity of all exons, so it is difficult or impossible to quantify multiple isoforms of a gene
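Counting reads per transcript and comparing two samples could look roughly like the sketch below. It assumes that every read has already been assigned to a transcript by an aligner, uses counts per million as a simple normalization, and adds a pseudocount before taking the log ratio. The function names and the pseudocount value are illustrative choices, not part of the slides, and a real differential expression analysis would also need replicates and a statistical test.

```python
import math
from collections import Counter


def counts_per_million(read_assignments):
    """Count reads per transcript and scale to a library size of one million."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {t: c * 1_000_000 / total for t, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(B / A) per transcript; the pseudocount avoids division by zero."""
    transcripts = set(cpm_a) | set(cpm_b)
    return {
        t: math.log2((cpm_b.get(t, 0.0) + pseudocount) / (cpm_a.get(t, 0.0) + pseudocount))
        for t in transcripts
    }


# Hypothetical read-to-transcript assignments for two samples.
sample_a = ["tx1", "tx1", "tx2", "tx2"]
sample_b = ["tx1", "tx1", "tx1", "tx2"]
changes = log2_fold_change(counts_per_million(sample_a), counts_per_million(sample_b))
print(changes["tx1"] > 0, changes["tx2"] < 0)  # True True
```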

Improvements: Paired end Reads
- Single end: ambiguous mapping
- Paired end sequencing: read fragment from both ends -> resolve ambiguities
(Stefan Krebs)
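One way to picture how the second read resolves an ambiguous mapping: a read that matches several genomic positions is kept only where its distance to the uniquely mapped mate fits the fragment size of the library. The sketch below is a simplified illustration on a single chromosome; the function name and the insert-size bounds of 200 to 500 bp are assumptions, and real aligners also check strand and orientation.

```python
def resolve_with_mate(candidates, mate_position, min_insert=200, max_insert=500):
    """Keep candidate positions whose distance to the mate fits the insert size."""
    return [
        pos for pos in candidates
        if min_insert <= abs(mate_position - pos) <= max_insert
    ]


# Read 1 matches three repeat copies; its mate maps uniquely at 50,300.
print(resolve_with_mate([1_000, 50_000, 120_000], mate_position=50_300))  # [50000]
```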

Improvements: Circularization
Further improvements:
- long-jumping mate-pair libraries: circularize large fragments (2-10 kb) and read the junctions
- resolve large repeats in genome assembly

Third generation Sequencing

Pacific Biosciences
- single molecule detection
- several kilobases read length
- moderate output ( wells)
- expensive instrument and high cost per base

Read length distribution

Pacific Biosciences

Applications
- Everything that can be converted to a DNA strand can be sequenced; even long-term data storage by encoding in synthetic DNA is possible
- BIOLOGICAL APPLICATIONS: sequencing of genomes, transcriptomes, population diversity, composition of microbial communities, ChIP-seq, methyl-seq, translating RNA from ribosomes, ...
- MEDICAL APPLICATIONS: whole genome sequencing, exome sequencing, tumor diagnostics, sequencing of T-cell receptor diversity, identification of pathogens, ...
- FORENSICS, FOOD SAFETY, ARCHEOLOGY, …

Chromatin Immunoprecipitation (ChIP)

Motivation: Regulation of gene expression
- Transcriptional (DNA -> mRNA, Pol II): activation, repression
- Post-transcriptional (mRNA -> protein, 3'UTR): translation, localization, stability

Motivation: Regulation of gene expression
- At which loci does a protein bind the DNA?
- Are there cell-type or environment-specific variations of binding affinity?
- Which histone modifications determine chromatin structure?
- To which motifs does a transcription factor bind?
- What is the “cis-regulatory code” of a gene?
(Diagram labels: DNA, enhancer, promoter, activation, repression)

Chromatin Immunoprecipitation (ChIP)
(Diagram labels: DNA-binding protein of interest, antibody, sequencing)

Chromatin Immunoprecipitation (ChIP)
- Control: input DNA
(Diagram label: sequencing)

Chromatin Immunoprecipitation (ChIP)
ChIP-Seq Analysis Workflow:
- Alignment: ELAND, Bowtie, SOAP, SeqMap, …
- Peak Detection: SISSRs, QuEST, MACS, CisGenome, …
- Annotation: STAN, chromHMM, …
- Visualization: IGV, Ensembl GB, UCSC GB, …
- Motif Analysis: cERMIT, HMMer, Xxmotif, …

Chromatin Immunoprecipitation (ChIP)
- Sonication
(Diagram labels: protein of interest, crosslink site; sequence ACCAATAATCAGCTAAGCCGTTAGCCACAGATGGAA)

Read Alignment

Read Alignment
- Expected read count = total number of reads * extended fragment length / chr length
(Figure: read count along the genome compared with the expected read count)
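The expected read count formula is easy to evaluate directly. The sketch below also shows one common way to use it, namely as the mean of a Poisson background against which an observed pile-up is compared. The Poisson model, the function names, and the example numbers are assumptions for illustration and are not stated on the slide.

```python
import math


def expected_read_count(total_reads, fragment_length, chromosome_length):
    """Expected number of extended reads covering a position under uniform coverage."""
    return total_reads * fragment_length / chromosome_length


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected); an assumed background model."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


lam = expected_read_count(total_reads=1_000_000, fragment_length=200,
                          chromosome_length=100_000_000)
print(lam)                          # 2.0
print(poisson_upper_tail(10, lam))  # small: 10 overlapping reads are unlikely by chance
```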

Read Alignment
- Read direction provides extra information
(Hongkai Ji et al. Nature Biotechnology 26:)
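A rough sketch of how read direction can be used: in ChIP-seq, forward-strand reads tend to pile up on one side of a binding site and reverse-strand reads on the other, so the midpoint between the two pile-ups estimates the site and their distance estimates the fragment length. The function below is a deliberately simple illustration using mean read start positions; it is not the method of the cited paper, and the name `estimate_binding_site` is made up.

```python
from statistics import mean


def estimate_binding_site(forward_starts, reverse_starts):
    """Return (estimated site, strand shift) from 5' read starts on each strand."""
    fwd = mean(forward_starts)
    rev = mean(reverse_starts)
    return (fwd + rev) / 2, rev - fwd


site, shift = estimate_binding_site([100, 110, 120], [280, 290, 300])
print(site, shift)  # 200.0 180
```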

The ENCODE Project
- Goal: define all functional elements in the human genome
- How:
  - lots of groups
  - lots of assays
  - lots of cell lines
  - lots of communication/consortium analysis
  - standardization of methods, reagents, analysis
  - genome-wide
  - a lot of money

The ENCODE Project
- 2 Tier 1 cell lines: GM12878 (B cell), K562 (CML cells)
- 5 Tier 2 cells: HeLa S3, HepG2, HUVEC, primary keratinocytes, hESC
- Many Tier 3 cells
- RNA profiling (Scott Tenenbaum): inter-cell line differences are greater than inter-lab differences

The ENCODE Project
Lots of data and data types generated:
- RNA-seq
- RNA-array
- TF ChIP-seq
- Histone modification ChIP-seq
- DNase-seq
- Bisulfite-seq
- 1M SNP genotyping

Integrative Data Analysis
- Input data: open chromatin, transcription factor ChIP-seq, histone modification ChIP-seq, RNA
- Std. peaks, region calls, active regions, ……
- Dynamic Bayesian networks, HMM segmentation, PCA analysis
- Biological interpretation

Integrative Data Analysis
- 25-state HMM
- Input: 12 histone modifications, 2 transcription factors; cell lines GM12878 and K562
- “Standard” EM training
- Posterior probability decoding
- Viterbi path along the genome (State F, State I, State A, State C, State E)
- Data: entire ENCODE Consortium
- Analysis: Jason Ernst/Manolis Kellis
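To make the Viterbi path concrete, here is a minimal log-space Viterbi decoder for a discrete HMM. It is a toy sketch, not the 25-state model from the slide: the two states, the single binary "mark" observation per genomic bin, and all probabilities are invented for illustration, and the parameters are fixed by hand rather than learned by EM. All probabilities must be non-zero because the code takes logarithms.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state path for a discrete HMM, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state chromatin model over genomic bins.
states = ["inactive", "active"]
start_p = {"inactive": 0.5, "active": 0.5}
trans_p = {"inactive": {"inactive": 0.9, "active": 0.1},
           "active": {"inactive": 0.1, "active": 0.9}}
emit_p = {"inactive": {"mark": 0.1, "none": 0.9},
          "active": {"mark": 0.8, "none": 0.2}}
bins = ["none", "none", "mark", "mark", "mark", "none", "none"]
print(viterbi(bins, states, start_p, trans_p, emit_p))
```

With these toy parameters the decoded path labels the three marked bins as active and the flanking bins as inactive, which is the kind of genome segmentation the slide refers to.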