RNA-Seq technology and its application to dosage compensation between the X chromosome and autosomes in mammals 2011-12-05.


Outline
RNA-Seq: technologies and methodologies
Application to the dosage compensation model

RNA-Seq: technologies and methodologies

Transcriptomics methods before RNA-Seq
Hybridization-based approaches: genomic tiling microarrays; fluorescently labelled cDNA with microarrays
Sequence-based approaches: Sanger sequencing of cDNA or EST libraries; serial analysis of gene expression (SAGE); cap analysis of gene expression (CAGE); massively parallel signature sequencing (MPSS)

A typical RNA-Seq experiment

Sequencers used for RNA-Seq
Illumina IG
Applied Biosystems SOLiD
Roche 454 Life Science
Helicos Biosciences tSMS (had not yet been used for published RNA-Seq studies as of Jan. 2009)

Direct RNA sequencing using the Helicos approach
a | RNA that is polyadenylated and 3′ deoxy-blocked with poly(A) polymerase is captured on poly(dT)-coated surfaces. A 'fill-and-lock' step is performed, in which the 'fill' step is performed with natural thymidine and polymerase, and the 'lock' step is performed with fluorescently labelled A, C and G Virtual Terminator (VT) nucleotides and polymerase. This step corrects for any misalignments that may be present in poly(A) and poly(T) duplexes, and ensures that the sequencing starts in the RNA template rather than the polyadenylated tail.
b | Imaging is performed to locate the positions of the templates. Then, chemical cleavage of the dye–nucleotide linker is performed to release the dye and prepare the templates for nucleotide incorporation.
c | Incubation of this surface with one labelled nucleotide (C-VT is shown as an example) and a polymerase mixture is carried out. After this step, imaging is performed to locate the templates that have incorporated the nucleotide. Chemical cleavage of the dye allows the surface and DNA templates to be ready for the next nucleotide-addition cycle. Nucleotides are added in the C, T, A, G order for 120 total cycles (30 additions of each nucleotide).

Advantages of RNA-Seq compared with other transcriptomics methods

Quantifying expression levels: RNA-Seq and microarray compared

Challenges for RNA-Seq
Library construction
Different library construction methods (RNA fragmentation and cDNA fragmentation) bias the results for large RNAs
Strand-specific libraries are currently laborious to produce
Bioinformatic challenges
Developing efficient methods to store, retrieve and process large amounts of data
Mapping reads to the genome
Coverage versus cost

DNA library preparation: RNA fragmentation and DNA fragmentation compared
a | Fragmentation of oligo-dT primed cDNA (blue line) is more biased towards the 3' end of the transcript. RNA fragmentation (red line) provides more even coverage along the gene body, but is relatively depleted for both the 5' and 3' ends. Note that the ratio between the maximum and minimum expression level (or the dynamic range) is 44 for microarrays and 9,560 for RNA-Seq. The tag count is the average sequencing coverage for 5,000 yeast ORFs.
b | A specific yeast gene, SES1 (seryl-tRNA synthetase), is shown.

Coverage versus depth

Methodologies for RNA-Seq studies
Mapping transcription start sites
Strand-specific RNA-Seq
Characterization of alternative splicing patterns
Gene fusion detection
Targeted approaches using RNA-Seq
Small RNA profiling
Direct RNA sequencing
Profiling low-quantity RNA samples

Mapping transcription start sites (TSSs)

Mapping transcription start sites (TSSs)
Advantages
Requires only low quantities of input RNA
Paired-end sequencing allows identified TSSs to be assigned to specific transcripts
Paired-end sequencing alleviates the difficulty of aligning single short reads to repeat regions
Disadvantages
Primer dimers can dominate sequencing data sets
Dependent on cDNA synthesis or hybridization steps
Challenging for short-lived transcripts

Strand-specific RNA-Seq
Adaptors with known orientations are ligated to the ends of RNAs or to first-strand cDNA molecules
Direct sequencing of the first-strand cDNA products
Selective chemical marking of the second-strand cDNA synthesis products or RNA

Characterization of alternative splicing patterns
a | Sequence reads are mapped to genomic DNA or to a transcriptome reference to detect alternative isoforms of an RNA transcript. Mapping is based simply on read counts to each exon and reads that span the exonic boundaries. One infers the absence of the genomic exon in the transcript by virtue of no reads mapping to the genomic location.
b | Paired sequence reads provide additional information about exonic splicing events, as demonstrated by matching the first read in one exon and placing the second read in the downstream exon, creating a map of the transcript structure.

Gene fusion detection

Targeted approaches using RNA-Seq

Targeted approaches using RNA-Seq

Small RNA profiling

Direct RNA sequencing

Profiling low-quantity RNA samples
a | Single-molecule DNA and RNA sequencing technologies could be modified for single-cell applications. Cells can be delivered to flow cells using fluidics systems, followed by cell lysis and capture of mRNA species on the poly(dT)-coated sequencing surfaces by hybridization. Standard sequencing runs could take place on channels with a 127.5 mm² surface area, requiring 2,750 images to be taken per cycle to image the entire channel area. The surface area needed to accommodate ~350,000 mRNA molecules contained in a single cell is ~0.4 mm²; thus, only eight images per cycle would be needed. Sequence analysis can be done with direct RNA sequencing (DRS) or on-surface cDNA synthesis followed by single-molecule DNA sequencing.
b | nCounter system workflow. Two probes are used for each target site: the capture probe (shown in red) contains a target-specific sequence and a modification that allows the immobilization of the molecules on a surface; the reporter probe contains a different target-specific sequence (shown in blue) and a fluorescent barcode (shown by a green circle) that is unique to each target being examined. After hybridization of the capture and reporter probe mixture to RNA samples in solution, excess probes are removed. The hybridized RNA duplexes are then immobilized on a surface and imaged to identify and count each transcript with the unique fluorescent signals on the capture and reporter probes.

References
Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10, 57 (2009).
Ozsolak, F. & Milos, P. M. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12, 87 (2011).
Martin, J. A. & Wang, Z. Next-generation transcriptome assembly. Nature Reviews Genetics 12, 671 (2011).
Kapranov, P. et al. New class of gene-termini-associated human RNAs suggests a novel RNA copying mechanism. Nature 466, 642 (2010).

Application to the dosage compensation model

Background
Ohno's hypothesis: X-linked genes are expressed at twice the level of autosomal genes per active allele to balance the gene dose between the X chromosome and autosomes.
Microarray data supported the hypothesis (X:AA ~ 1).

Abstract from Xiong et al.
Mammalian cells from both sexes typically contain one active X chromosome but two sets of autosomes. It has previously been hypothesized that X-linked genes are expressed at twice the level of autosomal genes per active allele to balance the gene dose between the X chromosome and autosomes (termed 'Ohno's hypothesis'). This hypothesis was supported by the observation that microarray-based gene expression levels were indistinguishable between one X chromosome and two autosomes (the X to two autosomes ratio (X:AA) ~1). Here we show that RNA sequencing (RNA-Seq) is more sensitive than microarray and that RNA-Seq data reveal an X:AA ratio of ~0.5 in human and mouse. In Caenorhabditis elegans hermaphrodites, the X:AA ratio reduces progressively from ~1 in larvae to ~0.5 in adults. Proteomic data are consistent with the RNA-Seq results and further suggest the lack of X upregulation at the protein level. Together, our findings reject Ohno's hypothesis, necessitating a major revision of the current model of dosage compensation in the evolution of sex chromosomes.

Expression level definition
Taking mouse as an example, we mapped all 25-mer RNA-Seq reads to the genome sequence. Only those reads uniquely mapped to exons were considered as valid hits for a given gene. The expression level of a gene is defined as the number of valid hits to the gene divided by the effective length of the gene, which is the total number of 25-mers in the DNA sequences of the exons of the gene that have no other matches anywhere in the genome. For comparisons between tissues or developmental stages, expression levels were normalized by dividing by the total number of valid hits in the sample.
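
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout and the toy numbers are hypothetical, and the effective lengths (counts of uniquely mappable 25-mers per gene) are taken as given inputs rather than computed from a genome.

```python
def expression_levels(valid_hits, effective_length):
    """Valid hits per uniquely mappable 25-mer, divided by the sample's total valid hits.

    valid_hits: {gene: number of uniquely mapped exonic reads}  (hypothetical layout)
    effective_length: {gene: number of exonic 25-mers with no other genomic match}
    """
    total_hits = sum(valid_hits.values())
    levels = {}
    for gene, hits in valid_hits.items():
        eff = effective_length[gene]
        if eff == 0 or total_hits == 0:
            continue  # no uniquely mappable positions, so the level is undefined
        levels[gene] = hits / eff / total_hits
    return levels


# Toy numbers, invented for illustration only.
toy_hits = {"geneA": 200, "geneB": 50}
toy_eff_len = {"geneA": 1000, "geneB": 500}
toy_levels = expression_levels(toy_hits, toy_eff_len)
```

With these toy inputs geneA comes out at twice the level of geneB, because it has four times the hits over twice the effective length.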

Comparison of gene expression measured by microarray and RNA-Seq
Human liver is considered unless otherwise noted.
(a) Estimation variation measured by the fold difference of microarray intensities of two same-target probesets or of RNA-Seq signals from two halves of the same gene.
(b) Identical to a, except that mouse liver is considered here.
(c) Comparison of the internal consistency of RNA-Seq data and microarray data. The expression differences from one-half of the nucleotides (RNA-Seq) or a probeset (microarray) are shown for 1,000 randomly picked gene pairs each with twofold ± 0.01-fold expression difference from the other half of nucleotides (RNA-Seq) or from the other probeset (microarray). The central bold line shows the median, the box encompasses 50% of data points and the error bars include 90% of data points.
(d) Pearson's correlation (r) of microarray and RNA-Seq expression signals (gray) and of RNA-Seq signals from two independent experiments (black). A certain fraction of genes (x axis) with the highest expression according to one of the RNA-Seq datasets are examined. Error bars show 95% confidence intervals estimated by bootstrapping.
(e) Microarray consistently underestimates expression differences between genes. The microarray expression differences of 1,000 randomly picked gene pairs each with x-fold (x = 2 ± 0.01, 4 ± 0.02, 8 ± 0.04, 16 ± 0.08, 32 ± 0.16, and 64 ± 0.32) RNA-Seq expression difference are shown. The central bold line shows the median, the box encompasses 50% of data points and the error bars include 90% of data points.
(f) Relative liver expression of 55 mouse genes, measured by RNA-Seq, microarray and qRT-PCR.

Comparisons of RNA-Seq gene expression levels between the X chromosome and autosomes in 12 human tissues and 3 mouse tissues
(a) The median expression levels of X-linked genes (closed diamonds) and autosomal genes (open circles) are compared. Median expression of autosomal genes was normalized to 1. Error bars show 95% bootstrap confidence intervals. Sex information is listed in parentheses after the tissue names (M, male; F, female; NA, unknown).
(b) X:AA ratios of median expression from the human liver when X is compared to individual autosomes. Error bars show 95% bootstrap confidence intervals.
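
A ratio of medians with a 95% bootstrap confidence interval, as in the error bars described above, might be computed along the following lines. This is a minimal sketch assuming a simple percentile bootstrap that resamples genes independently; the slides do not specify the exact resampling scheme, and the function names and defaults are hypothetical.

```python
import random
import statistics


def xaa_ratio(x_levels, autosomal_levels):
    """Median X-linked expression divided by median autosomal expression."""
    return statistics.median(x_levels) / statistics.median(autosomal_levels)


def bootstrap_ci(x_levels, autosomal_levels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the X:AA ratio (genes resampled with replacement).

    Assumes the resampled autosomal median is never zero, which holds for
    strictly positive expression levels.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        x_sample = rng.choices(x_levels, k=len(x_levels))
        a_sample = rng.choices(autosomal_levels, k=len(autosomal_levels))
        ratios.append(xaa_ratio(x_sample, a_sample))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy expression levels, invented for illustration only.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_autosomal = [2.0, 4.0, 6.0, 8.0, 10.0]
point_estimate = xaa_ratio(toy_x, toy_autosomal)
ci_low, ci_high = bootstrap_ci(toy_x, toy_autosomal)
```

Resampling genes independently ignores any correlation between neighbouring genes, so intervals from this sketch may be narrower than the true uncertainty.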

Test upregulation in Ohno's hypothesis
In Ohno's hypothesis, upregulation is needed for those X-linked genes that had existed in the genome before the emergence of the X chromosome; X-linked genes that originated de novo on X presumably do not require upregulation.

Test upregulation in Ohno's hypothesis

Comparison of RNA-Seq gene expression levels of the X chromosome and autosomes in C. elegans

Caveats in this RNA-Seq analysis
The Illumina sequencing used here may be biased toward certain sequences or nucleotides.
Reverse transcription during cDNA library preparation is likely to be less efficient for longer transcripts.
GC content may affect RNA-Seq results.
A recent study using time-course microarray data excluded lowly expressed genes, which is inappropriate for measuring the absolute value of the X:AA ratio.

Main idea
Here we contend that the low estimate of the X:AA ratio by Xiong et al. stems from the disproportionate contribution of transcriptionally inactive genes, which are not relevant for the evaluation of dosage compensation mechanisms, to the X chromosome average. We show that when only active genes are considered, the RNA-seq data give X:AA ratios closer to 1, and the observed minor deviation of the X:AA ratio from 1 is within the range expected when taking into account chromosome-to-chromosome variability.

Key notes
RPKM: the number of associated reads per kilobase of exonic sequence per million of total reads sequenced.
We assert that the effect of a mechanism that regulates transcriptional dosage compensation pertains only to the expression magnitude of transcriptionally active genes.
The fraction of undetected (RPKM = 0) genes is substantially higher on the X chromosome than on autosomes, accounting for as much as 40% of all the X-linked genes.
Threshold used in the analysis: RPKM ≥ 1 with at least 3 reads.
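
The RPKM definition and the activity threshold above translate directly into code. The sketch below is illustrative only: the function names and the example counts are hypothetical, while the defaults (RPKM ≥ 1, at least 3 reads) are the thresholds stated on the slide.

```python
def rpkm(read_count, exonic_length_bp, total_reads):
    """Reads per kilobase of exonic sequence per million total reads sequenced."""
    return read_count * 1e9 / (exonic_length_bp * total_reads)


def is_active(read_count, exonic_length_bp, total_reads, min_rpkm=1.0, min_reads=3):
    """Apply the slide's filter: RPKM >= 1 with at least 3 associated reads."""
    if read_count < min_reads:
        return False
    return rpkm(read_count, exonic_length_bp, total_reads) >= min_rpkm


# Invented example: 100 reads on 2 kb of exon in a library of 10 million reads.
example_rpkm = rpkm(100, 2000, 10_000_000)
```

The read-count floor matters for short genes in small libraries, where one or two reads alone can push the RPKM above 1.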

Fraction of transcriptionally inactive genes on autosomes and X chromosome

The ratio of the median transcription magnitudes of X-linked and autosomal genes
The X:AA ratio estimates are shown based on the set of genes with minimal transcription (RPKM ≥ 1 and at least 3 associated reads).
Black error bars show the 95% confidence interval (CI) based on bootstrap estimates incorrectly assuming independence of expression levels for neighboring genes (plotted here for reference; not used to make inferences).
Red bars show the range around 1 into which the X:AA ratio is expected to fall (95% CI) in the presence of twofold upregulation of the X chromosome, taking into account interchromosomal variation (sampling of contiguous blocks of X-chromosome size from the autosomal portion of the genome). The observed X:AA values (black dots) in all tissues fall within this range, indicating that the observed transcriptional magnitude of X-linked genes is compatible with the presence of twofold upregulation.
Blue bars show the range around 0.5 into which the X:AA ratio is expected to fall in the absence of X-chromosome upregulation (50% of the autosomal expression level). The X:AA estimates for the first five samples fall outside of this range, indicating that the X-linked expression magnitude is significantly higher than that expected in the absence of dosage compensation. The X:AA values for other samples are within both the red and blue ranges, indicating that the two hypotheses (X:AA = 1 and X:AA = 0.5) cannot be clearly distinguished based on these individual data sets.
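
The block-sampling idea behind the red and blue ranges can be illustrated as follows. This is a rough sketch, not the authors' procedure: it assumes autosomal genes are supplied as one list in genomic order, measures block size in genes rather than base pairs, lets blocks cross chromosome boundaries, and assumes a non-zero autosomal median. All names and parameters are hypothetical.

```python
import random
import statistics


def block_ratio_range(autosomal_levels_in_order, block_size, expected_ratio=1.0,
                      n_samples=1000, seed=0):
    """95% range of (block median / autosomal median) over random contiguous blocks.

    expected_ratio scales the range: 1.0 for twofold X upregulation,
    0.5 for no upregulation.
    """
    rng = random.Random(seed)
    levels = autosomal_levels_in_order
    genome_median = statistics.median(levels)
    max_start = len(levels) - block_size
    ratios = []
    for _ in range(n_samples):
        start = rng.randint(0, max_start)  # inclusive on both ends
        block = levels[start:start + block_size]
        ratios.append(expected_ratio * statistics.median(block) / genome_median)
    ratios.sort()
    return ratios[int(0.025 * n_samples)], ratios[int(0.975 * n_samples) - 1]


# Degenerate toy input: with identical levels every block ratio equals expected_ratio.
flat_low, flat_high = block_ratio_range([2.0] * 100, block_size=10)
half_low, half_high = block_ratio_range([2.0] * 100, block_size=10, expected_ratio=0.5)
```

Sampling contiguous blocks rather than individual genes keeps neighbouring genes together, which is the point of the comparison with the gene-level bootstrap bars.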

The chr. 10:A and chr. 11:A ratios illustrating chromosome-to-chromosome variability

Mouse RNA-seq data shows a lack of dosage compensation

Dependence of the X:AA estimates on the RPKM threshold
The tissue-averaged X:AA estimates are shown (black) as a function of the minimal RPKM threshold, from 0 (all genes, including those with undetected expression) to RPKM ≥ 2. The error bars correspond to the s.e.m. between different tissues. The largest change in the ratio is observed after exclusion of genes with undetected expression (RPKM > 0). As the RPKM threshold increases, the X:AA ratio largely stabilizes above RPKM = 1. Applying an RPKM threshold increases the median expression level and can artificially shift the X:AA ratio closer to 1. The shaded gray region shows the 95% confidence envelope for a hypothetical X chromosome that is expressed at 50% of the autosomal level (see Supplementary Methods). For non-zero RPKM thresholds, the observed X:AA ratios lie outside of this 95% confidence interval, showing that the X:AA ratios are higher than would be expected from setting an RPKM threshold alone.
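
The threshold dependence described above can be explored with a small helper that recomputes the ratio of medians at each cutoff. This is a sketch with invented toy values; the function name is hypothetical, and a threshold of 0 keeps all genes, including undetected ones.

```python
import statistics


def xaa_by_threshold(x_rpkm, autosomal_rpkm, thresholds):
    """X:AA ratio of medians after keeping only genes with RPKM >= each threshold."""
    result = {}
    for t in thresholds:
        x_kept = [v for v in x_rpkm if v >= t]
        a_kept = [v for v in autosomal_rpkm if v >= t]
        if not x_kept or not a_kept or statistics.median(a_kept) == 0:
            result[t] = None  # ratio undefined at this threshold
            continue
        result[t] = statistics.median(x_kept) / statistics.median(a_kept)
    return result


# Toy RPKM values, invented for illustration: half of the X-linked genes are undetected.
toy_x_rpkm = [0, 0, 0, 1, 2, 4]
toy_autosomal_rpkm = [0, 2, 2, 4, 4, 8]
toy_ratios = xaa_by_threshold(toy_x_rpkm, toy_autosomal_rpkm, [0, 1])
```

In this toy case the ratio rises from about 0.17 with all genes to 0.5 once genes below RPKM 1 are dropped, which illustrates why the effect of thresholding itself has to be controlled for.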

Discussion