Transcriptomics
Jiri Zavadil, PhD
Molecular Mechanisms and Biomarkers
International Agency for Research on Cancer, Lyon
Transcriptomics: Definitions
• Transcriptome: the complete set of RNA transcripts produced by the genome at a given time
• The transcriptome is highly dynamic and complex in comparison to the relatively stable genome
• Transcriptomics: the global study of gene expression at the RNA level; it can include genes for ncRNAs (microRNAs etc.)
Biospecimens in I4C Mother-Child and Infant-Child Cohorts
• Blood spots, cord blood, whole blood: genetic, epigenetic and transcriptomic analyses (nucleic acids); proteomic, serological and chemical analyses
• Urine: chemical, proteomic and nucleic acid analysis
• Tumor cells, tissues
Case for Integrated Omics Analyses
The prospective biospecimen collection and retrospective case analysis will yield interconnected results.
[Diagram labels: epigenetics (DNA methylation, histone modification); gene regulation; RNA and protein markers; studied by transcriptomics]
Transcriptomics: Applications for I4C
• Specific gene expression: genes and signatures determined by particular genetic and epigenetic regulatory factors and by environmental exposures
• Exploratory approaches: not hypothesis-driven, e.g., global gene expression in tumors versus healthy tissues, or differential responses to distinct environmental exposures
• Disease etiology and classification: patterns/signatures rather than single markers can improve knowledge about etiology and diagnosis
DNA Microarray Platforms
• Platforms: Illumina BeadArray, Affymetrix GeneChip
• Workflow: reverse transcription, IVT with labeled nucleotides, array hybridization, staining, washing, scanning
• Pros: rapid and streamlined protocols, standardized analysis
• Cons: biased target collection; reports expression levels but limited sequence information
MicroRNA: TaqMan Low Density Array
• Workflow: total RNA sample, miRNA TLDA array (742 total target miRs) run on the ABI 7900 SDS HT, quantile normalization
• Pros: quantitative abundance analysis
• Cons: biased target collection
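The quantile normalization step named on this slide can be illustrated with a minimal sketch. It assumes every sample measures the same set of targets and has no missing values; the function name and the input numbers are hypothetical and are not taken from the TLDA data. Ties are broken by position, whereas production implementations often average tied ranks.

```python
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so all samples end up with the same distribution.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must measure the same targets")

    # For each sample, the indices of its values ordered smallest to largest.
    orders = [sorted(range(n), key=s.__getitem__) for s in samples]

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        mean(s[order[k]] for s, order in zip(samples, orders))
        for k in range(n)
    ]

    # Put the reference value for each rank back at the original position.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical measurements for three samples and four miRNA targets.
a1 = [5.0, 2.0, 3.0, 4.0]
a2 = [4.0, 1.0, 4.5, 2.0]
a3 = [3.0, 4.0, 6.0, 8.0]
normalized = quantile_normalize([a1, a2, a3])
```

After this step each sample contains the same set of values, differing only in which target holds which value, which is what makes the distributions identical.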
Integrated Molecular Profiling by MPS
Cancer Genome Sequencing
Massively Parallel Sequencing (MPS): a powerful nucleic acid analysis tool providing base-pair resolution information at the genome scale
Stratton, MR. Science 331, 1553 (2011)
Massively Parallel Sequencing: ABI SOLiD 5500 (emPCR)
• Accuracy: < 99.99%
• Throughput/Day: <10–15 Gb
• Throughput/Run: <90 Gb or >1.4 B reads (paired-end or mate-paired runs)
• Samples/Run: 1 genome, 12 exomes or 6 transcriptomes
Massively Parallel Sequencing: Illumina HiSeq2000/2500 (bridge amplification, clonal expansion)
• 6 human genomes at 30x
• 64 transcriptomes at 20M mapped reads/sample
mRNA Abundance Analysis
• RPKM: Reads Per Kilobase per Million mapped reads
• FPKM: Fragments Per Kilobase per Million mapped reads
• Both quantify gene expression levels from RNA-seq data by normalizing for transcript length and for the total number of mapped sequencing reads, or fragments in the case of paired-end (PE) reads.
[Figure: distributions of log2(RPKM), on an axis from -4 to 4, for samples A1, A2 and A3 in three panels: unnormalized data; scaling normalization (equivalent distribution); quantile normalization (identical distribution: spread, range and median)]
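The RPKM and FPKM definitions above can be written out as a short sketch: the count for a gene is divided by the gene length in kilobases and by the number of mapped reads (or fragments) in millions. The function names and the example numbers are illustrative assumptions, not values from the slides.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) equals multiplying by 1e9.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Same formula, counting paired-end fragments instead of single reads."""
    return rpkm(fragment_count, gene_length_bp, total_mapped_fragments)


# Hypothetical gene: 500 reads on a 2,000 bp transcript in a library
# of 20 million mapped reads.
example_rpkm = rpkm(500, 2000, 20_000_000)  # 12.5
```

Because both length and library size are divided out, values are comparable across genes of different lengths and across libraries of different depths, within the usual limits of this kind of scaling.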
Differential mRNA Abundance Analysis
ACSL5: normalized differential abundance ratio = 8.4
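A differential abundance ratio such as the one reported for ACSL5 is often also expressed on a log2 scale; a ratio of 8.4 corresponds to a log2 ratio of about 3.07. The sketch below assumes the ratio is simply the normalized abundance in one condition divided by that in the other. The helper names and the two input abundances are hypothetical, since the slide reports only the resulting ratio.

```python
import math


def abundance_ratio(case_abundance, control_abundance):
    """Ratio of normalized abundances between two conditions."""
    if case_abundance < 0 or control_abundance <= 0:
        raise ValueError("abundances must be non-negative, control must be > 0")
    return case_abundance / control_abundance


def log2_ratio(ratio):
    """Log2 of an abundance ratio; positive values mean higher in the case."""
    return math.log2(ratio)


# Hypothetical normalized abundances chosen to reproduce a ratio of 8.4.
ratio = abundance_ratio(42.0, 5.0)
log2_fold_change = log2_ratio(ratio)
```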
Single Nucleotide Variant Analysis
[Figure panels: DNA, RNA]
Non-synonymous SNV/mutation identified at both the DNA and RNA levels
mRNA Splicing Aberrations
Acceptor Splice Sites Mutated in UUC
5’ Exon N GU------A-----AG Exon N+1 3’
[Figure panels: Tumor RNA, Tumor DNA, Normal DNA]
The answer is yes, it does, although the numbers vary. It is important to control these analyses for DNA content, as we can see here.
Stage-Specific RNA Aberrations in ALL
• 6 matched diagnosis/relapse pairs of pediatric ALL samples (n=12)
• RNA-seq to discover novel mutations specific to relapse disease
• Targeted amplicon resequencing at ultra-deep coverage
Mutant Allele Frequency (17,000-50,000x coverage)
Patient ID  Mutation  Diagnosis  Relapse
719515  p.R238W  0.01%  27%
737185  18%
761159  31%
756421  p.R367Q  0.02%  25%
716996  p.K404KD  55%
763368  p.S408R  50%
769886  p.S445F
728610  p.L626F  49%
726584  p.E274Q  19%
p.S171I  43%
p.M244L  0.38%  47%
p.A53V  20%
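To make the link between coverage and detectable allele frequency concrete, the sketch below computes a mutant allele frequency from read counts and the number of mutant reads expected at a given depth. It assumes the frequency is the share of reads at one position that carry the mutant allele; the function names and read counts are hypothetical and are not taken from the study.

```python
def mutant_allele_frequency(mutant_reads, total_reads):
    """Mutant allele frequency at one position, as a percentage of reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must be between 0 and total_reads")
    return 100.0 * mutant_reads / total_reads


def expected_mutant_reads(frequency_percent, coverage):
    """Mutant reads expected at a given coverage for a given frequency."""
    return coverage * frequency_percent / 100.0


# Hypothetical counts: 5 mutant reads out of 50,000 covering the site.
freq = mutant_allele_frequency(5, 50_000)          # about 0.01 (%)

# Expected mutant reads for a 0.01% allele at both ends of the coverage range.
reads_low = expected_mutant_reads(0.01, 17_000)    # about 1.7 reads
reads_high = expected_mutant_reads(0.01, 50_000)   # about 5 reads
```

At 0.01% a variant is supported by only a handful of reads even at 17,000-50,000x, which illustrates why ultra-deep coverage is needed to see such clones at diagnosis.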
Solutions for Low-Yield Samples
Microarray and RNA-seq transcriptome profiling
• Possible with >10 picograms total RNA
• Degraded samples, RIN scores >2.0
• Formalin-fixed, paraffin-embedded (FFPE) samples
• Whole blood
• Direct cell lysate from the equivalent of a single or a few cells
microRNA profiling
• Megaplex amplification protocols for 1–350 ng total RNA
• Non-amplification-based protocols for 350–1000 ng total RNA
The Future of MPS-based OMICS
The Economist, 2011
As MPS costs go down, technologies become more advanced and powerful, and platforms develop rapidly, there is a strong case for transcriptomics within integrated omics approaches applied to large cohorts such as I4C.
Considerations for I4C Transcriptomics
• Low-yield samples (blood spots, extracellular microRNAs) might require amplification methods
• Tissue and cell specificity of gene expression (e.g. cord blood vs leukemic clone): need for carefully matched controls
• Only genes and RNAs expressed at the time of sampling are detected
• Depth-of-coverage needs for RNA-seq affect cost-related decisions
• Specific disease progression stages might mask etiology-associated aberrations
• Bioinformatics: limited standards for complex data processing and analysis (RNA-seq); more benchmarking studies are needed using data from consortia-like efforts (FDA’s SEQC)
• Data storage and access solutions
Thank you.