Small RNA Analysis Gene 760 Jun Lu, PhD 2013-02-25.



Overview
- Small RNA Basics
  - Types of Small RNAs
  - miRNAs and Other Small RNAs
  - Chemical Structures of Small RNAs
  - Non-templated Modification
- Small RNA Deep Sequencing
- Other Methods to Quantify miRNAs
- Data Analysis

Small RNA Basics: Types of small RNAs
- miRNAs and their precursors
- piRNAs
- Endogenous and exogenous siRNAs
- snoRNAs and their derivatives
- tRNAs and their derivatives
- Transcriptional start site associated small RNAs
- Enhancer-associated RNAs (eRNAs)
- Repeat-associated small RNAs
- Many other types of small RNAs (often without deep understanding)
- Breakdown products from longer RNAs
- Artificial biochemical products

microRNAs are processed for maturation: primary miRNA -> precursor miRNA -> mature miRNA (with Ago proteins). Winter et al., Nat Cell Biol 2009

Small RNA Basics: miRNAs
The same mature miRNA can be produced from multiple loci in the genome:
- hsa-let-7a-1, chr 9
- hsa-let-7a-2, chr 11
- hsa-let-7a-3, chr 22

Small RNA Basics: miRNAs
Sequence isoforms (differing in length and in start/end position)

piRNAs: PIWI-interacting RNAs. Generally larger than miRNAs (~26 to 31 bases; the size range differs between species). Khurana et al., JCB 2010

Small RNA Basics: Types of small RNAs. Rother and Meister, Biochimie 2011

Small RNA Basics: Types of small RNAs, artificial reaction products. Example: HITS-CLIP (Chi et al., Nature 2009)

Small RNA Basics: Chemical structures
- RNase III products have a 5’ phosphate group and a 3’ OH group (5’-P ... OH-3’)
- But not all small RNAs have the same chemical structure:
  - Without 5’ phosphate
  - 5’ Gppp cap instead of 5’ phosphate
  - 2’-OMe modification at the 3’ end

Small RNA Basics: Non-templated modifications
- 3’ tailing
  - Single or multiple nucleotide additions, such as U addition at the 3’ end
  - Can be based on the target as a template, but not on the generating locus as a template
- RNA editing
  - ADAR enzymes: A->I, and inosine is reverse transcribed as if it were G
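As a rough illustration of how 3’ tailing shows up in data, the sketch below splits a read into the part that matches its generating locus and a trailing non-templated part. The function name `split_3p_tail` and its inputs are hypothetical and not from the lecture; it assumes the read is already adaptor-trimmed, written in DNA letters, and that `reference` is the genomic sequence starting at the read's mapped 5’ position. An internal mismatch (for example an A->G editing site or a sequencing error) would be misreported as the start of a tail by this simple approach.

```python
from typing import Tuple


def split_3p_tail(read: str, reference: str) -> Tuple[str, str]:
    """Split a read into (templated prefix, non-templated 3' tail).

    Hypothetical helper: `reference` is assumed to be the genomic sequence
    beginning at the read's mapped 5' position.
    """
    matched = 0
    for read_base, ref_base in zip(read, reference):
        if read_base != ref_base:
            break  # first disagreement with the locus: treat the rest as tail
        matched += 1
    return read[:matched], read[matched:]


# A read ending in one extra T that the locus does not encode.
templated, tail = split_3p_tail("ACGTTT", "ACGTTAGG")
print(templated, tail)
```

A tail base that happens to match the genome cannot be recognized this way, so the reported tail is a lower bound on the real one.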

Overview
- Small RNA Basics
- Small RNA Deep Sequencing
  - Ligation-mediated Amplification
  - Illumina Small RNA Library Preparation
  - Considerations when using the Standard Library Prep Protocol
  - Alternative Bench-Level Preparations and Choices in Sequencing Parameters
- Other Methods to Quantify miRNAs
- Data Analysis

Small RNA Deep Sequencing: Ligation-mediated amplification
[Diagram: miRNAs (5’-P, OH-3’) -> ligate 3’ adaptor (5’-P, 3’ end blocked, B) with T4 RNA ligase and ATP -> gel-purify product -> ligate 5’ adaptor (OH-3’) with T4 RNA ligase and ATP -> gel-purify product -> RT -> PCR]

Small RNA Deep Sequencing: Gel purification to avoid adaptor dimers
[Diagram: 5’ adaptor (OH-3’) and 3’ adaptor (3’ end blocked, B) joined by T4 RNA ligase with ATP, then amplified by RT-PCR]

Small RNA Deep Sequencing: Use of a pre-adenylated 3’ adaptor
[Diagram, top: miRNAs (5’-P, OH-3’) plus 3’ adaptor (5’-P, 3’ end blocked, B) with T4 RNA ligase and ATP; a self-circularization product can form]
[Diagram, bottom: pre-adenylated 3’ adaptor (App, 3’ end blocked, B) with T4 RNA ligase 2 truncated, no ATP]

Small RNA Deep Sequencing: Current Illumina workflow
[Diagram labels, in order: total RNA or purified small RNA (5’-P, OH-3’) -> 3’ adaptor (App, 3’ end blocked, B); T4 RNA ligase, ATP -> tobacco acid pyrophosphatase -> 5’ adaptor (OH-3’); T4 RNA ligase, ATP -> RT -> PCR]

Small RNA Deep Sequencing: Considerations when using the standard library preparation
- Relies on the presence of a 5’ phosphate (whether this matters depends on the needs of the analysis)
- Use of pyrophosphatase may introduce some capped small RNAs
- T4 RNA ligase has some sequence preferences for substrates; T4 RNA ligase 2 truncation/mutations may have a different spectrum of sequence preference, so sequencing reads do not 100% reflect relative abundance
- Use of total RNA or purified small RNAs may generate quantitatively different profiles

Small RNA Deep Sequencing: Alternatives and sequencing parameters
- Gel purification of small RNAs with a specific size range (use a denaturing polyacrylamide gel)
- Phosphatase treatment + T4 polynucleotide kinase to capture small RNAs without 5’ phosphorylation
- Use polyA tailing + RT instead of a sequence-specific 3’ adaptor
- Length of sequencing run: 50-base single-end sequencing is common on Illumina

Overview
- Small RNA Basics
- Small RNA Deep Sequencing
- Other Methods to Quantify miRNAs
  - Microarray
  - qRT-PCR
- Data Analysis

Other Methods of miRNA Quantification: Microarrays
- Use ligation-mediated amplification to label miRNAs, e.g. with a biotinylated primer during PCR
- Use other labeling techniques (which use different criteria)
  - Agilent method

Other Methods of miRNA Quantification: qRT-PCR
- Key-lock-like RT strategy (ABI method)
- PolyA tailing strategy (Qiagen method)

Overview
- Small RNA Basics
- Small RNA Deep Sequencing
- Other Methods to Quantify miRNAs
- Data Analysis
  - Existing Tools
  - Adaptor Removal
  - Mapping
  - Quantification of Expression
  - Small RNAs other than miRNAs

Data Analysis: Available tools: miRDeep, miRDeep2, miRCat, miRAnalyzer, miRTools, and others

Data Analysis: Available tools, miRDeep2
- Runs in a Unix/Linux environment
- Perl-based
- Uses Bowtie (v1) for mapping and RNAfold for folding RNA structures

Data Analysis: STEP 1: Remove Adaptors
This step is fairly specific to small RNA sequencing analysis, because what you sequence is short RNAs: a 50-base read runs through the miRNA and into the 3’ adaptor.
[Diagram: sequencing primer, 5’ adaptor, miRNA, 3’ adaptor; 50-base read]

Data Analysis: STEP 1: Remove Adaptors (details matter)
- Adaptors were not synthesized to 100% purity!
- The standard miRDeep2 package allows removing only a single adaptor sequence.
  - Match the first 6 bases of the adaptor to each sequence after 18 nt.
  - If there is no match, sequentially match 5, 4, 3, 2, 1 adaptor bases to the end of each read.
- Some issues with such an algorithm:
  - Single adaptor removal may lead to loss of reads and a change in size distribution.
  - A 6 nt match may be too short and may cut off real RNA sequences.
  - Small RNAs shorter than 18 nt are ignored, although they may be helpful for understanding small RNA mechanisms.
  - Reads in the 47, 48, 49 bp range are created artificially by non-stringent adaptor matches at the end of reads.

Data Analysis: STEP 1: Remove Adaptors (single adaptor removal drawbacks)
- Lose ~16% of reads in the following example, which can distort the size distribution for specific small RNAs:
  TAGCTTATCAGACTGATGTTGACT    533006 reads
  TAGCTTATCAGACTGATGTTGACTTGGACTTCTCGGGTGCCAAGGAACTC    87857 reads
- Different ratios of adaptor variants for different small RNAs, likely a sequence-dependent phenomenon:
  AACCCGTAGATCCGAACTTGTGA    666783 reads
  AACCCGTAGATCCGAACTTGTGATGGACTTCTCGGGTGCCAAGGAACTCC    69 reads (0.01%)

Data Analysis: STEP 1: Remove Adaptors
- Adaptors were not synthesized to 100% purity!
- The standard miRDeep2 package allows removing only a single adaptor sequence, with the drawbacks noted for single adaptor removal.
- Modifications:
  1. Allow removing 2 (or more) adaptor sequence variants.
  2. Use a user-defined length of adaptor for the sequence match (e.g. 10 nt).
  3. Do not require the small RNA to be 18 nt or longer; instead, give the user the option to define the minimum size.
  4. Do not remove end bases if only 3 or fewer nt match the adaptor; this cutoff is also user-definable.
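The four modifications can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the lecture or in miRDeep2: the function `trim_adaptor`, its parameter names, and its defaults are hypothetical. The "standard" adaptor sequence below is an assumption, and the variant is simply the adaptor-like sequence read off the untrimmed 50-base example reads.

```python
from typing import Iterable, Optional


def trim_adaptor(
    read: str,
    adaptors: Iterable[str],
    match_len: int = 10,       # user-defined adaptor match length
    min_insert_len: int = 0,   # user-defined minimum small RNA length
    min_end_match: int = 4,    # do not trim end matches shorter than this
) -> Optional[str]:
    """Return the read with its 3' adaptor removed, or None if none is found."""
    best: Optional[int] = None  # leftmost adaptor start found so far
    for adaptor in adaptors:
        # First look for the leading `match_len` adaptor bases inside the read.
        pos = read.find(adaptor[:match_len], min_insert_len)
        if pos == -1:
            # Otherwise try progressively shorter adaptor prefixes at the
            # very end of the read, stopping at `min_end_match` bases.
            for k in range(min(match_len - 1, len(read)), min_end_match - 1, -1):
                if read.endswith(adaptor[:k]) and len(read) - k >= min_insert_len:
                    pos = len(read) - k
                    break
        if pos != -1 and (best is None or pos < best):
            best = pos
    return None if best is None else read[:best]


STANDARD = "TGGAATTCTCGGGTGCCAAGG"  # assumed adaptor sequence, not from the slides
VARIANT = "TGGACTTCTCGGGTGCCAAGG"   # variant seen in the untrimmed example reads
read = "TAGCTTATCAGACTGATGTTGACTTGGACTTCTCGGGTGCCAAGGAACTC"

print(trim_adaptor(read, [STANDARD]))           # variant read is not trimmed
print(trim_adaptor(read, [STANDARD, VARIANT]))  # insert is recovered
```

Returning `None` for reads without a recognizable adaptor leaves the decision to keep or drop them with the caller, which makes it easy to count how many reads each extra adaptor variant rescues.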

Details matter! Effect of removing one extra adaptor variant. [Plot: number of reads vs. length (nt)]

Data Analysis: Mapping
- Many identical reads for the same RNA, often associated with miRNAs
  - E.g. TCGTACGACTCTTAGCGG occurred 5733052 times in one run (~10% of all reads!)
  - Reducing reads by “collapsing” reads of the same sequence can save significant alignment time
  - Can reduce sequences by >20 fold, depending on miRNA abundance in the cell
- Reads can align to different regions of the genome, i.e. mapping is not unique
  - If a sequence is too short, it may generate too many hits in the genome
- Consider non-templated modifications
  - Non-templated tailing in small RNAs: need to distinguish tailing vs. adaptor impurity
  - RNA editing
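Read collapsing can be sketched with a plain counter. The helper names and the `seq_<n>_x<count>` identifier scheme below are hypothetical, chosen only so that each unique sequence carries its copy number through alignment; they are not the naming used by any particular tool.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse_reads(reads: Iterable[str]) -> List[Tuple[str, int]]:
    """Return unique read sequences with their counts, most abundant first."""
    return Counter(reads).most_common()


def to_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Write collapsed reads as FASTA, storing the count in the record ID."""
    lines = []
    for index, (sequence, count) in enumerate(collapsed):
        lines.append(f">seq_{index}_x{count}")  # hypothetical ID scheme
        lines.append(sequence)
    return "\n".join(lines)


reads = ["TCGTACGACTCTTAGCGG"] * 3 + ["AACCCGTAGATCCGAACTTGTGA"]
collapsed = collapse_reads(reads)
print(to_fasta(collapsed))
```

Only the unique sequences need to be aligned; the stored counts are applied again when expression is quantified.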

Data Analysis: Mapping (Bowtie or Bowtie2)
- Mapping to known small-RNA-generating sequence collections
  - E.g. the precursor miRNA collection (downloadable from miRBase), snoRNA collections, or tRNA collections
  - Benefits: can reduce mapping time; can allow all non-unique mapping instances; can tolerate more mismatches for understanding non-templated modifications
  - Drawback: can only inform on those at known loci
- Mapping to the genome directly
  - Can help interpret modifications vs. imperfect mapping conditions
  - Can help identify new small RNA regions

Data Analysis: Mapping (example: hsa-miR-125b-1 and hsa-miR-125b-2)
What the mapping cannot tell:
- RNA editing events: since many small RNAs have defined start sites, it may be more difficult to differentiate real RNA editing from errors introduced by sequencing or PCR.
- Locus of origin: if one miRNA can come from multiple loci, it is not possible to tell which locus a small RNA came from, even though it is possible to tell the opposite strand.

Data Analysis: Quantification of Expression
Problem: how to normalize sequencing data? This can be especially problematic for small RNA data. [Figure: 0 hour vs. 12 hour]

Data Analysis: Quantification of Expression
Problem: how to normalize sequencing data? [Figure: 0 hour vs. 12 hour]

Data Analysis: Quantification of Expression
Problem: how to normalize sequencing data?
- Use total reads to normalize: most commonly used, but may introduce artifacts
  - Assumes total/mean miRNA is the same
- Quantile normalization
- Use spike-in controls
  - Spike-in controls are artificial small RNA sequences that can be used as “loading controls”
  - Spiked into the initial RNA samples
  - Multiple spike-in RNAs should be used simultaneously to avoid relying on a single sequence to normalize data
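The difference between total-read and spike-in normalization can be made concrete with a toy example. Everything below is a hedged sketch: the function names, the miRNA and spike-in names, and the counts are invented for illustration. In the toy data, miR-a is unchanged relative to the spike-ins while miR-b rises sevenfold.

```python
from typing import Dict, Set


def normalize_by_total(counts: Dict[str, int], per: float = 1_000_000) -> Dict[str, float]:
    """Scale counts to reads per `per` total reads in the sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: count * per / total for name, count in counts.items()}


def normalize_by_spike_ins(counts: Dict[str, int], spike_ins: Set[str]) -> Dict[str, float]:
    """Express each small RNA relative to the summed reads of several spike-ins."""
    spike_total = sum(counts[name] for name in spike_ins)
    if spike_total == 0:
        raise ValueError("no spike-in reads found")
    return {
        name: count / spike_total
        for name, count in counts.items()
        if name not in spike_ins
    }


spikes = {"spike1", "spike2"}  # hypothetical spike-in names
t0 = {"miR-a": 100, "miR-b": 100, "spike1": 50, "spike2": 50}
t12 = {"miR-a": 100, "miR-b": 700, "spike1": 50, "spike2": 50}

for label, sample in (("0 hour", t0), ("12 hour", t12)):
    by_total = normalize_by_total(sample)["miR-a"]
    by_spike = normalize_by_spike_ins(sample, spikes)["miR-a"]
    print(label, round(by_total), by_spike)
```

With total-read normalization miR-a appears to drop threefold at 12 hours only because miR-b takes a larger share of the library, which is the kind of artifact the slide warns about. Summing several spike-ins is one simple way to avoid depending on a single spike-in sequence.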

Data Analysis: Quantification of Expression
How to summarize given positional variations:
- Allow some flanking bases for tolerance
- Depends on the aim of the analysis (e.g. seed sequence)
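One way to picture the flanking tolerance is to sum all reads whose ends fall within a few bases of the annotated mature miRNA. The sketch below is an assumption-laden illustration: the function name, the `(start, end, count)` tuple layout, and the default tolerances are hypothetical. Setting `flank_5p=0` keeps only reads that share the annotated 5’ end, which is the stricter choice when the seed sequence matters.

```python
from typing import Iterable, Tuple


def count_mature_reads(
    alignments: Iterable[Tuple[int, int, int]],  # (start, end, read count) on the precursor
    mature_start: int,
    mature_end: int,
    flank_5p: int = 2,  # hypothetical default tolerance at the 5' end
    flank_3p: int = 3,  # hypothetical default tolerance at the 3' end
) -> int:
    """Sum reads whose ends lie within the allowed distance of the annotation."""
    total = 0
    for start, end, count in alignments:
        if abs(start - mature_start) <= flank_5p and abs(end - mature_end) <= flank_3p:
            total += count
    return total


alignments = [(10, 31, 500), (10, 33, 40), (11, 31, 25), (20, 41, 7)]
print(count_mature_reads(alignments, 10, 31))              # tolerant summary
print(count_mature_reads(alignments, 10, 31, flank_5p=0))  # seed-preserving summary
```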

Data Analysis: Small RNAs other than miRNAs
Using transcriptional start site associated small RNAs as an example:
1. Adaptor removal
2. Collapse reads based on sequence
3. Map to known small RNA generating loci
4. Map the leftover sequences to the genome
5. Align the mapped positions relative to transcriptional start sites
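The last step, aligning mapped positions relative to transcriptional start sites, might look like the sketch below. It is a simplified illustration under stated assumptions: the function names are hypothetical, all coordinates are on a single chromosome, `tss_positions` is sorted in ascending order and non-empty, and `tss_strands` holds the strand of each TSS in the same order. Offsets are positive downstream of the TSS in the direction of transcription.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, List, Tuple


def offset_to_nearest_tss(
    read_pos: int, tss_positions: List[int], tss_strands: List[str]
) -> int:
    """Signed distance from a read's 5' end to the nearest TSS."""
    i = bisect_left(tss_positions, read_pos)
    # Only the TSSs immediately left and right of the read can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    nearest = min(candidates, key=lambda j: abs(read_pos - tss_positions[j]))
    delta = read_pos - tss_positions[nearest]
    return delta if tss_strands[nearest] == "+" else -delta


def tss_profile(
    reads: Iterable[Tuple[int, int]],  # (5' end position, collapsed read count)
    tss_positions: List[int],
    tss_strands: List[str],
) -> Counter:
    """Accumulate read counts by offset from the nearest TSS."""
    profile: Counter = Counter()
    for position, count in reads:
        profile[offset_to_nearest_tss(position, tss_positions, tss_strands)] += count
    return profile


tss_positions = [1000, 5000]
tss_strands = ["+", "-"]
profile = tss_profile([(1050, 3), (4950, 2), (5020, 1)], tss_positions, tss_strands)
print(sorted(profile.items()))
```

This sketch ignores the strand of the read itself; separating sense and antisense reads would need a second profile keyed on read strand.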

Data Analysis: Small RNAs other than miRNAs
Using transcriptional start site associated small RNAs as an example

Summary
- Small RNA Basics: variations associated with small RNAs
- Small RNA Deep Sequencing: biochemical reactions determine interpretation of analysis
- Other Methods to Quantify miRNAs: useful in validating results
- Data Analysis: key steps in processing small RNA data
- Pay attention to details in bench and bioinformatic methods