SMARTAR: small RNA transcriptome analyzer. Geuvadis RNA analysis meeting, April 16th 2012. Esther Lizano and Marc Friedländer, Xavier Estivill lab, Programme.

SMARTAR: small RNA transcriptome analyzer
Geuvadis RNA analysis meeting, April 16th 2012
Esther Lizano and Marc Friedländer
Xavier Estivill lab, Programme for Genes and Disease, Center for Genomic Regulation (CRG)

Overview of analysis
1: quality control
2: pre-processing
3: length profiling
4: genome mapping
5: annotation breakdown
6: miRNA de novo discovery
7: profiling of (iso-) miRNAs
8: visualization of data
9: summary of results

1: quality control
1a: sequencing reads are trimmed to 36 nts (to reduce batch effect)
1b: PHRED quality scores are visualized
software: FASTX package from Hannon lab
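A minimal Python sketch of steps 1a and 1b, for illustration only. The analysis itself uses the FASTX package; the function names here are hypothetical, and the PHRED+33 quality encoding is an assumption (older Illumina data may use an offset of 64).

```python
def trim_read(seq, qual, length=36):
    """Step 1a: keep only the first `length` nts of a read and its qualities."""
    return seq[:length], qual[:length]


def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string; offset=33 is an assumed encoding."""
    return [ord(c) - offset for c in qual]


def mean_quality_per_position(quals, offset=33):
    """Step 1b (data for the plot): mean PHRED score at each read position."""
    totals, counts = [], []
    for qual in quals:
        for i, score in enumerate(phred_scores(qual, offset)):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

The per-position means are the kind of values a quality plot would display; the plotting itself is left out.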

1: quality control (“good” quality example)

2: pre-processing
2a: homo-polymer filtering (read is removed if 33 or more of its nts are the same)
2b: quality filtering (read is removed if more than 50% of its nts have PHRED 10 or less)
2c: adaptor clipping (searches for the first 8 nts of the adapter, 1 mismatch allowed)
2d: identical sequences are collapsed (to FASTA format)
2e: length filtering (sequence is removed if it is shorter than 18 nts after clipping)
software: seqbuster, adrec, mapper.pl, FASTX
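The pre-processing steps 2a to 2e can be sketched in Python as follows. This is not the seqbuster / adrec / mapper.pl / FASTX code, only an illustration of the stated thresholds. All function names are hypothetical. Two points are assumptions not stated on the slide: qualities are PHRED+33 encoded, and reads in which no adapter is found are dropped.

```python
from collections import Counter


def is_homopolymer(seq, min_same=33):
    """2a: True if one nucleotide occurs `min_same` or more times."""
    return max(Counter(seq).values(), default=0) >= min_same


def is_low_quality(qual, offset=33, max_phred=10, max_fraction=0.5):
    """2b: True if more than half of the nts have PHRED <= 10."""
    low = sum(1 for c in qual if ord(c) - offset <= max_phred)
    return low > max_fraction * len(qual)


def clip_adapter(seq, adapter, seed_len=8, max_mismatches=1):
    """2c: cut at the first match of the adapter's first 8 nts (<= 1 mismatch).

    Returns None when no adapter is found (assumption: such reads are dropped).
    """
    seed = adapter[:seed_len]
    for start in range(len(seq) - seed_len + 1):
        window = seq[start:start + seed_len]
        if sum(a != b for a, b in zip(window, seed)) <= max_mismatches:
            return seq[:start]
    return None


def preprocess(reads, adapter, min_len=18):
    """Run 2a-2e on (sequence, quality) pairs; returns {sequence: read count}."""
    collapsed = Counter()
    for seq, qual in reads:
        if is_homopolymer(seq) or is_low_quality(qual):
            continue
        clipped = clip_adapter(seq, adapter)
        if clipped is not None:
            collapsed[clipped] += 1          # 2d: collapse identical sequences
    # 2e: discard sequences shorter than 18 nts after clipping
    return {s: n for s, n in collapsed.items() if len(s) >= min_len}
```

Collapsing before length filtering follows the order on the slide; filtering first would give the same result.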

3: length profiling (the “good”)

3: length profiling (the “bad”)

3: length profiling (the “ugly”)

4: miRNA de novo discovery
4a: reads are mapped stringently to the genome
4b: potential miRNA hairpins are excised using mappings as guidelines
4c: each potential miRNA hairpin is assigned a score based on:
  - RNA structure
  - positions of sequenced RNAs
software: miRDeep2

miRNA de novo discovery (figure from Friedländer et al., Nature Biotech)
when miRNA precursors are processed by the Dicer protein, three products are released (a)
when mapped back to the precursor, these products fall in a particular pattern (the 'signature')
in contrast, random degradation will not follow this pattern (b)
the fit of sequenced RNA to this model of biogenesis is scored probabilistically by miRDeep

5: profiling of miRNAs and isoforms
reads are aligned to precursors allowing 1 mismatch and 1 addition in the 3' end
if a read locates within the boundary of the annotated mature miRNA (plus / minus 3 nts) it is annotated as such
if a read maps equally well to more than one miRNA, it is counted towards all of them
software: seqbuster, miraligner
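The counting rules of the profiling step can be sketched in Python as below. This is not the seqbuster / miraligner implementation. It assumes the alignment to precursors (1 mismatch, 1 addition in the 3' end) has already been done and that only the best alignments per read are passed in. The data layout and function names are hypothetical, and reading "within the boundary (plus / minus 3 nts)" as "the read lies fully inside the mature miRNA extended by 3 nts on each side" is one possible interpretation.

```python
from collections import defaultdict


def within_mature(read_start, read_end, mature_start, mature_end, flank=3):
    """True if the read lies inside the mature miRNA extended by `flank` nts."""
    return read_start >= mature_start - flank and read_end <= mature_end + flank


def profile(alignments, matures, flank=3):
    """Count reads per mature miRNA.

    alignments: list of (read_count, precursor, start, end), one entry per
                best alignment; a read with several equally good alignments
                appears once per alignment.
    matures:    {miRNA name: (precursor, start, end)}
    """
    counts = defaultdict(int)
    for read_count, precursor, start, end in alignments:
        for name, (m_precursor, m_start, m_end) in matures.items():
            if m_precursor == precursor and within_mature(start, end, m_start, m_end, flank):
                # A multi-mapping read is counted towards every miRNA it hits.
                counts[name] += read_count
    return dict(counts)
```

Because multi-mapping reads are counted towards all matching miRNAs, the per-miRNA counts can sum to more than the number of sequenced reads.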

seqbuster vs. mirdeep2 profiling

6: genome mapping
6a: reads are mapped to the genome, allowing one mismatch and an infinite number of mappings
also included are the unassembled parts of the human genome and genomes of known human viral pathogens
6b: the mappings are converted to nucleotide-resolution intensities
e.g. if a nucleotide is covered by a read with a single genome mapping and a read with ten genome mappings, it will be assigned an intensity of 1/1 + 1/10 = 1.1
software: bowtie and custom source
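The intensity calculation in 6b can be illustrated with a short Python sketch: each read contributes 1/n to every nucleotide it covers, where n is its number of genome mappings. This is not the custom source used in the analysis. The input layout and the function name are hypothetical, and coordinates are assumed to be half-open intervals.

```python
from collections import defaultdict


def nucleotide_intensities(mappings):
    """Convert read mappings to per-nucleotide intensities.

    mappings: {read id: [(chrom, strand, start, end), ...]} with `end` exclusive.
    Returns {(chrom, strand, position): intensity}.
    """
    intensity = defaultdict(float)
    for locations in mappings.values():
        if not locations:
            continue                       # unmapped read contributes nothing
        weight = 1.0 / len(locations)      # 1 / number of genome mappings
        for chrom, strand, start, end in locations:
            for pos in range(start, end):
                intensity[(chrom, strand, pos)] += weight
    return intensity
```

With one uniquely mapped read and one read with ten mappings covering the same nucleotide, that nucleotide receives 1/1 + 1/10, as in the example on the slide.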

7: annotation breakdown
7a: intensities are intersected with simple annotation (15 classes)
7b: intensities are intersected with detailed annotation (40 classes)
7c: intensities are intersected with individual gene annotations (>3 million classes)
annotations are based on GENCODE version 8, but custom annotations are used for miRNAs, snoRNAs, rRNAs, LINEs, Alus, introns and anti-sense annotations
each nucleotide on each strand has exactly one annotation; this is resolved using a priority hierarchy
software: custom source
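A minimal Python sketch of how a priority hierarchy can give each nucleotide exactly one annotation, and how intensities are then summed per class. The class names, their order and the fallback label are hypothetical; the slide does not list the actual hierarchy, and this is not the custom source used in the analysis.

```python
from collections import defaultdict

# Hypothetical priority order, highest priority first (not from the slides).
PRIORITY = ["miRNA", "snoRNA", "rRNA", "exon", "intron"]


def resolve_annotation(classes, priority=PRIORITY, default="unannotated"):
    """Pick the single highest-priority class among overlapping annotations."""
    for name in priority:
        if name in classes:
            return name
    return default


def annotation_breakdown(intensities, overlaps, priority=PRIORITY):
    """Sum intensities per annotation class.

    intensities: {(chrom, strand, position): intensity}
    overlaps:    {(chrom, strand, position): set of overlapping classes}
    """
    totals = defaultdict(float)
    for key, value in intensities.items():
        totals[resolve_annotation(overlaps.get(key, set()), priority)] += value
    return dict(totals)
```

A nucleotide that overlaps both an intron and a miRNA is counted only as miRNA under this example order, so the class totals add up to the total intensity.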

7: simple annotation breakdown (the “good” and the “ugly”)

8: visualization (the miRNA “onco-cluster”)
figure labels: miRNA precursor hairpins, miRNA primary transcript, sRNA data peaks

9: summary
9a: a summary is given of how many reads were:
  - quality filtered
  - length filtered
  - not mapped to the genome
  - successfully mapped to the genome
typically more than 80% of the reads are mapped to the genome
software: custom source

Current state of analysis
1: quality control
2: pre-processing
3: length profiling
4: genome mapping
5: annotation breakdown
6: miRNA de novo discovery
7: profiling of (iso-) miRNAs
8: visualization of data
9: summary of results
(legend: to do, done)

Outlook
decide what data-sets should be discarded (>80% mapped, <25% miRNA)
miRNA prediction and quantification (including custom allele-specific sequences)
allele-specific gene expression
correlate with genome variants (eQTL analysis)
correlate with target mRNAs (TargetScanS v6 target predictions)

Acknowledgements
Data generation: Esther Lizano, Tuuli Lappalainen
Analysis: Marc Friedländer, Esther Lizano
Source development: Marc Friedländer, Lorena Pantano
Allele-specific sequences: Tuuli Lappalainen
Supervisors: Eulàlia Martí, Xavier Estivill
Fellowships: MF acknowledges EMBO long-term fellowship