

Objectives
Genome-wide investigation
– to estimate alternative polyadenylation (APA) usage on 3’UTRs
– to identify polymorphisms of Downstream Sequence Element (DSE) motifs
Correlation of APA usage and DSE polymorphisms in the human population

Mechanism of Poly-Adenylation

Annotation status of Poly-A sites on 3’UTR of Human Genome (hg19, 2009)
37%: multiple Poly-A sites (target of the analysis)

Locations of annotated multiple PAs on 3’UTR
[Figure: schematics marking the stop codon, PA1 junction, and PA2 junction for two cases: PAs on the same exon and PAs on multiple exons.]
[Plot: Poly-A location vs. length of 3’UTR; r = (value missing), p = 8.44e (exponent missing).]

RNA-Seq processing for Human Samples

Pipeline:
Sample FASTQ files → alignment (BWA) → SAM file → BAM file (Samtools) → merged BAM file → sorted BAM file (Samtools) → de-duplicated file (Picard) → BAM indexing (Samtools)
Downstream analyses:
– calculate coverage (Bedtools)
– calculate relative usage of PAs (Python script)
– differential expression of UTR (Cuffdiff)
– de-novo assembly (Python script)

Samples (source columns: Symbol, Group of Samples, Male, Female, DNA, RNA; only two counts per group survive in the transcript):
BR – British in England and Scotland – 2, 2
FI – Finnish in Finland – 2, 2
UT – Utah residents with Northern and Western European ancestry – 1, 1
YO – Yoruba in Ibadan, Nigeria – 1, 1
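The "calculate coverage" step can be illustrated with a small pure-Python sketch. It is not the Bedtools implementation and not the authors' script; it only shows what per-base depth means for a set of aligned reads. The function name `per_base_depth` and the 0-based, half-open `(start, end)` read intervals are assumptions made for this example.

```python
def per_base_depth(reads, length):
    """Return the read depth at each base of a region `length` bases long.

    `reads` is an iterable of (start, end) intervals, 0-based and half-open,
    relative to the start of the region (hypothetical input format).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        start = max(start, 0)      # clip reads that overhang the region
        end = min(end, length)
        if start >= end:
            continue               # read lies entirely outside the region
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the depth at each base.
    depth = []
    running = 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


if __name__ == "__main__":
    print(per_base_depth([(0, 3), (2, 5)], 6))
```

In a real run the intervals would come from the sorted, de-duplicated BAM file rather than a hand-written list.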

Calculate relative usage on 3’UTR
[Figure: 3’UTR from the stop codon through the PA1 and PA2 junctions, showing PA1 coverage, PA2 coverage, complete UTR coverage, and the cleaved 3’UTR.]
PA1 Usage = Coverage (Stop codon – PA1 junction) / Coverage (Complete 3’UTR)
PA2 Usage = Coverage (PA1 junction – PA2 junction) / Coverage (3’UTR)
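A minimal Python sketch of the two usage ratios on this slide follows. The slide does not say how "coverage" is aggregated, so the sketch assumes mean per-base depth over each interval, and it assumes the complete 3’UTR runs from the stop codon to the PA2 junction. The names `mean_coverage` and `pa_usage` and the 0-based, half-open coordinates are illustrative, not taken from the authors' script.

```python
def mean_coverage(depth, start, end):
    """Mean per-base depth over the half-open interval [start, end)."""
    window = depth[start:end]
    if not window:
        raise ValueError("empty interval")
    return sum(window) / len(window)


def pa_usage(depth, stop_codon, pa1, pa2):
    """Return (PA1 usage, PA2 usage) for one 3'UTR.

    `depth` is a list of per-base read depths; `stop_codon`, `pa1` and `pa2`
    are indices into it with stop_codon < pa1 < pa2 (assumed layout).
    Returns None when the complete 3'UTR has no coverage.
    """
    utr = mean_coverage(depth, stop_codon, pa2)   # complete 3'UTR (assumption)
    if utr == 0:
        return None
    pa1_usage = mean_coverage(depth, stop_codon, pa1) / utr
    pa2_usage = mean_coverage(depth, pa1, pa2) / utr
    return pa1_usage, pa2_usage


if __name__ == "__main__":
    # Toy UTR: depth 10 up to the PA1 junction, depth 5 from PA1 to PA2.
    toy_depth = [10] * 4 + [5] * 4
    print(pa_usage(toy_depth, stop_codon=0, pa1=4, pa2=8))
```

With mean depth as the aggregate, a ratio above 1 for the proximal segment indicates that reads are enriched upstream of PA1 relative to the UTR as a whole.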

Integrated mode: finding and mapping the DSE on the genome
Ref genome + Sample-1 RNA-Seq → de-novo assembly of downstream RNA fragment → search for DSE motif
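The slide does not describe how the DSE motif search is done. As one hedged illustration, the sketch below assumes a DSE can be approximated as a GU-rich run (written G/T in DNA alphabet) within a fixed distance downstream of the Poly-A site. The pattern `[GT]{5,}`, the 40-base distance limit, and the name `find_dse_candidates` are all assumptions for this example, not the authors' method.

```python
import re

# Assumption: a DSE candidate is a run of at least five G/T bases.
GU_RICH = re.compile(r"[GT]{5,}")


def find_dse_candidates(downstream_seq, max_distance=40):
    """Return (distance, motif_length, motif) for each GU-rich run.

    `downstream_seq` is the DNA sequence starting immediately after the
    Poly-A site, so a match's start index is its distance from that site.
    Runs starting beyond `max_distance` are ignored.
    """
    hits = []
    for match in GU_RICH.finditer(downstream_seq.upper()):
        if match.start() > max_distance:
            break  # matches are returned left to right, so we can stop here
        hits.append((match.start(), len(match.group()), match.group()))
    return hits


if __name__ == "__main__":
    print(find_dse_candidates("AACCTGTGTTGCA"))
```

The distance and motif-length fields correspond to the quantities summarised later in the DSE statistics table.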

Frequency of Poly-A usage in the samples

Inter/Intra group correlation of PA usage (PA1 usage)
Panels: BR1 – BR2, FN1 – FN2, BR1 – FN1
Values shown: r = 0.8, p = 0.0; r = 0.98, p = 0.0
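The r values on this slide read like correlation coefficients between the PA1 usage of two samples. The slide does not name the statistic, so the sketch below assumes Pearson's r computed over genes shared by both samples; `pearson_r` and the toy usage vectors are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical PA1 usage for the same four genes in two samples.
    sample_a = [0.20, 0.45, 0.60, 0.90]
    sample_b = [0.25, 0.40, 0.65, 0.85]
    print(pearson_r(sample_a, sample_b))
```

An intra-group comparison would pass two samples from the same population (for example BR1 and BR2), and an inter-group comparison two samples from different populations (for example BR1 and FN1).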

Correlation of different PA usage
Panels: PA1 – PA2, PA2 – PA3
Values shown: r = (value missing), p = 0.0; r = (value missing), p = 1.06e-33

Differential Expression of complete 3’UTR

Statistics of predicted DSE motifs
Columns: Sample, PA type, Mean(Motif Length), Max(Motif Length), Min(Motif Length), Mean(Distance), Max(Distance), Min(Distance)
Rows: BR-1 (Single, Multiple), BR-2 (Single, Multiple), FN-1 (Single, Multiple)
Pending:
– find polymorphisms in the DSEs
– find correlation between PA usage and DSE polymorphism

Thank you!

Complete 3’UTR coverage vs. alternate 3’UTR coverage
Panels: Differential expression of complete 3’UTR usage; Differential expression of PA usage

Poly-Adenylation Usage on 3’UTR
[Figure: 3’UTR with the stop codon, an intron, and the PA1 and PA2 junctions, showing PA1 coverage, PA2 coverage, complete UTR coverage, and the cleaved 3’UTR.]
Relative PA1 Usage = PA1 Coverage / Longest UTR Coverage
Relative PA2 Usage = PA2 Coverage / Longest UTR Coverage

DSE statistic
Columns: Sample, PA type, Mean(Motif Length), Max(Motif Length), Min(Motif Length), Mean(Distance), Max(Distance), Min(Distance)
Rows: BR-1 (Single, Multiple), BR-2 (Single, Multiple), FN-1 (Single, Multiple)
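The columns of this table can be derived from a list of predicted motifs in a few lines of Python. The sketch assumes each motif is recorded as a `(motif_length, distance)` pair for one sample and PA type; the function name `summarise_motifs` and the dictionary keys are hypothetical, and the numbers in the usage example are made up for illustration.

```python
def summarise_motifs(hits):
    """Summarise motif lengths and distances for one sample and PA type.

    `hits` is a list of (motif_length, distance) pairs (assumed format).
    Returns None when there are no motifs to summarise.
    """
    if not hits:
        return None
    lengths = [length for length, _ in hits]
    distances = [distance for _, distance in hits]
    return {
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_distance": sum(distances) / len(distances),
        "max_distance": max(distances),
        "min_distance": min(distances),
    }


if __name__ == "__main__":
    # Illustrative values only, not results from the study.
    print(summarise_motifs([(6, 10), (8, 30), (10, 20)]))
```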

[Figure: strand orientation. Labels: + strand, - strand; gene strand, template strand; + read, - read; RNA strand, DNA strand.]