Genome-wide association study between DSE polymorphism and Poly-A usage in Human population Hiren Karathia Sridhar Hannenhalli.

Presentation transcript:

Genome-wide association study between DSE polymorphism and Poly-A usage in Human population Hiren Karathia Sridhar Hannenhalli

Transcription & Polyadenylation (Poly-A)

Objectives
- Genome-wide estimation of alternate Poly-A (PA) usage on 3’UTR
- Genome-wide prediction and investigation of polymorphisms in DSE (Downstream Sequence Element) motifs
- Population-wide correlation study between PA usage and DSE polymorphisms

Annotation status of Poly-A sites on 3’UTR of Human Genome (hg19, 2009)
- 37%: multiple Poly-A sites (target of the analysis)

RNA-Seq processing for Human Samples

Pipeline stages (tools in parentheses):
- Sample FASTQ files
- BAM file (BWA, Samtools)
- Merged BAM file
- Sorted BAM file (Samtools)
- De-duplicated file (Picard tool)
- Indexing the BAM (Samtools)
- SAM file
- Calculate coverage (Bedtools)
- Calculate relative usage of PAs (Python script)

Further branches: Differential expression of UTR (Cuffdiff tools); Python script; De-novo assembly

Sample groups (slide columns: Symbol, Group of Samples, Male, Female, DNA, RNA):
BR   British in England and Scotland                                 11
FI   Finnish in Finland                                              11
UT   Utah residents with Northern and Western European ancestry      11
YO   Yoruba in Ibadan, Nigeria                                       11

Genome-wide estimation of alternate Poly-A (PA) usage on 3’UTR

Diagram labels: Stop Codon, PA1 Junction, PA2 Junction, PA1 Coverage, PA2 Coverage, Complete UTR coverage, Cleaved 3’UTR

PA1 Usage = [Coverage (Stop codon to PA1 junction) / Distance] / [Coverage (complete 3’UTR) / Distance]
PA2 Usage = [Coverage (Stop codon to PA2 junction) / Distance] / [Coverage (complete 3’UTR) / Distance]

In each bracket, Distance is the length of the region whose coverage is being normalised.
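The usage ratio on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' script: the names `mean_coverage` and `pa_usage` are hypothetical, coverage is assumed to be a per-base read-depth list for one 3’UTR, and positions are assumed to be 0-based offsets with half-open intervals.

```python
def mean_coverage(per_base_coverage, start, end):
    """Mean read depth over the half-open interval [start, end),
    i.e. summed coverage divided by the distance covered."""
    distance = end - start
    if distance <= 0:
        raise ValueError("interval must have positive length")
    return sum(per_base_coverage[start:end]) / distance


def pa_usage(per_base_coverage, stop_codon, pa_junction, utr_end):
    """Length-normalised coverage from the stop codon to one PA junction,
    divided by the length-normalised coverage of the complete 3'UTR."""
    to_junction = mean_coverage(per_base_coverage, stop_codon, pa_junction)
    complete = mean_coverage(per_base_coverage, stop_codon, utr_end)
    if complete == 0:
        return float("nan")  # no reads on this UTR, usage is undefined
    return to_junction / complete


# Toy 3'UTR of 10 bases (made-up numbers): depth 20 up to PA1 at offset 4,
# depth 10 from there to PA2 at the end of the UTR.
coverage = [20, 20, 20, 20, 10, 10, 10, 10, 10, 10]
pa1 = pa_usage(coverage, stop_codon=0, pa_junction=4, utr_end=10)
pa2 = pa_usage(coverage, stop_codon=0, pa_junction=10, utr_end=10)
```

With this definition the most distal site always has usage 1, and a proximal site scores above 1 when reads pile up before its junction, which is the signal a shortened (cleaved) 3’UTR would produce.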

Prediction of DSE

Diagram labels: Coding Strand of DNA; Template Strand of DNA; Sample A RNA-Seq; Sample A DNA-Seq; De-novo assembled 3’UTR fragment; Prediction of DSE motif

Frequency of Poly-A usage in the samples

Correlation of different PA usage in a Human Sample
PA1 – PA2: r = ; p = 0.0
PA2 – PA3: r = ; p = 1.06e-33
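The slide does not say which correlation statistic was used. As a minimal sketch, assuming r is Pearson's correlation coefficient computed across genes within one sample, the calculation looks like this. The function name `pearson_r` and the usage values are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up per-gene usage values for two PA sites in one sample.
pa1_usage = [1.0, 2.0, 3.0, 4.0]
pa2_usage = [2.0, 4.0, 6.0, 8.0]
r_same = pearson_r(pa1_usage, pa2_usage)
r_opposite = pearson_r(pa1_usage, list(reversed(pa2_usage)))
```

The p-values on the slide would need an additional significance test, which this sketch leaves out.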

Correlation of PA usage and corresponding DSE polymorphism

Functional enrichment of Genes associated with Differential PA Usage and Polymorphic forms of DSEs in Population

Thank you !!

Differential Expression of complete 3’UTR

Inter/Intra group correlation of PA usage (PA1 usage)
Comparisons: BR1 – BR2; FN1 – FN2; BR1 – FN1
Values shown: r = 0.8, p = 0.0; r = 0.98, p = 0.0

Statistics of predicted DSE motifs
Columns: Sample, PA type, Mean(Motif Length), Max(Motif Length), Min(Motif Length), Mean(Distance), Max(Distance), Min(Distance)
Rows: BR-1 (Single, Multiple); BR-2 (Single, Multiple); FN-1 (Single, Multiple)

Pending
- Find polymorphism in the DSEs
- Find correlation between the PA usage and DSE polymorphism

Alternate Poly-A selection mechanism

Complete 3’UTR coverage vs. Alternate 3’UTR coverage
Panels: Differential expression of complete 3’UTR usage; Differential expression of PA usage

Poly-Adenylation Usage on 3’UTR

Diagram labels: Stop Codon, Intron, PA1 Junction, PA2 Junction, PA1 Coverage, PA2 Coverage, Complete UTR coverage, Cleaved 3’UTR

Relative PA1 Usage = PA1 Coverage / Longest UTR Coverage
Relative PA2 Usage = PA2 Coverage / Longest UTR Coverage

DSE statistic
Columns: Sample, PA type, Mean(Motif Length), Max(Motif Length), Min(Motif Length), Mean(Distance), Max(Distance), Min(Distance)
Rows: BR-1 (Single, Multiple); BR-2 (Single, Multiple); FN-1 (Single, Multiple)

Diagram labels: + strand, - strand; Gene Strand, Template Strand; + Read, - Read; RNA Strand, DNA Strand

Locations of annotated multiple PA sites on 3’UTR

Diagram labels: Stop Codon, PA1 Junction, PA2 Junction, Cleaved 3’UTR
Cases: PAs on same exon; PAs on multiple exons
Plot: Poly-A Location vs. Length of 3’UTR; r = p = 8.44e