EDACC Primary Analysis Pipelines
Cristian Coarfa
Bioinformatics Research Laboratory, Molecular and Human Genetics



Data Levels

Data Types Submitted To EDACC
ChIP-Seq
Shotgun Bisulfite Sequencing
  – Methyl-C
Reduced Representation Bisulfite Sequencing
  – RRBS
MRE-Seq
MeDIP-Seq
Chromatin Accessibility
small RNA-Seq
mRNA-Seq

Read Mapping
Common processing step for all pipelines
High throughput
  – Sequence space: Illumina
  – Color space: SOLiD
Quick and accurate anchoring
Read size varies (in bp)
Short read aligners
  – 1st generation: Maq, SOAP
      Ungapped alignment
  – 2nd generation: Bowtie, BWA, SOAP2
      Speed/sensitivity tradeoff, good enough for many applications
Mapping tools
  – Robust to indels
  – Sensitive to variable number of mismatches

Pash 3.0
Positional Hashing
Regular read mapping
Bisulfite sequencing mapping
Integrate basepair variation with epigenetic variation
SAM output, easy integration with other analysis tools
Accuracy without sacrificing efficiency

Bisulfite Sequencing
Current tools: BSMAP, RMAP-BS, mrsFAST, ZOOM
Pash 3.0
  – Integrate mutation discovery with basepair-level methylation discovery
  – Speedup
General approach
  – Convert C's to T's in reads and/or reference
  – Use mappings, reads and reference to determine methylated sites
Pash 3
  – Generate and hash all possible kmers for reads
      CTT: CCC, CCT, CTC, CTT
  – Map against forward and reverse complement chromosome strands
Superior sensitivity to other tools, without loss of efficiency
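The kmer expansion above can be illustrated with a short Python sketch. It is not Pash code: the function names are hypothetical, and it assumes only forward-strand reads, where every T in a read may be either a genuine T or an unmethylated C that bisulfite treatment converted.

```python
from itertools import product


def bisulfite_kmer_variants(kmer):
    """Return every k-mer that could yield `kmer` after bisulfite conversion.

    Assumption for this sketch: each T in the read is either a true T or a
    converted (unmethylated) C, so every T expands to the pair {C, T}.
    """
    choices = [("C", "T") if base == "T" else (base,) for base in kmer.upper()]
    return sorted("".join(combo) for combo in product(*choices))


def in_silico_convert(seq):
    """The 'general approach': convert every C to T in a read or reference."""
    return seq.upper().replace("C", "T")


print(bisulfite_kmer_variants("CTT"))  # ['CCC', 'CCT', 'CTC', 'CTT']
print(in_silico_convert("ACGT"))       # ATGT
```

The expansion grows as 2^(number of T's), which is why it is applied per kmer rather than per full read.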

Galaxy/Genboree
Developed at Penn State University
Benefits
  – Rapid deployment tool
  – Share pipelines w/ others
Alan Harris, Sriram Raghuram
  – Deployed Galaxy/Genboree
  – Integration w/ Genboree API for upload/download
  – Adaptors for LFF file format support
  – EDACC XML validation tools
Sriram Raghuram, Andrew Jackson, Cristian Coarfa
  – Integration with compute clusters
Arpit Tandon, Sriram Raghuram
  – Deployed analysis tools

Primary Analysis Pipelines
Implemented & exposed via Galaxy/Genboree
  – Read mapping
  – Bisulfite Sequencing read mapping
  – Peak calling (ChIP-Seq, MeDIP-Seq)
      MACS (Harvard), FindPeaks (UBC)
  – Chromatin accessibility
      HotSpot (UW)
  – Small RNA-seq
Coming soon
  – mRNA-seq
  – Expression, alternative splicing
  – Gene fusion
Typical user interaction
  – Use Galaxy for user input
  – Submit jobs to a cluster
  – Upload results to Genboree

Reads Mapping

ChIP-Seq
Select uniquely mapping reads
Build read density maps
  – Extend each read 200bp along the mapping strand
  – Remove monoclonal reads
  – Generate WIG data
  – Can be visualized in Genboree and UCSC
Peak calling
  – FindPeaks, MACS
Interpret peaks
  – Overlap with genomic features of interest: gene promoters, etc.
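The read density steps above can be sketched in Python as follows. This is an illustrative sketch, not the EDACC pipeline: it assumes reads arrive as (chrom, start, end, strand) tuples with 0-based half-open coordinates, treats "monoclonal" reads as reads sharing the same chromosome, 5' position and strand, and interprets the 200bp extension as stretching each read to a 200bp fragment from its 5' end. The output follows the UCSC variableStep WIG layout with 1-based positions.

```python
from collections import defaultdict


def read_density(reads, extend_to=200):
    """Build per-base coverage from uniquely mapped reads.

    reads: iterable of (chrom, start, end, strand), 0-based half-open.
    Returns {chrom: {position: depth}} with 0-based positions.
    """
    seen = set()
    coverage = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, strand in reads:
        # The 5' end is `start` on the + strand and `end` on the - strand.
        five_prime = start if strand == "+" else end
        key = (chrom, five_prime, strand)
        if key in seen:  # assumed definition of a monoclonal (duplicate) read
            continue
        seen.add(key)
        if strand == "+":
            frag_start, frag_end = start, start + extend_to
        else:
            frag_start, frag_end = max(0, end - extend_to), end
        for pos in range(frag_start, frag_end):
            coverage[chrom][pos] += 1
    return coverage


def to_wig(coverage):
    """Serialize coverage as variableStep WIG text (1-based positions)."""
    lines = []
    for chrom in sorted(coverage):
        lines.append(f"variableStep chrom={chrom}")
        for pos in sorted(coverage[chrom]):
            lines.append(f"{pos + 1} {coverage[chrom][pos]}")
    return "\n".join(lines)


example_reads = [
    ("chr1", 100, 136, "+"),
    ("chr1", 100, 136, "+"),  # duplicate of the first read, dropped
    ("chr1", 350, 386, "-"),
]
density = read_density(example_reads)
print(to_wig(density).splitlines()[:3])
```

A per-base dictionary keeps the sketch short; a real pipeline would more likely use fixed-size bins or interval arithmetic to limit memory use.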

MeDIP-Seq
Select uniquely mapping reads
Build read density maps
Determine methylated CpGs
  – FindPeaks

Finding methylated CpGs

MeDIP-Seq Signal Visualization

MRE-Seq
Select uniquely mapping reads
Determine unmethylated CpGs

Bisulfite Sequencing
Shotgun Bisulfite Sequencing
  – Methyl-C
  – Genome wide
Reduced Representation Bisulfite Sequencing
  – RRBS
  – Enzyme cocktail
Map using Pash
Build methylation maps

Bisulfite Sequencing Read Mapping

Methylation Maps
Columns: Position | Strand | CHHStatus | Methylation | Unmethylated | TotalReads
Example rows: CHHStatus = CG
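A methylation map with the columns listed above could be derived from bisulfite alignments roughly as in the sketch below. It is a simplified illustration rather than the Pash implementation: it assumes ungapped, forward-strand alignments given as (offset, read) pairs, counts a read C over a reference C as methylated and a read T over a reference C as unmethylated, and labels the cytosine context as CG, CHG or CHH. The function names are hypothetical.

```python
def cytosine_context(reference, pos):
    """Classify the cytosine at `pos` as CG, CHG or CHH (NA near the end)."""
    downstream = reference[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2:
        return "NA"
    return "CHG" if downstream[1] == "G" else "CHH"


def methylation_map(reference, alignments):
    """Return rows (position, strand, context, methylated, unmethylated, total).

    alignments: list of (offset, read_seq); ungapped, forward strand only.
    Positions in the output are 1-based.
    """
    counts = {}
    for offset, read in alignments:
        for i, base in enumerate(read):
            pos = offset + i
            if pos >= len(reference) or reference[pos] != "C":
                continue
            meth, unmeth = counts.get(pos, (0, 0))
            if base == "C":    # protected from conversion: methylated
                meth += 1
            elif base == "T":  # converted by bisulfite: unmethylated
                unmeth += 1
            else:              # mismatch or sequencing error: not counted
                continue
            counts[pos] = (meth, unmeth)
    return [
        (pos + 1, "+", cytosine_context(reference, pos), m, u, m + u)
        for pos, (m, u) in sorted(counts.items())
    ]


reference = "ACGTCAGCTT"
alignments = [(0, "ACGTTAG"), (0, "ATGTTAGC")]
for row in methylation_map(reference, alignments):
    print(*row, sep="\t")
```

Reverse-strand reads would need the complementary rule (G versus A over a reference G), which is omitted here for brevity.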

Small RNA-Seq
Trim adapters
Map reads onto target genome
  – Up to 100 locations per read
Interpret
  – Overlap w/ miRNAs, piRNAs, sno/scaRNAs
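The adapter trimming step can be sketched as below. This is a minimal illustration with exact matching only, not the tool used in the pipeline: the function name, the `min_overlap` parameter and the adapter sequence in the example are assumptions chosen for illustration, and production trimmers also tolerate mismatches and use base qualities.

```python
def trim_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter from a read using exact matching.

    First looks for the complete adapter anywhere in the read, then for a
    prefix of the adapter (at least `min_overlap` bases) at the read's end.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


# Illustrative 3' adapter; substitute the adapter of the actual library prep.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
insert = "TAGCTTATCAGACTGATGTTGA"

print(trim_adapter(insert + ADAPTER + "AC", ADAPTER))  # full adapter present
print(trim_adapter(insert + ADAPTER[:13], ADAPTER))    # adapter runs off the end
```

Because small RNAs are shorter than the read length, most reads contain at least part of the adapter, so trimming has to happen before mapping.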

Exercise
Download the input MeDIP-Seq file from the workshop wiki
Analyze it using FindPeaks in Galaxy
  – Obtain results in Genboree LFF format
Upload the results to a Genboree database
View the results in a tabular view
Find the largest peaks
Explore them in the Genboree browser
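Outside the Genboree tabular view, the "find the largest peaks" step could also be done offline with a few lines of Python. The sketch below assumes a tab-delimited peak file in which the peak score sits in a known column; the default `score_col=9` (0-based) is an assumption about the LFF column order and should be checked against the actual file. The function name and sample records are hypothetical.

```python
import csv
import io


def largest_peaks(handle, n=10, score_col=9):
    """Return the `n` highest-scoring records from a tab-delimited peak file.

    score_col is the 0-based index of the score column (an assumption here).
    Comment lines, blank lines and rows without a numeric score are skipped.
    """
    scored = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue
        try:
            score = float(fields[score_col])
        except (IndexError, ValueError):
            continue
        scored.append((score, fields))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n]


# Hypothetical records laid out under the assumed column order.
sample = io.StringIO(
    "Peaks\tpeak1\tMeDIP\tFindPeaks\tchr1\t1000\t1400\t+\t.\t12.5\n"
    "Peaks\tpeak2\tMeDIP\tFindPeaks\tchr1\t5000\t5900\t+\t.\t48.0\n"
    "Peaks\tpeak3\tMeDIP\tFindPeaks\tchr2\t200\t450\t+\t.\t7.1\n"
)
for score, fields in largest_peaks(sample, n=2):
    print(fields[1], fields[4], fields[5], fields[6], score)
```

To rank by peak width instead of score, the same loop can compute the difference between the stop and start columns.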