User-friendly Galaxy interface and analysis workflows for deep sequencing data Oskari Timonen and Petri Pölönen
Structure of the hands-on session
Introduction (Petri)
Demo: Introduction to Galaxy (Oskari)
Demo: FastQC, read trimming (Petri)
Hands-on: FastQC, read trimming
–Online tutorial
Demo: Alignment (Oskari)
Hands-on: Alignment
–Online tutorial
Introduction
The idea of this tutorial is to:
–Get familiar with Galaxy
–Understand the main steps of ChIP-seq analysis
–Do each step manually in Galaxy
–Motivate you to do NGS analysis yourself
…but not to:
–Go through all the theory behind sequencing technology
–Explain all NGS analysis terms
–Go in depth into data analysis
NGS, reads, deep sequencing..?
Next generation sequencing (NGS)
–Sequencing of DNA or RNA molecules
Read:
–Fragment of DNA or RNA that is sequenced, typically nucleotides
Depth:
–The number of times a nucleotide is read during the sequencing process
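The definition of depth can be made concrete with a small calculation. The sketch below is only an illustration: the function name `per_base_depth` and the three toy reads are made up for this example, and reads are assumed to be given as 0-based (start, end) intervals on a reference, with the end position excluded.

```python
from collections import Counter

def per_base_depth(reads):
    """Count how many reads cover each reference position.

    `reads` is an iterable of (start, end) tuples, 0-based, end exclusive.
    """
    depth = Counter()
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1
    return depth

# Three hypothetical reads aligned to a toy reference
reads = [(0, 5), (3, 8), (4, 6)]
depth = per_base_depth(reads)

# Mean depth over the positions covered by at least one read
mean_depth = sum(depth.values()) / len(depth)
print(depth[4], mean_depth)
```

Position 4 is covered by all three reads, so its depth is 3, while the mean depth over the eight covered positions is 1.5.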
Gene regulation and genome-wide data
[Figure with data tracks: Reference genome, GRO-seq, RNA-seq, TF ChIP-seq, TF motif, SNP from exome sequencing, Histone marker ChIP-seq, DNase-seq, Input ChIP-seq. TF = transcription factor]
NGS analysis pipeline:
ChIP-seq
–Peak calling (Sami Heikkinen)
–Motif detection and functional enrichment (Minna Kaikkonen)
RNA-seq (Eija Korpelainen)
–Isoforms and differential expression
–Functional enrichment
Exome-seq (Patrick May)
–Variant calling and visual exploration
–Variant annotation
–Variant prioritization
Common steps:
–FASTQ data and quality control
–Read trimming and filtering
–Alignment and visualization
(visualization, integrative analysis, machine learning, clustering etc.)
How did we get the data?
GEO accession: GSE31477, Homo sapiens
We used the SRA database to download ENCODE data:
Command line tool:
–prefetch SRR (input)
–prefetch SRR (TCF7L2 ChIP-seq)
… or manually from the SRA database: SRR353507( SRR ) SRR ( SRR340079)
Converted SRA to FASTQ format (SRA Toolkit, fastq-dump tool):
–fastq-dump path-to-file/file1-2
Extracted only chr3 results to make the analysis steps faster
You will start with TCF7L2 ChIP-seq and input ChIP-seq chr3 data in FASTQ format
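For orientation, a FASTQ file stores each read as four lines: an identifier line starting with `@`, the nucleotide sequence, a separator line starting with `+`, and one quality character per base. The sketch below reads such records from an in-memory example. The function name `read_fastq` and the two toy reads are hypothetical and are not taken from the tutorial data. Multi-line FASTQ records are not handled.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file (or an empty line) ends the parsing
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual  # drop the leading '@'

# Two made-up reads in FASTQ format
example = io.StringIO(
    "@read1\nACGTN\n+\nIIII#\n"
    "@read2\nGGCA\n+\nII5I\n"
)
records = list(read_fastq(example))
for read_id, seq, qual in records:
    print(read_id, seq, qual)
```

The same function should work on a real file opened with `open(path)`, as long as every record spans exactly four lines.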
What is Galaxy?
Today’s hands-on session – Galaxy
Log on to a computer with your UEFAD login credentials (or the GUEST credentials you have received)
Start a browser
Go to
Log on with your UEFAD credentials
Today’s hands-on session – Galaxy
Tutorial:
Tools:
Quality check
1. NGS: QC and manipulation
–FastQC: Read QC
–Tips and what “bad” data looks like
Clean-up steps if required (read trimming and filtering)
Mapping to genome
2. NGS: Mapping
–Map with Bowtie for Illumina
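To give an idea of what a read trimming step does, the sketch below removes low-quality bases from the 3' end of a read. It is a simplified illustration and not the algorithm of any particular Galaxy tool. It assumes Phred+33 quality encoding and a cutoff of 20. The function names `phred_scores` and `trim_3prime` and the example read are made up for this example.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]

def trim_3prime(seq, quality, min_q=20):
    """Remove bases from the 3' end while their quality is below `min_q`."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]

# Hypothetical read whose last two bases have low quality ('#' = 2, '!' = 0)
seq, qual = trim_3prime("ACGTACGT", "IIIII5#!")
print(seq, qual)
```

Here the last two bases fall below the cutoff and are removed, while the base with quality 20 (character `5`) is kept. Filtering would then typically discard reads that become too short after trimming.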
Today’s hands-on session – Galaxy
Tools:
Extra visualization of genome mapping
Peak calling
3. NGS: Peak Calling
–MACS
Workflow
4. Workflow