Next Generation Sequencing analysis June 6th, 2017
Course instructors: Antonio Marco, Stuart Newman, Vladimir Teif
Course plan
11.00-12.00: Introductory lecture
12.00-12.30: Lunch
12.30-14.00: ChIP-seq practical
14.15-16.00: RNA-seq practical
16.15-18.00: Integrative analysis
1st Generation Sequencing
Microarrays Affymetrix microarrays
2nd (Next) Generation Sequencing Illumina MiSeq
Microarrays and NGS are used for different purposes http://www.genengnews.com/Contributor/ShawnCBakerPhD/5687/
NGS METHODS AND THEIR APPLICATIONS Chromatin domains Hi-C Figure adapted from http://www.scienceinschool.org
NGS data types
RNA-seq, GRO-seq, CAGE, SAGE, CLIP-seq, Drop-seq: gene expression; non-coding RNA
ChIP-seq, MNase-seq, DNase-seq, ATAC-seq, etc.: protein binding; histone modifications; chromatin accessibility; nucleosome positioning
Bisulfite sequencing: DNA methylation
Hi-C, 3C, 4C, ChIA-PET, etc.: chromatin loops in 3D
Amplicon sequencing: targeted regions; phylogenomics; metagenomics
Whole Genome Sequencing (WGS): de novo assembly (new species or new analyses)
A curated bibliography of NGS methods (~100 methods) can be found at https://liorpachter.wordpress.com/seq/
Where to get NGS data?
Do your own experiment
Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo
Sequence Read Archive (SRA) https://www.ncbi.nlm.nih.gov/sra
European Nucleotide Archive (ENA) https://www.ebi.ac.uk/ena
The Cancer Genome Atlas (TCGA) https://tcga-data.nci.nih.gov/tcga
Exome Aggregation Consortium (ExAC) http://exac.broadinstitute.org/
You also have to upload your data!
How to analyze NGS data?
Ask a bioinformatician: you need to explain what you want, and for that you need to understand what can be done and how
Do it yourself:
Command line -> become a bioinformatician
Online wrappers -> simpler, but with file size limits
Example of a convenient online tool: Galaxy http://galaxy.essex.ac.uk/
ChIP-seq experiment workflow
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks
4. Release DNA, prepare sequencing library and submit for sequencing
Adapted from www.VisiScience.com
ChIP-seq analysis workflow www.utsouthwestern.edu/labs.bioinformatics-core/analysis/chip-seq.png
NGS output after sequencing: .fastq files (FASTQ format)
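A FASTQ file stores each read as four lines: an identifier starting with "@", the sequence, a separator starting with "+", and a quality string of the same length as the sequence. The following is a minimal Python sketch of reading such records. The function name parse_fastq and the two example reads are invented for illustration, and the quality decoding assumes the common Phred+33 encoding.

```python
import io

def parse_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a FASTQ text stream.

    Assumes the simple 4-lines-per-record layout (no wrapped sequences)
    and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with "+"
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        # Convert ASCII characters to Phred quality scores
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores

# Two invented reads, used only to illustrate the format
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\n!!I#\n"
)
records = list(parse_fastq(example))
print(len(records), "reads")
```

Because every read occupies four lines, the number of reads in an uncompressed FASTQ file is its line count divided by four.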
NGS data after mapping: .bed files (BED format)
Mapping/alignment tools: Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW; TopHat (for RNA-seq)
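A BED file is a whitespace-separated text table whose first three columns are chromosome, start and end; the start is 0-based and the end is not included in the region. A minimal Python sketch of reading these three columns is given below. The names Region and parse_bed and the example lines are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    @property
    def length(self):
        return self.end - self.start

def parse_bed(lines):
    """Read the first three BED columns, skipping header-like lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append(Region(fields[0], int(fields[1]), int(fields[2])))
    return regions

# Invented example lines (one header line, two regions)
example_lines = [
    "track name=example",
    "chr1\t100\t150\tread1\t0\t+",
    "chr2\t5000\t5036",
]
regions = parse_bed(example_lines)
print(regions[0], regions[0].length)
```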
Data view in genome browsers Jung et al., NAR 2014 UCSC Genome Browser (online) IGV (install on a local computer)
Peak shapes can be different Park P. J., Nature Genetics, 2009
ChIP-seq: reads to peaks/regions
MACS2 (universal), HOMER (universal), SICER (histones), PeakSeq, edgeR, CisGenome
Park P. J., Nature Genetics, 2009
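The idea of going from reads to enriched regions can be illustrated with a toy Python sketch: pile up mapped reads into per-base coverage, then report stretches where coverage reaches a threshold. This is not how MACS2 or the other listed tools work internally; real peak callers model the background and compute statistics. The function names, the threshold and the reads below are assumptions made for illustration.

```python
def coverage(reads, chrom_length):
    """Per-base read coverage from half-open (start, end) intervals."""
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        diff[start] += 1   # coverage goes up where a read starts
        diff[end] -= 1     # and down just after it ends
    cov, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        cov.append(running)
    return cov

def call_regions(cov, min_depth):
    """Return half-open (start, end) runs where coverage >= min_depth."""
    regions, start = [], None
    for pos, depth in enumerate(cov):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(cov)))
    return regions

# Invented reads on a 20 bp toy chromosome
toy_reads = [(2, 8), (4, 10), (5, 9), (15, 18)]
toy_cov = coverage(toy_reads, 20)
toy_peaks = call_regions(toy_cov, min_depth=2)
print(toy_peaks)
```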
RNA-seq: reads to genes/regions DESeq, edgeR, Cuffdiff
DNA methylation data DMRcaller, Bismark
Intersecting genomic regions BedTools (command line) Galaxy (online)
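Conceptually, intersecting two sets of genomic regions means reporting the parts where a region from one set overlaps a region from the other on the same chromosome. The Python sketch below shows only this idea on invented data. It compares every pair of regions, which is fine for a handful of regions; dedicated tools use much faster approaches. The function name intersect and the region lists are hypothetical.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping parts of two lists of (chrom, start, end).

    Coordinates are half-open (start included, end excluded), so two
    regions that merely touch do not overlap.
    """
    overlaps = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:
                overlaps.append((chrom_a, start, end))
    return overlaps

# Invented example regions
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
promoters = [("chr1", 150, 250), ("chr2", 300, 400)]
shared = intersect(peaks, promoters)
print(shared)
```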
Genomic features are also regions Is ChIP-seq signal enriched there? Mattout et al., Genome Biology, 2015
Let’s look at many similar regions deepTools 2.0 https://github.com/fidelram/deepTools/wiki/Visualizations
ChIP-seq heat maps for all genes, scaled with respect to their start (TSS) and end (TES) deepTools 2.0 https://github.com/fidelram/deepTools/wiki/Visualizations
Cluster heatmaps deepTools 2.0 https://github.com/fidelram/deepTools/wiki/Visualizations
Comparing cluster heatmaps between two cell conditions NucTools https://homeveg.github.io/nuctools/
Histone modifications around TSS http://www.ie-freiburg.mpg.de/bioinformaticsfac
NGS data integration http://determinedtosee.com/wp-content/uploads/2014/08/jigsaw-puzzle.jpg
Different datasets in several tracks of a genome browser 5mC Gifford et al., Cell 2013
Heat maps again: Signal from data 1 around regions in data 2 Here: Nucleosome occupancy around bound CTCF in mouse stem cells Vainshtein et al., BMC Genomics 2017
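The "signal from data 1 around regions in data 2" idea can be sketched in a few lines of Python: cut a window of the signal around each site and average the windows position by position. Keeping the windows as separate rows instead of averaging them would give the rows of a heat map. The function name average_profile and the toy numbers are assumptions for illustration and are not taken from the cited study.

```python
def average_profile(signal, centers, flank):
    """Average per-base signal in windows [c - flank, c + flank].

    Windows that would run past either end of the signal are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for c in centers:
        if c - flank < 0 or c + flank >= len(signal):
            continue
        for offset in range(width):
            totals[offset] += signal[c - flank + offset]
        used += 1
    if used == 0:
        raise ValueError("no window fits inside the signal")
    return [t / used for t in totals]

# Invented occupancy signal with dips at positions 3 and 9
toy_signal = [5, 5, 1, 0, 1, 5, 5, 5, 2, 0, 2, 5, 5]
toy_sites = [3, 9, 12]  # the site at 12 is too close to the end
profile = average_profile(toy_signal, toy_sites, flank=2)
print(profile)
```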
Correlation analysis: any 2 datasets can be correlated http://homer.salk.edu/homer/ngs/quantification.html
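One simple way to correlate two datasets is to summarise each as a signal value per genomic bin (or per region) and compute the Pearson correlation coefficient between the two lists of values. The Python sketch below shows the calculation on invented numbers; the function name pearson and the bin values are hypothetical, and this is not a description of what HOMER computes.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)

# Invented per-bin signals for three datasets
bins_a = [1, 2, 3, 4, 5]
bins_b = [2, 4, 6, 8, 10]   # rises together with bins_a
bins_c = [5, 4, 3, 2, 1]    # falls as bins_a rises
r_ab = pearson(bins_a, bins_b)
r_ac = pearson(bins_a, bins_c)
print(round(r_ab, 3), round(r_ac, 3))
```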
Correlation of regulatory protein binding with gene expression Pavlaki et al., 2016
Gene ontology (GO) analysis Calo et al. (2015) Nature 518, 249–253 DAVID, GOrilla, GREAT, EnrichR
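GO analysis asks whether genes annotated with a given term are over-represented in a gene list. A common way to score this is a one-sided hypergeometric test, sketched below in Python. The listed tools each use their own statistics and corrections, so this is an illustration of the general idea under that assumption; the function name and the numbers are invented.

```python
from math import comb

def hypergeom_pvalue(total_genes, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `total_genes`, of which `annotated` carry the GO term."""
    upper = min(annotated, selected)
    numerator = sum(
        comb(annotated, k) * comb(total_genes - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total_genes, selected)

# Invented toy case: 10 genes in total, 4 carry the term,
# 3 genes in our list, 2 of them carry the term
p_value = hypergeom_pvalue(total_genes=10, annotated=4, selected=3, overlap=2)
print(round(p_value, 4))
```

When many GO terms are tested at once, the resulting p-values need a multiple-testing correction before they are interpreted.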
Motif enrichment analysis HOMER, MEME Pavlaki et al., 2016
Motif enrichment analysis MEME-ChIP
Summary of typical analyses:
Differential peak calling
Differential gene expression
Intersection of different signals
Correlation of different signals
Motif sequence analysis
Gene Ontology analysis
Questions?
Computer cluster and Linux
NGS data are stored in very large text files.
NGS analysis is usually performed on a computer cluster running Linux.
Why Linux? Because it is free, open-source, and very stable. Plus historic reasons.
Linux likes working with large text files :)
WinSCP: Windows file manager
WinSCP: Windows file manager genome.essex.ac.uk
Putty: Linux command line
Putty: Linux command line genome.essex.ac.uk
Learning Linux in 5 minutes
There are two options for your work in Linux:
1. Type your commands one by one in Putty
2. Write all commands in a file called a “bash file”, then execute this file; all commands written there will be executed
We have prepared your bash files; you will just need to execute them
5 Linux commands you need
cd DirectoryName – change directory
less FileName – read file FileName
qsub FileName – execute bash file
qstat – check progress of all users
wc -l FileName – count lines in FileName (plain wc also reports words and bytes)
Useful shortcuts
To copy/paste from Windows to Putty: copy with [CTRL]+[C], then right-click in Putty to paste
Anywhere in the command line in Putty: [up] and [down] keys scroll through the command history
Auto-completion of file/directory names: <something-incomplete> [TAB]
When specifying a directory name:
".." (dot dot) refers to the parent directory
"~" (tilde) or "~/" refers to the home directory
Additional Linux hints
All commands, usernames, passwords, file and directory names in Linux are case sensitive.
File paths (locations of files) use “/”, not “\”, e.g. “/storage/projects/”.
Avoid using spaces in filenames.
Questions?