Introduction To Next Generation Sequencing (NGS) Data Analysis


Introduction To Next Generation Sequencing (NGS) Data Analysis Jenny Wu

Outline
Goals: Practical guide to NGS data processing
- Bioinformatics in NGS data analysis
  - Basics: terminology, data formats, general workflow, etc.
- Data Analysis Pipeline
  - Sequence QC and preprocessing
  - Downloading reference sequences
  - Sequence mapping
  - Downstream analysis workflow and software
- RNA-Seq data analysis
  - Concepts: spliced alignment, normalization, coverage, differential expression
  - Popular RNA-Seq pipeline: Tuxedo suite vs. Tophat-HTSeq
  - Data visualization with genome browsers and R packages
  - Downstream pathway analysis
- ChIP-Seq data analysis workflow and software
- NGS bioinformatics resources
- Summary

Why Next Generation Sequencing
One can generate hundreds of millions of short sequences (up to 250 bp) in a single run, in a short period of time and at a low per-base cost.
Platforms:
- Illumina/Solexa GA II, HiSeq 2500, 3000, X
- Roche/454 FLX, Titanium
- Life Technologies/Applied Biosystems SOLiD
Throughput example (HiSeq 2000): 200M × 8 = 1.6 billion DNA fragments can be sequenced in parallel in a single run, producing a total of 320 Gbp.
Coverage example (number of reads × read length / genome size): 200M × 300 / 3G = 20X.
Reviews: Michael Metzker (2010) Nature Reviews Genetics 11:31; Quail et al. (2012) BMC Genomics Jul 24;13:341.

Why Bioinformatics Informatics (wall.hms.harvard.edu)

Bioinformatics Challenges in NGS Data Analysis
- "Big Data": files can be thousands of millions of lines long
  - Can't do "business as usual" with familiar tools
  - Impossible memory usage and execution time
  - Need to manage, analyze, store, transfer and archive huge files
- Need for powerful computers and expertise
  - Informatics groups must manage compute clusters
  - New algorithms and software are required; they are often open source and Unix/Linux based
  - Collaboration of IT experts, bioinformaticians and biologists

Basic NGS Workflow Olson et al.

NGS Data Analysis Overview Olson et al.


Terminology
Experimental design:
- Coverage (sequencing depth): the number of nucleotides from reads that are mapped to a given position. Average coverage = read length × number of reads / genome size.
- Paired-end sequencing: both ends of the DNA fragment are sequenced, allowing highly precise alignment.
- Multiplexing/demultiplexing: "barcode" sequences are added to each sample so that the samples can be distinguished, which allows a large number of samples to be sequenced on one lane.
Data analysis:
- Quality score: each called base comes with a quality score, which measures the probability of a base call error.
- Mapping: aligning reads to a reference to identify their origin.
- Assembly: merging fragments of DNA in order to reconstruct the original sequence.
- Duplicate reads: reads that are identical. They can be identified after mapping.
- Multi-reads: reads that can be mapped to multiple locations equally well.
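The average coverage formula above can be turned into a small calculator. The following is a minimal sketch; the function names are illustrative and not taken from the slides, and the example numbers are the 200M reads × 300 bp over a 3 Gbp genome used earlier in the deck.

```python
import math


def average_coverage(read_length_bp, num_reads, genome_size_bp):
    """Average coverage = read length * number of reads / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return read_length_bp * num_reads / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Invert the same formula: reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # 200 million reads of 300 bp over a 3 Gbp genome
    print(average_coverage(300, 200_000_000, 3_000_000_000))
    # reads of 100 bp needed for 30X over the same genome
    print(reads_needed(30, 100, 3_000_000_000))
```

This is an average only: actual depth at a given position varies with mappability, GC content and library quality, so the estimate is best treated as a planning figure.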

What does the data look like? Common NGS Data Formats For a full list, go to http://genome.ucsc.edu/FAQ/FAQformat.html

File Formats
- Reference sequences, reads: FASTA; FASTQ (FASTA with quality scores)
- Alignments: SAM (Sequence Alignment/Map); BAM (binary version of SAM)
- Features, annotation, scores: GFF3/GTF (General Feature Format); BED/BigBed; WIG/BigWig
http://genome.ucsc.edu/FAQ/FAQformat.html

FASTA Format (Reference Seq)

FASTQ Format (Illumina Example)
Each read record has four lines: the read record header (starting with @), the read bases, a separator (+, with an optional repeated header), and the read quality scores. The header carries the flow cell ID, lane, tile, tile coordinates and barcode.
@DJG84KN1:272:D17DBACXX:2:1101:12432:5554 1:N:0:AGTCAA
CAGGAGTCTTCGTACTGCTTCTCGGCCTCAGCCTGATCAGTCACACCGTT
+
BCCFFFDFHHHHHIJJIJJJJJJJIJJJJJJJJJJIJJJJJJJJJIJJJJ
@DJG84KN1:272:D17DBACXX:2:1101:12454:5610 1:N:0:AG
AAAACTCTTACTACATCAGTATGGCTTTTAAAACCTCTGTTTGGAGCCAG
+
@@@DD?DDHFDFHEHIIIHIIIIIBBGEBHIEDH=EEHI>FDABHHFGH2
@DJG84KN1:272:D17DBACXX:2:1101:12438:5704 1:N:0:AG
CCTCCTGCTTAAAACCCAAAAGGTCAGAAGGATCGTGAGGCCCCGCTTTC
+
CCCFFFFFHHGHHJIJJJJJJJI@HGIJJJJIIIJGIGIHIJJJIIIIJJ
@DJG84KN1:272:D17DBACXX:2:1101:12340:5711 1:N:0:AG
GAAGATTTATAGGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGG
+
CCCFFFFFHHHHHGGIJJJIJJJJJJIJJIJJJJJGIJJJHIIJJJIJJJ
NOTE: for paired-end runs, there is a second file with one-to-one corresponding headers and reads. (Passarelli, 2012)
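To make the record layout concrete, here is a minimal sketch of a FASTQ reader that also decodes quality characters and splits the Illumina header. It assumes the simple four-lines-per-record layout, Phred+33 quality encoding, and the colon-separated header layout seen in the example (instrument, run, flow cell, lane, tile, x, y, then read number, filter flag, control bits, barcode). All function and field names are illustrative; production work would normally use an established parser.

```python
import io
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str    # header line without the leading "@"
    sequence: str  # read bases
    quality: str   # one quality character per base


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        header = header.rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality, offset=33):
    """Decode quality characters to Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


def error_probability(phred):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def parse_illumina_header(header):
    """Split a header such as 'DJG84KN1:272:D17DBACXX:2:1101:12432:5554 1:N:0:AGTCAA'."""
    location, _, flags = header.partition(" ")
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read, filtered, control, barcode = flags.split(":")
    return {
        "instrument": instrument, "run": run, "flowcell": flowcell,
        "lane": lane, "tile": tile, "x": x, "y": y,
        "read": read, "filtered": filtered, "control": control,
        "barcode": barcode,
    }


if __name__ == "__main__":
    toy = io.StringIO("@r1\nACGT\n+\nBCFJ\n")  # a made-up four-base record
    for record in read_fastq(toy):
        print(record.header, phred_scores(record.quality))
    print(parse_illumina_header(
        "DJG84KN1:272:D17DBACXX:2:1101:12432:5554 1:N:0:AGTCAA"))
```

Older Illumina pipelines used a different quality offset, so the `offset` parameter is exposed rather than hard-coded.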

GFF3 and GTF format GFF3 format: GTF format: Khetani RS et al.


General Data Pipeline

Why QC?
- Sequencing runs cost money
- Data analysis costs money and time
Consequences of not assessing the data:
- Sequencing a poor library on multiple runs: throwing money away!
- Data analysis costs money and time
  - Cost of analyzing data, CPU time $$
  - Cost of storing raw sequence data $$$
  - Hours of analysis could be wasted $$$$
- Downstream analysis can be incorrect.

How to QC?
FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/, available on HPC
$ module load fastqc
$ fastqc s_1_1.fastq
Tutorial: http://www.youtube.com/watch?v=bz93ReOv87Y

FastQC: Example


Premade Genome Sequence Indexes and Annotation http://ccb.jhu.edu/software/tophat/igenomes.shtml

The UCSC Genome Browser Homepage General information Get genome annotation here! Get reference sequences here! Specific information— new features, current status, etc.

Downloading Reference Sequences

Downloading Reference Annotation


Sequence Mapping Challenges
Alignment (mapping) is often the first step once analysis-ready reads are obtained.
The task: to align sequencing reads against a known reference.
Difficulties: high volume of data, size of the reference genome, computation time, read length constraints, and ambiguity caused by repeats and sequencing errors.

How to choose an aligner?
- There are many short read aligners, and they vary a lot in performance (accuracy, memory usage, speed, flexibility, etc.).
- Factors to consider: application, platform, read length, downstream analysis, etc.
- Constant trade-off between speed and sensitivity (e.g. MAQ vs. Bowtie2). Guaranteed high accuracy will take longer.
- Popular choices: Bowtie2, BWA, Tophat2, STAR.


Application Specific Software


Two Major Approaches
1. Gene- or exon-level differential expression (DE): DESeq2, edgeR, DEXSeq…
2. Transcript assembly: Trinity, Velvet-Oases, Trans-ABySS, Cufflinks, Scripture…

RNA-Seq Pipeline for DE

RNA-Seq: Spliced Alignment
- Some reads will span two different exons
- Need long enough reads to be able to reliably map both sides
- Use a splice-aware aligner!
http://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png
"Systematic evaluation of spliced alignment programs for RNA-seq data", Nature Methods, 2013

How much sequence do I need?
Oversimplified answer: 20-50M PE/sample (human/mouse)
Depends on:
- Size and complexity of your transcriptome
- Goal of the experiment: DE, transcript discovery
- Tissue type, library type, RNA quality, read length, single-end…

RNA-Seq: Normalization
Gene-length bias:
- Differential expression of longer genes is more significant because long genes yield more reads
RNA-Seq normalization methods:
- Scaling factor based: total count, upper quartile, median, DESeq, TMM in edgeR
- Quantile, RPKM (Cufflinks)
- ERCC
Normalize by gene length and by number of reads mapped, e.g. RPKM/FPKM (reads/fragments per kilobase per million mapped reads)
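As an illustration of a scaling-factor method, the sketch below computes per-sample size factors with the median-of-ratios idea usually associated with DESeq: each sample's counts are divided by the per-gene geometric mean across samples, and the median of those ratios is the sample's scaling factor. This is a simplified re-implementation for teaching purposes, not code from DESeq or DESeq2; the function name and the toy counts are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps sample name -> list of per-gene read counts, with genes
    in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {sample: [] for sample in samples}
    for gene in range(n_genes):
        values = [counts[sample][gene] for sample in samples]
        if min(values) <= 0:
            # A zero count makes the geometric mean zero, so the gene is skipped.
            continue
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for sample, value in zip(samples, values):
            log_ratios[sample].append(math.log(value) - log_geo_mean)
    # statistics.median raises StatisticsError if every gene was skipped.
    return {s: math.exp(median(r)) for s, r in log_ratios.items()}


if __name__ == "__main__":
    toy = {
        "C1_R1": [10, 20, 30, 0],
        "C2_R1": [20, 40, 60, 5],  # sequenced roughly twice as deeply
    }
    factors = size_factors(toy)
    print(factors)
    normalized = {s: [c / factors[s] for c in toy[s]] for s in toy}
    print(normalized)
```

Dividing raw counts by the size factor puts samples on a common scale without correcting for gene length, which is why this kind of normalization suits between-sample comparison of the same gene rather than comparison between genes.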

Definition of Expression Levels
RPKM: Reads Per Kilobase per Million mapped reads
FPKM: Fragments Per Kilobase per Million mapped reads (for paired-end reads)
Mortazavi et al. 2008
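The definition can be written as RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the gene, N is the total number of mapped reads in the sample, and L is the gene length in base pairs. A minimal sketch follows; the function names and example numbers are illustrative only.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Same formula, counting fragments (read pairs) instead of reads."""
    return rpkm(fragment_count, gene_length_bp, total_mapped_fragments)


if __name__ == "__main__":
    # 500 reads on a 2 kb gene in a library of 20 million mapped reads
    print(rpkm(500, 2000, 20_000_000))
```

For paired-end data the two reads of a pair come from one fragment, so counting fragments avoids counting the same molecule twice.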

RNA-Seq: Differential Expression
Discrete vs. continuous data:
- Microarray fluorescence intensity data: continuous, modeled using a normal distribution
- RNA-Seq read count data: discrete, modeled using a negative binomial distribution
Microarray software can NOT be directly used to analyze RNA-Seq data!
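One way to see why a negative binomial model is used for counts: under a Poisson model the variance equals the mean, whereas the negative binomial adds a dispersion term, variance = mean + dispersion × mean². The sketch below compares the two; this mean/dispersion parameterization is the one commonly used by count-based DE tools, and the numbers are made up for illustration.

```python
def poisson_variance(mean):
    """Poisson counts: variance equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial counts: variance = mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


if __name__ == "__main__":
    for mean in (10, 100, 1000):
        print(mean,
              poisson_variance(mean),
              negative_binomial_variance(mean, dispersion=0.25))
```

The extra term grows with the square of the mean, so highly expressed genes show far more replicate-to-replicate variability than a Poisson model would allow.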


Popular RNA-Seq DE Pipeline (The Tuxedo Protocol) (The Alternative Protocol)

Classic RNA-Seq (Tuxedo Protocol)
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html
1. Spliced read mapping (SAM/BAM)
2. Transcript assembly and quantification (GTF/GFF)
3. Merge assembled transcripts from multiple samples
4. Differential expression analysis

Classic vs. Advanced RNA-Seq workflow

1. Spliced Alignment: Tophat
Tophat: a spliced short read aligner for RNA-seq.
$ tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R2_thout genome C1_R2_1.fq C1_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R1_thout genome C2_R1_1.fq C2_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R2_thout genome C2_R2_1.fq C2_R2_2.fq
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

2. Transcript assembly and abundance quantification: Cufflinks
Cufflinks: a program that assembles aligned RNA-Seq reads into transcripts, estimates their abundances, and tests for differential expression and regulation transcriptome-wide.
$ cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

3. Final Transcriptome Assembly: Cuffmerge
$ cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
$ more assemblies.txt
./C1_R1_clout/transcripts.gtf
./C1_R2_clout/transcripts.gtf
./C2_R1_clout/transcripts.gtf
./C2_R2_clout/transcripts.gtf
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

4. Differential Expression: Cuffdiff
Cuffdiff: a program that compares transcript abundance between samples. Replicates of the same condition are joined with a comma (no space), and conditions are separated by a space.
$ cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

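Alternative Pipeline with HTSeq
Tophat2 alignments are counted per gene with htseq-count, and the resulting counts are analyzed with DESeq2/edgeR. htseq-count takes the alignment file followed by the annotation file (genes.gtf here, as in the Tophat step).
$ htseq-count -f bam -s no C1_R1_thout/sorted.bam genes.gtf > hsc/C1_R1.counts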
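Before handing counts to DESeq2 or edgeR, the per-sample files are usually combined into one gene-by-sample table. The sketch below shows one way to do that. It assumes each htseq-count output is a two-column, tab-separated file (gene ID, count) whose trailing summary rows start with a double underscore (for example `__no_feature`), which is the behavior of recent HTSeq versions. The function names and gene IDs are hypothetical.

```python
import io


def read_htseq_counts(handle):
    """Parse one htseq-count output stream into {gene_id: count}."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        gene_id, value = line.split("\t")
        if gene_id.startswith("__"):
            continue  # summary rows such as __no_feature or __ambiguous
        counts[gene_id] = int(value)
    return counts


def merge_counts(per_sample):
    """Combine {sample: {gene: count}} into (sample names, table rows).

    Rows are (gene_id, [count per sample]) sorted by gene ID; a gene
    missing from a sample is reported with a count of 0.
    """
    samples = sorted(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [(g, [per_sample[s].get(g, 0) for s in samples]) for g in genes]
    return samples, rows


if __name__ == "__main__":
    c1 = read_htseq_counts(io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n"))
    c2 = read_htseq_counts(io.StringIO("geneA\t7\ngeneB\t3\n__no_feature\t2\n"))
    samples, rows = merge_counts({"C1_R1": c1, "C2_R1": c2})
    print("gene\t" + "\t".join(samples))
    for gene_id, values in rows:
        print(gene_id + "\t" + "\t".join(str(v) for v in values))
```

With real files, each `io.StringIO(...)` would be replaced by an opened count file such as `hsc/C1_R1.counts`.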

HTSeq Output: Gene Count Table … …

DESeq2 April 21st workshop! http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html

Downstream Analysis
Pathway and functional analysis:
- Gene Ontology over-representation
- Gene Set Enrichment Analysis (GSEA)
- Signaling Pathway Impact Analysis
Software:
- DAVID, GSEA, WGCNA, Blast2GO, topGO, BiNGO...
- IPA, GeneGO MetaCore, iPathway Guide


Integrative Genomics Viewer (IGV) http://www.broadinstitute.org/igv Available on HPC. Use ‘module load igv’ and ‘igv’

Visualizing RNA-Seq mapping with IGV
http://www.broadinstitute.org/igv/UserGuide
Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Thorvaldsdóttir H et al. Brief Bioinform. 2013

Genomic Data Visualization R packages for plots: ggplot2 ggbio GenomeGraphs


Galaxy: Web based platform for analysis of large datasets http://hpc-galaxy.oit.uci.edu/root https://main.g2.bx.psu.edu/ Galaxy: A platform for interactive large-scale genome analysis: Genome Res. 2005. 15: 1451-1455


What is ChIP-Seq?
Chromatin Immunoprecipitation (ChIP) Sequencing
- ChIP: a technique of precipitating a protein antigen out of solution using an antibody that specifically binds to the protein.
- Sequencing: a technique to determine the order of nucleotide bases in a molecule of DNA.
- Used in combination to study the interactions between protein and DNA.

ChIP-Seq Applications
Enables the accurate profiling of:
- Transcription factor binding sites
- Polymerases
- Histone modification sites
- DNA methylation

A View of ChIP-Seq Data Typically reads (35-55bp) are quite sparsely distributed over the genome. Controls (i.e. no pull-down by antibody) often show smaller peaks at the same locations Rozowsky et al Nature Biotech, 2009

ChIP-Seq Analysis Pipeline
- Sequencing
- Base calling
- Read QC
- Short read sequences
- Short read alignment
- Peak calling
- Enriched regions
- Visualization with genome browser
- Differential peaks
- Motif discovery
- Combine with gene expression

ChIP-Seq: Identification of Peaks
Several methods exist to identify peaks, but they mainly fall into two categories:
- Tag density: the program searches for large clusters of overlapping sequence tags within a fixed-width sliding window across the genome.
- Directional scoring: the bimodal pattern in the strand-specific tag densities is used to identify protein binding sites.
Tools for determining the exact binding sites from short reads generated from ChIP-Seq experiments:
- SISSRs (Site Identification from Short Sequence Reads) (Jothi 2008)
- MACS (Model-based Analysis of ChIP-Seq) (Zhang et al., 2008)
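The tag density idea can be sketched in a few lines: slide a fixed-width window along one chromosome, count the tags that start inside it, and merge consecutive windows that pass a threshold. This is a toy illustration only; it is not how SISSRs or MACS work internally, since those tools also model background and strand information. The function name, default parameters and tag positions are all made up.

```python
from bisect import bisect_left


def tag_density_peaks(tag_positions, window=200, step=50, min_tags=5):
    """Return merged (start, end, max_count) regions on one chromosome.

    A window [start, start + window) is enriched when it holds at least
    min_tags tag start positions; overlapping or touching enriched
    windows are merged into a single region.
    """
    positions = sorted(tag_positions)
    if not positions:
        return []
    peaks = []
    for start in range(0, positions[-1] + 1, step):
        end = start + window
        # Number of tags with start <= position < end, via binary search.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        if count < min_tags:
            continue
        if peaks and start <= peaks[-1][1]:
            prev_start, _, prev_count = peaks[-1]
            peaks[-1] = (prev_start, end, max(prev_count, count))
        else:
            peaks.append((start, end, count))
    return peaks


if __name__ == "__main__":
    tags = [1000, 1010, 1020, 1030, 1040, 5000]
    print(tag_density_peaks(tags, window=200, step=50, min_tags=5))
```

A fixed threshold ignores local background, which is one reason control samples are used in practice to separate true enrichment from regions that attract reads regardless of the antibody.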

ChIP-Seq: Output
A list of enriched locations, which can be used to:
- Determine the biological function of transcription factors, in combination with RNA-Seq
- Identify genes co-regulated by a common transcription factor
- Identify common transcription factor binding motifs

Resources in NGS data analysis Stackoverflow.com

Languages in Bioinformatics

Summary
- NGS technologies are transforming molecular biology.
- Bioinformatics analysis is a crucial part of NGS applications:
  - Data formats, terminology, general workflow
  - Analysis pipeline
  - Software for various NGS applications
  - RNA-Seq and ChIP-Seq data analysis
  - Pathway analysis
  - Data visualization
  - Bioinformatics resources
- The current generation of DNA sequencing technologies has created massive, base-pair resolution datasets that are ideally suited for systems biology studies centered on transcription (primarily ChIP-seq, RNA-seq, DNase-seq).
- A new generation of tools to analyze individual datasets exists, with integrative analysis becoming ever more critical.
- Genomics is affecting all fields of biology and will eventually move into medicine.
Thank you!