Many Sample Size and Power Calculators Exist Online: http://homepage.divms.uiowa.edu/~rlenth/Power/
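Online calculators like the one linked above are built on standard power formulas. The sketch below assumes a two-sided, two-sample comparison of means with equal group sizes and a known common SD, and uses the normal approximation n = 2((z₁₋α/₂ + z₁₋β)·σ/δ)² per group. The function name `n_per_group` is hypothetical. An exact t-test calculation, which many calculators use, gives a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    rounded up to the next whole subject.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical example: detect a 1-SD difference with 80% power at alpha = 0.05
print(n_per_group(delta=1.0, sigma=1.0))  # → 16
```

Halving the detectable difference roughly quadruples the required n, because δ enters the formula squared.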
Day 3 Session 16: Questions and follow-up… James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Day 3 Session 17: Visualization III: Networks. James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Pathways vs. Networks. Both aggregate molecular events across multiple genes, which increases statistical power by reducing the number of hypotheses tested. Pathway: small scale; well studied; known linear relationships; easily visualized and interpreted. Network: large scale; integrates multiple studies; hard to visualize and interpret; contains novel information not covered in pathways. Creixell et al. (2015) Nature Methods 12: 615 Fleet 2016
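Grouping genes into pathways or networks means running far fewer tests than testing every gene on its own. The sketch below is one illustration. It assumes a Bonferroni correction (only one of several possible multiple-testing corrections) and uses hypothetical counts of 20,000 genes versus 300 pathways. The per-test cutoff that holds the family-wise error rate at 0.05 is much less stringent at the pathway level.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that controls family-wise error at alpha."""
    return alpha / n_tests

# Hypothetical counts: ~20,000 genes tested individually vs. ~300 pathways
gene_level = bonferroni_threshold(0.05, 20_000)
pathway_level = bonferroni_threshold(0.05, 300)
print(f"gene-level cutoff:    {gene_level:.2e}")     # → 2.50e-06
print(f"pathway-level cutoff: {pathway_level:.2e}")  # → 1.67e-04
```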
Patterns of Regulation in Genomic Data: Guilt by Association. Human primary fibroblast cultures; serum starvation and refeeding; 9,600 transcripts on a spotted cDNA array; hierarchical clustering. Do genes in a common cluster share common molecular regulation? Iyer et al. (1999) Science 283: 83
Approaches: subnetwork construction and clustering; gene set enrichment (simple, but discards known biological network information); network-based modeling. Creixell et al. (2015) Nature Methods 12: 615
Networks Integrate Information. Dong and Han (2008) Cell Res 18: 224
Cytoscape (http://cytoscape.org/): an open-source software tool for integrating, visualizing, and analyzing data in the context of networks. Cytoscape does not build the primary network from your dataset.
Data format: *.txt or Excel (1st worksheet only)
Header row: Identifier | G1 Value | G1 FC | G1 FDR | G2 Value | G2 FC | G2 FDR
Up to 20 observations per treatment/group (IPA can average them); statistics are done prior to IPA.
Valid expression value types and expected values (INF = infinity):
Ratio: (0, +INF)
Fold change: (-INF, -1) or (1, +INF)
Log ratio: (-INF, +INF)
p-value: (0, 1)
FDR, q-value: (0, 100)
Intensity: (0, +INF)
RPKM/FPKM: (0, +INF)
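IPA expects each value type to fall within the ranges listed above, so it can help to screen a file before uploading it. The sketch below assumes the rows are already loaded as dictionaries keyed by the header names on the slide, and it treats every range as an open interval, following the slide's notation. The helper names and gene identifiers are hypothetical, and none of this is part of IPA itself.

```python
import math

# Expected ranges per expression value type, as listed on the slide.
# Each entry is a list of open intervals (low, high).
EXPECTED_RANGES = {
    "Ratio": [(0, math.inf)],
    "Fold Change": [(-math.inf, -1), (1, math.inf)],
    "LogRatio": [(-math.inf, math.inf)],
    "p-value": [(0, 1)],
    "FDR": [(0, 100)],
    "Intensity": [(0, math.inf)],
    "RPKM/FPKM": [(0, math.inf)],
}

def is_valid(value_type, value):
    """True if value lies strictly inside one of the allowed intervals."""
    return any(lo < value < hi for lo, hi in EXPECTED_RANGES[value_type])

def invalid_rows(rows, column, value_type):
    """Identifiers whose value in `column` is outside the expected range."""
    return [r["Identifier"] for r in rows if not is_valid(value_type, r[column])]

# Hypothetical rows: a fold change of 0.5 is not a valid IPA fold change
rows = [
    {"Identifier": "GENE1", "G1 FC": 2.5},
    {"Identifier": "GENE2", "G1 FC": 0.5},
    {"Identifier": "GENE3", "G1 FC": -3.0},
]
print(invalid_rows(rows, "G1 FC", "Fold Change"))  # → ['GENE2']
```

A down-regulation entered as a ratio (e.g., 0.5) has to be converted to a fold change (−2) before it fits the "Fold Change" type.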
Ingenuity Network Analysis: expression dataset → network generation (DEG fit probabilistically to networks from the IPA Knowledge Base) → genes in network → network scoring (scored DEG) → associated functions (DEG involved in a biological function). DEG = differentially expressed genes.
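Network scoring asks how unlikely it is that a network would contain this many DEG by chance. The sketch below computes a right-tailed hypergeometric (Fisher's exact) p-value and reports it as −log₁₀(p), so higher scores mean more significant networks. This is one common way to build such a score. Whether it matches IPA's internal calculation is an assumption, and all names and numbers are hypothetical.

```python
from math import comb, log10

def right_tail_p(k, n_net, n_deg, n_total):
    """P(X >= k) under the hypergeometric distribution (right-tailed Fisher's exact test).

    X = number of DEG in a random set of n_net genes drawn from n_total genes,
    of which n_deg are DEG. Requires k <= min(n_net, n_deg).
    """
    upper = min(n_net, n_deg)
    hits = sum(comb(n_deg, i) * comb(n_total - n_deg, n_net - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_net)

def network_score(k, n_net, n_deg, n_total):
    """Higher score = less likely the network's DEG count arose by chance."""
    return -log10(right_tail_p(k, n_net, n_deg, n_total))

# Hypothetical numbers: 20,000 genes, 500 DEG, a 35-gene network containing 20 DEG
print(round(network_score(k=20, n_net=35, n_deg=500, n_total=20_000), 1))
```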
Wednesday BREAK #1
Day 3 Session 18: Flexible time for reinforcement… James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Day 3 Session 19: Tour of Data Center and Conte Cluster. Patrick Finnegan, Hardware Engineer, Purdue University
Day 3 Session 20: Introduction to NGS. James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Integration and Interpretation (technology → data analysis → integration and interpretation, leading to discovery and application)
Genomics: WGS, WES → SNPs, indels, CNVs, structural variants → functional effect of mutations
Transcriptomics: RNA-Seq → DGE, fusions, splicing, editing → network + pathway analysis
Epigenomics: Bisulfite-Seq, ChIP-Seq → DNA methylation, histones, TF binding → integrative analysis
Modified from Shyr D, Liu Q. Biol Proced Online (2013) 15: 4
NGS Sequencing Pipeline: library preparation (fragment the input and add adapters to create a fragment library) → library amplification → parallel sequencing → reads (Read 1 and Read 2 from opposite ends of each fragment). Voelkerding et al. (2010) J Mol Diagn 12: 539-51.
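In paired-end sequencing, Read 1 and Read 2 come from the two ends of the same library fragment. Read 2 is read from the opposite strand, so it matches the reverse complement of the fragment. The sketch below shows that relationship for a single fragment. It assumes adapters have already been trimmed and ignores sequencing errors, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_reads(fragment, read_length):
    """Read 1 from the 5' end of the forward strand; Read 2 from the 5' end of the reverse strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2

# Hypothetical 12-bp fragment read with 4-bp reads
print(paired_reads("ACGTTGCAAGGT", 4))  # → ('ACGT', 'ACCT')
```

When the fragment is shorter than twice the read length, the two reads overlap in the middle of the fragment.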
How Much Sequencing Is Enough? Key parameters: read length, # reads per reaction, target coverage / # reads. DNA variation: 10-30X (# reads N/A). ChIP-seq: 100X. RNA-seq (DEG): 20 million reads; RNA-seq (rare transcripts): 100+ million reads. https://genohub.com/next-generation-sequencing-guide/ Sims et al. (2014) Nat Rev Genet 15: 121
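Coverage targets like those above convert into a read count through the Lander-Waterman relationship C = N·L/G, where N is the number of reads, L the read length, and G the target size. The sketch below applies it. It also includes the Poisson approximation 1 − e⁻ᶜ for the fraction of bases covered at least once. The function names and example numbers are hypothetical. For paired-end runs, count each mate as a separate read.

```python
import math

def coverage(n_reads, read_length, target_size):
    """Mean depth C = N * L / G (Lander-Waterman)."""
    return n_reads * read_length / target_size

def reads_needed(target_coverage, read_length, target_size):
    """Reads required to reach a mean depth: N = C * G / L, rounded up."""
    return math.ceil(target_coverage * target_size / read_length)

def fraction_covered(c):
    """Poisson approximation: fraction of target bases covered at least once."""
    return 1 - math.exp(-c)

# Hypothetical example: 30X over a ~3 Gb genome with 150-bp reads
n = reads_needed(30, 150, 3_000_000_000)
print(n)                                    # → 600000000
print(coverage(n, 150, 3_000_000_000))      # → 30.0
print(round(fraction_covered(30), 6))       # → 1.0
```

RNA-seq and ChIP-seq depth is usually specified as a read count rather than fold coverage, because expression and enrichment are uneven across the genome.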
Stephen Turner, PhD, Director, University of Virginia Bioinformatics Core: http://apps.bioconnector.virginia.edu/covcalc/
Understanding RNA-seq
Correlation between RNA-seq and Microarray Analysis. Tiling array (log2) vs. RNA-seq (log2): data from two different S. cerevisiae papers using the same growth conditions; modified from Wang et al. (2009) Nat Rev Genet 10: 57. Array (log2) vs. RNA-seq (log2): activated T cells; Zhao et al. (2014) PLOS One.
RNA-seq vs. Microarray: Which Is “Better”? (Issue: Microarray | RNA-seq)
Reproducibility: High
Dynamic range: Modest | Wide
Sensitivity: Low/Medium
Accuracy: High (but better for FC)
Cost: Low
Complexity of analysis
Species: Limited to available platforms | Any species possible
https://bioinfomagician.wordpress.com/2014/01/28/rna-seq-vs-microarray-what-is-the-take/
Wednesday BREAK #2
Day 3 Session 21: Visualization IV: IGV. James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
Data for Visualization Lives Here…
Types of Files Commonly Used (file type: description)
SAM: tab-delimited text file of sequence alignment data (i.e., primary read data)
BAM: binary version of the SAM file
Bedgraph, Wiggle (Wig): display of continuously valued data (e.g., transcriptome)
bigWig: displays dense, continuous data from Wig or bedGraph files for faster viewing
BED: tab-delimited file that defines a feature track
TDF: binary tiled data file that has been preprocessed for faster display in IGV (e.g., for ChIP- and RNA-seq data)
narrowPeak: called peaks of signal enrichment based on pooled, normalized data
http://www.broadinstitute.org/igv/FileFormats
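BED-family files such as bedGraph store coordinates as 0-based, half-open intervals, while genome browsers display 1-based positions, and mixing the two is a common source of off-by-one errors. The sketch below reads bedGraph lines (chrom, start, end, value, whitespace-separated) and skips `track`, `browser`, and comment lines. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BedGraphRecord:
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    value: float

def parse_bedgraph(lines):
    """Parse bedGraph data lines into records, skipping header and comment lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append(BedGraphRecord(chrom, int(start), int(end), float(value)))
    return records

def to_one_based(rec):
    """1-based, fully closed coordinates as shown in a browser: start+1 .. end."""
    return rec.chrom, rec.start + 1, rec.end

# Hypothetical file contents
lines = [
    "track type=bedGraph name=example",
    "chr1\t100\t200\t3.5",
    "chr1\t200\t300\t0.0",
]
records = parse_bedgraph(lines)
print(to_one_based(records[0]))  # → ('chr1', 101, 200)
```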
IGV Displays Various File Types. Example: mouse large intestine DNase-seq data from ENCODE. Tracks: chromosomal location, broadPeaks, narrowPeaks, bigWig, BAM, RefSeq genes.
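A narrowPeak file, like the one loaded as a track here, is tab-delimited BED6+4. Its columns are chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue, and peak. pValue and qValue are −log₁₀ values, and peak is the 0-based offset of the summit from chromStart (−1 if no summit was called). The sketch below parses one line and locates the summit. Falling back to the interval midpoint when no summit is given is an assumption of this sketch, and the names are hypothetical.

```python
from typing import NamedTuple

class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based, inclusive
    end: int             # 0-based, exclusive
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float       # -log10(p); -1 if not assigned
    q_value: float       # -log10(q); -1 if not assigned
    peak: int            # summit offset from start; -1 if not called

def parse_narrowpeak(line):
    """Parse one tab-delimited narrowPeak line."""
    f = line.rstrip("\n").split("\t")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))

def summit(p):
    """0-based summit position; uses the interval midpoint if no summit was called (assumption)."""
    return p.start + p.peak if p.peak != -1 else (p.start + p.end) // 2

# Hypothetical peak record
rec = parse_narrowpeak("chr1\t1000\t1500\tpeak1\t0\t.\t12.5\t8.2\t6.1\t230")
print(summit(rec))  # → 1230
```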
Visualizing RNA-seq Data Cell line Tissue ?