Day 2: Session 8: Questions and follow-up…. James C. Fleet, PhD

Presentation transcript:

Day 2: Session 8: Questions and follow-up…. James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries

Day 2 Session 9: Visualization I: Basic techniques James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries

What Data Do You Need?
Columns: Data Needed; Format; Reason
Raw array data; CEL file (Affymetrix); submit to GEO, run analysis from the beginning
Processed, normalized data; MAS5 or RMA (*.txt, Excel), Present/Absent call; primary data
Filtered to remove low-expression genes; *.txt, Excel; used for statistical analysis
Differentially expressed gene list; *.gct (for GenePattern); for DAVID, Reactome, IPA
Fleet 2016

Clustering Methods
Grouping a large dataset into smaller data sets based upon some similarity: “learning a hidden data concept”
Marker selection-based: Heat maps
Unsupervised learning/Class Discovery:
Connectivity-based: Hierarchical
Centroid-based: K-means
Self organizing maps
PCA
Karimpour-Fard et al. (2015) Hum Genomics 9:28. Fleet 2016

Hierarchical Clustering
https://www.youtube.com/watch?v=2z5wwyv0Zk4
Top-down or Divisive: Start with all observations in one cluster, then split recursively going down the hierarchy
Bottom-up or Agglomerative: Start with each observation as its own cluster, then merge going up the hierarchy
Fleet 2016
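The bottom-up (agglomerative) strategy on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the method used in the course software: the Euclidean distance, the average-linkage rule, the function names, and the toy expression profiles are all assumptions.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, n_clusters):
    """Bottom-up clustering: each observation starts as its own cluster,
    then the two closest clusters are merged until n_clusters remain.
    Average linkage is an assumed choice, not taken from the slides."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            d = sum(
                euclidean(points[a], points[b])
                for a in clusters[i]
                for b in clusters[j]
            ) / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical profiles: two low-expression genes, two high-expression genes.
profiles = [[1.0, 1.1, 0.9], [1.2, 1.0, 1.0], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
result = agglomerative(profiles, 2)
print(result)
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, is what produces the dendrogram usually drawn beside a heat map.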

Heat Maps https://www.youtube.com/watch?v=f5MhPVeA3Z4 Table of values where the value in a cell is color coded Ordered so similar patterns are next to one another Fleet 2016

K-Means Clustering
https://www.youtube.com/watch?v=zHbxbb2ye3E
(1) k initial means randomly generated
(2) k clusters created by linking each observation to the nearest mean
(3) Centroid of each of the k clusters becomes the new mean
(4) Repeat 2 and 3 until convergence
Fleet 2016
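The four k-means steps on this slide translate almost directly into code. The sketch below is a minimal plain-Python version under some assumptions: the initial means are drawn from the observations themselves, squared Euclidean distance is used, and the function name, seed, and toy data are hypothetical.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Minimal k-means following the four steps on the slide."""
    rng = random.Random(seed)
    # Step 1: k initial means chosen at random (here: k of the observations).
    means = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: link each observation to its nearest mean.
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, means[c])))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes the new mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old mean if a cluster is empty
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

# Hypothetical two-dimensional observations forming two obvious groups.
observations = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, means = kmeans(observations, k=2)
print(labels, means)
```

Because the starting means are random, different seeds can give different cluster numbering and, on less clear-cut data, different clusters; running several seeds and comparing the results is a common precaution.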

Self Organizing Maps (SOM) Fleet 2016

Principal Components Analysis (PCA)
https://www.youtube.com/watch?v=eJ08Gdl5LH0
Identifies consecutive factors that are uncorrelated/orthogonal to each other and that account for less and less variability
Scores Plot (Principal Component 1 vs. Principal Component 2): used for interpreting relationships among observations
Scree Plot: defines the number of principal components worth considering
http://documents.software.dell.com/Statistics/Textbook/Principal-Components-Factor-Analysis
Fleet 2016
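The numbers behind a scree plot are the fractions of total variance explained by each successive component. The sketch below computes them in plain Python using a covariance matrix, power iteration, and deflation. These are assumed implementation choices for illustration (a real analysis would normally use a linear algebra library), and the function names and toy data are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns (variables) of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]

def top_component(cov, iters=500):
    """Largest eigenvalue and eigenvector of cov by power iteration.
    Assumes the start vector is not orthogonal to that eigenvector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to explain
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    return eigenvalue, v

def explained_variance(rows, n_components):
    """Fraction of total variance carried by each successive component."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[a][a] for a in range(p))
    fractions = []
    for _ in range(n_components):
        lam, v = top_component(cov)
        fractions.append(lam / total)
        # Deflation: remove the component just found, so the next one is
        # orthogonal to it and explains no more variance than it did.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return fractions

# Hypothetical data lying exactly on a line: one component explains everything.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
fractions = explained_variance(data, 2)
print(fractions)
```

Plotting these fractions against component number gives the scree plot; the point where the curve flattens suggests how many components are worth keeping.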

GenePattern: a freely available open-source software package developed at the Broad Institute of MIT and Harvard for the analysis of genomic data. http://www.broadinstitute.org/cancer/software/genepattern# Fleet 2016

GCT File Format for GenePattern Fleet 2016
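The slide image showing the layout did not survive in this transcript, so the sketch below relies on the commonly documented GCT version 1.2 layout rather than on the slide itself. That layout is a `#1.2` version line, a line with the row and column counts, a header line starting with `Name` and `Description`, and then one tab-delimited row per gene. The function name and the "na" descriptions are hypothetical; the gene IDs and values are borrowed from the Pathview example later in this deck.

```python
def to_gct(gene_names, descriptions, sample_names, matrix):
    """Build the text of a GCT v1.2 file from an expression matrix.
    matrix holds one row per gene and one column per sample."""
    lines = [
        "#1.2",                                    # version line
        f"{len(matrix)}\t{len(sample_names)}",     # number of rows, number of samples
        "\t".join(["Name", "Description"] + list(sample_names)),
    ]
    for name, desc, row in zip(gene_names, descriptions, matrix):
        lines.append("\t".join([name, desc] + [str(v) for v in row]))
    return "\n".join(lines) + "\n"

gct_text = to_gct(
    gene_names=["10000", "10001"],
    descriptions=["na", "na"],                     # placeholder descriptions
    sample_names=["DCIS_1", "DCIS_2", "DCIS_3"],
    matrix=[[-0.3076, -0.1472, -0.0238], [0.4159, -0.3348, -0.5131]],
)
print(gct_text)
```

Writing `gct_text` to a file with a `.gct` extension should give something GenePattern can read, but it is worth checking the result against the current GenePattern file format guide before uploading.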

Tuesday BREAK #1

Day 2 Session 10: Functional Assessment I: Gene Set Enrichment Analysis, NIH DAVID James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries

Molecular Signatures Database (MSigDB): …a collection of annotated gene sets for use with GSEA software. 8 collections curated by the Broad Institute. http://software.broadinstitute.org/gsea/msigdb/ Subramanian et al. (2005) Proc Natl Acad Sci 102:15545. Fleet 2016

Data format for Functional Classification https://david.ncifcrf.gov/ Huang et al. (2009) Nat Protocols 4:44. Fleet 2016

Day 2 Session 11: Special Lecture Sean Davis, PhD Staff Scientist National Cancer Institute http://watson.nci.nih.gov/%7Esdavis/

Day 2 Session 12: Visualization II: Intro to Pathway Analysis > Reactome > Pathview James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries

Data Formats for Analysis http://www.reactome.org/ http://onlinelibrary.wiley.com/doi/10.1002/pmic.201100066/full Fleet 2016

Data Format for Visualization
http://pathview.uncc.edu/
GeneID      DCIS_1    DCIS_2    DCIS_3
10000      -0.3076   -0.1472   -0.0238
10001       0.4159   -0.3348   -0.5131
10002       0.1985    0.0379    0.3419
10003      -0.2316   -0.0966   -0.1047
100048912  -0.0449   -0.0520    0.0364
10004      -0.0876   -0.0503    0.0018
10005      -0.1263    0.4778   -0.1061
10006       0.6503    0.1951   -0.0054
*.txt or *.csv format; fold change or raw data
Also available in R as a Bioconductor package: http://www.bioconductor.org/packages/release/bioc/html/pathview.html
Fleet 2016
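Before uploading a table like this one, it can help to confirm that it parses cleanly. The sketch below reads a tab-delimited version of the first three rows with Python's standard `csv` module and then averages the replicate columns. The tab delimiter, the function name, and the averaging step are assumptions for illustration and are not requirements of Pathview.

```python
import csv
import io

# First rows of the slide's example table, written as tab-delimited text.
sample_text = (
    "GeneID\tDCIS_1\tDCIS_2\tDCIS_3\n"
    "10000\t-0.3076\t-0.1472\t-0.0238\n"
    "10001\t0.4159\t-0.3348\t-0.5131\n"
    "10002\t0.1985\t0.0379\t0.3419\n"
)

def read_expression_table(text, delimiter="\t"):
    """Return (sample names, {gene ID: [values]}) from delimited text.
    Gene IDs are kept as strings so that long IDs are never reformatted."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    table = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, table

samples, table = read_expression_table(sample_text)

# One possible summary: mean value per gene across the replicate columns.
mean_by_gene = {gene: sum(vals) / len(vals) for gene, vals in table.items()}
print(samples)
print(mean_by_gene)
```

For a *.csv file the same function can be called with `delimiter=","`.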

Making Sense of Specific Gene Expression Data
BioGPS: Tissue Expression Pattern http://biogps.org/#goto=welcome
Coremine: Gene Annotation http://www.coremine.com/medical/#search
Fleet 2016

Tuesday BREAK #2

Day 2 Session 13: Pathway Analysis continued…. James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries

Day 2 Session 14: Challenges of Metabolomic and Proteomic data James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries

Levels of Metabolic Research
Genomics: ~25,000 genes; DNA (genotype)
Transcriptomics: ~100,000 transcripts; mRNA
Proteomics: ~a million proteins
Metabolomics/Lipidomics: ~40,000; metabolic pathway (substrate, product); phenotype
Slide labels for the levels: What may happen; What can happen; What has happened; What is happening
Challenges: many metabolites, location, chemistry
Adapted from Navas-Iglesias et al. (2009) Analytical Chemistry, 28, 393, by C. Ferreira (Purdue)

The Increasing Complexity of Omic Analysis
Genome: ~25,000 genes
Transcriptome: ~100,000 mRNA + miRNA + lncRNA (alternative promoters, splicing, editing)
Proteome: ~1,000,000 (post-translational modification)
+ location, interaction, and chemistry differences
Fleet 2016

Proteomics/Metabolomics Workflow: Data Filters (Intensity, Fold); Well Designed Study; Interpret Experiment; Normalized Vetted Data; Network building; Sample Preparation; Statistical Analysis; Sample Analysis and QC analysis; Pathway and Geneset Analysis; Link Peaks to Databases, Quantification, Normalization; Differentially Expressed List; Clustering and visualization; Raw Reads. Fleet 2016

Generalized Proteomic Pipeline Steps 1-4 are distinct from the genomics pipeline Walther and Mann (2010) J Cell Biol 190:491. Fleet 2016

MaxQuant
http://www.biochem.mpg.de/227318/MaxQuant
File formats: .wiff, RAW, .d, mzXML
MS-based peptide profile; link to peptide database; integrate to proteins
Modified from Tiago Sobreira, Purdue. Fleet 2016

Proteomic Resources http://www.ebi.ac.uk/pride/archive/ http://www.peptideatlas.org/

http://www.metaboanalyst.ca/faces/home.xhtml

Metabolomics Resources http://metabolomicsworkbench.org/ http://metabolomicssociety.org/resources/metabolomics-databases