
GeneSpring GX 10 for Gene Expression Analysis
Antoni Wandycz, Elise Chang (Agilent Technologies), February 2009
Slide footer: GeneSpring, January 2009
Agenda: Agilent Bioinformatics & GeneSpring overview; GX 10 Guided & Advanced Data Analysis; Practice & Discussion

Agilent Bioinformatics Suite
- Transcriptome: GX 10 (gene expression, miRNA, QPCR, exon)
- Metabolome: GX 11
- Proteome: GX 11
- DNA: DNA Analytics (ChIP, methylation, CGH)
- GeneSpring Workgroup: data storage and computation; sharing and collaboration

History and Future of GeneSpring GX (timeline)
- Milestones shown: Agilent acquires Silicon Genetics; GX 9 developed on the Avadis platform; Agilent acquires Stratagene
- GX 10 released: GX 7.3 functions; miRNA, exon and QPCR analysis; pathway analysis; support for eArray
- Next: GX 11

GeneSpring GX: Multiple-Platform Compatibility
- Agilent Feature Extraction files (FE > v8.5)
- Affymetrix CEL and CHP files
- Illumina BeadStudio (> v3.1)
- ABI SDS and RQ Manager files (for QPCR)
- Custom formats (all one- and two-color microarrays), e.g. .GPR files from Axon scanners (GenePix software)

GeneSpring GX 10: Key Features
- Guided workflows
- New applications: miRNA, QPCR, exon, with more planned
- Project-based organization and translation on the fly
- Biological context: pathway analysis, GSEA, GO, IPA, etc.
- Customization: scripting in Jython and R

GX 10 Key Features: Guided Workflows
Predetermined steps: normalization, QC, statistics, GO analysis, pathways.

Project-Based Organization

GeneSpring GX 10: Translation (Chapter 3 of the GX 10 manual)
- Comparing platforms, e.g. Affymetrix vs. Agilent vs. spotted arrays
- Comparing species, e.g. mouse vs. human, using a homology table (NCBI HomoloGene)
- Comparing applications, e.g. gene expression vs. QPCR or miRNA

GX 10 Key Features: Translation
Compare platforms, applications and species. (Screenshot: homology table displayed.)

Venn Diagram
Compare experiments from different platforms, applications and species.
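
Translating lists and comparing them in a Venn diagram amounts to mapping each entity list into a shared identifier space, then taking set intersections and differences. The minimal sketch below assumes a toy mouse-to-human homology table standing in for NCBI HomoloGene; the gene lists and the `translate` helper are hypothetical illustrations, not GeneSpring's API.

```python
# Illustrative entity lists from two experiments (not real results).
mouse_hits = {"Nppa", "Myh7", "Atp2a2", "Ctgf"}
human_hits = {"NPPA", "MYH7", "POSTN"}

# Hypothetical homology table: mouse gene symbol -> human gene symbol.
homology = {"Nppa": "NPPA", "Myh7": "MYH7", "Atp2a2": "ATP2A2"}


def translate(entities, table):
    """Map entities through a homology table; entities with no partner in the table are dropped."""
    return {table[e] for e in entities if e in table}


translated = translate(mouse_hits, homology)

# The three regions of a two-set Venn diagram.
venn = {
    "mouse_only": translated - human_hits,
    "human_only": human_hits - translated,
    "both": translated & human_hits,
}
for region, genes in venn.items():
    print(region, sorted(genes))
```

Entities without a homolog in the table (here `Ctgf`) cannot appear in any region, which is why the completeness of the homology table matters for cross-species comparisons.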

GX 10 Key Features: Biological Context
- GO analysis (molecular function, biological process, cellular location)
- GSEA (Gene Set Enrichment Analysis)
- Pathway analysis

GeneSpring GX 10: Gene Ontology (GO) Analysis
Estimates how likely it is that as many of your genes of interest would fall into a given GO category by chance alone. Help is always available within the tool.
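
The deck states that GO enrichment is computed with Fisher's exact test. A one-sided Fisher's exact test on a 2x2 table is equivalent to the hypergeometric upper tail, sketched below in pure Python. The function and argument names are illustrative, and GeneSpring's other settings (for example, any multiple-testing correction) are not modeled.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    N: entities on the array (the universe)
    K: entities in the universe annotated to the GO term
    n: entities in the list of interest
    k: entities in the list that carry the GO term
    Returns P(X >= k) when n entities are drawn at random without replacement.
    """
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 for impossible draws, so no extra bounds checks are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# 3 of 4 listed genes carry a term that covers 5 of 20 genes on the array.
print(enrichment_p_value(3, 4, 5, 20))
```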

Pathway Analysis in GX 10
GX 10 offers two types of pathway analysis:
1. ‘Pathway Analysis’ tool: builds networks of related entities from 8 pathway interaction databases and NLP (natural-language processing of the literature).
2. ‘Find Significant Pathways’ tool: tests an entity list for enrichment in known pathways in BioPAX format (.owl) (Step 8 in the Guided Workflow).

Overlay Networks with Expression Data/Conditions

Cellular Location Overlay of Network

‘Find Similar Pathways’ Tool
- The analysis is performed on all pathways imported into GX 10.
- Question answered: are my genes significantly enriched in particular pathways?
- Significant pathways are added to the experiment.

e-Seminars & Workshops
Recorded seminars:
1. Introduction to GX
2. Analysis of miRNA and gene expression (GE) data
3. Analysis of QPCR and GE data
4. Alternative splicing
5. Pathway analysis

Getting Started in GeneSpring GX 10 with Affymetrix Files
Advanced Workflow: finding differentially expressed genes

Cardiogenomics Dataset: Affymetrix Data
Congestive heart failure (CHF) is a degenerative condition in which the heart no longer pumps effectively. The most common cause of CHF is damage to the heart muscle from an insufficient oxygen supply, usually due to narrowing of the coronary arteries, which carry blood to the heart. Idiopathic cardiomyopathy weakens the heart for an unknown reason. Ischemic cardiomyopathy is caused by a lack of oxygen to the heart due to coronary artery disease.

Cardiogenomics Dataset: Affymetrix Data
Experimental goal: to identify the molecular mechanisms underlying congestive heart failure, gene expression profiles were compared between male and female patients with idiopathic, ischemic or non-failing hearts.

Etiology      Male        Female
Non-failing   2 samples   2 samples
Idiopathic    2 samples   2 samples
Ischemic      2 samples   2 samples

CEL files were generated with Affymetrix GCOS.

Experimental Setup in GeneSpring

Sample  Gender  CHF etiology
1       Female  Idiopathic
2       Female  Idiopathic
3       Male    Idiopathic
4       Male    Idiopathic
5       Female  Ischemic
6       Female  Ischemic
7       Male    Ischemic
8       Male    Ischemic
9       Female  Non-failing
10      Female  Non-failing
11      Male    Non-failing
12      Male    Non-failing

The selected interpretation determines how the samples are displayed in the various views and which comparisons are made in analyses such as statistics.

Gender interpretation
- Condition 1: Female (samples 1, 2, 5, 6, 9, 10)
- Condition 2: Male (samples 3, 4, 7, 8, 11, 12)

CHF Etiology interpretation
- Condition 1: Idiopathic (samples 1, 2, 3, 4)
- Condition 2: Ischemic (samples 5, 6, 7, 8)
- Condition 3: Non-failing (samples 9, 10, 11, 12)

Gender/CHF Etiology interpretation
- Condition 1: Female/Idiopathic (samples 1, 2)
- Condition 2: Male/Idiopathic (samples 3, 4)
- Condition 3: Female/Ischemic (samples 5, 6)
- Condition 4: Male/Ischemic (samples 7, 8)
- Condition 5: Female/Non-failing (samples 9, 10)
- Condition 6: Male/Non-failing (samples 11, 12)
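
An interpretation behaves like a group-by over the chosen experimental parameters. The sketch below encodes the sample table from this slide and derives each interpretation from it; `interpret` and `PARAMS` are hypothetical names used only for illustration, not part of GeneSpring's scripting interface.

```python
from collections import defaultdict

# Sample table from the slide: sample number -> (gender, CHF etiology).
samples = {
    1: ("Female", "Idiopathic"), 2: ("Female", "Idiopathic"),
    3: ("Male", "Idiopathic"), 4: ("Male", "Idiopathic"),
    5: ("Female", "Ischemic"), 6: ("Female", "Ischemic"),
    7: ("Male", "Ischemic"), 8: ("Male", "Ischemic"),
    9: ("Female", "Non-failing"), 10: ("Female", "Non-failing"),
    11: ("Male", "Non-failing"), 12: ("Male", "Non-failing"),
}
PARAMS = ("Gender", "CHF Etiology")  # hypothetical parameter names


def interpret(samples, chosen):
    """Group samples into conditions using only the chosen parameters."""
    idx = [PARAMS.index(p) for p in chosen]
    groups = defaultdict(list)
    for sample, values in sorted(samples.items()):
        condition = "/".join(values[i] for i in idx)
        groups[condition].append(sample)
    return dict(groups)


print(interpret(samples, ["Gender"]))
print(interpret(samples, ["Gender", "CHF Etiology"]))
```

Choosing both parameters reproduces the six-condition Gender/CHF Etiology interpretation; choosing one collapses the samples into two or three conditions.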

GeneSpring GX 10 Vocabulary
- Project: a collection of experiments.
- Entity: a gene, probe, probeset, exon, etc.
- Interpretation: a grouping of samples based on their experimental conditions.
- Technology: a file containing the array design and biological information (annotation) for all entities on the array.
- Biological Genome: a collection of all major annotations (from NCBI) for an organism; essential for generic/custom arrays that lack annotations.

Getting Started in GeneSpring
Cardiogenomics experiment: transcriptional profiling to learn more about the molecular mechanisms underlying congestive heart failure (CHF)
- Sample data: myocardial samples from patients with normal hearts and with ischemic or idiopathic cardiomyopathy (3 etiologies)
- Variables: gender (2 levels) and etiology (3 levels)
- Technology: Affymetrix U133 Plus 2.0 array

Getting Started: Create a New Project
Create the project from the startup screen or from File/New Project.

Getting Started with Advanced Analysis
- Experiment type: Affymetrix Expression (there are three Affymetrix choices)
- Workflow type: Advanced Analysis

Select Data for the Experiment
Select ‘Choose Files’ to load data files from your computer. Note: the ‘Choose Samples’ option is used when creating experiments from samples already loaded into GX 10.

Sample Upload

Summarization Algorithms in GX 10 for CEL Files
Summarization of Affymetrix probes and baseline transformation of probeset values.

Summarization Algorithms in GX 10

Algorithm  Background subtraction  Normalization  Probe summarization
RMA        PM based                Quantile       Log (PM)
MAS5       PM-MM based             Scaling        One-step Tukey biweight
PLIER      PM-MM based             Quantile       Log (PM)
LiWong     PM-MM based             Quantile       Linear (PM)
GCRMA      PM-MM based             Quantile       Log (PM)

Besides using different calculations, the algorithms differ in the order in which normalization and summarization are performed.
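
Four of the five algorithms in the table use quantile normalization. A minimal pure-Python sketch of that step follows: each value is replaced by the cross-sample mean of the values that share its rank. Ties are broken by position as a simplification; this is an illustration, not GeneSpring's implementation.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so every sample ends up with the same distribution.
    Ties are broken by original position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

Because every sample ends up with the same set of values, box-whisker plots of quantile-normalized data line up across samples.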

Preprocessing of Affymetrix Arrays
- CEL files are the raw data files that contain signal values for individual probes.
- CEL files are preprocessed to generate one value per probeset. The preprocessing steps are: 1. background subtraction, 2. normalization, 3. summarization of probeset values.
- Different preprocessing algorithms are available.
(Diagram: array, hybridization & scanning, DAT file, image analysis, CEL file; CEL file + CDF file, CHP file; software: GCOS, AGCC.)
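
For the summarization step, RMA fits an additive model (probe effect + array effect) to log2 PM values using Tukey's median polish. The stdlib-only sketch below illustrates that idea on a probes-by-arrays matrix for one probeset; the iteration cap and tolerance are arbitrary choices, and this is not GeneSpring's code.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish on a probes x arrays matrix of log2 PM values.

    Fits value = overall + probe_effect + array_effect + residual and
    returns overall + array_effect, one summarized value per array.
    """
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row (probe) medians into the probe effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column (array) medians into the array effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        # Stop once a full sweep no longer moves any median.
        if max(abs(v) for v in row_med + col_med) < tol:
            break
    return [overall + c for c in col_eff]


# Three probes x two arrays, purely additive (no noise).
print(median_polish([[5.0, 6.0], [6.0, 7.0], [7.0, 8.0]]))
```

Using medians rather than means makes the summary resistant to a single outlying probe.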

BoxWhisker plot: Summary of Normalized Intensities

Advanced Workflow: Experiment Setup
Experiment Grouping: specify parameters and conditions.

Experiment Grouping
- The experimental parameters are added in this window.
- For each array, the parameter value (condition) is also specified.
- Values can be entered manually or loaded from a saved file (option circled in red in the screenshot).

Advanced Workflow: Experiment Setup, Create Interpretation
- In the Guided Workflow, only one interpretation is provided automatically.
- Here, users can create multiple interpretations.

Grouping and Interpretation
Two experimental variables: CHF etiology and gender. For this experiment, three interpretations could be created:
1) Gender
2) CHF etiology (ischemic, idiopathic, non-failing)
3) CHF etiology and gender (this interpretation is created automatically in the Guided Workflow)
Example shown: gender only.

Creating Interpretations: step 2 of 3

Creating Interpretations: step 3 of 3

Advanced Analysis Workflow: Quality Control
- QC on samples and probes is performed automatically in the Guided Workflow.
- Here, users can specify settings beyond those available in the Guided Workflow.

Quality Control on Samples

Filter by Expression
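
The slide gives no settings, so what follows rests on an assumption: a common form of expression filtering keeps entities whose signal falls inside a percentile range in at least a minimum number of samples. The cut-offs below (20th to 100th percentile, at least one sample) and the helper names are illustrative, not documented GeneSpring defaults.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q from 0 to 100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def filter_by_expression(data, lower_pct=20, upper_pct=100, min_samples=1):
    """Keep entities within [lower, upper] percentile in >= min_samples samples.

    data: entity -> list of per-sample values (same sample order for all).
    Percentile cut-offs are computed separately for each sample.
    """
    n_samples = len(next(iter(data.values())))
    bounds = []
    for j in range(n_samples):
        col = sorted(values[j] for values in data.values())
        bounds.append((percentile(col, lower_pct), percentile(col, upper_pct)))
    kept = []
    for entity, values in data.items():
        passing = sum(lo <= v <= hi for v, (lo, hi) in zip(values, bounds))
        if passing >= min_samples:
            kept.append(entity)
    return kept


data = {"p1": [1, 1], "p2": [2, 2], "p3": [3, 3], "p4": [4, 4], "p5": [5, 5]}
print(filter_by_expression(data))
```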

Advanced Analysis Workflow: Analysis
- Statistical analysis
- Filter on volcano plot (statistics and fold change together)
- Fold change
- Clustering
- Find similar entities
- Filter on parameters
- PCA
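
A volcano-plot filter keeps entities that pass both a fold-change cut-off and a significance cut-off. In this sketch an exact permutation test on the difference of means stands in for whichever statistical test is selected in GeneSpring (an assumption), and the data are assumed to be log2-transformed, so the log2 fold change is a difference of group means. The cut-offs and function names are illustrative.

```python
from itertools import combinations
from statistics import mean


def perm_p_value(a, b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = a + b
    observed = abs(mean(b) - mean(a))
    hits = total = 0
    # Enumerate every way of relabelling len(a) samples as group A.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(gb) - mean(ga)) >= observed - 1e-12:
            hits += 1
    return hits / total


def volcano_filter(data, group_a, group_b, min_log2_fc=1.0, max_p=0.05):
    """Keep entities that pass both the fold-change and the p-value cut-off."""
    kept = {}
    for entity, values in data.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        log2_fc = mean(b) - mean(a)
        p = perm_p_value(a, b)
        if abs(log2_fc) >= min_log2_fc and p <= max_p:
            kept[entity] = (log2_fc, p)
    return kept


data = {
    "up": [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0],
    "flat": [2.0, 2.1, 1.9, 2.0, 2.05, 1.95, 2.0, 2.1],
}
print(volcano_filter(data, [0, 1, 2, 3], [4, 5, 6, 7]))
```

With four samples per group there are only 70 relabellings, so the smallest attainable two-sided p-value is 2/70; small designs limit how significant any entity can appear.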

Getting Started with the Guided Workflow
- Experiment type: Agilent Single-color
- Workflow type: Guided Workflow

Sample Upload

BoxWhisker plot: Summary of Normalized Intensities

GeneSpring GX 10: Important Menu Options
- Project: Import/Export project zip
- Tools: Script Editor / R Editor; Import BioPAX pathways; GS7 data migration; Options…
- Annotations: Update Technology Annotations; Create Biological Genome; Update Pathway Interactions
- Help: License Manager; Update Product

Pathway Analysis
To use the ‘Find Significant Pathways’ tool:
1. Download BioPAX-format (.owl) pathways to your computer.
2. Import the .owl pathways into GX 10 with the Tools menu's ‘Import BioPAX pathways’ option.
3. From the Workflows menu (in the right margin of GX 10), select ‘Find Similar Pathways’ and choose your entity list of interest.

Performing Pathway Analysis in GX 10
1. In the Annotations menu, select ‘Update Pathway Interactions’ from the Agilent server.
2. Before you can choose an organism, GX 10 must first create a pathway database infrastructure. This may take more than 10 minutes.
3. Once the infrastructure database is complete, return to Annotations/Update Pathway Interactions and choose your organism. This may take more than 20 minutes.
4. From the Workflows menu (in the right margin of GX 10), select ‘Pathway Analysis’ to begin building networks.

Updating Annotations (Chapter 3 of the GX 10 PDF manual, p. 51)
- Option 1: update from the Agilent server
- Option 2: update from a file
- Option 3 (new in GX 10): update directly from NCBI within GX (Biological Genome)

GeneSpring GX 10: Reference Pages in the Manual
- Creating/updating technologies and annotations: Chapter 3 of the GX 10 PDF manual, p. 51; sources are 1) the Agilent server, 2) a file, 3) NCBI (Biological Genome)
- GS7 to GS10 data migration: Chapter 4 of the GX 10 manual, p. 71, and the Quick Start Guide
- Translation: Chapter 3.3 of the PDF manual, p. 63

Thank You
Technical Support: 24 hours/5 days per week (option 6, 2) (Genomics)

Automated GX 7 Migration Tool (Chapter 4 of the GX 10 manual)
Step 1: Prepare for GS7 migration. The tool automatically prepares the data for migration (for a large number of samples, this step takes time).
Step 2: Select the GS7 genome to migrate to GS10. All experiments, samples, interpretations, gene lists, trees, parameter values, condition values and classifications are migrated automatically.
Step 3: Open the project whose name corresponds to the GX 7 genome to see the migrated data. Note: if the genome was assigned to a project in GX 7, the GX 10 project takes that project's name instead of the genome's name.

GX 10: Biological Context
- GO analysis (molecular function, biological process, cellular location)
- GSEA (Gene Set Enrichment Analysis)
- GSA (Gene Set Analysis)
- Pathway analysis (interaction databases)
- Find Similar Entity Lists
- Find Significant Pathways (BioPAX.org)
- Link to Ingenuity's IPA
- NLP (literature mining)

GSEA
GSEA interrogates genome-wide expression profiles from samples belonging to two classes (e.g. normal and tumor) and determines whether the genes in an a priori defined gene set are correlated with the class distinction.
Reference: Subramanian et al., "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles," PNAS, September 30, 2005.

GSEA Method
1. Rank genes by the correlation between their expression intensities and the class distinction. Genes whose expression differs most between the two classes appear at the top and bottom of the list. The assumption is that genes related to the phenotypic distinction between the classes tend to be found at the top and bottom of the list.
2. Calculate an enrichment score (ES) that reflects the degree to which the genes in a particular gene set are overrepresented at the top or bottom of the entire ranked list.
3. Derive a p-value for the ES to estimate its significance.
4. Adjust the p-value for multiple testing.
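
Steps 1 to 3 can be sketched in stdlib-only Python, following the weighted running-sum statistic of Subramanian et al. with weight p = 1 and the signal-to-noise ratio (the default ranking metric in the Broad's GSEA software) for step 1. This is an illustration, not GeneSpring's implementation: the function names are hypothetical, a small epsilon guards against zero standard deviations, the p-value is a nominal one from phenotype-label permutations, and the normalized ES and FDR used for step 4 are left out.

```python
import random
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """(mean_A - mean_B) / (sd_A + sd_B); a small epsilon avoids division by zero."""
    a = [v for v, lab in zip(values, labels) if lab == "A"]
    b = [v for v, lab in zip(values, labels) if lab == "B"]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b) + 1e-9)


def enrichment_score(ranked, scores, gene_set, p=1.0):
    """Weighted running-sum ES over genes ordered from highest to lowest score.

    Hits step up by |score|**p / (sum over hits of |score|**p); misses step
    down by 1 / (number of misses). ES is the largest deviation from zero.
    """
    hits = [g in gene_set for g in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    norm = sum(abs(scores[g]) ** p for g, h in zip(ranked, hits) if h)
    running, es = 0.0, 0.0
    for g, h in zip(ranked, hits):
        running += abs(scores[g]) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


def gsea(expr, labels, gene_set, n_perm=200, seed=0):
    """Return (ES, nominal p-value) using phenotype-label permutation."""
    def es_for(lab):
        scores = {g: signal_to_noise(v, lab) for g, v in expr.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return enrichment_score(ranked, scores, gene_set)

    observed = es_for(labels)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(es_for(shuffled))
    # Compare against the part of the null with the same sign as the observed ES.
    same_side = [e for e in null if (e >= 0) == (observed >= 0)]
    if not same_side:
        return observed, 0.0  # below the resolution of n_perm permutations
    extreme = sum(abs(e) >= abs(observed) for e in same_side)
    return observed, extreme / len(same_side)
```

Permuting phenotype labels rather than genes keeps gene-gene correlations intact in the null distribution, which is the motivation Subramanian et al. give for that choice.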

Gene Set Enrichment Analyses

Gene Set Enrichment Analyses
How is performing GSEA or GSA on GO gene sets different from doing GO analysis on a list of differentially expressed genes?
- Statistical analysis can miss genes whose changes are small relative to noise but which, as a group, can have a significant impact on the observed phenotypic difference. Use the All Entities list as input for GSEA or GSA.
- Instead of looking only at individual differentially expressed genes, take a genome-wide approach to see whether gene sets are associated with the phenotypic class distinction.
- Enrichment in GO analysis is computed with Fisher's exact test, whereas GSEA uses a running-sum statistic and GSA a gene-set summary statistic (maxmean).
- Any entity list in GeneSpring GX can be specified as a gene set.

Identifiers Necessary for GSEA
The technology must contain gene symbols. To perform GSEA with a custom technology, the annotation file must contain a column (column X) of gene symbols, and column X must be marked "Gene Symbol": select the "Gene Symbol" mark from the drop-down menu while creating the custom technology.

Gene Sets
GSEA/GSA can use either the Broad Institute's lists or any entity list in GeneSpring. The Broad Institute has defined five categories of gene sets:
- C1: grouped by cytogenetic location.
- C2: functional lists; ~1000 gene lists corresponding to pathways or functional processes (genes involved in the same process, e.g. inflammatory response, can appear in the same list).
- C3: regulation lists, grouped by promoter analysis. The genes share the same regulatory motif (the transcription factor may or may not be known); genes that simply share a binding motif are assumed to be co-regulated.
- C4: proximity to known oncogenes and tumor suppressors, e.g. all the neighbors of BRCA.
- C5: GO gene sets. Each GO category is represented as a gene set, except for very broad categories such as Biological Process and categories with fewer than 10 genes.

Key Differences Between GSEA and GSA
The two algorithms share the same idea but differ in how they determine which gene sets are significantly enriched:
- GSA uses the "maxmean" statistic: the average of the positive parts or of the negative parts of the gene scores in the gene set, whichever is larger in absolute value. Efron and Tibshirani show that GSA's approach is often more powerful than the modified Kolmogorov-Smirnov statistic used in GSEA.
- GSA uses a somewhat different null distribution to estimate false discovery rates: it "restandardizes" the genes in addition to permuting the samples (as done in GSEA).
- GSA can also handle more than two conditions, which GSEA cannot.
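
The maxmean statistic for one gene set can be written in a few lines. Following Efron and Tibshirani's definition, the positive and negative parts are each averaged over all genes in the set, so genes on the other side contribute zero. Restandardization and the permutation null are not shown, and the function name is illustrative.

```python
from statistics import mean


def maxmean(gene_scores):
    """GSA maxmean statistic for one gene set.

    pos: mean of max(z, 0) over all genes in the set
    neg: mean of max(-z, 0) over all genes in the set
    Returns pos if it is larger, otherwise -neg.
    """
    pos = mean(max(z, 0.0) for z in gene_scores)
    neg = mean(max(-z, 0.0) for z in gene_scores)
    return pos if pos > neg else -neg


# Strong changes in both directions: a plain mean is 0.1, maxmean is 1.1.
print(maxmean([3.0, -3.0, 2.0, -2.0, 0.5]))
```

Unlike a plain average of gene scores, maxmean does not let large up- and down-regulated genes cancel each other out.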