GeneSpring January 2009 Agilent Bioinformatics & GeneSpring overview GX 10 Guided & Advanced Data Analysis Practice & Discussion GeneSpring GX 10 for Gene Expression Analysis February 2009 Antoni Wandycz Elise Chang Agilent Technologies
Agilent Bioinformatics Suite
  DNA: ‘DNA Analytics’ (ChIP, Methyl, CGH)
  Transcriptome (RNA): ‘GX 10’ (miRNA, QPCR, Exon)
  Proteome (Protein): ‘GX 11’
  Metabolome: ‘GX 11’
  GeneSpring Workgroup: Data storage & Computation; Share & Collaborate
History and Future of GeneSpring GX
  Agilent acquires Silicon Genetics
  Agilent acquires Stratagene
  GX 9: development on avadis platform
  GX 10 (released): GX 7.3 functions; miRNA, Exon, QPCR analysis; Pathway analysis; Support for eArray
  GX 11
GeneSpring GX: Multiple-Platform Compatibility
  Agilent Feature Extraction files (>FE v8.5)
  Affymetrix CEL, CHP
  Illumina BeadStudio (>v 3.1)
  ABI SDS, RQ Manager (for QPCR)
  Custom Formats (ALL 1 & 2-color microarrays)
  .GPR files from AXON Scanners (GenePix software)
GeneSpring GX 10 – Key features
  Guided Workflows
  New Applications: miRNA, QPCR, Exon & more in future
  Project-based organization & Translation-on-the-fly
  Biological Context: Pathway Analysis, GSEA, GO, IPA, etc.
  Customization: Scripting in Jython and R
GX 10 Key features: Guided Workflows
  Pre-determined steps: Normalization, QC, Statistics, GO, Pathways
Project-based organization
GeneSpring GX 10: Translation (Chap 3 in GX 10 manual)
  Comparing Platforms, e.g. Affymetrix vs. Agilent vs. Spotted
  Comparing Species, e.g. Mouse vs. Human, using a Homology Table (NCBI’s HomoloGene)
  Comparing Applications, e.g. Gene Expression & QPCR or miRNA
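
The idea behind cross-species translation can be sketched in a few lines of Python. This is only an illustration of a homology-table lookup, not GeneSpring's implementation: the table contents, the `translate` function, and the entity lists are assumptions made up for the example (real HomoloGene records are keyed by numeric group IDs, which are omitted here).

```python
# Hypothetical homology table: one row per homology group, giving the gene
# symbol used in each species. The rows are illustrative only.
HOMOLOGY_GROUPS = [
    {"human": "TP53", "mouse": "Trp53"},
    {"human": "MYH7", "mouse": "Myh7"},
    {"human": "NPPA", "mouse": "Nppa"},
]


def translate(entity_list, source, target, groups=HOMOLOGY_GROUPS):
    """Map gene symbols from the `source` species to the `target` species.

    Symbols with no entry in the homology table are dropped.
    """
    lookup = {g[source]: g[target] for g in groups if source in g and target in g}
    return [lookup[symbol] for symbol in entity_list if symbol in lookup]


# "Gm12345" is a placeholder symbol with no homolog in the toy table.
mouse_hits = ["Trp53", "Nppa", "Gm12345"]
human_hits = ["TP53", "MYH7"]

translated = translate(mouse_hits, "mouse", "human")
overlap = sorted(set(translated) & set(human_hits))
print(translated)  # ['TP53', 'NPPA']
print(overlap)     # ['TP53']
```

Once both lists are expressed in the same identifiers, they can be compared directly, which is what a cross-species Venn diagram relies on.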
GX 10 Key features: Translation
  Compare platforms, applications, species
  Homology table displayed
Venn Diagram
  Compare experiments from different platforms, applications, & species
GX 10 Key features: Biological Context
  GO Analysis (Function, Process, Location)
  GSEA (Gene Set Enrichment Analysis)
  Pathway Analysis
GeneSpring GX 10: Gene Ontology (GO) Analysis
  Reports the likelihood that your genes of interest fell into a GO category just by chance
  HELP always available
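
As a rough sketch of what "just by chance" means here: a one-sided Fisher's exact test reduces to a hypergeometric tail probability. The function name and the example counts below are assumptions for illustration; exactly how GX 10 computes and corrects its GO p-values is not stated on the slide.

```python
from math import comb


def enrichment_p_value(n_universe, n_category, n_selected, n_overlap):
    """Probability of seeing at least `n_overlap` category genes by chance.

    Models drawing `n_selected` genes at random, without replacement, from
    `n_universe` genes of which `n_category` belong to the GO category.
    """
    total = comb(n_universe, n_selected)
    max_overlap = min(n_category, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Made-up counts: 20 genes on the array, 5 in the category,
# 4 genes of interest, 2 of which fall in the category.
p = enrichment_p_value(n_universe=20, n_category=5, n_selected=4, n_overlap=2)
print(round(p, 4))
```

A small p-value says the overlap is unlikely under random selection. When many GO categories are tested at once, the p-values also need a multiple-testing correction.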
Pathway Analysis in GX 10
Two types of Pathway Analysis in GX 10:
  1. ‘Pathway Analysis’ Tool
     Building networks of related entities
     8 Pathway Interaction Databases and NLP
  2. ‘Find Significant Pathways’ Tool
     Entity-list enrichment with known pathways (Step 8 in Guided Workflow)
     BioPax format pathways (.owl)
Overlay Networks with Expression Data/Conditions
Cellular Location Overlay of Network
‘Find Significant Pathways’ Tool
  Analysis performed on all pathways imported into GX 10
  Are my genes significantly enriched in particular pathways?
  Significant pathways are added to experiment
e-Seminars & Workshops
Recorded Seminars:
  1. Introduction to GX
  2. Analysis of miRNA & GE data
  3. Analysis of QPCR & GE data
  4. Alternative Splicing
  5. Pathway Analysis
Getting Started in GeneSpring GX 10: Affymetrix Files
  Advanced Workflow (To Find Differentially Expressed Genes)
Cardiogenomics dataset: Affymetrix data
  Congestive heart failure (CHF) is a degenerative condition in which the heart no longer functions effectively as a pump.
  The most common cause of CHF is damage to the heart muscle from insufficient oxygen. This is usually due to narrowing of the coronary arteries, which supply blood to the heart.
  Idiopathic cardiomyopathy results in a weakened heart due to an unknown cause.
  Ischemic cardiomyopathy is caused by a lack of oxygen to the heart due to coronary artery disease.
Cardiogenomics dataset: Affymetrix data
  Experimental Goal: To identify the molecular mechanisms underlying congestive heart failure, gene expression profiles were compared between male and female patients with idiopathic, ischemic or non-failing heart conditions.
               Male        Female
  Non-failing  2 samples   2 samples
  Idiopathic   2 samples   2 samples
  Ischemic     2 samples   2 samples
  CEL files generated by Affymetrix GCOS
Experimental Setup in GeneSpring
  SAMPLE  GENDER  CHF ETIOLOGY
  1       Female  Idiopathic
  2       Female  Idiopathic
  3       Male    Idiopathic
  4       Male    Idiopathic
  5       Female  Ischemic
  6       Female  Ischemic
  7       Male    Ischemic
  8       Male    Ischemic
  9       Female  Non-failing
  10      Female  Non-failing
  11      Male    Non-failing
  12      Male    Non-failing
  The selected Interpretation determines how the samples are displayed in the various views and the comparisons that are made in analyses such as statistics.
  Gender Interpretation
    Condition 1: Female (Samples 1, 2, 5, 6, 9, 10)
    Condition 2: Male (Samples 3, 4, 7, 8, 11, 12)
  CHF Etiology Interpretation
    Condition 1: Idiopathic (Samples 1, 2, 3, 4)
    Condition 2: Ischemic (Samples 5, 6, 7, 8)
    Condition 3: Non-failing (Samples 9, 10, 11, 12)
  Gender/CHF Etiology Interpretation
    Condition 1: Female/Idiopathic (Samples 1, 2)
    Condition 2: Male/Idiopathic (Samples 3, 4)
    Condition 3: Female/Ischemic (Samples 5, 6)
    Condition 4: Male/Ischemic (Samples 7, 8)
    Condition 5: Female/Non-failing (Samples 9, 10)
    Condition 6: Male/Non-failing (Samples 11, 12)
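
The three interpretations are simply different groupings of the same 12 samples. The following Python sketch reproduces them from the sample table; the `interpretation` function and the data layout are illustrative assumptions, not part of GeneSpring.

```python
from collections import defaultdict

ETIOLOGIES = ["Idiopathic", "Ischemic", "Non-failing"]
GENDERS = ["Female", "Male"]

# Rebuild the 12-row sample table: two samples per gender/etiology pair,
# numbered in the same order as on the slide.
SAMPLES = {}
sample_id = 1
for etiology in ETIOLOGIES:
    for gender in GENDERS:
        for _ in range(2):
            SAMPLES[sample_id] = {"Gender": gender, "CHF Etiology": etiology}
            sample_id += 1


def interpretation(samples, parameters):
    """Group sample IDs into conditions defined by the chosen parameters."""
    groups = defaultdict(list)
    for sid in sorted(samples):
        condition = "/".join(samples[sid][p] for p in parameters)
        groups[condition].append(sid)
    return dict(groups)


print(interpretation(SAMPLES, ["Gender"]))
print(interpretation(SAMPLES, ["CHF Etiology"]))
print(interpretation(SAMPLES, ["Gender", "CHF Etiology"]))
```

Choosing one parameter gives 2 or 3 conditions with several replicates each. Combining both gives 6 conditions with only 2 replicates each, which matters for the statistics run afterwards.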
GeneSpring GX 10 Vocabulary
  Project: a collection of experiments
  Entity: a gene, probe, probeset, exon, etc.
  Interpretation: samples that are grouped together based on conditions
  Technology: a file containing information on array design and biological information (annotation) for all the entities on the array
  Biological Genome: a collection of all major annotations (NCBI) for any organism; essential for Generic/Custom arrays lacking annotations
Getting Started in GeneSpring
  Cardiogenomics Experiment: Transcriptional profiling to learn more about molecular mechanisms underlying Congestive Heart Failure (CHF)
  Sample Data: Myocardial samples from patients with normal hearts and with Ischemic & Idiopathic cardiomyopathies (3 Etiologies)
  Variables: Gender (2) and Etiology (3)
  Technology: Affymetrix U133Plus2 array
Getting Started: Create New Project
  From Startup screen OR from File/New Project
Getting Started with Advanced Analysis
  Experiment Type: Affymetrix Expression (3 Affy choices!)
  Workflow Type: Advanced Analysis
Select Data for Experiment
  Select ‘Choose Files’ to load data files found on your computer.
  Note: the ‘Choose Samples’ option is used when creating experiments with samples already loaded into GX 10.
Sample Upload
Summarization Algorithms in GX 10 for CEL Files
  Summarization of Affymetrix probes and baseline transformation of probeset values.
Summarization algorithms in GX 10
  ALGORITHM  BACKGROUND SUBTRACTION  NORMALIZATION  PROBE SUMMARIZATION
  RMA        PM based                Quantile       Log (PM)
  MAS5       PM-MM based             Scaling        One-step Tukey Biweight
  PLIER      PM-MM based             Quantile       Log (PM)
  LiWong     PM-MM based             Quantile       Linear (PM)
  GCRMA      PM-MM based             Quantile       Log (PM)
  In addition to different calculations, the algorithms differ in the order in which Normalization and Summarization are performed.
Preprocessing of Affymetrix Arrays
  CEL files are the raw data files that contain signal values for individual probes.
  CEL files are preprocessed to generate one value per probeset.
  Preprocessing steps are:
  1. Background subtraction
  2. Normalization
  3. Summarization of probeset values
  Different preprocessing algorithms are available.
  [Diagram: Array, Hybridization & Scanning, DAT File, Image Analysis, CEL File; CEL File + CDF File give CHP (GCOS, AGCC)]
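
To make the normalization and summarization steps concrete, here is a toy Python sketch. It is not RMA, MAS5, PLIER, LiWong or GCRMA: it skips background subtraction, breaks ties by position, and summarizes with a plain median of log2 intensities. The function names, probe values, and probeset layout are all assumptions for illustration.

```python
from math import log2
from statistics import mean, median


def quantile_normalize(arrays):
    """Give every array (a list of probe intensities) the same distribution.

    The k-th smallest value on each array is replaced by the mean of the
    k-th smallest values across all arrays. Ties are broken by position.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(column) for column in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


def summarize(array, probesets):
    """Collapse probes to one value per probeset (median of log2 values)."""
    return {
        name: median(log2(array[i]) for i in indices)
        for name, indices in probesets.items()
    }


# Two made-up arrays with four probes each; probes 0-1 and 2-3 form probesets.
arrays = [[2.0, 4.0, 8.0, 16.0], [32.0, 8.0, 4.0, 64.0]]
probesets = {"probeset_a": [0, 1], "probeset_b": [2, 3]}

normalized = quantile_normalize(arrays)
print(normalized)
print([summarize(a, probesets) for a in normalized])
```

After quantile normalization both arrays contain exactly the same set of values, only assigned to different probes, which is why box plots of quantile-normalized data line up.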
BoxWhisker plot: Summary of Normalized Intensities
Advanced Workflow: Experiment Setup
  Experiment Grouping
  Specify parameters/conditions
Experiment Grouping
  The experimental parameters are added in this window. For each array, the particular parameter value (condition) is also specified.
  Values can be added manually or loaded from a saved file (circled in red).
Advanced Workflow: Experiment Setup
  Create Interpretation
  In the Guided Workflow, only one interpretation is provided automatically. Here, users can create multiple interpretations.
Grouping and Interpretation
  2 experimental variables: CHF Etiology and Gender
  For this experiment, 3 interpretations could be created:
  1) Gender
  2) CHF Etiology (Ischemic, Idiopathic, Non-failing)
  3) CHF Etiology and Gender: This interpretation is automatically created in the Guided Workflow.
  Example: Gender Only
Creating Interpretations: step 2 of 3
Creating Interpretations: step 3 of 3
Advanced Analysis Workflow: Quality Control
  QC on Samples and Probes is performed automatically in the Guided Workflow.
  Users can specify settings beyond those available in the Guided Workflow.
Quality Control on Samples
Filter by Expression
Advanced Analysis Workflow: Analysis
  Statistical Analysis
  Filter on Volcano Plot (both Stats and Fold Change)
  Fold Change
  Clustering
  Find Similar Entities
  Filter on Parameters
  PCA
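
A volcano-plot filter keeps only entities that pass a statistical cut-off and a fold-change cut-off at the same time. The Python sketch below illustrates that logic; the function name, the default cut-offs, and the three example probes (including their p-values, which are assumed to have been computed by a separate test) are made up for illustration and are not GeneSpring defaults.

```python
from math import log2
from statistics import mean


def volcano_filter(entities, p_cutoff=0.05, fold_cutoff=2.0):
    """Keep entities that pass BOTH the p-value and the fold-change cut-off.

    `entities` maps a name to (group_a_values, group_b_values, p_value).
    Expression values are assumed to be on a log2 scale, so the difference
    of group means is the log2 fold change.
    """
    passed = {}
    for name, (group_a, group_b, p_value) in entities.items():
        log_fc = mean(group_b) - mean(group_a)
        if p_value <= p_cutoff and abs(log_fc) >= log2(fold_cutoff):
            passed[name] = (log_fc, p_value)
    return passed


# Hypothetical log2 intensities and p-values for three probes.
entities = {
    "probe_1": ([7.0, 7.2], [9.1, 9.3], 0.003),  # large change, significant
    "probe_2": ([8.0, 8.1], [8.2, 8.3], 0.010),  # significant, small change
    "probe_3": ([5.0, 9.0], [9.0, 9.5], 0.300),  # large change, not significant
}

passed = volcano_filter(entities)
print(sorted(passed))  # ['probe_1']
```

Filtering on fold change alone would keep the noisy probe_3, and filtering on the p-value alone would keep the tiny change in probe_2. Requiring both removes each kind of false lead.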
Getting Started with Guided Workflow
  Experiment Type: Agilent Single-color
  Workflow Type: Guided Workflow
Sample Upload
BoxWhisker plot: Summary of Normalized Intensities
GeneSpring GX 10: Important Menu options
  Project: Import/Export project zip
  Tools: Script Editor / R Editor; Import BioPAX pathways; GS7 data migration; Options…
  Annotations: Update Technology Annotations; Create Biological Genome; Update Pathway Interactions
  Help: License Manager; Update Product
Pathway Analysis
To use the ‘Find Significant Pathways’ Tool:
  1. Download BioPax format (.owl) pathways to your computer.
  2. Import the .owl pathways into GX 10 using the ‘Import BioPax pathways’ option in the Tools menu.
  3. From the Workflows menu (in the right margin of GX 10), select ‘Find Significant Pathways’ and choose your Entity List of interest.
Performing Pathway Analysis in GX 10:
  1. In the Annotations menu, select ‘Update Pathway Interactions’ from Agilent Server.
  2. Before you can choose an organism, GX 10 must first create a Pathway Database Infrastructure. This may take >10 minutes.
  3. Once the Infrastructure database is complete, go back to Annotations/Update Pathway Interactions and choose your preferred organism. This may take >20 minutes.
  4. From the Workflows menu (in the right margin of GX 10), select ‘Pathway Analysis’ to begin building networks.
Updating Annotations: Chap 3 in GX 10 pdf manual, pg. 51
  Option 1: Update from Agilent Server
  Option 2: Update from file
  Option 3 (new in GX 10): Update directly from NCBI from within GX (Biological Genome)
GeneSpring GX 10: Reference pages in Manual
  Creating/Updating Technologies & Annotations: Chapter 3 in GX 10 pdf manual, pg. 51. From 1) Agilent server; 2) file; 3) NCBI (Biological Genome)
  GS7 to GS10 Data Migration: Chapter 4 in GX 10 manual, pg. 71, and in Quick Start Guide
  Translation: Chapter 3.3 in pdf manual, pg. 63
Thank you
  Technical Support 24 hours/5 days per week (option 6, 2) (Genomics)
Automated GX 7 Migration Tool (Chapter 4 in GX 10 manual)
  Step 1: Prepare for GS7 Migration. The tool automatically prepares data for migration (for a large number of samples, this step takes time).
  Step 2: Select the GS7 genome to migrate to GS10. All experiments, samples, interpretations, gene lists, trees, parameter values, condition values, and classifications will be migrated automatically.
  Step 3: Open the project whose name corresponds to the GX 7 genome to see the migrated data. Note that if the genome was assigned to a project in GX 7, the GX 10 project takes that project's name instead of the name of the GX 7 genome.
GX 10: Biological Context
  GO Analysis (Function, Process, Location)
  GSEA (Gene Set Enrichment Analysis)
  GSA (Gene Set Analysis)
  Pathway Analysis (Interaction DB)
  Find Similar Entity Lists
  Find Significant Pathways (BioPax.org)
  Link to Ingenuity’s IPA
  NLP (mine literature)
GSEA
  GSEA interrogates genome-wide expression profiles from samples belonging to two different classes (e.g. normal and tumor) and determines whether genes in an a priori defined gene set correlate with class distinction.
  Reference: Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. September 30, 2005,
GSEA Method
  1. Rank genes based on the correlation between their expression intensities and class distinction.
     Genes that differ most in their expression between the two classes will appear at the top and bottom of the list.
     The assumption is that genes related to the phenotypic distinction of the classes will tend to be found at the top and bottom of the list.
  2. Calculate an enrichment score (ES) to reflect the degree of overrepresentation of genes in a particular gene set at the top and bottom of the entire ranked list.
  3. Derive a p-value for the ES to estimate its significance level.
  4. Adjust the p-value for multiple testing.
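
Step 2 can be illustrated with a short Python sketch of a weighted running-sum enrichment score in the spirit of the Subramanian et al. method. It is a simplified illustration, not the code used by GeneSpring or by the Broad Institute software; the function name, the `weight` parameter, and the six-gene example are assumptions.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score (ES) for one gene set.

    ranked_genes: genes ordered from the most positive to the most negative
                  correlation with the class distinction.
    scores:       the correlation of each gene, in the same order.
    Walking down the list, the sum rises at genes in the set (in proportion
    to |score| ** weight) and falls at genes outside it. The ES is the
    largest deviation from zero reached along the way.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if hit else 0.0
                   for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for w, hit in zip(hit_weights, in_set):
        running += w / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Made-up ranked list of six genes with their correlation scores.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

print(enrichment_score(genes, scores, {"g1", "g2"}))  # set at the top
print(enrichment_score(genes, scores, {"g5", "g6"}))  # set at the bottom
```

A set concentrated at the top of the list gets a strongly positive ES and one concentrated at the bottom a strongly negative ES. For step 3, the same score would be recomputed many times after permuting the sample class labels to build a null distribution.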
Gene Set Enrichment Analyses
Gene Set Enrichment Analyses
How is performing GSEA or GSA on GO gene sets different from doing GO Analysis on a list of differentially expressed genes?
  Statistical analysis can miss genes with small changes relative to noise that, as a group, can have a significant impact on the observed difference in phenotype.
    – Use the All Entities list as input for GSEA or GSA.
  Instead of looking only at individual differentially expressed genes, take a genome-wide approach to see whether gene sets are associated with the phenotypic class distinction.
    – Enrichment in GO Analysis is done with Fisher’s Exact test, while GSEA uses a type of running-sum statistic and GSA uses the maxmean statistic.
  Users can specify any Entity List as a gene set in GeneSpring GX.
Identifiers Necessary for GSEA
  The Technology must contain Gene Symbol.
  To perform GSEA with a custom technology:
    – The annotation file must contain a column (Column X) containing Gene Symbol.
    – Column X must be marked “Gene Symbol”: select the “Gene Symbol” mark from the drop-down menu while creating the Custom technology.
Gene Sets
  GSEA/GSA can use either Broad lists or any Entity Lists in GeneSpring.
  The Broad Institute has defined five categories of gene sets:
  C1: Grouped by cytogenetic location.
  C2: Functional lists. ~1000 gene lists corresponding to pathways or functional processes (for example, genes that are both involved in inflammatory response can be in the same list).
  C3: Regulation lists. Grouped by promoter analysis: genes regulated by the same motif (the transcription factor may or may not be known). Includes cases where genes simply share the same binding motif and are therefore assumed to be co-regulated.
  C4: Proximity to known oncogenes and tumor suppressors. For example, all the neighbors of BRCA.
  C5: GO gene sets. Each category is represented as a gene set, except for very broad categories such as Biological Process and categories with fewer than 10 genes.
Key Differences Between GSEA and GSA
  The two algorithms share the same idea but differ in how they determine which gene sets are significantly enriched.
  GSA uses the "maxmean" statistic: the mean of the positive or negative part of the gene scores in the gene set, whichever is larger in absolute value. Efron and Tibshirani show that the method used in GSA is often more powerful than the modified Kolmogorov-Smirnov statistic used in GSEA.
  GSA uses a somewhat different null distribution for estimating false discovery rates: it "restandardizes" the genes, in addition to permuting the samples (as done in GSEA).
  GSA can also handle more than two conditions (a limitation in GSEA).
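
The maxmean statistic is simple enough to write out. The Python sketch below follows the description above: both the positive and the negative parts are averaged over all genes in the set. Returning the negative-part mean with a minus sign is a convention chosen here for readability, and the function name and example scores are assumptions, not taken from the GSA software.

```python
def maxmean(gene_scores):
    """Maxmean statistic for the scores of the genes in one gene set.

    The positive parts and the negative parts are each averaged over ALL
    genes in the set; the larger of the two (in absolute value) is returned,
    signed to show its direction.
    """
    n = len(gene_scores)
    if n == 0:
        raise ValueError("gene set is empty")
    positive_mean = sum(s for s in gene_scores if s > 0) / n
    negative_mean = sum(-s for s in gene_scores if s < 0) / n
    return positive_mean if positive_mean >= negative_mean else -negative_mean


# Made-up per-gene scores for two small gene sets.
mostly_up = [2.0, 1.0, -0.5, -0.5]
mostly_down = [0.5, -3.0, -1.0, 0.5]

print(maxmean(mostly_up))    # 0.75
print(maxmean(mostly_down))  # -1.0
```

Compared with a plain mean (0.5 for the first set), maxmean is less diluted by genes moving in the opposite direction, so it can pick up sets in which a subset of genes shifts strongly one way.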