
1 GeneSpring GX 10 for Gene Expression Analysis
GeneSpring January 2009
- Agilent Bioinformatics & GeneSpring overview
- GX 10 Guided & Advanced Data Analysis
- Practice & Discussion
February 2009
Antoni Wandycz, Elise Chang
Agilent Technologies

2 Agilent Bioinformatics Suite
- Transcriptome: GX 10 (miRNA, QPCR, Exon)
- Metabolome: GX 11
- Proteome: GX 11
- DNA: DNA Analytics (ChIP, Methyl, CGH)
- GeneSpring Workgroup: data storage & computation; share & collaborate

3 History and Future of GeneSpring (2004 to 2009)
- Agilent acquires Silicon Genetics
- GX 7.3.1 released
- Agilent acquires Stratagene
- GX 9: development on the Avadis platform
- GX 10: GX 7.3 functions; miRNA, Exon, and QPCR analysis; pathway analysis; support for eArray
- GX 11

4 GeneSpring GX: Multiple-Platform Compatibility
- Agilent Feature Extraction files (> FE v8.5)
- Affymetrix CEL, CHP
- Illumina BeadStudio (> v3.1)
- ABI SDS, RQ Manager (for QPCR)
- Custom formats (all 1- & 2-color microarrays)
- .GPR files from Axon scanners (GenePix software)

5 GeneSpring GX 10: Key features
- Guided Workflows
- New applications: miRNA, QPCR, Exon, and more in future
- Project-based organization & translation-on-the-fly
- Biological context: Pathway Analysis, GSEA, GO, IPA, etc.
- Customization: scripting in Jython and R

6 GX 10 Key features: Guided Workflows
Pre-determined steps: Normalization, QC, Statistics, GO, Pathways

7 Project-based organization

8 GeneSpring GX 10: Translation (Chapter 3 in GX 10 manual)
- Comparing platforms, e.g. Affymetrix vs. Agilent vs. spotted arrays
- Comparing species, e.g. mouse vs. human, using a homology table (NCBI's HomoloGene)
- Comparing applications, e.g. gene expression & QPCR or miRNA
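
Translation between species can be pictured as a lookup through a homology table. The sketch below is a minimal illustration, not GeneSpring's implementation: the three-entry mouse-to-human table is a hypothetical excerpt standing in for NCBI HomoloGene, and `translate` is an illustrative helper.

```python
# Hypothetical excerpt of a mouse -> human homology table (HomoloGene-style).
# A real analysis would load the full table rather than hard-code it.
HOMOLOGY = {
    "Trp53": "TP53",
    "Brca1": "BRCA1",
    "Myh7": "MYH7",
}


def translate(entities, homology):
    """Map entity IDs from one species to another via a homology table.

    Returns (translated, unmapped): `translated` keeps input order without
    duplicates; `unmapped` lists IDs that have no entry in the table.
    """
    translated, unmapped, seen = [], [], set()
    for entity in entities:
        target = homology.get(entity)
        if target is None:
            unmapped.append(entity)
        elif target not in seen:
            seen.add(target)
            translated.append(target)
    return translated, unmapped


human, missing = translate(["Trp53", "Myh7", "Xist"], HOMOLOGY)
print(human)    # ['TP53', 'MYH7']
print(missing)  # ['Xist']
```

Entities without a homolog are reported rather than silently dropped, so you can see how much of a list survives translation.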

9 GX 10 Key features: Translation
Compare platforms, applications, and species. Homology table displayed.

10 Venn Diagram
Compare experiments from different platforms, applications, & species.
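
Once two experiments share an identifier space (for example, after translation), the regions of a two-set Venn diagram are ordinary set operations. A minimal sketch; `venn_regions` is a hypothetical helper and the gene lists are made up for illustration.

```python
def venn_regions(list_a, list_b):
    """Split two entity lists into the three regions of a 2-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {
        "only_a": sorted(a - b),
        "both": sorted(a & b),
        "only_b": sorted(b - a),
    }


# Hypothetical entity lists from two experiments, already mapped to gene symbols.
list_1 = ["NPPA", "NPPB", "MYH7", "ATP2A2"]
list_2 = ["NPPA", "MYH7", "SERPINA3"]
regions = venn_regions(list_1, list_2)
print(regions["both"])  # ['MYH7', 'NPPA']
```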

11 GX 10 Key features: Biological Context
- GO Analysis (Function, Process, Location)
- GSEA (Gene Set Enrichment Analysis)
- Pathway Analysis

12 GeneSpring GX 10: Gene Ontology (GO) Analysis
GO analysis estimates the likelihood that your genes of interest fell into a GO category just by chance. Help is always available.

13 Pathway Analysis in GX 10
Two types of pathway analysis in GX 10:
1. ‘Pathway Analysis’ tool: builds networks of related entities from 8 pathway interaction databases and NLP
2. ‘Find Significant Pathways’ tool: entity-list enrichment with known pathways (Step 8 in the Guided Workflow), using BioPAX-format pathways (.owl)

14 Overlay Networks with Expression Data/Conditions

15 Cellular Location Overlay of Network

16 ‘Find Significant Pathways’ Tool
- Analysis is performed on all pathways imported into GX 10
- Are my genes significantly enriched in particular pathways?
- Significant pathways are added to the experiment

17 e-Seminars & Workshops
www.genespring.com
Recorded seminars:
1. Introduction to GX 10
2. Analysis of miRNA & GE data
3. Analysis of QPCR & GE data
4. Alternative Splicing
5. Pathway Analysis

18 Getting Started in GeneSpring GX 10: Affymetrix Files
Advanced Workflow (to find differentially expressed genes)

19 Cardiogenomics dataset: Affymetrix data
Congestive heart failure (CHF) is a degenerative condition in which the heart no longer functions effectively as a pump. The most common cause of CHF is damage to the heart muscle from an insufficient oxygen supply, usually due to narrowing of the coronary arteries that carry blood to the heart. Idiopathic cardiomyopathy weakens the heart for an unknown reason. Ischemic cardiomyopathy is caused by a lack of oxygen to the heart due to coronary artery disease.

20 Cardiogenomics dataset: Affymetrix data
Experimental goal: to identify the molecular mechanisms underlying congestive heart failure. Gene expression profiles were compared between male and female patients with idiopathic, ischemic, or non-failing heart conditions.

             Male       Female
Non-failing  2 samples  2 samples
Idiopathic   2 samples  2 samples
Ischemic     2 samples  2 samples

CEL files were generated by Affymetrix GCOS.

21 Experimental Setup in GeneSpring

Sample  Gender  CHF Etiology
1       Female  Idiopathic
2       Female  Idiopathic
3       Male    Idiopathic
4       Male    Idiopathic
5       Female  Ischemic
6       Female  Ischemic
7       Male    Ischemic
8       Male    Ischemic
9       Female  Non-failing
10      Female  Non-failing
11      Male    Non-failing
12      Male    Non-failing

The selected interpretation determines how the samples are displayed in the various views and which comparisons are made in analyses such as statistics.

Gender interpretation
- Condition 1: Female (samples 1, 2, 5, 6, 9, 10)
- Condition 2: Male (samples 3, 4, 7, 8, 11, 12)

CHF Etiology interpretation
- Condition 1: Idiopathic (samples 1, 2, 3, 4)
- Condition 2: Ischemic (samples 5, 6, 7, 8)
- Condition 3: Non-failing (samples 9, 10, 11, 12)

Gender/CHF Etiology interpretation
- Condition 1: Female/Idiopathic (samples 1, 2)
- Condition 2: Male/Idiopathic (samples 3, 4)
- Condition 3: Female/Ischemic (samples 5, 6)
- Condition 4: Male/Ischemic (samples 7, 8)
- Condition 5: Female/Non-failing (samples 9, 10)
- Condition 6: Male/Non-failing (samples 11, 12)
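
An interpretation behaves like a group-by over the sample attributes. A minimal sketch using the sample table on this slide; `interpretation` is a hypothetical helper, and the "/" naming simply mirrors the condition labels shown here.

```python
# Sample annotations from the table: sample -> (gender, CHF etiology).
PARAMETERS = ("Gender", "CHF Etiology")
SAMPLES = {
    1: ("Female", "Idiopathic"), 2: ("Female", "Idiopathic"),
    3: ("Male", "Idiopathic"), 4: ("Male", "Idiopathic"),
    5: ("Female", "Ischemic"), 6: ("Female", "Ischemic"),
    7: ("Male", "Ischemic"), 8: ("Male", "Ischemic"),
    9: ("Female", "Non-failing"), 10: ("Female", "Non-failing"),
    11: ("Male", "Non-failing"), 12: ("Male", "Non-failing"),
}


def interpretation(samples, parameters, chosen):
    """Group samples into conditions using only the chosen parameters.

    Condition names join the chosen values with "/", e.g. "Female/Idiopathic".
    Conditions appear in order of their lowest-numbered sample.
    """
    columns = [parameters.index(name) for name in chosen]
    conditions = {}
    for sample in sorted(samples):
        name = "/".join(samples[sample][c] for c in columns)
        conditions.setdefault(name, []).append(sample)
    return conditions


print(interpretation(SAMPLES, PARAMETERS, ["Gender"]))
# {'Female': [1, 2, 5, 6, 9, 10], 'Male': [3, 4, 7, 8, 11, 12]}
```

Choosing both parameters yields the six conditions of the Gender/CHF Etiology interpretation, in the same order as listed above.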

22 GeneSpring GX 10 Vocabulary
- Project: a collection of experiments
- Entity: a gene, probe, probeset, exon, etc.
- Interpretation: a grouping of samples based on their conditions
- Technology: a file containing the array design and biological information (annotation) for all the entities on the array
- Biological Genome: a collection of all major annotations (from NCBI) for an organism; essential for generic/custom arrays that lack annotations

23 Getting Started in GeneSpring
- Cardiogenomics experiment: transcriptional profiling to learn more about the molecular mechanisms underlying congestive heart failure (CHF)
- Sample data: myocardial samples from patients with normal hearts and with ischemic or idiopathic cardiomyopathy (3 etiologies)
- Variables: gender (2) and etiology (3)
- Technology: Affymetrix U133Plus2 array

24 Getting Started: Create New Project
Create a new project from the Startup screen or from File/New Project.

25 Getting Started with Advanced Analysis
- Experiment type: Affymetrix Expression (there are 3 Affymetrix choices)
- Workflow type: Advanced Analysis

26 Select Data for Experiment
Select ‘Choose Files’ to load data files from your computer.
Note: the ‘Choose Samples’ option is used when creating experiments with samples already loaded into GX 10.

27 Sample Upload

28 Summarization Algorithms in GX 10 for CEL Files
Summarization of Affymetrix probes and baseline transformation of probeset values.
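
Baseline transformation re-expresses each probeset relative to a reference level. A minimal sketch, assuming log2-scale values and assuming the reference is the probeset's median, either across all samples or across a chosen set of control samples; `baseline_to_median` and its `control` argument are hypothetical names.

```python
from statistics import median


def baseline_to_median(log2_values, control=None):
    """Subtract a per-probeset median from log2 signals.

    log2_values: dict probeset -> list of log2 signals, one per sample.
    control: optional list of sample indices whose median is used as the
             baseline (e.g. non-failing hearts); default is all samples.
    After the transformation each value reads as a log2 ratio to the baseline.
    """
    result = {}
    for probeset, values in log2_values.items():
        reference = values if control is None else [values[i] for i in control]
        baseline = median(reference)
        result[probeset] = [v - baseline for v in values]
    return result


print(baseline_to_median({"ps1": [7.0, 8.0, 10.0]}))
# {'ps1': [-1.0, 0.0, 2.0]}
```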

29 Summarization algorithms in GX 10

Algorithm  Background subtraction  Normalization  Probe summarization
RMA        PM based                Quantile       Log (PM)
MAS5       PM-MM based             Scaling        One-step Tukey Biweight
PLIER      PM-MM based             Quantile       Log (PM)
LiWong     PM-MM based             Quantile       Linear (PM)
GCRMA      PM-MM based             Quantile       Log (PM)

In addition to using different calculations, the algorithms differ in the order in which normalization and summarization are performed.
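
Quantile normalization, listed for RMA, PLIER, LiWong, and GCRMA, forces every array to share one intensity distribution. A minimal sketch, not GeneSpring's code, assuming equal-length, already background-corrected samples; ties are broken by position, which real implementations may handle differently.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (lists of probe intensities).

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so every sample ends up with the same distribution.
    Ties are broken by position; real implementations may average tied ranks.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")
    # For each sample, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=s.__getitem__) for s in samples]
    # Mean intensity at each rank across all samples.
    rank_means = [
        sum(s[order[r]] for s, order in zip(samples, orders)) / len(samples)
        for r in range(n)
    ]
    normalized = []
    for order in orders:
        values = [0.0] * n
        for rank, probe in enumerate(order):
            values[probe] = rank_means[rank]
        normalized.append(values)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```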

30 Preprocessing of Affymetrix Arrays
- CEL files are the raw data files that contain signal values for individual probes.
- CEL files are preprocessed to generate one value per probeset. The preprocessing steps are:
  1. Background subtraction
  2. Normalization
  3. Summarization of probeset values
- Different preprocessing algorithms are available.
Figure: array → hybridization & scanning → DAT file → image analysis → CEL file + CDF file → CHP file (GCOS, AGCC)
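
Step 3, summarization, collapses many probe values into one probeset value. For MAS5 the summary is a one-step Tukey biweight. A minimal sketch, assuming the tuning constants c = 5 and epsilon = 0.0001 given in Affymetrix's description of the MAS5 signal algorithm; in MAS5 the inputs would be log2 PM values after mismatch adjustment, which is not shown here.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a robust, outlier-resistant mean.

    Each value is weighted by its distance from the median, measured in
    units of the median absolute deviation (MAD); values roughly more than
    c MADs away get weight 0.
    """
    m = median(values)
    mad = median(abs(v - m) for v in values)
    weights = []
    for v in values:
        u = (v - m) / (c * mad + epsilon)
        weights.append((1 - u * u) ** 2 if abs(u) <= 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# One outlying probe barely moves the estimate (the plain mean would be 22.0).
print(round(tukey_biweight([1.0, 2.0, 3.0, 4.0, 100.0]), 2))  # 2.6
```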

31 BoxWhisker plot: Summary of Normalized Intensities

32 Advanced Workflow: Experiment Setup
Experiment Grouping: specify parameters/conditions

33 Experiment Grouping
- The experimental parameters are added in this window.
- For each array, the particular parameter value (condition) is also specified.
- Values can be added manually or loaded from a saved file (circled in red).

34 Advanced Workflow: Experiment Setup, Create Interpretation
- In the Guided Workflow, only one interpretation is provided automatically.
- Here, users can create multiple interpretations.

35 Grouping and Interpretation
Two experimental variables: CHF Etiology and Gender. For this experiment, 3 interpretations could be created:
1) Gender
2) CHF Etiology (Ischemic, Idiopathic, Non-failing)
3) CHF Etiology and Gender: this interpretation is created automatically in the Guided Workflow.
Example shown: Gender only

36 Creating Interpretations: step 2 of 3

37 Creating Interpretations: step 3 of 3

38 Advanced Analysis Workflow: Quality Control
- QC on samples and probes is performed automatically in the Guided Workflow.
- Users can specify settings beyond those available in the Guided Workflow.

39 Quality Control on Samples

40 Filter by Expression
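
The slide names this step without giving parameters. One common approach, assumed here rather than taken from the slide, keeps entities whose raw signal falls within a percentile band of its own sample (for example the 20th to 100th percentile) in at least a minimum number of samples. A minimal sketch; the helper names, defaults, and data are hypothetical.

```python
def percentile(values, p):
    """Percentile with linear interpolation between order statistics (p in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def filter_by_expression(data, lower=20.0, upper=100.0, min_samples=1):
    """Keep entities whose signal lies in the [lower, upper] percentile band
    of its own sample in at least `min_samples` samples.

    data: dict entity -> list of raw signals, one per sample, same sample order.
    """
    n_samples = len(next(iter(data.values())))
    bands = []
    for j in range(n_samples):
        column = [values[j] for values in data.values()]
        bands.append((percentile(column, lower), percentile(column, upper)))
    return [
        entity
        for entity, values in data.items()
        if sum(lo <= v <= hi for v, (lo, hi) in zip(values, bands)) >= min_samples
    ]


# Hypothetical raw signals for 5 entities in 2 samples.
data = {"e1": [10, 12], "e2": [50, 55], "e3": [200, 180], "e4": [1, 2], "e5": [80, 90]}
print(filter_by_expression(data))  # ['e1', 'e2', 'e3', 'e5']
```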

41 Advanced Analysis Workflow: Analysis
- Statistical Analysis
- Filter on Volcano Plot (both statistics and fold change)
- Fold Change
- Clustering
- Find Similar Entities
- Filter on Parameters
- PCA
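
Filtering on a volcano plot keeps only entities that pass both a statistical cutoff and a fold-change cutoff. A minimal sketch, assuming log2-scale signals and p-values already produced by a separate statistical test; the cutoffs (2-fold, p ≤ 0.05), the helper names, and the entity values are hypothetical.

```python
from math import log2
from statistics import mean


def log2_fold_change(group_a, group_b):
    """log2(B/A) from log2-scale replicate signals: difference of group means."""
    return mean(group_b) - mean(group_a)


def volcano_filter(results, min_fold=2.0, max_p=0.05):
    """Return entities passing BOTH cutoffs, as on a volcano plot.

    results: dict entity -> (log2 fold change, p-value). The p-values are
    assumed to come from a prior test (e.g. a t-test between conditions).
    """
    min_log2 = log2(min_fold)
    return sorted(
        entity
        for entity, (lfc, p) in results.items()
        if abs(lfc) >= min_log2 and p <= max_p
    )


# Hypothetical entities and values, for illustration only.
results = {
    "probe_1": (log2_fold_change([7.0, 7.2], [9.0, 9.4]), 0.001),
    "probe_2": (-1.4, 0.01),   # down-regulated, passes both cutoffs
    "probe_3": (0.1, 0.60),    # fails both cutoffs
    "probe_4": (2.5, 0.20),    # large change but not significant
}
print(volcano_filter(results))  # ['probe_1', 'probe_2']
```

Requiring both cutoffs discards large but noisy changes (probe_4) as well as significant but tiny ones.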

42 Getting Started with Guided Workflow
- Experiment type: Agilent Single-color
- Workflow type: Guided Workflow

43 Sample Upload

44 BoxWhisker plot: Summary of Normalized Intensities

45 GeneSpring GX 10: Important Menu options
- Project: Import/Export project zip
- Tools: Script Editor/R Editor; Import BioPAX pathways; GS7 data migration; Options…
- Annotations: Update Technology Annotations; Create Biological Genome; Update Pathway Interactions
- Help: License Manager; Update Product

46 Pathway Analysis
To use the ‘Find Significant Pathways’ tool:
1. Download BioPAX-format (.owl) pathways from www.biopax.org to your computer.
2. Import the .owl pathways into GX 10 from Tools, using the ‘Import BioPAX pathways’ option.
3. From the Workflows menu (in the right margin of GX 10), select ‘Find Significant Pathways’ and choose your Entity List of interest.

47 Performing Pathway Analysis in GX 10
1. In the Annotations menu, select ‘Update Pathway Interactions’ from the Agilent server.
2. Before you can choose an organism, GX 10 must first create a pathway database infrastructure. This may take more than 10 minutes.
3. Once the infrastructure database is complete, go back to Annotations/Update Pathway Interactions and choose your preferred organism. This may take more than 20 minutes.
4. From the Workflows menu (in the right margin of GX 10), select ‘Pathway Analysis’ to begin building networks.

48 Updating Annotations (Chapter 3 in GX 10 pdf manual, pg. 51)
- Option 1: Update from Agilent server
- Option 2: Update from file
- Option 3 (new in GX 10): Update directly from NCBI within GX (Biological Genome)

49 GeneSpring GX 10: Reference pages in Manual
- Creating/Updating Technologies & Annotations: Chapter 3 in GX 10 pdf manual, pg. 51. Sources: 1) Agilent server; 2) file; 3) NCBI (Biological Genome)
- GS7 to GS10 Data Migration: Chapter 4 in GX 10 manual, pg. 71, and in the Quick Start Guide
- Translation: Chapter 3.3 in pdf manual, pg. 63

50 Thank you
www.genespring.com
Technical Support (24 hours/5 days per week): informatics_support@agilent.com, 1-800-227-9770 (option 6, 2)
elise_chang@agilent.com
antoni.wandycz@agilent.com
chris_gates@agilent.com (Genomics)

51 Automated GX 7 Migration Tool (Chapter 4 in GX 10 manual)
Step 1: Prepare for GS7 Migration. The tool automatically prepares data for migration (with a large number of samples, this step takes time).
Step 2: Select the GS7 genome to migrate to GS10. All experiments, samples, interpretations, gene lists, trees, parameter values, condition values, and classifications are migrated automatically.
Step 3: Open the project whose name corresponds to the GX 7 genome to see the migrated data. Note that if the genome was assigned to a project in GX 7, the GX 10 project takes that project's name instead of the GX 7 genome name.

52 GX 10: Biological Context
- GO Analysis (Function, Process, Location)
- GSEA (Gene Set Enrichment Analysis)
- GSA (Gene Set Analysis)
- Pathway Analysis (Interaction DB)
- Find Similar Entity Lists
- Find Significant Pathways (BioPAX.org)
- Link to Ingenuity's IPA
- NLP (mine literature)

53 GSEA
GSEA interrogates genome-wide expression profiles from samples belonging to two different classes (e.g. normal and tumor) and determines whether the genes in an a priori defined gene set correlate with the class distinction.
Reference: Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. September 30, 2005, 10.1073

54 GSEA Method
1. Rank genes by the correlation between their expression intensities and the class distinction. Genes that differ most in expression between the two classes appear at the top and bottom of the list; the assumption is that genes related to the phenotypic distinction between the classes tend to be found there.
2. Calculate an enrichment score (ES) reflecting the degree to which genes in a particular gene set are overrepresented at the top or bottom of the entire ranked list.
3. Derive a p-value for the ES to estimate its significance level.
4. Adjust the p-value for multiple testing.
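
Step 2 can be sketched as the weighted running sum described by Subramanian et al.: walking down the ranked list, the sum rises at genes in the set (weighted by |correlation| to the power p, with p = 1 as the standard weight) and falls by a constant at genes outside it; the ES is the maximum deviation from zero. A minimal sketch only: steps 3 and 4 (permutation p-values, multiple-testing adjustment) are omitted, and the ranked list is hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA running-sum enrichment score (after Subramanian et al. 2005).

    ranked: (gene, correlation) pairs sorted from most positively to most
            negatively correlated with the class distinction.
    gene_set: genes in the a priori defined set.
    p: weight exponent on |correlation|; p = 1 is the standard GSEA weight.
    Returns the running sum's maximum deviation from zero (sign kept).
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(in_set)
    hit_norm = sum(abs(r) ** p for (_, r), hit in zip(ranked, in_set) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must hit the list without covering all of it")
    running, es = 0.0, 0.0
    for (_, r), hit in zip(ranked, in_set):
        # Step up at set members (weighted), step down at non-members.
        running += abs(r) ** p / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical ranked list: (gene, correlation with class distinction).
ranked = [("a", 2.0), ("b", 1.0), ("c", 0.5), ("d", -1.0), ("e", -2.0)]
print(round(enrichment_score(ranked, {"a", "b"}), 6))  # 1.0, set at the top
print(round(enrichment_score(ranked, {"d", "e"}), 6))  # -1.0, set at the bottom
```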

55 Gene Set Enrichment Analyses

56 Gene Set Enrichment Analyses
How is performing GSEA or GSA on GO gene sets different from doing GO Analysis on a list of differentially expressed genes?
- Statistical analysis can miss genes whose changes are small relative to noise but that, as a group, can have a significant impact on the observed difference in phenotype. Use the All Entities list as input for GSEA or GSA.
- Instead of looking only at individual differentially expressed genes, take a genome-wide approach to see whether gene sets are associated with the phenotypic class distinction.
- Enrichment in GO Analysis is tested with Fisher's exact test, while GSEA and GSA score each gene set against the whole ranked gene list (GSEA with a running-sum statistic).
- Users can specify any Entity List as a gene set in GeneSpring GX.

57 Identifiers Necessary for GSEA
The technology must contain gene symbols. To perform GSEA with a custom technology:
- The annotation file must contain a column (Column X) containing gene symbols.
- Column X must be marked "Gene Symbol": select the "Gene Symbol" mark from the drop-down menu while creating the custom technology.

58 Gene Sets
GSEA/GSA can use either Broad Institute gene sets or any Entity List in GeneSpring. The Broad Institute has defined five categories of gene sets:
- C1: grouped by cytogenetic location.
- C2: functional lists, ~1000 gene lists corresponding to pathways or functional processes (e.g. genes that are both involved in the inflammatory response can be in the same list).
- C3: regulation lists, grouped by promoter analysis. The genes share the same regulatory motif (the transcription factor may or may not be known) and are therefore assumed to be co-regulated.
- C4: proximity to known oncogenes and tumor suppressors, e.g. all the neighbors of BRCA.
- C5: GO gene sets. Each GO category is represented as a gene set, except very broad categories such as Biological Process and categories with fewer than 10 genes.

59 Key Differences Between GSEA and GSA
The two algorithms share the same idea but differ in how they determine which gene sets are significantly enriched:
- GSA uses the "maxmean" statistic: the mean of the positive or negative parts of the gene scores in the gene set, whichever is larger in absolute value. Efron and Tibshirani show that the method used in GSA is often more powerful than the modified Kolmogorov-Smirnov statistic used in GSEA.
- GSA uses a somewhat different null distribution to estimate false discovery rates: it "restandardizes" the genes in addition to permuting the samples (as done in GSEA).
- GSA can also handle more than two conditions, a limitation of GSEA.
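
The maxmean statistic is short enough to write out. A minimal sketch of the per-gene-set statistic only; GSA's restandardization and permutation steps are not shown, and the scores are hypothetical.

```python
def maxmean(scores):
    """GSA "maxmean" statistic for one gene set (Efron and Tibshirani).

    scores: per-gene scores (e.g. t-statistics) for the genes in the set.
    The positive parts and the negative parts are each averaged over ALL
    genes in the set; the larger average (in absolute value) is returned
    with its sign.
    """
    if not scores:
        raise ValueError("gene set is empty")
    n = len(scores)
    positive = sum(max(s, 0.0) for s in scores) / n
    negative = sum(max(-s, 0.0) for s in scores) / n
    return positive if positive >= negative else -negative


print(maxmean([2.0, 1.0, -0.5, -0.5]))  # 0.75
print(maxmean([2.0, -2.0]))             # 1.0, whereas the plain mean is 0.0
```

Unlike a plain mean, maxmean does not let strongly up- and down-regulated genes in the same set cancel each other out.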

