MSCL Analyst’s Toolbox. Instructors: Jennifer Barb, Zoila G. Rangel, Peter Munson. January 2008. Mathematical and Statistical Computing Laboratory.

Presentation transcript:

MSCL Analyst’s Toolbox
Instructors: Jennifer Barb, Zoila G. Rangel, Peter Munson
January 2008
Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience

Course Outline
Day 1
MSCL Analyst’s Toolbox and JMP™ overview
MSCL Toolbox Concepts
JMP™ fundamentals
Lunch
Affymetrix Expression Console™, processing .cel files, exporting data
MSCL Toolbox Demo
– Data input
– Basic Analysis (Master File, Final File, data normalization, QC, PCA)
– Gene selection, statistical tests (p-values, FDR)
– Annotation
Day 2
Statistical Topics (PCA, data normalization, FDR)
MSCL Analyst’s Toolbox Demo (cont.)
– Complex Analysis (2-way ANOVA, blocked ANOVA)
– Data Visualization

Topics not included
Exon Array Analysis -- coming soon!
SNP chip
Resequencing analysis, ChIP-Chip, copy number
2-color or spotted cDNA array analysis
Complete JMP tutorial
JMP on Mac, Linux
JMP scripting language
Data management commands in JMP: Stack, Split, Concatenate, Sort

Why use JMP?
Interactive graphics facilitate data exploration and discovery of features
Powerful, > 2,00,000 rows by 100s of columns (currently, 2 GB limit)
Scripting language -- object oriented, allows matrix manipulation
Connects to database servers including NIHLIMS or local GCOS
JMP is also a general-purpose statistics package
Good technical support for JMP from: (919) or
No direct cost to individual NIH users* (centrally supported in most NIH ICs)
MSCL Analyst's Toolbox is FREE, adds tools for microarray studies

MSCL Analyst’s Toolbox Features
Menu driven
Automated gene annotations
Web link-out**
Highly interactive, intuitive user interface
Analysis pipeline, based on years of experience
Familiar parametric analysis, e.g. ANOVA
Exploratory Data Analysis
Adaptable to new designs, analyses (e.g. Exon chips, SNP chips)
Powerful, handles largest Affy chips, probe-level analysis
Up to hundreds of chips at once
PC, Mac or Linux desktops
Support available through MSCL

MSCL Analyst’s Toolbox Capabilities
Connects to the central NIHLIMS database or local GCOS databases
Reads in Pivot Tables from Affymetrix EC™ or GCOS™
Visualizes Principal Components
Analyzes simple experiments (paired, unpaired t-tests)
Analyzes complex experiments (multiple treatments, time series, linear trends, slope changes between treatments)
Compensates for “batch” effects
Selects and annotates significant genes
Manages multiple gene lists (intersection, union, Venn diagrams)
Multivariate, Cluster, Discriminant, Neural net analysis
Uses dynamic visualization tools
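The gene-list management mentioned above (intersection, union, Venn diagrams) comes down to set operations on probe set IDs. The sketch below is not the Toolbox's own JMP implementation. It is a minimal Python illustration, and the probe IDs and the `venn_counts` helper are hypothetical.

```python
def venn_counts(list_a, list_b):
    """Return the three region sizes of a two-list Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {
        "only_a": len(a - b),   # genes found only in the first list
        "both": len(a & b),     # intersection
        "only_b": len(b - a),   # genes found only in the second list
    }

# Hypothetical probe set IDs, for illustration only.
hits_experiment_1 = ["probe_A", "probe_B", "probe_C"]
hits_experiment_2 = ["probe_B", "probe_C", "probe_D"]

shared = set(hits_experiment_1) & set(hits_experiment_2)    # intersection
combined = set(hits_experiment_1) | set(hits_experiment_2)  # union
counts = venn_counts(hits_experiment_1, hits_experiment_2)
```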

How to obtain:
JMP
– Find your desktop support person at
– JMP technical support from (919)
The MSCL Analyst's Toolbox
– Download from
– Help offered on collaborative basis by MSCL
– questions to:

NIH Bioinformatics Cooperative

MSCL Toolbox Data Pipeline:
Input files or fetch data
Transform and normalize
Principal Components Analysis
Create Master file, add treatment groups
Compute statistical test, get p-values
Correct for multiple comparisons or use False Discovery Rate
Compute log fold-change
Visualize results
Select relevant genes
[Diagram stages: files, Xform, PCA, Master, Final]

Data sources:
NIHLIMS database via ODBC connection
Local GCOS database via ODBC connection
GCOS pivot table
EC pivot table (NEW support for this option)
Excel spreadsheet
Text files

Data Input or data fetch
[Diagram of the NIHLIMS data flow. Labeled elements: NIHLIMS database; Process DB; DCEG/NCI Publish DB, MSCL Publish DB, CCMD Publish DB; Fluidics Platform, Scanner; .dat, .cel, .chp and .rpt files; client files, client workstation; operations Analyze (MAS), Publish (MAS), Import (LM), Export (LM), archive (LM), delete (LM), assume ownership (LM), Import, ODBC access; tools DMT, Partek, GeneSpring, A-SCAN; EC™ or GCOS™, MAS5™ .txt]

Gene Expression Data Matrix
[Figure: Expression Matrix covering genes 1 to 20,000 and samples 1 to 16, shown with Gene Annotations and Sample information blocks]

Annotations for each gene
Probe Set ID
Genbank ID
Unigene ID, Title
Entrez Gene ID
Cytogenetic map location
Physical map location
HUGO gene symbol, synonyms
Functional relevance
Associated literature references
...
GO terms for molecular function, biological process or cellular component
[Figure: Gene Annotations block for genes 1 to 20,000]

Annotation Files:
Affymetrix annotations for each probeset have been downloaded and formatted for MSCL Toolbox, available at affylims.cit.nih.gov
Annotations are updated quarterly
Annotation tables may be JOINed by ProbeSetID
Annotation columns: Probe Set ID, Gene Title, Gene Symbol, UnigeneID, Transcript ID, Ensembl, Entrez Gene, Representative Public ID, First SwissProt, Genome Alignment Chromosome, Genome Alignment Start Address, Genome Alignment Stop Address, Genome Alignment Strand, Chromosomal Location
[Diagram: Final and Annot. tables combined into Final-Annot]

Annotating Genes
[Diagram: NetAffx annotations (reformatted) and your data file, combined by a “JOIN” on ProbeSetID]
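The JOIN on ProbeSetID can be pictured as a keyed lookup. The Toolbox performs this step inside JMP. The following is only a rough Python sketch of the same idea, with hypothetical probe IDs, column values and helper name. It behaves like a left join: data rows without a matching annotation are kept unchanged.

```python
def join_annotations(data_rows, annotation_rows, key="ProbeSetID"):
    """Left-join annotation columns onto data rows by probe set ID."""
    # Index the annotation table by the join key for constant-time lookup.
    lookup = {row[key]: row for row in annotation_rows}
    joined = []
    for row in data_rows:
        merged = dict(row)  # copy so the input rows are not modified
        for column, value in lookup.get(row[key], {}).items():
            if column != key:
                merged[column] = value
        joined.append(merged)
    return joined

# Hypothetical data and annotation rows, for illustration only.
data = [
    {"ProbeSetID": "probe_A", "SG-chip1": 120.0},
    {"ProbeSetID": "probe_B", "SG-chip1": 45.5},
]
annotations = [
    {"ProbeSetID": "probe_A", "Gene Symbol": "GENE1"},
]
annotated = join_annotations(data, annotations)
```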

Information about the Sample (transposed into MasterFile)
[Figure: Samples 1 to 16, with information about each sample]
Clinical information (human)
Diagnosis
Demographic information
Treatment (in vivo, in vitro) in designed experiment
Tissue of origin
Cell culture, strain, passage
Sampling date/time
RNA preparation protocol
Operator/batch/lot/laboratory information
QC information (rawQ, scale factor, 3'/5' actin, 3'/5' GAPDH, etc.)

Table formats
JMP usually deals with a single Table, but…
TWO tables are needed for MSCL Analyst’s Toolbox:
1. "Master File" layout
– Each ROW represents a chip
– Columns define treatment, replicate number, etc.
2. "Final" layout
– COLUMNs correspond to chips (rows in Master File)
– Each ROW is a probe set, unique identifier is probe set ID
Tables are LINKED by “Shortnames” field in Master
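A small sketch may help show how the two layouts relate. It assumes that a Final File column name is a data-type prefix followed by the chip's Shortnames value, as the column examples elsewhere in the slides suggest. The treatment labels, values and helper names are hypothetical, and the real tables live in JMP rather than Python.

```python
# Master File: one row per chip (hypothetical contents).
master = [
    {"Shortnames": "33NH", "Treatment": "control", "Replicate": 1},
    {"Shortnames": "33TH", "Treatment": "treated", "Replicate": 1},
]

# Final File: one row per probe set, one column per chip (hypothetical values).
final = [
    {"ProbeSetID": "probe_A", "SG-33NH": 100.0, "SG-33TH": 400.0},
    {"ProbeSetID": "probe_B", "SG-33NH": 80.0, "SG-33TH": 75.0},
]

def columns_for_group(master_rows, treatment, prefix="SG-"):
    """Final File column names for all chips in one treatment group."""
    return [prefix + row["Shortnames"]
            for row in master_rows if row["Treatment"] == treatment]

def values_for_group(final_row, master_rows, treatment, prefix="SG-"):
    """Pull one probe set's values for the chips in a treatment group."""
    return [final_row[col]
            for col in columns_for_group(master_rows, treatment, prefix)]

treated_columns = columns_for_group(master, "treated")
control_values = values_for_group(final[0], master, "control")
```

Keeping the design information in the Master File means the Final File never has to encode treatment groups in its column names. The Shortnames link is enough to recover them.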

Linked Table Formats
Master File -- one row per chip
Final File -- one row per probe set

Naming Convention for Final File Columns (prefixes)
Data type: AD-, SG-, PA-
Data transform: L-, Lmed-, GL-, S10-
Statistical results: p-, FDR-, mean-, SFC-
Column Naming Tips:
– Avoid punctuation, hyphen, period, slash, etc.
– Avoid spaces, use underscore “_” instead
– Shorter is better
– Toolbox utility available for trimming column names
Example column names: ITEM_NAME, SG-33NH, SG-33TH, S10-33NH, S10-33TH, PA-33NH, PA-33TH, SFC-7, SFC-11, p-slope&cent2, FDR slope&cent2
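The prefix convention and the naming tips can be illustrated with two small helpers. This is a hedged sketch, not the Toolbox's trimming utility: `split_prefix` and `clean_shortname` are hypothetical names, and the cleaning rule (replace anything other than letters, digits and underscore with "_") is one reading of the tips, applied to the chip name before a prefix is attached.

```python
import re

# Prefixes listed on the slide.
KNOWN_PREFIXES = ("AD-", "SG-", "PA-", "L-", "Lmed-", "GL-", "S10-",
                  "p-", "FDR-", "mean-", "SFC-")

def split_prefix(column_name):
    """Split a Final File column name into (prefix, remainder)."""
    for prefix in KNOWN_PREFIXES:
        if column_name.startswith(prefix):
            return prefix, column_name[len(prefix):]
    return "", column_name  # no recognized prefix

def clean_shortname(name):
    """Replace spaces and punctuation with underscores, per the naming tips."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    return cleaned.strip("_")

prefix, chip = split_prefix("SG-33NH")
tidy = clean_shortname("liver 33/NH.rep-1")
```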

Data Transformation and Normalization

Log(x/median x) transform (“Lmed”)
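As a rough illustration of the Lmed transform, the sketch below divides each signal on a chip by that chip's median and takes the logarithm. Two points are assumptions rather than statements from the slides: the median is taken over all positive signals on a single chip, and the logarithm is base 10 (the base used in the fold-change rule of thumb later in the deck). The function name and values are hypothetical.

```python
import math
from statistics import median

def lmed_transform(signals):
    """Return log10(x / median(x)) for one chip's signal values."""
    positive = [x for x in signals if x > 0]
    chip_median = median(positive)
    # Non-positive signals have no logarithm; mark them as NaN.
    return [math.log10(x / chip_median) if x > 0 else float("nan")
            for x in signals]

# Hypothetical signal values for a single chip.
chip = [50.0, 100.0, 200.0, 1000.0, 10.0]
lmed = lmed_transform(chip)
```

After the transform, a value of 0 means "at the chip median", so chips with different overall brightness become directly comparable.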

Principal Components Analysis
[Plot axes: PC 1 (38%), PC 2 (12%)]

Analysis Scripts
ANOVA1
T-test, unequal variance
Paired t-test
Consistency test
ANOVA1 with blocking
ANOVA2 with interaction terms (unbalanced data allowed)
ANOVA2 with blocking
Linear regression
ANCOVA with blocking (balanced data case)
ANCOVA2 with blocking (balanced data case)
Other tests are easily added (requires scripting)
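To make one of the listed tests concrete, here is a minimal sketch of the unequal-variance (Welch) t statistic for a single probe set, with the Welch-Satterthwaite degrees of freedom. It is not the Toolbox script, which is written in JMP's scripting language. The function name and data are hypothetical, and turning the statistic into a p-value would additionally need a t-distribution CDF, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical log-transformed signals for one probe set.
control = [1.0, 2.0, 3.0]
treated = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(control, treated)
```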

Log(FoldChange) = “LFC”
FoldChange = treated / control
Log(FoldChange) = Log(treated / control) = Log(treated) - Log(control)
Rule of Thumb for Base10 Logarithms:
Log10(2-fold change) = 0.3
Log10(10-fold change) = 1
Log10(0.1-fold change) = -1
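The rule of thumb can be checked numerically. The short sketch below computes LFC as a difference of base-10 logarithms, exactly as in the identity above. The function name and signal values are hypothetical.

```python
import math

def log_fold_change(treated, control):
    """Base-10 log fold change: Log(treated) - Log(control)."""
    return math.log10(treated) - math.log10(control)

two_fold = log_fold_change(200.0, 100.0)    # about 0.3
ten_fold = log_fold_change(1000.0, 100.0)   # about 1
tenth_fold = log_fold_change(10.0, 100.0)   # about -1
```

Working on the log scale makes up- and down-regulation symmetric: a 2-fold increase and a 2-fold decrease give values of equal size and opposite sign.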

Volcano Plot
[Plot axes: Significance of change; Magnitude of change, log scale. Selection Regions are marked.]
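A volcano plot's selection regions amount to a joint cutoff on significance and magnitude. The sketch below shows that idea under assumed thresholds: the slide gives no numbers, so the p-value cutoff of 0.01 and the LFC cutoff of 0.3 (roughly 2-fold on a base-10 scale) are hypothetical, as are the helper name and the gene records.

```python
def volcano_select(genes, lfc_cutoff=0.3, p_cutoff=0.01):
    """Split significant genes into up- and down-regulated ID lists."""
    up, down = [], []
    for gene in genes:
        if gene["p"] > p_cutoff:
            continue  # not significant enough, whatever the magnitude
        if gene["LFC"] >= lfc_cutoff:
            up.append(gene["ProbeSetID"])
        elif gene["LFC"] <= -lfc_cutoff:
            down.append(gene["ProbeSetID"])
    return up, down

# Hypothetical per-gene results.
genes = [
    {"ProbeSetID": "probe_A", "LFC": 0.5, "p": 0.001},
    {"ProbeSetID": "probe_B", "LFC": -0.7, "p": 0.005},
    {"ProbeSetID": "probe_C", "LFC": 0.9, "p": 0.2},     # large but not significant
    {"ProbeSetID": "probe_D", "LFC": 0.1, "p": 0.0001},  # significant but small
]
up_genes, down_genes = volcano_select(genes)
```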

Interpreting Gene Lists
[Diagram: Final and Annot. tables, Filter (FDR<10%), GeneList, Significant Terms; tools Ingenuity™, GeneGo™]
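The FDR<10% filter needs an FDR value per gene. The slides do not say which FDR procedure the Toolbox uses, so the sketch below assumes the widely used Benjamini-Hochberg adjustment purely for illustration. The function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical p-values for four probe sets.
p_values = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(p_values)

# Keep the probe sets that pass the 10% FDR filter.
gene_list = [i for i, q in enumerate(fdr) if q < 0.10]
```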

GO-SCAN: Gene Ontology Annotations
Gene Ontology for Significant Collection of Annotations: GO-SCAN is a bioinformatics tool that selects and presents relevant Gene Ontology (GO) annotations for a gene "hit" list from an Affymetrix microarray experiment.

Ingenuity Pathway Analysis (Doug Joubert, NIH Library)