GenePattern Overview for MAGE-TAB Workshop Ted Liefeld January 24, 2007.


1 GenePattern Overview for MAGE-TAB Workshop Ted Liefeld January 24, 2007

2 GenePattern: a platform for integrative genomics
[Architecture diagram. Components: Client User Interfaces (Desktop, Programming, Web), Pipeline Environment, Module Repository, Module Integrator. Example pipeline after Golub and Slonim et al. 1999: datasets all_aml_train and all_aml_test; modules Preprocess, Class Neighbors, Weighted Voting Cross-Val, SOM Clustering, Weighted Voting Train/Test; viewers SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer. Methods shown: KNN, SVM, SOM, GSEA, NMF, PCA.]

3 Features
Automatic Module Integration
  • Add new modules without writing code
  • Supports any command-line-callable code (language independent)
Multiple user interfaces
  • Desktop client
  • Web client
  • Programmatic interfaces to Java, MATLAB, R
Local and Distributed Computing
  • Laptop
  • Client/Server
  • Compute farm
  • Public server (1/2008)
Interoperability
  • caBIG
  • caArray
  • caGrid
  • geWorkbench
  • Cytoscape
Analytic Reproducibility
  • Easy, rapid sharing of methodologies via pipelines
  • Versioning using Life Sciences Identifier (LSID)
  • Executable history of all sessions
  • Automatic pipeline generation from result files
  • Executable research documents
Comprehensive Module Repository
  • ~90 modules: analysis, visualization, pipelines
  • Expression, proteomic, sequence, variation (SNP), and whole genome association data
  • Construction of context-sensitive, flexible analytic workflows
  • Module suites
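The "Versioning using Life Sciences Identifier (LSID)" bullet can be illustrated with a small sketch. It assumes the general LSID shape `urn:lsid:authority:namespace:object[:revision]`; the identifiers below are invented for illustration and are not real GenePattern module IDs.

```python
from typing import NamedTuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: str  # empty string when the LSID carries no revision


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else ""
    return Lsid(parts[2], parts[3], parts[4], revision)


def same_module(a: Lsid, b: Lsid) -> bool:
    """Two LSIDs name the same module when everything but the revision matches."""
    return a[:3] == b[:3]


# Hypothetical identifiers, made up for illustration only.
v1 = parse_lsid("urn:lsid:example.org:module.analysis:00042:1")
v2 = parse_lsid("urn:lsid:example.org:module.analysis:00042:2")
```

Because the revision is part of the identifier, `v1` and `v2` stay distinguishable while `same_module(v1, v2)` still recognizes them as versions of one module.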

4 Gene Expression Analysis
  • Differential Marker Analysis
  • Gene Neighbors
  • caArray Retriever
  • GEO Download
  • Expression File Creator
  • Threshold
  • Variation Filter
  • MAGE-ML Import
  • MAGE-TAB Import…

5 SNP Analysis
  • Copy Number Estimation
  • Smoothing
  • LOH determination
  • Batch Correction
  • SNPViewer
  • SNPFileCreator
  • X Chromosome Correction
  • GISTIC pipeline (soon…)

6 Statistical Methods & Machine Learning Analyses
Prediction
  • K-Nearest Neighbors (KNN)
  • Weighted Voting (WV)
  • Support Vector Machines (SVM)
  • Probabilistic Neural Networks (PNN)
  • Classification and Regression Trees (CART)
Clustering
  • Hierarchical
  • k-Means
  • SOM
  • Consensus
Pathway Analysis
  • GSEA
  • ARACNE
  • Cytoscape
Other Statistical Methods
  • Missing value imputation
  • Kolmogorov-Smirnov score
  • Non-negative Matrix Factorization (NMF)
  • Principal Components Analysis (PCA)

7 Module Integrator
  • Add modules and visualizers without writing code
  • Share custom analysis tasks
  • Integrate your own or "third-party" tools easily
  • Add tools to a common repository
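One way to picture wrapping a command-line tool without writing code is a command template whose placeholders are filled from user-supplied parameters. The sketch below is an illustration under assumptions: the `<name>` placeholder syntax, the script name `threshold.py`, and the parameter names are hypothetical and are not taken from the slides.

```python
import re
import shlex


def expand_command(template: str, params: dict[str, str]) -> list[str]:
    """Fill <name> placeholders in a command-line template and split it into argv."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for <{name}>")
        # Quote each value so paths with spaces stay a single argument.
        return shlex.quote(params[name])

    filled = re.sub(r"<([A-Za-z0-9_.]+)>", substitute, template)
    return shlex.split(filled)


# Hypothetical module definition: the script name and parameter names are invented.
template = "python threshold.py --input <input.file> --floor <floor> --out <output.file>"
argv = expand_command(
    template,
    {"input.file": "all aml train.gct", "floor": "20", "output.file": "out.gct"},
)
```

The resulting `argv` list could be handed to `subprocess.run`, which is what makes the approach language independent: the wrapped tool only needs to be callable from a command line.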

8 Pipelines for reproducible research
[Pipeline diagram, after Golub and Slonim et al. 1999: datasets all_aml_train and all_aml_test; modules Preprocess, Class Neighbors, Weighted Voting Cross-Val, SOM Clustering, Weighted Voting Train/Test; viewers SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer.]
  • Users can design workflows where the input to any module is the output of any previous module
  • Users can start with a result and automatically generate the workflow that created it
  • Input data, parameters, and (optionally) code are packaged with a pipeline
  • Every version of a module or pipeline is retained and uniquely identified
  • Pipelines and modules are exportable/importable and can be shared among GenePattern users
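Generating a workflow from a result can be sketched as a walk backwards through recorded job history. This is a minimal illustration, not GenePattern's implementation: the `Job` record layout and the file names are assumptions, and only the module labels come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One recorded module execution (hypothetical record layout)."""
    module: str
    inputs: list[str]
    output: str
    params: dict[str, str] = field(default_factory=dict)


def workflow_for(result: str, history: list[Job]) -> list[Job]:
    """Return the jobs that produced `result`, ordered so producers come first."""
    producer = {job.output: job for job in history}
    ordered: list[Job] = []
    seen: set[str] = set()

    def visit(filename: str) -> None:
        job = producer.get(filename)
        if job is None or filename in seen:
            return  # an uploaded input file, or a step already collected
        seen.add(filename)
        for upstream in job.inputs:
            visit(upstream)
        ordered.append(job)

    visit(result)
    return ordered


# Invented file names; module labels follow the slide.
history = [
    Job("Preprocess", ["all_aml_train.gct"], "train.preprocessed.gct"),
    Job("SOM Clustering", ["train.preprocessed.gct"], "train.som.odf"),
    Job("Weighted Voting Cross-Val", ["train.preprocessed.gct"], "train.pred.odf"),
]
steps = [job.module for job in workflow_for("train.pred.odf", history)]
```

Only the steps the chosen result actually depends on are collected, so the unrelated SOM Clustering job is left out of the reconstructed workflow.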

9 GenePattern as a Visualization & Analysis Engine
http://www.broad.mit.edu/mmgp
[Diagram: Portal; GenePattern; LSF Worker Nodes; GenePattern SNPViewer visualizer (running as applet); Run GenePattern Analyses.]

10 Using MAGE-ML today

11 MAGE-TAB use tomorrow
  • Ideally
      - Be able to automatically find raw/derived bioassay data when parsing MAGE-TAB files
      - Use MAGE-TAB like our native (tab-delimited) data formats, GCT and RES, in (almost) any GenePattern analysis module
      - Not require user interaction to specify Assays or quantitation types
      - ? MGED-Ontology for common data transform protocols (e.g. RMA, MAS5) in addition to free text
  • Sub-optimal but still good
      - Have an interactive viewer to convert from MAGE-TAB to a native format (e.g. MAGE-ML import viewer)
      - Human interaction required…
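The conversion to a native format can be sketched as follows. The sketch assumes a MAGE-TAB style data matrix with two header rows (assay references, then quantitation types) and writes GCT v1.2 text; the sample matrix is invented, and choosing the quantitation type by name stands in for the user interaction the slide hopes to avoid.

```python
import csv
import io


def data_matrix_to_gct(matrix_text: str, quantitation_type: str) -> str:
    """Keep one quantitation type per assay and emit GCT v1.2 text."""
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    assays, qtypes, body = rows[0], rows[1], rows[2:]
    # Column 0 holds the row identifiers; pick data columns of the wanted type.
    keep = [i for i in range(1, len(qtypes)) if qtypes[i] == quantitation_type]
    if not keep:
        raise ValueError(f"no column with quantitation type {quantitation_type!r}")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.2"])                   # GCT version line
    writer.writerow([len(body), len(keep)])     # number of rows, number of samples
    writer.writerow(["Name", "Description"] + [assays[i] for i in keep])
    for row in body:
        # GCT needs a description column; reuse the identifier as a placeholder.
        writer.writerow([row[0], row[0]] + [row[i] for i in keep])
    return out.getvalue()


# Invented two-assay matrix with two quantitation types per assay.
matrix = (
    "Hybridization REF\tH1\tH1\tH2\tH2\n"
    "Reporter REF\tsignal\tp-value\tsignal\tp-value\n"
    "probe_1\t120.5\t0.01\t98.0\t0.03\n"
    "probe_2\t15.2\t0.40\t22.1\t0.35\n"
)
gct = data_matrix_to_gct(matrix, "signal")
```

If the data transform protocol and quantitation type were annotated with controlled terms rather than free text, the `quantitation_type` argument could be chosen automatically.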

12 More MAGE-TAB thoughts
  • Define structure/format for keeping multiple MAGE-TAB files together
      - IDF, ADF, SDRF, raw data files -> package together as ZIP? tgz?
      - Sub directories in the zip? (defined)
  • Does MAGE-TAB support multiple arrays in one file?
      - Useful, and MAGE-ML allows this now (but I don't like it for automated processing)
      - E.g. E-GEOD-995.mageml.tgz from ArrayExpress
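The packaging question could be explored with a sketch like the one below, which bundles one experiment into a ZIP with a fixed layout. The layout itself (metadata files at the top level, data files under `data/`) is an assumption chosen for illustration; the slides leave the structure open.

```python
import zipfile
from pathlib import Path


def package_magetab(metadata: list[Path], data_files: list[Path], archive: Path) -> list[str]:
    """Bundle one experiment's MAGE-TAB files into a ZIP and return the member names."""
    # Assumed layout: IDF/SDRF/ADF at the top level, data files under data/.
    members = [(path, path.name) for path in metadata]
    members += [(path, f"data/{path.name}") for path in data_files]
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for source, arcname in members:
            bundle.write(source, arcname)
    return [arcname for _, arcname in members]
```

With a defined layout, a consumer can locate the SDRF at a known place in the archive and resolve the data file names it references against `data/` without asking the user.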

13 More MAGE-TAB thoughts
  • Persistent identifiers
      - For protocols, samples, etc.
      - Allow use of SDRF, data matrix (e.g. in GP with persistent references to external entities)
      - Array details, experiment design, etc.
  • Question?
      - Should we consider MAGE-TAB DAG to record data processing pipelines (provenance - HLA)?
      - e.g. a protocol for each module execution added to MAGE-TAB file outputs
      - File growth issues…
      - Record all analysis for a publication
      - Add additional SDRF file at each step
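The idea of adding a protocol for each module execution can be sketched as appending columns to an SDRF-like table. This is a rough illustration only: it uses the SDRF column names `Protocol REF` and `Derived Array Data File`, while the sample, protocol, and file names are invented.

```python
def record_step(header: list[str], rows: list[list[str]], protocol: str,
                outputs: list[str]) -> tuple[list[str], list[list[str]]]:
    """Append one processing step to an SDRF-like table as two new columns."""
    if len(outputs) != len(rows):
        raise ValueError("need exactly one output file per row")
    # SDRF headers may repeat, so the table is kept as lists rather than dicts.
    new_header = header + ["Protocol REF", "Derived Array Data File"]
    new_rows = [row + [protocol, out] for row, out in zip(rows, outputs)]
    return new_header, new_rows


# Invented sample names, protocol names, and file names.
header = ["Source Name", "Array Data File"]
rows = [["sample_1", "s1.cel"], ["sample_2", "s2.cel"]]
header, rows = record_step(header, rows, "P-RMA", ["rma.txt", "rma.txt"])
header, rows = record_step(header, rows, "P-Threshold", ["thr.gct", "thr.gct"])
```

Each recorded step widens every row by two columns, which makes the file growth concern concrete for long pipelines.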

14 Release Information
  • Initially released in March 2004
  • Current version 3.0, released April 2007
  • 3.1 due Feb 08
  • Currently 5900+ users, 500+ organizations, ~90 countries
Availability
  • Freely available
  • Windows, Mac OS, and Unix platforms
Resources
  • http://www.genepattern.org
  • User workshops, documentation, email help desk, online user forum
  • Reich et al. (2006) Nature Genetics
GenePattern is a winner of the 2005 BioIT World Best Practices Award
Collaborations
  • caBIG
  • MAGNet NCBC
  • NCIBI NCBC

