Published by Terence Norton. Modified over 9 years ago.
1
GenePattern Overview for MAGE-TAB Workshop
Ted Liefeld
January 24, 2007
2
GenePattern: a platform for integrative genomics
Components: Client User Interfaces (Desktop, Programming, Web), Pipeline Environment, Module Repository, Module Integrator
Example modules: KNN, SVM, SOM, GSEA, NMF, PCA
[Figure: example pipeline after Golub and Slonim et al. 1999. Inputs: all_aml_train, all_aml_test. Modules: Preprocess, Class Neighbors, Weighted Voting Cross-Val, Weighted Voting Train/Test, SOM Clustering. Viewers: SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer.]
3
Features
Automatic Module Integration
  - Add new modules without writing code
  - Supports any command-line-callable code (language independent)
Multiple user interfaces
  - Desktop client
  - Web client
  - Programmatic interfaces to Java, MATLAB, R
Local and Distributed Computing
  - Laptop
  - Client/Server
  - Compute farm
  - Public server (1/2008)
Interoperability
  - caBIG, caArray, caGrid, geWorkbench, Cytoscape
Analytic Reproducibility
  - Easy, rapid sharing of methodologies via pipelines
  - Versioning using Life Sciences Identifier (LSID)
  - Executable history of all sessions
  - Automatic pipeline generation from result files
  - Executable research documents
Comprehensive Module Repository
  - ~90 modules: analysis, visualization, pipelines
  - Expression, proteomic, sequence, variation (SNP), and whole genome association data
  - Construction of context-sensitive, flexible analytic workflows
  - Module suites
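To make the LSID-based versioning point concrete, here is a minimal sketch of splitting an LSID into its parts. It assumes only the generic LSID syntax, urn:lsid:authority:namespace:object with an optional trailing revision. The names `Lsid`, `parse_lsid`, and `same_entity`, and the example identifier, are hypothetical and are not taken from GenePattern.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no revision


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into fields."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def same_entity(a: Lsid, b: Lsid) -> bool:
    """True when two LSIDs differ at most in their revision."""
    return a[:3] == b[:3]


# Hypothetical identifiers, for illustration only.
v3 = parse_lsid("urn:lsid:example.org:analysis.module:00042:3")
v4 = parse_lsid("urn:lsid:example.org:analysis.module:00042:4")
```

Because the revision is part of the identifier, two versions of the same module stay distinguishable while `same_entity` still recognises them as one module.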
4
Gene Expression Analysis
  - Differential Marker Analysis
  - Gene Neighbors
  - caArray Retriever
  - GEO Download
  - Expression File Creator
  - Threshold Variation Filter
  - MAGE-ML Import
  - MAGE-TAB Import…
5
SNP Analysis
  - Copy Number Estimation
  - Smoothing
  - LOH determination
  - Batch Correction
  - SNPViewer
  - SNPFileCreator
  - X Chromosome Correction
  - GISTIC pipeline (soon…)
6
Statistical Methods & Machine Learning Analyses
Prediction
  - K-Nearest Neighbors (KNN)
  - Weighted Voting (WV)
  - Support Vector Machines (SVM)
  - Probabilistic Neural Networks (PNN)
  - Classification and Regression Trees (CART)
Clustering
  - Hierarchical
  - k-Means
  - SOM
  - Consensus
Pathway Analysis
  - GSEA
  - ARACNE
  - Cytoscape
Other Statistical Methods
  - Missing value imputation
  - Kolmogorov-Smirnov score
  - Non-negative Matrix Factorization (NMF)
  - Principal Components Analysis (PCA)
7
Module Integrator
  - Add modules and visualizers without writing code
  - Share custom analysis tasks
  - Integrate your own or “third-party” tools easily
  - Add tools to a common repository
8
Pipelines for reproducible research
[Figure: example pipeline after Golub and Slonim et al. 1999. Inputs: all_aml_train, all_aml_test. Modules: Preprocess, Class Neighbors, Weighted Voting Cross-Val, Weighted Voting Train/Test, SOM Clustering. Viewers: SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer.]
  - Users can design workflows where the input to any module is the output of any previous module
  - Users can start with a result and automatically generate the workflow that created it
  - Input data, parameters, and (optionally) code are packaged with a pipeline
  - Every version of a module or pipeline is retained and uniquely identified
  - Pipelines and modules are exportable/importable and can be shared among GenePattern users
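The idea that any module can take its input from any previous module can be sketched in a few lines. This is an illustration only, not GenePattern's pipeline engine: `Step`, `run_pipeline`, and the toy functions are hypothetical stand-ins for real modules.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    name: str                          # unique step name
    func: Callable[[object], object]   # toy stand-in for a module
    source: str                        # "input" or the name of an earlier step


def run_pipeline(steps: List[Step], data: object) -> Dict[str, object]:
    """Run steps in order; each reads the output of an earlier step."""
    results: Dict[str, object] = {"input": data}
    for step in steps:
        if step.source not in results:
            raise ValueError(f"{step.name}: unknown source {step.source!r}")
        results[step.name] = step.func(results[step.source])
    return results


# Toy modules: a floor threshold, then two analyses that both
# consume the preprocessed output.
steps = [
    Step("preprocess", lambda xs: [max(x, 20) for x in xs], "input"),
    Step("mean", lambda xs: sum(xs) / len(xs), "preprocess"),
    Step("count_high", lambda xs: sum(1 for x in xs if x > 25), "preprocess"),
]
results = run_pipeline(steps, [5, 30, 100])
```

Keeping every intermediate result under its step name is what makes it possible to start from a result and trace back which steps produced it.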
9
GenePattern as a Visualization & Analysis Engine
[Figure labels: Portal (http://www.broad.mit.edu/mmgp); GenePattern; LSF Worker Nodes; GenePattern SNPViewer visualizer (running as applet); Run GenePattern Analyses]
10
Using MAGE-ML today
11
MAGE-TAB use tomorrow
Ideally
  - Be able to automatically find raw/derived bioassay data when parsing MAGE-TAB files
  - Use MAGE-TAB like our native (tab-delimited) data formats, GCT and RES, in (almost) any GenePattern analysis module
  - Not require user interaction to specify assays or quantitation types
  - ? MGED Ontology for common data transform protocols (e.g. RMA, MAS5) in addition to free text
Sub-optimal but still good
  - Have an interactive viewer to convert from MAGE-TAB to a native format (e.g. MAGE-ML import viewer)
  - Human interaction required…
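As a rough illustration of the target of such a conversion, the sketch below writes an in-memory expression matrix as GCT text (version line `#1.2`, a dimensions line, a `Name`/`Description` header, then one row per feature). It does not parse MAGE-TAB; the function name `to_gct` and the default description `"na"` are assumptions made for this example.

```python
from typing import List, Optional, Sequence


def to_gct(row_names: Sequence[str],
           sample_names: Sequence[str],
           values: Sequence[Sequence[float]],
           descriptions: Optional[Sequence[str]] = None) -> str:
    """Render a feature-by-sample matrix as tab-delimited GCT text."""
    if descriptions is None:
        descriptions = ["na"] * len(row_names)  # placeholder description
    if not (len(row_names) == len(values) == len(descriptions)):
        raise ValueError("row names, descriptions and values must align")

    lines: List[str] = [
        "#1.2",                                       # format version
        f"{len(row_names)}\t{len(sample_names)}",     # rows, columns
        "\t".join(["Name", "Description", *sample_names]),
    ]
    for name, desc, row in zip(row_names, descriptions, values):
        if len(row) != len(sample_names):
            raise ValueError(f"row {name!r} has the wrong number of values")
        lines.append("\t".join([name, desc, *(str(v) for v in row)]))
    return "\n".join(lines) + "\n"


gct_text = to_gct(["g1", "g2"], ["s1", "s2"], [[1.0, 2.5], [3.0, 4.0]])
```

A converter that needs no user interaction would have to pick the assays and the quantitation type on its own before calling something like `to_gct`; that selection is the hard part and is not shown here.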
12
More MAGE-TAB thoughts
  - Define a structure/format for keeping multiple MAGE-TAB files together
      - IDF, ADF, SDRF, raw data files -> package together as ZIP? tgz?
      - Subdirectories in the zip? (defined)
  - Does MAGE-TAB support multiple arrays in one file?
      - Useful, and MAGE-ML allows this now (but I don’t like it for automated processing)
      - E.g. E-GEOD-995.mageml.tgz from ArrayExpress
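One possible packaging layout, sketched with Python's standard `zipfile` module. The layout is purely an assumption for illustration (IDF, SDRF, and ADF files at the top level, everything else under a `data/` subdirectory), as are the file-name suffixes used to tell the two groups apart and the function name `package_magetab`.

```python
import io
import zipfile
from typing import Mapping

# Assumed naming convention for the metadata files (illustrative).
METADATA_SUFFIXES = (".idf.txt", ".sdrf.txt", ".adf.txt")


def package_magetab(files: Mapping[str, bytes], data_dir: str = "data") -> bytes:
    """Bundle MAGE-TAB metadata and data files into one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            is_metadata = name.lower().endswith(METADATA_SUFFIXES)
            # Metadata stays at the top level; data goes in a subdirectory.
            arcname = name if is_metadata else f"{data_dir}/{name}"
            archive.writestr(arcname, files[name])
    return buffer.getvalue()


# Hypothetical file names and contents.
bundle = package_magetab({
    "exp.idf.txt": b"Investigation Title\tExample\n",
    "exp.sdrf.txt": b"Source Name\tArray Data File\n",
    "sample1.CEL": b"raw data placeholder",
})
member_names = zipfile.ZipFile(io.BytesIO(bundle)).namelist()
```

With a fixed layout like this, a reader can locate the SDRF first and then resolve data file names against the known subdirectory without asking the user.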
13
More MAGE-TAB thoughts
  - Persistent identifiers
      - For protocols, samples, etc.
      - Allow use of SDRF and data matrix (e.g. in GP) with persistent references to external entities
      - Array details, experiment design, etc.
  - Question: should we consider the MAGE-TAB DAG to record data processing pipelines (provenance - HLA)?
      - E.g. a protocol for each module execution added to MAGE-TAB file outputs
      - File growth issues…
      - Record all analysis for a publication
      - Add an additional SDRF file at each step
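A small sketch of what "a protocol for each module execution" might look like on an SDRF-style table held in memory. It assumes each processing step appends one `Protocol REF` column and one `Derived Array Data File` column per row; the function `add_processing_step` and all protocol and file names are hypothetical.

```python
from typing import List, Sequence, Tuple

Table = Tuple[List[str], List[List[str]]]  # (header, rows)


def add_processing_step(header: Sequence[str],
                        rows: Sequence[Sequence[str]],
                        protocol_ref: str,
                        outputs: Sequence[str]) -> Table:
    """Return a new table with one processing step appended to every row."""
    if len(outputs) != len(rows):
        raise ValueError("need exactly one output file per row")
    new_header = list(header) + ["Protocol REF", "Derived Array Data File"]
    new_rows = [list(row) + [protocol_ref, out]
                for row, out in zip(rows, outputs)]
    return new_header, new_rows


# Hypothetical starting table with two samples.
header = ["Source Name", "Array Data File"]
rows = [["s1", "s1.CEL"], ["s2", "s2.CEL"]]

# Two module executions, each recorded as a protocol step.
header1, rows1 = add_processing_step(header, rows, "P-preprocess", ["pre.gct", "pre.gct"])
header2, rows2 = add_processing_step(header1, rows1, "P-cluster", ["som.odf", "som.odf"])
```

Every step widens the table by two columns, which is one way to see the file growth issue noted above; writing a separate SDRF per step trades that width for more files.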
14
Release Information
  - Initially released in March 2004
  - Current version 3.0, released April 2007
  - 3.1 due Feb 08
  - Currently 5900+ users, 500+ organizations, ~90 countries
Availability
  - Freely available
  - Windows, Mac OS, and Unix platforms
Resources
  - http://www.genepattern.org
  - User workshops, documentation, email help desk, online user forum
  - Reich et al. (2006) Nature Genetics
  - GenePattern is a winner of the 2005 Bio-IT World Best Practices Award
Collaborations
  - caBIG
  - MAGNet NCBC
  - NCIBI NCBC