Published by Beverly Underwood. Modified over 8 years ago.
1
GenePattern Overview: caBIG Silver Compatibility Review
http://www.genepattern.org/
Ted Liefeld, Cancer Informatics Program, The Broad Institute of MIT and Harvard
2
Contents: Overview of GenePattern; caBIG/GenePattern Functionality; Architecture and API; Object Model
3
Overview of GenePattern
4
Challenges in Genomic Analysis:
  Research teams comprise members from many disciplines and levels of computational sophistication.
  The research environment is dynamic and heterogeneous, with new tools being developed quickly in many different forms.
  The number of available research tools is growing exponentially.
5
Effects of Multi-Disciplinary Teams:
  Users have differing levels of computational sophistication
  "Impedance mismatch" between users and interfaces
  Users spend more time learning tools than doing research
6
Effects of Dynamic Research Environment:
  Tools are developed in a variety of environments (Java, Perl, MATLAB, R, etc.)
  Programming skills are required of users
  Slow acceptance of new tools and methods
  In silico research is not reproducible
  Developers reinvent the wheel
  Different tools cannot be combined in a single methodology
7
Addressing the problems:
  caBIG promotes standards-based applications, infrastructure, and data sets, facilitating interoperability, collaboration, and data sharing to speed cancer research.
  GenePattern provides a standards-based, caBIG-compatible environment supporting:
    multidisciplinary biomedical research
    the rapid development, deployment, and integration of new analytic techniques
8
A platform for integrative genomics:
  Comprehensive module repository
  Interfaces accessible to all levels of user
  Ability to chain tasks into pipelines for reproducible in silico research
  Ability to add new tools without programming
  Local or distributed computing
9
A platform for integrative genomics (figure)
Three environments are shown: Graphical Environment, Pipeline Environment, and Programming Environment.
Graphical Environment panels: Bicluster, Heat Map, Prediction Results.
Pipeline Environment steps (labels and descriptions as they appear in the extracted figure text):
  PreprocessDataset: extract breast samples
  Threshold: impose a baseline and a ceiling
  HeatMapViewer: project data as a heat map
  GeneNeighbors: compute nearest neighbors of cyclin D1 in breast cells
  SelectFeaturesRows: get expression data for breast neighbors in ovary cells
  SelectFeaturesColumns: extract ovary samples
Programming Environment: an R script (several lines appear cut off at the edge of the slide):
  # source("D:/CGP2003/GenePattern_modules/Golub_et_al_1999.R", echo = TRUE)
  # GenePattern
  #
  # Molecular Classification of Cancer: Class Prediction by Gene Expression
  #
  # Summary: This R/GenePattern script implements the supervised prediction metho
  # in Golub et al 1999, Science 286:531-537 (1999).
  # Load and set up GenePattern commands and server
  source("http://wilkins.wi.mit.edu:7070/gp/GenePattern.R", echo = FALSE, print.ev
  server <- SOAPServer("http://wilkins.wi.mit.edu", "/axis/servlet/AxisServlet", 7
  source(paste("http://", server@host, ":", server@port, "/gp/getAllTaskWrappers.j
  # Neighborhood analysis
  MS.out <- MarkerSelection("data.filename" = "http://www-genome.wi.mit.edu/mpr/pu
      "class.filename" =“”
      "pred.results.file" = "pred.results",
      "data.results.file" = "data.results",
      "num.permutations" = "25",
  file.show(MS.out$pred.results)
  file.show(MS.out$data.results.gct)
  data <- read.table(MS.out$pred.results, header=T, sep="\t", skip=14)
Server-side labels: Analysis Task Manager; Marker Selection, WV, SOM, and Transpose Analysis Tasks; GenePattern remote data source.
Other labels: Module Repository, Task Integrator, and the module names KNN, WV, SVM, SOM, PCA, NMF, FWER.
10
Module Repository:
  Modules are publicly hosted at the Broad Institute
  Users download modules from the module repository onto their own server
  Users check for new and updated modules and install them automatically
  Diagram labels: GenePattern Module Repository; User's GenePattern installation
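The update check on this slide can be pictured as a comparison between the module versions installed locally and those listed by the repository. The following is a minimal sketch in Python, not GenePattern's actual mechanism: the function name `find_updates`, the integer version numbers, and the dictionary-based catalogue are all assumptions made for illustration.

```python
def find_updates(installed, repository):
    """Return modules that are new or newer in the repository.

    Both arguments map a module name to an integer version number
    (a hypothetical scheme chosen for this illustration).
    """
    updates = {}
    for name, repo_version in repository.items():
        local_version = installed.get(name)
        # A module is a candidate if it is missing locally or out of date.
        if local_version is None or repo_version > local_version:
            updates[name] = repo_version
    return updates


# Hypothetical catalogue contents; the module names are taken from the slides.
installed = {"PreprocessDataset": 2, "HeatMapViewer": 5}
repository = {"PreprocessDataset": 3, "HeatMapViewer": 5, "ConsensusClustering": 1}

to_install = find_updates(installed, repository)
print(to_install)
```

Here `to_install` contains the updated PreprocessDataset and the not-yet-installed ConsensusClustering, while the unchanged HeatMapViewer is skipped.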
11
Module Types:
  Analysis Modules: an algorithm or other operation that processes data and creates result files, e.g. hierarchical clustering
  Visualizers: a self-contained application that shows a graphical representation of data and allows user interaction, e.g. Heat Map Viewer
  Pipelines (workflows): a sequence of analysis tasks and visualizers that can be run, shared, and edited as a single entity
12
~90 Modules (10/07):
  Proteomics (SELDI, MALDI, LC-MS): Noise Removal, Peak Detection, Peak Matching, Plot Spectra, ProteoArray
  Clustering, Prediction, Statistical Methods: SOM, Hierarchical, Consensus, kNN, Weighted Voting, SVM, Missing value imputation, Kolmogorov-Smirnov score, NMF, PCA
  Marker Selection: Class Neighbors, Gene Neighbors, Comparative (FWER, Q-value, FDR)
  Preprocessing/Utilities: Threshold, Variation Filter, Transpose, Merge Dataset, Split Dataset, etc.
  Data Conversion and Retrieval: caArrayImportViewer, mzXML Import, MAGE-ML Import, GEO Download, Expression File Creator
  Visualizers: Heat Map, Hierarchical Clustering, SOM, PCA, Feature Summary, Prediction Results, Gene List Significance, Comparative Marker Selection
  Annotation: GeneCruiser, Affymetrix Chip Probe Conversion
  Pipelines:
    Golub and Slonim, Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression, Science 1999
    Lu, Getz, Miska et al., MicroRNA Expression Profiles Classify Human Cancers, Nature 2005
  External modules adapted from: Bioconductor, MeV (TIGR), Fred Hutchinson Cancer Center
13
Reproducible Research via Pipelines:
  Capture all steps in an analysis (especially those omitted in published results)
  Re-run a methodology with different inputs
  Adapt an analysis for new uses
  Encapsulate complete in silico analyses in a single wrapper
  Maintain reproducibility regardless of future changes in code
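The idea of capturing every step and re-running the same methodology on different inputs can be sketched as an ordered list of steps whose parameters are stored with the pipeline. This is a minimal illustration in Python, not GenePattern's pipeline format: the `Pipeline` class and the toy `threshold` and `transpose` functions are hypothetical stand-ins that only borrow their names from the modules listed in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def threshold(data: Matrix, floor: float, ceiling: float) -> Matrix:
    """Clamp every value into the range [floor, ceiling]."""
    return [[min(max(v, floor), ceiling) for v in row] for row in data]


def transpose(data: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*data)]


@dataclass
class Pipeline:
    """An ordered list of (step function, fixed parameters) pairs."""
    name: str
    steps: List[Tuple[Callable[..., Matrix], Dict[str, float]]] = field(default_factory=list)

    def add(self, func, **params):
        # Parameters are recorded with the step, so no setting is left implicit.
        self.steps.append((func, params))
        return self

    def run(self, data: Matrix) -> Matrix:
        for func, params in self.steps:
            data = func(data, **params)
        return data


pipeline = Pipeline("toy-preprocess").add(threshold, floor=20.0, ceiling=100.0).add(transpose)

# The same recorded methodology applied to two different inputs.
first = pipeline.run([[5.0, 50.0], [150.0, 80.0]])
second = pipeline.run([[10.0, 30.0]])
print(first)
print(second)
```

Because the parameters live inside the pipeline object rather than in the caller, re-running it on new data repeats exactly the same steps.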
14
Task Integrator Features:
  Add tasks and visualizers without writing code, via a Web-based form
  Tasks can be written in any language
  Once created, modules are usable by other users of a GenePattern server
  Edits are automatically versioned, so a pipeline can specify which version of a module to run
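Automatic versioning with version pinning can be illustrated with a small registry that keeps every saved revision of a module. The sketch below is written in Python under stated assumptions: `ModuleRegistry`, its `save` and `get` methods, and the 1-based integer version numbers are hypothetical and do not describe how GenePattern stores module versions.

```python
class ModuleRegistry:
    """Keeps every saved revision of a module under an increasing version number."""

    def __init__(self):
        self._versions = {}  # module name -> list of implementations, oldest first

    def save(self, name, func):
        """Store a new revision and return its 1-based version number."""
        self._versions.setdefault(name, []).append(func)
        return len(self._versions[name])

    def get(self, name, version=None):
        """Return a specific version, or the latest one when version is None."""
        revisions = self._versions[name]
        if version is None:
            return revisions[-1]
        if not 1 <= version <= len(revisions):
            raise KeyError(f"{name} has no version {version}")
        return revisions[version - 1]


registry = ModuleRegistry()
# Two successive edits of a toy "Threshold" module.
v1 = registry.save("Threshold", lambda values: [max(v, 20) for v in values])
v2 = registry.save("Threshold", lambda values: [min(max(v, 20), 100) for v in values])

pinned = registry.get("Threshold", version=v1)  # what an older pipeline would keep using
latest = registry.get("Threshold")              # what a new pipeline would pick up
print(pinned([5, 150]), latest([5, 150]))
```

A pipeline that pinned version 1 keeps its original behavior even after the module is edited, which is the property the slide relies on for reproducibility.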
15
Programming Language Environments:
  Users can run any module or pipeline as a routine call in a programming language.
  Pipelines can be converted to equivalent code.
  Libraries are available for Java, R, and MATLAB (Perl soon).
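Converting a pipeline to equivalent code amounts to emitting one call per step and feeding each result into the next. The following Python sketch shows that idea only; it is not GenePattern's code generator, and the function `pipeline_to_code`, the generated call syntax, and the parameter names `floor`, `ceiling`, and `k_max` are assumptions chosen for illustration.

```python
def pipeline_to_code(steps, input_name="data"):
    """Render a list of (module name, parameters) pairs as source lines.

    Each generated line calls one module and passes the previous
    result as the first argument.
    """
    lines = []
    current = input_name
    for index, (module, params) in enumerate(steps, start=1):
        keyword_args = [f"{key}={value!r}" for key, value in params.items()]
        args = ", ".join([current] + keyword_args)
        result = f"step{index}"
        lines.append(f"{result} = {module}({args})")
        current = result
    return "\n".join(lines)


# Module names come from the slides; the parameters are hypothetical.
steps = [
    ("PreprocessDataset", {"floor": 20, "ceiling": 16000}),
    ("ConsensusClustering", {"k_max": 5}),
]
generated = pipeline_to_code(steps)
print(generated)
```

The generated text is a plain script, so it can be edited by hand or embedded in a larger program.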
16
caBIG/GenePattern Functionality:
  GenePattern integrates with caGrid in two ways:
    As a client
    As a service provider
  Three services are available and published (IndexService and GME):
    PreprocessDataset
    Consensus Clustering
    Comparative Marker Selection
17
Architecture (diagram)
GenePattern Clients: Graphical Client (SOAP), Web Browser Client (HTTP)
GenePattern Engine: Analysis Task Manager; PPD Algorithm; Consensus Clustering Algorithm; CMS Algorithm
caGrid: caGrid Clients (SOAP), caGrid proxy, caGrid Services
18
APIs:
  caGrid service APIs for all three services
  Analysis Services:
    Expose the API over caGrid
    Generated using caGrid tools
  Domain objects returned:
    BioAssay (MAGE)
    Array (STAT-ML)
19
APIs Continued:
  Security: not implemented; anonymous connections permitted
20
Object Model: Analysis Parameters
21
Object Model: Output Types (other than MAGE/STAT-ML)
22
Object Model: STAT-ML
23
Object Model: MAGE (partial)
24
Object Model: Interfaces
25
Community:
  Currently used by over 5300 researchers in over 500 commercial and non-profit organizations internationally.
  Adapted for use in analytical chemistry, metabolomics, quantum chemistry, and other analysis areas.
  Many resources exist to help users:
    email help desk
    online user forum
    online tutorial, FAQ, and documentation
  Frequent workshops provide individual instruction in using GenePattern.
  GenePattern is a winner of the 2005 BioIT World Best Practices Award.