GenePattern Overview: caBIG Silver Compatibility Review. Ted Liefeld, Cancer Informatics Program, The Broad Institute of MIT and Harvard.

Presentation transcript:

GenePattern Overview caBIG Silver Compatibility review Ted Liefeld Cancer Informatics Program The Broad Institute of MIT and Harvard

Contents: Overview of GenePattern; caBIG/GenePattern Functionality; Architecture and API; Object Model

Overview of GenePattern

Challenges in Genomic Analysis: Research teams comprise members from many disciplines and levels of computational sophistication. The research environment is dynamic and heterogeneous, with new tools being developed quickly in many different forms. The number of available research tools is growing exponentially.

Effects of Multi-Disciplinary Teams: Users have differing levels of computational sophistication; there is an “impedance mismatch” between users and interfaces; users spend more time learning tools than doing research.

Effects of a Dynamic Research Environment: Tools are developed in a variety of environments (Java, Perl, MATLAB, R, etc.); programming skills are required of users; new tools and methods are accepted slowly; in silico research is not reproducible; developers reinvent the wheel; different tools cannot be combined in a single methodology.

Addressing the problems: caBIG promotes standards-based applications, infrastructure and data sets, facilitating interoperability, collaboration and data sharing to speed cancer research. GenePattern provides a standards-based, caBIG-compatible environment that supports multidisciplinary biomedical research and the rapid development, deployment and integration of new analytic techniques.

A platform for integrative genomics: Comprehensive module repository; interfaces accessible to all levels of user; ability to chain tasks into pipelines for reproducible in silico research; ability to add new tools without programming; local or distributed computing.

A platform for integrative genomics [Figure: GenePattern's three environments and the components they share. Only the labels below are recoverable from the slide.]
Graphical Environment: views labelled Bicluster, Heat Map and Prediction Results.
Pipeline Environment: a pipeline diagram with the labels GenePattern remote data source; PreprocessDataset; extract breast samples; Threshold (impose a baseline and a ceiling); HeatMapViewer (project data as a heat map); GeneNeighbors (compute nearest neighbors of cyclin D1 in breast cells); SelectFeaturesRows (get expression data for breast neighbors in ovary cells); SelectFeaturesColumns (extract ovary samples).
Programming Environment: an excerpt of the R/GenePattern script Golub_et_al_1999.R, "Molecular Classification of Cancer: Class Prediction by Gene Expression", which implements the supervised prediction method in Golub et al. 1999 (Science 286). The excerpt sets up a connection to a GenePattern server over SOAP, runs a neighborhood analysis by calling MarkerSelection with parameters that include data.filename, class.filename, pred.results.file, data.results.file and num.permutations (set to "25"), and then opens the result files with file.show and read.table.
Shared components: Analysis Task Manager with Marker Selection, WV, SOM and Transpose analysis tasks; Module Repository; Task Integrator; methods KNN, WV, SVM, SOM, PCA, NMF and FWER.

Module Repository: Modules are publicly hosted at the Broad Institute. Users download modules from the module repository onto their own server. Users check for new and updated modules and install them automatically. [Diagram: GenePattern Module Repository and the user's GenePattern installation.]
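The check for new and updated modules can be pictured as a comparison between what is installed locally and what the repository offers. The following is a minimal sketch of that idea, not GenePattern's actual update mechanism; the function name `find_updates`, the integer version numbers and the example catalog contents are all assumptions made for illustration.

```python
def find_updates(installed, repository):
    """Return {module: repository_version} for modules that are new or newer.

    Both arguments map a module name to an integer version (hypothetical scheme).
    """
    updates = {}
    for name, repo_version in repository.items():
        local_version = installed.get(name)
        # A module qualifies if it is missing locally or the repository is ahead.
        if local_version is None or repo_version > local_version:
            updates[name] = repo_version
    return updates


# Hypothetical local installation and repository catalog.
installed = {"PreprocessDataset": 2, "HeatMapViewer": 5}
repository = {"PreprocessDataset": 3, "HeatMapViewer": 5, "ConsensusClustering": 1}

updates = find_updates(installed, repository)
print(updates)
```

Here `updates` contains the one outdated module and the one module that is not yet installed, while the up-to-date module is left alone.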

Module Types
Analysis Modules: an algorithm or other operation that processes data and creates result files, e.g. hierarchical clustering.
Visualizers: a self-contained application that shows a graphical representation of data and allows user interaction, e.g. Heat Map Viewer.
Pipelines (workflows): a sequence of analysis tasks and visualizers that can be run, shared, and edited as a single entity.

~90 Modules (10/07)
Proteomics (SELDI, MALDI, LC-MS): Noise Removal, Peak Detection, Peak Matching, Plot Spectra, ProteoArray
Clustering, Prediction, Statistical Methods: SOM, Hierarchical, Consensus, kNN, Weighted Voting, SVM, Missing value imputation, Kolmogorov-Smirnov score, NMF, PCA
Marker Selection: Class Neighbors, Gene Neighbors, Comparative (FWER, Q-value, FDR)
Preprocessing/Utilities: Threshold, Variation Filter, Transpose, Merge Dataset, Split Dataset, etc.
Data Conversion and Retrieval: caArrayImportViewer, mzXML Import, MAGE-ML Import, GEO Download, Expression File Creator
Visualizers: Heat Map, Hierarchical Clustering, SOM, PCA, Feature Summary, Prediction Results, Gene List Significance, Comparative Marker Selection
Annotation: GeneCruiser, Affymetrix Chip Probe Conversion
Pipelines: Golub and Slonim, Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression, Science 1999; Lu, Getz, Miska et al., MicroRNA Expression Profiles Classify Human Cancers, Nature 2005
External modules adapted from: Bioconductor, MeV (TIGR), Fred Hutchinson Cancer Center

Reproducible Research via Pipelines: Capture all steps in an analysis (esp. those omitted in published results); re-run a methodology with different inputs; adapt an analysis for new uses; encapsulate complete in silico analyses in a single wrapper; maintain reproducibility regardless of future changes in code.
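One way to picture these properties is a pipeline stored as an ordered list of steps, each pinned to a module version, that can be replayed on any input while recording exactly what ran. The sketch below is illustrative only and is not how GenePattern stores or executes pipelines; the `Step` class, the `run_pipeline` function, the toy step implementations and the threshold bounds (20 and 16000) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    module: str                    # module name (hypothetical)
    version: int                   # pinned so later code changes do not alter the run
    run: Callable[[list], list]    # toy stand-in for the module's computation


def run_pipeline(steps: List[Step], data: list) -> Tuple[list, List[Tuple[str, int]]]:
    """Apply every step in order and log which module versions were used."""
    log = []
    for step in steps:
        data = step.run(data)
        log.append((step.module, step.version))
    return data, log


# Two toy steps: clamp values to a floor and ceiling, then keep the two largest.
threshold = Step("Threshold", 1, lambda xs: [min(max(x, 20), 16000) for x in xs])
top_two = Step("SelectTop", 1, lambda xs: sorted(xs, reverse=True)[:2])
pipeline = [threshold, top_two]

# The same methodology is re-run on different inputs.
result_a, log_a = run_pipeline(pipeline, [5, 300, 20000, 42])
result_b, log_b = run_pipeline(pipeline, [1, 2, 3])
print(result_a, result_b, log_a)
```

Because the step list and its versions are data rather than a remembered sequence of manual actions, both runs produce the same log, which is what makes the analysis repeatable.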

Task Integrator Features: Add tasks and visualizers without writing code, via a Web-based form. Tasks can be written in any language. Once created, modules are usable by other users of a GenePattern server. Edits are automatically versioned, so a pipeline can specify which version of a module to run.
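Automatic versioning of edits can be sketched as a registry that never overwrites a module definition: each save appends a new version, and a caller may ask for a specific one. This is a minimal illustration under assumed names, not the Task Integrator's implementation; `ModuleRegistry`, its methods and the example command lines are hypothetical.

```python
class ModuleRegistry:
    """Keeps every saved definition of a module; versions are numbered from 1."""

    def __init__(self):
        self._history = {}  # module name -> list of definitions, oldest first

    def save(self, name, definition):
        """Store an edit as a new version and return its version number."""
        versions = self._history.setdefault(name, [])
        versions.append(definition)
        return len(versions)

    def get(self, name, version=None):
        """Return the requested version, or the latest one if none is given."""
        versions = self._history[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


registry = ModuleRegistry()
# Hypothetical command lines standing in for module definitions.
v1 = registry.save("Transpose", "perl transpose.pl <input.file>")
v2 = registry.save("Transpose", "java -jar transpose.jar <input.file>")

pinned = registry.get("Transpose", version=1)   # what an older pipeline would run
latest = registry.get("Transpose")              # what a new pipeline would pick up
print(v1, v2, pinned, latest)
```

A pipeline that records `version=1` keeps running the original definition even after the module has been edited.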

Programming Language Environments: Users can run any module or pipeline as a routine call in a programming language. Pipelines can be converted to equivalent code. Libraries are available for Java, R, and MATLAB (Perl soon).
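Converting a pipeline to equivalent code amounts to rendering each step as a function call whose input is the previous step's result. The sketch below shows the idea only; it emits calls to a hypothetical `run_module` function, which is not part of any GenePattern library, and the file name and parameter values are assumptions.

```python
def pipeline_to_code(steps, first_input):
    """Render (module, params) pairs as source lines that chain their results."""
    lines = []
    source = repr(first_input)  # the first step reads the literal input file name
    for index, (module, params) in enumerate(steps, start=1):
        args = [f"input_file={source}"]
        args += [f"{key}={value!r}" for key, value in params.items()]
        result = f"result_{index}"
        lines.append(f"{result} = run_module({module!r}, {', '.join(args)})")
        source = result  # the next step consumes this step's result variable
    return "\n".join(lines)


# Hypothetical two-step pipeline and input file.
steps = [("PreprocessDataset", {"floor": 20}), ("HeatMapViewer", {})]
code = pipeline_to_code(steps, "all_aml.gct")
print(code)
```

The generated text is plain source code, so it can be edited by hand or embedded in a larger script.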

caBIG/GenePattern Functionality: GenePattern integrates with caGrid in two ways: as a client and as a service provider. Three services are available and published (IndexService and GME): PreprocessDataset, Consensus Clustering, and Comparative Marker Selection.

Architecture [Diagram; recoverable labels: GenePattern Clients (Graphical Client over SOAP, Web Browser Client over HTTP); GenePattern Engine with the Analysis Task Manager and the PPD, Consensus Clustering and CMS algorithms; caGrid proxy; caGrid Clients (caGrid SOAP); caGrid Services; caGrid Client.]

APIs: caGrid service APIs for all three services. Analysis Services: expose the API over caGrid; generated using caGrid tools. Domain objects returned: BioAssay (MAGE), Array (STAT-ML).

APIs Continued: Security: not implemented; anonymous connections permitted.

Object Model: Analysis Parameters

Object Model: Output Types (other than MAGE/STAT-ML)

Object Model: STAT-ML

Object Model: MAGE (partial)

Object Model: Interfaces

Community: Currently used by over 5300 researchers in over 500 commercial and non-profit organizations internationally. Adapted for use in analytical chemistry, metabolomics, quantum chemistry, and other analysis areas. Many resources exist to help users: a help desk, an online user forum, and an online tutorial, FAQ, and documentation. Frequent workshops provide individual instruction in using GenePattern. GenePattern is a winner of the 2005 Bio-IT World Best Practices Award.