INTRODUCTION

GOAL: to provide novel types of interaction between classification systems and MIAME-compliant databases.

We present a prototype module that provides graphical interaction between gene-profiling systems and MIAME-compliant databases. The prototype has been developed to support outlier analysis and semi-supervised class discovery in microarray experiments. The module is designed to integrate the newly developed PostgreSQL port of the GUS/RAD platform [1,2] with a display built automatically in Scalable Vector Graphics (SVG). The display organizes the graphical outputs of a predictive classification system, supporting query construction and retrieval of the MIAME annotation linked to automatically or manually selected curves.

THE PROTOTYPE

This first version provides an interface to sample-tracking curves (profiles of the classification errors of single samples as a function of gene panel size), as derived from the ERFE-SVM gene ranking system [3]. We automatically cluster these curves according to a Dynamic Time Warping (DTW) metric [4], obtaining hypotheses on the potential presence of outliers and subtypes. The analysis is a by-product of the ERFE-SVM complete cross-validation setup, which is run on an openMosix Linux cluster facility. Scripts based on the trellis (lattice) graphics library of the R computing environment are interfaced to the classification system. The SVG directives providing the interactive display are also built directly by R, using an adaptation of the RSVG driver package.

FEATURES

The user may pick one or more curves from the display, or follow the indications of unsupervised hierarchical clustering (from the standard R clustering package), and construct specific queries. In particular, given a potential outlier sample [5], the user may retrieve information on the biomaterial or on the experimental conditions.
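The DTW-based grouping of sample-tracking curves described under THE PROTOTYPE can be illustrated with a minimal sketch. It is written in Python rather than R, uses the textbook dynamic-programming form of DTW with an absolute-difference local cost (the variant in [4] may differ), and substitutes a plain average-linkage agglomeration for the R clustering package. The function names, sample identifiers, and error values are hypothetical and serve only as illustration.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Textbook DTW: minimal cumulative |a_i - b_j| over monotone alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def cluster_curves(curves, n_clusters):
    """Average-linkage agglomerative clustering on pairwise DTW distances."""
    names = list(curves)
    dist = {frozenset(pair): dtw_distance(curves[pair[0]], curves[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average DTW distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sample-tracking curves: error rate per increasing gene panel size.
curves = {
    "s1": [0.0, 0.0, 0.1, 0.0, 0.0],
    "s2": [0.0, 0.1, 0.0, 0.0, 0.0],
    "s3": [0.9, 1.0, 0.9, 1.0, 1.0],  # persistently misclassified sample
    "s4": [0.1, 0.0, 0.0, 0.1, 0.0],
}
groups = cluster_curves(curves, 2)
```

With these made-up values the persistently misclassified sample ends up in a cluster of its own, which is the kind of outlier hypothesis the display is meant to surface for querying.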
We plan to fit the new module within the RAD (RNA Abundance Database) schema and to further support the interaction with the classification setup. The prototype is currently interfaced to a standalone PostgreSQL database, and a few elementary features have been implemented to relate the selected samples to phenotype information possibly present in the dataset.

REFERENCES

[1] Manduchi, E., Pizarro, A., and Stoeckert, C. (2001). RAD (RNA Abundance Database): an infrastructure for array data analysis. Proc. SPIE, vol. 4266.
[2] Manduchi, E. et al. RAD and the RAD Study-Annotator: an approach to collection, organization, and exchange of all relevant information for high-throughput gene expression studies. Bioinformatics, 20(4).
[3] Furlanello, C., Serafini, M., Merler, S., and Jurman, G. (2003). Entropy-based gene ranking without selection bias for the predictive classification of microarray data. BMC Bioinformatics, 4:54.
[4] Aach, J. and Church, G. M. (2001). Aligning gene expression time series with time warping algorithms. Bioinformatics, 17(6).
[5] Furlanello, C., Merler, S., Jurman, G., and Serafini, M. Unsupervised Discovery from Gene Tracking with RFE Classification Systems. ISMB/ECCB.

Interfacing predictive models with MIAME compliant databases
Cesare Furlanello, Maria Serafini, Silvano Paoli, Giuseppe Jurman
ITC-irst, Trento, Italy -- MGED 7, September 8-10, 2004, Toronto, ON, Canada

DATA

In this example, the prototype is connected to PostgreSQL data tables. Microarray data: mouse model of myocardial infarction from the Cardiogenomics PGA (Genomics of Cardiovascular Development, Adaptation, and Remodeling; NHLBI Program for Genomic Applications, Harvard Medical School). In its final version, GUS/RAD will become the natural interface to the data. The development of the PostgreSQL port of GUS is under way. The MPBA group at ITC-irst is a member of the team involved in the project.

(a) Gene profiling tasks require intensive computational resources.
Our E-RFE system for gene profiling [3] is currently implemented on a high-throughput computing facility, the MPA-HTC Linux Cluster. Discovery of outlier patterns and of potential subtypes, and analysis of gene importance, may be derived as a by-product of the computation (e.g. as needed by a complete validation setup to avoid selection bias).

QUESTIONS

1. Interact with the resources (Cluster + Algorithms) to understand and refine machine learning results
2. Provide access to the gene profiling algorithms and their outcomes through a web service
3. Connect to MIAME-compliant information to support investigation and discovery

Interface: Profile Browser, Working Area, Query Tools

- Build query
- Zoom on plot
- Choose the cluster you are interested in and display the curves for the selected cluster. The selection of sample-tracking curves is obtained from DTW-based clustering. Curves from the selected cluster are added to the sample analysis area and are ready for query.
- Query the database for information on the selected (blue) sample, or on all those listed in the working area or displayed in the image
- Browse through the samples, then select or remove the current curve from the working area
- Save in JPG format the selected (blue) curve or all those displayed in the working area

(b) EXAMPLE: Interfacing to sample-tracking profiles. We study the influence of gene panel size on predictive classification error, on a sample-by-sample basis. Errors are accumulated over multiple replicated runs in which the sample is in the test set, and plotted for increasing panel sizes. Specific sample-tracking profiles may be investigated to discover patterns (potential outliers, subtypes). How can we automate the discovery of patterns and connect the investigation to experimental, biological and clinical data about the microarray?

Automating discovery: DTW-based clustering

Scalable Vector Graphics: SVG is a language for describing two-dimensional graphics and graphical applications in XML.
SVG 1.1 is a W3C Recommendation and forms the core of the current SVG developments.
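
To make the SVG side concrete, the sketch below shows how sample-tracking curves could be written out as SVG 1.1 polylines, with the selected curve drawn in blue as in the interface described above. It is an illustration in Python, not the R/RSVG code used by the prototype. The function name, the grey color for unselected curves, the canvas size, and the assumption that error rates lie in [0, 1] are all hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def curves_to_svg(curves, selected=None, width=400, height=200):
    """Render each curve as an SVG <polyline>; assumes error values in [0, 1]."""
    ET.register_namespace("", SVG_NS)  # serialize with a default xmlns
    root = ET.Element(f"{{{SVG_NS}}}svg", {
        "version": "1.1",
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    for sample_id, errors in curves.items():
        n = len(errors)
        # x spreads the panel sizes evenly; y is flipped because SVG y grows downward.
        points = " ".join(
            f"{i * width / max(n - 1, 1):.1f},{height - e * height:.1f}"
            for i, e in enumerate(errors)
        )
        ET.SubElement(root, f"{{{SVG_NS}}}polyline", {
            "id": sample_id,  # lets a viewer map a clicked curve back to a sample
            "points": points,
            "fill": "none",
            "stroke": "blue" if sample_id == selected else "grey",
        })
    return ET.tostring(root, encoding="unicode")


# Hypothetical curves: error rate per increasing gene panel size.
svg_text = curves_to_svg(
    {"s1": [0.0, 0.5, 1.0], "s2": [1.0, 1.0, 1.0]},
    selected="s2",
)
```

Carrying the sample identifier in the `id` attribute of each polyline is one simple way for an interactive viewer to link a picked curve to a database query on that sample.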