

SeqExpress: Introduction

Features
Visualisation Tools
- Data: gene expression, gene function and gene location.
- Analysis: probability models, hierarchies and clusters.
Analysis Tools
- Cluster analysis, refinement and validation.
- Using mixture modelling.
- Graphs and Hierarchies.
Data Tools
- Data Import/Export tools (remote access of GEO, local access of tab separated and MAGE format).
- Data Integration: optional underlying data and annotation database.
- Data Manipulation.

SeqExpress: Visualisation Tools

Visualisations
Data Visualisation:
- Gene Expression;
- Gene Variance;
- Gene Function/Ontology; and
- Chromosome Features.
Analysis Visualisations:
- Hierarchies/Graphs;
- Probabilistic Methods; and
- Cluster Comparison.

Gene Expression: Scatter Plots, Parallel Plots. Also: Histograms, Annotation lists and Gene Tables.

Gene Variance: Gene Spectrums, Gene Clouds.

Gene Ontology Visualisations: TreeMaps, Graphs, Tables.

Chromosome Feature Visualisations

Data Analysis: Probability Models, Dendrograms, Cluster Comparison.

Example: Viewing Clusters A cluster has been selected in the gene tab. The genes are then selected in a scatter plot, a parallel plot and the histogram.

Example: Gene Function Selection The binding term has been selected from the results of an ontology term search. The binding term is then automatically selected in the Function tab, as well as the open Tree Map visualisation. All genes that have been annotated with the binding term are also selected in the parallel plot.

Example: Genome Location A combined expression profile and location-based cluster analysis has been performed and the results viewed. The parallel plot shows the similar expression profiles, whilst the two genome views show the locale of the genes. The genome view in the middle is set to auto-zoom, and so shows the locale in detail.

Example: Data Analysis A series of models has been generated, and the genes with a high probability of belonging to one of the models have been selected in the model viewer. The corresponding locations of the genes and their expression profiles are then shown.

Summary
A number of visualisations are available to support a variety of tasks:
- Expression
- Ontology (plus pathway and protein-protein interaction)
- Location
- Hierarchies
- Cluster comparison
- Variance
- Probability theory
The visualisations are inter-linked.

SeqExpress: Analysis Tools

Analysis Tools 1: Clusters, Hierarchies and Concepts
Clustering:
- Distance based
- Refinement (ontology or model based)
- Validation (C-Index)
Hierarchies: SDD*, Hierarchical
Projection:
- Covariance*: eigen(covar(A)) or A = U S V^T
- Co-occurrence*: P(g,e) = P(g) Σ_z P(e|z) P(z|g)
*Used for global/enterprise-wide information retrieval
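The slide names the C-Index as the cluster validation measure but does not define it. A minimal sketch follows, assuming it refers to the Hubert and Levin C-index, C = (S - S_min) / (S_max - S_min), where S is the sum of within-cluster pairwise distances and S_min / S_max are the sums of the same number of smallest / largest pairwise distances overall. The function names and the choice of Euclidean distance are illustrative, not taken from SeqExpress.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def c_index(points, labels, dist=euclidean):
    """C-index of a clustering: values near 0 suggest compact clusters."""
    pairs = list(combinations(range(len(points)), 2))
    all_d = sorted(dist(points[i], points[j]) for i, j in pairs)
    within = [dist(points[i], points[j]) for i, j in pairs
              if labels[i] == labels[j]]
    n_within = len(within)
    if n_within == 0 or n_within == len(all_d):
        raise ValueError("need at least two clusters and one within-cluster pair")
    s = sum(within)
    s_min = sum(all_d[:n_within])   # smallest possible within-cluster sum
    s_max = sum(all_d[-n_within:])  # largest possible within-cluster sum
    if s_max == s_min:
        raise ValueError("all pairwise distances are equal")
    return (s - s_min) / (s_max - s_min)


# Hypothetical two-sample expression profiles for four genes.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = c_index(profiles, [0, 0, 1, 1])
bad = c_index(profiles, [0, 1, 0, 1])
```

Under this definition a lower value indicates a better partition, so `good` should come out smaller than `bad`.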

Cluster Distances
Expression: Pearson, Cosine, Euclidean, Manhattan.
Function: Information theory: 2*N3/(N1+N2+2*N3)
Location: Intra gene distance, distance to feature
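As a rough illustration of the expression measures listed above, the sketch below computes each one for a pair of profiles. Turning the Pearson and cosine coefficients into distances as 1 minus the coefficient is an assumption; the slide does not say how SeqExpress does it. The function-based formula is reproduced as written, with N1, N2 and N3 passed in directly because the slide does not define them.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine of the angle between the two profiles (assumed conversion)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine is undefined for an all-zero profile")
    return 1.0 - dot / norm


def pearson_distance(a, b):
    """1 - Pearson correlation, i.e. cosine distance of mean-centred profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])


def term_measure(n1, n2, n3):
    """The slide's formula 2*N3/(N1+N2+2*N3); N1..N3 are not defined there."""
    denominator = n1 + n2 + 2 * n3
    if denominator == 0:
        raise ValueError("N1 + N2 + 2*N3 must be non-zero")
    return 2 * n3 / denominator


# Hypothetical profiles: b is a scaled copy of a.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
```

Here `a` and `b` have the same shape, so the Pearson and cosine distances are close to zero while the Euclidean and Manhattan distances are not. The value of `term_measure` grows with N3, so it behaves like a similarity; whether it is used directly or as 1 minus the value is not stated in the slides.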

SAGE: Semi-Discrete Decomposition
- Immunity to outliers
- Uses local density
- Describes both experiments and genes
- Hierarchical description
- Stencils mean that fold-in is possible
- Highly scalable

Analysis Tools 2: Models and Graphs
Graphs: Two-factor analysis using (1) Graph Connectivity and (2) Edge Length.
Models: N-factor analysis using the product rule: P(A,B|C) = P(A|B,C) * P(B|C).
Multi-factor analysis to identify complex features within the data (e.g. genes which have both a similar expression profile and are located on the same part of a chromosome).
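The product rule quoted above can be checked numerically. The sketch below uses a made-up table of joint counts over three binary variables; one possible reading is A = "similar expression profile", B = "same chromosome region" and C = "experimental condition", but that mapping is only an illustration. Exact fractions are used so the two sides can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint counts over three binary variables (a, b, c).
COUNTS = {
    (0, 0, 0): 4, (0, 0, 1): 1,
    (0, 1, 0): 2, (0, 1, 1): 3,
    (1, 0, 0): 1, (1, 0, 1): 2,
    (1, 1, 0): 3, (1, 1, 1): 4,
}
TOTAL = sum(COUNTS.values())


def prob(a=None, b=None, c=None):
    """Probability of the fixed values; a variable left as None is summed out."""
    matched = sum(
        n for (ka, kb, kc), n in COUNTS.items()
        if a in (None, ka) and b in (None, kb) and c in (None, kc)
    )
    return Fraction(matched, TOTAL)


def joint_given_c(a, b, c):
    """P(A,B|C)"""
    return prob(a, b, c) / prob(c=c)


def a_given_bc(a, b, c):
    """P(A|B,C)"""
    return prob(a, b, c) / prob(b=b, c=c)


def b_given_c(b, c):
    """P(B|C)"""
    return prob(b=b, c=c) / prob(c=c)


# The identity holds for every combination of values.
rule_holds = all(
    joint_given_c(a, b, c) == a_given_bc(a, b, c) * b_given_c(b, c)
    for a, b, c in product((0, 1), repeat=3)
)
```

Chaining the rule factor by factor is what lets an N-factor model be built from simpler conditional models.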

Models: Discovery Different models can be found, and altered using energy parameters and tempering.

Spline (beta 0.1), Linear (beta 0.6), Cosine (beta 1.1), Normal (beta 0.1)

Models: Usage
- Cluster generation: High probabilities equate to cluster membership.
- Fitting data: Use normal tissues to fit models to genes, use disease tissues to fit genes to models. Changed behaviour equates to likelihood of model transition.
- Combining models: complex feature identification (given feature X on condition Y).
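To make "high probabilities equate to cluster membership" concrete, the sketch below assigns a value to a model only when its posterior probability passes a threshold. It assumes one-dimensional normal models and treats beta as an inverse-temperature exponent on the posterior, as in tempered EM. The slides mention beta and tempering but do not give the exact form, so both the formula and the 0.9 threshold are assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def tempered_posterior(x, components, beta=1.0):
    """Posterior over models, with each weighted likelihood raised to beta.

    components is a list of (weight, mean, sd) tuples. beta = 1 gives the
    ordinary posterior; smaller beta flattens it (assumed tempering form).
    """
    scores = [(w * normal_pdf(x, m, s)) ** beta for w, m, s in components]
    total = sum(scores)
    return [score / total for score in scores]


def assign(x, components, beta=1.0, threshold=0.9):
    """Index of the model x belongs to, or None if no model is probable enough."""
    post = tempered_posterior(x, components, beta)
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] >= threshold else None


# Two hypothetical models of a single expression value.
models = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
```

With these hypothetical models, a value close to one mean is assigned to that model, a value halfway between the means is left unassigned, and lowering beta makes assignments less decisive.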

Graph: Discovery
Graph connectivity equates to:
- MST of expression values
- Sub-graphs of the gene ontology
- Chromosome relationship
Edge Distance equates to:
- Expression distance
- Network (ontology) distance
- Linear chromosomal distance
Graph partitioned:
- regular (using Metis)
- irregular (Min/Max)
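The first connectivity option, a minimum spanning tree (MST) over expression values, can be sketched with Prim's algorithm, using expression distance as the edge length. This is a plain illustration of the technique, not the SeqExpress implementation; the gene names, profiles and the choice of Euclidean distance are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(profiles, dist=euclidean):
    """Prim's algorithm over the complete graph of genes.

    profiles maps gene name -> expression profile. Returns a list of
    (gene_u, gene_v, edge_length) tuples forming the MST.
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from a gene inside the tree to one outside it.
        length, u, v = min(
            ((dist(profiles[u], profiles[v]), u, v)
             for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: edge[0],
        )
        edges.append((u, v, length))
        in_tree.add(v)
    return edges


# Hypothetical two-sample expression profiles.
example = {"g1": [0.0, 0.0], "g2": [0.0, 1.0], "g3": [5.0, 5.0]}
tree = minimum_spanning_tree(example)
```

The resulting edges carry both factors named on the slide: which genes are connected, and how long each connection is.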

Analysis: Summary Desktop analysis. Number of techniques available. Techniques can be customised for different data sets (e.g. organism, array type). Borrows heavily from Information Retrieval. Probabilistic techniques show most promise.

SeqExpress: Data Tools

Data Analysis
Data Import/Export tools:
- Remote access of GEO (one click access).
- Import tab separated and MAGE format.
- Export tab separated and Bioconductor format.
Data Integration: data and annotation database.
- Automatic and configurable annotation mapping (e.g. SAGE tag to LocusLink (Entrez Gene?) to UniGene).
Data Manipulation: transformation, filtering and constraining.
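The import and manipulation steps might look roughly like the sketch below, which reads a tab-separated expression table, applies a log2 transformation and filters genes by variance. The column layout (gene identifier first, one column per sample), the pseudocount and the variance cut-off are assumptions made for illustration; the slides do not describe the actual file layout or the transformations SeqExpress offers.

```python
import csv
import io
import math
import statistics


def read_expression(handle):
    """Read a tab-separated table: header row, then gene id + one value per sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    data = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, data


def log2_transform(data, pseudocount=1.0):
    """Transformation step: log2(value + pseudocount) for every measurement."""
    return {gene: [math.log2(v + pseudocount) for v in values]
            for gene, values in data.items()}


def filter_by_variance(data, min_variance):
    """Filtering step: keep genes whose variance across samples is large enough."""
    return {gene: values for gene, values in data.items()
            if statistics.pvariance(values) >= min_variance}


# Hypothetical input with two genes and three samples.
text = "gene\ts1\ts2\ts3\nA\t1\t3\t7\nB\t3\t3\t3\n"
samples, raw = read_expression(io.StringIO(text))
kept = filter_by_variance(log2_transform(raw), min_variance=0.1)
```

In this example gene B is constant across samples, so only gene A survives the filter.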

Data Integration: GEO

Data Integration: Annotation Builder

SeqExpress: Summary

Summary
- Written in C#, is free and runs under Windows.
- Not associated with any academic institution, funding body or commercial organisation.
- Development is still ongoing.
- Plan to develop to the Expression Application Class Specification.
- Looking for employment in Seattle…