SeqExpress: Introduction
Features Visualisation Tools: data (gene expression, gene function and gene location) and analysis (probability models, hierarchies and clusters). Analysis Tools: cluster analysis, refinement and validation; mixture modelling; graphs and hierarchies. Data Tools: data import/export (remote access of GEO, local access of tab separated and MAGE format); data integration (optional underlying data and annotation database); data manipulation.
SeqExpress: Visualisation Tools
Visualisations Data Visualisation: Gene Expression; Gene Variance; Gene Function/Ontology; and Chromosome Features. Analysis Visualisations: Hierarchies/Graphs; Probabilistic Methods; and Cluster Comparison.
Gene Expression Scatter Plots, Parallel Plots. Also: Histograms, Annotation lists and Gene Tables.
Gene Variance Gene Spectrums Gene Clouds
Gene Ontology Visualisations TreeMaps Graphs Tables
Chromosome Feature Visualisations
Data Analysis Probability Models Dendrograms Cluster Comparison
Example: Viewing Clusters A cluster has been selected in the gene tab. The genes are then selected in a scatter plot, a parallel plot and the histogram.
Example: Gene Function Selection The binding term has been selected from the results of an ontology term search. The binding term is then automatically selected in the Function tab, as well as the open Tree Map visualisation. All genes that have been annotated with the binding term are also selected in the parallel plot.
Example: Genome Location A combined expression profile and location-based cluster analysis has been performed and the results viewed. The parallel plot shows the similar expression profiles, whilst the two genome views show the locale of the genes. The genome view in the middle is set to auto-zoom, and so shows the locale in detail.
Example: Data Analysis A series of models has been generated, and the genes with a high probability of belonging to one of the models have been selected in the model viewer. The corresponding locations of the genes and their expression profiles are then shown.
Summary A number of visualisations are available to support a variety of tasks: Expression; Ontology (plus pathway and protein-protein interaction); Location; Hierarchies; Cluster comparison; Variance; Probability theory. Visualisations are inter-linked.
SeqExpress: Analysis Tools
Analysis Tools 1: Clusters, Hierarchies and Concepts Clustering: Distance based. Refinement (ontology or model based). Validation (C-Index). Hierarchies: SDD*, Hierarchical. Projection: Covariance*: eigen(covar(A)) or A = U S V^T. Co-occurrence*: P(g,e) = P(g) Σ_z P(e|z) P(z|g). *Used for global/enterprise-wide information retrieval.
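The C-Index validation step can be sketched as follows. This is a minimal illustration of the standard C-Index definition, (S - S_min) / (S_max - S_min), not SeqExpress's own code; the function name c_index and the plain list-of-lists distance matrix are assumptions made for the example.

```python
from itertools import combinations


def c_index(distance, labels):
    """C-Index of a clustering; values near 0 indicate compact clusters.

    distance: symmetric matrix (list of lists) of pairwise gene distances.
    labels:   cluster label for each gene, in the same order as the matrix.
    """
    pairs = list(combinations(range(len(labels)), 2))
    all_distances = sorted(distance[i][j] for i, j in pairs)
    within = [distance[i][j] for i, j in pairs if labels[i] == labels[j]]
    n_within = len(within)
    if n_within == 0 or n_within == len(pairs):
        raise ValueError("need both within-cluster and between-cluster pairs")
    s = sum(within)                          # observed within-cluster total
    s_min = sum(all_distances[:n_within])    # best case: the smallest distances
    s_max = sum(all_distances[-n_within:])   # worst case: the largest distances
    if s_max == s_min:
        raise ValueError("all distances are equal; the index is undefined")
    return (s - s_min) / (s_max - s_min)


# Hypothetical one-dimensional "expression" values for four genes.
values = [0, 1, 10, 11]
matrix = [[abs(a - b) for b in values] for a in values]
good = c_index(matrix, [0, 0, 1, 1])   # neighbours grouped together
poor = c_index(matrix, [0, 1, 0, 1])   # neighbours split apart
```

A lower value means the chosen clusters use up the smallest available distances, so comparing the index across candidate clusterings gives a simple ranking.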
Cluster Distances Expression: Pearson, Cosine, Euclidean, Manhattan. Function: information theory, 2*N3/(N1+N2+2*N3). Location: intra gene distance, distance to feature.
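The expression and function measures listed on this slide can be written out as a short sketch. The function names are illustrative only. The slide does not define N1, N2 and N3; the sketch assumes the common reading in which N1 and N2 count the ontology links from each term up to their nearest shared ancestor and N3 counts the links from that ancestor to the root. Turning the Pearson and cosine coefficients into distances by subtracting from 1 is also an assumption.

```python
import math


def pearson_distance(x, y):
    """1 - Pearson correlation (undefined for constant profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def cosine_distance(x, y):
    """1 - cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def ontology_similarity(n1, n2, n3):
    """The slide's formula 2*N3 / (N1 + N2 + 2*N3); equals 1 for identical terms."""
    return 2 * n3 / (n1 + n2 + 2 * n3)


# Hypothetical profiles: the second is the first scaled by two.
profile_a = [1.0, 2.0, 3.0]
profile_b = [2.0, 4.0, 6.0]
```

The two profiles differ only in scale, so the Pearson and cosine distances are close to zero while the Euclidean and Manhattan distances are not; this is the usual reason for offering both families.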
SAGE: Semi Discrete Decomposition Immunity to outliers. Uses local density. Describes both experiments and genes. Hierarchical description. Stencils mean that fold-in is possible. Highly scalable.
Analysis Tools 2: Models and Graphs Graphs: Two factor analysis using (1) Graph Connectivity and (2) Edge Length. Models: N-factor analysis using product rule: P(A,B|C) = P(A|B,C)*P(B|C). Multi-factor analysis to identify complex features within the data (e.g. genes which have both a similar expression profile and are located on the same part of a chromosome).
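The product rule quoted above can be checked numerically on a small joint distribution. The table below is entirely made up for illustration: A might stand for "similar expression profile", B for "same chromosomal region" and C for "belongs to a given model", but those readings and all of the numbers are assumptions, not SeqExpress data.

```python
from itertools import product

# Hypothetical joint distribution P(A, B, C) over three binary factors.
weights = [0.10, 0.05, 0.15, 0.20, 0.05, 0.10, 0.05, 0.30]
joint = dict(zip(product([0, 1], repeat=3), weights))


def marginal(a=None, b=None, c=None):
    """Sum the joint table over every factor left as None."""
    total = 0.0
    for (ja, jb, jc), p in joint.items():
        if (a is None or ja == a) and (b is None or jb == b) and (c is None or jc == c):
            total += p
    return total


def joint_given_c(a, b, c):
    """Left-hand side: P(A,B|C)."""
    return marginal(a, b, c) / marginal(c=c)


def factorised(a, b, c):
    """Right-hand side: P(A|B,C) * P(B|C)."""
    p_a_given_bc = marginal(a, b, c) / marginal(b=b, c=c)
    p_b_given_c = marginal(b=b, c=c) / marginal(c=c)
    return p_a_given_bc * p_b_given_c


# The largest disagreement between the two sides over all eight cells.
max_gap = max(abs(joint_given_c(*k) - factorised(*k)) for k in joint)
```

Because the two sides agree, a multi-factor probability can be built up one factor at a time, which is what makes the N-factor analysis tractable.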
Models: Discovery Different models can be found, and altered using energy parameters and tempering.
Spline (beta 0.1), Linear (beta 0.6), Cosine (beta 1.1), Normal (beta 0.1)
Models: Usage Cluster generation: High probabilities equate to cluster membership. Fitting data: Use normal tissues to fit models to genes, use disease tissues to fit genes to models. Changed behaviour equates to likelihood of model transition. Combining models: complex feature identification (given feature X on condition Y).
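The idea that high probabilities equate to cluster membership, and that a beta parameter changes how sharply models separate, can be sketched with a one-dimensional mixture. The slides do not say how beta enters the calculation; this sketch assumes the convention used in deterministic annealing, where each posterior term is raised to the power beta before normalising. The Gaussian components, the 0.9 threshold and all names are likewise assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def tempered_memberships(x, components, beta=1.0):
    """Membership probabilities of x for each (weight, mean, sd) component.

    beta = 1 gives the ordinary posterior; smaller beta flattens it.
    """
    scores = [(w * normal_pdf(x, m, s)) ** beta for w, m, s in components]
    total = sum(scores)
    return [score / total for score in scores]


def assign(x, components, beta=1.0, threshold=0.9):
    """Return the index of the model x belongs to, or None if no model is likely enough."""
    probs = tempered_memberships(x, components, beta)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


# Two hypothetical models with equal weight and unit spread.
models = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
sharp = tempered_memberships(0.5, models, beta=1.0)
soft = tempered_memberships(0.5, models, beta=0.1)
```

With a low beta the memberships are nearly uniform and few genes pass the threshold; raising beta sharpens them, so the same data can yield broader or tighter clusters.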
Graph: Discovery Graph connectivity equates to: MST of expression values; sub-graphs of the gene ontology; chromosome relationship. Edge distance equates to: expression distance; network (ontology) distance; linear chromosomal distance. Graph partitioned: regular (using Metis) or irregular (Min/Max).
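The first connectivity option, a minimum spanning tree (MST) of expression values, can be sketched with Prim's algorithm. Using Euclidean distance between profiles as the edge length is an assumption, as are the function name and the toy profiles; the slides do not state which algorithm SeqExpress uses to build the tree.

```python
import math


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of genes.

    profiles: list of equal-length expression profiles.
    Returns a list of (i, j, distance) edges joining all genes.
    """
    n = len(profiles)
    if n == 0:
        return []
    # For each gene not yet in the tree: (cheapest distance to the tree, tree gene).
    best = {j: (math.dist(profiles[0], profiles[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])   # closest gene outside the tree
        d, i = best.pop(j)
        edges.append((i, j, d))
        for k in best:                            # gene j may offer shorter links
            d_jk = math.dist(profiles[j], profiles[k])
            if d_jk < best[k][0]:
                best[k] = (d_jk, j)
    return edges


# Hypothetical two-condition profiles forming two tight pairs.
genes = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
tree = minimum_spanning_tree(genes)
```

The resulting tree has one long edge bridging the two pairs, which is the kind of structure a subsequent partitioning step can exploit.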
Analysis: Summary Desktop analysis. Number of techniques available. Techniques can be customised for different data sets (e.g. organism, array type). Borrows heavily from Information Retrieval. Probabilistic techniques show most promise.
SeqExpress: Data Tools
Data Analysis Data Import/Export tools: remote access of GEO (one-click access); import of tab separated and MAGE format; export of tab separated and Bioconductor format. Data Integration: data and annotation database. Automatic and configurable annotation mapping (e.g. SAGE tag to LocusLink (Entrez Gene?) to UniGene). Data Manipulation: transformation, filtering and constraining.
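The tab separated import followed by transformation and filtering can be sketched as below. The file layout (a header row, then one gene per row with its identifier in the first column), the log2(v + 1) transform and the variance filter are all assumptions chosen for illustration; the slides do not describe the actual formats or the manipulations SeqExpress applies.

```python
import csv
import io
import math
import statistics


def load_expression(handle):
    """Read a tab separated table: header row, then gene id plus one value per sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, table


def log2_transform(table, pseudocount=1.0):
    """Transformation: log2(value + pseudocount) for every measurement."""
    return {g: [math.log2(v + pseudocount) for v in vals] for g, vals in table.items()}


def filter_by_variance(table, min_variance):
    """Filtering: keep genes whose population variance reaches min_variance."""
    return {g: vals for g, vals in table.items()
            if statistics.pvariance(vals) >= min_variance}


# Hypothetical two-gene data set held in memory instead of a file.
raw = "gene\ts1\ts2\ts3\nG1\t1\t3\t7\nG2\t3\t3\t3\n"
samples, table = load_expression(io.StringIO(raw))
logged = log2_transform(table)
kept = filter_by_variance(logged, min_variance=0.5)
```

Passing an open file in place of the StringIO object would apply the same steps to data on disk.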
Data Integration: GEO
Data Integration: Annotation Builder
SeqExpress: Summary
Summary Written in C#, is free and runs under Windows. Not associated with any academic institution, funding body or commercial organisation. Development is still ongoing. Plan to develop to the Expression Application Class Specification. Looking for employment in Seattle…