Dr Paul Lewis, Lecturer in Bioinformatics, Cardiff University, Biostatistics & Bioinformatics Unit
Biostatistics & Bioinformatics Unit (BBU)
  Bioinformatics resource for institutions across Wales
  Backed by the Higher Education Funding Council for Wales: £1.5 million grant through the Research Capacity Development Fund
  UWCM, Cardiff University, Aberystwyth
  13 new posts in statistics & bioinformatics
  MSc/Postgraduate Diploma/Postgraduate Certificate:
    Bioinformatics
    Genetic Epidemiology and Bioinformatics
Brief overview of microarray bioinformatics
Introduce my microarray research interests
My microarray analysis software
Bioinformatics in Microarray Experiment
  Experimental Design, Differential Gene Expression, Hybridisation, Data, Pattern Discovery, Class Prediction, Annotation, Normalisation
Normalisation
  Remove non-biological influences on the data (systematic variation)
  3 categories of normalisation:
  Normalisation: transform the data to make it more like a normal distribution
    log, lowess, linlog
  Standardisation: expand or contract the distribution so that data from different experiments can be compared
    calculate Z-scores
  Centralisation: move the distribution so that it is centred around the expected mean
    mean / median / trimmed-mean centring
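
The three categories above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names and the intensity values are made up for the example, a log2 transform stands in for the "normalisation" category (lowess and linlog are not shown), and median centring stands in for "centralisation".

```python
import math
import statistics

def log2_transform(values):
    """Normalisation: log-transform raw intensities (assumed strictly positive)."""
    return [math.log2(v) for v in values]

def z_scores(values):
    """Standardisation: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def median_centre(values):
    """Centralisation: shift the distribution so that its median is zero."""
    med = statistics.median(values)
    return [v - med for v in values]

# Made-up raw intensities for one array (illustration only).
raw = [120.0, 480.0, 960.0, 240.0, 60.0]
logged = log2_transform(raw)
centred = median_centre(logged)
standardised = z_scores(logged)
```

After standardisation, values from arrays with different spreads sit on a common scale (mean 0, standard deviation 1), which is what makes cross-experiment comparison possible.
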
Find Differentially Expressed Genes
  Is fold change significant?
  With replicates:
  Parametric tests
    t-test (ANOVA) (J. Comput. Biol.)
    Bayesian t-test (Bioinformatics)
    Mixture modelling & bootstrapping (SAM) (P.N.A.S.)
    Regression modelling (Genome Res.)
    All give similar results, but SAM reduces false positives
  Non-parametric tests
    Wilcoxon rank sum test (Bioinformatics)
    Non-parametric t-test (Bioinformatics)
    Ideal discriminator method (Bioinformatics)
    Low false positive rate but less power
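
As a rough illustration of testing one gene with replicates, the sketch below computes a parametric statistic (Welch's t) and a non-parametric p-value from a simple label-permutation test. The permutation test is a generic stand-in chosen for brevity; it is not SAM or any of the published methods listed above, and the function names and expression values are invented for the example.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two groups of replicate measurements."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value obtained by randomly re-labelling the pooled replicates."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(statistics.mean(perm_a) - statistics.mean(perm_b))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up log2 expression values for one gene, five replicates per condition.
control = [7.1, 6.9, 7.3, 7.0, 7.2]
treated = [9.0, 8.8, 9.3, 9.1, 8.9]

t_stat = welch_t(treated, control)
p_value = permutation_p_value(treated, control)
```

With thousands of genes tested at once, p-values like this would still need a multiple-testing correction before calling a gene differentially expressed.
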
Pattern Discovery & Class Prediction
  Explore how genes or samples group: Clustering
    HIERARCHY: Hierarchical Cluster Analysis
    PARTITION: K-Means, Self Organising Maps (SOM), Fuzzy ART
    REDUCTION: Principal Components Analysis (PCA), Multidimensional Scaling (MDS), Correspondence Analysis (CoA)
  Assign genes to known groupings: Classification
    logistic regression
    neural networks
    linear discriminant analysis
Hierarchical Cluster Analysis
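
A minimal sketch of agglomerative hierarchical clustering, assuming Euclidean distance and average linkage (the slide does not say which distance or linkage is used). Function names and the expression profiles are illustrative only.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def average_linkage(profiles, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # start with one gene per cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(profiles, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)

# Made-up profiles: five genes measured in three samples.
profiles = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 4.9, 5.0],
    [9.0, 9.1, 8.8],
]
groups = hierarchical_clusters(profiles, n_clusters=3)
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, is what yields the full tree (dendrogram).
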
Partitioning Clustering Methods
  K-Means & SOM
  Need to tell the methods the number of clusters
  Genes are partitioned into clusters
  What are the relationships between clusters?
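
The point about having to supply the number of clusters shows up directly in a minimal k-means sketch, where `k` is a required argument. The function name, the seeded random initialisation and the data are assumptions made for this example.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: the caller must choose k, the number of clusters."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up 2D data with two well-separated groups.
points = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
labels, centroids = kmeans(points, k=2)
```

The output is a flat list of labels: each gene belongs to exactly one cluster, and nothing in the result says how the clusters relate to each other.
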
2D & 3D Mapping Methods
  CoA, MDS, PCA
  Data are projected onto 2 or 3 dimensions
  But what are the cluster boundaries?
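
To make "projected onto 2 or 3 dimensions" concrete, here is a small PCA sketch in pure Python. It finds the leading directions of the covariance matrix by power iteration with deflation, which is a simple substitute for the eigen-decomposition or SVD a real package would use. All names and data are illustrative.

```python
import math

def centre_columns(rows):
    """Subtract each column's mean so every variable is centred on zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centred):
    """Sample covariance matrix of already-centred data."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenvector(matrix, n_iter=500):
    """Power iteration. Assumes the start vector is not orthogonal to the answer."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the matching eigenvalue.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca_project(rows, n_components=2):
    """Project each row onto the first n_components principal axes."""
    centred = centre_columns(rows)
    cov = covariance(centred)
    d = len(cov)
    components = []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenvector(cov)
        components.append(v)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centred]

# Made-up data: five samples, three variables.
rows = [[2.0, 1.9, 0.1], [4.0, 4.2, 0.0], [6.0, 5.8, 0.2],
        [8.0, 8.1, 0.1], [10.0, 9.9, 0.3]]
scores = pca_project(rows, n_components=2)
```

The result is only a set of coordinates for plotting. Nothing in it marks where one cluster ends and the next begins, which is the open question the slide raises.
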
Annotation
  Online tools: ARROGANT, DAVID, DRAGON, EASE, FANTOM, GoMiner, MatchMiner, Onto-Express, RESOURCERER, Affymetrix GO
  Databases: Gene Ontology, OMIM, LocusLink, UniGene
My Research Interests
  Pattern discovery algorithm development
    Take 2D & 3D mapping methods: define cluster boundaries
    Take methods: make FUZZY
  Biologist-friendly software tools
    EAS-I
    2D & 3D visualisation tools
Cluster Boundaries
  CoA, MDS, PCA
Fuzzy Clustering
  Differs from standard clustering by assigning each gene a membership value in every cluster
  Lets you see how strongly each gene is associated with a cluster
  Can calculate the number of clusters in partitioning methods (Fuzzy ART)
  Helps to combine clusters
  Helps to clear up ambiguity
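
The idea of a membership value in every cluster can be illustrated with fuzzy c-means, used here as a stand-in because it is compact (it is not Fuzzy ART, and unlike Fuzzy ART it still needs the number of clusters up front). The function names, the fuzzifier `m=2.0`, the starting centres and the data are all assumptions made for this sketch.

```python
import math

def memberships(points, centres, m=2.0):
    """Membership of every point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centres]
        if 0.0 in dists:
            # The point sits exactly on a centre: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                   for d_i in dists]
        result.append(row)
    return result

def update_centres(points, u, m=2.0):
    """Each centre is the mean of all points weighted by membership ** m."""
    n_clusters, dim = len(u[0]), len(points[0])
    centres = []
    for c in range(n_clusters):
        weights = [row[c] ** m for row in u]
        total = sum(weights)
        centres.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centres

def fuzzy_c_means(points, centres, m=2.0, n_iter=100, tol=1e-6):
    """Alternate membership and centre updates until the centres stop moving."""
    for _ in range(n_iter):
        u = memberships(points, centres, m)
        new_centres = update_centres(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return memberships(points, centres, m), centres

# Made-up data: two tight groups plus one ambiguous gene in between (index 6).
points = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
          [4.5, 4.5]]
u, centres = fuzzy_c_means(points, centres=[[0.0, 0.0], [10.0, 10.0]])
```

Genes inside a tight group end up with a membership near 1 in that group, while the in-between gene is split roughly evenly. That split is the kind of ambiguity a hard partition would hide.
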
Fuzzy Mapping
  Add the membership values of each gene to the clusters
Fuzzy Partitioning K-Means & SOM
Microarray Pattern Discovery
  Need for a comprehensive pattern discovery software suite
    Fuzzy data analysis suite
    Visualisation tools to explore data
    Easy to use
    Free
    Cross platform
  BBUnit
    Web-based version
    Service by BBU
    Increase traffic to the BBU web site
    Establish BBU for microarray
INTERFACE
  Normalisation: Log Normalise, Mean Centre, Median Centre
  Differential Gene Expression: T test, ANOVA, Regression
  Pattern Discovery: Hierarchical Cluster Analysis, SOM, K-Means, Fuzzy ART, PCA, MDS, CoA, Fuzzy C-Means
  Utilities
Contact
Acknowledgements
  Pete Kille
  Alan Clarke
  Gareth Hughes (EASI team)
  Karen Reed (Data)
  Lesley Jones (Data & EASI Collaborator)
  BBU