Sporulation in Bacillus

Presentation transcript:

Sporulation in Bacillus [Figure, after Errington, 2004: the sporulation cycle. Vegetative cycle: growth and medial division. Sporulation: Stage II, asymmetric (polar) cell division; Stage III, engulfment; Stage IV, cortex; Stage V, spore coat; Stages VI and VII, maturation and cell lysis. The dormant spore re-enters growth by germination.]

There is a hierarchy of gene expression during sporulation. Sporulation gene expression is temporally regulated by a transcription factor cascade. [Figure labels: σA, Spo0A, σF, σE, σG, σK]

Which genes are controlled by which transcription factor? What if we knock out a transcription factor gene? [Figure labels: σA, Spo0A, σF, σE, σG, σK]

B. subtilis spotted dsDNA microarray: contains ~4100 B. subtilis genes as PCR products.

High speed spotting robot

Microarray hybridization

Raw microarray data is hard to interpret!

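Image Analysis & Data Visualization [Figure: the Cy5 and Cy3 channel intensities of each spot are combined into a log2 ratio and drawn on a color scale running from underexpressed to overexpressed, with 2-, 4- and 8-fold marks.]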
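The log2 ratio on this slide can be sketched in a few lines of Python (the slides themselves mention Perl; Python is used here only for illustration). The function names `log2_ratio` and `fold_change` are hypothetical, and the direction of the ratio (Cy5 over Cy3) is an assumption, since the transcript does not preserve which channel is the numerator.

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(cy5 / cy3) for one spot.

    Assumption: Cy5 is the numerator. Swap the arguments if the
    experiment labels the reference sample with Cy5 instead.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("channel intensities must be positive")
    return math.log2(cy5 / cy3)

def fold_change(log_ratio):
    """Convert a log2 ratio back to an unsigned fold change (2, 4, 8, ...)."""
    return 2 ** abs(log_ratio)

# A spot eight times brighter in Cy5 than in Cy3 has a log2 ratio of 3;
# a spot four times dimmer has a log2 ratio of -2.
up = log2_ratio(800, 100)
down = log2_ratio(100, 400)
```

Working in log2 space makes over- and underexpression symmetric around zero, which is why the color scale on the slide can use the same fold marks in both directions.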

Experimental Design [Figure labels: Spo0A, σA]

Introduction to Clustering “An intelligent being cannot treat every object it sees as a unique entity unlike anything else in the universe. It has to put objects in categories so that it may apply its hard-won knowledge about similar objects encountered in the past, to the object at hand.” Steven Pinker, from How the Mind Works, 1997

Class prediction using supervised learning: classification by gene expression required a training set, i.e. we had a priori knowledge of the system.

Clustering is an unsupervised method for data exploration. No training set or preconceived notions about the data labels are required. The data will reveal its natural structure to us. [Figure axes: genes, microarrays]

Agglomerative Hierarchical Clustering: we start with many nodes, and end up with only one!

Hierarchies are ubiquitous in biology (N. Pace, Science, 1997)

Clustering Terminology [Figure: a clustering dendrogram. The leaves are genes, labelled with gene names; the internal nodes are “pseudogenes”.] Edge length is proportional to the “distance” between connected genes or nodes.

Clustering Reveals the "Molecular Logic" of Gene Expression [Figure axes: genes, experiments]

Similarity Metrics: in order to implement a clustering algorithm, we require some quantitative measure that compares the behaviour of two genes across some set of conditions. Are they behaving similarly, or differently?

Euclidean Distance: what is the distance between two coordinates in 2D space? For the points (1,4) and (3,1), $\Delta x = 2$ and $\Delta y = 3$, so from Pythagoras, distance $= \sqrt{2^2 + 3^2} = \sqrt{13} \approx 3.61$.

Euclidean Distance: how about objects in 3D space? For the points (0,0,0) and (2,4,1), $d = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2} = \sqrt{2^2 + 4^2 + 1^2} = \sqrt{21} \approx 4.58$.

Euclidean Distance: it turns out that the Euclidean distance generalizes to N-dimensional space.

$$d = |X - Y| = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$$

where $X = (x_1, x_2, \ldots, x_N)$ and $Y = (y_1, y_2, \ldots, y_N)$. These look an awful lot like a list in Perl, or a line of gene expression data, yes? One way to conceptualize an individual gene expression vector is therefore as a coordinate in some high-dimensional space. If we have two such vectors, then we can use the Euclidean distance to ask “How far apart are they?”
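As a minimal sketch of the N-dimensional distance, here is one way to write it in Python (the slides mention Perl; Python is used here only for illustration). The function name `euclidean_distance` is hypothetical, and each gene expression vector is assumed to be a plain sequence of numbers of equal length.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length expression vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of conditions")
    # Sum the squared per-condition differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# The 2D and 3D examples from the earlier slides are special cases.
d_2d = euclidean_distance((1, 4), (3, 1))        # sqrt(13)
d_3d = euclidean_distance((2, 4, 1), (0, 0, 0))  # sqrt(21)
```

The same function works unchanged for a row of microarray data with any number of conditions, because nothing in it depends on the dimension.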

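Pearson Correlation Coefficient: Kellie introduced the Pearson as a true correlation measure that varies in the range -1 to 1.

$$r = \frac{\sum_{i=1}^{N} (x_i - u_x)(y_i - u_y)}{N \sigma_x \sigma_y}$$

where $u_x$, $u_y$ are the means and $\sigma_x$, $\sigma_y$ the standard deviations of X and Y.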

Pearson Correlation Coefficient, computational form:

$$r = \frac{N \sum_{i=1}^{N} x_i y_i - \left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right)}{\sqrt{\left[N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2\right]\left[N \sum_{i=1}^{N} y_i^2 - \left(\sum_{i=1}^{N} y_i\right)^2\right]}}$$

Incredibly, this form makes our lives easier if we want to implement a Pearson() subroutine in Perl!
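The computational form is convenient because all five sums can be accumulated in a single pass over the data. The sketch below shows this in Python rather than the Perl mentioned on the slide; the function name `pearson` and its error handling are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation via the single-pass computational form."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0.0
    # Accumulate the five running sums that appear in the formula.
    for xi, yi in zip(x, y):
        sum_x += xi
        sum_y += yi
        sum_xy += xi * yi
        sum_x2 += xi * xi
        sum_y2 += yi * yi
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator

r_same = pearson([1, 2, 3], [2, 4, 6])      # perfectly correlated
r_opposite = pearson([1, 2, 3], [6, 4, 2])  # perfectly anti-correlated
```

One caveat worth keeping in mind: the single-pass form can lose precision when the values are large relative to their spread, so the mean-centered definition may be the safer choice for such data.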

Strategies for clustering: single linkage clustering. Similarity between the clusters is defined as the similarity of the closest pair of observations between the two groups.

Strategies for clustering: complete linkage clustering. Similarity between the clusters is defined as the similarity of the farthest pair of observations between the two groups.
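Single and complete linkage differ only in which pair of observations they pick, which a short Python sketch makes concrete. The names `single_linkage` and `complete_linkage` are hypothetical, clusters are assumed to be lists of expression vectors, and Euclidean distance stands in for the similarity measure (so "closest" means smallest distance).

```python
import math
from itertools import product

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def single_linkage(cluster_a, cluster_b, dist=euclidean):
    """Distance between the closest pair, one member from each cluster."""
    return min(dist(a, b) for a, b in product(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b, dist=euclidean):
    """Distance between the farthest pair, one member from each cluster."""
    return max(dist(a, b) for a, b in product(cluster_a, cluster_b))

cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]
closest = single_linkage(cluster_a, cluster_b)     # pair (1,0)-(4,0)
farthest = complete_linkage(cluster_a, cluster_b)  # pair (0,0)-(6,0)
```

Passing the distance function as a parameter means a correlation-based measure could be substituted without touching the linkage logic.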

Strategies for clustering: average linkage clustering. Nodes are represented by the average of vectors from the two component nodes, and the average pairwise distance within the newly formed cluster is thus minimized.

Average Linkage Clustering: once we have decided that two genes (or nodes) should join to make a new node, how do we define the contents of the new node? X = (1, 4, 2,-1), Y = (3, 2,-2,-3), Avg(X,Y) = (2, 3, 0,-2). This makes life easy: avg( avg(I,J), avg(K,L) ) = avg(I,J,K,L)
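Putting the pieces together, the following Python sketch shows one possible agglomerative loop in which each merged node is represented by the element-wise average of its two component nodes, as on this slide. It is not the implementation used by any particular program. The names `average_vectors` and `agglomerate`, the use of Euclidean distance, and the nested-tuple tree are all assumptions made for illustration. Note that the plain (unweighted) average reproduces avg(I,J,K,L) only when the two nodes being joined contain the same number of genes.

```python
import math
from itertools import combinations

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def average_vectors(x, y):
    """Element-wise average of two node vectors."""
    return tuple((xi + yi) / 2 for xi, yi in zip(x, y))

def agglomerate(genes):
    """Cluster a dict of {gene name: vector} into a nested-tuple tree.

    Start with one node per gene and repeatedly join the two closest
    nodes until only one node remains.
    """
    nodes = [(name, tuple(vec)) for name, vec in genes.items()]
    if not nodes:
        raise ValueError("need at least one gene")
    while len(nodes) > 1:
        # Find the indices of the closest pair of current nodes.
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda p: euclidean(nodes[p[0]][1], nodes[p[1]][1]),
        )
        (tree_i, vec_i), (tree_j, vec_j) = nodes[i], nodes[j]
        merged = ((tree_i, tree_j), average_vectors(vec_i, vec_j))
        # Replace the two joined nodes with the new averaged node.
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

avg_xy = average_vectors((1, 4, 2, -1), (3, 2, -2, -3))
tree = agglomerate({"a": (0, 0), "b": (1, 0), "c": (10, 0), "d": (11, 0)})
```

Here `avg_xy` reproduces the Avg(X,Y) vector from the slide, and `tree` pairs the two nearby genes on each side before joining the two pairs at the root. The search for the closest pair is deliberately naive and recomputes every distance on each pass, which is fine for a handful of genes but not for a full array.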

Cluster and TreeView by Mike Eisen http://rana.lbl.gov/EisenSoftware.htm Cluster implements various flavours of clustering algorithms, while TreeView provides a graphical output of the files produced by Cluster.