Sporulation in Bacillus

Sporulation in Bacillus
[Figure: the sporulation cycle. Labels: vegetative cycle (growth, medial division); polar division; Stage II, asymmetric cell division; Stage III, engulfment; Stage IV, cortex; Stage V, spore coat; Stages VI and VII, maturation and cell lysis; dormant spore; germination. After Errington, 2004.]

There is a hierarchy of gene expression during sporulation
Sporulation gene expression is temporally regulated by a transcription factor cascade.
[Figure labels: σA, Spo0A, σF, σE, σG, σK]

Which genes are controlled by which transcription factor?
[Figure labels: σA, Spo0A, σF, σE, σG, σK]
What if we knock out a transcription factor gene?

B. subtilis spotted dsDNA microarray
Contains ~4100 B. subtilis genes as PCR products

High speed spotting robot

Microarray hybridization

Raw microarray data is hard to interpret!

Image Analysis & Data Visualization
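[Figure: Cy5 and Cy3 channel images are combined into a log2 ratio of the two intensities, shown on a colour scale running from underexpressed to overexpressed, with 2-, 4- and 8-fold marks.]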
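The log2 ratio on that colour scale can be sketched in a few lines. This is a minimal Python illustration, not the analysis software used for the slides: it assumes the ratio is taken as Cy5 over Cy3 (the slide does not say which channel is the numerator), the intensities are made up, and `log2_ratio` is a hypothetical helper name.

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(cy5 / cy3) for one spot; both intensities must be positive."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)

# Illustrative intensities only (not real data): 8-fold up, unchanged, 4-fold down.
for cy5, cy3 in [(800, 100), (100, 100), (100, 400)]:
    print(cy5, cy3, log2_ratio(cy5, cy3))
```

On a log2 scale an 8-fold change is the same distance from zero in either direction, so overexpression and underexpression are treated symmetrically.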

Experimental Design
[Figure labels: Spo0A, σA]

Introduction to Clustering
“An intelligent being cannot treat every object it sees as a unique entity unlike anything else in the universe. It has to put objects in categories so that it may apply its hard-won knowledge about similar objects encountered in the past, to the object at hand.” Steven Pinker, from How the Mind Works, 1997

Class prediction using supervised learning
Classification by gene expression required a training set, i.e. we had a priori knowledge of the system.

Clustering is an unsupervised method for data exploration
No training set or preconceived notions about the data labels are required. The data will reveal its natural structure to us.
[Figure axes: genes, microarrays]

Agglomerative Hierarchical Clustering
We start with many nodes, and end up with only one!

Hierarchies are ubiquitous in biology
N. Pace, SCIENCE, 1997

Clustering Terminology
[Figure labels: clustering dendrogram; genes; gene names; “pseudogenes”]
Edge length is proportional to the “distance” between connected genes or nodes.

Clustering Reveals the "Molecular Logic" of Gene Expression
[Figure axes: genes, experiments]

Similarity Metrics
In order to implement a clustering algorithm, we require some quantitative concept comparing the behaviour of two genes across some set of conditions. Are they behaving similarly, or differently?

Euclidean Distance
What is the distance between two coordinates in 2D space? For the points (1,4) and (3,1), the legs of the right triangle are $\Delta x = 2$ and $\Delta y = 3$. From Pythagoras, distance $= \sqrt{2^2 + 3^2} = \sqrt{13} \approx 3.61$.

Euclidean Distance
How about objects in 3D space? For the points (0,0,0) and (2,4,1):
$d = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$

Euclidean Distance
It turns out that the Euclidean distance generalizes to N-dimensional space:
$d = |X - Y| = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$
$X = (x_1, x_2, x_3, \ldots, x_N)$
$Y = (y_1, y_2, y_3, \ldots, y_N)$
These look an awful lot like a list in Perl, or a line of gene expression data, yes? One way to conceptualize an individual gene expression vector is therefore as a coordinate in some high-dimensional space. If we have two such vectors, then we can use the Euclidean distance to ask “How far apart are they?”
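The N-dimensional distance translates almost directly into code. The sketch below is in Python rather than the Perl the slides mention, and `euclidean_distance` is a hypothetical name chosen for illustration. It is checked against the 2D and 3D points used in the slides.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length expression vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance((1, 4), (3, 1)))        # 2D points from the slides
print(euclidean_distance((0, 0, 0), (2, 4, 1)))  # 3D points from the slides
```

The same function works unchanged for a gene measured across any number of arrays, since each extra condition just adds one more term to the sum.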

Pearson Correlation Coefficient
$r = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N \sigma_x \sigma_y}$
Kellie introduced the Pearson as a true correlation measure that varies in the range -1 to 1.

Pearson Correlation Coefficient: computational form
$r = \frac{N \sum_{i=1}^{N} x_i y_i - \left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right)}{\sqrt{\left[N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2\right]\left[N \sum_{i=1}^{N} y_i^2 - \left(\sum_{i=1}^{N} y_i\right)^2\right]}}$
Incredibly, this form makes our lives easier if we want to implement a Pearson() subroutine in Perl!
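The reason the computational form is convenient is that it needs only five running sums, all of which can be collected in a single pass over the two vectors. The sketch below shows this in Python rather than Perl. The function name `pearson` and the error handling are assumptions made for illustration, not the course's own subroutine.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors, computational form."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    # Five running sums, all collected in one pass over the data.
    sum_x = sum_y = sum_xy = sum_xx = sum_yy = 0.0
    for xi, yi in zip(x, y):
        sum_x += xi
        sum_y += yi
        sum_xy += xi * yi
        sum_xx += xi * xi
        sum_yy += yi * yi
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator

print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly correlated vectors
print(pearson([1, 2, 3], [6, 4, 2]))  # perfectly anti-correlated vectors
```

No means or standard deviations have to be computed beforehand. One caveat: with very large values the subtraction in the numerator and denominator can lose floating-point precision.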

Strategies for clustering
Single linkage clustering: similarity between the clusters is defined as the similarity of the closest pair of observations between the two groups.

Complete linkage clustering: similarity between the clusters is defined as the similarity of the farthest pair of observations between the two groups.

Average linkage clustering: nodes are represented by the average of vectors from the two component nodes, and the average pairwise distance within the newly formed cluster is thus minimized.
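The three strategies differ only in how the distance between two clusters is computed from their members. The Python sketch below makes that concrete under some assumptions: it uses Euclidean distance as the metric, so "closest" means smallest distance, and the function names and the toy clusters are hypothetical.

```python
import math
from itertools import product

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of members, one from each cluster."""
    return min(euclidean(a, b) for a, b in product(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of members, one from each cluster."""
    return max(euclidean(a, b) for a, b in product(cluster_a, cluster_b))

def mean_vector(cluster):
    """Component-wise average of all member vectors in a cluster."""
    n = len(cluster)
    return tuple(sum(column) / n for column in zip(*cluster))

def average_vector_linkage(cluster_a, cluster_b):
    """Distance between the average vectors that represent the two clusters."""
    return euclidean(mean_vector(cluster_a), mean_vector(cluster_b))

# Toy clusters (made-up coordinates, not expression data).
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (5.0, 2.0)]
print(single_linkage(a, b), complete_linkage(a, b), average_vector_linkage(a, b))
```

For the same two clusters the single-linkage distance is never larger than the complete-linkage distance, which is why single linkage tends to chain clusters together while complete linkage keeps them compact.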

Average Linkage Clustering
Once we have decided that two genes (or nodes) should join to make a new node, how do we define the contents of the new node?
X = (1, 4, 2, -1)
Y = (3, 2, -2, -3)
Avg(X,Y) = (2, 3, 0, -2)
This makes life easy: avg( avg(I,J), avg(K,L) ) = avg(I,J,K,L). The identity holds as written because both nodes contain two genes; nodes of unequal size need a size-weighted average.
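Putting the pieces together, the whole agglomerative procedure can be sketched as a loop that repeatedly joins the two closest nodes and replaces them with their average. The Python below is a naive illustration under stated assumptions, not the algorithm as implemented in any particular package: it uses Euclidean distance, weights the average by node size so that a node always holds the mean of all its member genes, and the name `agglomerate` and the four test vectors are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(vectors):
    """Naive agglomerative clustering; returns (merges, root).

    Each node is (label, average_vector, size). Leaves are labelled by their
    index in `vectors`; a merged node's label is a tuple of its two children.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    nodes = [(i, tuple(map(float, v)), 1) for i, v in enumerate(vectors)]
    merges = []
    while len(nodes) > 1:
        # Find the closest pair of current nodes (quadratic scan per merge).
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = euclidean(nodes[i][1], nodes[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (label_a, vec_a, n_a), (label_b, vec_b, n_b) = nodes[i], nodes[j]
        # Size-weighted average: the new vector is the mean of all member genes.
        merged_vec = tuple((n_a * a + n_b * b) / (n_a + n_b)
                           for a, b in zip(vec_a, vec_b))
        merges.append((label_a, label_b, d))
        # Delete j first (j > i) so that index i is still valid.
        del nodes[j]
        del nodes[i]
        nodes.append(((label_a, label_b), merged_vec, n_a + n_b))
    return merges, nodes[0]

# The first two vectors are X and Y from the slide; the other two are made up.
genes = [(1, 4, 2, -1), (3, 2, -2, -3), (1, 4, 3, -1), (3, 2, -2, -2)]
merges, root = agglomerate(genes)
for left, right, distance in merges:
    print(left, right, distance)
print(root)
```

N genes always produce N - 1 merges and a single root node, and the nested labels of the root record the tree that a dendrogram would draw.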

Cluster and TreeView by Mike Eisen
Cluster implements various flavours of clustering algorithms, while TreeView provides a graphical output of the files produced by Cluster.
