Next
A Big Thanks Again: Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University
Spatial Transcriptome: expression energy for each gene (M = 4,376) and for each voxel (N = 51,533).
Clusters of Correlated Gene Expression
Gene Finder: The user navigates to a voxel of interest in the reference atlas volume, and a fixed-threshold AGEA correlation map appears. A gene list from the ABA is returned.
Large-scale data analysis: How much structure is present across space and across genes? How would the brain segment using ISH gene expression patterns? Is there structure in the patterns of expression of localized genes? What do we learn from the expression of genes in disorders?
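The correlation-map step can be sketched in a few lines. This is only an illustration under stated assumptions: the map is taken to be the Pearson correlation between the seed voxel's gene-expression profile and every other voxel's profile, the data are random toy values, and the names `correlation_map`, `threshold` and `selected` are hypothetical. How the ABA derives the returned gene list is not described on the slide, so it is not shown.

```python
import numpy as np

def correlation_map(expression, seed_index):
    """Pearson correlation of the seed voxel's gene profile with every voxel's profile.

    expression: array of shape (n_voxels, n_genes).
    """
    centered = expression - expression.mean(axis=1, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    return unit @ unit[seed_index]

rng = np.random.default_rng(7)
expression = rng.random((50, 20))            # toy voxels x genes matrix (not ABA data)
expression[1] = 2.0 * expression[0] + 1.0    # voxel 1 made perfectly correlated with voxel 0

corr = correlation_map(expression, seed_index=0)
threshold = 0.8                              # hypothetical fixed threshold
selected = np.flatnonzero(corr >= threshold) # voxels that survive the threshold
```

The seed voxel always has correlation 1 with itself, so it is always part of the thresholded map.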
Large-scale Correlation: Combine the gene volumes into a large matrix. Decompose the voxel × gene matrix $M$ using the singular value decomposition (SVD). Any $m \times n$ matrix $M$ can be decomposed as $M(x, g) = \sum_n \sigma_n \, u_n(x) \, v_n(g)$, where $u_n(x)$ are the spatial modes, $v_n(g)$ are the gene modes, and $\sigma_n^2$ is the energy in the $n$th mode.
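A minimal sketch of this decomposition, assuming NumPy and a small random matrix standing in for the real voxel × gene matrix (the sizes and values are illustrative only). It rebuilds the matrix as a sum of rank-one terms, one per mode, each the product of a singular value, a spatial mode and a gene mode.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_genes = 200, 30                 # toy sizes; the real matrix is far larger
M = rng.random((n_voxels, n_genes))         # stand-in for expression energies

# Thin SVD: M = U @ diag(s) @ Vt, with s sorted from largest to smallest
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Column n of U is the spatial mode u_n(x); row n of Vt is the gene mode v_n(g).
# Sum of rank-one terms: singular value * spatial mode * gene mode.
M_rebuilt = sum(s[n] * np.outer(U[:, n], Vt[n]) for n in range(len(s)))

energy = s ** 2                             # energy carried by each mode
reconstruction_ok = np.allclose(M, M_rebuilt)
```

The energies add up to the sum of squared entries of `M`, which is why they can be read as each mode's share of the total.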
Large-scale Correlation: Quality control → set of 3041 genes. Combine the gene volumes into a large matrix. Decompose the voxel × gene matrix using the singular value decomposition (SVD). [Figure: M (voxels × genes) ≈ spatial patterns (voxels × modes) × singular values (“weights”) × gene patterns (modes × genes).]
SVD: $A = U S V^T$. $A$ ($m$ by $n$) is any rectangular matrix ($m$ rows and $n$ columns). $U$ ($m$ by $n$) is an “orthogonal” matrix. $S$ ($n$ by $n$) is a diagonal matrix. $V$ ($n$ by $n$) is another orthogonal matrix. Such a decomposition always exists. All matrices are real; $m \ge n$.
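The shapes and orthogonality properties listed here can be checked numerically. The sketch below assumes NumPy's thin SVD (`full_matrices=False`) and a random real matrix with m ≥ n; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4                                  # any sizes with m >= n
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD
S = np.diag(s)                               # n-by-n diagonal matrix
V = Vt.T                                     # n-by-n orthogonal matrix

shapes = (U.shape, S.shape, V.shape)
u_cols_orthonormal = np.allclose(U.T @ U, np.eye(n))          # U^T U = I
v_orthogonal = (np.allclose(V.T @ V, np.eye(n))
                and np.allclose(V @ V.T, np.eye(n)))          # V^T V = V V^T = I
exact = np.allclose(A, U @ S @ V.T)                           # A = U S V^T
u_rows_not_orthonormal = not np.allclose(U @ U.T, np.eye(m))  # fails when m > n
```

The last check is why “orthogonal” is in quotes for U: its columns are orthonormal, but for m > n it is not square, so U Uᵀ is not the identity.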
Why SVD? Product of two orthogonal matrices and a diagonal matrix. Each voxel is a vector in 3041-dimensional space. We look for the directions in this vector space with the most variability. Each “mode” has 3 items:
– spatial pattern: voxels that are correlated
– gene pattern: a weighting of how much each gene expresses this mode
– a weight: how much of the overall variability in the data this mode accounts for
SVD modes are ordered: each successive mode accounts for as much of the variability remaining in the data as is possible.
Principal modes (SVD): N = 67 modes are needed before we get to 80% of the variance, and N = 271 before we get to 90%. [Figure: all LH brain voxels plotted as projections on the first 3 modes. Legend: Cerebral cortex, Olfactory areas, Hippocampus, Retrohippocampal, Striatum, Pallidum, Thalamus, Hypothalamus, Midbrain, Pons, Medulla, Cerebellum.]
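Mode counts of this kind come from the cumulative sum of squared singular values. The sketch below is an illustration only: `modes_needed` is a hypothetical helper, the singular values are toy numbers rather than the atlas data, and squared singular values are treated as variance, as the slide does.

```python
import numpy as np

def modes_needed(singular_values, fraction):
    """Smallest number of leading modes whose energy reaches `fraction` of the total."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # Index of the first cumulative value >= fraction, converted to a count
    count = int(np.searchsorted(cumulative, fraction)) + 1
    return min(count, energy.size)

s = np.array([3.0, 2.0, 1.0, 1.0])   # toy singular values, sorted largest first
# Energies 9, 4, 1, 1 give cumulative fractions 0.600, 0.867, 0.933, 1.000
n80 = modes_needed(s, 0.80)          # 2
n90 = modes_needed(s, 0.90)          # 3
```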
Singular Value Decomposition
SVD for microarray data (Alter et al., PNAS 2000)
$A = U S V^T$: $A$ is any rectangular matrix ($m \ge n$). Row space: the vector subspace generated by the row vectors of $A$. Column space: the vector subspace generated by the column vectors of $A$. The dimension of the row and column spaces is the rank of the matrix $A$: $r \le n$. $A$ is a linear transformation that maps a vector $x$ in the row space into the vector $Ax$ in the column space.
$A = U S V^T$: $U$ is an “orthogonal” matrix ($m \ge n$). The column vectors of $U$ form an orthonormal basis for the column space of $A$: $U^T U = I$. $u_1, \ldots, u_n$ in $U$ are eigenvectors of $A A^T$, since $A A^T = U S V^T V S U^T = U S^2 U^T$. These are the “left singular vectors”.
$A = U S V^T$: $V$ is an orthogonal matrix ($n$ by $n$). The column vectors of $V$ form an orthonormal basis for the row space of $A$: $V^T V = V V^T = I$. $v_1, \ldots, v_n$ in $V$ are eigenvectors of $A^T A$, since $A^T A = V S U^T U S V^T = V S^2 V^T$. These are the “right singular vectors”.
$A = U S V^T$: $S$ is a diagonal matrix of non-negative singular values, sorted from largest to smallest. The singular values are the non-negative square roots of the corresponding eigenvalues of $A^T A$ and $A A^T$.
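The link between singular values and eigenvalues can be verified directly. This is a sketch assuming NumPy and a random 7 × 3 matrix; `eigvalsh` is used because both products are symmetric.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))

s = np.linalg.svd(A, compute_uv=False)       # singular values, largest first

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order,
# so reverse them to match the ordering of the singular values
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]  # 3 eigenvalues
eig_AAt = np.linalg.eigvalsh(A @ A.T)[::-1]  # 7 eigenvalues

matches_AtA = np.allclose(s, np.sqrt(eig_AtA))
# A A^T is 7-by-7 but has rank 3: its top 3 eigenvalues match, the rest are ~0
matches_AAt = np.allclose(s ** 2, eig_AAt[:3])
rest_zero = np.allclose(eig_AAt[3:], 0.0)
```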
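$AV = US$ means each $A v_i = s_i u_i$. Note that $A$ is a linear map from the row space to the column space: $A$ maps the orthonormal basis $\{v_i\}$ in the row space into the orthonormal basis $\{u_i\}$ in the column space, scaling each vector by $s_i$. Each component of $u_i$ is, up to the factor $s_i$, the projection of a row of $A$ onto the vector $v_i$.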
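A short numerical check of this relation, assuming NumPy and a random matrix. It verifies the matrix form, the single-mode form, and the reading of one entry as a row projected onto a right singular vector.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Matrix form: A V = U S (U * s scales column i of U by s[i])
av_equals_us = np.allclose(A @ V, U * s)

# Vector form for a single mode i
i = 0
vector_form_ok = np.allclose(A @ V[:, i], s[i] * U[:, i])

# Entry j of A v_i is the dot product of row j of A with v_i,
# which equals s_i times entry j of u_i
j = 2
row_projection = A[j] @ V[:, i]
projection_ok = np.isclose(row_projection, s[i] * U[j, i])
```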
SVD of $A$ ($m$ by $n$): recap. $A = U S V^T$ = (big-“orthogonal”)(diagonal)(sq-orthogonal). $u_1, \ldots, u_n$ in $U$ are eigenvectors of $A A^T$. $v_1, \ldots, v_n$ in $V$ are eigenvectors of $A^T A$. $s_1, \ldots, s_n$ in $S$ are the non-negative singular values of $A$. $AV = US$ means each $A v_i = s_i u_i$. “Every $A$ is diagonalized by 2 orthogonal matrices.”
Recap
Eigengenes
Genes Ranking – Correlation of Top 2 Eigengenes. Alter, Orly et al. (2000) Proc. Natl. Acad. Sci. USA 97. Copyright ©2000 by the National Academy of Sciences.
Normalized elutriation expression in the subspace associated with the cell cycle (Alter et al., 2000).
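SVD in Row Space: The line through the origin along $v_1$ approximates the original data set $A$. The data set projected onto that line, $s_1 u_1 v_1^T$, approximates the original data set. [Figure: data set A in the x–y plane with the direction v_1, and the projected data s_1 u_1 v_1^T.]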
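The projection onto the first right singular vector can be sketched with toy 2-D data. The points below are random values scattered near a line through the origin (an assumption made for illustration); the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy 2-D data set lying close to a line through the origin
t = rng.standard_normal(100)
A = np.column_stack([t, 0.5 * t + 0.05 * rng.standard_normal(100)])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]                                   # unit vector along the best-fit line
A1 = s[0] * np.outer(U[:, 0], v1)            # rank-one term s1 * u1 * v1^T

# The rank-one term is exactly the data projected onto the line spanned by v1
same_as_projection = np.allclose(A1, np.outer(A @ v1, v1))

# With two columns, what is left over has Frobenius norm equal to s[1]
residual = np.linalg.norm(A - A1)
```

A small `residual` relative to `s[0]` indicates that the single line captures most of the data.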
SVD = PCA? Centered data. [Figure: data in the x–y plane with new axes x′, y′.]
SVD = PCA? Translation is not a linear operation, as it moves the origin! [Figure: data in the x–y plane with axes x′, y′ and x″, y″, labeled PCA and SVD.]
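The difference between the two can be seen on a toy data cloud that sits away from the origin. This is a sketch under stated assumptions: PCA is computed as the SVD of the mean-centered data, the data are random, and `first_direction` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(5)
# Cloud elongated along x, then translated away from the origin along y
X = rng.standard_normal((500, 2)) * np.array([3.0, 0.3]) + np.array([0.0, 10.0])

def first_direction(data):
    """First right singular vector of `data` (unit length, sign arbitrary)."""
    _, _, Vt = np.linalg.svd(data, full_matrices=False)
    return Vt[0]

v_raw = first_direction(X)                    # SVD of the uncentered data
v_pca = first_direction(X - X.mean(axis=0))   # PCA: SVD after centering

# The centered result agrees (up to sign) with the top covariance eigenvector
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
top = eigenvectors[:, -1]                     # eigh sorts eigenvalues ascending
pca_matches_cov = np.isclose(abs(v_pca @ top), 1.0)

# Without centering, the first mode points toward the cloud's mean instead
raw_differs = abs(v_raw @ v_pca) < 0.5
```

On centered data the two procedures give the same directions; on translated data the first SVD mode is dominated by the offset from the origin.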
Returning Back …
Interpreting gene modes: Spatial modes are easily visualized. For gene modes, attempt to annotate the eigenmodes using Gene Ontology (GO) annotations. Each GO term partitions the gene list into two subsets:
– IN genes: genes annotated by that GO term
– OUT genes: genes not annotated by that GO term
Each singular vector associates each of these subsets with a set of amplitudes. Compare these amplitudes, asking whether the ‘IN’ genes have larger magnitudes than the ‘OUT’ genes: use the Kolmogorov–Smirnov (K-S) test to test whether the two amplitude distributions are different.
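The IN versus OUT comparison can be sketched as follows. Everything here is illustrative: the gene mode is random, the GO annotation mask is hypothetical, and the IN genes are deliberately given larger loadings. The two-sample K-S statistic is computed by hand with NumPy, and the p-value comes from a permutation test, which is a swapped-in choice and not necessarily what was used for the slides; SciPy's `scipy.stats.ks_2samp` is a library alternative.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Observed K-S statistic and a permutation p-value (labels shuffled at random)."""
    rng = np.random.default_rng(seed)
    observed = ks_statistic(a, b)
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        if ks_statistic(shuffled[:a.size], shuffled[a.size:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(6)
gene_mode = rng.standard_normal(300)     # toy gene mode (one singular vector)
in_term = np.zeros(300, dtype=bool)
in_term[:40] = True                      # hypothetical GO term annotating 40 genes
gene_mode[in_term] *= 3.0                # IN genes made to load more strongly

amplitudes = np.abs(gene_mode)           # compare magnitudes, not signed values
d_stat, p_value = ks_permutation_pvalue(amplitudes[in_term], amplitudes[~in_term])
```

Repeating this for every GO term and every mode yields one statistic per (term, mode) pair; with that many tests, some correction for multiple comparisons would normally be needed.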
Component annotations