Supplemental Material


A Big Thanks Prof. Jason Bohland Quantitative Neuroscience Laboratory Boston University

The Process Construction and representation of the Anatomic Gene Expression Atlas (AGEA).

Allen Reference Atlas

3D Nissl volume comes from rigid reconstruction. Each section is reoriented to match adjacent images as closely as possible. A 1.5T low-resolution 3D average MRI volume is used to ensure the reconstruction is realistic. Reoriented Nissl sections are down-sampled and converted to grayscale. Isotropic 25 μm grayscale volume.

Anatomy: 208 large structures and structural groupings extracted. Projected and smoothed onto the 3D atlas volume for structural annotation. Additional decomposition of cortex into an intersection of 202 regions and areas.

The Process Construction and representation of the Anatomic Gene Expression Atlas (AGEA).

In Situ Hybridization (ISH): each gene's ISH series is reconstructed from serial sections (200 μm spacing). Coronal section; sagittal section.

Why ISH? Phenotypic properties of cells result from a unique combination of expressed gene products. Gene expression profiles => define cell types.

6 genes on 1 brain Each gene on 56 sections 2 sections are for Nissl

8 genes on 1 brain Each gene on 20 Sections.

ISH – Tissue Preparation & Imaging Process: Sectioning; Staining (non-isotopic digoxigenin (DIG)); Washing; Imaging

ISH – Probe Preparation

Traditional Approach vs. ISH. Histology: one gene at a time. For 20,000 genes need x (5 or 14) slides ~1year. DNA microarrays & SAGE: applied to large brain regions; cannot differentiate neuronal subtypes. Kamme, F. et al., J. Neurosci (2003); Sugino, K. et al., Nature Neurosci (2006). In situ hybridization measures expression and preserves spatial information for a single gene. Finer resolution: cellular but not single-cell. Data can be used to analyze gene expression, gene regulation, CNS function (spatial), and cellular phenotype (spatial).

Reproducibility: for multiple genes, an inbred mouse strain is used. Although different mice are used for different genes, expression under the same environmental conditions is reproducible.

Is ISH Reproducible? Primary sources of variation: riboprobes, day-to-day variability, and biological variability in brains. Even with inbred mice, variation between brains is significant.

Processing: Expression Statistics, Reconstruction – 3D. Data accessed by standard coordinate system – (200 μm)^3 voxels. Ontology of Allen Reference Atlas used to label individual voxels.

Grid Based Nearest Plane

Registration - Key: Volumes iteratively registered to AB atlas using affine and locally nonlinear warping. Registration good to ~200 microns. Local deformation field example.

3D Annotation

Lower dimensional data volumes: Analyze binned expression volumes at (200 µm)^3 resolution. ~31,000 image series (mostly single hemisphere, sagittal series). 4,104 unique genes available from coronally sectioned brains. Each volume is 67 x 41 x 58 voxels (about 50k brain voxels). Comparable to fMRI resolution.

Data normalization: Background correction & Registration. Intensity normalization – correct background from negative control. Registration – map the image to the reference atlas. Smoothed Expression Energy: sum of intensities of expressing cells / # of cells in the voxel; an average over many cells of diverse types.

ISH Signal (c) Coronal plane in situ hybridization (ISH) image of gene tachykinin 2 (Tac2) from the Allen Brain Atlas showing enriched expression in the bed nucleus of the stria terminalis (BST). The box represents a 1-mm² square. (d) Enlarged expression mask view of boxed area in c depicting gene expression levels color coded by ISH signal intensity (red, higher expression level; green/blue, lower expression level).

Measurements: p is an image pixel in voxel C. |C| is the total number of pixels in C. M(p) - expression segmentation mask: 1 (“expressing” pixel) or 0 (“non expressing” pixel). I(p) - grayscale value of ISH image intensity. Gray = 0.3*Red *Green *Blue.

Per Gene Signature: Prox1. [Figure: coronal section and sagittal section; Prox1 volume maximum intensity projections; raw ISH and expression energy]

Expression measures  expression density = sum of expressing pixels / sum of all pixels in division  expression intensity = sum of expressing pixel intensity / sum of expressing pixels  expression energy = sum of expressing pixel intensity / sum of all pixels in division –== density x intensity Recap - Measurements

MetaData: Each voxel can be connected to a node in a hierarchical brain atlas / ontology, and also to Waxholm space. Raw Nissl sections from the same brain (with 200 μm spacing) can also be obtained. Each gene has the specific probe sequence used and various identifiers to link to gene information (we’ve used Entrez ID).

Deriving Insights

Large-scale data analysis How much structure is present across space and across genes? How would the brain segment on the basis of gene expression patterns (as opposed to Nissl, etc.)? Is there structure in the patterns of expression of highly localized genes? What can we learn from the expression patterns of genes implicated in disorders? see Bohland et al. (2009) Methods; Ng et al. (2009) Nature Neuroscience.

Genome-wide Analysis of Expression: 70.5% of genes are expressed in fewer than 20% of cells.

Notes: Well-established genes for different cells identified. For 12 major brain regions, 100 top genes.

Cell-Specific Genes: Gene Ontology enrichment analysis is useful. Oligodendrocyte-enriched genes => myelin production.

Heterogeneity

Functional Compartments: Genes with regional expression provide substrates for functional differences.

Tools from AGEA: Correlation mode – view and navigate 3-D spatial relationship maps. Clusters mode – explore transcriptome-based spatial organization. Gene Finder mode – search for genes with local regionality.

Spatial Transcriptome: Expression energy for each gene (M=4,376) and for each voxel (N=51,533). For each seed voxel, find Pearson’s correlation coefficient between the seed voxel and every other voxel using expression vectors of length M. Compute 51,533 three-dimensional correlation maps. Web viewer for easy navigation between maps and within each 3-D map. Correlation values shown as 24-bit false color using a blue-to-red (“jet”) color scale.
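One correlation map can be sketched as follows. This is an illustration only, assuming the expression energies are arranged as an N x M array (voxels by genes); the function name `correlation_map` and the random toy data are hypothetical and stand in for the real 51,533 x 4,376 matrix.

```python
import numpy as np

def correlation_map(expression, seed):
    """Pearson correlation between the seed voxel and every voxel.

    expression -- array of shape (N voxels, M genes) of expression energy
    seed       -- row index of the seed voxel
    """
    x = np.asarray(expression, dtype=float)
    # Center each voxel's expression vector across genes.
    centered = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    norms[norms == 0] = np.nan   # flat profile: correlation is undefined
    unit = centered / norms[:, None]
    # The dot product of unit-length centered vectors is Pearson's r.
    return unit @ unit[seed]

# Random toy data: 6 voxels, 20 genes.
rng = np.random.default_rng(0)
toy = rng.random((6, 20))
corr = correlation_map(toy, seed=2)
```

Repeating this for every seed voxel gives one map per voxel; reshaping each map back onto the 3-D voxel grid is left out of the sketch.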

Clusters of Correlated Gene Expression: Classical definition of brain regions: overall morphology, cellular cytoarchitecture, ontological development, functional connectivity.

Clusters of Correlated Gene Expression: Hierarchical clustering – voxels are spatially organized as a binary tree. Each node is a collection of voxels and has 0 or 2 branches. Initially, all 51,533 voxels are assigned to the root node of the tree. The final tree has 103,065 nodes with a maximum depth of 53 levels and 51,533 leaf nodes (one for each voxel in the brain). At each bifurcation an ordering is assigned to each child to enable the definition of a global “depth first” ordering for all leaf nodes.

Clustering Analysis
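The tree structure described above can be sketched on toy data. The slides do not say how a node is split, so the split rule here (a median cut along the highest-variance feature) is purely an assumption for illustration, as are the function names; only the structural properties (0 or 2 children, one leaf per voxel, depth-first leaf ordering) follow the slide.

```python
import numpy as np

def build_tree(indices, data):
    """Recursively bisect `indices` until each leaf holds a single voxel."""
    if len(indices) == 1:
        return {"voxels": indices, "children": []}
    sub = data[indices]
    axis = int(np.argmax(sub.var(axis=0)))   # ASSUMED split rule, not from the slides
    order = indices[np.argsort(sub[:, axis], kind="stable")]
    half = len(order) // 2
    # The child order (first, second) fixes the depth-first ordering.
    return {"voxels": indices,
            "children": [build_tree(order[:half], data),
                         build_tree(order[half:], data)]}

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node["children"])

def depth_first_leaves(node):
    """Global depth-first ordering of the leaf voxels."""
    if not node["children"]:
        return [int(node["voxels"][0])]
    leaves = []
    for child in node["children"]:
        leaves.extend(depth_first_leaves(child))
    return leaves

# Random toy data: 8 voxels with 5 features each.
rng = np.random.default_rng(1)
toy = rng.random((8, 5))
tree = build_tree(np.arange(8), toy)
n_nodes = count_nodes(tree)
leaf_order = depth_first_leaves(tree)
```

A binary tree in which every node has 0 or 2 children and there are N leaves always has 2N - 1 nodes, which matches the slide: 2 x 51,533 - 1 = 103,065.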

Hierarchical Clustering

Notes

Microarray Data Analysis: Unsupervised Analysis (clustering); Supervised Analysis; Visualization & Decomposition; Pattern Analysis; Statistical Analysis. Unsupervised methods: K-means, Hierarchical Clustering, Biclustering, CLICK, Self-Organizing Maps, DBSCAN, OPTICS, DENCLUE, …

Differentially Regulated Genes: up-regulated genes; down-regulated genes.

Clusters ?

Clustering Analysis: Group genes that show a similar temporal expression pattern. Group samples/genes that show a similar expression pattern.

Clustering Analysis: Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups. Inter-cluster distances are maximized; intra-cluster distances are minimized.

Clusters? How many clusters? [Figure: the same points grouped as two clusters, four clusters, and six clusters]

Clustering Algorithms: K-means and its variants; Hierarchical clustering.

K-means Clustering: Partitional clustering approach. Each cluster is associated with a centroid (center point). Each point is assigned to the cluster with the closest centroid. The number of clusters, K, must be specified. The basic algorithm is very simple.
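The basic algorithm can be sketched as below. This is a minimal version assuming Euclidean distance and initial centroids drawn at random from the data points; the function name `kmeans`, its parameters, and the toy points are illustrative choices, not taken from the slides.

```python
import numpy as np

def assign(points, centroids):
    """Index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Assumed initialization: k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Recompute each centroid as the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(points, centroids), centroids

# Made-up toy data: two well-separated groups of three points.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(toy, k=2)
```

Because the result depends on the initial centroids, a common practice is to run the algorithm several times with different seeds and keep the run with the lowest within-cluster sum of squared distances.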

Choosing Initial Centroids

Limitations: Differing Sizes. Original Points; K-means (3 Clusters)

Limitations: Differing Density. Original Points; K-means (3 Clusters)

Limitations: Non-globular Shapes. Original Points; K-means (2 Clusters)

Hierarchical Clustering: Produces a set of nested clusters organized as a hierarchical tree. Can be visualized as a dendrogram: a tree-like diagram that records the sequences of merges or splits.

Agglomerative Clustering: More popular hierarchical clustering technique. Basic algorithm is straightforward:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4.   Merge the two closest clusters
5.   Update the proximity matrix
6. Until only a single cluster remains
Key operation is the computation of the proximity of two clusters. Different approaches to defining the distance between clusters distinguish the different algorithms.
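The loop above can be sketched directly on a proximity matrix. This is a naive illustration, assuming Euclidean distances and the MIN (single link) rule for updating the matrix after a merge; the function name `agglomerative` and the `n_clusters` stopping parameter are hypothetical additions for the example.

```python
import numpy as np

def agglomerative(points, n_clusters=1):
    """Naive agglomerative clustering with the MIN (single link) update."""
    points = np.asarray(points, dtype=float)
    # Compute the proximity matrix (pairwise Euclidean distances).
    prox = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    np.fill_diagonal(prox, np.inf)
    # Let each data point be a cluster.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > n_clusters:
        # Find the two closest active clusters.
        active = sorted(clusters)
        sub = prox[np.ix_(active, active)]
        a_pos, b_pos = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = active[a_pos], active[b_pos]
        merges.append((clusters[a], clusters[b], float(prox[a, b])))
        # Update the proximity matrix: the merged cluster keeps row a and
        # takes the smaller of the two old distances to every other cluster.
        prox[a, :] = np.minimum(prox[a, :], prox[b, :])
        prox[:, a] = prox[a, :]
        prox[a, a] = np.inf
        clusters[a] = clusters[a] + clusters.pop(b)
    return list(clusters.values()), merges

# Made-up one-dimensional toy points.
toy = [[0.0], [1.0], [5.0], [6.0], [20.0]]
final_clusters, merges = agglomerative(toy, n_clusters=2)
```

The recorded `merges` list (pairs of clusters plus the distance at which they merged) is the information a dendrogram displays. Swapping `np.minimum` for `np.maximum` would give the MAX (complete link) variant.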

In The Beginning... Start with clusters of individual points and a proximity matrix. [Figure: points p1 to p5 and their proximity matrix]

Intermediate Step: After some merging steps, we have some clusters. [Figure: clusters C1 to C5 and their proximity matrix]

Intermediate Step: We want to merge the two closest clusters (C2 and C5) and update the proximity matrix. [Figure: clusters C1 to C5 and their proximity matrix]

After Merging: The question is “How do we update the proximity matrix?” [Figure: clusters C1, C2 U C5, C3, C4; the proximity-matrix entries for C2 U C5 are marked “?”]

Inter-Cluster Similarity: how is the similarity between two clusters defined? MIN; MAX; Group Average; Distance Between Centroids. [Figure: points p1 to p5 and their proximity matrix]

Hierarchical Clustering: MIN. Nested Clusters; Dendrogram

Hierarchical Clustering: MAX. Nested Clusters; Dendrogram

Hierarchical Clustering: Group Average. Nested Clusters; Dendrogram

Complexity: Time & Space. O(N^2) space since it uses the proximity matrix (N is the number of points). O(N^3) time in many cases: there are N steps, and at each step the proximity matrix of size N^2 must be updated and searched. Complexity can be reduced to O(N^2 log(N)) time for some approaches.
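The four inter-cluster measures named on the slides (MIN, MAX, Group Average, Distance Between Centroids) can be compared side by side. This is a small sketch assuming Euclidean distance between points; the function name `inter_cluster_distances` and the toy clusters are hypothetical.

```python
import numpy as np

def inter_cluster_distances(a, b):
    """Four common ways to measure the distance between clusters a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Distances between every point in a and every point in b.
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return {
        "min": float(pairwise.min()),             # single link
        "max": float(pairwise.max()),             # complete link
        "group_average": float(pairwise.mean()),  # mean over all cross pairs
        "centroid": float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))),
    }

# Made-up toy clusters: two vertical pairs of points, 3 units apart.
cluster_a = [[0, 0], [0, 2]]
cluster_b = [[3, 0], [3, 2]]
distances = inter_cluster_distances(cluster_a, cluster_b)
```

The group average always lies between the MIN and MAX values, since it averages the same set of pairwise distances.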

Microarray Data Analysis: Unsupervised Analysis (clustering); Supervised Analysis; Visualization & Decomposition; Pattern Analysis; Statistical Analysis. Supervised methods: KNN, Decision tree, Neural nets, SVM, LDA, Naïve Bayes, …

Microarray Data Analysis: Unsupervised Analysis (clustering); Supervised Analysis; Visualization & Decomposition; Pattern Analysis; Statistical Analysis. Pattern analysis methods: Apriori Algorithm, FP-Growth Algorithm, CARPENTER, …

Microarray Data Analysis: Unsupervised Analysis (clustering); Supervised Analysis; Visualization & Decomposition; Pattern Analysis; Statistical Analysis. Visualization & decomposition methods: PCA, SVD, Scatter Plot, Gene Pies, …

Next

Finding enriched genes: Seeding with known structure-specific genes: Oligodendrocyte (Mbp, Mobp, Cnp1); Choroid plexus (Col8a2, Lbp, Msx1). Find the genes with similar expression patterns.
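The seeding idea can be sketched as a ranking problem. The slides do not name the similarity measure, so Pearson correlation across voxels is an assumption here; the function name `rank_similar_genes`, the placeholder gene names, and all expression values are made up for illustration.

```python
import numpy as np

def rank_similar_genes(expression, gene_names, seed_gene, top=5):
    """Rank genes by the correlation of their spatial pattern with a seed gene.

    expression -- array of shape (genes, voxels) of expression energy
    gene_names -- list of gene names, one per row of `expression`
    """
    x = np.asarray(expression, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    seed_idx = gene_names.index(seed_gene)
    corr = unit @ unit[seed_idx]   # ASSUMED similarity: Pearson's r
    # Highest correlation first, skipping the seed gene itself.
    order = [i for i in np.argsort(-corr) if i != seed_idx]
    return [(gene_names[i], float(corr[i])) for i in order[:top]]

# Made-up values over five voxels; gene_a to gene_c are placeholders.
names = ["Mbp", "gene_a", "gene_b", "gene_c"]
toy = [
    [0, 1, 2, 3, 4],   # seed pattern
    [1, 3, 5, 7, 9],   # same spatial pattern, different scale
    [4, 3, 2, 1, 0],   # opposite pattern
    [1, 0, 1, 0, 1],   # unrelated pattern
]
ranked = rank_similar_genes(toy, names, "Mbp", top=3)
```

Genes at the top of the ranking are candidates for sharing the seed gene's structure-specific expression; a gene with a constant profile would need to be filtered out first, since its correlation is undefined.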