Integrative Analysis of multiple large-scale molecular biological data


Integrative Analysis of multiple large-scale molecular biological data
Sri Priya Ponnapalli
Genomic Signal Processing Laboratory
The University of Texas at Austin

Project Objectives
Specimen under analysis: the National Cancer Institute's 60 cell lines (NCI60).
Dataset #1: RNA expression profiles [Ross et al., 2000]
Dataset #2: Proteomic profiles [Nishizuka et al., 2003]
Dataset #3: Drug activity levels [Scherf et al., 2000]
Perceive relationships between the three datasets, each containing a different attribute of the NCI60: genome-scale expression, sensitivities to more than 70,000 chemical compounds and chemotherapeutics, and proteomic profiles.
CHIEF OBJECTIVE: DEVELOP A METHOD TO ANALYSE THE RELATIONSHIPS BETWEEN MULTIPLE DATASETS.

Initial Analysis: SVD
All three datasets were processed using singular value decomposition (SVD) [Alter et al., 2000]. The results look interesting but, as the plot shows, they are difficult to interpret on their own, let alone to integrate across all three datasets.
[Figure: plot of the first 5 sorted eigengenes; axis labels: tumor samples, eigengenes.]
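The SVD step can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes each dataset is a matrix with one row per gene (or protein, or compound) and one column per cell line, and it uses a random toy matrix in place of the NCI60 data. The function name `eigengenes` is hypothetical.

```python
import numpy as np

def eigengenes(data, k=5):
    """Return the first k eigengenes and the fraction of overall signal each captures.

    Assumption: `data` has one row per gene and one column per sample.
    """
    # Thin SVD: data = u @ diag(s) @ vt, singular values sorted in decreasing order.
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    # Rows of vt are patterns across samples (the eigengenes).
    fraction = s**2 / np.sum(s**2)
    return vt[:k], fraction[:k]

# Toy stand-in for one dataset: 200 rows measured across 60 cell lines.
rng = np.random.default_rng(0)
toy = rng.standard_normal((200, 60))
top, frac = eigengenes(toy)
print(top.shape)
print(frac.round(3))
```

Each row of `top` is one eigengene, that is, one pattern across the 60 samples; plotting the rows against the sample index would give a figure of the kind described on this slide.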

Analysis using GSVD
Every pair of datasets was then processed using generalized singular value decomposition (GSVD):
Dataset1 = U1 E1 X
Dataset2 = U2 E2 X
If a dataset is thought of as a line, the GSVD of two datasets represents the point of intersection of the two lines; that is, it highlights the similarities and dissimilarities between the two datasets. This simple fact suggests a method for studying the similarities and differences between multiple datasets.
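A pairwise GSVD in the notation of this slide (U1, E1, U2, E2 and a shared X) can be sketched with NumPy alone. This is one standard construction (a QR factorization of the stacked matrices followed by an SVD of the top block), not necessarily the routine used in the study. It assumes that both datasets have the same number of columns, that each has at least as many rows as columns, that the stacked matrix has full column rank, and that no entry of E2 is exactly zero. The function name `gsvd_pair` is hypothetical.

```python
import numpy as np

def gsvd_pair(d1, d2):
    """Factor d1 = u1 @ diag(e1) @ x and d2 = u2 @ diag(e2) @ x with a shared x.

    Assumptions: same number of columns, rows >= columns in each matrix,
    full column rank when stacked, and no zero entries in e2.
    """
    m1 = d1.shape[0]
    # Reduced QR of the stacked data; q has orthonormal columns.
    q, r = np.linalg.qr(np.vstack([d1, d2]))
    q1, q2 = q[:m1], q[m1:]
    # SVD of the top block gives u1 and the first set of values.
    u1, e1, wt = np.linalg.svd(q1, full_matrices=False)
    # The columns of q2 @ w are mutually orthogonal; their norms give e2.
    q2w = q2 @ wt.T
    e2 = np.linalg.norm(q2w, axis=0)
    u2 = q2w / e2
    # Shared right-hand factor.
    x = wt @ r
    return u1, e1, u2, e2, x

# Toy stand-ins for two datasets that share the same 6 samples (columns).
rng = np.random.default_rng(1)
a = rng.standard_normal((30, 6))
b = rng.standard_normal((25, 6))
u1, e1, u2, e2, x = gsvd_pair(a, b)
print(np.allclose(u1 @ np.diag(e1) @ x, a), np.allclose(u2 @ np.diag(e2) @ x, b))
```

In this construction e1**2 + e2**2 equals 1 for every component, so the ratio e1/e2 indicates whether a pattern in X is more significant in the first dataset, in the second, or about equally in both.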

Consider the case of finding similarities and dissimilarities among 3 datasets, taken in pairs (this can be extended to more datasets). These 3 datasets may be thought of as representing 3 lines. Any two non-parallel lines intersect at a point. Three non-parallel lines form a triangle (unless they all pass through a common point, in which case all three vertices of the triangle converge to that point). The goal is to express the three datasets in the form
Dataset1 = U1 E1 X
Dataset2 = U2 E2 X
Dataset3 = U3 E3 X

If we compute the GSVD of every pair of datasets (that is, find the point of intersection of every pair of lines), we get three matrices, each corresponding to a vertex of a triangle. We want a matrix that best approximates these three matrices, i.e. a point that is closest to all three vertices simultaneously, in the sense of minimizing the sum of squared distances to them. This point is the centroid of the triangle. Given the coordinates of the vertices, the centroid is easily computed. All of these results have to be interpreted in terms of matrices. This is easily done by measuring the distance between two matrices with the Frobenius distance, i.e. the Frobenius norm of their difference.
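The centroid idea translates directly to matrices: under the Frobenius distance, the entry-wise mean of the three matrices minimizes the sum of squared distances to them. The sketch below illustrates only this geometric step. The three random matrices stand in for the three pairwise GSVD results, and the function names are hypothetical. It also assumes the three matrices are already comparable (same shape, same ordering and sign of rows), which the slides do not discuss.

```python
import numpy as np

def frobenius_distance(a, b):
    """Distance between two matrices: Frobenius norm of their difference."""
    return np.linalg.norm(a - b, ord="fro")

def centroid(matrices):
    """Entry-wise mean of a list of equally shaped matrices."""
    return np.mean(np.stack(matrices), axis=0)

def sum_squared_distances(point, matrices):
    """Sum of squared Frobenius distances from `point` to each matrix."""
    return sum(frobenius_distance(point, m) ** 2 for m in matrices)

# Three toy "vertices" standing in for the three pairwise results.
rng = np.random.default_rng(2)
vertices = [rng.standard_normal((4, 4)) for _ in range(3)]
centre = centroid(vertices)

# Any other candidate point does worse than the centroid.
candidate = centre + 0.1 * rng.standard_normal((4, 4))
print(sum_squared_distances(centre, vertices) < sum_squared_distances(candidate, vertices))
```

Moving away from the centroid by a matrix `delta` increases the sum of squared distances by exactly three times the squared Frobenius norm of `delta`, which is why the centroid is the unique minimizer.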

This method is an approximation, but it is the best possible one in the sense that it minimizes the error between each original dataset and the dataset reconstructed as the product of the three matrices. It has been tried on the three datasets under study, and the results look promising. Please read the paper for further details.