Object Orie'd Data Analysis, Last Time: Gene Cell Cycle Data; Microarrays and HDLSS Visualization; DWD Bias Adjustment; NCI 60 Data. Today: More NCI 60 Data.

Presentation transcript:

Object Orie'd Data Analysis, Last Time: Gene Cell Cycle Data; Microarrays and HDLSS visualization; DWD bias adjustment; NCI 60 Data. Today: More NCI 60 Data & Detailed (math'cal) look at PCA

Last Time: Checked Data Combo, using DWD Dir'ns

DWD Views of NCI 60 Data
Interesting Question: Which clusters are really there?
Issues:
- DWD is great at finding dir'ns of separation
- And it will do so even when there is no real structure
Is this happening here? Or: which clusters are important? What does "important" mean?

Real Clusters in NCI 60 Data
Simple Visual Approach (a toy sketch follows below):
- Randomly relabel the data (Cancer Types)
- Recompute the DWD dir'ns & visualization
- Get a heuristic impression from this
Deeper Approach: Formal Hypothesis Testing (done later)
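A minimal sketch of this relabelling idea (added illustration, not from the original slides). DWD itself is not implemented here, so a simple mean-difference direction stands in for the DWD dir'n, and the data are hypothetical placeholders for the NCI 60 expression matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical placeholder data: rows = cases, columns = genes.
X = rng.normal(size=(60, 200))
labels = np.repeat([0, 1], 30)   # stand-in for two cancer types

def separating_direction(X, labels):
    """Stand-in for DWD: unit mean-difference direction between classes.
    (DWD solves a second-order cone program; this is only a sketch.)"""
    d = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
    return d / np.linalg.norm(d)

# Gap between class means along the direction from the REAL labels ...
real = X @ separating_direction(X, labels)
print("real labels:", real[labels == 1].mean() - real[labels == 0].mean())

# ... versus random relabellings: a direction-finding method can show
# apparent separation even when no real structure is present.
for i in range(4):
    perm = rng.permutation(labels)
    fake = X @ separating_direction(X, perm)
    print(f"relabelling #{i + 1}:", fake[perm == 1].mean() - fake[perm == 0].mean())
```

Since this X is pure noise, the real labels show about the same spurious separation as the relabellings; with genuine cluster structure, the real-label gap would stand out.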

Random Relabelling #1

Random Relabelling #2

Random Relabelling #3

Random Relabelling #4

Revisit Real Data

Revisit Real Data (Cont.)
Heuristic Results, rating each cancer type as Strong Clust's vs. Weak Clust's vs. Not Clust's: Melanoma, CNS, NSCLC, Leukemia, Ovarian, Breast, Renal, Colon.
Later: will find a way to quantify these ideas, i.e. develop statistical significance.

NCI 60 Controversy: Can NCI 60 Data be normalized?
Negative Indication: Kuo, et al (2002) Bioinformatics, 18, based on Gene by Gene Correlations.
Resolution: Gene by Gene Data View vs. Multivariate Data View

Resolution of Paradox: Toy Data, Gene View

Resolution: Correlations suggest "no chance"

Resolution: Toy Data, PCA View

Resolution: PCA & DWD direct'ns

Resolution: DWD Adjusted

Resolution: DWD Adjusted, PCA view

Resolution: DWD Adjusted, Gene view

Resolution: Correlations & PC1 Projection Correl'n

Needed: final verification of Cross-platform Normal'n. Is statistical power actually improved? Will study later.

DWD: Why does it work?
Rob Tibshirani Query: Do we really need that complicated stuff? (DWD is complex.) Can't we just use means?
Empirical Fact (Joel Parker): DWD is better than simple methods.

DWD: Why does it work?
Xuxin Liu Observation: the key is unbalanced sub-sample sizes (e.g. biological subtypes):
- Mean methods are strongly affected
- DWD is much more robust
Toy Example

DWD: Why does it work?

Xuxin Liu Example Goals:
- Bring colors together
- Keep symbols distinct (interesting biology)
Study varying sub-sample proportions (a numerical sketch follows below):
- Ratio = 1: Both methods great
- Ratio = 0.61: Mean degrades, DWD good
- Ratio = 0.35: Mean poor, DWD still OK
- Ratio = 0.11: DWD degraded, but still better
Later: will find the underlying theory
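A hedged sketch of why mean-based adjustment degrades under unbalanced subtypes (added illustration, not from the original slides; the one-gene data, the shift of 3, and the platform_batch helper are all hypothetical, and the DWD side is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

def platform_batch(n_a, n_b, shift):
    """Hypothetical one-gene toy data: two biological subtypes
    (true means -2 and +2) plus an additive platform/batch shift."""
    a = rng.normal(-2.0, 0.3, n_a) + shift   # subtype A
    b = rng.normal(+2.0, 0.3, n_b) + shift   # subtype B
    return np.concatenate([a, b])

p1 = platform_batch(50, 50, shift=0.0)       # platform 1: balanced subtypes
for n_a in (50, 38, 26, 10):                 # platform 2: increasingly unbalanced
    p2 = platform_batch(n_a, 100 - n_a, shift=3.0)
    # Mean-based adjustment: match the overall platform means.
    adj2 = p2 - p2.mean() + p1.mean()
    # With unbalanced subtypes the platform means differ partly because of
    # biology, so matching them over-corrects the batch shift.
    leftover = abs(adj2[:n_a].mean() - p1[:50].mean())
    print(f"subtype ratio = {n_a / (100 - n_a):.2f}: "
          f"leftover subtype-A offset = {leftover:.2f}")
```

As the subtype ratio drops, the leftover offset grows: the mean method confounds biology with the batch shift, which is the failure mode the slide describes.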

PCA: Rediscovery - Renaming
- Statistics: Principal Component Analysis (PCA)
- Social Sciences: Factor Analysis (PCA is a subset)
- Probability / Electrical Eng: Karhunen-Loève expansion
- Applied Mathematics: Proper Orthogonal Decomposition (POD)
- Geo-Sciences: Empirical Orthogonal Functions (EOF)

An Interesting Historical Note. The 1st (?) application of PCA to Functional Data Analysis: Rao, C. R. (1958) Some statistical methods for comparison of growth curves, Biometrics, 14, 1-17. The 1st paper with the "Curves as Data" viewpoint.

Detailed Look at PCA
Three important (and interesting) viewpoints:
1. Mathematics
2. Numerics
3. Statistics
1st: Review linear algebra and multivariate probability.

Review of Linear Algebra
Vector Space: a set of "vectors" $x$ and "scalars" (coefficients) $a$, "closed" under "linear combination" ($\sum_i a_i x_i$ stays in the space), e.g. $\mathbb{R}^d$ = "$d$-dim Euclid'n space".

Review of Linear Algebra (Cont.)
Subspace: a subset that is again a vector space, i.e. closed under linear combination. E.g. lines through the origin; planes through the origin; the subsp. "generated by" a set of vectors (all linear combos of them = the containing hyperplane through the origin).

Review of Linear Algebra (Cont.)
Basis of a subspace: a set of vectors that:
- span, i.e. everything is a lin. com. of them
- are linearly indep't, i.e. the lin. com. is unique
E.g. the "unit vector basis" $e_1 = (1, 0, \dots, 0)^t, \dots, e_d = (0, \dots, 0, 1)^t$ of $\mathbb{R}^d$, since $x = x_1 e_1 + \cdots + x_d e_d$.

Review of Linear Algebra (Cont.)
Basis Matrix, of a subspace of $\mathbb{R}^d$: Given a basis $v_1, \dots, v_n$, create the matrix of columns: $B = [\, v_1 \; v_2 \; \cdots \; v_n \,]$ ($d \times n$).

Review of Linear Algebra (Cont.)
Then a "linear combo" is a matrix multiplicat'n: $\sum_{i=1}^n a_i v_i = B a$, where $a = (a_1, \dots, a_n)^t$. Check sizes: $(d \times 1) = (d \times n)(n \times 1)$.

Review of Linear Algebra (Cont.)
Aside on matrix multiplication (a linear transformat'n): For matrices $A$ ($m \times n$) and $B$ ($n \times p$), define the "matrix product" $(AB)_{ij} = \sum_{k=1}^n a_{ik} b_{kj}$ ("inner products" of the rows of $A$ with the columns of $B$); this is the composition of the linear transformations. Often useful to check sizes: $(m \times p) = (m \times n)(n \times p)$.
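A quick NumPy check of these last two slides (added illustration, not in the original; the vectors are arbitrary):

```python
import numpy as np

# Basis matrix B with columns v1, v2 in R^3, and coefficients a.
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 2.0, 1.0])
B = np.column_stack([v1, v2])   # 3 x 2
a = np.array([3.0, -1.0])

# Linear combination equals matrix multiplication: a1*v1 + a2*v2 = B a.
assert np.allclose(a[0] * v1 + a[1] * v2, B @ a)

# Size check for a product: (3 x 2)(2 x 4) -> (3 x 4).
C = np.arange(8.0).reshape(2, 4)
print((B @ C).shape)            # (3, 4)
```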

Review of Linear Algebra (Cont.)
Matrix trace: For a square matrix $A$ ($d \times d$), define $\operatorname{tr}(A) = \sum_{i=1}^d a_{ii}$. Trace commutes with matrix multiplication: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
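A one-line numerical check of the commutation property (added illustration; note $AB$ and $BA$ can even have different sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))
B = rng.normal(size=(5, 3))

# tr(AB) = tr(BA), even though AB is 3x3 and BA is 5x5.
print(np.trace(A @ B), np.trace(B @ A))   # equal up to rounding
```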

Review of Linear Algebra (Cont.)
Dimension of a subspace (a notion of "size"): the number of elements in a basis (unique). E.g. $\dim(\mathbb{R}^d) = d$ (use the unit vector basis above); the dim of a line is 1; the dim of a plane is 2. Dimension is "degrees of freedom".

Review of Linear Algebra (Cont.)
Norm of a vector: in $\mathbb{R}^d$, $\|x\| = \left( \sum_{i=1}^d x_i^2 \right)^{1/2} = (x^t x)^{1/2}$. Idea: the "length" of the vector. Note: strange properties for high $d$, e.g. the "length of the diagonal of the unit cube" is $\sqrt{d}$.

Review of Linear Algebra (Cont.)
Norm of a vector (cont.): the "length normalized vector" $\frac{x}{\|x\|}$ (has length one, thus lies on the surf. of the unit sphere & is a direction vector). Get "distance" as: $d(x, y) = \|x - y\|$.
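A small numerical illustration of these facts (added, not in the original slides):

```python
import numpy as np

# "Length of the diagonal of the unit cube" grows like sqrt(d).
for d in (2, 100, 10_000):
    print(d, np.linalg.norm(np.ones(d)))   # sqrt(2), 10.0, 100.0

x = np.array([3.0, 4.0])
unit = x / np.linalg.norm(x)               # direction vector
print(np.linalg.norm(unit))                # 1.0
print(np.linalg.norm(x - np.array([0.0, 8.0])))  # distance from (0, 8): 5.0
```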

Review of Linear Algebra (Cont.)
Inner (dot, scalar) product: for vectors $x$ and $y$, $\langle x, y \rangle = \sum_{i=1}^d x_i y_i = x^t y$; related to the norm via $\|x\| = \langle x, x \rangle^{1/2}$.

Review of Linear Algebra (Cont.)
Inner (dot, scalar) product (cont.): measures the "angle between $x$ and $y$" as $\theta = \cos^{-1}\!\left( \frac{\langle x, y \rangle}{\|x\| \, \|y\|} \right)$; key to "orthogonality", i.e. "perpendicul'ty": $x \perp y$ if and only if $\langle x, y \rangle = 0$.
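Angle and orthogonality in NumPy terms (added illustration; the vectors are arbitrary):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))            # 45.0

# Orthogonality: zero inner product.
print(np.isclose(x @ np.array([0.0, 7.0]), 0.0))   # True
```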

Review of Linear Algebra (Cont.)
Orthonormal basis $\{v_1, \dots, v_n\}$:
- All ortho to each other, i.e. $\langle v_i, v_j \rangle = 0$ for $i \neq j$
- All have length 1, i.e. $\|v_i\| = 1$ for $i = 1, \dots, n$

Review of Linear Algebra (Cont.)
Orthonormal basis (cont.): the "Spectral Representation": $x = \sum_{i=1}^n a_i v_i$, where $a_i = \langle x, v_i \rangle$. Check: $\langle x, v_j \rangle = \sum_i a_i \langle v_i, v_j \rangle = a_j$. Matrix notation: $x = B a$, where $a = B^t x$; $a$ is called the "transform (e.g. Fourier, wavelet) of $x$".

Review of Linear Algebra (Cont.)
Parseval identity, for $x$ in the subsp. gen'd by the o. n. basis: $\|x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 = \sum_{i=1}^n a_i^2 = \|a\|^2$.
- Pythagorean theorem
- "Decomposition of Energy"
- ANOVA sums of squares
- The transform $a$ has the same length as $x$, i.e. is a "rotation in $\mathbb{R}^d$"
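Spectral representation and Parseval, numerically (added illustration; the 30-degree rotation basis is arbitrary):

```python
import numpy as np

# An orthonormal basis of R^2: rotation by 30 degrees.
t = np.radians(30)
B = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

x = np.array([2.0, -1.0])
a = B.T @ x                      # transform: coefficients a_i = <x, v_i>

assert np.allclose(B @ a, x)     # spectral representation reconstructs x
print(np.linalg.norm(x), np.linalg.norm(a))   # Parseval: equal lengths
```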

Review of Linear Algebra (Cont.)
Gram-Schmidt Ortho-normalization. Idea: Given a basis $v_1, \dots, v_n$, find an orthonormal version $u_1, \dots, u_n$ by subtracting the non-ortho part of each vector, $u_k = v_k - \sum_{j < k} \langle v_k, u_j \rangle u_j$, then normalizing (a sketch follows below).
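A minimal sketch of the Gram-Schmidt procedure (added illustration; assumes linearly independent columns and ignores numerical-stability refinements such as the modified variant):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    U = []
    for v in V.T:
        # Subtract the non-ortho part: projections onto the earlier u's.
        w = v - sum((v @ u) * u for u in U)
        U.append(w / np.linalg.norm(w))
    return np.column_stack(U)

V = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
U = gram_schmidt(V)
print(np.round(U.T @ U, 10))   # identity: columns are orthonormal
```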

Review of Linear Algebra (Cont.)
Projection of a vector $x$ onto a subspace $V$. Idea: the member of $V$ that is closest to $x$ (i.e. the best "approx'n"). Find $P_V x$ that solves $\min_{v \in V} \|x - v\|$ ("least squares"). For an inner product (Hilbert) space: the solution exists and is unique.

Review of Linear Algebra (Cont.)
Projection of a vector onto a subspace (cont.): General solution in $\mathbb{R}^d$: for basis matrix $B$, $P_V x = B (B^t B)^{-1} B^t x$. So the "proj'n operator" $P_V = B (B^t B)^{-1} B^t$ is a "matrix mult'n" (thus projection is another linear operation). (Note: the same operation underlies least squares.)
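The projection formula and its least squares connection, numerically (added illustration; B and x are arbitrary):

```python
import numpy as np

# Basis matrix of a 2-dim subspace of R^3.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
P = B @ np.linalg.inv(B.T @ B) @ B.T    # projection operator

x = np.array([1.0, 2.0, 4.0])
px = P @ x

# Same answer as least squares: fit x ~ B a, then reconstruct B a.
a, *_ = np.linalg.lstsq(B, x, rcond=None)
assert np.allclose(px, B @ a)

# The residual is orthogonal to the subspace.
print(np.round(B.T @ (x - px), 10))     # [0. 0.]
```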

Review of Linear Algebra (Cont.)
Projection using an orthonormal basis: the basis matrix is "orthonormal", $B^t B = I$. So $P_V x = B B^t x = \sum_{i=1}^n \langle x, v_i \rangle v_i$ = Recon(Coeffs of $x$ "in dir'n $v_i$").

Review of Linear Algebra (Cont.)
Projection using an orthonormal basis (cont.): For the "orthogonal complement" $V^\perp$: $x = P_V x + P_{V^\perp} x$, and the Parseval inequality: $\|P_V x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 \leq \|x\|^2$.

Review of Linear Algebra (Cont.)
(Real) Unitary Matrices: $U$ with $U^t U = I$, i.e. an orthonormal basis matrix (so all of the above applies). It follows that $U^{-1} = U^t$ (since the columns have full rank, $U^{-1}$ exists …). The lin. trans. (mult. by $U$) is like a "rotation" of $\mathbb{R}^d$, but it also includes "mirror images".

Review of Linear Algebra (Cont.)
Singular Value Decomposition (SVD): For a $d \times n$ matrix $X$: find a diagonal $d \times n$ matrix $S$, with entries $s_1, s_2, \dots \geq 0$ called singular values, and unitary (rotation) matrices $U$ ($d \times d$) and $V$ ($n \times n$) (recall $U^t U = I$, $V^t V = I$), so that $X = U S V^t$.

Review of Linear Algebra (Cont.)
Intuition behind the Singular Value Decomposition: For the "linear transf'n" $x \mapsto X x$ (via matrix multi'n):
- First rotate (by $V^t$)
- Second rescale the coordinate axes (by $S$)
- Third rotate again (by $U$)
i.e. the transformation has been diagonalized.

Review of Linear Algebra (Cont.)
SVD Compact Representation. Useful labeling: singular values in decreasing order, $s_1 \geq s_2 \geq \cdots \geq 0$. Note: singular values $= 0$ can be omitted. Let $r$ = # of positive singular values. Then: $X = U_r S_r V_r^t$, where $U_r$ ($d \times r$), $S_r$ ($r \times r$), $V_r$ ($n \times r$) are truncations of $U$, $S$, $V$ (a numerical sketch follows below).
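The compact SVD in NumPy (added illustration; the rank-3 test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 8))   # 5 x 8, rank 3

U, s, Vt = np.linalg.svd(X)      # np.linalg.svd returns s in decreasing order
r = int(np.sum(s > 1e-10))       # number of positive singular values
print(r)                         # 3

# Compact representation: keep only the r positive singular values.
Xr = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
assert np.allclose(X, Xr)
```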

Review of Linear Algebra (Cont.)
Eigenvalue Decomposition: For a (symmetric) square $d \times d$ matrix $A$: find a diagonal matrix $D = \operatorname{diag}(\lambda_1, \dots, \lambda_d)$ and an orthonormal matrix $B$ (i.e. $B^t B = I = B B^t$), so that $A B = B D$, i.e. $A = B D B^t$.

Review of Linear Algebra (Cont.)
Eigenvalue Decomposition (cont.): Relation to the Singular Value Decomposition (looks similar?):
- The eigenvalue decomposition is "harder", since it needs a single basis ($U = V$)
- The price is that the eigenvalue decomp'n is generally complex valued
- Except for $A$ square and symmetric: then the eigenvalue decomp. is real valued
- And then it essentially is the sing'r value decomp., with $B = U = V$ and $D = S$ (up to the signs of the eigenvalues)
(A numerical check follows below.)
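A numerical check of the symmetric case (added illustration; the random symmetric matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M + M.T                      # symmetric, so real eigenvalues

evals, B = np.linalg.eigh(A)     # eigh is specialized for symmetric matrices
assert np.allclose(A, B @ np.diag(evals) @ B.T)   # A = B D B^t

# The singular values of a symmetric matrix are |eigenvalues|.
s = np.linalg.svd(A, compute_uv=False)
print(np.sort(np.abs(evals))[::-1])   # matches s
print(s)
```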