Statistical Smoothing


Presentation transcript:

Statistical Smoothing In 1 Dimension, 2 Major Settings: Density Estimation (“Histograms”) and Nonparametric Regression (“Scatterplot Smoothing”)

Kernel Density Estimation Chondrite Data: Sum pieces to estimate density Suggests 3 modes (rock sources)
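The phrase "sum pieces" can be made concrete with a small sketch: each observation contributes one kernel "piece", and the density estimate is the average of those pieces. This is only an illustration in plain Python, assuming a Gaussian kernel and a hand-picked bandwidth `h`; the function name `kde` and the toy data are hypothetical (they are not the chondrite measurements, and this is not the code behind the slide).

```python
import math

def kde(x, data, h):
    """Gaussian kernel density estimate at the point x with bandwidth h."""
    # One Gaussian "piece" per observation, centered at that observation.
    pieces = [
        math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * math.sqrt(2 * math.pi))
        for xi in data
    ]
    # Averaging the pieces gives a curve that integrates to 1.
    return sum(pieces) / len(data)

# Toy data, made up for illustration only.
toy = [20.8, 21.9, 22.2, 26.7, 27.3, 27.9, 28.4, 33.2, 33.8, 34.5]
grid = [18 + 0.1 * k for k in range(200)]      # evaluation grid, 18.0 to 37.9
density = [kde(x, toy, h=1.0) for x in grid]   # estimated density on the grid
```

A smaller `h` makes each piece narrower and the estimate bumpier; a larger `h` merges nearby bumps, which is why the number of visible modes depends on the bandwidth.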

Scatterplot Smoothing E.g. Bralower Fossil Data – some smooths
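As a rough sketch of what a "smooth" of a scatterplot can be, the example below uses a kernel-weighted local average (the Nadaraya-Watson form). That choice is an assumption made for brevity: the smooths shown on the slide may use a different method, such as local linear fitting. The function name `kernel_smooth`, the bandwidths, and the synthetic data are all hypothetical.

```python
import math

def kernel_smooth(x0, xs, ys, h):
    """Kernel-weighted local average of ys at location x0, bandwidth h."""
    # Points near x0 get weight close to 1, far points get weight close to 0.
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Synthetic data: a sine curve plus an alternating +/- 0.2 disturbance.
xs = [0.1 * k for k in range(50)]
ys = [math.sin(x) + (0.2 if k % 2 else -0.2) for k, x in enumerate(xs)]

rough = [kernel_smooth(x, xs, ys, h=0.05) for x in xs]   # undersmoothed
smooth = [kernel_smooth(x, xs, ys, h=0.3) for x in xs]   # smoother curve
```

With the small bandwidth the fit follows the disturbance; with the larger one it tracks the underlying sine curve more closely.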

SiZer Background Fun Scale Space Views (Incomes Data) Surface View

SiZer Background SiZer Visual presentation: Color map over scale space: Blue: slope significantly upwards (derivative confidence interval, CI, above 0) Red: slope significantly downwards (derivative CI below 0) Purple: slope insignificant (derivative CI contains 0)
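The color rule can be sketched for a single location `x` and bandwidth `h` as follows. This is a simplified illustration, not the sizerSM.m implementation: it assumes a Gaussian kernel and a pointwise normal interval with `z = 1.96`, whereas SiZer proper adjusts the interval for simultaneous inference across locations and treats regions with too little data separately. The name `sizer_color` and the test data are hypothetical.

```python
import math
from statistics import NormalDist

def sizer_color(x, data, h, z=1.96):
    """Color for the slope of the smoothed density at location x, scale h."""
    n = len(data)
    # Derivative (in x) of each Gaussian kernel piece centered at xi.
    d = [
        -(x - xi) / h ** 3 * math.exp(-0.5 * ((x - xi) / h) ** 2)
        / math.sqrt(2 * math.pi)
        for xi in data
    ]
    slope = sum(d) / n                                   # estimated derivative
    var = sum((di - slope) ** 2 for di in d) / (n - 1)   # sample variance
    half_width = z * math.sqrt(var / n)                  # pointwise interval
    if slope - half_width > 0:
        return "blue"      # interval entirely above 0
    if slope + half_width < 0:
        return "red"       # interval entirely below 0
    return "purple"        # interval contains 0

# Evenly spaced normal quantiles stand in for a unimodal sample.
data = [NormalDist().inv_cdf((k + 0.5) / 400) for k in range(400)]
colors = {x: sizer_color(x, data, h=0.5) for x in (-1.0, 0.0, 1.0)}
```

For this unimodal example the left flank comes out blue, the right flank red, and the peak purple; repeating the calculation over a grid of `x` and `h` values gives the full color map.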

SiZer Background E.g. - Marron & Wand (1992) Trimodal #9 Increasing n Only 1 Signif’t Mode Now 2 Signif’t Modes Finally all 3 Modes

SiZer Background Extension to 2-d: Significance in Scale Space Main Challenge: Visualization References: Godtliebsen et al (2002, 2004, 2006)

SiZer Overview Would you like to try smoothing & SiZer? Marron Software Website as Before In “Smoothing” Directory: kdeSM.m nprSM.m sizerSM.m Recall: “>> help sizerSM” for usage

PCA to find clusters Return to PCA of Mass Flux Data: MassFlux1d1p1.ps

PCA to find clusters SiZer analysis of Mass Flux, PC1 MassFlux1d1p1s.ps

Clustering Idea: Given data $X_1, \dots, X_n$ Assign each object to a class Of similar objects Completely data driven I.e. assign labels to data “Unsupervised Learning” Contrast to Classification (Discrimination) With predetermined given classes “Supervised Learning”

Clustering Important References: MacQueen (1967) Hartigan (1975) Gersho and Gray (1992) Kaufman and Rousseeuw (2005) See Also: Wikipedia

K-means Clustering Main Idea: for data $X_1, \dots, X_n$ Partition indices $i = 1, \dots, n$ among classes $C_1, \dots, C_K$ Given index sets $C_1, \dots, C_K$ that partition $\{1, \dots, n\}$ represent clusters by “class means” i.e., $\bar{X}_j = \frac{1}{\# C_j} \sum_{i \in C_j} X_i$ (within class means)

K-means Clustering Given index sets $C_1, \dots, C_K$ Measure how well clustered, using Within Class Sum of Squares

K-means Clustering Common Variation: Put on scale of proportions (i.e. in [0,1]) By dividing “within class SS” by “overall SS” Gives Cluster Index:
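The formulas on these two slides did not survive as text. A reconstruction consistent with the verbal description (within class SS divided by overall SS) and with the class means defined earlier would be the following; the norm notation and the symbol $\bar{X}$ for the overall mean are assumptions.

```latex
\mathrm{CI}(C_1, \dots, C_K)
  = \frac{\sum_{j=1}^{K} \sum_{i \in C_j} \lVert X_i - \bar{X}_j \rVert^2}
         {\sum_{i=1}^{n} \lVert X_i - \bar{X} \rVert^2},
\qquad
\bar{X}_j = \frac{1}{\# C_j} \sum_{i \in C_j} X_i,
\quad
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .
```

The numerator is the Within Class Sum of Squares and the denominator is the overall sum of squares about the grand mean, so the ratio lies in $[0, 1]$.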

K-means Clustering Notes on Cluster Index: CI = 0 when all data at cluster means CI small when $C_1, \dots, C_K$ gives tight clusters (within SS contains little variation) CI big when $C_1, \dots, C_K$ gives poor clustering (within SS contains most of variation) CI = 1 when all cluster means are same
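These notes can be checked numerically with a small sketch. It assumes the Cluster Index is the within class sum of squares divided by the overall sum of squares about the grand mean, as described in words above; the function name `cluster_index` and the four toy points are hypothetical.

```python
def cluster_index(points, labels):
    """Within-class sum of squares divided by overall sum of squares."""
    def mean(rows):
        # Coordinate-wise mean of a list of equal-length tuples.
        return [sum(col) / len(rows) for col in zip(*rows)]

    def ss(rows, center):
        # Sum of squared Euclidean distances from each row to the center.
        return sum(sum((v - c) ** 2 for v, c in zip(row, center)) for row in rows)

    total = ss(points, mean(points))
    within = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        within += ss(members, mean(members))
    return within / total

# Two tight groups far apart in the first coordinate.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
tight = cluster_index(pts, [0, 0, 1, 1])   # labels follow the groups
poor = cluster_index(pts, [0, 1, 0, 1])    # labels cut across the groups
```

Here `tight` is 1/101 (close to 0) and `poor` is 100/101 (close to 1), matching the "CI small" and "CI big" cases.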

K-means Clustering Clustering Goal: Given data $X_1, \dots, X_n$ Choose classes $C_1, \dots, C_K$ To minimize CI

2-means Clustering Study CI, using simple 1-d examples Varying Standard Deviation


2-means Clustering Study CI, using simple 1-d examples Varying Standard Deviation Varying Mean


2-means Clustering Study CI, using simple 1-d examples Varying Standard Deviation Varying Mean Varying Proportion


2-means Clustering Study CI, using simple 1-d examples Over changing Classes (moving b’dry)


2-means Clustering C. Index for Clustering Greens & Blues


2-means Clustering Curve Shows CI for Many Reasonable Clusterings
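For 1-d data with K = 2, one way to produce such a curve is to sort the data and compute CI for every left/right split as the boundary moves. The sketch below assumes that reading of the slide; the function name `ci_curve` and the six data values are hypothetical.

```python
def ci_curve(data):
    """CI for every split of sorted 1-d data into a left and a right class."""
    xs = sorted(data)

    def ss(vals):
        # Sum of squared deviations from the mean of vals.
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    total = ss(xs)
    # Split k puts the k smallest points in one class and the rest in the other.
    return [(k, (ss(xs[:k]) + ss(xs[k:])) / total) for k in range(1, len(xs))]

data = [0.0, 1.0, 10.0, 11.0, 25.0, 26.0]
curve = ci_curve(data)
best_k, best_ci = min(curve, key=lambda pair: pair[1])
```

For these values the smallest CI occurs at `best_k == 4`, i.e. classes {0, 1, 10, 11} and {25, 26}, with CI of about 0.16.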

2-means Clustering Study CI, using simple 1-d examples Over changing Classes (moving b’dry) Multi-modal data → interesting effects Multiple local minima (large number) Maybe disconnected Optimization (over $C_1, \dots, C_K$) can be tricky… (even in 1 dimension, with $K = 2$)


2-means Clustering Study CI, using simple 1-d examples Over changing Classes (moving b’dry) Multi-modal data → interesting effects Can have 4 (or more) local mins (even in 1 dimension, with K = 2)


2-means Clustering Study CI, using simple 1-d examples Over changing Classes (moving b’dry) Multi-modal data → interesting effects Local mins can be hard to find i.e. iterative procedures can “get stuck” (even in 1 dimension, with K = 2)
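A small sketch of how an iterative procedure can "get stuck": the code alternates between assigning points to the nearer center and recomputing the class means (a Lloyd-style iteration, chosen here as an assumption since the slides do not name a specific algorithm). The function names, data, and starting centers are hypothetical.

```python
def ss(vals):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def two_means(data, c1, c2, max_iter=100):
    """Alternate assignment and mean update for K = 2 in 1-d."""
    for _ in range(max_iter):
        left = [x for x in data if abs(x - c1) <= abs(x - c2)]
        right = [x for x in data if abs(x - c1) > abs(x - c2)]
        if not left or not right:
            break                      # a class emptied out; stop here
        new1, new2 = sum(left) / len(left), sum(right) / len(right)
        if (new1, new2) == (c1, c2):
            break                      # centers unchanged: a fixed point
        c1, c2 = new1, new2
    return left, right

def ci(left, right):
    """Cluster Index of a two-class partition of 1-d data."""
    return (ss(left) + ss(right)) / ss(left + right)

data = [0.0, 1.0, 10.0, 11.0, 25.0, 26.0]
stuck = two_means(data, c1=0.0, c2=15.0)    # stops at {0, 1} vs the rest
better = two_means(data, c1=5.0, c2=25.0)   # stops at {0, 1, 10, 11} vs {25, 26}
```

Both runs stop because nothing changes any more, yet the first ends with CI of about 0.36 and the second with about 0.16. In practice this is one reason to try several starting values.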

2-means Clustering Study CI, using simple 1-d examples Effect of a single outlier?


2-means Clustering Study CI, using simple 1-d examples Effect of a single outlier? Can create local minimum Can also yield a global minimum This gives a one point class Can make CI arbitrarily small (really a “good clustering”???)
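The "arbitrarily small" claim can be seen in a short sketch: put the outlier in its own class and let it move away from the rest of the data. The function name `ci_one_point_class` and the numbers are hypothetical.

```python
def ci_one_point_class(cluster, outlier):
    """CI when the outlier is its own class and the other points form the second."""
    def ss(vals):
        # Sum of squared deviations from the mean of vals.
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    # The one-point class contributes 0 to the within-class sum of squares.
    return ss(cluster) / ss(cluster + [outlier])

cluster = [k / 10 for k in range(-10, 11)]     # 21 points spread over [-1, 1]
values = [ci_one_point_class(cluster, d) for d in (5.0, 50.0, 500.0)]
```

The within class SS stays fixed while the overall SS grows roughly like the squared distance to the outlier, so CI shrinks toward 0 even though the "cluster" found is a single point.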

SWISS Score Another Application of CI (Cluster Index) Cabanski et al (2010) Idea: Use CI in bioinformatics to “measure quality of data preprocessing” Philosophy: Clusters Are Scientific Goal So Want to Accentuate Them

SWISS Score Toy Examples (2-d): Which are “More Clustered?”


SWISS Score SWISS = Standardized Within class Sum of Squares


SWISS Score Nice Graphical Introduction:


SWISS Score Revisit Toy Examples (2-d): Which are “More Clustered?”


SWISS Score Now Consider K > 2: How well does it work?

SWISS Score K = 3, Toy Example:


SWISS Score K = 3, Toy Example: But C Seems “Better Clustered”


SWISS Score K-Class SWISS: Instead of using K-Class CI Use Average of Pairwise SWISS Scores (Preserves [0,1] Range)
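A sketch of the averaging step, under the assumption that each pairwise SWISS score is simply the two-class CI (within class SS over total SS) computed on the data from just those two classes, with the class labels given. Cabanski et al (2010) should be consulted for the exact definition; the function names and the 1-d toy data are hypothetical.

```python
from itertools import combinations

def swiss(values, labels):
    """Within-class SS over total SS for 1-d values with given class labels."""
    def ss(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    within = sum(
        ss([v for v, l in zip(values, labels) if l == lab]) for lab in set(labels)
    )
    return within / ss(values)

def average_pairwise_swiss(values, labels):
    """Average the two-class SWISS score over all pairs of classes."""
    scores = []
    for a, b in combinations(sorted(set(labels)), 2):
        # Keep only the observations belonging to classes a and b.
        keep = [(v, l) for v, l in zip(values, labels) if l in (a, b)]
        scores.append(swiss([v for v, _ in keep], [l for _, l in keep]))
    return sum(scores) / len(scores)

values = [0.0, 1.0, 10.0, 11.0, 25.0, 26.0]
labels = [0, 0, 1, 1, 2, 2]
avg = average_pairwise_swiss(values, labels)
```

Because every pairwise score is a ratio in [0, 1], their average stays in [0, 1] as well.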

SWISS Score Avg. Pairwise SWISS – Toy Examples

SWISS Score Additional Feature: ∃ Hypothesis Tests: H1: SWISS1 < 1 H1: SWISS1 < SWISS2 Permutation Based See Cabanski et al (2010)
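To give a feel for what "permutation based" means for the first alternative (SWISS < 1), here is a generic label-permutation sketch. It is an assumption-laden illustration, not the procedure of Cabanski et al (2010): class labels are shuffled at random, SWISS is recomputed each time, and the p-value is the share of shuffles with a score at least as small as the observed one. The function names, the number of permutations, and the data are hypothetical.

```python
import random

def swiss(values, labels):
    """Within-class SS over total SS for 1-d values with given class labels."""
    def ss(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    within = sum(
        ss([v for v, l in zip(values, labels) if l == lab]) for lab in set(labels)
    )
    return within / ss(values)

def permutation_pvalue(values, labels, n_perm=999, seed=0):
    """Share of label shuffles whose SWISS is at most the observed SWISS."""
    rng = random.Random(seed)          # fixed seed for repeatable results
    observed = swiss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if swiss(values, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # count the observed labeling itself

values = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
p_separated = permutation_pvalue(values, [0] * 5 + [1] * 5)   # labels match groups
p_mixed = permutation_pvalue(values, [0, 1] * 5)              # labels ignore groups
```

Labels that match the two groups give a small p-value, while labels that ignore them give a large one.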

Clustering A Very Large Area K-Means is Only One Approach Has its Drawbacks (Many Toy Examples of This) ∃ Many Other Approaches Important (And Broad) Class: Hierarchical Clustering

Hierarchical Clustering Idea: Consider Either: Bottom Up Aggregation: One by One Combine Data Top Down Splitting: All Data in One Cluster & Split Through Entire Data Set, to get Dendrogram

Hierarchical Clustering Aggregate or Split, to get Dendrogram Thanks to US EPA: water.epa.gov

Hierarchical Clustering Aggregate or Split, to get Dendrogram Aggregate: Start With Individuals, Move Up, End Up With All in 1 Cluster

Hierarchical Clustering Aggregate or Split, to get Dendrogram Split: Start With All in 1 Cluster, Move Down, End With Individuals

Hierarchical Clustering Vertical Axis in Dendrogram: Based on: Distance Metric (∃ many, e.g. L2, L1, L∞, …) Linkage Function (Reflects Clustering Type) A Lot of “Art” Involved
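The roles of the distance metric and the linkage function can be sketched with a naive bottom-up procedure on 1-d values. The distance is the absolute difference (in 1-d, L1 and L2 agree), and the linkage is passed in as a function: `min` gives single linkage, `max` gives complete linkage. The function name `agglomerate` and the data are hypothetical, and this brute-force version is for illustration only.

```python
def agglomerate(values, linkage=min):
    """Bottom-up clustering of 1-d values; returns (height, merged cluster) pairs."""
    clusters = [[v] for v in values]       # start with individuals
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # The linkage summarizes all pairwise distances between two clusters.
                d = linkage(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((d, sorted(merged)))  # d is the height of this merge
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges                           # ends with all data in one cluster

values = [0.0, 1.0, 10.0, 11.0, 25.0]
single = [height for height, _ in agglomerate(values, linkage=min)]
complete = [height for height, _ in agglomerate(values, linkage=max)]
```

The merge heights are what a dendrogram draws on its vertical axis. Here single linkage gives heights 1, 1, 9, 14 and complete linkage gives 1, 1, 11, 25, so the same data and metric produce different trees under different linkages.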

Hierarchical Clustering Dendrogram Interpretation Branch Length Reflects Cluster Strength

Participant Presentations: Xiao Yang, Analysis of Climate Data; Huijun Qian, Quantitative Analysis of Non-muscle Myosin II Minifilaments