Michael A. Burr, Eynat Rafalin, and Diane L. Souvaine

Simplicial Depth: An Improved Definition, Analysis, and Efficiency in the Finite Sample Case
Tufts University, www.cs.tufts.edu/research/geometry
CCCG 2004
NSF grant #EIA-99-96237

Introduction
- Introduction to Data Depth
  - Why?
  - Examples
  - Desirable Properties
- Simplicial Depth
  - Definition
  - Properties
  - Problems
- Revised Definition
- Ongoing Work

What is Data Depth and Why?
Data depth measures how deep (central) a given point is relative to a distribution or a data cloud. It captures the shape of the data and can be thought of as a measure of how well a point characterizes a data set. It provides an alternative to classical statistical analysis: it makes no assumption about the underlying distribution of the data, and it handles outliers. Why study it? Many depth measures are geometric in nature, and depth can be computationally expensive to compute.

Examples
- Half-space (Tukey, location) depth (Tukey 75)
- Regression depth (Rousseeuw and Hubert 99)
- Simplicial depth (Liu 90)
- ... and many more.
[Figure: half-planes containing 2 and 3 data points, respectively.]
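A minimal brute-force sketch of half-space (Tukey) depth in the plane may help make the figure concrete. It assumes the standard definition: the depth of p is the smallest number of data points in a closed half-plane whose boundary passes through p. The names `cross` and `halfspace_depth` are hypothetical, integer coordinates are assumed so that every orientation test is exact, and the O(n^2) enumeration is only an illustration, not an efficient algorithm.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def halfspace_depth(p: Point, points: Sequence[Point]) -> int:
    """Tukey depth of p: the minimum number of data points in a closed
    half-plane whose boundary line passes through p (hypothetical helper)."""
    at_p = sum(1 for x in points if x == p)  # always inside such a half-plane
    best = len(points)
    for q in points:
        if q == p:
            continue
        v = (q[0] - p[0], q[1] - p[1])  # direction of the line through p and q
        left = right = on_pos = on_neg = 0
        for x in points:
            if x == p:
                continue
            c = cross(p, q, x)
            if c > 0:
                left += 1
            elif c < 0:
                right += 1
            elif (x[0] - p[0]) * v[0] + (x[1] - p[1]) * v[1] > 0:
                on_pos += 1  # on the line, on q's side of p
            else:
                on_neg += 1  # on the line, on the opposite side of p
        # Rotating the line slightly about p sends one ray's points to each
        # side, so these counts cover every direction next to this line.
        for side in (left, right):
            best = min(best, at_p + side + on_pos, at_p + side + on_neg)
    return best


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(halfspace_depth((1, 1), square))  # 2
print(halfspace_depth((0, 0), square))  # 1
print(halfspace_depth((3, 3), square))  # 0
```

The count only changes when the boundary line sweeps over a data point, so it is enough to examine lines through p and each data point and then tilt them slightly in either direction.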

Desirable Properties of Data Depth
Liu (90) / Zuo and Serfling (00):
- P1 – Affine Invariance
- P2 – Maximality at Center
- P3 – Monotonicity Relative to Deepest Point
- P4 – Vanishing at Infinity
We propose (Burr, Rafalin, and Souvaine, BRS 04):
- P5 – Invariance Under Dimension Change

Affine Invariance (P1)
The depth of a point does not change when the point and the data are both transformed by the same affine transformation A.

Maximality at Center (P2)
If p is the center of the distribution and q is any point, then the depth at p is at least the depth at q.

Monotonicity Relative to Deepest Point (P3)
If p is the deepest point and q is any point, then every point between p and q is at least as deep as q.

Vanishing at Infinity (P4)
The depth of a point q tends to zero as q moves arbitrarily far from the data cloud.
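For reference, properties P1 to P4 can be written formally in the notation of Zuo and Serfling (2000), where D(x; F) is the depth of x with respect to F, θ plays the role of the center or deepest point p in the slides, and x plays the role of q. This is a transcription of the standard statements, not a formula taken from the slides.

```latex
\begin{aligned}
&\text{(P1)}\quad D(Ax + b;\, F_{AX+b}) = D(x;\, F_X)
  \quad \text{for every nonsingular } A \text{ and every vector } b,\\
&\text{(P2)}\quad D(\theta;\, F) = \sup_{x} D(x;\, F)
  \quad \text{for } F \text{ with center } \theta,\\
&\text{(P3)}\quad D(x;\, F) \le D(\theta + \alpha (x - \theta);\, F)
  \quad \text{for all } \alpha \in [0, 1], \ \theta \text{ the deepest point},\\
&\text{(P4)}\quad D(x;\, F) \to 0 \quad \text{as } \|x\| \to \infty.
\end{aligned}
```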

Invariance Under Dimension Change (P5)
A data set that lies on a line in the plane can be viewed in either dimension. Is this an $\mathbb{R}^1$ data set? Is this an $\mathbb{R}^2$ data set?

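Simplicial Depth (Liu 90)
The simplicial depth of a point p with respect to a probability distribution F in $\mathbb{R}^d$ is the probability that a random closed simplex in $\mathbb{R}^d$ contains p:
$SD_F(p) = P_F\left(p \in \Delta[X_1, \ldots, X_{d+1}]\right)$,
where $\Delta[X_1, \ldots, X_{d+1}]$ is the closed simplex formed by d+1 random observations $X_1, \ldots, X_{d+1}$ from F.
The simplicial depth of a point p with respect to a data set $S = \{X_1, \ldots, X_n\}$ in $\mathbb{R}^d$ is the fraction of closed simplices formed by d+1 points of S that contain p:
$SD_S(p) = \binom{n}{d+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{d+1} \le n} I\left(p \in \Delta[X_{i_1}, \ldots, X_{i_{d+1}}]\right)$,
where I is the indicator function.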

Sample Version of Simplicial Depth
The simplicial depth of a point p with respect to a data set S in $\mathbb{R}^d$ is the fraction of closed simplices formed by d+1 points of S that contain p.
Example (d = 2, n = 6): the total number of simplices is $\binom{6}{3} = 20$. The point p is contained in 6 of them, so the depth of p is 6/20 = 0.3.
[Figure: the six data points and the arrangement of triangle edges, with each cell labeled by its depth (values between 0.2 and 0.4).]
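The sample definition translates directly into a brute-force count. The sketch below, for d = 2, assumes integer coordinates and data in general position (no three data points collinear) and uses hypothetical function names; it enumerates all $\binom{n}{3}$ triangles, so it takes O(n^3) time. Because the coordinates of the six points in the example above are not given, a four-point square is used instead.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_closed_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies in the closed, non-degenerate triangle abc."""
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    # Inside or on the boundary iff the three orientations never disagree.
    return not (min(d) < 0 < max(d))


def simplicial_depth_liu(p: Point, data: Sequence[Point]) -> Fraction:
    """Liu's sample simplicial depth for d = 2 (hypothetical helper): the
    fraction of closed triangles with vertices in `data` that contain p."""
    if len(data) < 3:
        raise ValueError("need at least three data points")
    triangles = list(combinations(data, 3))
    if any(cross(a, b, c) == 0 for a, b, c in triangles):
        raise ValueError("sketch assumes no three data points are collinear")
    hits = sum(1 for a, b, c in triangles if in_closed_triangle(p, a, b, c))
    return Fraction(hits, len(triangles))


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(simplicial_depth_liu((1, 2), square))  # 1/2
print(simplicial_depth_liu((2, 2), square))  # 1
```

At the center of the square, p lies on the common diagonal of all four triangles, so every closed triangle counts it; this boundary effect is what the revised definition later addresses.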

Properties
Simplicial depth is a statistical depth function in the continuous case (Liu 90). In the sample case, it is affine invariant (P1) and vanishes at infinity (P4) (Zuo and Serfling 00).

Problems in the Sample Case
- It does not always attain maximality at the center (P2) and does not always have monotonicity relative to the deepest point (P3) (Zuo and Serfling 00).
- The depth on the boundary of a cell is at least the depth in each of the adjacent cells, which causes discontinuities.
- It does not have invariance under dimension change (P5).

Simplicial Depth (Liu 90) and (BRS 04)
[Figure: five data points labeled A to E and query points X and Y, with each cell labeled by its depth under both definitions.] The revised definition averages the number of closed and open simplices containing a point. Total number of simplices = $\binom{5}{3} = 10$.

Revised Definition (BRS 04)
The simplicial depth of a point p with respect to a data set S in $\mathbb{R}^d$ is the average of the fraction of closed simplices containing p and the fraction of open simplices containing p, where the simplices are formed by d+1 points of S. Equivalently, with n = |S|,
$SD^{\mathrm{BRS}}_S(p) = \binom{n}{d+1}^{-1}\left(N_{\mathrm{int}}(p) + \tfrac{1}{2}\,N_{\partial}(p)\right)$,
where $N_{\mathrm{int}}(p)$ is the number of simplices with data points as vertices that contain p in their open interior, and $N_{\partial}(p)$ is the number of such simplices that contain p on their boundary.
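A sketch of the revised definition in the plane, under the same assumptions as a brute-force count of Liu's depth (integer coordinates, no three data points collinear, hypothetical names): each triangle is classified as containing p in its open interior, on its boundary, or not at all, and the revised depth is the average of the closed and open fractions.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def simplicial_depths(p: Point, data: Sequence[Point]) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (closed, open, revised) sample simplicial depth of p for d = 2."""
    if len(data) < 3:
        raise ValueError("need at least three data points")
    triangles = list(combinations(data, 3))
    if any(cross(a, b, c) == 0 for a, b, c in triangles):
        raise ValueError("sketch assumes no three data points are collinear")
    closed = opened = 0
    for a, b, c in triangles:
        d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
        if min(d) > 0 or max(d) < 0:
            opened += 1  # p in the open interior
            closed += 1
        elif not (min(d) < 0 < max(d)):
            closed += 1  # p on the boundary (some orientation is zero)
    total = len(triangles)
    return (Fraction(closed, total), Fraction(opened, total),
            Fraction(closed + opened, 2 * total))  # revised: the average


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print([str(x) for x in simplicial_depths((2, 2), square)])  # ['1', '0', '1/2']
print([str(x) for x in simplicial_depths((2, 1), square)])  # ['1/2', '1/2', '1/2']
```

At the center of the square, where the two diagonals cross, the closed (Liu) depth jumps to 1 even though each of the four surrounding cells has depth 1/2; the revised depth at the center is 1/2.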

Properties of the Revised Definition
- It reduces to the original definition for continuous distributions and for points lying in the interior of cells.
- It keeps the ranking order of data points.
- It can be calculated using the existing algorithms, with slight modifications.
- It fixes Zuo and Serfling's counterexamples.
- The depth on the boundary of two cells is the average of the depths of the two adjacent cells.
- It is invariant under dimension change (P5) for the change from $\mathbb{R}^1$ to $\mathbb{R}^2$.
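The averaging property can be checked with a short counting argument. Assume the plane, data in general position, and a point p that is not a data point and lies in the relative interior of the edge shared by two cells L and R. A triangle that contains p in its open interior contains both cells near p; a triangle with p on its boundary has the data segment through p as one of its sides and contains exactly one of L and R; any other triangle contains neither. The counts N, I, B, B_L, and B_R below are notation introduced here.

```latex
% N = \binom{n}{3}; I (B) = number of triangles with p in the open interior (on the boundary)
% B_L (B_R) = boundary triangles lying on the side of L (R), so B = B_L + B_R
\begin{aligned}
D(L) &= \frac{I + B_L}{N}, \qquad D(R) = \frac{I + B_R}{N},\\
SD_{\mathrm{Liu}}(p) &= \frac{I + B}{N} \ \ge\ \max\{D(L),\, D(R)\},\\
SD_{\mathrm{BRS}}(p) &= \frac{I + \tfrac{1}{2}B}{N} = \frac{D(L) + D(R)}{2}.
\end{aligned}
```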

Invariance Under Dimension Change (P5)
Degenerate simplices: Both point C and point A (a point between B and C) lie within the open (degenerate) simplex BCD; think of it as a very thin triangle. Points B and D are vertices of the (degenerate) simplex BCD.
For a point p, consider the ratio $\dfrac{SD_{\mathbb{R}^1}(p)}{SD_{\mathbb{R}^2}(p)}$ of the depth of p when the collinear data set is evaluated in $\mathbb{R}^1$ to its depth when the same data set is evaluated in $\mathbb{R}^2$. For both definitions, the ratio at a position that is not a data point is 2/3. For Liu's definition, the ratio at a data point is not 2/3 in general; for the BRS definition, it is 2/3.
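The 2/3 ratio can be checked numerically. The sketch assumes that the ratio is the depth of p when the collinear data set is evaluated in $\mathbb{R}^1$ (segments) divided by its depth when it is evaluated in $\mathbb{R}^2$ (degenerate triangles), and that an open degenerate simplex is the open interval spanned by its vertices, so a middle vertex lies inside it, as in the thin-triangle picture. Data are given as coordinates along the line, and the names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence, Union

Num = Union[int, Fraction]


def depth_on_line(xs: Sequence[Num], p: Num, k: int, mode: str) -> Fraction:
    """Fraction of k-point subsets of the collinear data xs whose simplex
    contains p. k = 2: evaluate in R^1 (segments); k = 3: evaluate in R^2
    (degenerate triangles). Each such simplex spans the interval [lo, hi]."""
    subsets = list(combinations(xs, k))
    closed = sum(1 for s in subsets if min(s) <= p <= max(s))
    # Assumption: the open degenerate simplex is the open interval (lo, hi),
    # so a middle vertex lies inside it (the "very thin triangle" picture).
    opened = sum(1 for s in subsets if min(s) < p < max(s))
    if mode == "liu":
        return Fraction(closed, len(subsets))
    if mode == "brs":
        return Fraction(closed + opened, 2 * len(subsets))
    raise ValueError(f"unknown mode: {mode}")


def dimension_ratio(xs: Sequence[Num], p: Num, mode: str) -> Fraction:
    """Depth evaluated in R^1 divided by depth evaluated in R^2."""
    return depth_on_line(xs, p, 2, mode) / depth_on_line(xs, p, 3, mode)


xs = [0, 1, 2, 3, 4]
print(dimension_ratio(xs, Fraction(3, 2), "liu"))  # 2/3 (not a data point)
print(dimension_ratio(xs, 2, "liu"))               # 4/5 (data point)
print(dimension_ratio(xs, 2, "brs"))               # 2/3 (data point)
```

Exact `Fraction` arithmetic is used so that the ratio can be compared with 2/3 without rounding error.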

Remaining Problems (P2 and P3)

Remaining Problems (Data Points)
Data points are still overcounted, so there can still be discontinuities at data points. However, fixing the depth at data points requires considering more features:
- Data points are inherently part of simplices (a data point forms a triangle with every other pair of data points).
- Edges are inherently part of simplices (the two endpoints of an edge form a triangle with every other data point).
To retain invariance under dimension change (P5), given a data set in $\mathbb{R}^b$ that lies on a d-flat, the depth of a point when the data set is evaluated as a d-dimensional data set should be a multiple of its depth when the data set is evaluated as a b-dimensional data set. Neither of the ideas above completely solves the problem, and it appears that the best solutions take into account the geometry of the entire data set.

Ongoing Work
- The current algorithm for finding the median (the deepest point) takes O(n^4) time, walking an arrangement of O(n^2) segments.
- We can improve this algorithm by comparing simplicial depth and half-space depth.
- We are improving it further by considering simplicial depth in the dual.
- The problems with data points are improved by generalizing this work to higher dimensions.
- To find the depth at all points, we are using local information to form an approximation of the depth measure.
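As a point of comparison for the O(n^4) algorithm, a naive baseline can find a point of maximum Liu depth. A closed triangle that contains an open cell of the segment arrangement also contains the cell's corners, so the maximum is attained at a data point or at a crossing of two data segments; the sketch simply evaluates the depth at every such candidate. It assumes integer coordinates and general position, uses hypothetical names, and runs in O(n^7) time, so it is not the algorithm described above.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

Point = Tuple[Fraction, Fraction]


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def liu_depth(p, data) -> Fraction:
    """Fraction of closed triangles on `data` containing p (general position)."""
    triangles = list(combinations(data, 3))
    hits = 0
    for a, b, c in triangles:
        d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
        if not (min(d) < 0 < max(d)):
            hits += 1
    return Fraction(hits, len(triangles))


def crossing(a, b, c, d) -> Optional[Point]:
    """Proper crossing point of segments ab and cd, or None."""
    d1, d2 = cross(a, b, c), cross(a, b, d)
    d3, d4 = cross(c, d, a), cross(c, d, b)
    if d1 * d2 >= 0 or d3 * d4 >= 0:
        return None  # disjoint, touching, or sharing an endpoint
    t = Fraction(d3, d3 - d4)  # ab meets line cd at a + t (b - a)
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


def naive_liu_median(data) -> Tuple[Fraction, Point]:
    """Return (depth, point) maximizing Liu depth over arrangement vertices."""
    candidates: List[Point] = [(Fraction(x), Fraction(y)) for x, y in data]
    for (a, b), (c, d) in combinations(combinations(data, 2), 2):
        x = crossing(a, b, c, d)
        if x is not None:
            candidates.append(x)
    return max(((liu_depth(x, data), x) for x in candidates), key=lambda t: t[0])


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
depth, point = naive_liu_median(square)
print(depth, point[0], point[1])  # 1 2 2
```

Such a baseline is mainly useful for testing a faster implementation on small inputs.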

References
G. Aloupis, C. Cortes, F. Gomez, M. Soss, and G. Toussaint. Lower bounds for computing statistical depth. Computational Statistics & Data Analysis, 40(2):223-229, 2002.
G. Aloupis, S. Langerman, M. Soss, and G. Toussaint. Algorithms for bivariate medians and a Fermat-Torricelli problem for lines. In Proc. 13th CCCG, pages 21-24, 2001.
M. Burr, E. Rafalin, and D. L. Souvaine. Simplicial depth: An improved definition, analysis, and efficiency for the sample case. Technical report 2003-28, DIMACS, 2003.
A. Y. Cheng and M. Ouyang. On algorithms for simplicial depth. In Proc. 13th CCCG, pages 53-56, 2001.
J. Gil, W. Steiger, and A. Wigderson. Geometric medians. Discrete Math., 108(1-3):37-51, 1992. Topological, algebraical and combinatorial structures. Frolík's memorial volume.
S. Khuller and J. S. B. Mitchell. On a triangle counting problem. Inform. Process. Lett., 33(6):319-321, 1990.
R. Liu. On a notion of data depth based on random simplices. Ann. Statist., 18:405-414, 1990.
Y. Zuo and R. Serfling. General notions of statistical depth function. Ann. Statist., 28(2):461-482, 2000.