Contours

Presentation transcript:

Contours: Given f: R(A1..An) → Y, graph(f) = { (x, f(x)) | x ∈ R }. Equivalently, f defines a derived attribute, Af, with domain Y, appended to R to give R*(A1..An, Af); the equivalence is x.Af = f(x) ∀ x ∈ R.

∀ S ⊆ Y, the f-contour(S) = f⁻¹(S). Equivalently, Af-contour(S) = SELECT A1..An FROM R* WHERE x.Af ∈ S. If S = {a}, we use f-Isobar(a) (equivalently, Af-Isobar(a)).

If f is a local density and {Sk} is a partition of Y, then {f⁻¹(Sk)} partitions R. (E.g., in OPTICS, f = reachability distance and {Sk} is the partition produced by intersections of graph(f), taken with respect to a walk of R, with horizontal lines.) A weather map uses an equiwidth interval partition of S = Reals (barometric-pressure or temperature contours). A grid is the intersection partition with respect to the dimension projection functions (next slide). A class is a contour under f: R → C, the class map. An L∞ ε-disk about a is the intersection of the ε-dimension-projection contours containing a.
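To make the derived-attribute view concrete, here is a minimal horizontal (scan-based) sketch in Python; f_contour and the tiny relation are illustrative assumptions, since the point of the following slides is to compute such selections vertically with P-trees rather than by scanning.

# Scan-based sketch of a contour: the f-contour of S is the preimage
# f^-1(S) inside R, i.e., SELECT ... WHERE Af in S over R* = R + derived Af.
def f_contour(R, f, S):
    """Return all x in R with f(x) in S (the f-contour of S)."""
    return [x for x in R if f(x) in S]

R = [(1, 2), (3, 4), (5, 0), (2, 2)]
f = lambda x: x[0] + x[1]          # a functional f: R(A1, A2) -> Reals
print(f_contour(R, f, {3, 5}))     # union of f-Isobar(3) and f-Isobar(5):
                                   # [(1, 2), (5, 0)]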

f: R(A1..An) → Y, S ⊆ Y. The (uncompressed) predicate tree 0Pf,S is defined by: 0Pf,S(x) = 1 (true) iff f(x) ∈ S. The compressed P-tree, sPf,S, is the compression of 0Pf,S with equi-width leaf size s, as follows:
1. Choose a walk of R (this converts 0Pf,S from a bit map to a bit vector).
2. Equi-width partition 0Pf,S with segment size s (s = leaf size; the last segment can be short).
3. Eliminate, and mask to 0, all pure-zero segments (call the mask the NotPure0 Mask, or EM).
4. Eliminate, and mask to 1, all pure-one segments (call the mask the Pure1 Mask, or UM).
(EM = existential aggregation; UM = universal aggregation.) Compressing each leaf of sPf,S with leaf size s2 gives s1,s2Pf,S. Recursively: s1,s2,s3Pf,S, then s1,s2,s3,s4Pf,S, ... (this builds an EM tree and a UM tree).

BASIC P-TREES: If Ai is real or binary and fi,j(x) ≡ the jth bit of xi, then {(*)Pfi,j,{1} ≡ (*)Pi,j}, j = b..0, are the basic (*)P-trees of Ai, * = s1..sk. If Ai is categorical and fi,a(x) = 1 if xi = a, else 0, then {(*)Pfi,a,{1} ≡ (*)Pi,a}, a ∈ R[Ai], are the basic (*)P-trees of Ai.

Notes: The EM masks (e.g., of 2^k,...,2^0 Pi,j, with k = ⌈log2|R|⌉) form a (binary) tree. Whenever an EM bit is 0, the entire subtree below it can be eliminated (it represents a pure-zero segment): a 0-node at level k (lowest level = level 0) with no subtree indicates a 2^k-run of zeros. In this construction the UM tree is redundant. We call these EM trees the basic binary P-trees.
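A one-level sketch of steps 1-4, assuming the walk of R has already produced the bit vector; compress_ptree is a hypothetical helper, not the actual P-tree implementation. Recursing on the surviving mixed leaves with leaf size s2 would give s1,s2Pf,S, and so on.

def compress_ptree(bits, s):
    """One compression level: returns (EM mask, UM mask, surviving leaves)."""
    em, um, leaves = [], [], []
    for i in range(0, len(bits), s):
        seg = bits[i:i + s]           # the last segment can be short
        if not any(seg):              # pure-zero segment: eliminate, EM bit 0
            em.append(0); um.append(0)
        elif all(seg):                # pure-one segment: eliminate, UM bit 1
            em.append(1); um.append(1)
        else:                         # mixed segment: keep the leaf
            em.append(1); um.append(0)
            leaves.append(seg)
    return em, um, leaves

bits = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1]   # a walk of 0Pf,S
print(compress_ptree(bits, 4))
# -> ([0, 1, 1], [0, 1, 0], [[1, 0, 0, 1]])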

= xRd=1..n(xd2 - 2adxd + ad2) i,j,k bit slices indexes A useful functional: TV(a) =xR(x-a)o(x-a) If we use d for an index variable over the dimensions, = xRd=1..n(xd2 - 2adxd + ad2) i,j,k bit slices indexes = xRd=1..n(k2kxdk)2 - 2xRd=1..nad(k2kxdk) + |R||a|2 = xd(i2ixdi)(j2jxdj) - 2xRd=1..nad(k2kxdk) + |R||a|2 = xdi,j 2i+jxdixdj - 2 x,d,k2k ad xdk + |R||a|2 = x,d,i,j 2i+j xdixdj - |R||a|2 2 dad x,k2kxdk + = x,d,i,j 2i+j xdixdj - |R|dadad 2|R| dadd + = x,d,i,j 2i+j xdixdj + dadad ) |R|( -2dadd + TV(a) = i,j,d 2i+j |Pdi^dj| - |R||a|2 k2k+1 dad |Pdk| + Note that the first term does not depend upon a. Thus, the derived attribute, TV-TV() (eliminate 1st term) is much simpler to compute and has identical contours (just lowers the graph by TV() ). We also find it useful to post-compose a log to reduce the number of bit slices. The resulting functional is called the High-Dimension-ready Total Variation or HDTV(a).

TV(a) = x,d,i,j 2i+j xdixdj + |R| ( -2dadd + dadad ) From equation 7, f(a)=TV(a)-TV() TV(a) = x,d,i,j 2i+j xdixdj + |R| ( -2dadd + dadad ) = |R| ( -2d(add-dd) + d(adad- dd) ) + dd2 ) = |R|( dad2 - 2ddad = |R| |a-|2 so f()=0 g(a) HDTV(a) = ln( f(a) )= ln|R| + ln|a-|2 The length of g(a) depends only on the length of a-, so isobars are hyper-circles centered at  The graph of g is a log-shaped hyper-funnel: For an -contour ring (radius  about a) go inward and outward along a- by  to the points; inner point, b=+(1-/|a-|)(a-) and outer point, c=-(1+/|a-|)(a-).  x1 x2 g(a)=HDTV(x) g(b) g(c) Then take g(b) and g(c) as lower and upper endpoints of a vertical interval. Then we use EIN formulas on that interval to get a mask P-tree for the -contour (which is a well-pruned superset of the -neighborhood of a) -contour (radius  about a) a b c

If the HDTV circumscribing contour of a is still too populous, use a circumscribing Ad-contour (note: Ad is not a derived attribute at all, just Ad itself, so we already have its basic P-trees). As pre-processing, calculate the basic P-trees for the HDTV derived attribute (or for another hyper-circular-contour derived attribute).

To classify a:
1. Calculate b and c (they depend on a and ε).
2. Form the mask P-tree for the training points with HDTV-values in [HDTV(b), HDTV(c)].
3. Use that P-tree to prune out the candidate NNS. If the count of candidates is small, proceed to scan and assign class votes using the Gaussian vote function; else prune further using a dimension projection.

(Voting function: G(x) = Gauss(|x−a|) − Gauss(ε), where Gauss(r) = (1/(std·√(2π))) e^(−(r−mean)²/(2·var)), and std, mean, var are with respect to the set of distances from a of the voters, i.e., {r = |x−a| : x a voter}.)

We also note that HDTV can be further simplified (retaining the same contours) using h(a) = |a − µ|. Since we create the derived attribute by scanning the training set, why not just use this very simple function? Others leap to mind, e.g., hb(a) = |a − b|.

[Figure: HDTV(x) over (x1, x2), with HDTV(b) and HDTV(c) marked, the ε-contour (radius ε about a) through b and c, and a circumscribing contour of the dimension projection f(a) = a1.]
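An end-to-end horizontal sketch of steps 1-3 and the voting; the step-2 mask P-tree is simulated by filtering on HDTV values, the data, ε, and all names are illustrative assumptions, and the further dimension-projection pruning of step 3 is omitted.

import math
import numpy as np

def classify(a, X, y, eps):
    mu = X.mean(axis=0)
    g = lambda p: math.log(len(X)) + math.log(np.sum((p - mu) ** 2))  # HDTV
    r, d = a - mu, np.linalg.norm(a - mu)
    b, c = mu + (1 - eps / d) * r, mu + (1 + eps / d) * r             # step 1
    lo, hi = sorted((g(b), g(c)))
    cand = [i for i, x in enumerate(X) if lo <= g(x) <= hi]           # steps 2-3
    dists = np.array([np.linalg.norm(X[i] - a) for i in cand])
    mean, std = dists.mean(), dists.std() + 1e-12    # wrt voter distances from a
    gauss = lambda r: math.exp(-(r - mean) ** 2 / (2 * std * std)) / (std * math.sqrt(2 * math.pi))
    votes = {}                                       # G(x) = Gauss(|x-a|) - Gauss(eps)
    for i, r in zip(cand, dists):
        votes[y[i]] = votes.get(y[i], 0.0) + gauss(r) - gauss(eps)
    return max(votes, key=votes.get)

X = np.array([[1., 1.], [1.2, .9], [4., 4.], [4.1, 3.9], [6., 1.]])
y = ['r', 'r', 'b', 'b', 'r']
print(classify(np.array([4.05, 4.0]), X, y, eps=1.0))   # winning class label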

[Figure: graphs of functionals with hyper-circular contours over (X, Y): TV, with TV(µ) = TV(x33) and TV(x15) marked; TV − TV(µ); HDTV; h(a) = |a − µ|; and hb(a) = |a − b|.]

= (1/|a|)d(xxdad) factor out ad Angular Variation functionals: e.g., AV(a)  ( 1/|a| ) xR xoa d is an index over the dimensions,  COS(a) = (1/|a|)xRd=1..nxdad = (1/|a|)d(xxdad) factor out ad = (1/|a|)d=1..n(xxd) ad = |R|/|a|d=1..n((xxd)/|R|) ad = |R|/|a|d=1..n d ad COS(a)  a = ( |R|/|a| )  o a COS(a)  AV(a)/(|||R|) = oa/(|||a|) = cos(a) COS (and AV) has hyper-conic isobars center on  COSb(a)?  a b COS and AV have -contour(a) = the space between two hyper-cones center on  which just circumscribes the Euclidean -hyperdisk at a. Intersection (in pink) with HDTV -contour. Graphs of functionals with hyper-conic contours: E.g., COSb(a) for any vector, b

f(a)x ≡ (x−a)∘(x−a). With d an index over the dimensions and i, j, k bit-slice indexes:

f(a)x = ∑d=1..n (xd² − 2 ad xd + ad²)
= ∑d (∑k 2^k xdk)² − 2 ∑d ad (∑k 2^k xdk) + |a|²
= ∑d (∑i 2^i xdi)(∑j 2^j xdj) − 2 ∑d,k 2^k ad xdk + |a|²
= ∑d,i,j 2^(i+j) xdi xdj − 2 ∑d,k 2^k ad xdk + |a|²

In terms of the bits of the basic P-trees:
f(a)x = ∑i,j,d 2^(i+j) (Pdi∧dj)x − ∑k 2^(k+1) ∑d ad (Pdk)x + |a|²

β exp( −f(a)x ) = β exp( −∑i,j,d 2^(i+j) (Pdi∧dj)x ) · exp( ∑k 2^(k+1) ∑d ad (Pdk)x ) · exp( −|a|² )

Adding up the Gaussian-weighted votes for class c:
∑x∈c β exp( −f(a)x ) = β exp( −|a|² ) ∑x∈c exp( −∑i,j,d 2^(i+j) (Pdi∧dj)x + ∑k,d 2^(k+1) ad (Pdk)x )

Collecting the diagonal (i = j) terms inside the exp:
= β exp( −|a|² ) ∑x∈c exp( −∑i≠j,d 2^(i+j) (Pdi∧dj)x + ∑i=j,d (ad 2^(i+1) − 2^(2i)) (Pdi)x )

Inside the exp, each (i, j, d) coefficient does not involve x; it is multiplied by a 1-bit or a 0-bit depending on x. Thus, for fixed i, j, d, we either have the x-independent coefficient (if the bit is 1) or we don't (if it is 0):

∑x∈c ( ∏i≠j,d exp( −2^(i+j) (Pdi∧dj)x ) · ∏i=j,d exp( (ad 2^(i+1) − 2^(2i)) (Pdi)x ) )
= ∑x∈c ( ∏{i≠j,d : (Pdi∧dj)x=1} exp( −2^(i+j) ) · ∏{i=j,d : (Pdi)x=1} exp( ad 2^(i+1) − 2^(2i) ) )   (eq 1)
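A numeric check of (eq 1), with the bit products evaluated by scanning a toy class (in the vertical setting the exponents are read off the basic P-trees); the bit width b and the data are illustrative assumptions.

import math

def vote_eq1(Xc, a, b=3):
    """Sum over x in class c of the bit-coefficient product in (eq 1)."""
    bit = lambda v, k: (v >> k) & 1
    total = 0.0
    for x in Xc:
        prod = 1.0
        for d in range(len(a)):
            for i in range(b):
                for j in range(b):
                    if i != j and bit(x[d], i) and bit(x[d], j):
                        prod *= math.exp(-(2 ** (i + j)))       # i != j coefficient
                if bit(x[d], i):                                # i == j coefficient
                    prod *= math.exp(a[d] * 2 ** (i + 1) - 2 ** (2 * i))
        total += prod
    return total

Xc = [(3, 1), (2, 2)]                  # training points of class c
a = (2.0, 1.5)                         # the unclassified sample
lhs = sum(math.exp(-sum((xd - ad) ** 2 for xd, ad in zip(x, a))) for x in Xc)
rhs = math.exp(-sum(ad * ad for ad in a)) * vote_eq1(Xc, a)
print(lhs, rhs)                        # agree up to floating-point round-off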