Outlier Respecting Points Approximation. Danny Z. Chen and Haitao Wang, Computer Science and Engineering, University of Notre Dame, Indiana, USA
The motivation: Propose a new problem model for dealing with outliers. [Figure panels: an optimal solution for data without outliers; many optimal solutions for data with outliers; a particular optimal solution that respects outliers.]
Points approximation. Input: a point set P in 2-D. Output: an approximation function f. Approximation error: vertical distance, e(P,f) = max{error of each point}. [Figure: the error of a point marked as its vertical distance to f.]
The problem (min-#). Given: an allowed error ε ≥ 0. Goal: an approximation function f of minimum size such that e(P,f) ≤ ε.
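For the plain step-function case without outliers, the min-# problem can be illustrated with a standard greedy scan. This is a minimal sketch, not taken from the slides: it assumes the points are already sorted by x, that only their y-values matter, and that a step covers a run of points exactly when their y-range is at most 2ε. The name `min_steps` is hypothetical.

```python
def min_steps(ys, eps):
    """Greedy min-# step function for y-values sorted by x (no outliers).

    Returns a list of (first_index, last_index, level) triples. A step is
    extended while the y-range of its points stays within 2 * eps, so the
    midpoint level is within eps of every covered point.
    """
    if not ys:
        return []
    steps = []
    start = 0
    lo = hi = ys[0]
    for k in range(1, len(ys)):
        new_lo, new_hi = min(lo, ys[k]), max(hi, ys[k])
        if new_hi - new_lo > 2 * eps:
            # Point k does not fit: close the current step, start a new one.
            steps.append((start, k - 1, (lo + hi) / 2))
            start, lo, hi = k, ys[k], ys[k]
        else:
            lo, hi = new_lo, new_hi
    steps.append((start, len(ys) - 1, (lo + hi) / 2))
    return steps
```

With y-values 0, 1, 2, 5 and ε = 1.5, the scan closes the first step after the third point, because adding 5 would stretch the range to 5 > 2ε.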
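Problem variations: Step function (SF). [Figure: a step function with the error marked.]
Problem variations (cont.): Piecewise-linear function (PF). [Figure: a piecewise-linear function with the error marked.]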
Problem variations (cont.): Weighted versions for both SF and PF (WSF and WPF). Every point p_i has a weight u_i. Error of each point: u_i × vertical distance. [Figure: a point with its weight u_i and vertical distance marked.]
Problem variations (cont.): Outlier versions. Allow a given number g of outliers; e(P,f) = max{error of every non-outlier point}. [Figure: the error and an outlier marked.]
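Problem variations (cont.): Outlier versions of SF and WSF: OSF and OWSF. [Figure: a step function with the error and an outlier marked.]
Problem variations (cont.): Outlier versions of PF and WPF: OPF and OWPF. [Figure: a piecewise-linear function with the error and an outlier marked.]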
Outlier-respecting versions (new). Allow a given number g of outliers. Error: e(P,f) = max{errors of non-outlier points}. Outlier error: e'(P,f) = max{errors of outliers}. Goal: minimize the size |f| such that e(P,f) ≤ ε; among all optimal solutions with the minimized |f|, find one with minimum outlier error e'(P,f).
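An example (an SF case): ε = 1.5, g = 1 (one outlier). Points: (1,0), (2,1), (3,2), (4,5); the outlier is (4,5). A step at y = 0.5 is an optimal OSF solution with outlier error 4.5; the ORSF solution is the step at y = 1.5, with outlier error 3.5.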
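The numbers in this example can be checked directly. The sketch below is illustrative only: the helper name `errors_for_level` is hypothetical, and it assumes that for a single horizontal step the g points with the largest vertical distance are the ones treated as outliers.

```python
def errors_for_level(points, y, g):
    """Return (non-outlier error, outlier error) for one step at height y.

    Assumption: the g points farthest from the step are the outliers.
    """
    errs = sorted(abs(py - y) for _, py in points)
    split = len(errs) - g
    non_outlier = max(errs[:split], default=0.0)
    outlier = max(errs[split:], default=0.0)
    return non_outlier, outlier


points = [(1, 0), (2, 1), (3, 2), (4, 5)]
print(errors_for_level(points, 1.5, 1))  # step at y = 1.5
print(errors_for_level(points, 0.5, 1))  # step at y = 0.5
```

Both levels keep the three non-outlier points within ε = 1.5, so both are valid one-step solutions; they differ only in how far the step lies from the outlier (4,5).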
Outlier-respecting versions (cont.). Step function: SF -> OSF -> ORSF; weighted: WSF -> OWSF -> ORWSF. Piecewise-linear function: PF -> OPF -> ORPF; weighted: WPF -> OWPF -> ORWPF.
Previous results (problem version: result):
OSF: O(ng^2) [Fournier and Vigneron '08]
OWSF: O(ng^2) [Chen and Wang '09]
OPF, OWPF: O(ng^4 log^2 n) [Chen and Wang '09]
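Our new results on the outlier-respecting versions (outlier version: result; outlier-respecting version: result):
OSF: O(ng^2); ORSF: O(ng^3 log n)
OWSF: O(ng^2); ORWSF: O(ng^3 log n log g)
OPF, OWPF: O(ng^4 log^2 n); ORPF, ORWPF: O(n^(2+δ) g^(3.5+1.5δ) log n)
Annotations on the slide, apparently the gap between each pair of bounds: O(g log n), O(g log n), O(n).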
Our algorithms: a dynamic programming algorithmic scheme for all four problems (ORSF, ORWSF, ORPF, ORWPF), with different computational components for each specific problem.
The computational components of our algorithmic scheme: compute w(i,j,q) for each query (i,j,q), where 1 ≤ i ≤ j ≤ n are point indices and 0 ≤ q ≤ g is the number of outliers. w(i,j,q) is the outlier error of using one segment to approximate {p_i, p_{i+1}, …, p_j} with q outliers. [Figure: one segment over p_i, …, p_j with the outliers and w(i,j,q) marked.]
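The slides do not spell out the recurrence, so the following is only a sketch of how w(i,j,q) could plug into a dynamic program, specialized to step functions. Everything here is an assumption made for illustration: the names `w_naive` and `solve`, the lexicographic (size, outlier error) state, the "at most g outliers in total" reading, and the exponential brute-force evaluation of w, which is far slower than the bounds stated in the talk.

```python
from itertools import combinations

INF = float("inf")


def w_naive(ys, eps, q):
    """Outlier error of one horizontal segment over ys with exactly q outliers.

    Tries every outlier subset; returns INF if no choice keeps the remaining
    points within eps of a common level.
    """
    best = INF
    n = len(ys)
    for out in combinations(range(n), q):
        chosen = set(out)
        keep = [ys[k] for k in range(n) if k not in chosen]
        if not keep:
            continue  # assumption: a segment keeps at least one point
        lo, hi = max(keep) - eps, min(keep) + eps  # feasible levels
        if lo > hi:
            continue
        if q == 0:
            return 0.0
        omin = min(ys[k] for k in out)
        omax = max(ys[k] for k in out)
        y = min(max((omin + omax) / 2, lo), hi)  # clamp midpoint to [lo, hi]
        best = min(best, max(omax - y, y - omin))
    return best


def solve(ys, eps, g, w=w_naive):
    """Lexicographically smallest (number of steps, outlier error)."""
    n = len(ys)
    # best[j][q]: best (size, outlier error) for the first j points using
    # exactly q outliers.
    best = [[(INF, INF)] * (g + 1) for _ in range(n + 1)]
    best[0][0] = (0, 0.0)
    for j in range(1, n + 1):
        for q in range(g + 1):
            for i in range(1, j + 1):  # last step covers p_i..p_j
                for q2 in range(min(q, j - i) + 1):  # outliers in last step
                    size, err = best[i - 1][q - q2]
                    if size == INF:
                        continue
                    e = w(ys[i - 1:j], eps, q2)
                    if e == INF:
                        continue
                    best[j][q] = min(best[j][q], (size + 1, max(err, e)))
    return min(best[n])
```

Comparing (size, outlier error) tuples makes the program minimize the number of steps first and the outlier error second, which mirrors the two-level goal of the outlier-respecting model.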
Computing w(i,j,q) for ORSF. Observation: only need to consider the highest q points and the lowest q points. [Figures: the case q = 1 (one outlier) and the case q = 2 (two outliers), each with w(i,j,q) marked over p_i, …, p_j.]
Computing w(i,j,q) for ORSF (cont.). Find the highest and lowest q points using a q-range-minima data structure: O(q) time per query, with O(n log n log q) preprocessing time [Chen and Wang '09]. Then compute w(i,j,q) in O(q) time.
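The observation about the highest and lowest q points can be turned into a small routine. This sketch makes several assumptions that are not in the slides: the name `w_orsf` is hypothetical, plain sorting stands in for the q-range-minima structure, "q outliers" is read as "at most q", and every split into a highest and b lowest points is tried, which costs more than the O(q) quoted in the talk.

```python
import math


def w_orsf(ys, eps, q):
    """Smallest outlier error of one horizontal step over ys, at most q outliers.

    Only the a highest and b lowest points (a + b <= q) are tried as
    outliers. Returns math.inf when no such split keeps the rest within eps.
    """
    s = sorted(ys)
    n = len(s)
    best = math.inf
    for a in range(q + 1):          # a highest points are outliers
        for b in range(q - a + 1):  # b lowest points are outliers
            if a + b >= n:
                continue            # assumption: keep at least one point
            kept_min, kept_max = s[b], s[n - a - 1]
            lo, hi = kept_max - eps, kept_min + eps  # feasible step levels
            if lo > hi:
                continue
            outliers = s[:b] + s[n - a:]
            if not outliers:
                return 0.0          # all points fit, no outlier needed
            # The farthest outlier is the lowest or the highest one, so the
            # best level is their midpoint clamped to the feasible range.
            y = min(max((outliers[0] + outliers[-1]) / 2, lo), hi)
            best = min(best, max(outliers[-1] - y, y - outliers[0]))
    return best
```

For the y-values 0, 1, 2, 5 with ε = 1.5 and q = 1, the only feasible split removes the highest point, and the step is pushed up to the top of its feasible range.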
Computing w(i,j,q) for ORWSF. Observation: a point p_t can be approximated within error ε by a segment if and only if the y-coordinate of the segment lies in the interval [y_t - ε/u_t, y_t + ε/u_t]; y_t - ε/u_t is the lower end of the interval and y_t + ε/u_t is the upper end. [Figure: the intervals of the points p_i, …, p_j.]
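The interval observation gives a direct feasibility test for a set of weighted points: one horizontal segment fits them all exactly when their intervals share a common level. A minimal sketch, assuming positive weights and using the hypothetical name `feasible_levels`:

```python
def feasible_levels(points, eps):
    """points: iterable of (y_t, u_t) pairs with u_t > 0.

    Returns the range (lo, hi) of segment heights that approximate every
    point within weighted error eps, or None if the intervals
    [y_t - eps/u_t, y_t + eps/u_t] have no common point.
    """
    pts = list(points)
    lo = max(y - eps / u for y, u in pts)  # highest lower end
    hi = min(y + eps / u for y, u in pts)  # lowest upper end
    return (lo, hi) if lo <= hi else None
```

Only the highest lower end and the lowest upper end matter for the test, which is why the later slides track points by those two quantities.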
Computing w(i,j,q) for ORWSF (cont.). Observation: only need to consider the points with the q highest lower ends and the q lowest upper ends. [Figure: the case q = 3, with the outliers marked among p_i, …, p_j.]
Computing w(i,j,q) for ORWSF (cont.). A difficulty: determining the optimal segment in the strip that minimizes the outlier error. Model it as finding the lowest point in the common intersection of a set of upper half-planes. A naïve approach takes O(q^2) time. O(q log q) time: model it as updating the upper envelope for an offline sequence of half-plane insertions and deletions. [Figure: the common intersection and its lowest point.]
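The half-plane model can be made concrete with a brute-force sketch. It is neither the naïve O(q^2) approach nor the O(q log q) method mentioned on the slide: it simply evaluates the upper envelope at every candidate breakpoint, under the assumption that each outlier (y_k, u_k) contributes the two lines z = u_k (y - y_k) and z = -u_k (y - y_k), and that the strip is a given range [lo, hi] of segment heights. The function name is hypothetical, and the outlier list must be non-empty.

```python
def min_weighted_outlier_error(outliers, lo, hi):
    """Minimize max_k u_k * |y - y_k| over y in [lo, hi].

    outliers: non-empty list of (y_k, u_k) pairs. Returns (best_y, error).
    """
    # Each line (slope, intercept) bounds an upper half-plane z >= s*y + c.
    lines = []
    for yk, uk in outliers:
        lines.append((uk, -uk * yk))
        lines.append((-uk, uk * yk))

    def envelope(y):
        # Height of the common intersection's lower boundary at y.
        return max(s * y + c for s, c in lines)

    # The envelope is convex and piecewise linear, so its minimum over the
    # strip is at a strip end or at a crossing of two lines.
    candidates = [lo, hi]
    for idx, (s1, c1) in enumerate(lines):
        for s2, c2 in lines[idx + 1:]:
            if s1 != s2:
                y = (c2 - c1) / (s1 - s2)
                if lo <= y <= hi:
                    candidates.append(y)
    best_y = min(candidates, key=envelope)
    return best_y, envelope(best_y)
```

For outliers (5, weight 1) and (0, weight 2) with strip [1, 3], the two weighted errors balance where 5 - y = 2y, that is at y = 5/3.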
Computing w(i,j,q) for ORWSF (cont.). Compute the q points with the highest lower ends and the q points with the lowest upper ends: O(q) time. Determine the value of w(i,j,q): O(q log q) time.
Computing w(i,j,q) for ORPF/ORWPF. Total time: O((n q^1.5)^(1+δ)). 3-D segment dragging queries: given a convex polyhedron G in 3-D, each query is specified by a line segment e outside G and a direction perpendicular to e; find the first point (if any) on G that is hit by e when e is moved along the direction. Our result: with O(n) preprocessing, each query is answered in O(log n) time. [Figure: the segment e and the polyhedron G.]
Thank you