1 DATA STRUCTURES USED IN SPATIAL DATA MINING
2 What is spatial data?
Spatial data can broadly be defined as data that covers multidimensional points, lines, rectangles, polygons, cubes and other geometric objects. A spatial object occupies a certain amount of space, called its spatial extent, which is characterized by its location and boundary.
Uses:
– Geographic information systems (GIS)
– CAD/CAM
– Multimedia applications: content-based image retrieval, fingerprint matching, MRI (digitized medical images)
3 Features of spatial data
The specific features of spatial data are rich data types, implicit spatial relationships among the variables, observations that are not independent, and spatial autocorrelation among the features.
Spatial data has two distinct types of attributes: spatial and non-spatial. Spatial attributes define the location and extent of spatial objects.
4 Types of spatial data
Region data: Region data has a spatial extent with a location and a boundary. In a database, region data is typically a geometric approximation of an actual data object.
Point data: Point data consists of a collection of points in a multidimensional space. A point does not cover any area of space.
5 What is spatial data mining?
Spatial data mining is defined as the non-trivial search for interesting and unexpected spatial patterns in spatial databases.
It is needed to gain a new understanding of the geographic processes behind critical questions such as: How healthy is planet Earth? What are the effects of human activity on the environment and ecology?
6 Spatial data in GIS
A geographic information system (GIS) is any system for capturing, storing, analyzing and managing data and associated attributes that are spatially referenced to Earth.
There are two broad methods for storing data in a GIS: raster and vector. Geographical features are often expressed as vectors by treating them as geometrical shapes such as points, chains and polygons.
7 Spatial data structures used in GIS
To handle spatial data efficiently, as required in computer-aided design and geo-data applications, a database system needs an index mechanism that helps it retrieve data items quickly according to their spatial locations. The structures covered here are:
– Quadtree
– k-d tree
– R-tree
– R+-tree
– R*-tree
8 Quadtrees
A quadtree indexes two-dimensional space.
– Each node of a quadtree is associated with a rectangular region of space.
– The top node is associated with the entire target space.
– Each internal node splits its region into four disjoint subspaces along the axes.
– Each of these subspaces is split recursively until it contains at most one object.
9 Division of space by quadtree
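The recursive splitting described for quadtrees can be sketched in a few lines of Python. This is a minimal illustration rather than a production index: it assumes point objects only, half-open regions, and a hypothetical class name `QuadNode` that does not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    # Region covered by this node: [x0, x1) x [y0, y1) (half-open, an assumption).
    x0: float
    y0: float
    x1: float
    y1: float
    point: Optional[Point] = None                 # payload of a leaf
    children: List["QuadNode"] = field(default_factory=list)

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def insert(self, p: Point) -> bool:
        """Insert p; return False if p lies outside this node's region."""
        if not self.contains(p):
            return False
        if not self.children:                     # this node is a leaf
            if self.point is None:
                self.point = p
                return True
            if self.point == p:                   # ignore exact duplicates
                return True
            self._split()                         # leaf is full: subdivide
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        """Split the region into four disjoint quadrants and push the point down."""
        mx = (self.x0 + self.x1) / 2
        my = (self.y0 + self.y1) / 2
        self.children = [
            QuadNode(self.x0, self.y0, mx, my),
            QuadNode(mx, self.y0, self.x1, my),
            QuadNode(self.x0, my, mx, self.y1),
            QuadNode(mx, my, self.x1, self.y1),
        ]
        old, self.point = self.point, None
        for child in self.children:
            if child.insert(old):
                break

    def leaves(self) -> List["QuadNode"]:
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


root = QuadNode(0, 0, 8, 8)
for p in [(1, 1), (6, 6), (1, 3)]:
    root.insert(p)
```

After these three insertions the root has been split once and its lower-left quadrant once more, so every leaf holds at most one point.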
10 k-d Trees
A k-d tree partitions the space into two subspaces according to one of the coordinates of the splitting point.
Let level(nod) be the length of the path from the root to the node nod, and suppose the axes are numbered from 0 to k − 1. At every node on level level(nod), the space is split according to coordinate number (level(nod) mod k).
The partitioning is therefore done along one dimension at the top level of the tree, along another dimension at the next level, and so on, cycling through the dimensions.
11 Division of space by a k-d tree
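The rule "split on coordinate (level mod k)" can be illustrated with a short Python sketch. The slides do not say how the splitting point is chosen; taking the median along the current axis is an assumption made here, and the names `KDNode`, `build_kdtree` and `contains` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[KDNode]" = None,
                 right: "Optional[KDNode]" = None) -> None:
        self.point = point    # splitting point stored at this node
        self.axis = axis      # coordinate used for the split: level mod k
        self.left = left
        self.right = right


def build_kdtree(points: List[Point], level: int = 0) -> Optional[KDNode]:
    """Build a k-d tree, cycling through the axes level by level."""
    if not points:
        return None
    k = len(points[0])
    axis = level % k
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2       # median split (assumption, keeps the tree balanced)
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], level + 1),
                  build_kdtree(pts[mid + 1:], level + 1))


def contains(node: Optional[KDNode], target: Point) -> bool:
    """Exact-match search; ties on the split coordinate are checked on both sides."""
    if node is None:
        return False
    if node.point == target:
        return True
    a = node.axis
    if target[a] < node.point[a]:
        return contains(node.left, target)
    if target[a] > node.point[a]:
        return contains(node.right, target)
    return contains(node.left, target) or contains(node.right, target)


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

With k = 2 the root splits on x, its children on y, and the grandchildren on x again.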
12 R-Trees
An R-tree is a balanced tree structure in which the index records are stored in the leaf nodes. The structure is completely dynamic, with no need for periodic restructuring.
Let M be the maximum number of entries in one node, and let m ≤ M/2 be the minimum number of entries allowed in any node except the root.
13 R-Trees (continued)
– Every non-leaf node has between m and M children unless it is the root.
– The root node has at least two children unless it is a leaf.
– For each index record (I, tuple-id) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object.
– For each entry (I, child-ptr) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
14 Division of space by R-trees
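The R-tree node properties can be made concrete with a small checker. The sketch below is two-dimensional and assumes rectangles are stored as (xmin, ymin, xmax, ymax) tuples; `RNode`, `mbr` and `check_node` are hypothetical names, and no insertion or node-splitting algorithm is shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    """Smallest rectangle that spatially contains all given rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


@dataclass
class RNode:
    is_leaf: bool
    # Leaf entries are (I, tuple_id); non-leaf entries are (I, child RNode).
    entries: list


def check_node(node: RNode, m: int, M: int, is_root: bool = False) -> bool:
    """Verify the entry-count and bounding-rectangle properties below `node`."""
    n = len(node.entries)
    if is_root:
        # The root has at least two children unless it is a leaf.
        if n > M or (not node.is_leaf and n < 2):
            return False
    elif not (m <= n <= M):
        return False
    if node.is_leaf:
        return True
    for rect, child in node.entries:
        if not check_node(child, m, M):
            return False
        # I must be the smallest rectangle containing the child's rectangles.
        if rect != mbr([r for r, _ in child.entries]):
            return False
    return True


leaf1 = RNode(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 4), "b")])
leaf2 = RNode(True, [((5, 5, 6, 6), "c"), ((7, 1, 8, 2), "d")])
root = RNode(False, [((0, 0, 3, 4), leaf1), ((5, 1, 8, 6), leaf2)])
ok = check_node(root, m=2, M=4, is_root=True)
```

Here `ok` is true because both leaves hold between m and M entries and each rectangle in the root is exactly the bounding rectangle of its child.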
15 R+-tree
The R+-tree is an extension of the R-tree in which the bounding rectangles of the nodes at one level do not overlap. This reduces the number of branches that must be searched, and therefore the search time, at the cost of increased space consumption. To make this possible, data objects may be split so that different parts of one object are stored in several nodes of the same tree level.
16 R+-tree (continued)
– The root has at least two children unless it is a leaf.
– All leaves are at the same level.
– There is no constraint on the minimum number of entries in a node.
17 Division of space by R+-tree
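The idea of splitting one object across several non-overlapping nodes can be sketched as rectangle clipping. This is a simplified illustration only: it assumes the disjoint node regions are already given, uses (xmin, ymin, xmax, ymax) tuples, and the function names `intersect` and `assign_to_regions` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping part of a and b, or None if they share no area."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def assign_to_regions(obj: Rect, regions: List[Rect]) -> Dict[int, Rect]:
    """Map each region index to the piece of `obj` that falls inside that region."""
    pieces = {}
    for i, region in enumerate(regions):
        piece = intersect(obj, region)
        if piece is not None:
            pieces[i] = piece
    return pieces


# Two disjoint node regions that share only the boundary x = 5.
regions = [(0, 0, 5, 10), (5, 0, 10, 10)]
pieces = assign_to_regions((3, 2, 7, 4), regions)
```

The object straddles the boundary, so it is stored in both nodes, one piece per region. This is where the extra space consumption comes from, while a search for any location only ever has to follow one region per level.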
18 R*-tree
The R*-tree is a modification of the R-tree. The R-tree tries to minimize the area of all nodes of the tree, whereas the R*-tree combines several criteria:
– the area covered by a bounding rectangle;
– the margin of a rectangle: minimizing the margin of a bounding rectangle favors square-like rectangles;
– the overlap between rectangles: minimizing the overlap between rectangles decreases the number of paths that must be searched.
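The three R*-tree criteria are simple geometric quantities. The sketch below computes them for two-dimensional rectangles given as (xmin, ymin, xmax, ymax) tuples; the function names are hypothetical, the margin is taken to be the perimeter, and how an R*-tree weighs the criteria against each other during insertion is not shown.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    """Perimeter of the rectangle (sum of its edge lengths)."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    """Area shared by a and b; 0 if they do not intersect."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


square = (0, 0, 4, 4)
strip = (0, 0, 16, 1)
# Same area, but the square has the smaller margin.
print(area(square), margin(square))   # 16 16
print(area(strip), margin(strip))     # 16 34
print(overlap(square, (2, 2, 6, 6)))  # 4
```

The square and the strip cover the same area, yet the strip has more than twice the margin, which is why minimizing the margin pushes bounding rectangles toward squares.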
19 Conclusion
New techniques are needed for spatial data mining (SDM) because of spatial autocorrelation and the continuity of space. The indexing structures discussed above are useful for spatial data represented in a vector space. For metric spaces, the M-tree, vp-tree and mvp-tree are used. The main aim of all these indexing structures is to minimize disk accesses.