Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary.



Why Learn about Field Data-sets? Field data is timely and abundant Sensors (e.g. satellite based ones) provide periodic snapshots of Earth Most up-to-date data about current events (e.g. fires, floods) Field data is useful In creating, revising and evaluating vector data sets Digital archival of fragile historical paper maps To manually get details not captured in vector interpretations Example: Location selection for a facility (e.g. a grocery store) Consider a set of aerial photographs of different locations Vector interpretation includes roads, water bodies, elevation What other information can aerial imagery reveal for construction planning? trees (types and location), buildings, …

What are Field Data-sets? Field data set examples Satellite images, aerial photographs Digitized paper maps Earth Science data-sets, e.g. rainfall, temperature maps Data types of spatial field data sets Images, e.g. satellite-based images, aerial photographs Measurements from geo-registered sensor networks, e.g. weather Video, i.e. time series of images Audio data Focus: Primarily images, though some discussion will apply to other data types

Fields and Rasters: a Sampling of Field Values Definitions Field: a mapping from a spatial domain to a value domain Image: a mapping from a rectangular grid to a value domain A rectangular grid is a collection of cells called pixels A raster is a geo-registered image, i.e. the grid axes have absolute spatial locations Fields are often approximated as rasters Example: Figure 8.1 Identify spatial domain, field, rectangular grid, raster approximation Fields can be approximated as images if relative spatial locations are adequate Figure 8.1

Computing with Field Data Field data is manipulated using operations of map algebra and image algebra An algebra is a mathematical structure consisting of operands and operations Map Algebra Operand: rasters Operations: Can be classified into four groups: Local, Focal, Zonal and Global Image Algebra Operand: images Operations: crop, zoom, rotate

Local Operation A local operation maps a raster into another raster such that the value of a cell in the new raster depends only on the value of that cell in the original raster Examples: unary operation : thresholding binary operation: point wise addition Figure 8.2
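
The two examples named on this slide can be sketched in a few lines of Python. This is only an illustration: rasters are modelled as nested lists, and the names `local_unary`, `local_binary`, `elevation` and `rainfall`, the cell values, and the threshold of 4 are all assumptions made for the example.

```python
def local_unary(raster, fn):
    """Local unary operation: each output cell depends only on the same input cell."""
    return [[fn(value) for value in row] for row in raster]


def local_binary(a, b, fn):
    """Local binary operation: combine two rasters of equal shape cell by cell."""
    return [[fn(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical 3x3 rasters, used only for illustration.
elevation = [[1, 5, 3],
             [7, 2, 8],
             [4, 6, 0]]
rainfall = [[1, 1, 1],
            [2, 2, 2],
            [3, 3, 3]]

# Thresholding: 1 where the cell value is at least 4, otherwise 0.
thresholded = local_unary(elevation, lambda v: 1 if v >= 4 else 0)

# Point-wise addition of two rasters.
summed = local_binary(elevation, rainfall, lambda x, y: x + y)
```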

Focal Operation In a focal operation, the value of a cell in the new raster is dependent on the values of the cell and its neighboring cells in the original raster Examples: unary operations: focal sum, gradient, … Neighborhoods: Rook, Bishop and Queen Figure 8.3
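
A focal sum over the three neighborhoods can be sketched as follows. The sketch assumes that the cell itself is included in the sum and that cells on the border simply use whichever neighbors fall inside the raster; Figure 8.3 may use a different convention. The names `focal_sum` and `grid` are illustrative.

```python
# Neighbor offsets (row delta, column delta) for the three neighborhoods.
ROOK = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # cells sharing an edge
BISHOP = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # cells sharing only a corner
QUEEN = ROOK + BISHOP                            # all eight surrounding cells


def focal_sum(raster, neighborhood):
    """Sum of each cell and its in-bounds neighbors (assumed border handling)."""
    rows, cols = len(raster), len(raster[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = raster[r][c]
            for dr, dc in neighborhood:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    total += raster[nr][nc]
            out[r][c] = total
    return out


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

rook_result = focal_sum(grid, ROOK)
queen_result = focal_sum(grid, QUEEN)
```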

Zonal Operation In a zonal operation, the value of a cell in the new raster is a function of the value of that cell in the original layer and the values of other cells which appear in the same zone specified in another raster Examples: zonal sum, zonal average,... Figure 8.4

Global Operation In a global operation, the value of a cell in the new raster is a function of the location or values of all cells in the original or another raster Example: distance from nearest facility Figure 8.5
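
A zonal sum and a global distance-from-nearest-facility computation can be sketched as below. Zones are given as a second raster of zone labels, and distance is measured in cell units between cell indices. Both are assumptions for illustration, as are the function names and the sample values.

```python
import math


def zonal_sum(values, zones):
    """Each output cell holds the sum of all value cells that share its zone label."""
    totals = {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            totals[zone] = totals.get(zone, 0) + value
    return [[totals[zone] for zone in zone_row] for zone_row in zones]


def distance_to_nearest(rows, cols, facilities):
    """Global operation: each cell gets the Euclidean distance to the closest facility cell."""
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in facilities)
             for c in range(cols)]
            for r in range(rows)]


values = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
zones = [["a", "a", "b"],
         ["a", "b", "b"],
         ["c", "c", "b"]]

zonal = zonal_sum(values, zones)
dist = distance_to_nearest(3, 3, [(0, 0), (2, 2)])
```

The zonal result depends only on cells within the same zone, whereas every cell of the distance raster can depend on facility locations anywhere in the grid.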

Image Operations: Trim Image Operations Ignore the absolute locations of pixels. Come from image processing literature Ex. smoothing, low pass filter, high pass filter Example: A trim operation extracts an axis-aligned subset of the original raster Figure 8.6
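
The trim operation amounts to slicing rows and columns. A minimal sketch, assuming the window is given by its top-left pixel plus a height and width (the parameter names are illustrative):

```python
def trim(image, top, left, height, width):
    """Extract the axis-aligned window starting at (top, left) with the given size."""
    return [row[left:left + width] for row in image[top:top + height]]


# Hypothetical 4x4 image whose pixel values are simply 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

window = trim(image, top=1, left=1, height=2, width=2)
```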

Storage and Retrieval of Raster Data - 1 Traditional Approach Store raster data in a file system Use custom software to retrieve data-items of interest Example: personal photographs stored on MS Windows Q? What attributes can one attach to digital photographs ? Q? Is there an easy way to retrieve all pictures taken in San Francisco? Limitations Rigid schema limited ability to add and manage additional attributes Canned Queries only limited ability to support ad-hoc queries Data quality limited ability to identify duplicates or similar data-items

Storage and Retrieval of Raster Data in a SDBMS A database approach Database tables store raster data items and their attributes (i.e. meta-data), e.g. creation date, geo-location, subject,... Use an SQL-like query language to retrieve desired data-items retrieve all raster data-items overlapping with city of San Francisco (Q1) retrieve latest raster data-item within city of Paris (Q2) retrieve raster data-items similar to a given image (Q3) Pros: table schema definition allows user defined attributes improved ability to pose ad-hoc queries (Ex. Q1, Q2) improved data reliability and quality example: Query Q3 may be used for duplicate reduction

Storage and Retrieval of Raster Data - Challenges Challenges in database based approach Storage: size(raster data item) > size(disk block) Retrieval: raster has rich content, a picture is worth a thousand words! Approaches to storage challenge 1. Delegate storage to DBMS Use Binary Large Object (BLOB) data-type create table my_picture( image: BLOB; creation_date: date; place: point; … ) 2. Do-it-yourself divide a raster data-item into smaller slices Q? Which way of slicing reduces disk I/Os for common queries?

How is Raster Data Stored on Secondary Storage? Figure 8.8 Slicing approaches Linear, e.g. one row per disk block (see Figure 8.8(b)) Tiling - see Figure 8.8(c) Tiling is preferred for queries extracting rectangular sub-images Example - terraserver.com
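
Why tiling helps rectangular queries can be seen by counting the disk blocks a query window touches under each layout. The sketch below assumes a 64 x 64 raster and a block that holds 64 cells, so a block is either one full row (linear) or one 8 x 8 tile. The sizes and function names are made up for the example.

```python
def blocks_linear(top, left, height, width):
    """Linear layout, one raster row per block: the window touches one block per row."""
    return {r for r in range(top, top + height)}


def blocks_tiled(top, left, height, width, tile):
    """Tiled layout: the window touches every tile it overlaps."""
    return {(r // tile, c // tile)
            for r in range(top, top + height)
            for c in range(left, left + width)}


# A 16 x 16 query window whose top-left cell is (10, 10).
linear_io = len(blocks_linear(10, 10, 16, 16))
tiled_io = len(blocks_tiled(10, 10, 16, 16, tile=8))

# The same window aligned to tile boundaries touches even fewer tiles.
aligned_io = len(blocks_tiled(8, 8, 16, 16, tile=8))
```

Under these assumptions the linear layout reads 16 blocks, while the tiled layout reads 9 for the unaligned window and 4 for the aligned one.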

How is Raster Data Queried? Retrieval challenge of rich content 1.Meta-data approach 2.Content based retrieval Meta-data approach Select a set of descriptive attributes simpler SQL data types, e.g. numeric, string, date,... example: source, location, time stamp, subject, resolution,... Store values of descriptive attributes for each raster data-item Allow SQL queries on the descriptive attributes Limitation of meta-data approach Restricts queries to content captured by descriptive attributes Does not support “Similarity” based queries example: Find all raster data-items similar to a given raster data item
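
The meta-data approach can be tried out with Python's built-in `sqlite3` module. This is a sketch under simplifying assumptions: the table name `raster_item`, its columns, and the sample rows are hypothetical, and the location is stored as a plain place-name string rather than a geometry, so the query is an ordinary attribute lookup and not a spatial predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raster_item (
        id      INTEGER PRIMARY KEY,
        source  TEXT,
        place   TEXT,   -- simplification: a place name instead of a geometry
        taken   TEXT,   -- ISO date string, so text order equals date order
        subject TEXT,
        image   BLOB    -- the raster itself, opaque to the query
    )
""")
conn.executemany(
    "INSERT INTO raster_item (source, place, taken, subject, image) VALUES (?, ?, ?, ?, ?)",
    [
        ("satellite", "San Francisco", "2001-05-01", "coastline", b"\x00"),
        ("aerial", "Paris", "2001-07-14", "river", b"\x01"),
        ("aerial", "Paris", "2002-03-02", "park", b"\x02"),
    ],
)

# Ad-hoc query on descriptive attributes: the latest item recorded for Paris.
latest_paris = conn.execute(
    "SELECT id, taken FROM raster_item WHERE place = ? ORDER BY taken DESC LIMIT 1",
    ("Paris",),
).fetchone()
```

The query never inspects the `image` column, which is exactly the limitation noted on the slide: anything not captured in a descriptive attribute cannot be queried.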

Content Based Retrieval (CBR) Examples Q1. Find all raster data-items similar to a given raster data item Q2. Locate a photograph of a river in Minnesota with trees nearby Q3. Find all images of state parks which have a lake within them, are within a radius of one hundred miles from Chicago, and are southwest of Chicago State of the Art Few robust implementations of CBR are available as of 2002 Several research prototypes address similarity query Q1 Result quality is similar to that of web searches (e.g. some of the retrieved raster data-items are useful, while many similar data-items are not retrieved) Usable in application domains such as publishing Our goal is to understand a current approach to similarity queries involving spatial similarities

Content Based Retrieval (CBR) Spatial Similarity Consider a pair of raster images with common objects (e.g. parks, lakes) Spatial similarity between raster images can be defined based on similarity of spatial relationships (e.g. topological, directional) Q? Which pairs exhibit higher similarity? P1: (inside, disjoint) or P2: (inside, covered by) P3: (disjoint, touch) or P4: (disjoint, inside) P5: (north west, north) or P6: (west, east) A graph framework for comparing spatial relationships Nodes = spatial relationships Edges = connect most similar nodes Similarity metric = number of edges on shortest path between 2 nodes See Figures 8.9 and 8.10

Topological Relationship Similarity Figure 8.9 Study Figure 8.9, pp.234 Nodes = topological relationships Edges = most similar Similarity measure = path length Inference from Model P2: (inside, covered by) more similar than P1: (inside, disjoint) Do you agree? Review Figure 2.3 (pp.30)

Direction Relationship Similarity Figure 8.10 Study Figure 8.10, pp.235 Nodes = direction relationships; Edges = most similar Similarity measure = path length Inference: P5 (north-west, north) more similar than P6 (west, east)
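
The path-length measure can be sketched with a breadth-first search. Figure 8.10 is not reproduced in this transcript, so the graph below is an assumption: the eight compass directions arranged in a ring, each connected to its two adjacent directions. The function names are illustrative.

```python
from collections import deque

DIRECTIONS = ["north", "north-east", "east", "south-east",
              "south", "south-west", "west", "north-west"]


def ring_graph(nodes):
    """Assumed similarity graph: each direction is adjacent to its two ring neighbors."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}


def path_length(graph, start, goal):
    """Number of edges on a shortest path between two nodes (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # goal not reachable


graph = ring_graph(DIRECTIONS)
p5 = path_length(graph, "north-west", "north")
p6 = path_length(graph, "west", "east")
```

A shorter path means higher similarity, so under this assumed graph P5 (path length 1) is more similar than P6 (path length 4), matching the inference on the slide. The same `path_length` function applies to a topological-relationship graph once its edges are known.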

Distance Similarity Distance similarity is based on Euclidean distance between the centroids of the objects. Example: Image R is more similar to P than Q in Figure 8.11 (pp.235) Figure 8.11

A Computational Approach to CBR Figure 8.12 Attribute Relation Graph (ARG) Node = objects in a raster Edges = relationships Example: raster of Figure 8.12(a) ARG in Figure 8.12(b) point object O3 rectangles O1, O2 edge (O1, O2) shows that they are disjoint, at 61 degree direction and 5.2 units distant Vector representation of ARG Lists objects and edge properties Example in Figure 8.12

A Computational Approach to CBR Figure 8.13 Steps: 1. Represent each raster data item by its ARG vector 2. Represent the query raster data item by its ARG vector 3. Find most similar raster data-items in the database by comparing ARG vector representations use a distance metric use a multi-dimensional index Comment: Result quality is similar to that of web searches. Some of the retrieved raster data-items are useful
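
The three steps can be sketched as follows. The encoding here is an assumption, not the layout of Figure 8.12: each raster is reduced to the centroids of its objects (same number and order of objects in every raster), and the ARG vector lists, for each object pair, the direction in degrees and the Euclidean distance. A linear scan stands in for the multi-dimensional index, and a plain Euclidean metric mixes degrees and distance units without weighting or angle wrap-around handling.

```python
import math
from itertools import combinations


def arg_vector(centroids):
    """Flatten pairwise edge properties (direction in degrees, distance) into one vector."""
    vector = []
    for (x1, y1), (x2, y2) in combinations(centroids, 2):
        vector.append(math.degrees(math.atan2(y2 - y1, x2 - x1)))
        vector.append(math.hypot(x2 - x1, y2 - y1))
    return vector


def most_similar(query, database):
    """Return the name of the stored raster whose ARG vector is closest to the query's."""
    q = arg_vector(query)
    return min(database, key=lambda name: math.dist(q, arg_vector(database[name])))


# Hypothetical object centroids for two stored rasters and one query raster.
database = {
    "P": [(0, 0), (4, 3), (1, 5)],
    "Q": [(0, 0), (10, 0), (5, 9)],
}
query = [(0, 0), (4, 3.5), (1, 5)]

best = most_similar(query, database)
```

A real system would also encode the topological relationship on each edge and would need a weighting between angle and distance terms; both are left out to keep the sketch short.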

Why are Data Warehouses Interesting? Data warehouses facilitate group decision making Consider a dataset with 1 measure (i.e. Sales) and 3 dimensions (e.g. Company, Year, Region) Analysis questions Q1. Rank Regions by total sales. Q2. Rank years by total sales. Q3. Where are sales consistently growing? Cross-tabulation summary reports are used to analyze the trends Example:

Generating Cross-tabulation Summaries Traditional Approach Use custom software pulling data out of a DBMS Limitations: redundant work, inefficient use of resources Data Warehouse approach Cross-tabs can be generated using a set of simple reports each report is generated from a SQL “Select... group by” statement Example: Fig (pp. 244) and Table 8.3 (pp. 245) cross-tab example in last slide is a union of SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2 Table 8.3 shows SQL queries to compute each part Advantage rest of SQL is available for pre/post processing of data performance gains by eliminating unnecessary copying of data
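
The idea that each report is one "Select ... group by" statement can be tried with Python's built-in `sqlite3` module. The rows in the `sales` table below are made up for illustration and are not the data of Table 8.3. The sketch also does not claim which query corresponds to which named SALES report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (company TEXT, year INTEGER, region TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    # Hypothetical base data.
    ("A", 1994, "America", 20),
    ("B", 1994, "America", 15),
    ("A", 1994, "Europe", 10),
    ("A", 1995, "America", 30),
    ("B", 1995, "Europe", 25),
])

# One report per GROUP BY: totals per (year, region), per year, and overall.
by_year_region = conn.execute(
    "SELECT year, region, SUM(sales) FROM sales "
    "GROUP BY year, region ORDER BY year, region"
).fetchall()
by_year = conn.execute(
    "SELECT year, SUM(sales) FROM sales GROUP BY year ORDER BY year"
).fetchall()
grand_total = conn.execute("SELECT SUM(sales) FROM sales").fetchone()[0]
```

Laying the three results side by side gives the body, the row totals, and the grand total of a cross-tabulation.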

Example Data Warehouse Figure 8.19

Cross-tabulation vs. Report Hierarchy Spreadsheet view of a report Views a report as an N-dim. spreadsheet N = number of dimension attributes Each cell contains value of “measure” Cross-tabulation view of a report hierarchy Example: report hierarchy for SALES-L0-A, SALES-L1-A, SALES-L1-B, SALES-L2, Figure 8.19 (pp.244)

What is a Data Warehouse? Data Warehouse is a special purpose database Primarily used for specialized data analysis purposes Facilitates generation and navigation of a hierarchy of reports Special purpose data-sets and queries Data consists of: a few measure attributes a set of dimension attributes The measure attribute depends on dimension attributes Queries generate reports report measure for selected values of dimensions aggregate measure for given subset of dimensions What is a spatial data warehouse? Data warehouses with spatial measures or dimensions Example: census data - census tract is a spatial dimension Example: logistics data - route is a spatial dimension

Data Warehouse Operations Operations on a data warehouse Roll-up, Drill-down Slice, Dice Pivot Roll-up Inputs: A report R, A subset S of dimensions in R Output: A sequence of reports summarizing R Example 1: R=SALES-Base, S=(Year, Region) in Figure 8.19 (pp. 244) Output consists of reports SALES-L0-A, SALES-L1-B, SALES-L2 Example 2: R=SALES-Base, S=(Region, Year) Output consists of reports SALES-L0-A, SALES-L1-A, SALES-L2 Drill-down Inputs: A report R, A dimension D not in R Output: A report detailing R on D Example: R = SALES-L1-B, D = Region in Figure 8.19 (pp.244) Output : report SALES-L0-A
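
Roll-up can be sketched in plain Python. The interpretation below is an assumption read off the two examples: given an ordered dimension list S, the operation first groups by all of S and then drops the last dimension one step at a time until only the grand total remains. The row data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical base report: one row per (company, year, region).
SALES = [
    {"company": "A", "year": 1994, "region": "America", "sales": 20},
    {"company": "B", "year": 1994, "region": "America", "sales": 15},
    {"company": "A", "year": 1994, "region": "Europe", "sales": 10},
    {"company": "A", "year": 1995, "region": "America", "sales": 30},
    {"company": "B", "year": 1995, "region": "Europe", "sales": 25},
]


def aggregate(rows, dims):
    """Sum the 'sales' measure grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)


def roll_up(rows, dims):
    """Reports grouped by dims, dims[:-1], ..., and finally () for the grand total."""
    return [aggregate(rows, dims[:k]) for k in range(len(dims), -1, -1)]


by_year_first = roll_up(SALES, ("year", "region"))
by_region_first = roll_up(SALES, ("region", "year"))
```

Under this reading, the two orderings share the finest report and the grand total but differ in the middle report (per year versus per region), which mirrors the two examples on the slide. Drill-down moves in the opposite direction, from a coarser report in the sequence to a finer one.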

Data Warehouse Operations Slice, Dice Reduce dimensions in a table (Figure 8.7, pp.232) Inputs: A report R, A value V for a dimension D in R Output: A subset of R where D=V Example: R=SALES-L0-A, D=Year, V=1994 in Figure 8.19 (pp.244) output: Table 8.5 (pp.246) includes tuple (ALL, 1994, America, 35) Figure 8.7
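
A slice is a selection on one dimension value. In the sketch below a report is a dictionary from dimension tuples to the measure. Only the tuple (ALL, 1994, America, 35) comes from the slide; the other rows and the name `slice_report` are illustrative.

```python
DIMS = ("company", "year", "region")


def slice_report(report, dims, dim, value):
    """Keep only the rows of a report where dimension `dim` equals `value`."""
    position = dims.index(dim)
    return {key: measure for key, measure in report.items() if key[position] == value}


report = {
    ("ALL", 1994, "America"): 35,   # tuple mentioned on the slide
    ("ALL", 1994, "Europe"): 10,    # hypothetical
    ("ALL", 1995, "America"): 30,   # hypothetical
}

sliced = slice_report(report, DIMS, "year", 1994)
```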

Data Warehouse Operations Pivot For a spreadsheet view of reports Transposes a spreadsheet Example Inputs: A spreadsheet view of a report R Output: A transposed spreadsheet Example: R= SALES-L0-A, Figure 8.19 (pp.244)

Logical Data Model of a DWH Purpose of a logical data model Specify a framework to specify computational structure Allow extension of SQL to model new needs Cube operation Input : A fact table Output: A set of summary reports covering all subsets of dimension columns equivalent to union of all tables and reports in Figure 8.19 (pp.244) Example: Figure 8.18, pp.243 SELECT Company, Year, Region, Sum(Sales) AS Sales FROM SALES GROUP BY CUBE Company, Year, Region
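
What the cube operation computes can be emulated in plain Python by aggregating over every subset of the dimension columns and writing the placeholder "ALL" for each dimension that has been summed out. The fact rows and the function name `cube` are hypothetical. The sketch also assumes that no real dimension value is the string "ALL".

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table.
FACTS = [
    {"company": "A", "year": 1994, "region": "America", "sales": 20},
    {"company": "B", "year": 1994, "region": "America", "sales": 15},
    {"company": "A", "year": 1994, "region": "Europe", "sales": 10},
    {"company": "A", "year": 1995, "region": "America", "sales": 30},
    {"company": "B", "year": 1995, "region": "Europe", "sales": 25},
]


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; summed-out dimensions show as 'ALL'."""
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                key = tuple(row[d] if d in kept else "ALL" for d in dims)
                result[key] += row[measure]
    return dict(result)


sales_cube = cube(FACTS, ("company", "year", "region"), "sales")
```

With three dimensions the result covers 2^3 = 8 groupings in a single table, from the base rows down to the single ("ALL", "ALL", "ALL") grand total.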

Figure 8.18

Physical Data Model of a DWH Purpose: Computationally efficient implementation Ideas: Pre-computation - pre-compute some of reports and use those to compute other reports New indexing methods, e.g. bit-map index Query Processing Strategies strategies for aggregate functions new strategies for multi-table joins Let us look at strategies for aggregate functions

DWH Physical Model: Aggregate Function Strategies Aggregate Functions Compute summary statistics for a given set of values Examples: sum, average, centroid (Table 8.1, pp.238) Strategies for efficient computation Characterize easy to compute aggregate functions 3 categories: distributive algebraic holistic First 2 categories can be computed easily in one scan of the dataset

Definitions of Aggregate Function Categories Notation: F, G, G1, G2, …, Gn are aggregate functions, where n is small S is a set of values, e.g. S=(1,2,3,4) P=(S1,S2,…,Sp) is a partition of S, e.g. P=(S1,S2), S1=(1,2), S2=(3,4) Distributive(F) if there exists a G such that F(S) = G(F(S1),F(S2),…,F(Sp)) Example: sum is distributive Illustration: sum(1,2,3,4) = sum(sum(1,2),sum(3,4)) Algebraic(F) if there exist G and G1,…,Gn (where n is small) such that F(S) = G(G1(S1),…,Gn(S1),G1(S2),…,Gn(S2),…,G1(Sp),…,Gn(Sp)) Example: average is algebraic Illustration: average(1,2,3,4) = {count(1,2)*average(1,2) + count(3,4)*average(3,4)} / {count(1,2)+count(3,4)}
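
The two definitions can be checked on the slide's own numbers. In this sketch sum is computed distributively (F = G = sum), and average is computed algebraically from two partial aggregates per partition, G1 = sum and G2 = count. The slide's illustration uses count and average as the partials, which is equivalent. The function names are illustrative.

```python
def distributive_sum(partitions):
    """sum(S) equals the sum of the per-partition sums."""
    return sum(sum(part) for part in partitions)


def algebraic_average(partitions):
    """average(S) from per-partition (sum, count) pairs, combined at the end."""
    partials = [(sum(part), len(part)) for part in partitions]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


even_split = [(1, 2), (3, 4)]
uneven_split = [(1,), (2, 3, 4)]

total = distributive_sum(even_split)
mean_even = algebraic_average(even_split)
mean_uneven = algebraic_average(uneven_split)

# Averaging the per-partition averages is not correct in general,
# which is why average is not distributive.
naive = sum(sum(p) / len(p) for p in uneven_split) / len(uneven_split)
```

With the uneven split the naive average of averages gives 2.0 instead of 2.5, while the (sum, count) formulation is correct for any partition.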

Example: Distributive Aggregate Function Figure 8.14 Examples in cross-tabulation scenario (Figure 8.14, pp.238): Example 1: Min is distributive Example 2: Count is distributive

Examples: Algebraic Aggregate Functions Figure 8.15 Examples in cross-tabulation scenario (Figure 8.15, pp.239): Average and Variance are algebraic

Discussion - Spatial Data Warehouse Example Consider the example in Figure 8.16, pp.241 A map interpretation may be attached to each report each row has a spatial footprint, which can be aggregated by geometric-union The collection of maps may be called a mapcube Issues: What is needed in OGIS standard to support map-cube operation? Hierarchical collection of maps in mapcube what is an appropriate cartography to convey the relationship among maps?

Spatial Data Warehouses and Mapcube Figure 8.16

Figure 8.17

Summary Field data Useful in many applications due to rich content Represented as raster or image Operations can be categorized into local, focal, zonal, and global Field data storage and retrieval Tiling is a preferred way to divide raster data into disk blocks Meta-data based query is often used for retrieval Content based retrieval may be used for similarity searches Data warehouses support analysis, e.g. cross-tabulation reports The SQL CUBE operator supports generation of DWH reports Distributive and algebraic aggregate functions can be computed easily