1 Modeling and Language Support for the Management of PBMS
Manolis Terrovitis, Panos Vassiliadis, Spiros Skiadopoulos, Elisa Bertino, Barbara Catania, Anna Maddalena
2 Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
3 Motivation
Huge amounts of data are produced.
Interesting knowledge has to be detected and extracted.
Knowledge extraction techniques (e.g., data mining) are not sufficient on their own:
they produce huge amounts of results (clusters, association rules, decision trees, etc.);
the results are modeled arbitrarily.
4 Motivation (cont'd)
We need to be able to manipulate the discovered knowledge!
The basic requirements:
A generic and homogeneous model for patterns.
Well-defined query operators.
Efficient storage.
5 Patterns and PBMS [Rizzi et al., ER 2003]
Patterns are compact, semantically rich representations of raw data, e.g., clusters, association rules, decision trees.
Pattern Base Management System (PBMS):
patterns are treated as first-class citizens;
pattern-based queries;
approximate mapping between patterns and raw data.
6 Contributions
We formally define the logical foundations for pattern management.
We present a pattern specification language.
We introduce queries and query operators.
8 PBMS architecture
[Figure: PBMS architecture. Components: Pattern Space (pattern types, pattern classes, patterns); intermediate results; Data Space.]
9 The patterns
Patterns hold information about:
the data source;
the structure of the pattern;
the relation between the structure and the source, expressed as an approximate logical formula.
10 Pattern: Cluster Example
Pid: 337
Structure: [CENTER: [X: 21, Y: 1200], RAD: 12]
Data: EMP: {[Age, Salary]}
Formula: $(t.\mathit{Age} - 21)^2 + (t.\mathit{Salary} - 1200)^2 \le 12^2$, where $t \in \mathit{EMP}$
11 Pattern Type: Example
Name: Disk
Structure schema: [CENTER: [X: real, Y: real], RAD: real]
Data schema: REL: {[X: real, Y: real]}
Formula schema: $(t.X - \mathit{CENTER}.X)^2 + (t.Y - \mathit{CENTER}.Y)^2 \le \mathit{RAD}^2$, where $t \in \mathit{REL}$
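The split between a pattern type (the Disk type) and a pattern (cluster 337 over EMP) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's specification language: the class names `PatternType` and `Pattern`, the `attr_map` field that binds the type's generic X, Y to EMP's Age, Salary, and the `holds_for` method are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping

# Hypothetical encoding of a pattern type; field names are illustrative only.
@dataclass
class PatternType:
    name: str
    structure_schema: Dict[str, str]
    data_schema: Dict[str, str]
    # Formula schema: a predicate over a data tuple t and a pattern structure s.
    formula: Callable[[Mapping[str, float], Mapping], bool]


def disk_formula(t: Mapping[str, float], s: Mapping) -> bool:
    # (t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 <= RAD^2
    cx, cy = s["CENTER"]["X"], s["CENTER"]["Y"]
    return (t["X"] - cx) ** 2 + (t["Y"] - cy) ** 2 <= s["RAD"] ** 2


DISK = PatternType(
    name="Disk",
    structure_schema={"CENTER": "[X: real, Y: real]", "RAD": "real"},
    data_schema={"X": "real", "Y": "real"},
    formula=disk_formula,
)


@dataclass
class Pattern:
    pid: int
    ptype: PatternType
    structure: Mapping
    source: str                # name of the source relation
    attr_map: Dict[str, str]   # data-schema attribute -> source attribute (assumed binding)

    def holds_for(self, t: Mapping[str, float]) -> bool:
        # Rename the source tuple to the type's data schema, then apply the formula.
        renamed = {k: t[v] for k, v in self.attr_map.items()}
        return self.ptype.formula(renamed, self.structure)


p337 = Pattern(
    pid=337,
    ptype=DISK,
    structure={"CENTER": {"X": 21, "Y": 1200}, "RAD": 12},
    source="EMP",
    attr_map={"X": "Age", "Y": "Salary"},
)

print(p337.holds_for({"Age": 25, "Salary": 1205}))  # True: 4^2 + 5^2 = 41 <= 144
print(p337.holds_for({"Age": 21, "Salary": 1300}))  # False: 100^2 > 144
```

Keeping the formula on the type and the concrete values on the pattern mirrors the slides: the type fixes the shape of the formula once, and each pattern only supplies its structure and data source.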
12 The formula
An intensional description of the pattern-data relation.
Pros: efficiency, more intuitive results.
Cons: accuracy (the formula only approximates which data the pattern represents).
13 Intensional vs. Extensional
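The accuracy trade-off can be made concrete with a small Python sketch. It assumes a hypothetical EMP instance and a hypothetical extensional mapping (the exact tuple ids a mining algorithm assigned to pattern 337); neither comes from the slides. The formula of pattern 337 then admits a tuple the cluster does not contain and misses one it does.

```python
# Hypothetical EMP instance: each tuple carries an id plus Age and Salary.
EMP = [
    {"id": 1, "Age": 21, "Salary": 1200},
    {"id": 2, "Age": 25, "Salary": 1205},
    {"id": 3, "Age": 30, "Salary": 1200},
    {"id": 4, "Age": 50, "Salary": 3000},
    {"id": 5, "Age": 35, "Salary": 1200},
]

# Extensional representation: the exact tuple ids assigned to pattern 337
# (assumed output of the mining algorithm, stored explicitly).
EXTENSIONAL_337 = {1, 2, 5}


def in_disk_337(t: dict) -> bool:
    # Intensional representation: the formula of pattern 337.
    return (t["Age"] - 21) ** 2 + (t["Salary"] - 1200) ** 2 <= 12 ** 2


intensional = {t["id"] for t in EMP if in_disk_337(t)}

false_positives = intensional - EXTENSIONAL_337  # satisfy the formula, not in the cluster
false_negatives = EXTENSIONAL_337 - intensional  # in the cluster, outside the formula

print(sorted(intensional))      # [1, 2, 3]
print(sorted(false_positives))  # [3]
print(sorted(false_negatives))  # [5]
```

The formula costs three numbers to store regardless of how many tuples the cluster has, while the extensional set grows with the data; the price is the mismatch shown by the two difference sets.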
14 The formula (cont'd)
The formula is a predicate $f_p(x, y)$, where $x \in \mathit{Source}$ and $y \in \mathit{Structure}$.
Expressiveness: functions and predicates may appear in the formula.
Safety: range restriction; queries employing the formula are n-depth domain independent.
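Range restriction can be illustrated on the cluster example (pattern 337). This is an informal sketch, not the paper's formal definition, and the labels $Q_{\text{unsafe}}$ and $Q_{\text{safe}}$ are introduced here only for illustration. Over a real-valued domain, the formula alone denotes every point of the disk, an infinite set; binding the tuple variable to the source relation keeps the answer a finite subset of EMP.

```latex
\begin{aligned}
Q_{\text{unsafe}} &= \{\, t \mid (t.\mathit{Age} - 21)^2 + (t.\mathit{Salary} - 1200)^2 \le 12^2 \,\} \\
Q_{\text{safe}}   &= \{\, t \mid t \in \mathit{EMP} \wedge (t.\mathit{Age} - 21)^2 + (t.\mathit{Salary} - 1200)^2 \le 12^2 \,\}
\end{aligned}
```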
16 Query Operators
Query operator classes:
Database operators
Pattern base operators
Crossover database operators
Crossover pattern base operators
17 Crossover Operators
[Figure: a pattern (PID, data, structure, formula) in the Pattern Space linked to the Data Space, both exactly and by approximation.]
Exact evaluation: via the intermediate mappings.
Approximate evaluation: via the formula.
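The two evaluation strategies for a crossover operator such as drill-through can be contrasted in a short Python sketch. It assumes that the intermediate mapping is stored as a dictionary from pattern id to a set of tuple ids; the `Disk` class, the function names and the sample data are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Disk(NamedTuple):
    """Hypothetical Disk pattern over EMP(Age, Salary)."""
    pid: int
    cx: float
    cy: float
    rad: float

    def formula(self, t: dict) -> bool:
        # f_p(t, structure): (t.Age - cx)^2 + (t.Salary - cy)^2 <= rad^2
        return (t["Age"] - self.cx) ** 2 + (t["Salary"] - self.cy) ** 2 <= self.rad ** 2


def drill_through_exact(patterns: List[Disk], mapping: Dict[int, Set[int]],
                        rel: List[dict]) -> List[dict]:
    # Exact: follow the stored intermediate mapping (assumed pid -> set of tuple ids).
    ids: Set[int] = set()
    for p in patterns:
        ids |= mapping.get(p.pid, set())
    return [t for t in rel if t["id"] in ids]


def drill_through_approx(patterns: List[Disk], rel: List[dict]) -> List[dict]:
    # Approximate: keep every source tuple that satisfies at least one pattern's formula.
    return [t for t in rel if any(p.formula(t) for p in patterns)]


EMP = [
    {"id": 1, "Age": 21, "Salary": 1200},
    {"id": 2, "Age": 30, "Salary": 1200},
    {"id": 3, "Age": 35, "Salary": 1200},
]
p337 = Disk(337, 21, 1200, 12)
mapping = {337: {1, 3}}  # hypothetical intermediate mapping

exact = [t["id"] for t in drill_through_exact([p337], mapping, EMP)]
approx = [t["id"] for t in drill_through_approx([p337], EMP)]
print(exact)   # [1, 3]
print(approx)  # [1, 2]
```

The exact path needs the mapping to be materialized and maintained; the approximate path needs only the pattern itself, which is why the two answers can differ.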
18 Crossover Operators
Crossover database operators:
Drill-Through: Which data are represented by these patterns?
Data-Covering: Which data from this dataset can be represented by this pattern?
Crossover pattern base operators:
Pattern-Covering: Which of these patterns represent this dataset?
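Data-covering and pattern-covering can be sketched in Python with approximate, formula-based evaluation. The `Disk` class and function names are hypothetical, and the reading of "represents" used in `pattern_covering` (every tuple of the dataset satisfies the pattern's formula) is an assumption, not a definition taken from the slides.

```python
from typing import Iterable, List, NamedTuple


class Disk(NamedTuple):
    """Hypothetical Disk pattern over EMP(Age, Salary)."""
    pid: int
    cx: float
    cy: float
    rad: float

    def formula(self, t: dict) -> bool:
        return (t["Age"] - self.cx) ** 2 + (t["Salary"] - self.cy) ** 2 <= self.rad ** 2


def data_covering(p: Disk, dataset: Iterable[dict]) -> List[dict]:
    # Which data from this dataset can be represented by pattern p?
    return [t for t in dataset if p.formula(t)]


def pattern_covering(patterns: Iterable[Disk], dataset: List[dict]) -> List[Disk]:
    # Which of these patterns represent this dataset?
    # Assumption: p represents the dataset when every tuple satisfies its formula.
    return [p for p in patterns if all(p.formula(t) for t in dataset)]


D = [{"Age": 21, "Salary": 1200}, {"Age": 25, "Salary": 1205}]
patterns = [Disk(337, 21, 1200, 12), Disk(338, 40, 3000, 50)]

print(len(data_covering(patterns[0], D + [{"Age": 60, "Salary": 900}])))  # 2
print([p.pid for p in pattern_covering(patterns, D)])  # [337]
```

Data-covering returns data (it is a database-side operator), while pattern-covering returns patterns (a pattern-base-side operator), matching the two groups on the slide.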
19 Query Example
Drill-through({p | p intersects q}): retrieve the data represented by the patterns that intersect pattern q.
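The example query composes a pattern-base selection with a crossover operator. The Python sketch below assumes that q is itself a Disk pattern, uses the standard disk-intersection test (distance between centers at most the sum of the radii), and evaluates drill-through approximately via the formula. The pattern base, EMP tuples and query disk are hypothetical.

```python
from typing import List, NamedTuple


class Disk(NamedTuple):
    """Hypothetical Disk pattern over EMP(Age, Salary)."""
    pid: int
    cx: float
    cy: float
    rad: float

    def formula(self, t: dict) -> bool:
        return (t["Age"] - self.cx) ** 2 + (t["Salary"] - self.cy) ** 2 <= self.rad ** 2


def intersects(p: Disk, q: Disk) -> bool:
    # Two closed disks intersect iff the distance between their centers
    # does not exceed the sum of their radii (compared squared to avoid sqrt).
    return (p.cx - q.cx) ** 2 + (p.cy - q.cy) ** 2 <= (p.rad + q.rad) ** 2


def drill_through(patterns: List[Disk], rel: List[dict]) -> List[dict]:
    # Approximate evaluation via the formula: tuples represented by any selected pattern.
    return [t for t in rel if any(p.formula(t) for p in patterns)]


pattern_base = [Disk(337, 21, 1200, 12), Disk(338, 40, 1200, 10), Disk(339, 60, 5000, 5)]
EMP = [
    {"id": 1, "Age": 21, "Salary": 1200},
    {"id": 2, "Age": 45, "Salary": 1205},
    {"id": 3, "Age": 60, "Salary": 5000},
]
q = Disk(0, 30, 1200, 5)  # hypothetical query disk

selected = [p for p in pattern_base if intersects(p, q)]  # { p | p intersects q }
result = drill_through(selected, EMP)

print([p.pid for p in selected])  # [337, 338]
print([t["id"] for t in result])  # [1, 2]
```

Tuple 3 lies inside pattern 339, but it is not returned because 339 does not intersect q and is filtered out before the drill-through.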
21 Summary
Formal specification of basic PBMS concepts
Investigation of the representation of the pattern-data relation
Formal definition of query operators
22 Future Work
Query language
Generic similarity measures
Efficient implementation of intermediate mappings
Statistical measures for patterns