1
Horizontal data sets: the number of attributes ranges from the same order as the number of records to several orders of magnitude higher. Example: genetic data sets can have 10,000 attributes and only 100 records. With 10,000 attributes there are about 50 million combinations of two attributes and about 167 billion three-attribute sets (10,000² = 10^8 and 10,000³ = 10^12 are simple upper bounds)!
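The number of distinct k-attribute combinations is the binomial coefficient C(n, k). A quick check with Python's standard `math.comb` (Python 3.8+) shows how fast it grows for n = 10,000:

```python
from math import comb

n = 10_000             # number of attributes
pairs = comb(n, 2)     # unordered two-attribute combinations
triples = comb(n, 3)   # unordered three-attribute sets

print(f"{pairs:,}")    # 49,995,000
print(f"{triples:,}")  # 166,616,670,000
print(n**2, n**3)      # 100000000 1000000000000  (simple upper bounds)
```

Even before counting rules, the candidate attribute sets outnumber the 100 records by many orders of magnitude.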
2
Data-driven algorithm: constructing the max-conf kernel for small data sets
Input: (i) a database DB; (ii) a fixed consequent C
Output: a set R of rules such that for any rule of the form X->C there exists a rule X'->C in R, where X' is a superset of X and X'->C has a confidence at least as high as that of X->C
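To make the output condition concrete, here is a minimal sketch that checks whether a candidate kernel R covers a given rule X->C. It assumes records and antecedents are Python frozensets of descriptor strings and takes conf(X->C) to be the fraction of records containing X that also contain C; the helper names `confidence` and `covered` are hypothetical. The comparison uses >= (at least as high), so a rule whose antecedent is itself in R is covered.

```python
from typing import FrozenSet, Iterable, List


def confidence(db: List[FrozenSet[str]], antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Fraction of records containing `antecedent` that also contain `consequent`."""
    matching = [r for r in db if antecedent <= r]
    if not matching:
        return 0.0
    return sum(1 for r in matching if consequent <= r) / len(matching)


def covered(db: List[FrozenSet[str]], kernel: Iterable[FrozenSet[str]],
            x: FrozenSet[str], c: FrozenSet[str]) -> bool:
    """True if some X' in the kernel has X' >= X (superset) and conf(X'->C) >= conf(X->C)."""
    target = confidence(db, x, c)
    return any(x <= xp and confidence(db, xp, c) >= target for xp in kernel)
```

For example, with hypothetical records {a,b,c}, {a,b,c}, {a,c}, {a,b}, the rule {b}->{c} has confidence 2/3 and is covered by a kernel containing {a,b}, whose rule {a,b}->{c} also has confidence 2/3.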
3
Algorithm:
  // DB(C) is the set of records that satisfy the consequent C, numbered 1..size(DB(C))
  // RS is a working set that holds the current subset of records of DB(C)
  // COMMON is the set of descriptors common to all records in RS
  MaxConfKernelSet(DB, C, DB(C), RS, COMMON) {
    i = (position in DB(C) of the last record added to RS) + 1;   // i = 1 when RS is empty
    while (i <= size(DB(C))) do {
      RS = RS \union {ith record in DB(C)};
      if (RS == {ith record in DB(C)}) NEWCOMMON = descriptors in the ith record;
      else NEWCOMMON = COMMON with the descriptors not shared by the ith record deleted;
      if ((NEWCOMMON - C) != null) {
        compute the support of the records satisfying NEWCOMMON - C;
        compute the confidence of NEWCOMMON - C -> C;
        if (sufficient support and not a duplicate)
          output "NEWCOMMON - C -> C [support, conf]";
        MaxConfKernelSet(DB, C, DB(C), RS, NEWCOMMON);
      }
      RS = RS - {ith record in DB(C)};
      i++;
    }
  }
  Invoke: MaxConfKernelSet(DB, C, DB(C), null, null);   // RS and COMMON are empty initially
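A runnable Python sketch of the MaxConfKernelSet idea, under stated assumptions: records are frozensets of descriptor strings, C is a set of descriptors, support is the fraction of all records in DB that contain the antecedent, `min_support` is a hypothetical threshold parameter, and "not duplicate" is implemented with a set of antecedents already output. Each recursion level starts from its parent's COMMON and from the record after the last one added to RS, so every nonempty subset RS of DB(C) is visited exactly once. The running time is exponential in |DB(C)|, which fits the slide's restriction to small data sets.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Rule = Tuple[FrozenSet[str], float, float]  # (antecedent X, support, confidence of X -> C)


def max_conf_kernel_set(db: List[FrozenSet[str]], consequent: Set[str],
                        min_support: float) -> List[Rule]:
    c = frozenset(consequent)
    db_c = [r for r in db if c <= r]   # DB(C): records that satisfy the consequent
    seen: Set[FrozenSet[str]] = set()  # antecedents already output ("not duplicate")
    rules: List[Rule] = []

    def support_and_confidence(x: FrozenSet[str]) -> Tuple[float, float]:
        matching = [r for r in db if x <= r]  # nonempty: x is drawn from a record of DB
        both = sum(1 for r in matching if c <= r)
        return len(matching) / len(db), both / len(matching)

    def expand(start: int, common: Optional[FrozenSet[str]]) -> None:
        # RS is implicit: the records chosen along the current recursion path.
        # `start` is the position after the last record added to RS (0 if RS is empty).
        for i in range(start, len(db_c)):
            # COMMON for RS + {record i}: its descriptors if RS is empty, else the intersection
            new_common = db_c[i] if common is None else common & db_c[i]
            x = new_common - c
            if x:  # (COMMON - C) != null
                sup, conf = support_and_confidence(x)
                if sup >= min_support and x not in seen:
                    seen.add(x)
                    rules.append((x, sup, conf))
                expand(i + 1, new_common)

    expand(0, None)
    return rules
```

With hypothetical records {a,b,c}, {a,b,c}, {a,c}, {a,b} and C = {c}, the kernel consists of {a,b}->{c} (support 0.75, confidence 2/3) and {a}->{c} (support 1.0, confidence 0.75).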
4
OLAP and statistical databases
Statistical databases (from the early 80s): multidimensional data sets concerned with summarization over the dimensions of the data set; 2-D representations of census, socioeconomic data, etc.
OLAP (online analytical processing): mid 90s
5
Multi-dimensional Statistical Table
6
2-D representation of statistical data
7
A graph model for statistical data
8
A scheme for statistical data
9
More schemes
11
Relational representation of statistical objects
12
Automatic aggregation concept
13
Terms in SDB and OLAP
14
SDB and OLAP operators
15
Completeness of statistical algebra
16
Overlapping and time-varying categories
17
Physical organization
18
Encoding column category values
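A minimal sketch of one common way to encode category values: each distinct value in a column gets a small integer code, so the codes can serve directly as array subscripts. The first-appearance ordering and the name `encode_column` are assumptions for illustration; a real scheme might instead order codes by a predefined category list.

```python
from typing import Dict, List, Sequence, Tuple


def encode_column(values: Sequence[str]) -> Tuple[Dict[str, int], List[int]]:
    """Assign each distinct category value an integer code in order of first appearance."""
    codes: Dict[str, int] = {}
    encoded: List[int] = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused code
        encoded.append(codes[v])
    return codes, encoded


print(encode_column(["black", "white", "black", "red"]))
# ({'black': 0, 'white': 1, 'red': 2}, [0, 1, 0, 2])
```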
19
Array linearization
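Once category values are integer codes, a multidimensional cell can be stored in a one-dimensional array. The sketch below assumes row-major order (last dimension varies fastest), which may differ from the order used on the slide; `linearize` and `delinearize` are illustrative names.

```python
from typing import Sequence, Tuple


def linearize(index: Sequence[int], shape: Sequence[int]) -> int:
    """Row-major offset of a multidimensional index (last dimension varies fastest)."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for a dimension of size {n}")
        offset = offset * n + i  # Horner-style accumulation of the mixed-radix number
    return offset


def delinearize(offset: int, shape: Sequence[int]) -> Tuple[int, ...]:
    """Inverse of linearize: recover the index from a row-major offset."""
    index = []
    for n in reversed(shape):
        offset, i = divmod(offset, n)
        index.append(i)
    return tuple(reversed(index))


print(linearize((1, 2, 3), (2, 3, 4)))  # 23
print(delinearize(23, (2, 3, 4)))       # (1, 2, 3)
```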
20
Header compression
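A simplified sketch in the spirit of header compression for sparse statistical arrays: null cells of the linearized array are dropped, and a small header of run boundaries lets a logical position be mapped to a physical one by binary search. The (logical start, physical start) header format and the names `compress` and `lookup` are assumptions, not necessarily the layout the slide describes.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Header = List[Tuple[int, int]]  # (logical start, physical start) of each run of non-null cells


def compress(cells: Sequence[Optional[float]]) -> Tuple[Header, List[float]]:
    """Keep only non-null cells and record where each run of them starts."""
    header: Header = []
    data: List[float] = []
    prev_null = True
    for pos, v in enumerate(cells):
        if v is not None:
            if prev_null:  # a new run of non-null cells begins here
                header.append((pos, len(data)))
            data.append(v)
        prev_null = v is None
    return header, data


def lookup(header: Header, data: List[float], pos: int) -> Optional[float]:
    """Map a logical (linearized) position to its value, or None for a suppressed null."""
    k = bisect_right([start for start, _ in header], pos) - 1  # last run starting at or before pos
    if k < 0:
        return None
    start, phys = header[k]
    run_end = header[k + 1][1] if k + 1 < len(header) else len(data)  # physical end of run k
    idx = phys + (pos - start)
    return data[idx] if idx < run_end else None
```

For cells [None, 1.0, 2.0, None, None, 3.0, None], the header is [(1, 0), (5, 2)] and only three values are stored.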
21
Lattice of materialization
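The group-bys of a data cube form a lattice: a group-by can be computed from any group-by that keeps a superset of its dimensions. A minimal sketch, assuming a group-by is identified by the set of dimensions it keeps; `cube_lattice` and `answerable_from` are hypothetical helper names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

View = FrozenSet[str]  # a group-by, identified by the dimensions it keeps


def cube_lattice(dims: Sequence[str]) -> Dict[View, List[View]]:
    """Map each group-by to the group-bys obtained by dropping one more dimension."""
    lattice: Dict[View, List[View]] = {}
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            view = frozenset(kept)
            lattice[view] = [view - {d} for d in kept]
    return lattice


def answerable_from(query: View, materialized: List[View]) -> List[View]:
    """Materialized group-bys that can answer `query`: those keeping all of its dimensions."""
    return [m for m in materialized if query <= m]


lattice = cube_lattice(["model", "year", "color"])
print(len(lattice))  # 8 group-bys, from (model, year, color) down to the grand total
```

Choosing which nodes of this lattice to materialize trades storage for query time: a query on (model) can be answered from a materialized (model, year) but not from (year, color).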
22
Partitioning of a data cube into subcubes
23
Cube operator
24
Data Cube – shortcomings of SQL
25
Sales roll-up by model, by year, and by color
26
Using ALL value
27
3-dimensional rollup in SQL
28
Cross-tabulation in SQL
29
Cross-tabulation
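A cross-tab is the 2-D special case of the cube: one dimension on the rows, one on the columns, with an ALL row and column holding the totals. A minimal sketch; the rows, sales figures, and the `cross_tab` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

ALL = "ALL"  # label for the totals row and column


def cross_tab(rows: List[dict], row_dim: str, col_dim: str,
              measure: str) -> Dict[object, Dict[object, float]]:
    """Pivot SUM(measure) into a row_dim x col_dim grid with ALL totals."""
    table: Dict[object, Dict[object, float]] = defaultdict(lambda: defaultdict(float))
    for r in rows:
        # each row adds to its cell, its row total, its column total, and the grand total
        for rk in (r[row_dim], ALL):
            for ck in (r[col_dim], ALL):
                table[rk][ck] += r[measure]
    return {rk: dict(cols) for rk, cols in table.items()}


# Hypothetical sales rows, used only for illustration.
SALES = [
    {"model": "Chevy", "year": 1994, "color": "black", "sales": 50},
    {"model": "Chevy", "year": 1995, "color": "white", "sales": 40},
    {"model": "Ford", "year": 1994, "color": "black", "sales": 10},
]

table = cross_tab(SALES, "model", "year", "sales")
print(table["Chevy"][ALL])  # 90.0
```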
30
CUBE operator
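A minimal Python sketch of CUBE semantics as in Gray et al.'s data cube proposal: for N grouping columns it unions the 2^N GROUP BYs and puts the value ALL in every column that was aggregated away. The Model/Year/Color dimensions echo the sales roll-up slide title; the rows and sales figures are hypothetical, and `cube` is an illustrative name, not an SQL API. The sketch assumes no real data value equals the string "ALL".

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

ALL = "ALL"  # marker for a dimension that has been aggregated away


def cube(rows: List[dict], dims: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """SUM(measure) for every subset of `dims` (2**len(dims) group-bys)."""
    result: Dict[Tuple, float] = defaultdict(float)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):  # one GROUP BY per subset of dimensions
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sales rows, used only for illustration.
SALES = [
    {"model": "Chevy", "year": 1994, "color": "black", "sales": 50},
    {"model": "Chevy", "year": 1995, "color": "white", "sales": 40},
    {"model": "Ford", "year": 1994, "color": "black", "sales": 10},
]

totals = cube(SALES, ["model", "year", "color"], "sales")
print(totals[(ALL, ALL, ALL)])      # 100.0
print(totals[("Chevy", ALL, ALL)])  # 90.0
```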
31
Support of histograms
32
A 3D data cube
33
ALL value and decoration field
34
Decorations
35
ROLLUP operator
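ROLLUP computes only the prefixes of the grouping list: (model, year, color), (model, year), (model), and the grand total, i.e. N + 1 group-bys instead of the 2^N produced by CUBE. A minimal sketch with hypothetical rows; `rollup` is an illustrative name. Standard SQL later adopted GROUP BY ROLLUP and GROUP BY CUBE, representing ALL as NULL and distinguishing it with GROUPING().

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

ALL = "ALL"  # marker for a dimension that has been aggregated away


def rollup(rows: List[dict], dims: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """SUM(measure) grouped by each prefix of `dims`, from all of them down to none."""
    result: Dict[Tuple, float] = defaultdict(float)
    for k in range(len(dims), -1, -1):  # keep the first k dimensions
        for row in rows:
            key = tuple(row[d] if i < k else ALL for i, d in enumerate(dims))
            result[key] += row[measure]
    return dict(result)


# Hypothetical sales rows, used only for illustration.
SALES = [
    {"model": "Chevy", "year": 1994, "color": "black", "sales": 50},
    {"model": "Chevy", "year": 1995, "color": "white", "sales": 40},
    {"model": "Ford", "year": 1994, "color": "black", "sales": 10},
]

totals = rollup(SALES, ["model", "year", "color"], "sales")
print(totals[("Chevy", 1994, ALL)])  # 50.0
```

Unlike the cube, there is no (ALL, 1994, ALL) entry: totals per year across all models are not a prefix of (model, year, color).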
36
Percentage of total as an aggregate function
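Percent-of-total needs two aggregation levels at once, each group's sum and the grand total, which is why it does not fit the one-level GROUP BY aggregate model. A sketch in Python with hypothetical per-model totals; in SQL engines that support window functions, the same figure is commonly written as 100.0 * SUM(sales) / SUM(SUM(sales)) OVER () in a grouped query.

```python
from typing import Dict


def percent_of_total(values: Dict[str, float]) -> Dict[str, float]:
    """Express each group's aggregate as a percentage of the sum over all groups."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("percent of total is undefined when the total is zero")
    return {k: 100.0 * v / total for k, v in values.items()}


print(percent_of_total({"Chevy": 90, "Ford": 10}))  # {'Chevy': 90.0, 'Ford': 10.0}
```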
37
Indices
38
STAR scheme