Published by Iris Hodges. Modified over 9 years ago.
1
Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining
Jan Rauch, University of Economics, Prague, Czech Republic
2
Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining
Presented: an idea of a theoretical approach. There are software tools for partial steps.
o Logic of discovery
o Modifications
o 4ft-Discoverer
3
Logic of Discovery
o Can computers formulate and verify scientific hypotheses?
o Can computers analyze empirical data in a rational way and produce a reasonable reflection of the observed empirical world?
o Can it be done using mathematical logic and statistics?
1978
4
Logic of Discovery (simplified)
[Diagram: data matrix M; state dependent structure; theoretical statements and theoretical calculi; observational statements and observational calculi; 1:1; statistical hypothesis tests]
5
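Association rules – observational statements
4ft-table of φ and ψ in data matrix M:
M   | ψ | ¬ψ
φ   | a | b
¬φ  | c | d
The rule φ ≈ ψ is evaluated on this table by the 4ft-quantifier ≈ (for example, a statistical hypothesis test): Val(φ ≈ ψ, M) ∈ {0, 1}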
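The evaluation of a rule on a data matrix can be sketched in a few lines. This is a minimal illustration, not the ASSOC implementation: the names `FourFt`, `four_ft_table` and `val` are hypothetical, φ and ψ are assumed to be given as Boolean columns, and the quantifier used in the example (ad > bc) is only a simple stand-in.

```python
from typing import Callable, NamedTuple, Sequence


class FourFt(NamedTuple):
    a: int  # rows satisfying phi and psi
    b: int  # rows satisfying phi but not psi
    c: int  # rows satisfying psi but not phi
    d: int  # rows satisfying neither


def four_ft_table(phi: Sequence[bool], psi: Sequence[bool]) -> FourFt:
    """Count the four frequencies of phi and psi over the rows of M."""
    if len(phi) != len(psi):
        raise ValueError("phi and psi must cover the same rows")
    a = sum(1 for p, s in zip(phi, psi) if p and s)
    b = sum(1 for p, s in zip(phi, psi) if p and not s)
    c = sum(1 for p, s in zip(phi, psi) if not p and s)
    d = sum(1 for p, s in zip(phi, psi) if not p and not s)
    return FourFt(a, b, c, d)


def val(quantifier: Callable[[int, int, int, int], bool], table: FourFt) -> int:
    """Val(phi ~ psi, M): 1 if the quantifier accepts the 4ft-table, else 0."""
    return 1 if quantifier(*table) else 0


# Stand-in quantifier (an assumption for illustration): a*d > b*c.
table = four_ft_table([True, True, True, False, False],
                      [True, True, False, True, False])
result = val(lambda a, b, c, d: a * d > b * c, table)
```

Keeping the quantifier as a plain function of (a, b, c, d) mirrors the slide: the rule's truth value depends on the data only through the 4ft-table.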
6
GUHA Procedure ASSOC – a tool for finding a set of interesting association rules
4ft-table of φ and ψ in M: a, b, c, d
Val(φ ≈ ψ, M) = 1
φ ≈ ψ is prime: it is true in M and does not logically follow from another, simpler rule
7
Deduction rules in logic of association rules
A deduction rule derives the rule φ' ≈ ψ' from the rule φ ≈ ψ.
Theorem for implicational quantifiers: the deduction rule is correct if and only if (1) or (2) holds:
(1) 1A and 1B are tautologies of propositional calculus
(2) 2 is a tautology
The propositional formulas 1A, 1B, 2 are created from φ, ψ, φ', ψ'.
Theorems for additional 4ft-quantifiers: analogous criteria of correctness.
Applications: prime rules + dealing with knowledge in data mining
8
Data mining – CRISP-DM
http://www.crisp-dm.org/
[Diagram: CRISP-DM cycle; domain knowledge (Beer ↑↑ BMI, wine region, sportsmen, …); analytical report; logical calculus …]
9
Data mining – CRISP-DM
http://www.crisp-dm.org/
[Diagram: CRISP-DM cycle; domain knowledge (Beer ↑↑ BMI, wine region, sportsmen, …); analytical report; logical calculus ??]
10
Modifying Logic of Discovery
[Diagram: Logic of discovery: theoretical statements; logical calculus of associational rules. Logic of association rules mining: logical calculus of associational rules; A1 A2, A3 A4, …; Cons(A1 A2), …; statements on data matrices, evaluation, Cons]
11
Logic of association rules mining (simplified)
Data matrix M:
patient | BMI | Beer | Education | Sex | Status | … | AK
o1      | 17  | 2    | U         | F   | D      | … | 1
o2      | 34  | 4    | B         | M   | M      | … | 6
o3      | 27  | 3    | S         | M   | W      | … | 2
…       | …   | …    | …         | …   | …      | … | …
on      | 28  | 7    | B         | F   | S      | … | 4
o Type of M: number of columns + possible values
o Val(φ ≈ ψ, M), evaluated on the 4ft-table a, b, c, d of φ and ψ in M
o Items of domain knowledge: Beer ↑↑ BMI, …
o Consequences of domain knowledge: Cons(Beer ↑↑ BMI), …
Example rule: Beer(8–10) ⇒0.9,50 BMI(>30) Status(W)
LCAR: Logical Calculus of Association Rules
DK→AR
12
Atomic consequences of Beer ↑↑ BMI (simplified)
Data matrix M with columns patient, BMI, Beer, Education, Sex, Status, …, AK, as on the previous slide.
Beer: 0, 1, 2, …, 15
o Low: intervals with bounds in 0, …, 5
o High: intervals with bounds in 10, …, 15
BMI: 15, 16, 17, …, 35
o Low: intervals with bounds in 15, …, 22
o High: intervals with bounds in 28, …, 35
Cons(Beer ↑↑ BMI):
Beer(low) ⇒0.9,50 BMI(low), for example
o Beer(0–3) ⇒0.9,50 BMI(15–18)
o Beer(2–4) ⇒0.9,50 BMI(17–22)
o …
Beer(high) ⇒0.9,50 BMI(high), for example
o Beer(11–13) ⇒0.9,50 BMI(29–31)
o Beer(14–15) ⇒0.9,50 BMI(30–35)
o …
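The set of atomic consequences on this slide can be enumerated mechanically. The sketch below is an illustration under stated assumptions, not the 4ft-Discoverer code: it assumes that "low" and "high" mean all integer intervals whose bounds lie in the listed ranges (single-value intervals included), and the names `intervals` and `atomic_consequences` are hypothetical.

```python
from itertools import product


def intervals(lo: int, hi: int):
    """All integer intervals (x, y) with lo <= x <= y <= hi."""
    return [(x, y) for x in range(lo, hi + 1) for y in range(x, hi + 1)]


# Ranges taken from the slide; how they generate intervals is an assumption.
BEER_LOW, BEER_HIGH = intervals(0, 5), intervals(10, 15)
BMI_LOW, BMI_HIGH = intervals(15, 22), intervals(28, 35)


def atomic_consequences(p: float = 0.9, base: int = 50):
    """Rules Beer(low) => BMI(low) and Beer(high) => BMI(high) as tuples."""
    rules = []
    for beer_side, bmi_side in ((BEER_LOW, BMI_LOW), (BEER_HIGH, BMI_HIGH)):
        for beer, bmi in product(beer_side, bmi_side):
            rules.append((("Beer", beer), ("BMI", bmi), p, base))
    return rules


cons = atomic_consequences()
```

Under these assumptions the four example rules from the slide, such as Beer(0–3) ⇒0.9,50 BMI(15–18), all appear in `cons`.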
13
4ft-Discoverer
4ftD = ⟨LCAR, DK→AR, 4ft-Miner, 4ft-Filter, 4ft-Synt⟩
Under implementation, based on Cons(Beer ↑↑ BMI) and deduction rules
14
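Applying 4ft-Discoverer
Question: is there new knowledge that does not follow from Beer ↑↑ BMI and is true in the given data M?
o 4ft-Miner: produces particular interesting rules
o 4ft-Filter: uses the consequences of Beer ↑↑ BMI and keeps the rules not following from Beer ↑↑ BMI
o 4ft-Synt: proposes new knowledge (candidate items relating C and D, E and F)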
15
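4ft-Filter
o 4ft-Miner: set of rules φ ⇒p,Base ψ
o Cons(Beer ↑↑ BMI): set of rules Beer(α) ⇒0.9,50 BMI(β)
o For each φ ⇒p,Base ψ: is there a rule Beer(α) ⇒0.9,50 BMI(β) such that the deduction rule from Beer(α) ⇒0.9,50 BMI(β) to φ ⇒p,Base ψ is correct?
o If yes (+): filter out φ ⇒p,Base ψ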
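The filtering step amounts to a simple loop over mined rules and consequences. The sketch below only shows that control flow: `filter_rules` is a hypothetical name, rules are opaque values, and the correctness test for a deduction rule is passed in as a function because the slides do not spell it out. The equality check in the example is a placeholder, not the real criterion.

```python
from typing import Callable, Iterable, List, TypeVar

Rule = TypeVar("Rule")


def filter_rules(mined: Iterable[Rule],
                 consequences: Iterable[Rule],
                 is_correct: Callable[[Rule, Rule], bool]) -> List[Rule]:
    """Keep the mined rules that follow from no consequence.

    is_correct(c, r) should answer whether deducing r from c is a
    correct deduction rule; it is supplied by the caller.
    """
    consequences = list(consequences)
    return [r for r in mined
            if not any(is_correct(c, r) for c in consequences)]


# Placeholder check (assumption): a rule "follows" only from itself.
kept = filter_rules(["r1", "r2", "r3"], ["r2"], lambda c, r: c == r)
```

A real check would apply the criterion of correctness for deduction rules instead of the equality test.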
16
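4ft-Synt
o 4ft-Miner: set of rules φ ⇒p,Base ψ
o Cons(C ↑↑ D): set of rules C(α) ⇒0.9,50 D(β)
o Are there enough pairs of a rule φ ⇒p,Base ψ and a rule C(α) ⇒0.9,50 D(β) such that the deduction rule relating them is correct?
o If yes (+): consider C ↑↑ D as a candidate for new knowledge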
17
Conclusions
http://sewebar.vse.cz/RuleML_demo/final/final.html
o Rich association rules
o Criteria of correctness for deduction rules
o Formal language for domain knowledge: Beer ↑↑ BMI, …
o Atomic consequences: Beer(low) ⇒p,Base BMI(low), …, Beer(α) ⇒p,Base BMI(β)
o Conversion Beer ↑↑ BMI via
o Partially implemented: http://lispminer.vse.cz/, http://sewebar.vse.cz/
18
Thank you
19
Examples of 4ft-quantifiers – statistical hypothesis tests
Lower critical implication ⇒!p;α, for 0 < p ≤ 1, 0 < α < 0.5:
The rule φ ⇒!p;α ψ corresponds to the statistical test (on the level α) of the null hypothesis H0: P(ψ | φ) ≤ p against the alternative one H1: P(ψ | φ) > p. Here P(ψ | φ) is the conditional probability of the validity of ψ under the condition φ.
Fisher’s quantifier ∼α,Base, for 0 < α < 0.5:
The rule φ ∼α,Base ψ corresponds to the statistical test (on the level α) of the null hypothesis of independence of φ and ψ against the alternative one of the positive dependence.
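Both tests can be computed directly from the 4ft-table. The formulas on the slide did not survive extraction, so the sketch below assumes the definitions usual in the GUHA literature: a binomial upper tail for the lower critical implication, and a one-sided Fisher test with the extra conditions ad > bc and a ≥ Base for Fisher's quantifier. The function names are hypothetical.

```python
from math import comb


def lower_critical_implication(a: int, b: int, p: float, alpha: float) -> bool:
    """Assumed definition: sum_{i=a}^{r} C(r,i) p^i (1-p)^(r-i) <= alpha, r = a+b.

    A small tail means H0: P(psi|phi) <= p is rejected at level alpha.
    """
    r = a + b
    tail = sum(comb(r, i) * p**i * (1 - p)**(r - i) for i in range(a, r + 1))
    return tail <= alpha


def fisher_quantifier(a: int, b: int, c: int, d: int,
                      alpha: float, base: int) -> bool:
    """Assumed definition: a*d > b*c, a >= base, and the hypergeometric
    upper tail sum_{i=a}^{min(r,k)} C(k,i) C(n-k,r-i) / C(n,r) <= alpha,
    where r = a+b, k = a+c, n = a+b+c+d."""
    n, r, k = a + b + c + d, a + b, a + c
    tail = sum(comb(k, i) * comb(n - k, r - i)
               for i in range(a, min(r, k) + 1)) / comb(n, r)
    return a * d > b * c and a >= base and tail <= alpha


strong = lower_critical_implication(10, 0, p=0.5, alpha=0.05)
weak = lower_critical_implication(5, 5, p=0.5, alpha=0.05)
```

Only the standard library is used, so the sketch stays exact for the binomial coefficients and is suitable for small tables; a statistics package would be the natural choice for large frequencies.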
20
4ft-Miner, important simple 4ft-quantifiers
Each is defined on the 4ft-table a, b, c, d of M:
o Founded implication
o Double founded implication
o Founded equivalence
o Above Average
o „Classical“
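The defining conditions of these quantifiers were images and are missing from the text. As a hedged sketch, the functions below use the definitions commonly given for 4ft-Miner; they are assumptions here, as are the function names and the requirement Base ≥ 1. Inequalities are written in multiplied form to avoid division by zero.

```python
def founded_implication(a, b, c, d, p, base):
    """Assumed: a / (a + b) >= p and a >= base."""
    return a >= base and a >= p * (a + b)


def double_founded_implication(a, b, c, d, p, base):
    """Assumed: a / (a + b + c) >= p and a >= base."""
    return a >= base and a >= p * (a + b + c)


def founded_equivalence(a, b, c, d, p, base):
    """Assumed: (a + d) / (a + b + c + d) >= p and a >= base."""
    return a >= base and (a + d) >= p * (a + b + c + d)


def above_average(a, b, c, d, p, base):
    """Assumed: a / (a + b) >= (1 + p) * (a + c) / (a + b + c + d) and a >= base."""
    n = a + b + c + d
    return a >= base and a * n >= (1 + p) * (a + b) * (a + c)


def classical(a, b, c, d, conf, supp):
    """Assumed: confidence a / (a + b) >= conf and support a / n >= supp."""
    n = a + b + c + d
    return a > 0 and a >= conf * (a + b) and a >= supp * n


ok = founded_implication(60, 5, 10, 100, p=0.9, base=50)
```

All five take the full table (a, b, c, d) so that they can be used interchangeably, even though the implicational ones ignore c and d.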
21
Associational and implicational quantifiers
The generalized quantifier ≈ is associational if it satisfies:
If ≈(a, b, c, d) = 1 and a' ≥ a ∧ b' ≤ b ∧ c' ≤ c ∧ d' ≥ d then also ≈(a', b', c', d') = 1
The generalized quantifier ≈ is implicational if it satisfies:
If ≈(a, b, c, d) = 1 and a' ≥ a ∧ b' ≤ b then also ≈(a', b', c', d') = 1
22
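Despecifying-dereducing deduction rule SpRd
[Slide: the deduction rule SpRd for an implicational quantifier is sound if there is an intermediate formula such that the first rule despecifies to it and it dereduces to the second rule; one worked example follows the same pattern. The formulas themselves are not recoverable from the extracted text.]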
23
Deduction rules and implicational quantifiers (1)
The 4ft-quantifier ≈ is implicational if it satisfies:
If ≈(a, b, c, d) = 1 and a' ≥ a ∧ b' ≤ b then also ≈(a', b', c', d') = 1
TPC = a' ≥ a ∧ b' ≤ b is the Truth Preservation Condition for implicational quantifiers
o ≈ is a-dependent if there are a, a', b, c, d such that ≈(a, b, c, d) ≠ ≈(a', b, c, d)
o b-dependent, …
o If ≈ is implicational then ≈(a, b, c, d) = ≈(a, b, c', d') for all c, c', d, d'
o If ⇒* is implicational then we use only ⇒*(a, b) instead of ⇒*(a, b, c, d)
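The Truth Preservation Condition can be checked by brute force on small tables. The sketch below searches for a counterexample to the implicational property for a quantifier given as a function of (a, b). It is an illustration only: `tpc_counterexample` is a hypothetical name, the founded implication definition is the commonly used one and assumed here, and p = 0.75 is chosen because it is exact in binary floating point.

```python
from itertools import product


def founded_implication(a: int, b: int, p: float = 0.75, base: int = 2) -> bool:
    """Assumed definition: a / (a + b) >= p and a >= base."""
    return a >= base and a >= p * (a + b)


def tpc_counterexample(q, limit: int = 8):
    """Return (a, b, a2, b2) with q(a, b) true, a2 >= a, b2 <= b and
    q(a2, b2) false, or None if no such tables exist below `limit`."""
    for a, b, a2, b2 in product(range(limit), repeat=4):
        if a2 >= a and b2 <= b and q(a, b) and not q(a2, b2):
            return (a, b, a2, b2)
    return None


no_violation = tpc_counterexample(founded_implication)
violation = tpc_counterexample(lambda a, b: a == 3)  # not monotone in a
```

A finite search cannot prove the property in general; it only confirms that no violation exists among the tables examined.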
24
Deduction rules and implicational quantifiers (2)
Definition: The implicational 4ft-quantifier ⇒* is interesting implicational if
o ⇒* is both a-dependent and b-dependent
o ⇒*(0, 0) = 0
Theorem: If ⇒* is an interesting implicational 4ft-quantifier and R is a deduction rule deriving φ' ⇒* ψ' from φ ⇒* ψ, then there are propositional formulas 1A, 1B, 2 derived from φ, ψ, φ', ψ' such that R is sound iff at least one of the conditions i), ii) is satisfied:
i) both 1A and 1B are tautologies
ii) 2 is a tautology
⇒p,Base and ⇒!p;α are examples of interesting implicational 4ft-quantifiers
25
Overview of classes of 4ft-quantifiers
Class of 4ft-quantifiers    | Truth Preservation Condition                    | criterion for correctness
implicational               | a' ≥ a ∧ b' ≤ b                                 | known
double implicational        | a' ≥ a ∧ b' ≤ b ∧ c' ≤ c                        |
Σ-double implicational      | a' ≥ a ∧ b' + c' ≤ b + c                        | known
equivalency (associational) | a' ≥ a ∧ b' ≤ b ∧ c' ≤ c ∧ d' ≥ d               |
Σ-equivalency               | a' + d' ≥ a + d ∧ b' + c' ≤ b + c               | known
with F-property             | if ≈(a,b,c,d) = 1 and b ≥ c – 1 ≥ 0 then ≈(a,b+1,c–1,d) = 1; if ≈(a,b,c,d) = 1 and c ≥ b – 1 ≥ 0 then ≈(a,b–1,c+1,d) = 1 | known
Additional results:
o Dealing with missing information
o Tables of critical frequencies
o Definability in classical predicate calculi
o Interesting subclasses
26
Association rules and the ASSOC procedure (1)
{ A, B } → { E, F }
27
Association rules and the ASSOC procedure (2)
{ A, B } → { E, F }
4ft-table:
M        | E ∧ F | ¬(E ∧ F)
A ∧ B    | a     | b
¬(A ∧ B) | c     | d
Conf({ A, B } → { E, F }) = a / (a + b)
Supp({ A, B } → { E, F }) = a / (a + b + c + d)
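The link between itemset rules and the 4ft-table can be made concrete with a few transactions. This is a small sketch with made-up data; the function names and the representation of transactions as Python sets are assumptions for illustration.

```python
from typing import Iterable, Set, Tuple


def four_ft_counts(transactions: Iterable[Set[str]],
                   antecedent: Set[str],
                   consequent: Set[str]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for the rule antecedent -> consequent."""
    a = b = c = d = 0
    for t in transactions:
        has_ant = antecedent <= t   # all antecedent items present
        has_con = consequent <= t   # all consequent items present
        if has_ant and has_con:
            a += 1
        elif has_ant:
            b += 1
        elif has_con:
            c += 1
        else:
            d += 1
    return a, b, c, d


def confidence(a: int, b: int, c: int, d: int) -> float:
    return a / (a + b)


def support(a: int, b: int, c: int, d: int) -> float:
    return a / (a + b + c + d)


# Made-up transactions, for illustration only.
data = [{"A", "B", "E", "F"}, {"A", "B", "E"}, {"E", "F"}, {"C"}]
counts = four_ft_counts(data, {"A", "B"}, {"E", "F"})
```

With these four transactions each cell of the table holds exactly one transaction, so the confidence is 0.5 and the support is 0.25.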
28
GUHA and association rules
http://en.wikipedia.org/wiki/Association_rule_learning#cite_note-pospaper-7
History: The concept of association rules was popularised particularly due to the 1993 article of Agrawal [2], which has acquired more than 6000 citations according to Google Scholar, as of March 2008, and is thus one of the most cited papers in the Data Mining field. However, it is possible that what is now called "association rules" is similar to what appears in the 1966 paper [7] on GUHA, a general data mining method developed by Petr Hájek et al. [8].