Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining Jan Rauch University of Economics, Prague Czech Republic.

Presentation transcript:

Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining Jan Rauch University of Economics, Prague Czech Republic

Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining  Presented an idea of a theoretical approach  There are software tools for partial steps o Logic of discovery o Modifications o 4ft-Discoverer 2

Logic of Discovery
o Can computers formulate and verify scientific hypotheses?
o Can computers analyze empirical data in a rational way and produce a reasonable reflection of the observed empirical world?
o Can it be done using mathematical logic and statistics?

Logic of Discovery (simplified)
[Diagram labels: Data matrix M; State dependent structure; 1:1; Theoretical statements; Theoretical calculi; Observational statements; Observational calculi; Statistical hypothesis tests]

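Association rules – observational statements
An association rule φ ≈ ψ is evaluated in the data matrix M from the frequencies a, b, c, d:

M | ψ | ¬ψ
φ | a | b
¬φ | c | d

Val(φ ≈ ψ, M) ∈ {0, 1}
Possible quantifiers ≈: …, hypothesis tests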
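The evaluation of a rule from the frequencies a, b, c, d can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, φ and ψ are given as Boolean columns over the rows of M, and `toy_quantifier` is an invented condition that does not come from the slides.

```python
from typing import Callable, Sequence, Tuple

FourFt = Tuple[int, int, int, int]


def four_ft_table(phi: Sequence[bool], psi: Sequence[bool]) -> FourFt:
    """Count the rows of the data matrix by the truth values of phi and psi."""
    if len(phi) != len(psi):
        raise ValueError("phi and psi must cover the same rows")
    pairs = list(zip(phi, psi))
    a = sum(1 for p, s in pairs if p and s)          # phi and psi
    b = sum(1 for p, s in pairs if p and not s)      # phi and not psi
    c = sum(1 for p, s in pairs if not p and s)      # not phi and psi
    d = sum(1 for p, s in pairs if not p and not s)  # neither
    return a, b, c, d


def val(quantifier: Callable[[int, int, int, int], bool],
        phi: Sequence[bool], psi: Sequence[bool]) -> int:
    """Val(phi ~ psi, M): 1 if the quantifier accepts the table, else 0."""
    return int(bool(quantifier(*four_ft_table(phi, psi))))


def toy_quantifier(a: int, b: int, c: int, d: int) -> bool:
    """Hypothetical condition, used only to make the sketch runnable."""
    return a >= 2 and a > b


if __name__ == "__main__":
    phi = [True, True, True, False, False, True]
    psi = [True, True, False, True, False, True]
    print(four_ft_table(phi, psi), val(toy_quantifier, phi, psi))
```

The quantifier is passed in as a function, so the same evaluation works for any condition defined on the four frequencies.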

GUHA Procedure ASSOC – a tool for finding a set of interesting association rules

M | ψ | ¬ψ
φ | a | b
¬φ | c | d

Val(φ ≈ ψ, M) = 1
φ ≈ ψ is prime: φ ≈ ψ is true + does not logically follow from other, simpler rules

Deduction rules in logic of association rules
Examples: [formulas not preserved]
Theorem for ⇒*: the deduction rule (φ ⇒* ψ) / (φ′ ⇒* ψ′) is correct if and only if (1) or (2):
(1) Φ1A and Φ1B are tautologies of propositional calculus
(2) Φ2 is a tautology
Φ1A, Φ1B, Φ2 are created from φ, ψ, φ′, ψ′.
Theorems for additional 4ft-quantifiers: (φ ≈ ψ) / (φ′ ≈ ψ′) is correct iff …
Applications: prime rules + dealing with knowledge in data mining

Data mining – CRISP-DM
[Diagram labels: Beer ↑↑ BMI; Wine region, sportsmen, …; Analytical report; Logical calculus …]

Data mining – CRISP-DM
[Diagram labels: Beer ↑↑ BMI; Wine region, sportsmen, …; Analytical report; Logical calculus ??]

Modifying Logic of Discovery
o Logic of discovery: theoretical statements; logical calculus of association rules
o Logic of association rules mining: logical calculus of association rules; A1 ↑↑ A2, A3 ↑↓ A4, …; Cons(A1 ↑↑ A2), …; statements on data matrices, evaluation, Cons

Logic of association rules mining (simplified)
Data matrix M:

patient | BMI | Beer | Education | Sex | Status | … | A_K
o1 | 17 | 2 | U | F | D | … | 1
o2 | 34 | 4 | B | M | M | … | 6
o3 | 27 | 3 | S | M | W | … | 2
… | … | … | … | … | … | … | …
on | 28 | 7 | B | F | S | … | 4

LCAR, Logical Calculus of Association Rules:
o Type of M: number of columns + possible values
o Association rules φ ≈ ψ, e.g. Beer(8–10) ⇒0.9,50 BMI(>30) ∧ Status(W)
o Val(φ ≈ ψ, M), evaluated from the frequencies a, b, c, d:

M | ψ | ¬ψ
φ | a | b
¬φ | c | d

o Items of domain knowledge: Beer ↑↑ BMI, …
o Consequences of domain knowledge: Cons(Beer ↑↑ BMI), …
o DK → AR

Atomic consequences of Beer ↑↑ BMI (simplified)
The data matrix M is the same as on the previous slide (patient, BMI, Beer, Education, Sex, Status, …, A_K).
Attribute values and intervals:
o Beer: 0, 1, 2, …, 15; Low: α – β with α, β ∈ {0, …, 5}; High: α – β with α, β ∈ {10, …, 15}
o BMI: 15, 16, 17, …, 35; Low: α – β with α, β ∈ {15, …, 22}; High: α – β with α, β ∈ {28, …, 35}
Cons(Beer ↑↑ BMI):
o Beer(low) ⇒0.9,50 BMI(low), e.g. Beer(0 – 3) ⇒0.9,50 BMI(15 – 18), Beer(2 – 4) ⇒0.9,50 BMI(17 – 22), …
o Beer(high) ⇒0.9,50 BMI(high), e.g. Beer(11 – 13) ⇒0.9,50 BMI(29 – 31), Beer(14 – 15) ⇒0.9,50 BMI(30 – 35), …
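As a rough illustration, the atomic consequences listed on this slide can be enumerated mechanically. The sketch below assumes that an interval α – β is any pair with α ≤ β inside the stated range (so single-value intervals are included, which the slide does not confirm) and represents each rule Beer(α1 – β1) ⇒0.9,50 BMI(α2 – β2) as a plain tuple. All names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[int, int]


def intervals(lo: int, hi: int) -> List[Interval]:
    """All intervals (alpha, beta) with lo <= alpha <= beta <= hi."""
    return [(x, y) for x in range(lo, hi + 1) for y in range(x, hi + 1)]


# Ranges taken from the slide; the interval semantics are an assumption.
BEER_LOW, BEER_HIGH = intervals(0, 5), intervals(10, 15)
BMI_LOW, BMI_HIGH = intervals(15, 22), intervals(28, 35)


def atomic_consequences() -> List[Tuple[Interval, Interval]]:
    """Pairs (beer_interval, bmi_interval): low with low, high with high."""
    rules = list(product(BEER_LOW, BMI_LOW))
    rules += list(product(BEER_HIGH, BMI_HIGH))
    return rules


if __name__ == "__main__":
    cons = atomic_consequences()
    # The four example rules shown on the slide are all generated.
    for rule in [((0, 3), (15, 18)), ((2, 4), (17, 22)),
                 ((11, 13), (29, 31)), ((14, 15), (30, 35))]:
        print(rule, rule in cons)
```

Low intervals are paired only with low ones and high only with high ones, which mirrors the two rule patterns Beer(low) ⇒ BMI(low) and Beer(high) ⇒ BMI(high).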

4ft-Discoverer
4ftD = ⟨LCAR, DK → AR, 4ft-Miner, 4ft-Filter, 4ft-Synt⟩
Under implementation, based on Cons(Beer ↑↑ BMI) and [formula not preserved]

Applying 4ft-Discoverer
Is there new knowledge, not following from Beer ↑↑ BMI, that is true in the given data M?
o 4ft-Miner: particular interesting rules
o 4ft-Filter: uses the consequences of Beer ↑↑ BMI; keeps the rules not following from Beer ↑↑ BMI
o 4ft-Synt: new knowledge C ↑↑ D, E ↑↓ F

4ft-Filter
o 4ft-Miner gives a set of rules φ ⇒p,Base ψ
o Cons(Beer ↑↑ BMI) gives a set of rules Beer(α) ⇒0.9,50 BMI(β)
o For each φ ⇒p,Base ψ: is there a rule Beer(α) ⇒0.9,50 BMI(β) such that the deduction rule from Beer(α) ⇒0.9,50 BMI(β) to φ ⇒p,Base ψ is correct?
o If so, filter out φ ⇒p,Base ψ

4ft-Synt
o 4ft-Miner gives a set of rules φ ⇒p,Base ψ
o Cons(C ↑↑ D) gives a set of rules C(α) ⇒0.9,50 D(β)
o Are there enough pairs φ ⇒p,Base ψ and C(α) ⇒0.9,50 D(β) such that [deduction rule not preserved] is correct?
o If so, consider C ↑↑ D as a candidate for new knowledge

Conclusions
o Rich association rules [examples not preserved]
o Criteria of correctness for deduction rules
o Formal language for domain knowledge: Beer ↑↑ BMI, …
o Atomic consequences: Beer(low) ⇒p,Base BMI(low), …, Beer(α) ⇒p,Base BMI(β)
o Conversion Beer ↑↑ BMI → φ ≈ ψ via [formula not preserved]
o Partially implemented

Thank you

Examples of 4ft-quantifiers – statistical hypothesis tests
Lower critical implication ⇒!p;α for 0 < p ≤ 1, 0 < α < 0.5: the rule φ ⇒!p;α ψ corresponds to the statistical test (on the level α) of the null hypothesis H0: P(ψ|φ) ≤ p against the alternative H1: P(ψ|φ) > p. Here P(ψ|φ) is the conditional probability of the validity of ψ under the condition φ.
Fisher’s quantifier ≈α,Base for 0 < α < 0.5: the rule φ ≈α,Base ψ corresponds to the statistical test (on the level α) of the null hypothesis of independence of φ and ψ against the alternative of positive dependence.
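The defining formulas of the two quantifiers are not present in this transcript. The sketch below therefore uses the textbook form of the corresponding tests: a one-sided binomial tail for the lower critical implication and a one-sided Fisher exact test for Fisher's quantifier. Treat the exact conditions, including the extra requirements a·d > b·c and a ≥ Base, as assumptions rather than as the definitions on the slide. The function names are hypothetical.

```python
from math import comb


def lower_critical_implication(a: int, b: int, p: float, alpha: float) -> int:
    """1 if H0: P(psi|phi) <= p is rejected at level alpha, else 0.

    Uses the binomial upper tail P(X >= a) for X ~ Bin(a + b, p).
    """
    r = a + b
    tail = sum(comb(r, i) * p**i * (1 - p)**(r - i) for i in range(a, r + 1))
    return int(tail <= alpha)


def fisher_quantifier(a: int, b: int, c: int, d: int,
                      alpha: float, base: int = 0) -> int:
    """1 if independence is rejected in favour of positive dependence."""
    n, r, k = a + b + c + d, a + b, a + c
    if a < base or a * d <= b * c:
        return 0
    # Hypergeometric upper tail: probability of a table at least as extreme.
    tail = sum(comb(k, i) * comb(n - k, r - i)
               for i in range(a, min(r, k) + 1)) / comb(n, r)
    return int(tail <= alpha)


if __name__ == "__main__":
    print(lower_critical_implication(20, 0, p=0.5, alpha=0.05))
    print(fisher_quantifier(10, 0, 0, 10, alpha=0.05))
```

Both functions use only the standard library, so exact integer arithmetic is kept until the final comparison with α.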

4ft-Miner, important simple 4ft-quantifiers

M | ψ | ¬ψ
φ | a | b
¬φ | c | d

o Founded implication
o Double founded implication
o Founded equivalence
o Above Average
o „Classical“
[The defining formulas were not preserved.]
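Since the formulas for these quantifiers did not survive in the transcript, the following sketch writes them out in the form in which they are usually given for 4ft-Miner. The conditions are an assumption based on that common usage, not a quotation of the slide, and the function names are hypothetical.

```python
def founded_implication(a, b, c, d, p, base):
    """a / (a + b) >= p and a >= Base."""
    return a >= base and a + b > 0 and a / (a + b) >= p


def double_founded_implication(a, b, c, d, p, base):
    """a / (a + b + c) >= p and a >= Base."""
    return a >= base and a + b + c > 0 and a / (a + b + c) >= p


def founded_equivalence(a, b, c, d, p, base):
    """(a + d) / n >= p and a >= Base, with n = a + b + c + d."""
    n = a + b + c + d
    return a >= base and n > 0 and (a + d) / n >= p


def above_average(a, b, c, d, p, base):
    """a / (a + b) >= (1 + p) * (a + c) / n and a >= Base."""
    n = a + b + c + d
    return (a >= base and a + b > 0 and n > 0
            and a / (a + b) >= (1 + p) * (a + c) / n)


def classical(a, b, c, d, conf, supp):
    """Confidence a / (a + b) >= conf and support a / n >= supp."""
    n = a + b + c + d
    return a + b > 0 and n > 0 and a / (a + b) >= conf and a / n >= supp


if __name__ == "__main__":
    table = (60, 5, 10, 25)  # invented frequencies a, b, c, d
    print(founded_implication(*table, p=0.9, base=50))
    print(double_founded_implication(*table, p=0.9, base=50))
```

Each function guards against empty denominators, so a table with a + b = 0 simply makes the rule false.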

Associational and implicational quantifiers
The generalized quantifier ≈ is associational if it satisfies: if ≈(a, b, c, d) = 1 and a′ ≥ a ∧ b′ ≤ b ∧ c′ ≤ c ∧ d′ ≥ d, then also ≈(a′, b′, c′, d′) = 1. Examples: [formulas not preserved]
The generalized quantifier ≈ is implicational if it satisfies: if ≈(a, b, c, d) = 1 and a′ ≥ a ∧ b′ ≤ b, then also ≈(a′, b′, c′, d′) = 1 for all c′, d′. Examples: [formulas not preserved]
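Both properties can be probed by brute force on small tables. The sketch below assumes the usual reading in which the conclusion of each condition concerns the whole primed table (a′, b′, c′, d′), with c′ and d′ arbitrary in the implicational case. A bounded search can refute a property but can never prove it in general. The helper names and the two sample quantifiers are illustrative.

```python
from itertools import product


def is_associational(q, n_max=4):
    """Check a' >= a, b' <= b, c' <= c, d' >= d on tables with entries <= n_max."""
    for a, b, c, d in product(range(n_max + 1), repeat=4):
        if not q(a, b, c, d):
            continue
        for a2, b2, c2, d2 in product(range(a, n_max + 1), range(b + 1),
                                      range(c + 1), range(d, n_max + 1)):
            if not q(a2, b2, c2, d2):
                return False
    return True


def is_implicational(q, n_max=4):
    """Check a' >= a, b' <= b with arbitrary c', d' on small tables."""
    for a, b, c, d in product(range(n_max + 1), repeat=4):
        if not q(a, b, c, d):
            continue
        for a2, b2, c2, d2 in product(range(a, n_max + 1), range(b + 1),
                                      range(n_max + 1), range(n_max + 1)):
            if not q(a2, b2, c2, d2):
                return False
    return True


def founded_implication(a, b, c, d, p=0.6, base=1):
    return a >= base and a + b > 0 and a / (a + b) >= p


def founded_equivalence(a, b, c, d, p=0.6):
    n = a + b + c + d
    return n > 0 and (a + d) / n >= p


if __name__ == "__main__":
    print(is_implicational(founded_implication))
    print(is_associational(founded_equivalence),
          is_implicational(founded_equivalence))
```

The founded equivalence used here depends on c and d, which is why it passes the associational check but fails the implicational one.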

Despecifying-dereducing deduction rule SpRd
The rule …, where … is implicational, is sound if there is a … such that … despecifies to … and … dereduces to … [formulas not preserved]
An example: … despecifies to …, … dereduces to …, instead of … [formulas not preserved]

Deduction rules and implicational quantifiers (1)
The 4ft-quantifier ≈ is implicational if it satisfies: if ≈(a, b, c, d) = 1 and a′ ≥ a ∧ b′ ≤ b, then also ≈(a′, b′, c′, d′) = 1 for all c′, d′.
TPC⇒ = a′ ≥ a ∧ b′ ≤ b is the Truth Preservation Condition for implicational quantifiers.
o ≈ is a-dependent if there are a, a′, b, c, d such that ≈(a, b, c, d) ≠ ≈(a′, b, c, d)
o b-dependent, … are defined analogously
o If ≈ is implicational then ≈(a, b, c, d) = ≈(a, b, c′, d′) for all c, c′, d, d′
o If ⇒* is implicational then we use only ⇒*(a, b) instead of ⇒*(a, b, c, d)

Deduction rules and implicational quantifiers (2)
Definition: The implicational 4ft-quantifier ⇒* is interesting implicational if
o ⇒* is both a-dependent and b-dependent
o ⇒*(0, 0) = 0
… and … are examples of interesting implicational 4ft-quantifiers [the two quantifier symbols were not preserved].
Theorem: If ⇒* is an interesting implicational 4ft-quantifier and R = (φ ⇒* ψ) / (φ′ ⇒* ψ′) is a deduction rule, then there are propositional formulas Φ1A, Φ1B, Φ2 derived from φ, ψ, φ′, ψ′ such that R is sound iff at least one of the conditions i), ii) is satisfied:
i) both Φ1A and Φ1B are tautologies
ii) Φ2 is a tautology
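The transcript does not show the three propositional formulas themselves. One natural choice, assumed here, is read off the Truth Preservation Condition: φ ∧ ψ → φ′ ∧ ψ′ forces a′ ≥ a, φ′ ∧ ¬ψ′ → φ ∧ ¬ψ forces b′ ≤ b, and φ → ¬ψ means that a = 0 in every data matrix. The sketch checks these by brute force over truth assignments. Formulas are modelled as Python functions on a dict of Boolean variables, and all names are hypothetical.

```python
from itertools import product


def is_tautology(formula, variables):
    """True if formula holds under every truth assignment of variables."""
    return all(formula(dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))


def rule_is_sound(phi, psi, phi2, psi2, variables):
    """Check (phi =>* psi) / (phi2 =>* psi2) via the assumed conditions."""
    def cond_1a(v):  # phi and psi -> phi2 and psi2
        return not (phi(v) and psi(v)) or (phi2(v) and psi2(v))

    def cond_1b(v):  # phi2 and not psi2 -> phi and not psi
        return not (phi2(v) and not psi2(v)) or (phi(v) and not psi(v))

    def cond_2(v):   # phi -> not psi
        return not phi(v) or not psi(v)

    return ((is_tautology(cond_1a, variables)
             and is_tautology(cond_1b, variables))
            or is_tautology(cond_2, variables))


if __name__ == "__main__":
    names = ["A", "B", "C"]
    # (A and B =>* C) / (A =>* C or not B)
    print(rule_is_sound(lambda v: v["A"] and v["B"], lambda v: v["C"],
                        lambda v: v["A"],
                        lambda v: v["C"] or not v["B"], names))
```

The truth table grows exponentially with the number of basic Boolean attributes, so this approach is only practical for small formulas.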

Overview of classes of 4ft-quantifiers

Class of 4ft-quantifiers | Truth Preservation Condition | Criterion for deduction rules
implicational | a′ ≥ a ∧ b′ ≤ b | known
double implicational | a′ ≥ a ∧ b′ ≤ b ∧ c′ ≤ c | -
Σ-double implicational | a′ ≥ a ∧ b′ + c′ ≤ b + c | known
equivalency (associational) | a′ ≥ a ∧ b′ ≤ b ∧ c′ ≤ c ∧ d′ ≥ d | -
Σ-equivalency | a′ + d′ ≥ a + d ∧ b′ + c′ ≤ b + c | known
with F-property | if ≈(a, b, c, d) = 1 and b ≥ c – 1 ≥ 0 then ≈(a, b+1, c-1, d) = 1; if ≈(a, b, c, d) = 1 and c ≥ b – 1 ≥ 0 then ≈(a, b-1, c+1, d) = 1 | known

Additional results:
o Dealing with missing information
o Tables of critical frequencies
o Definability in classical predicate calculi
o Interesting subclasses

Association rules and the ASSOC procedure (1)
{A, B} → {E, F}

Association rules and the ASSOC procedure (2)
{A, B} → {E, F}

M | E ∧ F | ¬(E ∧ F)
A ∧ B | a | b
¬(A ∧ B) | c | d

Conf({A, B} → {E, F}) = a / (a + b)
Supp({A, B} → {E, F}) = a / (a + b + c + d)
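To connect the itemset notation with the frequencies a, b, c, d, the following sketch counts the four cells from a list of transactions and derives confidence and support using the standard definitions a / (a + b) and a / (a + b + c + d). The transactions and function names are invented for illustration.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def four_ft(transactions: Iterable[Set[str]],
            antecedent: FrozenSet[str],
            succedent: FrozenSet[str]) -> Tuple[int, int, int, int]:
    """Frequencies a, b, c, d for the rule antecedent -> succedent."""
    a = b = c = d = 0
    for t in transactions:
        has_ant = antecedent <= t   # all antecedent items present
        has_suc = succedent <= t    # all succedent items present
        if has_ant and has_suc:
            a += 1
        elif has_ant:
            b += 1
        elif has_suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def confidence(table: Tuple[int, int, int, int]) -> float:
    a, b, _, _ = table
    return a / (a + b) if a + b else 0.0


def support(table: Tuple[int, int, int, int]) -> float:
    n = sum(table)
    return table[0] / n if n else 0.0


if __name__ == "__main__":
    data = [{"A", "B", "E", "F"}, {"A", "B", "E", "F"}, {"A", "B", "E"},
            {"A", "E", "F"}, {"C"}]
    table = four_ft(data, frozenset({"A", "B"}), frozenset({"E", "F"}))
    print(table, confidence(table), support(table))
```

An itemset holds in a transaction when all of its items are present, which corresponds to the conjunctions A ∧ B and E ∧ F in the table.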

GUHA and association rules
History: The concept of association rules was popularised particularly due to the 1993 article of Agrawal [2], which has acquired more than 6000 citations according to Google Scholar, as of March 2008, and is thus one of the most cited papers in the Data Mining field. However, it is possible that what is now called "association rules" is similar to what appears in the 1966 paper [7] on GUHA, a general data mining method developed by Petr Hájek et al. [8].