1
Lattice Representation of Data. Dr. Alex Pogel, Physical Science Laboratory, New Mexico State University
2
Basic Idea: Replace tabular representation by lattice representation in order to reveal hierarchical structure. 1. Basic definitions 2. Information in the lattice 3. Carving up epidemiological data. Ganter & Wille: Formal Concept Analysis (FCA); Barwise & Seligman: Information Flow
3
Input data: The base data structure is a {0,1}-table with a set G of objects (represented by rows) and a set M of attributes (represented by columns); an entry of 1 indicates that object g has attribute m.
4
Input data, mathematically: a binary relation I from G to M, that is, a subset of G × M, interpreted as an indication of which objects g have which attributes m, via (g, m) ∈ I.
5
Key Definitions: The notion of “formal concept” is based on natural mappings that arise from the binary relation I [interpret G and M as before]: to each subset H of G, we associate the set a(H) of all attributes the objects in H satisfy in common, a: P(G) → P(M); to each subset N of M, we associate the set o(N) of all objects satisfying every attribute in N, o: P(M) → P(G).
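The two mappings can be sketched in a few lines. This is a minimal illustration on a hypothetical three-object context (not the animal data shown in the talk); the names `G`, `M`, `I`, `a` and `o` follow the slide, everything else is assumed.

```python
# Hypothetical toy context; the talk's animal table is not reproduced here.
G = ["sparrow", "trout", "dog"]
M = ["flies", "lives_in_water", "warm_blooded", "airbreather"]

# The binary relation I as a set of (object, attribute) pairs.
I = {
    ("sparrow", "flies"), ("sparrow", "warm_blooded"), ("sparrow", "airbreather"),
    ("trout", "lives_in_water"),
    ("dog", "warm_blooded"), ("dog", "airbreather"),
}

def a(H):
    """Attributes that all objects in H satisfy in common."""
    return {m for m in M if all((g, m) in I for g in H)}

def o(N):
    """Objects that satisfy every attribute in N."""
    return {g for g in G if all((g, m) in I for m in N)}

print(a({"sparrow", "dog"}))   # attributes shared by sparrow and dog
print(o({"warm_blooded"}))     # objects that are warm-blooded
```

Note that `a` applied to the empty set of objects returns all of M, and `o` applied to the empty set of attributes returns all of G, as the definitions require.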
6
Key Definitions: The attribute subsets N of M such that a(o(N)) = N are called formal concepts in FCA (strictly, each such N is the intent of a concept, paired with its extent o(N)), and are called closed sets in mathematics, as a(o(–)) is a closure operator on M. A formal concept can be identified geometrically within a data table by reshuffling rows and columns such that (1) object-attribute relations are maintained and (2) a maximal rectangle of 1s appears.
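As a rough sketch of the closure operator a(o(–)), the following brute-force enumeration lists every closed attribute set of a small hypothetical context. The data and the helper names `rows`, `closure` and `closed_sets` are assumptions for illustration; enumerating all subsets is only feasible for a handful of attributes.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to its attribute set.
rows = {
    "sparrow": {"flies", "warm_blooded", "airbreather"},
    "trout": {"lives_in_water"},
    "dog": {"warm_blooded", "airbreather"},
}
M = sorted(set().union(*rows.values()))

def closure(N):
    """a(o(N)): attributes common to all objects that have every attribute in N."""
    matching = [attrs for attrs in rows.values() if set(N) <= attrs]
    # With no matching object, the intersection over nothing is all of M.
    return frozenset(set(M).intersection(*matching))

# Brute force: close every subset of M and keep the distinct results.
closed_sets = {
    closure(subset)
    for size in range(len(M) + 1)
    for subset in combinations(M, size)
}

for s in sorted(closed_sets, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

Each printed set corresponds to a maximal rectangle of 1s in the reshuffled table: its columns are the closed set and its rows are the objects sharing all of those attributes.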
7
Animal Context
8
Shuffling Reveals a Concept
9
BIRD is the (formal) concept
10
Closure System Arises: Taking all closed sets together we obtain a closure system [aka a topped intersection structure, in Davey-Priestley], which is always a complete lattice [an ordered set in which every subset has both a supremum and an infimum in the set]. Examples: the extended reals R ∪ {−∞, +∞} with ≤, P(S) with inclusion, any topology with inclusion, …
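To make the "topped intersection structure" concrete, here is a small sketch that checks the two defining properties for a finite family of sets and computes meets and joins in it. The family below is a hypothetical example, and `is_closure_system`, `meet` and `join` are illustrative names, not part of any FCA tool.

```python
from itertools import combinations

# Hypothetical family of closed attribute sets over a four-attribute universe.
top = frozenset({"flies", "lives_in_water", "warm_blooded", "airbreather"})
family = {
    frozenset(),
    frozenset({"lives_in_water"}),
    frozenset({"warm_blooded", "airbreather"}),
    frozenset({"flies", "warm_blooded", "airbreather"}),
    top,
}

def is_closure_system(family, top):
    """A finite family is a closure system if it contains the top set
    and is closed under pairwise intersection."""
    return top in family and all(
        (x & y) in family for x, y in combinations(family, 2)
    )

def meet(x, y):
    """Infimum: plain intersection."""
    return x & y

def join(x, y, family):
    """Supremum: the smallest member of the family containing both x and y."""
    return frozenset.intersection(*[s for s in family if (x | y) <= s])

water = frozenset({"lives_in_water"})
warm = frozenset({"warm_blooded", "airbreather"})
print(is_closure_system(family, top))
print(sorted(meet(water, warm)), sorted(join(water, warm, family)))
```

The join is not the union in general: here the union of the two sets is not closed, so the join jumps up to the top element.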
11
Focus on attribute logic
12
Full list: difficult, redundant. All implications that hold for the data, with up to three attributes in their premise; 125 with positive support.
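A full list of this kind can be generated by brute force. The sketch below is an assumed reconstruction of the idea, not the tool used in the talk: it tries every premise of up to three attributes on a hypothetical toy context, keeps only premises with positive support, and reports support as an object count.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to its attribute set.
rows = {
    "sparrow": {"flies", "warm_blooded", "airbreather"},
    "trout": {"lives_in_water"},
    "dog": {"warm_blooded", "airbreather"},
}
M = sorted(set().union(*rows.values()))

def implications(max_premise=3):
    """List (premise, conclusion, support) for every implication that holds,
    where support is the number of objects satisfying the premise."""
    found = []
    for size in range(1, max_premise + 1):
        for premise in combinations(M, size):
            objs = [g for g, attrs in rows.items() if set(premise) <= attrs]
            if not objs:
                continue  # zero support: skip vacuously true implications
            shared = set.intersection(*(rows[g] for g in objs))
            for m in sorted(shared - set(premise)):
                found.append((premise, m, len(objs)))
    return found

for premise, conclusion, support in implications():
    print(" & ".join(premise), "->", conclusion, f"(support {support})")
```

Even on three objects the redundancy is visible: "airbreather & flies -> warm_blooded" is listed although it already follows from "airbreather -> warm_blooded".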
13
Duquenne-Guigues Basis: 20 implications generate the full list, and serve as a basis (analogy with linear algebra); ordered by support value.
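For readers who want to see where such a basis comes from, here is a brute-force sketch that follows the standard definition: the premises of the Duquenne-Guigues basis are the pseudo-closed sets, that is, non-closed sets P that contain the closure of every pseudo-closed proper subset of P. The context is a hypothetical toy example, and this exhaustive search is only an illustration; practical tools typically use Ganter's Next Closure algorithm instead.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to its attribute set.
rows = {
    "sparrow": {"flies", "warm_blooded", "airbreather"},
    "trout": {"lives_in_water"},
    "dog": {"warm_blooded", "airbreather"},
}
M = sorted(set().union(*rows.values()))

def closure(N):
    """a(o(N)) for an attribute set N."""
    matching = [attrs for attrs in rows.values() if N <= attrs]
    return frozenset(set(M).intersection(*matching))

def stem_base():
    """Return the basis as (premise, added attributes) pairs."""
    pseudo_closed = []
    # Visit subsets by increasing size so proper subsets are decided first.
    for size in range(len(M) + 1):
        for combo in combinations(M, size):
            P = frozenset(combo)
            if closure(P) == P:
                continue  # closed sets are never premises
            if all(closure(Q) <= P for Q in pseudo_closed if Q < P):
                pseudo_closed.append(P)
    return [(P, closure(P) - P) for P in pseudo_closed]

for premise, added in stem_base():
    print(sorted(premise), "->", sorted(added))
```

Unlike the positive-support list, the basis also contains implications whose premise no object satisfies; they are needed to regenerate every implication that holds in the context.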
14
Full list, basis, and original data
15
Implication Reads Upwards: at top right, warm-blooded implies airbreather; 1st in basis, high support indicated in lime green
16
A Subinterval of the lattice: fourlegged implies airbreather; pet implies warm-blooded (iguana?); and fur implies fourlegged and warm-blooded (platypus?)
17
Original data preserved: animals 26 and 27 share the attributes “lives in water”, “is warm-blooded” and “is an airbreather”
18
Original data preserved: animals 26 and 27 share the attributes “lives in water”, “is warm-blooded” and “is an airbreather”
19
Color-coded support: the similarity in color between “livestock” and the concept node below it yields the association rule livestock implies fur, with 79% confidence and 11% support (bottom)
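Support and confidence of an association rule can be computed directly from the object counts. The sketch below uses a hypothetical five-object table, so it does not reproduce the 79% and 11% figures from the slide; `count`, `support` and `confidence` are illustrative names.

```python
# Hypothetical toy data; the slide's figures come from the talk's animal context.
rows = {
    "cow": {"livestock", "fur"},
    "sheep": {"livestock", "fur"},
    "chicken": {"livestock"},
    "cat": {"fur", "pet"},
    "trout": set(),
}

def count(attrs):
    """Number of objects having every attribute in attrs."""
    return sum(1 for have in rows.values() if attrs <= have)

def support(premise, conclusion):
    """Fraction of all objects satisfying premise and conclusion together."""
    return count(premise | conclusion) / len(rows)

def confidence(premise, conclusion):
    """Fraction of premise-satisfying objects that also satisfy the conclusion.
    Assumes at least one object satisfies the premise."""
    return count(premise | conclusion) / count(premise)

rule = ({"livestock"}, {"fur"})
print(f"support {support(*rule):.0%}, confidence {confidence(*rule):.0%}")
```

An implication in the earlier sense is the special case of a rule with 100% confidence.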
20
Visual Vocabulary: Small subdiagrams (specifically, meet-subsemilattices) can be recognized as complex sentences
21
3 unordered attribute concepts (diagram nodes labeled a, b, c). Note: the top element is really irrelevant, but adding it makes everything we’ll look at a lattice instead of just a meet semilattice (definition: an ordered structure closed under finite meet (glb))
22
Here’s the best known outcome (diagram nodes labeled a, b, c): No non-trivial implications
23
W over V (diagram nodes labeled a, b, c): a & c → b
24
Diamond in diamond (diagram nodes labeled a, b, c): Under condition c, a and b are equivalent
25
Convergence (diagram nodes labeled a, b, c): any two imply the third
26
Two Complex Sentences: So, we can read that for nocturnal animals and pets, the attributes fourlegged and warm-blooded are equivalent, and that the only implication between the attributes “nocturnal,” “fur” and “pet” is: pet and nocturnal implies fur.
27
The Hague, Netherlands
28
Before Freese improvement
29
After Freese improvement
30
Apparent Splits
31
Eliminating Light Smokers
32
Why no object names?
33
Lung Cancer and Smoking: nearly half of these 30+ year smokers have lung cancer
34
Bird-keeping and Smoking: Association rules involving bird-keeping and smoking
35
Limitations as KDD Process: Needs attention given to data preparation; needs more built-in verification of discovered rules; no domain-specific constructions (advantage?); does not scale without clustering (universal?)
36
Epidemiological functions: Plan to add odds ratio calculation, via click.
              Lung Cancer   No Lung Cancer
BirdKeep Yes  33            34
BirdKeep No   16            64
OR = 3.9
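The odds ratio for a 2×2 table is the cross-product ratio of its cells. The sketch below reads the slide's counts as 33 and 34 for bird keepers and 16 and 64 for non-keepers, a reading that is an assumption but does reproduce the stated OR of 3.9; the function name `odds_ratio` is illustrative.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product ratio of a 2x2 exposure/outcome table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Counts as read from the slide (assumed split of the run-together digits):
# bird keepers: 33 with lung cancer, 34 without; non-keepers: 16 with, 64 without.
value = odds_ratio(33, 34, 16, 64)
print(round(value, 1))  # 3.9
```

In other words, (33 × 64) / (34 × 16) = 2112 / 544 ≈ 3.88, which rounds to the 3.9 shown on the slide.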
37
Clustering for too large lattices
38
Support for improvement: Traditional diagram improvement algorithms are based solely upon the order structure. We are now moving towards the inclusion of support values in these algorithms. I will talk about this topic in detail in July, here at DIMACS, as part of the Applications of Lattice Theory workshop. END