Lattice Representation of Data Dr. Alex Pogel Physical Science Laboratory New Mexico State University.

Basic Idea
Replace tabular representation by lattice representation in order to reveal hierarchical structure.
1. Basic definitions
2. Information in the lattice
3. Carving up epidemiological data
Ganter & Wille: Formal Concept Analysis (FCA)
Barwise & Seligman: Information Flow

Input data
The base data structure is a {0,1}-table:
- a set G of objects (represented by rows) and
- a set M of attributes (represented by columns);
- an entry of 1 indicates that object g has attribute m.

Input data, mathematically
Mathematically speaking: a binary relation I from G to M, that is, a subset of G × M, interpreted as an indication of which objects g have which attributes m, via (g, m) ∈ I.

Key Definitions
The notion of “formal concept” is based on natural mappings that arise from the binary relation I [interpret G and M as before]:
- to each subset H of G, we associate the set a(H) of all attributes that the objects in H satisfy in common, a: P(G) → P(M);
- to each subset N of M, we associate the set o(N) of all objects satisfying every attribute in N, o: P(M) → P(G).
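The two mappings can be sketched in a few lines of Python. This is only an illustration: the toy context (four animals and their attributes) is hypothetical and is not the animal data used in the talk; the function names a and o follow the slide.

```python
# Hypothetical toy context: each object maps to the set of attributes it has.
CONTEXT = {
    "dove": {"flies", "feathers", "warm-blooded"},
    "hawk": {"flies", "feathers", "warm-blooded"},
    "bat": {"flies", "fur", "warm-blooded"},
    "trout": {"lives in water"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def a(objects):
    """Attributes shared by every object in `objects` (all of M if empty)."""
    common = set(ALL_ATTRIBUTES)
    for g in objects:
        common &= CONTEXT[g]
    return common


def o(attributes):
    """Objects that have every attribute in `attributes` (all of G if empty)."""
    wanted = set(attributes)
    return {g for g, has in CONTEXT.items() if wanted <= has}
```

For example, a({"dove", "bat"}) gives the attributes common to both animals, and o({"feathers"}) gives the objects that have feathers.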

Key Definitions
The attribute subsets N of M such that a(o(N)) = N are called formal concepts in FCA (strictly, the concept is the pair (o(N), N), and N is its intent) and are called closed sets in mathematics, as a(o(–)) is a closure operator on M.
A formal concept can be identified geometrically within a data table by reshuffling rows and columns such that
1. object-attribute relations are maintained and
2. a maximal rectangle of 1s appears.
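As a rough sketch of the closure operator a(o(–)), the following Python computes the closure of an attribute set and lists all closed sets by brute force. The toy context is hypothetical, and enumerating every subset of M is only feasible for very small attribute sets; it is not the algorithm used by any FCA tool mentioned in the talk.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to the set of attributes it has.
CONTEXT = {
    "dove": {"flies", "feathers", "warm-blooded"},
    "hawk": {"flies", "feathers", "warm-blooded"},
    "bat": {"flies", "fur", "warm-blooded"},
    "trout": {"lives in water"},
}
M = sorted(set().union(*CONTEXT.values()))


def closure(attrs):
    """a(o(N)): attributes shared by all objects that have every attribute in N."""
    attrs = set(attrs)
    result = set(M)
    for has in CONTEXT.values():
        if attrs <= has:  # this object lies in o(N)
            result &= has
    return frozenset(result)


def closed_sets():
    """Collect the closure of every subset of M (brute force)."""
    return {closure(c) for r in range(len(M) + 1) for c in combinations(M, r)}
```

In this toy context, closure({"flies"}) also contains "warm-blooded", because every flying animal in the table is warm-blooded.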

Animal Context

Shuffling Reveals a Concept

BIRD is the (formal) concept

Closure System Arises
Taking all closed sets together, we obtain a closure system [aka a topped intersection structure, in Davey-Priestley], which is always a complete lattice [an ordered set in which every subset has both a supremum and an infimum in the set].
Examples: the extended reals R ∪ {−∞, +∞} with <= (R alone is not complete, since it has no top or bottom), P(S) with inclusion, any topology with inclusion, …
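In a closure system ordered by inclusion, the infimum of two closed sets is their intersection and the supremum is the closure of their union. A minimal Python sketch, assuming a hypothetical toy context (not the talk's data) and hypothetical helper names meet and join:

```python
# Hypothetical toy context: each object maps to the set of attributes it has.
CONTEXT = {
    "dove": {"flies", "feathers", "warm-blooded"},
    "hawk": {"flies", "feathers", "warm-blooded"},
    "bat": {"flies", "fur", "warm-blooded"},
    "trout": {"lives in water"},
}
M = frozenset().union(*CONTEXT.values())


def closure(attrs):
    """a(o(N)): attributes shared by all objects that have every attribute in N."""
    attrs = set(attrs)
    result = set(M)
    for has in CONTEXT.values():
        if attrs <= has:
            result &= has
    return frozenset(result)


def meet(x, y):
    """Infimum of two closed sets: their intersection is already closed."""
    return frozenset(x) & frozenset(y)


def join(x, y):
    """Supremum of two closed sets: the union must be closed again."""
    return closure(set(x) | set(y))
```

Here the join of the closures of {"feathers"} and {"fur"} is all of M, since no animal in the toy table has both.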

Focus on attribute logic

Full list: difficult, redundant
All implications that hold for the data, with up to three attributes in their premise; 125 with positive support.

Duquenne-Guigues Basis
20 implications generate the full list and serve as a basis (analogy with linear algebra); ordered by support value.
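What "generate" means here can be sketched with forward chaining: an implication follows from a set of implications if its conclusion is reached by repeatedly applying them to its premise. The Python below assumes a small list built from four implications quoted elsewhere in these slides; it is not the actual 20-implication basis, and it does not compute a Duquenne-Guigues basis.

```python
# A few implications quoted in the slides, as (premise, conclusion) pairs.
# This is an illustrative subset, not the 20-implication basis from the talk.
IMPLICATIONS = [
    ({"warm-blooded"}, {"airbreather"}),
    ({"fourlegged"}, {"airbreather"}),
    ({"pet"}, {"warm-blooded"}),
    ({"fur"}, {"fourlegged", "warm-blooded"}),
]


def close_under(attrs, implications):
    """Add conclusions of applicable implications until nothing changes."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def follows(premise, conclusion, implications):
    """True if premise -> conclusion is derivable from `implications`."""
    return set(conclusion) <= close_under(premise, implications)
```

For instance, "fur implies airbreather" is not listed but follows from the list, while "airbreather implies fur" does not.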

Full list, basis, and original data

Implication Reads Upwards
At top right: warm-blooded implies airbreather.
1st in basis: high support indicated in lime green.

A Subinterval of the lattice
- fourlegged implies airbreather
- pet implies warm-blooded (iguana?)
- fur implies fourlegged and warm-blooded (platypus?)

Original data preserved
Animals 26 and 27 share the attributes “lives in water”, “is warm-blooded” and “is an airbreather”.

Color-coded support
The similarity in color between “livestock” and the concept node below it yields the association rule livestock implies fur, with 79% confidence and 11% support (bottom).
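Support and confidence of a rule such as "livestock implies fur" can be computed directly from the table. The sketch below assumes the common convention that support is the fraction of all objects having both premise and conclusion, and confidence is that count divided by the number of objects having the premise; the talk's software may define support differently. The five animals are hypothetical and do not reproduce the 79% and 11% figures.

```python
# Hypothetical toy context, not the animal data from the talk.
CONTEXT = {
    "cow": {"livestock", "fur"},
    "sheep": {"livestock", "fur"},
    "chicken": {"livestock", "feathers"},
    "cat": {"pet", "fur"},
    "trout": {"lives in water"},
}


def count(attrs):
    """Number of objects that have every attribute in `attrs`."""
    attrs = set(attrs)
    return sum(1 for has in CONTEXT.values() if attrs <= has)


def support(premise, conclusion):
    """Fraction of all objects satisfying premise and conclusion together."""
    return count(set(premise) | set(conclusion)) / len(CONTEXT)


def confidence(premise, conclusion):
    """Among objects satisfying the premise, the fraction satisfying the conclusion."""
    return count(set(premise) | set(conclusion)) / count(premise)
```

In this toy table, two of the three livestock animals have fur, so the rule has confidence 2/3 and support 2/5. An implication in the strict sense is a rule with confidence 1.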

Visual Vocabulary
Small subdiagrams (specifically, meet-subsemilattices) can be recognized as complex sentences.

3 unordered attribute concepts
[Diagram: attribute concepts a, b, c]
Note: the top element is really irrelevant, but adding it makes everything we’ll look at a lattice instead of just a meet semilattice (definition: an ordered structure closed under finite meets (glb)).

Here’s the best known outcome
[Diagram: attribute concepts a, b, c]
No non-trivial implications

W over V: a & c → b
[Diagram: attribute concepts a, b, c]

Diamond in diamond
[Diagram: attribute concepts a, b, c]
Under condition c, a and b are equivalent

Convergence
[Diagram: attribute concepts a, b, c]
Any two imply the third

Two Complex Sentences
So, we can read that:
- for nocturnal animals and pets, the attributes fourlegged and warm-blooded are equivalent, and
- the only implication between the attributes “nocturnal,” “fur” and “pet” is: pet and nocturnal implies fur.

The Hague, Netherlands

Before Freese improvement

After Freese improvement

Apparent Splits

Eliminating Light Smokers

Why no object names?

Lung Cancer and Smoking
Nearly half of these 30+ year smokers have lung cancer.

Bird-keeping and Smoking
Association rules involving bird-keeping and smoking

Limitations as KDD Process
- Needs attention given to data preparation
- Needs more built-in verification of discovered rules
- No domain-specific constructions (advantage?)
- Does not scale without clustering (universal?)

Epidemiological functions
Plan to add odds ratio calculation, via click.
               Lung Cancer   No Lung Cancer
BirdKeep Yes   33            34
BirdKeep No    16            64
OR = 3.9
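The odds ratio on this slide can be checked from the four cell counts, reading them as 33 and 34 for bird-keepers and 16 and 64 for non-bird-keepers. A minimal Python sketch; the function name odds_ratio and its parameter names are illustrative, not part of any tool described in the talk.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Counts from the slide: bird-keepers 33 with and 34 without lung cancer,
# non-bird-keepers 16 with and 64 without.
bird_keeping_or = odds_ratio(33, 34, 16, 64)
print(round(bird_keeping_or, 1))  # 3.9
```

The unrounded value is 2112 / 544, about 3.88, which matches the OR = 3.9 reported on the slide.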

Clustering for too large lattices

Support for improvement
Traditional diagram improvement algorithms are based solely upon the order structure.
We are now moving towards the inclusion of support values in these algorithms.
I will talk about this topic in detail in July, here at DIMACS, as part of the Applications of Lattice Theory workshop.
END