Coproduct Transformations on Lattices of Closed Partial Orders Gemma Casas-Garriga MOISES meeting MOISES meeting, Valladolid, Sept 2004 LARCA
Data Description D = {s 1, …, s N } where each s i is a sequence. A sequence is an ordered list of sets of items: For example, We consider a set of sequences D to be analyzed Id
Basic Definitions An episode in D is an acyclic directed graph, indicating a partial order between items BC A D The support of a poset in D is the number of input sequences that are compatible with it. BC A D it is compatible with second and third input sequences
Problem Formulation Goal: to identify posets and their support (alternatively, whose support is over a minimum user-specified threshold) Problem: many redundant partial orders For ex. both P and P’ are compatible with the same input sequences, but P is more “informative” than P’. BC A D P BCA P’ P’ P
Problem Formulation If P’ P we say that P is more specific than P’. Specificity relation is different from classical inclusion of episodes. Goal redefined: to identify the most specific partial orders among those occurring in the same input seqs ( alternatively, with support over a minimum threshold). BC A D BCAD ||,,,
Example BC A D BCDA, || BC D A ABCD BCAD BCDA 1,2,3 2,3 1, Input seqs where the poset is the most specific.
Motivation Ordering relationships are useful in many domains: web mining, monitoring of processes, e-comerce... The most specific episodes give a general view of D, summarizing all the input sequences without redundancies.
Addressing the Problem Observation: Identifying such structures directly from the data is a complex task (specificity relation is difficult to calculate). Our proposal: – Constructing hybrid episodes out of their maximal paths. – That is, finding those subsequences in D that will identify maximal paths of the final desired episodes. BC A D Two max paths:
Our Proposal BC A D BCDA, || B C D A ABCD BCAD BCDA Set of all seqs. identifying max. paths: What are these sequences?
Result 1 Theorem: sequences identifying maximal paths of the most specific posets are a particular case of so-called stable sequences Stable sequences are maximal among those having the same number of occurences (support) in D. {s | s’ s.t s s’ and support (s) = support (s’)} is stable is not stable because it is contained in that has the same support. Many algorithms for minig stable seqs: CloSpan, BIDE, TSP...
How to construct posets out of Stable Sequences? BC A D BCDA, || B C D A ABCD BCAD BCDA Stable Sequences Some stable sequences may identify maximal paths of different partial orders.
Result 2 We characterize a closure operator working on sets of sequences. A closure operator satisfies the three basic closure axioms: Monotonicity, Extensivity, and Idempotency. Broadly: Given any set of sequences S, (S) returns the set of maximal sequences that are present in the same input sequences where S is contained ( { } ) = {, }
A set of sequences S will be closed if it coincides with its closure : (S) = S(S) = S Result 2 Lemma: individually, sequences in a closed set S, are stable sequences. ( {, } ) = {, } Both and are stable sequences
Theorem: closed sets of sequences identify the maximal paths of the same partial order. Result 2 {, } BC A D BCDA, || B C D A {, } Closed set of sequencesPartial Orders 3 1 2
Lattice of Closed Sets of Sequences { } {, } 1,2,3 2,3 1, BC A D BCDA, || B C D A
Lattice of Closed Partial Orders 1,2,3 CDAB BC A D BCDA, || B C D A BCDACADB 2,3 1, Moreover, these posets are closed.
Formalization A directed graph is modeled as G=(V,E,l) where V is the set of vertices; E VxV is the set of edges; and l is an injective labelling function. A poset is an acyclic directed graph, such that the relation on V stablished by edges E is reflexive, antisymmetric and transitive. A graph morphism between two graphs G=(V,E,l) and G’=(V’,E’,l’) consists of an injective function h:V V’ that preserves labels and (u,v) E implies (h(u),h(v)) E’. B C D A BCDA, ||
Result 3 Coproduct of a family of graphs: G1 Gn GG’ BC AC B C A Example:
Result 3 {, } B C D A Coproduct of: BCD AD B C D A Theorem: A lattice of stable sequences can be transformed into a lattice of closed posets by rewriting each node via coproduct transformations.
Conclusions We identify partial orders in sequential data by: – Mining stable sequences and their support (CloSpan, BIDE …). – Grouping stable sequences in closed sets of sequences, according to operator . – Getting final episodes from those agrupations. This transformation represents an important algorithmic simplification. Formally, in case of not having repetition of items, this transformation can be expressed as coproduct transformations.