Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS-71021 Finding Regular Simple Paths in Graph Databases Basic definitions Regular paths Regular simple.

Slides:



Advertisements
Similar presentations
Chapter 4 An Introduction to Finite Automata
Advertisements

Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
Finite-State Machines with No Output Ying Lu
4b Lexical analysis Finite Automata
Nondeterministic Finite Automata CS 130: Theory of Computation HMU textbook, Chapter 2 (Sec 2.3 & 2.5)
Regular Expressions and DFAs COP 3402 (Summer 2014)
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Jan. 2013Dr. Yangjun Chen ACS Outline Signature Files - Signature for attribute values - Signature for records - Searching a signature file Signature.
Introduction to Computability Theory
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture2: Non Deterministic Finite Automata Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
61 Nondeterminism and Nodeterministic Automata. 62 The computational machine models that we learned in the class are deterministic in the sense that the.
Validating Streaming XML Documents Luc Segoufin & Victor Vianu Presented by Harel Paz.
CS5371 Theory of Computation Lecture 6: Automata Theory IV (Regular Expression = NFA = DFA)
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
FSA Lecture 1 Finite State Machines. Creating a Automaton  Given a language L over an alphabet , design a deterministic finite automaton (DFA) M such.
Introduction to Finite Automata Adapted from the slides of Stanford CS154.
Finite State Machines Data Structures and Algorithms for Information Processing 1.
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
Finite-State Machines with No Output Longin Jan Latecki Temple University Based on Slides by Elsa L Gunter, NJIT, and by Costas Busch Costas Busch.
Finite-State Machines with No Output
NFA ε - NFA - DFA equivalence. What is an NFA An NFA is an automaton that its states might have none, one or more outgoing arrows under a specific symbol.
Regular Expressions. Notation to specify a language –Declarative –Sort of like a programming language. Fundamental in some languages like perl and applications.
Basics of automata theory
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
Copyright © Curt Hill Finite State Automata Again This Time No Output.
CMSC 330: Organization of Programming Languages Finite Automata NFAs  DFAs.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CS 3813: Introduction to Formal Languages and Automata Chapter 2 Deterministic finite automata These class notes are based on material from our textbook,
Lecture # 12. Nondeterministic Finite Automaton (NFA) Definition: An NFA is a TG with a unique start state and a property of having single letter as label.
Chapter 6 Properties of Regular Languages. 2 Regular Sets and Languages  Claim(1). The family of languages accepted by FSAs consists of precisely the.
CS 203: Introduction to Formal Languages and Automata
Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
Finite State Machines 1.Finite state machines with output 2.Finite state machines with no output 3.DFA 4.NDFA.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
Modeling Computation: Finite State Machines without Output
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
Lecture 04: Theory of Automata:08 Transition Graphs.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
1 Section 11.2 Finite Automata Can a machine(i.e., algorithm) recognize a regular language? Yes! Deterministic Finite Automata A deterministic finite automaton.
Lecture 09: Theory of Automata:2014 Asif NawazUIIT, PMAS-Arid Agriclture University Rawalpindi. Kleene’s Theorem and NFA.
L ECTURE 3 T HEORY OF AUTOMATA. E QUIVALENT R EGULAR E XPRESSIONS Definition Two regular expressions are said to be equivalent if they generate the same.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Deterministic Finite-State Machine (or Deterministic Finite Automaton) A DFA is a 5-tuple, (S, Σ, T, s, A), consisting of: S: a finite set of states Σ:
Kleene’s Theorem and NFA
Lexical analysis Finite Automata
Non Deterministic Automata
PROPERTIES OF REGULAR LANGUAGES
Chapter 7 PUSHDOWN AUTOMATA.
Two issues in lexical analysis
Chapter 2 FINITE AUTOMATA.
REGULAR LANGUAGES AND REGULAR GRAMMARS
Some slides by Elsa L Gunter, NJIT, and by Costas Busch
Non-Deterministic Finite Automata
Intro to Data Structures
Introduction to Finite Automata
Finite Automata.
4b Lexical analysis Finite Automata
Lexical Analysis Uses formalism of Regular Languages
What is it? The term "Automata" is derived from the Greek word "αὐτόματα" which means "self-acting". An automaton (Automata in plural) is an abstract self-propelled.
Prepared by- Patel priya( ) Guided by – Prof. Archana Singh Gandhinagar Institute of Technology SUBJECT - CD ( ) Introcution to Regular.
Presentation transcript:

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Finding Regular Simple Paths in Graph Databases Basic definitions Regular paths Regular simple paths An query evaluation algorithm

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Example. Let G be a graph describing a hypertext document: Nodes – chucks of text Edges – links (cross-references). Readers read the document by following links. Query: is there a way to get from Section 3.1 to Section 5.2 and then to the conclusion? Sec3.1Sec5.2 Conc link +

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Basic definitions We model a graph database as a labeled directed graph G = (V, E, ,  ), where V is a set of nodes, E is a set of edges,  is a set of symbols, called the alphabet, and  is an edge labeling function mapping E to . A regular path expression is defined by the following grammar:  := ø |  | a | - | (  1 +  2 ) | (  1  2 ) |  *, where , ,  1, and  2 denote regular path expressions, a denotes a constant in , “-” denotes a wildcard matching any constant in , ø and  denote the empty set and the empty string, respectively.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Basic definitions The language L(  ) denoted by  is defined as follows. L(  ) = {  }. L(ø) = ø. L(a) = {a}, for a  . L(  1 +  2 ) = L(  1 )  L(  2 ) = {w | w  L(  1 ) or w  L(  2 ) L(  1  2 ) = L(  1 )L(  2 ) = {w 1 w 2 | w 1  L(  1 ) and w 2  L(  2 )}. L(  * ) =, where L 0 (  ) = {  } and L i (  ) = L i-1 (  )L(  ). Regular expressions  1 and  2 are equivalent, written  1   2, if L(  1 ) = L(  2 ). The length of regular expression , denoted |  |, is the number of symbols appearing in .

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Basic definitions A nondeterministic finite automaton (NDFA) M is a 5-tuple (S, , , s 0, F), where 1.S is the finite set of states of the control. 2.  is the alphabet from which input symbols are chosen. 3.  is the state transition function which maps S  (   {  }) to the set of subsets of S. 4.s 0 in S is the initial state of the finite control. 5.F  S is the set of finite (or accepting) states. Associated with an NDFA is a directed graph, in which each node stands for a state in the NDFA and each edge (s, s’) labeled with a symbol a in  for a state transit  (s, a) which contains s’. The extended transition function  * is defined as follows. Let s and t be two states in S. For a  , and w   *,  * (s,  ) = {s}, and  * (s, wa) =  t  *(s,w)  (t, a).

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Basic definitions An NDFA M = (S, , , s 0, F) accepts w   * if  * (s 0, w)  F  ø. The language L(M) accepted by M is the set of all strings accepted by M. A deterministic finite automaton (DFA) is a nondeterministic finite automaton (S, I, , s 0, F) with the following conditions satisfied: 1.  (s,  ) =  for all s  S, and 2.Each state has 1 or 0 successor.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Simple paths Let  be a finite alphabet disjoint from { , , (, )}. A regular expression R over  and the language L(R) denoted by R are defined in the usual way. Let G = (V, E, ,  ) be a db-graph and p = (v 1, e 1, …, e n-1, v n ), where v i  N, 1  i  n, and e j  E, 1  j  n, be a path in G. We say p is a simple path if all the v i ’s are distinct for 1  i  n. We call the string  (e 1 ) …  (e n-1 ) the path label of p, denoted by  (p)   *. Let R be a regular expression over . We say that the path p satisfies R if  (p)  L(R). The query Q R on db-graph G, denoted by Q R (R), is defined as the set of pairs (x, y) such that there is a simple path from x to y in G which satisfies R. If (x, y)  Q R (R), then (x, y) satisfies Q R.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Regular simple path Problem Instance: db-graph G = (V, E, ,  ), nodes x, y  N, regular expression R over . Question: Does G contain a directed simple path p = (v 1, e 1, …, e n-1, v n ) from x to y such that p satisfies R, that is,  (e 1 ) …  (e n-1 ) =  (p)  L(R)?.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Naïve method A naïve method for evaluating a query Q R on a db-graph G is to traverse every simple path satisfying R in G exactly once. The penalty for this is that such an algorithm takes exponential time when G has an exponential number of simple paths.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Intersection graph Let M 1 = (S 1, ,  1, p 0, F 1 ) and M 2 = (S 2, ,  2, q 0, F 2 ) be NDFAs. The NDFA for M 1  M 2 is I = (S 1  S 2, , , (p 0, q 0 ), F 1  F 2 ), where, for a  , (p 1, q 1 )   ((p 2, q 2 ), a) if and only if p 2   1 (p 1, a) and q 2   2 (q 1, a). We call the transition graph of I the intersection graph of M 1 and M 2. Regular path Problem Instance: db-graph G = (V, E, ,  ), nodes x, y  V, regular expression R over . Question: Does G contain a directed path (not necessarily simple) p = (v 1, e 1, …, e n-1, v n ) from x to y such that p satisfies R, that is, (e 1 ) …  (e n-1 ) =  (p)  L(R)?

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Regular path Problem can be decided in polynomial time We view the db-graph G = (V, E, ,  ) as an NDFA with initial state x and final state y. Construct the intersection graph I of G and M = (S, , , s 0, F), an NDFA accepting L(R). There is a path from x to y satisfying R if and only if there is path in I from (x, s 0 ) to (y, s f ) for some s f  F. All this can be done in polynomial time.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Algorithm A View the db-graph G = (V, E, ,  ) with nodes x, y  V, we view G as an NDFA with initial state x and final state y. regular expression R over . Question: Does G contain a directed path (not necessarily simple) p = (v 1, e 1, …, e n-1, v n ) from x to y such that p satisfies R, that is, (e 1 ) …  (e n-1 ) =  (p)  L(R)?

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Algorithm A (e 1 ) …  (e n-1 ) =  (p)  L(R)? 1.Traverse simple paths in G, using a DFA M accepting L(R) to control the search by marking nodes as they are visited. 2.Record with which state of M a node is visited. (We allow a node to be visited with different states.) 3.A node with the same state cannot be visited more than once.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Incompleteness Using the above algorithm, we may fail to find all the answers. Example Consider a query Q R, where R = aaa. 0 a 12 3 aa A CBD a a a a a a M:M: G:G:

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Assume that we start traversal from node A in G, and follow the path To B, C and D. Node A, B, C and D are marked with 0, 1, 2 and 3, respectively, and the answer (A, D) is found, since 3 is a final state. If we backtrack to node C, we cannot mark B with state 3 because (A, B, C, B) is a non-simple path. So we backtrack to A, and visit D in state 1. However, if we have retained markings, we cannot visit node C as it is already marked with state 2. Consequently, the answer (A, B) is not found.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Suffix language Definition Given an NDFA M = (S, , , s 0, F), for each pair of states s, t  S, we define the language from s to t, denoted by L st, as the set of strings that take M from state s to state t. In particular, for a state s  S, the suffix language of s, denoted by L sF (or [s]), is the set of strings that M from s to some final state. Clearly, [s 0 ] = L(M). Similar definitions apply for a DFA.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Suffix language Definition Let I be the intersection graph of a db-graph G and a DFA M = (S, , , s 0, F) accepting L(R). Assume that for nodes u and v in G and states s, t  S, there are paths p from (u, s 0 ) to (v, s) and q from (v, s) to (v, t) in I (that is, there is a cycle at v in G that satisfies L st ), such that no first component of a node p or q repeats except for the endpoints of q. In other words, p and q correspond to a simple path and a simple cycle, respectively, in G. If [t]  [s], then we say there is a conflict between s and t at v. If there are no conflicts in I, then I is said to be conflict-free, as are G and R.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Example Consider the following M and G. 0 a 12 3 aa 0 M:M: A CBD a a a a a a G:G: Recall that, if markings were retained, the answer (A, B) would not be found. However, there is a conflict. This is because node B in G can be marked with state 1 and there is a cycle at B which satisfies L 13, but [3]  [1].

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Algorithm B (e 1 ) …  (e n-1 ) =  (p)  L(R)? 1.Traverse simple paths in G, using a DFA M accepting L(R) to control the search by marking nodes as they are visited. 2.Record with which state of M a node is visited. (We allow a node to be visited with different states.) 3.If no conflects are detected, the algorithm retains markings, while while whenever a conflict arises, it unmarks nodes so that no answers are lost. 4.A node with the same state cannot be visited more than once.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Algorithm B Input: db-graph G = (V, E, ,  ), query Q R. Output: Q R (G), the value of Q R on G. 1.Construct a DFA M = (S, , , s 0, F) accepting L(R). 2.Initialize Q R (G) to . 3.For each node v  V, set CM[v] to null and PM[v] to . 4.Test [s]  [t] for each pair of states s and t. 5.For each node v  V, (a) call search-G(v, v, s 0, conflict) (b) reset PM[w] to  for any marked node w  V. Two types of markings: CM[v] – used to indicate that v is already on the stack PM[v] – a set of states, recording earlier markings of v, excluding the current path.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS procedure search-G(u, v, s, var conflict) 6.conflict  false 7.CM [v]  s 8.if s  F then Q R (G)  Q R (G)  {(u, v)} 9.for each edge in G from v to w with label a do 10.if  (s, a) = t and t  PM[w] then 11.if CM[w] = q then conflict  ([t]  [q]) 12.else /* CM[w] is null*/ 13.search-G(u, w, t, new-conflict) 14.conflict  conflict or new-conflict 15.CM[w]  null 16.if not conflict then PM[w]  PM[w]  {s}

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Example Let R = a((bc +  )d + ec)) be the regular expression for query Q R. A DFA M accepting L(R) and a db-graph G are shown below. 0 a M:M: A CBD a a b,eb,e c c b,eb,e G:G: b c 35 e d d c E d 4 Assume that we start by marking node A with state 0, after which we proceed to mark B with 1, C with 2, and B with 3. Since no edge labeled d leaves B, we backtrack to C and attempt to visit B in state 3.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Although B has already a current marking (CM[B] = 1), this is not a conflict since [1]  [3]. The algorithm now backtrack to B in state 1 and marks E with state 4. After backtracking again to B in state 1, the markings are given as in the above figure. Next, the algorithm marks C with state 5 and D with 4. On backtracking to C and attempting to mark B with 4, a conflict is detected since [4]  [1]. So on backtracking to A, the markings 5 and 1 will be removed from C and B, respectively.

Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Now D is marked with 1, but since C has a previous marking of 2, that marking will not be repeated. So C is marked with 5 (along another transit labeled with e in M. 5 was previously removed.) After this, B can be marked with 4. When the algorithm backtracks to C and attempt to visit D, it discovers that D was previously marked with 4, so no conflict is registered. The marking are now given as shown below. 0 A CBD a a b,eb,e c (3), (4), 1 (2), 5 (4) G:G: E d