1 So far...
A language is a set of strings over an alphabet.
We have defined languages by:
(i) regular expressions
(ii) finite state automata
Both (i) and (ii) give us exactly the same class of languages.
Languages serve two purposes in computing:
(a) communicating instructions or information
(b) defining valid communications
What about languages outwith this class?

2 Specifying Non-Regular Languages
We have already seen a number of languages that are not regular. In particular,
- {a^n b^n : n ≥ 0}
- the language of matched round brackets
- arithmetic expressions
- standard programming languages
are not regular. However, these languages are all systematic constructions, and can be clearly and explicitly defined.
Consider L = {a^n b^n : n ≥ 0}:
(i) ε ∈ L
(ii) if x ∈ L, then axb ∈ L
(iii) nothing else is in L
This is a clear and concise specification of L. Can we use it to generate members of L?

3 Generating Languages
Using the previous definition of L, and the notion of string substitution, we can give a generative definition of L. Let X be a new symbol.
1) X -> ε
2) X -> aXb
This definition says that if we have a symbol X, we can replace it by the empty string ε, or by aXb. We now define L to be all strings over {a, b} formed by starting with X and applying rules 1) and 2) until we get a string with no X's.
Example:
X => aXb => aaXbb => aabb
X => aXb => aaXbb => aaaXbbb => aaabbb
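The substitution process can be sketched in a few lines of Python. This is only an illustration: the function name derive_anbn is made up here, and it fixes one particular order of rule use (rule 2 applied n times, then rule 1 once).

```python
def derive_anbn(n):
    """Derive a^n b^n from X: apply rule 2 (X -> aXb) n times, then rule 1 (X -> ε)."""
    form = "X"
    steps = [form]
    for _ in range(n):
        form = form.replace("X", "aXb", 1)  # rule 2
        steps.append(form)
    form = form.replace("X", "", 1)  # rule 1: X becomes the empty string
    steps.append(form)
    return steps


# Show the empty string as ε so the last step stays visible when n = 0.
print(" => ".join(s if s else "ε" for s in derive_anbn(2)))
# X => aXb => aaXbb => aabb
```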

4 Grammar
Formalising the previous notion of a generative definition based on string substitution, we get:
A grammar is a 4-tuple, G = (N, T, S, P), where
- N is a finite alphabet, called the non-terminals;
- T is a finite alphabet, called the terminals, with N ∩ T = ∅;
- S ∈ N is the start symbol; and
- P is a finite set of productions of the form α -> β, where α ∈ (N ∪ T)^+, α has at least one member from N, and β ∈ (N ∪ T)*.
Thus the previous example is a grammar where
N = {X}
T = {a, b}
S = X
P = {X -> ε, X -> aXb}
so G = ({X}, {a, b}, X, {X -> ε, X -> aXb})
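One way to make the 4-tuple concrete is a small record type that checks the conditions in the definition. The sketch below is hypothetical: the class name Grammar, its field names and the check method are invented for illustration, and it assumes every symbol is a single character so that a string can stand for a sequence of symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: str               # S
    productions: tuple       # P, as pairs (lhs, rhs) of strings; "" stands for ε

    def check(self):
        """Return the list of violated conditions (empty if the 4-tuple is well formed)."""
        problems = []
        symbols = self.nonterminals | self.terminals
        if self.nonterminals & self.terminals:
            problems.append("N and T are not disjoint")
        if self.start not in self.nonterminals:
            problems.append("S is not in N")
        for lhs, rhs in self.productions:
            if not lhs or not set(lhs) <= symbols:
                problems.append(f"lhs {lhs!r} is not a non-empty string over N ∪ T")
            elif not set(lhs) & self.nonterminals:
                problems.append(f"lhs {lhs!r} contains no non-terminal")
            if not set(rhs) <= symbols:
                problems.append(f"rhs {rhs!r} is not a string over N ∪ T")
        return problems


G = Grammar(frozenset("X"), frozenset("ab"), "X", (("X", ""), ("X", "aXb")))
print(G.check())  # []
```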

5 Definitions and Notation
Let G = (N, T, S, P) be a grammar.
If s, t, x, y, u and v are strings s.t. s = xuy, t = xvy, and (u -> v) ∈ P, then s directly derives t, written s => t.
If there is a sequence of strings s_0, s_1, ..., s_n s.t. s_0 => s_1 => ... => s_(n-1) => s_n, then s_0 derives s_n, written s_0 =>* s_n.
A sentential form of G is a string w ∈ (N ∪ T)* s.t. S =>* w.
A sentence of G is a sentential form w ∈ T*, i.e. one with no non-terminals.
The language defined by G is the set of all sentences of G, denoted L(G).
Example, taking G to be the previous grammar with the non-terminal written S instead of X (productions S -> ε and S -> aSb):
aaaSbbb => aaaaSbbbb.
S =>* aaaabbbb.
aaaSbbb is a sentential form of G.
aaaabbbb is a sentence of G.
L(G) = {ε, ab, aabb, aaabbb, ...}, which is {a^n b^n : n ≥ 0}
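These definitions translate fairly directly into code. The sketch below assumes the productions S -> ε and S -> aSb that lie behind the examples above. The names PRODUCTIONS, direct_derivations and sentences are illustrative only, and the search is cut off at a maximum length because L(G) is infinite.

```python
from collections import deque

PRODUCTIONS = [("S", ""), ("S", "aSb")]  # S -> ε and S -> aSb
NONTERMINALS = {"S"}


def direct_derivations(s):
    """All t with s => t: write s = xuy and replace u by v for some (u -> v) in P."""
    results = set()
    for u, v in PRODUCTIONS:
        start = s.find(u)
        while start != -1:
            results.add(s[:start] + v + s[start + len(u):])
            start = s.find(u, start + 1)
    return results


def sentences(max_len):
    """Sentences of G with at most max_len symbols, by breadth-first search from S."""
    seen, queue, found = {"S"}, deque(["S"]), set()
    while queue:
        form = queue.popleft()
        if not any(c in NONTERMINALS for c in form):
            found.add(form)  # no non-terminals left, so this form is a sentence
            continue
        for t in direct_derivations(form):
            # No production here removes a terminal, so over-long forms can be dropped.
            terminals = sum(c not in NONTERMINALS for c in t)
            if terminals <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return found


print("aaaaSbbbb" in direct_derivations("aaaSbbb"))  # True
print(sorted(sentences(6), key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```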

6 Definitions and Notation (cont.)
Notation: we normally order the set of productions, and assign them numbers. If x => y by using rule number i, then we write x =>_i y.
α -> β_1 | β_2 | β_3 ... | β_n is shorthand for
α -> β_1
α -> β_2
:
α -> β_n
In general, non-terminals will be uppercase, while terminals will be lowercase.
A context-free grammar (CFG) is one in which all productions are of the form α -> β, where α ∈ N, i.e. the left-hand side is a single non-terminal.
A context-free language (CFL) is one that can be defined by a context-free grammar.

7 Context-Free Grammars
A CFG is called context-free because the left-hand side of every production is a single non-terminal, and so a production can be applied to that symbol without needing to consider the symbol's context.
We only consider context-free grammars in this course.
Some languages are not context-free. Example: {a^n b^n c^n : n ≥ 0}
Some languages cannot be defined by any grammar. It is believed that these are the same languages that cannot be defined by any algorithm or effective procedure.

9 Example CFG
G = ({S}, {a, +, *, (, )}, S, {S -> S+S | S*S | (S) | a})
This is a grammar of algebraic expressions. The productions are:
1) S -> S + S
2) S -> S * S
3) S -> (S)
4) S -> a
Example derivation:
S => S * S => a * S => a * (S) => a * (S + S) => a * (a + S) => a * (a + a)
Note that there are many other ways of deriving the same string.
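The example derivation can be replayed mechanically from its rule numbers. The following sketch always rewrites the leftmost S, which happens to match the derivation shown. The names RULES and leftmost_derivation are invented for this illustration, and spaces are omitted from the strings.

```python
RULES = {1: "S+S", 2: "S*S", 3: "(S)", 4: "a"}  # right-hand sides of rules 1) to 4)


def leftmost_derivation(rule_numbers):
    """Apply each numbered rule to the leftmost S and return every sentential form."""
    form = "S"
    forms = [form]
    for i in rule_numbers:
        if "S" not in form:
            raise ValueError("no non-terminal left to rewrite")
        form = form.replace("S", RULES[i], 1)
        forms.append(form)
    return forms


print(" => ".join(leftmost_derivation([2, 4, 3, 1, 4, 4])))
# S => S*S => a*S => a*(S) => a*(S+S) => a*(a+S) => a*(a+a)
```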

10 Why Grammar?
In English, the grammar is the set of conventions defining the structure of sentences, e.g.
- a sentence must have a subject and an object
- verbs must agree with nouns, e.g. "John walks" and "John and Mary walk"
- adjectives come before nouns, e.g. "the red car" and not "the car red"
We have shown a formalisation of this notion. We can now write explicit, clear statements of what sentences are in a language.
Grammars can be used in the processing of natural language by computer (4th year option), in formalising design, in pattern recognition, and many other areas.

11 A grammar for a small part of English
S -> NP VP
NP -> Det NP1 | PN
NP1 -> Adj NP1 | N
Det -> a | the
PN -> peter | paul | mary
Adj -> large | black
N -> dog | cat | horse
VP -> V NP
V -> is | likes | hates
Can you derive: peter is a large black cat?

12 A grammar for a small part of English
S -> NP VP
NP -> Det NP1 | PN
NP1 -> Adj NP1 | N
Det -> a | the
PN -> peter | paul | mary
Adj -> large | black
N -> dog | cat | horse
VP -> V NP
V -> is | likes | hates
Example derivations:
S => NP VP => PN VP => mary VP => mary V NP => mary hates NP => mary hates Det NP1 => mary hates the NP1 => mary hates the N => mary hates the dog
S => NP VP => NP V NP => NP V Det NP1 => NP V a NP1 => NP V a Adj NP1 => NP is a Adj NP1 => NP is a Adj Adj NP1 => NP is a large Adj NP1 => NP is a large Adj N => NP is a large black N => NP is a large black cat => PN is a large black cat => peter is a large black cat
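Whether a word sequence can be derived from S can also be checked by program. The sketch below is a simple top-down recogniser for this particular grammar. It relies on the grammar having no left-recursive rules and no ε-productions, and the names GRAMMAR, ends and is_sentence are made up for the example.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NP1"], ["PN"]],
    "NP1": [["Adj", "NP1"], ["N"]],
    "Det": [["a"], ["the"]],
    "PN": [["peter"], ["paul"], ["mary"]],
    "Adj": [["large"], ["black"]],
    "N": [["dog"], ["cat"], ["horse"]],
    "VP": [["V", "NP"]],
    "V": [["is"], ["likes"], ["hates"]],
}


def ends(symbol, words, start):
    """Positions where a phrase derived from `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:  # a terminal must match the next word exactly
        return {start + 1} if start < len(words) and words[start] == symbol else set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for part in rhs:  # thread the possible positions through the right-hand side
            positions = {e for p in positions for e in ends(part, words, p)}
        result |= positions
    return result


def is_sentence(text):
    """True if the whole word sequence can be derived from S."""
    words = text.split()
    return len(words) in ends("S", words, 0)


print(is_sentence("peter is a large black cat"))  # True
print(is_sentence("the dog likes"))               # False
```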

13 Regular Grammars
A grammar is regular if each production is of the form:
(i) A -> t, or
(ii) A -> tB, or
(iii) A -> ε
where A, B ∈ N, t ∈ T.
Example:
S -> aA | bB
A -> aS | a
B -> bS | b
Is this a sentence of the language? aaaabb

14 Regular Grammars
A grammar is regular if each production is of the form:
(i) A -> t, or
(ii) A -> tB, or
(iii) A -> ε
where A, B ∈ N, t ∈ T.
Example:
S -> aA | bB
A -> aS | a
B -> bS | b
S => aA => aaS => aaaA => aaaaS => aaaabB => aaaabb

15 Regular Grammars
A grammar is regular if each production is of the form:
(i) A -> t, or
(ii) A -> tB, or
(iii) A -> ε
where A, B ∈ N, t ∈ T.
Example:
S -> aA | bB
A -> aS | a
B -> bS | b
S => aA => aaS => aaaA => aaaaS => aaaabB => aaaabb
The language generated by this grammar is the language denoted by …..

16 Regular Grammars
A grammar is regular if each production is of the form:
(i) A -> t, or
(ii) A -> tB, or
(iii) A -> ε
where A, B ∈ N, t ∈ T.
Example:
S -> aA | bB
A -> aS | a
B -> bS | b
S => aA => aaS => aaaA => aaaaS => aaaabB => aaaabb
The language generated by this grammar is the language denoted by (aa + bb)^+
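Because each production emits one terminal and names at most one following non-terminal, a regular grammar can be run much like a finite state automaton. The sketch below tracks the set of non-terminals the derivation could currently be at, and compares the result with the regular expression on every string over {a, b} of length up to 6. The names RULES and generates are illustrative, and Python's regex syntax writes the alternation as | rather than +.

```python
import re
from itertools import product

# S -> aA | bB, A -> aS | a, B -> bS | b
# Each entry is (terminal, next non-terminal); None means the production is A -> t.
RULES = {
    "S": [("a", "A"), ("b", "B")],
    "A": [("a", "S"), ("a", None)],
    "B": [("b", "S"), ("b", None)],
}


def generates(word, start="S"):
    """True if `start` derives `word`, tracking the possible current non-terminals."""
    states = {start}
    for ch in word:
        states = {nxt for st in states if st is not None
                  for t, nxt in RULES[st] if t == ch}
    return None in states  # None marks a derivation with no non-terminal left


pattern = re.compile(r"(aa|bb)+")
for n in range(7):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert generates(w) == bool(pattern.fullmatch(w))

print(generates("aaaabb"))  # True
```

The bounded comparison is evidence for the claimed language, not a proof of it.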

17 Regular Grammars and Regular Languages
Thus we now have three different definitions of the one class of languages:
- regular expressions
- finite state automata
- regular grammars
Theorem (stated here without proof): A language is regular iff it can be defined by a regular grammar.
All three are useful in Computing Science.

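18 Example CFG (2)
1) S -> XaaX
2) X -> aX
3) X -> bX
4) X -> ε
Example derivation (the number of the rule used is given in brackets after each step):
S => XaaX (1) => bXaaX (3) => baXaaX (2) => babXaaX (3) => babaaX (4) => babaaaX (2) => babaaabX (3) => babaaab (4)
This grammar defines the language: ………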

19 Example CFG (2)
1) S -> XaaX
2) X -> aX
3) X -> bX
4) X -> ε
Example derivation (the number of the rule used is given in brackets after each step):
S => XaaX (1) => bXaaX (3) => baXaaX (2) => babXaaX (3) => babaaX (4) => babaaaX (2) => babaaabX (3) => babaaab (4)
This grammar defines the language (a + b)*aa(a + b)*

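20 ...as a Regular Grammar
1) S -> aS
2) S -> bS
3) S -> aM
4) M -> aB
5) B -> aB
6) B -> bB
7) B -> ε
Example derivations (the number of the rule used is given in brackets after each step):
S => bS (2) => baS (1) => babS (2) => babaM (3) => babaaB (4) => babaaaB (5) => babaaabB (6) => babaaab (7)
S => bS (2) => baM (3) => baaB (4) => baa (7)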
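That the grammar of Example CFG (2) and this regular grammar define the same language can be spot-checked by enumerating their sentences up to a length bound. The sketch below does this with breadth-first leftmost derivations and compares both sets with the strings containing aa. The names CFG, REGULAR and sentences are assumptions of the example, and agreement up to length 6 is evidence, not a proof.

```python
from collections import deque
from itertools import product

# Right-hand sides per non-terminal; "" stands for ε.
CFG = {"S": ["XaaX"], "X": ["aX", "bX", ""]}
REGULAR = {"S": ["aS", "bS", "aM"], "M": ["aB"], "B": ["aB", "bB", ""]}


def sentences(grammar, max_len, start="S"):
    """All sentences of length <= max_len, via breadth-first leftmost derivations."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is a sentence.
        pos = next((i for i, c in enumerate(form) if c in grammar), None)
        if pos is None:
            found.add(form)
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals are never removed, so forms with too many can be dropped.
            terminals = sum(c not in grammar for c in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


# Strings over {a, b} of length <= 6 that contain aa, i.e. match (a + b)*aa(a + b)*.
expected = {w for n in range(7)
            for w in ("".join(p) for p in product("ab", repeat=n)) if "aa" in w}

print(sentences(CFG, 6) == sentences(REGULAR, 6) == expected)  # True
```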

21 Backus-Naur Form
A notation devised for defining the language Algol 60. PASCAL syntax rules are often presented in this form.
Example: ::= ::= real | integer | boolean ::= identifier | identifier
This formalism is equivalent to CFGs, where names enclosed in angle brackets (< >) are non-terminals, names in bold are terminals, and ::= is the same as the -> notation.

22 Constructing Grammars
Suppose we wanted to construct a grammar for the language of all strings of the form
accc...cb
or
abab...ab cc...c abab...ab (with ab repeated n times on each side)
We need to find rules to create:
(i) sequences of strings: ccc...c
(ii) bracketed strings: accc...cb, and
(iii) nested strings: abab...ab ... abab...ab
Sequencing
A -> aA | ε   or   A -> Aa | ε
e.g. A => aA => aaA => ... => aaaaaA => aaaaa
Bracketing
A -> aBb, B -> xB | ε   or   A -> Bb, B -> ax | Bx
e.g. A => aBb => axBb => axxBb => ... => axxxxxb

23 Constructing Grammars (cont.)
Nesting
A -> aAb | B
B -> xB | ε
e.g. A => aAb => aaAbb => aaaAbbb => ... => aaaaaAbbbbb => aaaaaBbbbbb => ... => aaaaaxxxBbbbbb => aaaaaxxxbbbbb
Example:
S -> abSab | abBab
B -> cB | c
What language does this generate? (Say it precisely)

24 Constructing Grammars (cont.)
Nesting
A -> aAb | B
B -> xB | ε
e.g. A => aAb => aaAbb => aaaAbbb => ... => aaaaaAbbbbb => aaaaaBbbbbb => ... => aaaaaxxxBbbbbb => aaaaaxxxbbbbb
Example:
S -> abSab | abBab
B -> cB | c
What language does this generate?
The language (ab)^n c^m (ab)^n (where n > 0 and m > 0)

25 Constructing Grammars (cont.)
Nesting
A -> aAb | B
B -> xB | ε
e.g. A => aAb => aaAbb => aaaAbbb => ... => aaaaaAbbbbb => aaaaaBbbbbb => ... => aaaaaxxxBbbbbb => aaaaaxxxbbbbb
Example:
S -> abSab | abBab
B -> cB | c
Example derivations:
S => abBab => abcBab => ... => abccccab
S => abSab => ababSabab => abababBababab => abababcBababab => ... => abababccccababab
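The nesting example can be checked from both directions: build a string by applying the productions, and test a string against the closed form (ab)^n c^m (ab)^n. The sketch below does both. The function names derive and in_language are invented for the illustration, and derive assumes n ≥ 1 and m ≥ 1.

```python
import re


def derive(n, m):
    """Derive (ab)^n c^m (ab)^n from S; assumes n >= 1 and m >= 1."""
    form = "S"
    for _ in range(n - 1):
        form = form.replace("S", "abSab")  # S -> abSab
    form = form.replace("S", "abBab")      # S -> abBab
    for _ in range(m - 1):
        form = form.replace("B", "cB")     # B -> cB
    return form.replace("B", "c")          # B -> c


def in_language(w):
    """Closed-form test: the same number of ab on both sides, and at least one c."""
    match = re.fullmatch(r"((?:ab)+)(c+)((?:ab)+)", w)
    return bool(match) and match.group(1) == match.group(3)


print(derive(1, 4))              # abccccab
print(derive(3, 4))              # abababccccababab
print(in_language("ababcab"))    # False
```

The regular expression alone cannot force the two ab blocks to have equal length, which is why the two captured groups are compared afterwards.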
