Languages and Grammars
  This is not English class, but there is a resemblance.
  Copyright © Curt Hill
Introduction
  We have already determined that some computations are impossible
    – The halting problem was one, and there are others
  What we want are models of computation that give us insight into what is and is not computable
  Strangely enough, models of computation are closely related to the complexity of languages
Languages
  Every natural language is spoken
  Usually written as well
  Such languages are extremely complicated
  Every language has a syntax and semantics
    – Syntax: the form that the language must have
    – Semantics: the meaning
  It may be difficult or impossible to assign meaning to a sentence that violates the syntax
Grammar
  The grammar of a language describes the syntax of that language
  Since natural languages are extremely complicated, we would expect their grammars to be complicated as well
  Perhaps you recall diagramming sentences from high school
    – This is confirming the syntax of a sentence
Natural Languages
  These are extremely complicated
  The grammar of a natural language can fill volumes of books
  See the text for some simple examples from English
Formal Languages
  In contrast with natural languages are formal languages
    – Artificial languages
  These are typically not designed for person to person communication
    – Rather person to machine or machine to machine
  In comparison with natural languages they:
    – Have very few rules
    – Have very few exceptions to these rules
Examples
  The largest class of these is likely programming languages
  There are others as well
  Mathematical notation may be considered a formal language even though it is designed for a form of person to person communication
Noam Chomsky
  Professor emeritus of linguistics at MIT
  Developed a theory of generative grammars
  This includes a language hierarchy
    – AKA Chomsky-Schützenberger Hierarchy
  Most of the theory of this section was developed by Chomsky
Phrase Structure Grammar
  A grammar, G, is a four-tuple: G = (V, T, S, P)
  V is the alphabet or vocabulary
  T is a set of terminal elements
  S is the start symbol or distinguished symbol
  P is a set of productions
    – Productions are rewrite rules
  A grammar should be able to enumerate any legal sentence of the language
Formal Grammars
  Each grammar consists of four things
  V: a finite vocabulary of symbols
    – Includes the non-terminals (aka variables) as well as the terminals
  T: a finite set of terminal symbols
    – Words made up from an alphabet
  S: the start symbol
    – Must be an element of V
  P: a set of productions
V
  A set of elements or symbols
  We may think about this as the character set
    – Although that is a little misleading
    – The alphabet of English is made up of letters, digits and punctuation
    – But not every combination of letters is a word
  Perhaps the better way to think about V is as words and stand-alone symbols
    – There is usually a rule for construction
T and N
  There is a set of terminal symbols, T, as well as a set of non-terminal symbols, N
    – T is a subset of V
  Terminals can exist in a legal instance of the language
  Non-terminals are concepts that need to be instantiated, that is, converted into concrete terminals
Examples
  In English any legal word is a terminal
  A concept like “noun phrase” is a non-terminal
    – This can be instantiated in a myriad of actual words
  In C++ the reserved word “for” or an identifier would be a terminal
  In C++ the concept “if statement” is a non-terminal
P
  A set of productions
  A production is a rewrite rule
  Form:
    – X → Y
  This means that we can rewrite X as Y
    – Since → is hard to type we often use ::=
  Each production must have at least one non-terminal on the left
  The complexity of these rules determines the type of language
S
  The start symbol or distinguished symbol
  This is a non-terminal from which all derivations start
  In English this is usually something like “sentence”
  In most programming languages it is something like “program” or “unit”
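The four components described so far can be held in a small data structure. The following is a minimal sketch, not a standard library type: the class name Grammar, its field names, and the is_well_formed check are assumptions made for illustration. It takes V to be the whole vocabulary (so that T is a subset of V), stores each production as a pair of symbol tuples, and uses the bit-string grammar from the examples in these slides as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for G = (V, T, S, P)."""
    vocabulary: frozenset   # V: every symbol, terminal or not
    terminals: frozenset    # T: must be a subset of V
    start: str              # S: must be a non-terminal
    productions: tuple      # P: pairs (left, right), each a tuple of symbols

    @property
    def nonterminals(self):
        # N is whatever is in V but not in T.
        return self.vocabulary - self.terminals

    def is_well_formed(self):
        """Check the structural rules stated on the slides."""
        if not self.terminals <= self.vocabulary:
            return False
        if self.start not in self.nonterminals:
            return False
        for left, right in self.productions:
            if not (set(left) | set(right)) <= self.vocabulary:
                return False
            # Each production needs at least one non-terminal on the left.
            if not any(symbol in self.nonterminals for symbol in left):
                return False
        return True


bits = Grammar(
    vocabulary=frozenset({"Z", "B", "0", "1"}),
    terminals=frozenset({"0", "1"}),
    start="Z",
    productions=(
        (("Z",), ("B",)),
        (("B",), ("B", "B")),
        (("B",), ("0",)),
        (("B",), ("1",)),
    ),
)
print(bits.is_well_formed())        # True
print(sorted(bits.nonterminals))    # ['B', 'Z']
```

Storing productions as pairs of tuples keeps multi-character symbols such as "noun phrase" possible, at the cost of slightly noisier literals.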
Grammars
  We should be able to produce two things from a grammar
    – A generator
    – A recognizer
  A generator should produce any legal string in the language
  A recognizer should determine if a string is legal or not
    – This process is part of parsing
Language Recognizer
  An automaton that reads in a purported construction in the language
  It answers yes or no: is this indeed in the language?
  Sometimes a reference recognizer is produced
  A recognizer is not a compiler
    – Its only purpose is to classify
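A recognizer for a very small grammar can be sketched as a brute-force search over derivations. The example below is an illustration only, not how practical parsers work: it uses the bit-string grammar Z → B, B → BB, B → 0, B → 1 from the examples in these slides, writes every symbol as a single character, and relies on the assumption that no production ever shortens a string, so any intermediate form longer than the candidate can be discarded. The function name recognizes is hypothetical.

```python
from collections import deque

TERMINALS = {"0", "1"}
START = "Z"
PRODUCTIONS = [("Z", "B"), ("B", "BB"), ("B", "0"), ("B", "1")]


def recognizes(candidate, productions=PRODUCTIONS, start=START):
    """Answer yes or no: can `candidate` be derived from `start`?

    Assumes single-character symbols and productions that never shrink
    a form, which lets the search prune anything longer than `candidate`.
    """
    if not set(candidate) <= TERMINALS:
        return False                      # a sentence holds only terminals
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == candidate:
            return True
        for left, right in productions:
            index = form.find(left)
            while index >= 0:             # try the rule at every position
                new_form = form[:index] + right + form[index + len(left):]
                if len(new_form) <= len(candidate) and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                index = form.find(left, index + 1)
    return False


print(recognizes("010"))   # True
print(recognizes(""))      # False: this grammar has no way to produce the empty string
```

The search is exponential in the length of the candidate, so it only serves to make the yes-or-no behavior concrete.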
Language generators
  Generates correct statements or correct programs
  If given enough time, it should generate every correct statement in the language
  Since it generates random correct statements it has some use in learning the syntax
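One way to picture a generator is as a breadth-first walk over every possible derivation. The sketch below is systematic rather than random, which is a deliberate substitution: it guarantees that each legal string shows up eventually. It assumes single-character symbols and uses the bit-string grammar Z → B, B → BB, B → 0, B → 1 from the examples in these slides; the function name sentences is hypothetical.

```python
from collections import deque
from itertools import islice

TERMINALS = {"0", "1"}
PRODUCTIONS = [("Z", "B"), ("B", "BB"), ("B", "0"), ("B", "1")]


def sentences(productions, start, terminals):
    """Yield each derivable terminal string once, shortest derivations first."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            yield form                    # nothing left to rewrite
            continue
        for left, right in productions:
            index = form.find(left)
            while index >= 0:             # apply the rule at every position
                new_form = form[:index] + right + form[index + len(left):]
                if new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                index = form.find(left, index + 1)


first_six = list(islice(sentences(PRODUCTIONS, "Z", TERMINALS), 6))
print(first_six)   # ['0', '1', '00', '01', '10', '11']
```

Because the language is infinite, the generator never finishes on its own; islice takes only as many strings as are wanted.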
Some Examples
  Let's consider a simple grammar that generates any bit string
  G = (V, T, S, P)
  V = {Z, B, 0, 1}
  T = {0, 1}
  S = Z
  P = {Z → B, B → BB, B → 0, B → 1}
  Terminals are 0 and 1
  Non-terminals are Z and B
Derivations
  Is the above grammar able to generate all possible bit strings?
  Let's consider a few:
  1 (start with Z)
    – Z → B (B)
    – B → 1 (1)
  10 (start with Z)
    – Z → B (B)
    – B → BB (BB)
    – B → 1 (1B)
    – B → 0 (10)
One More
  010 (start with Z)
    – Z → B (B)
    – B → BB (BB)
    – B → 0 (0B)
    – B → BB (0BB)
    – B → 1 (01B)
    – B → 0 (010)
  Are you convinced?
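The derivations above can be replayed mechanically, which is a quick way to catch a slip in a hand derivation. This is a minimal sketch under two assumptions: every symbol is a single character, and each step rewrites the leftmost occurrence of its left-hand side, which matches the order used on the slides. The helper names rewrite_leftmost and derive are hypothetical.

```python
def rewrite_leftmost(form, left, right):
    """Replace the leftmost occurrence of `left` in `form` with `right`."""
    index = form.find(left)
    if index < 0:
        raise ValueError(f"{left!r} does not occur in {form!r}")
    return form[:index] + right + form[index + len(left):]


def derive(start, steps):
    """Return every intermediate form produced by applying `steps` in order."""
    forms = [start]
    for left, right in steps:
        forms.append(rewrite_leftmost(forms[-1], left, right))
    return forms


steps_010 = [("Z", "B"), ("B", "BB"), ("B", "0"),
             ("B", "BB"), ("B", "1"), ("B", "0")]
print(" => ".join(derive("Z", steps_010)))
# Z => B => BB => 0B => 0BB => 01B => 010
```

Each printed form corresponds to the parenthesized result beside one production in the derivation of 010.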
Definitions
  A string may be derived from the start symbol if it is a legal construct of the language
  A string is a direct derivation from another if it is obtained by applying only one production
  A string is a derivation from another if it is obtained by one or more production applications
A Language
  Definition: The language of a grammar is the set of all possible strings that may be derived from the start symbol of the grammar
    – The finished string must contain only terminals
Other Way
  Let us now try one where we want a particular language and we have to come up with the grammar
  Let's consider the set of bit strings that start with 00 and end with a sequence of 1s
  As a regular expression: 00(0|1)*1+
    – 001, among others
The Grammar
  There is more than one grammar for the previous language
  G = (V, T, S, P)
  T = {0, 1}
  S = S
  What is P?
P
  Must have at least one production starting with S:
    – S → 00 B 1
  B then looks like the bit string of before:
    – B → 0
    – B → 1
    – B → BB
  What other possibilities could we have?
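A candidate grammar can be checked against the regular expression 00(0|1)*1+ by brute force over all short bit strings. The sketch below assumes single-character symbols and productions that never shrink a form; the function names derivable and mismatches are hypothetical. Run on the productions S → 00B1, B → 0, B → 1, B → BB, it reports that 001 matches the regular expression but is not derivable, because B always contributes at least one bit. So one answer to the question of other possibilities is an extra production such as S → 001.

```python
import re
from collections import deque
from itertools import product

TERMINALS = {"0", "1"}
PRODUCTIONS = [("S", "00B1"), ("B", "0"), ("B", "1"), ("B", "BB")]
PATTERN = re.compile(r"00(0|1)*1+")


def derivable(candidate, productions, start="S"):
    """Bounded breadth-first search; assumes no production shrinks a form."""
    if not set(candidate) <= TERMINALS:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == candidate:
            return True
        for left, right in productions:
            index = form.find(left)
            while index >= 0:
                new_form = form[:index] + right + form[index + len(left):]
                if len(new_form) <= len(candidate) and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                index = form.find(left, index + 1)
    return False


def mismatches(productions, max_length):
    """List bit strings on which the grammar and the regex disagree."""
    found = []
    for length in range(1, max_length + 1):
        for bits in product("01", repeat=length):
            text = "".join(bits)
            in_regex = PATTERN.fullmatch(text) is not None
            if derivable(text, productions) != in_regex:
                found.append(text)
    return found


print(mismatches(PRODUCTIONS, 6))                    # ['001']
print(mismatches(PRODUCTIONS + [("S", "001")], 6))   # []
```

A check up to length 6 is evidence rather than proof, but it is usually enough to expose a missing base case like this one.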
Audience Participation
  What is the grammar for the bit strings that look like this:
  0^h 1^j 0^k where h > 0, j > 0, k > 0
  This includes:
    – 010, among others
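When trying out candidate grammars for this exercise, it helps to have an independent membership test to compare against. The following sketch reads the pattern as one or more 0s, then one or more 1s, then one or more 0s, and checks it with a regular expression. It is a checking aid, not an answer to the exercise, and the name in_language is hypothetical.

```python
import re

# One or more 0s, then one or more 1s, then one or more 0s.
SHAPE = re.compile(r"0+1+0+")


def in_language(text):
    """Return True when `text` has the form 0^h 1^j 0^k with h, j, k > 0."""
    return SHAPE.fullmatch(text) is not None


for sample in ["010", "0011100", "0101", "01", ""]:
    print(repr(sample), in_language(sample))
```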
One Last Thing (or not)
  Finally, let's look at an example programming language
  A subset of C
C Subset as an Example
  V: the vocabulary, whose non-terminals include
    – Statement
    – Declaration
    – For-statement
  T: set of terminals
    – Reserved words
    – Punctuation
    – Identifiers
C example again
  S: Start symbol
    – Independently compilable part
    – Program
    – Function
    – Constant
  P: set of productions
    – Rewrite rules
    – Start at the start symbol
    – End at terminals
C For Production
  For-statement → for ( expression ; expression ; expression ) statement
  This contains the terminals:
    – for ( ; )
  Non-terminals:
    – Expression
    – Statement
Productions Again
  Every non-terminal must have one or more productions that define it
  Multiple productions usually signify alternation
  Recursion is allowed
Recursion
  Productions may be recursive
  Recall for-statement; here is Statement
    – Statement → expression ;
    – Statement → for-statement ;
    – Statement → if-statement ;
    – Statement → while-statement ;
    – Statement → compound-statement
    – Etc.
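The recursion between Statement and for-statement can be detected mechanically by asking which non-terminals are reachable from a given one. The sketch below stores productions as a dictionary from each non-terminal to its list of alternatives. Only the for-statement rule and three of the Statement alternatives come from the slides; the rules for compound-statement and expression are hypothetical placeholders, the if and while alternatives are left out because their rules are not given here, and the function names are likewise assumptions.

```python
GRAMMAR = {
    "statement": [
        ["expression", ";"],
        ["for-statement", ";"],        # written as on the slide
        ["compound-statement"],
    ],
    "for-statement": [
        ["for", "(", "expression", ";", "expression", ";",
         "expression", ")", "statement"],
    ],
    "compound-statement": [["{", "statement", "}"]],   # hypothetical rule
    "expression": [["identifier"]],                    # hypothetical placeholder
}


def reachable(grammar, symbol):
    """Non-terminals that can appear after one or more rewrites of `symbol`."""
    found = set()
    stack = [symbol]
    while stack:
        current = stack.pop()
        for alternative in grammar.get(current, []):
            for item in alternative:
                # Anything without its own entry is treated as a terminal.
                if item in grammar and item not in found:
                    found.add(item)
                    stack.append(item)
    return found


def is_recursive(grammar, symbol):
    """A non-terminal is recursive when it can reach itself."""
    return symbol in reachable(grammar, symbol)


for name in GRAMMAR:
    print(name, is_recursive(GRAMMAR, name))
```

In this sketch statement, for-statement and compound-statement all come out recursive, while expression does not, since its placeholder rule mentions only a terminal.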
Exercises
  13.1a
    – 1, 5, 13