
1 Transformational grammars
Anastasia Berdnikova & Denis Miretskiy

2 Transformational grammars
Overview
Transformational grammars – definition
Regular grammars
Context-free grammars
Context-sensitive grammars
Break
Stochastic grammars
Stochastic context-free grammars for sequence modelling

3 Why transformational grammars?
The 3-dimensional folding of proteins and nucleic acids
Extensive physical interactions between residues
Chomsky hierarchy of transformational grammars [Chomsky 1956; 1959]
Application to molecular biology [Searls 1992; Dong & Searls 1994; Rosenblueth et al. 1996]

4 Transformational grammars
Introduction
‘Colourless green ideas sleep furiously.’
Chomsky constructed finite formal machines – ‘grammars’.
‘Does the language contain this sentence?’ (intractable) is replaced by ‘Can the grammar create this sentence?’ (can be answered).
Transformational grammars (TG) are sometimes called generative grammars.

5 Transformational grammars
Definition
TG = ( {symbols}, {rewriting rules α → β, called productions} )
{symbols} = {nonterminals} ∪ {terminals}
α contains at least one nonterminal; β contains terminals and/or nonterminals.
S → aS, S → bS, S → ε (abbreviated S → aS | bS | ε)
Derivation: S => aS => abS => abbS => abb.
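To see the derivation mechanics concretely, here is a minimal Python sketch (mine, not from the slides) that repeatedly rewrites the nonterminal S using the productions S → aS | bS | ε and prints the derivation chain:

```python
import random

# Right-hand sides for the productions S -> aS | bS | epsilon.
PRODUCTIONS = ["aS", "bS", ""]

def derive(max_steps=20):
    """Rewrite S repeatedly, e.g. S => aS => abS => abb."""
    sentential_form = "S"
    steps = [sentential_form]
    for _ in range(max_steps):
        if "S" not in sentential_form:
            break  # only terminals remain: the derivation is complete
        rhs = random.choice(PRODUCTIONS)
        sentential_form = sentential_form.replace("S", rhs, 1)
        steps.append(sentential_form or "(empty string)")
    return " => ".join(steps)

print(derive())
```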

6 Transformational grammars
The Chomsky hierarchy
W is a nonterminal, a is a terminal; α and γ are strings of nonterminals and/or terminals, including the null string; β is the same, not including the null string.
regular grammars: W → aW or W → a
context-free grammars: W → β
context-sensitive grammars: α1Wα2 → α1βα2. Example: AB → BA.
unrestricted (phrase structure) grammars: α1Wα2 → γ

7 Transformational grammars
The Chomsky hierarchy
[Figure: the four grammar classes drawn as nested sets, from regular (innermost) to unrestricted (outermost).]

8 Transformational grammars
Automata
Each grammar has a corresponding abstract computational device – an automaton. Grammars are generative models; automata are parsers that accept or reject a given sequence.
- Automata are often easier to describe and understand than their equivalent grammars.
- Automata give a more concrete idea of how we might recognise a sequence using a formal grammar.

9 Parser abstractions associated with the hierarchy of grammars
Grammar                      Parsing automaton
regular grammars             finite state automaton
context-free grammars        push-down automaton
context-sensitive grammars   linear bounded automaton
unrestricted grammars        Turing machine

10 Transformational grammars
Regular grammars
W → aW or W → a; sometimes W → ε is also allowed.
Regular grammars generate sequences from left to right (or right to left: W → Wa or W → a).
Regular grammars cannot describe long-range correlations between the terminal symbols; they model ‘primary sequence’ only.

11 Transformational grammars
An odd regular grammar
An example of a regular grammar that generates only strings of a’s and b’s containing an odd number of a’s: start from S, with S → aT | bS and T → aS | bT | ε.

12 Transformational grammars
Finite state automata
The automaton reads one symbol at a time from an input string. If the symbol is accepted, the automaton enters a new state. If the symbol is not accepted, the automaton halts and rejects the string. If the automaton reaches a final ‘accepting’ state, the input string has been successfully recognised and parsed by the automaton.
The {states, state transitions} of an FSA correspond to the {nonterminals, productions} of the equivalent grammar.
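As an illustration (my own sketch, not from the slides), the odd-a grammar of the previous slide corresponds directly to a two-state finite state automaton; the states S and T play the roles of the grammar’s nonterminals, and T is the accepting state because of the production T → ε:

```python
# State S = even number of a's read so far, state T = odd number.
TRANSITIONS = {
    ("S", "a"): "T", ("S", "b"): "S",
    ("T", "a"): "S", ("T", "b"): "T",
}

def accepts(string):
    """Return True iff `string` over {a, b} has an odd number of a's."""
    state = "S"
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol not accepted: halt and reject
        state = TRANSITIONS[(state, symbol)]
    return state == "T"  # reached the accepting state

assert accepts("ab")          # one a: accepted
assert not accepts("abab")    # two a's: rejected
```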

13 FMR-1 triplet repeat region
Human FMR-1 mRNA sequence, fragment:
GCG CGG CGG CGG CGG CGG CGG CGG CGG CGG CGG AGG CGG CGG CGG CGG CGG CGG CGG CGG CGG CTG . . .
[Figure: a finite state automaton with states S, 1–8 and end state ε whose transitions, labelled with the nucleotides g, c, t and a, recognise the triplet repeat region.]

14 Moore vs. Mealy machines
Finite automata that accept on transitions are called Mealy machines. Finite automata that accept on states are called Moore machines (HMMs are of this kind). The two types of machines are interconvertible: S → gW1 in the Mealy machine corresponds to S → gŴ1, Ŵ1 → gW1 in the Moore machine.

15 Deterministic vs. nondeterministic automata
In a deterministic finite automaton, no more than one accepting transition is possible for any state and any input symbol. The FMR-1 automaton above is an example of a nondeterministic finite automaton. Parsing with a deterministic finite state automaton is extremely efficient (e.g. BLAST).

16 Transformational grammars
PROSITE patterns
RU1A_HUMAN  S R S L K M R G Q A F V I F K E V S S A T
SXLF_DROME  K L T G R P R G V A F V R Y N K R E E A Q
ROC_HUMAN   V G C S V H K G F A F V Q Y V N E R N A R
ELAV_DROME  G N D T Q T K G V G F I R F D K R E E A T
RNP-1 motif: [RK] – G – {EDRKHPCG} – [AGSCI] – [FY] – [LIVA] – x – [FYM]
A PROSITE pattern is a series of pattern elements separated by dashes. In a pattern element, a letter is the single-letter code for an amino acid; [] means any one of the enclosed residues can occur; {} means any residue except the enclosed ones can occur; x means any residue can occur at this position.

17 A regular grammar for PROSITE patterns
S → rW1 | kW1
W1 → gW2
W2 → [afilmnqstvwy]W3
W3 → [agsci]W4
W4 → fW5 | yW5
W5 → lW6 | iW6 | vW6 | aW6
W6 → [acdefghiklmnpqrstvwy]W7
W7 → f | y | m
([ac]W is shorthand for aW | cW)
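Because the pattern is a regular language, its elements map one-for-one onto regular expression constructs. A small sketch (mine, not from the slides; {…} becomes a negated character class and x becomes ‘.’):

```python
import re

# RNP-1 motif: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]
RNP1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYM]")

ru1a_fragment = "SRSLKMRGQAFVIFKEVSSAT"  # RU1A_HUMAN, from the previous slide
match = RNP1.search(ru1a_fragment)
print(match.group() if match else "no match")  # -> RGQAFVIF
```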

18 What a regular grammar can’t do
A regular grammar cannot describe a language L when:
L contains all strings of the form aa, bb, abba, baab, abaaba, etc. (a palindrome language);
L contains all strings of the form aa, abab, aabaab (a copy language).

19 Transformational grammars
Regular language:    a b a a a b
Palindrome language: a a b b a a
Copy language:       a a b a a b
Palindrome and copy languages have correlations between distant positions.

20 Context-free grammars
The motivation: RNA secondary structure is a kind of palindrome language. Context-free grammars (CFGs) permit additional rules that allow the grammar to create nested, long-distance pairwise correlations between terminal symbols.
S → aSa | bSb | aa | bb
S => aSa => aaSaa => aabSbaa => aabaabaa

21 A context-free grammar for an RNA stem loop
[Figure: three stem-loop sequences (seq 1–3) shown with their base-paired structures, e.g. G●C and U●A pairs in the stem.]
S → aW1u | cW1g | gW1c | uW1a
W1 → aW2u | cW2g | gW2c | uW2a
W2 → aW3u | cW3g | gW3c | uW3a
W3 → gaaa | gcaa
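To see what this grammar generates, it can be sampled directly; the sketch below (my own, choosing uniformly among each nonterminal’s productions) expands symbols recursively until only terminals remain. The digits 1–3 abbreviate W1–W3:

```python
import random

# The stem-loop CFG from this slide; digits 1-3 abbreviate W1-W3.
GRAMMAR = {
    "S": ["a1u", "c1g", "g1c", "u1a"],
    "1": ["a2u", "c2g", "g2c", "u2a"],
    "2": ["a3u", "c3g", "g3c", "u3a"],
    "3": ["gaaa", "gcaa"],
}

def sample(symbol="S"):
    """Expand `symbol` recursively with uniformly random production choices."""
    if symbol not in GRAMMAR:
        return symbol  # a terminal: emit it unchanged
    production = random.choice(GRAMMAR[symbol])
    return "".join(sample(s) for s in production)

print(sample())  # e.g. 'gccgcaaggc': a 3 bp stem closed by a 4-base loop
```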

22 Transformational grammars
Parse trees
The root is the start nonterminal S; the leaves are the terminal symbols of the sequence; the internal nodes are nonterminals. The children of an internal node are given by one of its productions. Any subtree derives a contiguous segment of the sequence.
[Figure: parse trees for the stem-loop sequences, with base-paired terminals such as C●G and G●C attached to nested W nonterminals.]

23 Parse tree for a PROSITE pattern
[Figure: parse tree for the RNP-1 motif RGQAFVIF, with nonterminals W1–W7 above the terminals r g q a f v i f.]
Regular grammars are linear special cases of context-free grammars. A parse tree for a regular grammar is a standard linear alignment of the grammar’s nonterminals to the sequence’s terminals.

24 Transformational grammars
Push-down automata
The parsing automaton for CFGs is called a push-down automaton. A limited number of symbols are kept in a push-down stack. A push-down automaton parses a sequence from left to right according to the algorithm on the next slide. The stack is initialised by pushing the start nonterminal onto it. The steps are iterated until no input symbols remain; if the stack is empty at the end, the sequence has been successfully parsed.

25 Algorithm: Parsing with a push-down automaton
Pop a symbol off the stack.
If the popped symbol is a nonterminal:
- Peek ahead in the input from the current position and choose a valid production for the nonterminal. If there is no valid production, terminate and reject the sequence.
- Push the right side of the chosen production onto the stack, rightmost symbols first.
If the popped symbol is a terminal:
- Compare it to the current symbol of the input. If it matches, move the automaton to the right on the input (the input symbol is accepted). If it does not match, terminate and reject the sequence.
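Here is a push-down parser for the stem-loop grammar of slide 21 (my own sketch, not from the slides). Choosing a production by peeking at the next input symbols works for that grammar, but not for every context-free grammar:

```python
GRAMMAR = {
    "S": ["a1u", "c1g", "g1c", "u1a"],
    "1": ["a2u", "c2g", "g2c", "u2a"],
    "2": ["a3u", "c3g", "g3c", "u3a"],
    "3": ["gaaa", "gcaa"],
}

def parse(seq):
    """Return True iff `seq` is derivable from S (push-down automaton)."""
    stack, pos = ["S"], 0
    while stack:
        symbol = stack.pop()
        if symbol in GRAMMAR:          # nonterminal: peek ahead, pick a production
            peek = seq[pos:pos + 2]
            valid = [p for p in GRAMMAR[symbol]
                     if peek.startswith(p[0]) and (symbol != "3" or p.startswith(peek))]
            if not valid:
                return False           # no valid production: reject
            stack.extend(reversed(valid[0]))  # push rightmost symbols first
        elif pos < len(seq) and seq[pos] == symbol:
            pos += 1                   # terminal matches: accept, move right
        else:
            return False               # terminal mismatch: reject
    return pos == len(seq)             # accept iff stack and input are both empty

print(parse("gccgcaaggc"))  # True  (the string parsed on the next slide)
print(parse("gccgcaagga"))  # False
```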

26 Parsing an RNA stem loop with a push-down automaton
Input string   Stack     Automaton operation on stack and input
GCCGCAAGGC     S         Pop S. Peek at input; produce S → g1c.
GCCGCAAGGC     g1c       Pop g. Accept g; move right on input.
GCCGCAAGGC     1c        Pop 1. Peek at input; produce 1 → c2g.
GCCGCAAGGC     c2gc      Pop c. Accept c; move right on input.
GCCGCAAGGC     2gc       Pop 2. Peek at input; produce 2 → c3g.
GCCGCAAGGC     c3ggc     Pop c. Accept c; move right on input.
(several acceptances)
GCCGCAAGGC     c         Pop c. Accept c; move right on input.
GCCGCAAGGC     (empty)   Stack empty. Input string empty. Accept.
(1, 2, 3 abbreviate the nonterminals W1, W2, W3.)

27 Context-sensitive grammars
Copy language: cc, acca, agaccaga, etc.
initialisation:         S → CW
nonterminal generation: W → AÂW | GĜW | C
nonterminal reordering: ÂA → AÂ, ÂG → GÂ, ĜA → AĜ, ĜG → GĜ
terminal generation:    CA → aC, CG → gC, ÂC → Ca, ĜC → Cg
termination:            CC → cc

28 Linear bounded automaton
A linear bounded automaton is a mechanism for working backwards through all possible derivations: either the start nonterminal is reached, or no valid derivation is found. There is a finite number of possible derivations to examine. Abstractly, the automaton consists of a ‘tape’ of linear memory and a read/write head. The number of possible derivations is, however, exponentially large.

29 NP problems and ‘intractability’
Nondeterministic polynomial (NP) problems: there is no known polynomial-time algorithm for finding a solution, but a solution can be checked for correctness in polynomial time. Parsing with context-sensitive grammars is such a problem. NP-complete problems are a subclass of NP problems: a polynomial-time algorithm that solves one NP-complete problem would solve all of them. (Context-free grammar parsing, by contrast, can be done in polynomial time, as the CYK algorithm later in this talk shows.)

30 Unrestricted grammars and Turing machines
The left and right sides of the production rules can be any combinations of symbols. The parsing automaton is a Turing machine. There is no general algorithm for deciding whether a string has a valid derivation in less than infinite time.

31 Transformational grammars
Stochastic grammars
A stochastic grammar model generates different strings x with probabilities P(x | θ), where θ denotes the model’s probability parameters.
A non-stochastic grammar either generates a string x or does not.
For stochastic regular and context-free grammars, Σx P(x | θ) = 1.

32 Example of stochastic grammar
For the production rule S → aS | bS, a stochastic regular grammar might assign probabilities of 0.5 to each of the productions:
P(S → aS) = P(S → bS) = 0.5
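For the grammar to define a proper probability distribution over finite strings, some probability must also go to a terminating production. The toy sampler below (my own sketch, with made-up numbers P(S → aS) = P(S → bS) = 0.45 and P(S → ε) = 0.1) illustrates generation from such a grammar:

```python
import random

# A proper stochastic regular grammar: the probabilities from S sum to 1.
PRODUCTIONS = [("aS", 0.45), ("bS", 0.45), ("", 0.10)]

def sample_string():
    """Generate one string from the grammar, left to right."""
    out = []
    while True:
        rhs, = random.choices([p for p, _ in PRODUCTIONS],
                              weights=[w for _, w in PRODUCTIONS])
        if rhs == "":
            return "".join(out)  # the termination production was chosen
        out.append(rhs[0])       # emit the terminal and stay in state S

print([sample_string() for _ in range(5)])
```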

33 Alternative probabilities
Exceptions can be admitted without grossly degrading a grammar. Exceptions should have low, but non-zero, probabilities.

34 Stochastic context-sensitive or unrestricted grammars
Consider a context-sensitive grammar that generates the strings {aa, ab, ba, bb}. In general, assigning a probability to each production of a context-sensitive or unrestricted grammar does not automatically yield a properly normalised distribution over strings.

35 Stochastic context-sensitive grammar
For such a grammar, the sum of the probabilities of all possible productions from any nonterminal is 1 only if certain constraints among the production probabilities hold.

36 Proper stochastic grammar
The previous grammar can be changed so that the production probabilities from each nonterminal do sum to 1, giving a proper stochastic grammar.

37 Hidden Markov models and stochastic regular grammars
Any HMM state that makes N transitions to new states, each of which emits one of M symbols, can also be modelled by a set of NM stochastic regular grammar productions.
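A small worked example (mine, with made-up numbers; it assumes the convention that a production Wk → bWl carries probability akl · el(b), i.e. transition followed by emission in the new state). One state with N = 2 transitions and M = 2 symbols yields NM = 4 productions:

```python
# Hypothetical HMM fragment: transitions a[k][l] and emissions e[l][sym].
a = {"W1": {"W1": 0.7, "W2": 0.3}}        # N = 2 transitions out of W1
e = {"W1": {"a": 0.9, "b": 0.1},          # M = 2 symbols per state
     "W2": {"a": 0.2, "b": 0.8}}

# One production Wk -> sym Wl per (transition, symbol) pair: NM in total.
productions = {(k, sym, l): a_kl * e[l][sym]
               for k, row in a.items()
               for l, a_kl in row.items()
               for sym in e[l]}

for (k, sym, l), p in sorted(productions.items()):
    print(f"P({k} -> {sym}{l}) = {p:.2f}")
# The four probabilities sum to 1, as a proper stochastic grammar requires.
```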

38 Stochastic context-free grammars for sequence modeling
We can use stochastic context-free grammars for sequence modelling. To do so, we must solve the following problems:
(i) Calculate an optimal alignment of a sequence to a parameterised stochastic grammar (the alignment problem).

39 Transformational grammars
Other problems
(ii) Calculate the probability of a sequence given a parameterised stochastic grammar (the scoring problem).
(iii) Given a set of example sequences, estimate optimal probability parameters for an unparameterised stochastic grammar (the training problem).

40 Normal forms for stochastic context-free grammars
In Chomsky normal form, all production rules are of the form Wv → WyWz or Wv → a. For example, a production rule of the form S → aSb could be expanded to S → WaW1, W1 → SWb, Wa → a, Wb → b in Chomsky normal form.

41 The inside-outside algorithm for SCFGs
The inside-outside algorithm for SCFGs in Chomsky normal form is the natural counterpart of the forward-backward algorithm for HMMs. The computational complexity of the inside-outside algorithm, however, is substantially greater.

42 Transformational grammars
The inside algorithm
Suppose we have a Chomsky normal form SCFG with M nonterminals W1, W2, …, WM, starting from W1.
The production rules are Wv → WyWz and Wv → a.
The probability parameters for these productions are tv(y,z) and ev(a) respectively.

43 Transformational grammars
The inside algorithm
The algorithm calculates the probability α(i,j,v) of a parse subtree rooted at nonterminal Wv for the subsequence xi,…,xj, for all i, j and v.
The calculation requires a three-dimensional dynamic programming matrix.

44 Transformational grammars
Algorithm: Inside
Initialisation: for i = 1 to L, v = 1 to M:
    α(i,i,v) = ev(xi)
Iteration: for i = L−1 down to 1, j = i+1 to L, v = 1 to M:
    α(i,j,v) = Σy Σz Σk=i..j−1 α(i,k,y) α(k+1,j,z) tv(y,z)
Termination:
    P(x | θ) = α(1,L,1)
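A direct transcription of the recursion into Python (my sketch, not from the slides; t[v][(y,z)] and e[v][a] encode the parameters tv(y,z) and ev(a), with nonterminals numbered 1..M and 1 the start):

```python
from collections import defaultdict

def inside(x, M, t, e):
    """Inside algorithm for a CNF SCFG; returns P(x | theta) = alpha(1, L, 1)."""
    L = len(x)
    alpha = defaultdict(float)        # alpha[(i, j, v)], indices 1-based
    for i in range(1, L + 1):         # initialisation: alpha(i,i,v) = ev(xi)
        for v in range(1, M + 1):
            alpha[(i, i, v)] = e[v].get(x[i - 1], 0.0)
    for i in range(L - 1, 0, -1):     # iteration: shorter subsequences first
        for j in range(i + 1, L + 1):
            for v in range(1, M + 1):
                total = 0.0
                for (y, z), tv in t[v].items():
                    for k in range(i, j):
                        total += alpha[(i, k, y)] * alpha[(k + 1, j, z)] * tv
                alpha[(i, j, v)] = total
    return alpha[(1, L, 1)]           # termination

# Toy CNF grammar: W1 -> W1 W1 (prob 0.5) | a (prob 0.5)
t = {1: {(1, 1): 0.5}}
e = {1: {"a": 0.5}}
print(inside("aa", 1, t, e))  # 0.125 = 0.5 * 0.5 * 0.5
```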

45 Iteration step of the inside algorithm

46 Transformational grammars
The outside algorithm
The algorithm calculates the probability β(i,j,v) of a complete parse tree rooted at the start nonterminal for the sequence x1,…,xL, excluding all parse subtrees for the subsequence xi,…,xj rooted at nonterminal Wv, for all i, j and v.

47 Transformational grammars
The outside algorithm
The calculation requires a three-dimensional dynamic programming matrix, like the inside algorithm.
Calculating β(i,j,v) requires the results α from a previous inside calculation.

48 Transformational grammars
Algorithm: Outside
Initialisation: β(1,L,1) = 1; β(1,L,v) = 0 for v = 2 to M.
Iteration: for i = 1 to L, j = L down to i, v = 1 to M:
    β(i,j,v) = Σy,z Σk=1..i−1 α(k,i−1,z) β(k,j,y) ty(z,v) + Σy,z Σk=j+1..L α(j+1,k,z) β(i,k,y) ty(v,z)
Termination:
    P(x | θ) = Σv β(i,i,v) ev(xi), for any i
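A matching sketch of the outside recursion (mine, not from the slides; it consumes the alpha matrix built by the inside() sketch above, so inside() would need to return its full matrix rather than just alpha(1, L, 1)):

```python
from collections import defaultdict

def outside(x, M, t, alpha):
    """Outside algorithm for a CNF SCFG; returns the matrix beta[(i, j, v)]."""
    L = len(x)
    beta = defaultdict(float)
    beta[(1, L, 1)] = 1.0                 # initialisation: the start nonterminal
    for i in range(1, L + 1):             # iteration: larger excluded spans first
        for j in range(L, i - 1, -1):
            if (i, j) == (1, L):
                continue                  # keep the initialisation values
            for v in range(1, M + 1):
                total = 0.0
                for y in range(1, M + 1):
                    for (lft, rgt), ty in t[y].items():
                        if rgt == v:      # Wv is the right child of Wy -> W_lft Wv
                            for k in range(1, i):
                                total += alpha[(k, i - 1, lft)] * beta[(k, j, y)] * ty
                        if lft == v:      # Wv is the left child of Wy -> Wv W_rgt
                            for k in range(j + 1, L + 1):
                                total += alpha[(j + 1, k, rgt)] * beta[(i, k, y)] * ty
                beta[(i, j, v)] = total
    return beta
```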

49 Iteration step of the outside algorithm

50 Parameter re-estimation by expectation maximisation

51 Parameter re-estimation by expectation maximisation
The re-estimation equation for the probabilities of the production rules Wv → WyWz is
    t̂v(y,z) = c(v → yz) / c(v)
where c(v → yz) = (1/P(x|θ)) Σi<j Σk=i..j−1 β(i,j,v) tv(y,z) α(i,k,y) α(k+1,j,z) is the expected number of times the production is used, and c(v) = (1/P(x|θ)) Σi≤j α(i,j,v) β(i,j,v) is the expected number of times Wv is used.
For a production rule Wv → a:
    êv(a) = c(v → a) / c(v), with c(v → a) = (1/P(x|θ)) Σi: xi=a β(i,i,v) ev(a)

52 The CYK alignment algorithm
Initialisation: for i = 1 to L, v = 1 to M:
    γ(i,i,v) = log ev(xi); τ(i,i,v) = (0,0,0)
Iteration: for i = L−1 down to 1, j = i+1 to L, v = 1 to M:
    γ(i,j,v) = maxy,z maxk=i..j−1 [ γ(i,k,y) + γ(k+1,j,z) + log tv(y,z) ]
    τ(i,j,v) = the argmax (y, z, k) of the same expression
Termination:
    log P(x, π̂ | θ) = γ(1,L,1)
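The CYK recursion is the inside algorithm with max in place of sum, in log space; a sketch (mine, not from the slides) that also records the traceback pointers τ used on the next slide:

```python
import math
from collections import defaultdict

def cyk(x, M, t, e):
    """CYK for a CNF SCFG: returns (log P(x, best parse | theta), tau)."""
    L, NEG = len(x), float("-inf")
    gamma = defaultdict(lambda: NEG)      # gamma[(i, j, v)]: best log score
    tau = {}                              # tau[(i, j, v)] = (y, z, k)
    for i in range(1, L + 1):             # initialisation
        for v in range(1, M + 1):
            p = e[v].get(x[i - 1], 0.0)
            gamma[(i, i, v)] = math.log(p) if p > 0 else NEG
            tau[(i, i, v)] = (0, 0, 0)    # (0,0,0) marks a leaf for the traceback
    for i in range(L - 1, 0, -1):         # iteration: shorter subsequences first
        for j in range(i + 1, L + 1):
            for v in range(1, M + 1):
                for (y, z), tv in t[v].items():
                    for k in range(i, j):
                        s = gamma[(i, k, y)] + gamma[(k + 1, j, z)] + math.log(tv)
                        if s > gamma[(i, j, v)]:
                            gamma[(i, j, v)], tau[(i, j, v)] = s, (y, z, k)
    return gamma[(1, L, 1)], tau          # termination
```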

53 Transformational grammars
CYK traceback
Initialisation: push (1, L, 1) onto the stack.
Iteration: pop (i, j, v); set (y, z, k) = τ(i,j,v).
If τ(i,j,v) = (0,0,0), attach xi as a child of v.
Else attach y and z to the parse tree as children of v; push (k+1, j, z); push (i, k, y).

54 Summary
Goal                      HMM algorithm      SCFG algorithm
optimal alignment         Viterbi            CYK
P(x | θ) (scoring)        forward            inside
EM parameter estimation   forward-backward   inside-outside
Memory complexity         O(LM)              O(L²M)
Time complexity           O(LM²)             O(L³M³)

