Transformational Grammars and PROSITE Patterns Roland Miezianko CIS 595 - Bioinformatics Prof. Vucetic.

Presentation transcript:

Transformational Grammars and PROSITE Patterns
Roland Miezianko
CIS 595 - Bioinformatics
Prof. Vucetic

Agenda
Transformational Grammars
–Definition
–The Chomsky Hierarchy
Finite State Automata
–FMR-1 Triplet Repeat Region
–Regular Grammar Example
PROSITE
–Patterns in Regular Grammar Form

Assumptions
Until now, we have treated biological sequences as one-dimensional strings of independent, uncorrelated symbols.
To understand secondary structures, we need to address the interactions among base pairs.

Secondary Structures
The 3-D folding of proteins and nucleic acids involves extensive physical interactions between residues that are not adjacent in the primary sequence. [1]
We require a model of secondary structure that reflects the interactions among base pairs.

Modeling Strings
General theories for modeling strings of symbols have been developed by computational linguists.
–Chomsky in 1956, 1959
–Interested in how a brain or computer program could algorithmically determine whether a sentence is grammatical or not

Transformational Grammars
Transformational grammars consist of:
–Symbols
  Abstract nonterminal symbols
  Terminal symbols
–Rewriting rules (productions)
  A --> B

Transformational Grammars, Example
Example grammar:
Two-letter terminal alphabet: {a, b}
Single nonterminal letter: S
Three productions: S->aS, S->bS, S->e (e = special blank terminal symbol)
Example derivation in our simple grammar: S->aS->abS->abbS->abb
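The derivation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the names `PRODUCTIONS` and `derive` are hypothetical, and the empty string stands in for the blank symbol e.

```python
# Productions of the example grammar; "" stands in for the blank symbol e.
PRODUCTIONS = {"S": ["aS", "bS", ""]}


def derive(choices, start="S"):
    """Rewrite the single nonterminal S once per entry in `choices`.

    Returns every intermediate string of the derivation, start included.
    """
    forms = [start]
    current = start
    for rhs in choices:
        if rhs not in PRODUCTIONS["S"]:
            raise ValueError(f"not a production of S: {rhs!r}")
        if "S" not in current:
            raise ValueError("no nonterminal left to rewrite")
        # Replace the (only) occurrence of S with the chosen right-hand side.
        current = current.replace("S", rhs, 1)
        forms.append(current)
    return forms


if __name__ == "__main__":
    # Replays S -> aS -> abS -> abbS -> abb
    print(" -> ".join(derive(["aS", "bS", "bS", ""])))
```

Because every production keeps at most one S at the right end of the string, this grammar can generate any string over {a, b}, one symbol per step.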

Chomsky Hierarchy
Four types of restrictions on a grammar's productions result in four classes of grammars.
–Regular Grammars
–Context-Free Grammars
–Context-Sensitive Grammars
–Unrestricted Grammars

Chomsky Hierarchy
Each class is contained in the next: regular ⊂ context-free ⊂ context-sensitive ⊂ unrestricted

Automata
Each grammar class has a corresponding abstract computational device, called an automaton.

Grammar             Parsing Automaton
Regular             Finite State
Context-Free        Push-Down
Context-Sensitive   Linear Bounded
Unrestricted        Turing Machine

FMR-1 Triplet Repeat Region
The FMR-1 gene sequence contains the triplet CGG, which is repeated a number of times.
The number of triplets is highly variable between individuals.
Increased copy number is associated with a genetic disease.

FMR-1 Triplet Repeat Region
The FSA will match any string from the "language" that contains the strings:
GCG CTG
GCG CGG CTG
GCG CGG CGG CTG
GCG CGG CGG CGG CGG … CTG
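A finite state automaton for the strings listed above can be sketched as a transition table. This is an illustrative Python sketch under stated assumptions, not the automaton from the original slide: the state numbering, `TRANSITIONS`, `ACCEPT`, and `accepts` are hypothetical, and zero CGG copies are accepted only because the list above includes GCG CTG.

```python
# Hypothetical DFA: a leading GCG, any number of CGG repeats, a closing CTG.
TRANSITIONS = {
    (0, "G"): 1, (1, "C"): 2, (2, "G"): 3,  # leading GCG
    (3, "C"): 4,                            # first base of CGG or CTG
    (4, "G"): 5, (5, "G"): 3,               # finish one CGG, loop back
    (4, "T"): 6, (6, "G"): 7,               # finish the closing CTG
}
ACCEPT = {7}


def accepts(seq):
    """Return True if `seq` (spaces ignored) is in the automaton's language."""
    state = 0
    for base in seq.replace(" ", "").upper():
        state = TRANSITIONS.get((state, base))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPT


if __name__ == "__main__":
    for s in ["GCG CTG", "GCG CGG CGG CTG", "GCG CGG", "GCG CAG CTG"]:
        print(s, accepts(s))
```

The single loop through states 3, 4, and 5 is what lets a fixed number of states accept a variable number of repeats.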

FMR-1 Triplet Repeat Region

Regular Grammar
The regular grammar for our finite state automaton finds any number of copies of CGG.

PROSITE Patterns
The PROSITE database is an example of a biological application of regular grammars.
–Unlike methods that assign scores to alignments, PROSITE patterns either match a sequence or do not.

PROSITE Patterns
A pattern consists of a string of pattern elements separated by dashes and terminated by a period.
–Single letter: that residue
–[ ]: any one of the enclosed letters
–{ }: any letter except the enclosed letters
–x: any residue can occur
–x(y): a run of y residues of any type

PROSITE Patterns
RNP-1 motif:
[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM].
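Because PROSITE patterns are regular, a pattern such as the RNP-1 motif can be translated directly into a regular expression. The sketch below uses Python's standard `re` module; the function name `prosite_to_regex` is hypothetical, and it handles only the element types described in this presentation (single letter, [ ], { }, x, and a repeat count in parentheses), not the full PROSITE syntax.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    # Drop the terminating period, then split on the dashes between elements.
    for element in pattern.strip().rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, count = m.group(1), m.group(2)
        if core in ("x", "X"):
            piece = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            piece = core                       # any one enclosed letter
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"    # anything but enclosed letters
        elif len(core) == 1 and core.isalpha():
            piece = core                       # a single specific residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if count:
            piece += "{" + count + "}"         # element repeated `count` times
        parts.append(piece)
    return "".join(parts)


if __name__ == "__main__":
    rnp1 = "[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]."
    regex = prosite_to_regex(rnp1)
    print(regex)
    # KGFAFVAF is a made-up test string, not a real protein sequence.
    print(bool(re.search(regex, "KGFAFVAF")))
```

As with PROSITE itself, the result is a yes-or-no match with no score attached.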

Conclusion
Transformational grammars are useful for developing acceptors of sequences of different lengths and for matching specific multi-sequence regions.
Higher-order grammars in the Chomsky hierarchy are more difficult to program and apply.

References
[1] Durbin, R. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
[2] Gibson, G. A Primer of Genome Science. Sinauer Associates, Inc. Publishers.
[3] Mount, D. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, 2001.
[4] PROSITE Database

Transformational Grammars and PROSITE Patterns
Questions and Answers