Sónia Martins Bruno Martins José Cruz IGC, February 20 th, 2008.

Slides:



Advertisements
Similar presentations
B. Knudsen and J. Hein Department of Genetics and Ecology
Advertisements

Natural Language Processing - Formal Language - (formal) Language (formal) Grammar.
BIOINFORMATICS GENE DISCOVERY BIOINFORMATICS AND GENE DISCOVERY Iosif Vaisman 1998 UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL Bioinformatics Tutorials.
Transformational Grammars The Chomsky hierarchy of grammars Context-free grammars describe languages that regular grammars can’t Unrestricted Context-sensitive.
May 2006CLINT-LN Parsing1 Computational Linguistics Introduction Approaches to Parsing.
Stochastic Context Free Grammars for RNA Modeling CS 838 Mark Craven May 2001.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
10. Lexicalized and Probabilistic Parsing -Speech and Language Processing- 발표자 : 정영임 발표일 :
Hidden Markov Models I Biology 162 Computational Genetics Todd Vision 14 Sep 2004.
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Project 4 Information discovery using Stochastic Context-Free Grammars(SCFG) Wei Du Ranjan Santra May 16, 2001.
Grammar induction by Bayesian model averaging Guy Lebanon LARG meeting May 2001 Based on Andreas Stolcke’s thesis UC Berkeley 1994.
Noncoding RNA Genes Pt. 2 SCFGs CS374 Vincent Dorie.
Normal forms for Context-Free Grammars
CISC667, F05, Lec19, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) RNA secondary structure.
Statistical Natural Language Processing Advanced AI - Part II Luc De Raedt University of Freiburg WS 2005/2006 Many slides taken from Helmut Schmid.
Context-free Grammars
More on Text Management. Context Free Grammars Context Free Grammars are a more natural model for Natural Language Syntax rules are very easy to formulate.
Project No. 4 Information discovery using Stochastic Context-Free Grammars(SCFG) Wei Du Ranjan Santra May
Comparative Genomics & Annotation The Foundation of Comparative Genomics The main methodological tasks of CG Annotation: Protein Gene Finding RNA Structure.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
1 Introduction to Automata Theory Reading: Chapter 1.
Evolution of Universal Grammar Pia Göser Universität Tübingen Seminar: Sprachevolution Dozent: Prof. Jäger
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi and introduced in Tree-adjoining grammars are somewhat similar to context-free.
Winter 2007SEG2101 Chapter 71 Chapter 7 Introduction to Languages and Compiler.
Some Probability Theory and Computational models A short overview.
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
Grammars CPSC 5135.
CS 3240: Languages and Computation Context-Free Languages.
Transformational Grammars and PROSITE Patterns Roland Miezianko CIS Bioinformatics Prof. Vucetic.
CSCI 6900/4900 Special Topics in Computer Science Automata and Formal Grammars for Bioinformatics Bioinformatics problems sequence comparison pattern/structure.
Copyright © Curt Hill Languages and Grammars This is not English Class. But there is a resemblance.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 4.
RNA Structure Prediction Including Pseudoknots Based on Stochastic Multiple Context-Free Grammar PMSB2006, June 18, Tuusula, Finland Yuki Kato, Hiroyuki.
November 2011CLINT-LN CFG1 Computational Linguistics Introduction Context Free Grammars.
AdvancedBioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2002 Mark Craven Dept. of Biostatistics & Medical Informatics.
Hidden Markov Models in Bioinformatics
Probabilistic Context Free Grammars Grant Schindler 8803-MDM April 27, 2006.
Applications of HMMs in Computational Biology BMI/CS 576 Colin Dewey Fall 2010.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Formal Languages and Grammars
Context Free Grammars and Regular Grammars Needs for CFG Grammars and Production Rules Context Free Grammars (CFG) Regular Grammars (RG)
Grammar Set of variables Set of terminal symbols Start variable Set of Production rules.
Formal grammars A formal grammar is a system for defining the syntax of a language by specifying sequences of symbols or sentences that are considered.
Genetic Programming. What is Genetic Programming? GP for Symbolic Regression Other Representations for GP Example of GP for Knowledge Discovery Outline.
Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.
Estimation of Distribution Algorithm and Genetic Programming Structure Complexity Lab,Seoul National University KIM KANGIL.
Context Free Grammars & Parsing CPSC 388 Fall 2001 Ellen Walker Hiram College.
Chapter 2. Formal Languages Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
Genome Annotation (protein coding genes)
Programming Languages Translator
Hidden Markov Models (HMM)
Stochastic Context-Free Grammars for Modeling RNA
Natural Language Processing - Formal Language -
Context free grammar.
PARSE TREES.
Course 2 Introduction to Formal Languages and Automata Theory (part 2)
CSE322 Chomsky classification
Formal Language.
Stochastic Context-Free Grammars for Modeling RNA
Catalogues, Homology & Molecular Evolution.
Regular Grammar.
The Language of genes Su Dong Kim Biointelligence Lab.
Decidable Problems of Regular Languages
Stochastic Context Free Grammars for RNA Structure Modeling
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CISC 467/667 Intro to Bioinformatics (Spring 2007) RNA secondary structure CISC667, S07, Lec19, Liao.
High-Level Programming Language
Presentation transcript:

Sónia Martins Bruno Martins José Cruz IGC, February 20 th, 2008

What are Grammars? “A formal grammar is: a precise description of a formal language.” in Wikipedia “A formal language is: An organized set of symbols (…) that (…) can be precisely defined in terms of just the shapes and locations of those symbols. Such a language can be defined, then, without any reference to any meanings of any of its expressions; it can exist before any interpretation is assigned to it --that is, before it has any meaning.” in Wikipedia A computer language (e.g. python) is the most handy example of a formal language. In our case the language can be: secondary structure of RNA, protein fold, sequence alignment...

The grammar G = {T, N, P} consists of: T – A finite set of terminal elements N – A finite set of non-terminal elements P – A finite set of production rules Example: T = {a, b, c}, the “words” of our language N = {S, A, B, C} P: S → A | B | C | ε A → aS | a B → baS C → caS This grammar can produce the following texts: a (from S1, A2,) aaaa (from S1, A1, S1, A1, S1, A1, S1, A1) ba (from S2, B, S4) bacaba (from S2, B, S3, C, S2, B, S4) …and a infinity of other useless garbage. How we define grammars?

Chomsky classification: Type 3: Regular(A → αA | α) Type 2: Context Free (A → γ) Type 1: Context Sensitive(αAβ → αγβ) Type 0: Recursively Enumerable(α → β) α, β: sequence of terminals γ: non empty sequence of terminals and non-terminals Types of grammars

Linguistics in sequence analysis Evolutionary proccess is time reversible L 1 - all parameters and rules are know L 2 - set of probabilities derived from the evolution of L 1

Languages and application areas Stochastic Regular (SRG): Hidden Markov Models Protein gene finding Comparative genomics

Languages and application areas RNA structure prediction Gene finding Stochastic Context Free (SCFG): S → LS (.87) | L (.13) F → dFd (.79) | LS (.21) L → s (.89) | dFd (.11) Example: s s d d d s s s s d d d s s s s s Knudsen et al., Bioinformatics, 15, 6, 1999, Time complexity: O(N 3 )

Languages and application areas Stochastic tree adjoining (STAG): Model RNA pseudoknots Complex protein structures Matsui et al., Bioinformatics, 21, 11, 2005, (A(G[AC)U)U] Model of non nested dependencies

Grammar Type Non Terminal Symbols Rules Parameter For a given set of terminal symbols Evolve (where)?

Non Terminal Symbols Rules Parameter Exploring Mechanism: Random Walk Genetic Algorithm AI Search Algorithm Scoring Mechanism: RNA – Energy function Protein – Energy function Aligment Likelyhood Evolve (how)?

We can define a grammar to define grammars: R1: S → RS (p i ) | R (p i ) R2: R → NT '→' L (p i ) R3: L → NT L(p i )| T L(p i ) | NT(p i ) | T(p i ) R4: NT → 'S'(p i )| 'L' (p i ) | 'F' (p i ) R5: T → 's'(p i ) | 'd' (p i ) Meta Grammar

Meta Grammar: R1: S → RS | R R2: R → NT '→' L R3: L → NT L | T L | NT | T R4: NT → 'S' | 'L' | 'F' R5: T → 's' | 'd' Lets try to evolve the rule S→LS|L to S→LS|F: S S→ LS S→ L S→ LS S→ F Meta Grammar R1a R2 R4a R3a R4b R3c R4a R1b R2 R4a R3c R4b R1a R2 R4a R3a R4b R3c R4a R1b R2 R4c R3c R4b

Search space too big - combinatorial explosion (maybe unfeasible without supervision) Easy to fall into a dead end (infinite grammar – Gödel's ghost) Do grammars formally describe the phenomena we want to model? Remarks