Omphalos Session

Omphalos Session Programme
– Design & Results: 25 mins
– Award Ceremony: 5 mins
– Presentation by Alexander Clark: 25 mins
– Presentation by Georgios Petasis: 10 mins
– Open Discussion on Omphalos and the GI competition: 20 mins

Omphalos: Design and Results
Brad Starkie, François Coste, Menno van Zaanen

Contents
– Design of the Competition
  – A complexity measure for GI
– Results
– Conclusions

Aims
– Promote new and better GI algorithms
– Provide a forum to compare GI algorithms
– Provide an indicative measure of the current state of the art

Design Issues
– Format of training data
– Method of evaluation
– Complexity of tasks

Training Data
– Options: plain text or structured data
  – Bracketed, partially bracketed, labelled, or unlabelled
  – (+ve and –ve data) or (+ve data only)
– Chosen: plain text, both (+ve and –ve) and (+ve only)
  – Similar to Abbadingo
  – Placed the fewest restrictions on competitors

Method of Evaluation
– Options: classification of unseen examples, precision and recall, or comparison of derivation trees
– Chosen: classification of unseen examples
  – Similar to Abbadingo
  – Placed the fewest restrictions on competitors

Complexity of the Competition Tasks
– The learning task should be sufficiently difficult
  – Outside the current state of the art, but not too difficult
– Ideally, it should be provable that the training sentences are sufficient to identify the target language

Three Axes of Difficulty
– Complexity of the underlying grammar
– +ve/–ve data or +ve data only
– Similarity between –ve and +ve examples

Complexity Measure for GI
– Created a model of GI based upon a brute-force search (non-polynomial)
– Complexity measure = size of the hypothesis space created when presented with a characteristic set
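
As a rough illustration of why this brute-force hypothesis space explodes, here is a minimal back-of-the-envelope sketch (my own illustration, not the organisers' exact measure), assuming the hypothesis space is taken to be every subset of the possible CNF rules over n non-terminals and t terminals:

def hypothesis_space_upper_bound(n_nonterms, n_terms):
    # In CNF there are n^3 possible rules of the form A -> B C and
    # n*t possible rules of the form A -> a; a grammar is any subset
    # of these rules, giving 2^(n^3 + n*t) candidate grammars.
    possible_rules = n_nonterms ** 3 + n_nonterms * n_terms
    return 2 ** possible_rules

print(hypothesis_space_upper_bound(2, 2))    # 4096
print(hypothesis_space_upper_bound(10, 26))  # roughly 10^379

Even tiny parameter choices make exhaustive search astronomically expensive, which is why the size of this space works as a difficulty measure.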

Hypothesis Space for GI
– All CFGs can be converted to Chomsky Normal Form (CNF)
– Given CNF, any sentence has a finite number of unlabelled derivations
  – And therefore a finite number of labelled derivation trees
– The grammar can be reconstructed given a sufficient number of derivation trees
– The set of all possible labelled derivation trees corresponds to the set of all possible CNF grammars, given the maximum number of non-terminals
– Solution: calculate the maximum number of non-terminals and create all possible grammars

BruteForceLearner
– Given the positive examples, construct all possible grammars
– Discard any grammar that generates any negative sentence
– Randomly select a grammar from the hypothesis set
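
The following is a minimal runnable sketch of this procedure for a toy alphabet (my own illustration under the assumptions above, not the organisers' implementation): it enumerates every subset of the possible CNF rules over a fixed set of non-terminals, keeps the grammars consistent with the data (membership tested with a small CYK parser), and picks one at random.

import itertools
import random

def cyk_accepts(rules, start, s):
    # rules: a set of ("A", "B", "C") for A -> B C and ("A", "a") for A -> a.
    n = len(s)
    if n == 0:
        return False
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][i] = {r[0] for r in rules if len(r) == 2 and r[1] == ch}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for r in rules:
                    if len(r) == 3 and r[1] in table[i][k] and r[2] in table[k + 1][j]:
                        table[i][j].add(r[0])
    return start in table[0][n - 1]

def brute_force_learner(positives, negatives,
                        nonterms=("S", "A"), terms=("a", "b"), start="S"):
    binary = [(A, B, C) for A in nonterms for B in nonterms for C in nonterms]
    lexical = [(A, a) for A in nonterms for a in terms]
    possible = binary + lexical
    consistent = []
    # Enumerate all 2^|possible| rule subsets: the full hypothesis space.
    for bits in itertools.product((0, 1), repeat=len(possible)):
        g = {r for r, b in zip(possible, bits) if b}
        if all(cyk_accepts(g, start, s) for s in positives) and \
           not any(cyk_accepts(g, start, s) for s in negatives):
            consistent.append(g)
    return random.choice(consistent) if consistent else None

print(brute_force_learner(["ab", "aabb"], ["ba", "a"]))

With two non-terminals and two terminals the space is only 2^12 = 4096 grammars, so the toy run finishes instantly; the point of the complexity measure is that realistic parameters make this count unmanageable.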

Characteristic Set of Positive Sentences
– Put the grammar into minimal CNF form
  – Minimal: if any single rule is removed, one or more sentences can no longer be derived
– For each rule, add a sentence that can only be derived using that rule
  – Such a sentence exists if G is in minimal form
– When presented with this set, one of the hypothesis grammars is correct
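
One way to make this concrete (a sketch under the same toy assumptions, reusing cyk_accepts from the brute-force sketch above): a sentence that "can only be derived using" rule r is exactly a sentence in L(G) but not in L(G with r removed), so a bounded search over short strings can find a witness for each rule.

import itertools

def characteristic_positives(rules, start, terms=("a", "b"), max_len=6):
    # For each rule r of a (minimal) grammar, find a short sentence that
    # the full grammar derives but the grammar without r does not.
    witnesses = {}
    for r in rules:
        reduced = rules - {r}
        witnesses[r] = None
        for length in range(1, max_len + 1):
            for chars in itertools.product(terms, repeat=length):
                s = "".join(chars)
                if cyk_accepts(rules, start, s) and not cyk_accepts(reduced, start, s):
                    witnesses[r] = s
                    break
            if witnesses[r] is not None:
                break
    return witnesses  # None for a rule means no witness up to max_len

On a minimal grammar every rule gets a witness; a None entry signals either a non-minimal grammar or too small a max_len.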

Characteristic Set of Negative Sentences
– Given G, calculate the positive sentences
– Construct the hypothesis set
– For each hypothesis H ≠ G with L(H) ⊉ L(G): add a +ve sentence s such that s ∈ L(G) but s ∉ L(H)
– For each hypothesis H ≠ G with L(H) ⊈ L(G): add a –ve sentence s such that s ∈ L(H) but s ∉ L(G)
– Generating –ve data according to this technique requires exponential time
  – Therefore it could not be used to generate –ve data in Omphalos

Creation of the Target Grammars
– Benchmark problems identified in the literature
  – Stolcke-93, Nakamura-02, Cook-76, Hopcroft-01
– The number of non-terminals, terminals and rules was selected
– Grammars were randomly generated; useless rules were removed and context-free constructs (e.g. center recursion) were added
– A characteristic set of sentences was generated, and the complexity measured
– To test whether a grammar was deterministic, LR(1) tables were created using Bison
– For the non-deterministic grammars, non-deterministic constructs were added

Creation of Positive Data
– The characteristic set was generated from the grammar
– Additional training examples were added
  – Size of training set: 10 to 20 times the size of the characteristic set
– The longest training example was shorter than the longest test example

Creation of Negative Data
– Not guaranteed to be sufficient
– Originally randomly created (a bad idea)
– For problems 6a to 10, regular equivalents to the grammars were constructed, and negative data could then be generated from the regular equivalent of the CFG
  – Nederhof-00
  – Center recursion expanded to a finite depth vs. true center recursion
– Equal numbers of positive and negative examples in the test sets
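
A minimal sketch of the filtering step this implies (my reconstruction, not the organisers' code): candidate strings come from some generator over the regular approximation of the CFG — the sample_from_regular_approx parameter below is a hypothetical stand-in, since building the approximation itself (as in Nederhof-00) is out of scope here — and a candidate is kept as negative data only if the target grammar rejects it, again using cyk_accepts from the brute-force sketch.

def make_negatives(rules, start, sample_from_regular_approx,
                   n_wanted, max_tries=100000):
    # Keep sampled strings that the target CFG rejects. Because the samples
    # come from a regular approximation of the grammar, they stay close to
    # the language, making these negatives harder to separate than
    # uniformly random strings.
    negatives = set()
    for _ in range(max_tries):
        if len(negatives) >= n_wanted:
            break
        s = sample_from_regular_approx()
        if not cyk_accepts(rules, start, s):
            negatives.add(s)
    return negatives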

Participation
– Omphalos front page: ~1000 hits from 270 domains
  – After attempting to discard hits from crawlers and bots
  – All continents except two
– Data sets: downloaded by 70 different domains
– Oracle: 139 label submissions by 8 contestants (4)
  – Short test sets: 76 submissions
  – Large test sets: 63 submissions

Results

Techniques Used
– Problem 1
  – Solved by hand
– Problems 3, 4, 5 and 6
  – Pattern matching using n-grams
  – Generated its own negative data: the majority of randomly generated strings would not be contained within the language
– Problems 2, 6.2 and 6.4
  – Distributional clustering and ABL
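
One plausible reading of the n-gram technique named above (my reconstruction, not the competitor's actual code): collect all character n-grams, with boundary padding, seen in the positive training strings, and classify a test string as positive only if every n-gram it contains was seen in training.

def train_ngrams(positives, n=3):
    # Record every padded character n-gram occurring in the positive data.
    seen = set()
    for s in positives:
        padded = "^" * (n - 1) + s + "$" * (n - 1)
        for i in range(len(padded) - n + 1):
            seen.add(padded[i:i + n])
    return seen

def classify(s, seen, n=3):
    # Accept only strings composed entirely of previously seen n-grams.
    padded = "^" * (n - 1) + s + "$" * (n - 1)
    return all(padded[i:i + n] in seen
               for i in range(len(padded) - n + 1))

seen = train_ngrams(["ab", "aabb", "aaabbb"])
print(classify("aabb", seen), classify("ba", seen))  # True False

This also explains the self-generated negative data: a randomly generated string almost always contains an unseen n-gram, so it can safely be labelled negative.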

Conclusions
– The way in which negative data is created is crucial to judging the performance of competitors' entries

Review of Aims
– Promote development of new and better GI algorithms
  – Partially achieved
– A forum to compare different GI algorithms
  – Achieved
– Provide an indicative measure of the state of the art
  – Achieved