Formal Learning Theory Michele Friend (Philosophy) and Valentina Harizanov (Mathematics)

1 Formal Learning Theory Michele Friend (Philosophy) and Valentina Harizanov (Mathematics)

2 Example
Guessing strings over the alphabet {a,b}:
a. Guess?
a, aa. Guess?
a, aa, aaa. Guess?
a, aa, aaa, b. Guess?
a, aa, aaa, b, bb. Guess?
a, aa, aaa, b, bb, bbb. Guess?
a, aa, aaa, b, bb, bbb, aba. Guess?
…
This infinite process is called identification in the limit.
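
A minimal Python sketch of this guessing game (the function name guess and the particular hypotheses are illustrative, not from the slides): the learner revises its guess as strings arrive, and stabilizing on a correct guess is exactly identification in the limit.

    def guess(data):
        # Guess a language from the finite sample seen so far.
        letters = set("".join(data))
        if letters <= {"a"}:
            return "strings over {a}"    # only a's observed so far
        return "strings over {a,b}"      # a b has appeared

    text = ["a", "aa", "aaa", "b", "bb", "bbb", "aba"]
    seen = []
    for w in text:
        seen.append(w)
        print(seen, "->", guess(seen))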

3 Learning paradigm
Language or class of languages to be learned
Learner
Environment in which a language is presented to the learner
Hypotheses that occur to the learner about the language to be learned, on the basis of the environment

4 Formal language L
Given a finite alphabet, say T = {a,b}.
A sentence w is a string of symbols over the alphabet T: ε, a, aa, ab, ba, bba, … (ε = the empty string).
A language L = {w₀, w₁, w₂, …} is a set of correct sentences, say
L₁ = {a, aa, aaa, aaaa, …}
L₂ = {a, b, bab, aba, babab, …}
w₀, w₁, w₂, … is a text for L (order and repetition do not matter);
w₂, w₁, w₂, w₁, w₃, … is another text for L.
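
As an illustration, one possible text for L₁ = {a, aa, aaa, …} can be produced by a simple Python generator; this is only a sketch, and since order and repetition do not matter, any enumeration that eventually lists every sentence of L₁ is a text for it.

    from itertools import count, islice

    def text_for_L1():
        # Enumerate a, aa, aaa, ...; any listing of all of L1 is a text for L1.
        for n in count(1):
            yield "a" * n

    print(list(islice(text_for_L1(), 5)))   # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']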

5 Chomsky grammar G = (T, V, S, P)
T = {a,b}, V = {S}, P = production rules
L₁ = {a, aa, aaa, …}
P₁: 1. S → aS  2. S → a
L(G₁) = L₁
S ⇒ aS ⇒ aaS ⇒ aaaS ⇒ aaaa
Regular grammar; finite state automaton
L₂ = {a, b, bab, aba, babab, aabaa, …}
P₂: 1. S → aSa  2. S → bSb  3. S → a  4. S → b
L(G₂) = L₂
Context-free grammar; push-down automaton
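
For these two examples the generated languages have simple direct descriptions, so membership can be decided without parsing: L₁ is the nonempty strings of a's, and P₂ generates exactly the odd-length palindromes over {a,b}. A Python sketch (function names are illustrative):

    def in_L1(w):
        # L1 = {a^n : n >= 1}, generated by the regular grammar P1
        return len(w) >= 1 and set(w) == {"a"}

    def in_L2(w):
        # P2 generates the odd-length palindromes over {a, b}
        return len(w) % 2 == 1 and set(w) <= {"a", "b"} and w == w[::-1]

    print(in_L1("aaa"), in_L2("aabaa"))   # True True
    print(in_L1("ab"), in_L2("abab"))     # False False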

6 Coding Chomsky languages
Chomsky languages = computably enumerable languages
Gödel coding by numbers of finite sequences of syntactic objects
The code of L is e: L = Lₑ
Algorithmic enumeration of all (unrestricted) Chomsky languages: L₀, L₁, L₂, …, Lₑ, …
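
A toy stand-in for Gödel coding, assuming nothing beyond the slide: strings over {a,b} can be coded bijectively by natural numbers, for instance by reading a string as a bijective base-2 numeral (a = 1, b = 2). A sketch:

    def code(w):
        # Bijective base-2: "" -> 0, "a" -> 1, "b" -> 2, "aa" -> 3, "ab" -> 4, ...
        n = 0
        for c in w:
            n = 2 * n + (1 if c == "a" else 2)
        return n

    print(code(""), code("a"), code("b"), code("aa"))   # 0 1 2 3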

7 Decidable (computable) language
A language is decidable if there is an algorithm that distinguishes correct from incorrect sentences.
A Chomsky language is decidable exactly when its incorrect sentences also form a Chomsky language.
Not every Chomsky language is decidable.
There is no algorithmic enumeration of all decidable languages.
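
The gap between Chomsky (computably enumerable) and decidable can be seen operationally: an enumerator for L yields only a semi-decision procedure, which halts on correct sentences but may run forever on incorrect ones. A sketch (names illustrative):

    def semi_decide(w, enumerate_L):
        # Search the enumeration of L for w; halts exactly when w is in L.
        for v in enumerate_L():
            if v == w:
                return True

    def enum_L1():
        n = 1
        while True:
            yield "a" * n
            n += 1

    print(semi_decide("aaa", enum_L1))   # True
    # semi_decide("b", enum_L1) would loop forever: only a semi-decision.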

8 Learning from text
An algorithmic learner is a Turing machine being fed a text for the language L to be learned, sentence by sentence.
At each step the learner guesses a code for the language being fed:
w₀ ; e₀
w₀, w₁ ; e₁
w₀, w₁, w₂ ; e₂
…
Learning is successful if the sequence e₀, e₁, e₂, e₃, … converges to a “description” of L.
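
The protocol itself is easy to phrase as code. A sketch, with the learner abstracted as any function from finite data sequences to hypotheses (here hypotheses are returned directly rather than as Gödel numbers):

    def run_on_text(text, learner):
        # Feed the text sentence by sentence; collect the guesses e0, e1, e2, ...
        guesses = []
        for n in range(len(text)):
            guesses.append(learner(text[: n + 1]))
        return guesses

    print(run_on_text(["a", "aa"], lambda data: tuple(sorted(set(data)))))
    # [('a',), ('a', 'aa')]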

9 Syntactic convergence: EX-learning
EX = explanatory
For some n, we have e₀, e₁, …, eₙ, eₙ, eₙ, eₙ, … and eₙ is a code for L.
The set of all finite languages is EX-learnable from text.
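
The finite-language result has a one-line learner behind it: guess a canonical code for exactly the data seen so far. On a text for a finite language L, the guesses stabilize once every sentence of L has appeared, which is syntactic convergence. A sketch:

    def finite_learner(data):
        # Canonical "code" for the finite set of sentences seen so far.
        return tuple(sorted(set(data)))

    text = ["b", "a", "a", "b", "a"]      # a text for L = {a, b}
    for n in range(1, len(text) + 1):
        print(finite_learner(text[:n]))
    # ('b',) then ('a', 'b') forever: the guess converges to a code for L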

10 Semantic convergence: BC-learning
BC = behaviorally correct
For some n, we have e₀, e₁, …, eₙ, eₙ₊₁, eₙ₊₂, eₙ₊₃, … and each of eₙ, eₙ₊₁, eₙ₊₂, … is a code for L (the codes may keep changing, but all describe L).
There are classes of languages that are BC-learnable but not EX-learnable.

11 Learning from an informant
L = {w₀, w₁, w₂, w₃, …}
(not L) = {u₀, u₁, u₂, u₃, …} = the incorrect sentences in the proper vocabulary
Learning steps:
w₀ ; e₀
w₀, u₀ ; e₁
w₀, u₀, w₁ ; e₂
w₀, u₀, w₁, u₁ ; e₃
…
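
In code, an informant presents labeled sentences: (w, True) for w in L and (u, False) for u outside L. A sketch of the protocol, with an illustrative learner that guesses the positive data seen so far:

    def run_on_informant(labeled, learner):
        guesses = []
        for n in range(len(labeled)):
            guesses.append(learner(labeled[: n + 1]))
        return guesses

    def pos_learner(labeled):
        # Guess the set of positive sentences observed so far.
        return tuple(sorted(w for w, is_in in labeled if is_in))

    print(run_on_informant([("a", True), ("ab", False), ("aa", True)], pos_learner))
    # [('a',), ('a',), ('a', 'aa')]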

12 Locking sequence for an EX-learner (Blum-Blum)
If a learner can learn a language L, then there is a finite sequence σ of sentences in L, called a locking sequence for L, on which the learner “locks” its correct hypothesis; that is, after that sequence the hypothesis does not change.
The learner outputs the same hypothesis on σ and on (σ, τ) for any finite sequence τ of sentences in L.

13 Angluin criterion
Maximum finite fragment property: consider a class of Chomsky languages. The class is EX-learnable from text exactly when for every language L in the class there is a finite fragment D of L (D ⊆ L) such that no language U in the class satisfies D ⊆ U ⊂ L.
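
For a small, explicitly given class of finite languages, Angluin's condition can be checked by brute force: D is a suitable fragment for L if no language U in the class satisfies D ⊆ U ⊂ L. A sketch (names illustrative):

    from itertools import chain, combinations

    def subsets(s):
        s = list(s)
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    def has_fragment(L, family):
        # Is there a finite D <= L such that no U in the family has D <= U < L?
        return any(not any(set(D) <= U and U < L for U in family)
                   for D in subsets(L))

    family = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
    print(all(has_fragment(L, family) for L in family))   # True: EX-learnable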

14 Problem
How can we formally define and study certain learning strategies?
Constraints on hypotheses: consistency, confidence, reliability, etc.

15 Consistent learning
A learner is consistent on a language L if at every step it guesses a language that includes all the data given to it up to that point.
The class of all finite languages can be identified consistently.
If a language is consistently EX-learnable by an algorithmic learner, then it must be a decidable language.
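
Consistency is a simple predicate on a guess: the guessed language must contain every sentence seen so far. A sketch, with guesses modeled as explicit sets:

    def consistent(guess, data):
        # A guess is consistent if it contains all the data seen so far.
        return set(data) <= set(guess)

    data = ["a", "bb", "a"]
    print(consistent({"a", "bb", "bbb"}, data))   # True
    print(consistent({"a"}, data))                # False: misses "bb"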

16 Popperian learning of total functions
A (total) computable function f can be tested against finite sequences of data given to the learner.
A learner is Popperian on f if, on any sequence of positive data for f, the learner guesses a computable function.
A learner is Popperian if it is Popperian on every computable function.
Not every algorithmically EX-learnable class of functions is Popperian.
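
The Popperian idea is that a hypothesis about a total function is refutable: run the guessed function on the observed inputs and compare. A sketch:

    def refuted(hypothesis, data):
        # A guessed total function is refuted by any data point it contradicts.
        return any(hypothesis(x) != y for x, y in data)

    data = [(0, 0), (1, 1), (2, 4)]
    print(refuted(lambda x: x * x, data))   # False: consistent with the data
    print(refuted(lambda x: 2 * x, data))   # True: contradicts (2, 4)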

17 Confident learning
A learner is confident when it is guaranteed to converge to some hypothesis, even when given a text for a language that does not belong to the class to be learned. It must also be accurate on the languages in the class.
There is a class that is learnable by an algorithmic EX-learner, and learnable by (another) confident EX-learner, but that cannot be learned by an algorithmic and confident EX-learner.

18 Reliable learning
A learner is reliable if it is not allowed to converge incorrectly (although it might never converge on the text for a language not in the class).
Reliable EX-learnability from text implies that every language in the class must be finite.

19 Decisive learning
A decisive learner, once it has replaced an earlier hypothesized language with a new one, never returns to the old language again.
Decisive EX-learning from text is not restrictive for general learners, nor for algorithmic learners of computable functions.
Decisiveness reduces the power of algorithmic learning for languages.
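
When hypotheses denote languages we can compare directly (say, finite sets), a violation of decisiveness, i.e. a return to an abandoned language, is detectable in a finite guess sequence. A sketch, which also flags the U-shaped pattern of the next slide:

    def returns_to_old(hypotheses):
        # True if some hypothesized language is abandoned and later re-adopted.
        for i, h in enumerate(hypotheses):
            for j in range(i + 1, len(hypotheses)):
                if hypotheses[j] != h:               # h abandoned at step j
                    if any(hk == h for hk in hypotheses[j + 1:]):
                        return True
                    break
        return False

    print(returns_to_old([{"a"}, {"a", "b"}, {"a"}]))        # True: U-shape
    print(returns_to_old([{"a"}, {"a", "b"}, {"a", "b"}]))   # False: decisive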

20 U-shaped learning (Baliga, Case, Merkle, Stephan, and Wiehagen)
A variant of non-decisive learning
Mimics the learning-unlearning-relearning pattern
See Overregularization in Language Acquisition, a monograph by Marcus, Pinker, Ullman, Hollander, Rosen, and Xu.

21 Problem
How can we develop algorithmic learning theory for languages more complicated than Chomsky languages, in particular ones closer to natural language? (Case and Royer)
Correction grammars: L₁ − L₂, where G₁ is a Chomsky (unrestricted, type-0; or context-free, type-2) grammar generating the language L₁ and G₂ is the grammar generating the edits (corrections) L₂.
Burgin: “Grammars with prohibition and human-computer interaction,” 2005.
Ershov’s difference hierarchy in computability theory for limit computable languages.
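
Operationally, a correction grammar describes L₁ − L₂, so membership in the edited language is a conjunction; for genuine c.e. languages this is only limit computable, but with deciders in hand it is immediate. A sketch (in_L1 and in_L2 are placeholders):

    def in_corrected(w, in_L1, in_L2):
        # w belongs to L1 - L2: generated by G1 and not edited out by G2.
        return in_L1(w) and not in_L2(w)

    # Example: L1 = nonempty strings of a's, L2 = the even-length ones,
    # so L1 - L2 = odd-length strings of a's.
    def is_odd_as(w):
        return in_corrected(w,
                            lambda s: s != "" and set(s) == {"a"},
                            lambda s: len(s) % 2 == 0)

    print(is_odd_as("aaa"), is_odd_as("aa"))   # True False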

22 Problem
What is the significance of negative versus positive information in the learning process?
Learning from switching the type of information (Jain and Stephan): the learner can request positive or negative information about L; once it, after finitely many switches, keeps requesting information of the same type, it receives all of that information (in the limit).

23 Harizanov-Stephan’s result
Consider a class of Chomsky languages. Assume that there is a language L in the family such that for every finite set of sentences D, there are languages U and U′ in the family with U ⊂ L ⊂ U′ and D ∩ U = D ∩ U′.
Then the family cannot even be BC-learned from switching.
U approximates L from below, U′ from above; U and U′ coincide on D.

24 Problem
What are good formal frameworks that unify deduction and induction?
Martin, Sharma, and Stephan: use parametric logic (5 parameters: vocabulary, structures, language, data sentences, assumption sentences).
A model-theoretic approach, based on the Tarskian “truth-based” notion of logical consequence. The difference between deductive and inductive consequences lies in the process of deriving a consequence from the premises.

25 Deduction vs. induction
A sentence s is a deductive consequence of a theory T if s can be inferred from T with absolute certainty.
A sentence s is an inductive consequence of a theory T if s can be correctly (though only hypothetically) inferred from T, but can also be incorrectly inferred from other theories T′ that have enough in common with T to provisionally force the inference of s.

