Introduction to Computational Natural Language Learning
Linguistics (Under: Topics in Natural Language Processing)
Computer Science (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics
The City University of New York
Meeting 1 (Overview): Today's agenda:
Why computationally model language learning?
Linguistics, state space search and definitions
Early (classic) computational approaches
Gold: the "language can't be learned" theorem
Angluin: oh yes it can
Artificial Neural Networks: an introduction
Tlearn software demonstration (if time)
Explicitness of the computational model can ground linguistic theories: "...it may be necessary to find out how language learning could work in order for the developmental data to tell us how it does work." (Pinker, 1979)
Can natural language grammar be modeled by X? Only if X is both descriptively adequate (predicts perceived linguistic phenomena) and explanatorily adequate (explains how the phenomena come to be) (Bertolo, MIT Encyclopedia of the Cognitive Sciences).
If a computational model demonstrates that some formally defined class of models cannot be learned, X had better fall outside of that class regardless of its descriptive adequacy.
Generative Linguistics
phrase structure (PS) grammar - a formalism based on rewrite rules which are recursively applied to yield the structure of an utterance (see the sketch below).
transformational grammar - sentences have (at least) two phrase structures: an original or base-generated structure and the final or surface structure. A transformation is a mapping from one phrase structure to another.
principles and parameters - all languages share the same principles, with a finite number of sharply delineated differences, or parameters.
NON-generative linguistics: see Elman, "Language as a dynamical system."
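To make the rewrite-rule idea concrete, here is a minimal Python sketch (not from the lecture); the toy grammar and symbol names are invented for illustration:

```python
import random

# A toy phrase structure grammar: each nonterminal rewrites
# to one of several right-hand sides (lists of symbols).
PS_RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["she"], ["the", "N"]],
    "N":  [["girl"], ["apple"]],
    "VP": [["V"], ["V", "NP"]],
    "V":  [["walked"], ["ate"]],
}

def derive(symbol="S"):
    """Recursively apply rewrite rules until only terminals remain."""
    if symbol not in PS_RULES:          # terminal symbol: emit it
        return [symbol]
    rhs = random.choice(PS_RULES[symbol])
    return [word for sym in rhs for word in derive(sym)]

print(" ".join(derive()))  # e.g. "she ate the apple"
```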
Syntax acquisition can be viewed as a state space search:
nodes represent grammars, including a start state and a target state.
arcs represent a possible change from one hypothesized grammar to another.
[Figure: a graph of grammar nodes G0, G2, G3, G4, G5, G6, ..., Gtarg connected by arcs.]
Gold's grammar enumeration learner (1967)
[Figure: the enumeration G0 → G1 → G2 → G3 → ... → Gtarg; the learner stays at Gi while s ∈ L(Gi) and moves to the next grammar when s ∉ L(Gi)], where s is a function that returns the next sentence from the input sample being fed to the learner, and L(Gi) is the language generated by grammar Gi.
Two points:
The learner is error-driven.
Error-driven learners converge on the target in the limit.
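A minimal Python sketch of this error-driven search through the grammar space (my illustration; the toy languages are invented). As on the slide, the learner abandons its current guess only when that guess fails to generate the sentence it has just heard:

```python
# Languages stand in for grammars here: each "grammar" G_i is
# represented directly by the set of sentences L(G_i) it generates.
HYPOTHESES = [
    {"walked"},                       # G0
    {"walked", "she walked"},         # G1
    {"she walked", "she ate"},        # G2  (the target, say)
]

def enumeration_learner(text):
    """Error-driven enumeration: advance only on an input error."""
    i = 0
    for s in text:                    # s = next sentence of the sample
        while s not in HYPOTHESES[i]: # error: current guess can't generate s
            i += 1                    # move to the next grammar in line
        yield i                       # current hypothesis after this sentence

text = ["she walked", "she ate", "she walked"]
print(list(enumeration_learner(text)))  # [1, 2, 2] -> converged on G2 (the target)
```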
Learnability - Under what conditions is learning possible?
Feasibility - Is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?
A class of grammars H is learnable iff there exists a learner such that for every G ∈ H and every (fair) input sample generable by G, the learner converges on G.
An early learnability result (Gold, 1967)
H - the hypothesis space - is the set of grammars that may be hypothesized by the learner.
Exposed to input strings of an arbitrary target language Ltarg = L(Gtarg) where Gtarg ∈ H, it is impossible to guarantee that a learner can converge on Gtarg if H is any class in the Chomsky hierarchy. Moreover, no learner is uniformly faster than one that executes simple error-driven enumeration of languages.
The Overgeneralization Hazard
[Figure: a Venn diagram of nested languages L(Gi), L(Gk), L(Gm), L(Go) containing example sentences "Walked.", "She walked.", "She ate.", "She eated.", "Eated.", "Walked she."] A learner whose hypothesis overgenerates (say, one that admits "She eated.") receives no positive evidence from the target language that it has overshot.
If H contains an infinite language L(Gi) together with an infinite set of finite languages included in L(Gi), then H is unlearnable.
Any class that includes an unlearnable class is itself unlearnable: such an H ⊆ Lreg, so Lreg is unlearnable; and since Lreg ⊂ Lcf ⊂ Lcs ⊂ Lre, no class of languages in the Chomsky hierarchy is learnable.
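A small Python demonstration (my illustration) of why such a class defeats the error-driven learner: once it guesses the infinite language, text drawn from a smaller finite language never triggers an error, so the learner is stuck overgeneralizing forever.

```python
# A superfinite toy class over the alphabet {a}, in a fixed enumeration:
# L_0 = a+ (infinite); L_n = {a^1, ..., a^n} for n >= 1 (finite).
def in_L(i, s):
    """Membership of sentence s in the i-th language of the enumeration."""
    if set(s) != {"a"}:
        return False
    return True if i == 0 else len(s) <= i

def error_driven_learner(text):
    i = 0
    for s in text:
        while not in_L(i, s):   # advance only on an error
            i += 1
        yield i

# Target: the finite language L_2 = {a, aa}. The learner's first guess,
# L_0 = a+, already covers every sentence it will ever see.
text = ["a", "aa", "a", "aa"]
print(list(error_driven_learner(text)))  # [0, 0, 0, 0] -- stuck on the
# infinite language: overgeneralization with no error to recover from.
```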
Gold's enumeration learner is as fast as any other learner
Assume there exists a rival learner that converges earlier than the enumeration learner. The rival arrives at the target at time i, the enumerator at time j (i < j). At time j, the enumeration learner must have been conjecturing SOME grammar consistent with the input up to that point. If the target had happened to be that grammar, the enumerator would have been correct and the rival incorrect. Thus, for every language that the rival converges on faster than the enumerator, there is a language for which the reverse is true.
Corollary: Language just can't be learned ;-)
The class of human languages must intersect the Chomsky hierarchy in such a way that it neither coincides with nor includes any class in the hierarchy.
[Figure: Venn diagram of Lreg ⊂ Lcf ⊂ Lcs ⊂ Lre, with Lhuman cutting across the classes without containing any of them.]
Angluin's Theorem (1980)
A class of grammars H is learnable iff for every language Li = L(Gi), Gi ∈ H, there exists a finite subset D ⊆ Li such that no other language L(G), G ∈ H, includes D and is included in Li.
[Figure: D ⊆ L(G) ⊂ L(Gi); if such a language L(G) can be generated by a grammar in H, H is not learnable!]
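For intuition, here is a brute-force Python sketch (my illustration) that searches for such a finite subset D - often called a telltale - for each language in a small class of finite languages; the example class is invented:

```python
from itertools import chain, combinations

def subsets(lang):
    """All subsets of a finite language, smallest first."""
    return chain.from_iterable(combinations(lang, r) for r in range(len(lang) + 1))

def telltale(L_i, H):
    """Return a finite D ⊆ L_i such that no other L in H satisfies
    D ⊆ L and L ⊊ L_i; return None if no such D exists."""
    for D in subsets(L_i):
        D = set(D)
        if not any(D <= L and L < L_i for L in H if L != L_i):
            return D
    return None

H = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b", "c"})]
for L in H:
    print(sorted(L), "telltale:", telltale(L, H))
# Every language in this class has a telltale, so by Angluin's
# theorem the class is learnable. For a superfinite class, the
# infinite language would have no telltale: any finite D is included
# in some finite language properly included in it.
```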
Artificial Neural Networks: A brief introduction
[Figure: three architectures - (a) fully recurrent, (b) feedforward, (c) multi-component.]
How can we implement the AND function?
[Figure: a unit with input activations feeding a threshold node, plus a bias node.] If these inputs are great enough, the unit fires; that is to say, a positive activation occurs at the output.
How can we implement the AND function? We want an artificial neuron to implement this function.
First we must decide on representation: possible inputs: 1, 0; possible outputs: 1, 0.
Boolean AND:
unit inputs    unit output
1 1            1
1 0            0
0 1            0
0 0            0
First attempt: connect both inputs to the threshold node with weights of 1.
net = Σ of the weighted activations arriving at the threshold node.
For inputs 1, 1 the net is 2 and the unit fires - but for inputs 1, 0 the net is 1, the unit fires anyway, and the output is wrong. Oooops.
STEP activation function:
f(net) = 1 if net > 0
f(net) = 0 if net ≤ 0
[Figure: a unit computing net = Σ and passing it through f.]
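Putting the pieces together, a minimal Python sketch of a threshold unit computing AND; the particular weights (1, 1) and bias weight (-1.5) are my choice of values that work, not necessarily the lecture's:

```python
def step(net):
    """STEP activation: fire iff net exceeds 0."""
    return 1 if net > 0 else 0

def and_unit(x1, x2):
    # Two input weights of 1, plus a bias node with constant
    # activation 1 and weight -1.5: net > 0 only when x1 = x2 = 1.
    w1, w2, w_bias = 1.0, 1.0, -1.5
    net = x1 * w1 + x2 * w2 + 1 * w_bias
    return step(net)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", and_unit(x1, x2))
# 0 0 -> 0, 0 1 -> 0, 1 0 -> 0, 1 1 -> 1
```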
Worked example: unit 9 receives input from units 7 and 8.
a7 = 1, w79 = .75: a7(w79) = 1(.75) = .75
a8 = .3, w89 = 1.6667: a8(w89) = .3(1.6667) = .5
net9 = Σj aj(wj9) = .3(1.6667) + 1(.75) = 1.25
f(net9) = 1 / (1 + e^(-net9)) = 1 / (1 + e^(-1.25)) ≈ .78
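The same forward-pass computation in Python (a sketch; the unit numbers and weight values follow the slide's example, here with a logistic rather than a step activation, as on the slide):

```python
import math

def logistic(net):
    """Logistic (sigmoid) activation: 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

# Activations of units 7 and 8, and their weights into unit 9.
a = {7: 1.0, 8: 0.3}
w = {(7, 9): 0.75, (8, 9): 1.6667}

# net_9 = sum over incoming units j of a_j * w_j9
net_9 = sum(a[j] * w[(j, 9)] for j in a)
print(net_9)            # 1.25 (to rounding; .3 * 1.6667 = .50001)
print(logistic(net_9))  # ≈ 0.777
```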