11-761 Language and Statistics Spring 2014 Roni Rosenfeld http://www.cs.cmu.edu/~roni/11761-s14/
Course Goals and Style
- Teaching statistical techniques for language technologies
- Plugging gaping holes in LTI/CS grad student education in probability, statistics and information theory
13 January 2014 © Roni Rosenfeld, 2014
Course philosophy
- Socratic Method
- Highly interactive: participation strongly encouraged (please state your name)
- Highly adaptable, based on how fast we move
- Lots of probability, statistics, information theory: not in the abstract, but rather as the need arises
- Lectures emphasize intuition, not rigor or detail; background reading will have rigor & detail
Course Mechanics
- Highly recommended: learn & use a text processing language like Perl, Python, …
- Can you derive Bayes' equation in your sleep?
- New this year: 11661 (masters level): no final project
- Hand in assignments via Blackboard
- Vigorous enforcement of collaboration & disclosure policy
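For the Bayes question above, a short derivation sketch from the definition of conditional probability. The event names A and B are illustrative and do not come from the slides; both are assumed to have nonzero probability.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A, B)}{P(B)}, \qquad
P(B \mid A) = \frac{P(A, B)}{P(A)}
  && \text{(definition of conditional probability)} \\
P(A, B) &= P(B \mid A)\, P(A)
  && \text{(rearranging the second identity)} \\
P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)}
  && \text{(substituting into the first)}
\end{align*}
```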
Background Material
- No single book exists which covers the course material.
- “Foundations of Statistical NLP”, Manning & Schütze: computational linguistics perspective
- “Statistical Methods in Speech Recognition”, Jelinek
- “Text Compression”, Bell, Cleary & Witten: first 4 chapters; rest is mostly text compression
- “Probability and Statistics”, DeGroot
- “All of Statistics” & “All of Nonparametric Statistics”, Wasserman
- Lots of individual articles
Syllabus (subject to change)
- Overview and Grand Thoughts
- What Is All This Good For? source-channel formulation
- Words, Words, Words: type vs. token, Zipf, Mandelbrot, heterogeneity of language
- Modeling Word Distributions, the unigram: [estimators, ML, zero frequency, smoothing, shrinkage, G-T]
- N-grams: Deleted Interpolation Model, backoff, toolkit
- Measuring Success: perplexity [entropy, KL-div, MI], the entropy of English, alternatives
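To make the unigram, zero-frequency, smoothing and perplexity items above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not course material: the function names `train_unigram` and `perplexity`, the toy sentences, and the use of add-one (Laplace) smoothing as the example smoother are all choices made for this example.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab, alpha=0.0):
    """Estimate P(w) for every word in vocab.

    alpha=0 gives the maximum-likelihood (relative frequency) estimate;
    alpha=1 gives add-one smoothing. Assumes every token is in vocab.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def perplexity(model, tokens):
    """Perplexity = 2 ** (average negative log2 probability per token)."""
    log2_sum = 0.0
    for w in tokens:
        p = model.get(w, 0.0)
        if p == 0.0:
            # A single zero-probability token makes perplexity infinite.
            return math.inf
        log2_sum += math.log2(p)
    return 2 ** (-log2_sum / len(tokens))


# Toy data (hypothetical): "dog" never occurs in training.
train = "the cat sat on the mat".split()
test = "the dog sat".split()
vocab = set(train) | set(test)

ml = train_unigram(train, vocab)                   # ML estimate
smoothed = train_unigram(train, vocab, alpha=1.0)  # add-one smoothing

print("ML perplexity:      ", perplexity(ml, test))
print("Add-one perplexity: ", perplexity(smoothed, test))
```

The ML model assigns zero probability to the unseen word, so its test perplexity is infinite. The smoothed model reserves some probability mass for unseen words and yields a finite value, which is the zero-frequency problem in miniature.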
Syllabus (continued)
- Clustering: class-based N-grams, hierarchical clustering, hard and soft clustering
- Latent Variable Models, EM: Hidden Markov Models, revisiting interpolated and class n-grams
- Part-Of-Speech tagging, Word Sense Disambiguation
- Decision & Regression Trees, particularly as applied to language
- Stochastic Grammars (SCFG, inside-outside alg., Link grammar)
Syllabus (continued)
- Maximum Entropy Modeling: exponential models, ME principle, feature induction...
- Language Model Adaptation: caches, backoff
- Dimensionality reduction: latent semantic analysis
- Syntactic Language Models