11-761 Language and Statistics
Spring 2010
Roni Rosenfeld
http://www.cs.cmu.edu/~roni/11761-s10/
Course Goals and Style
- Teaching statistical techniques for language technologies
- Plugging gaping holes in LTI grad student education in probability, statistics and information theory
Course Philosophy
- Socratic Method
- Highly interactive: participation strongly encouraged (please state your name)
- Highly adaptable: based on how fast we move
- Lots of probability, statistics, and information theory: not in the abstract, but rather as the need arises
- Lectures emphasize intuition, not rigor or detail; the background reading will have the rigor and detail
Course Mechanics
- Highly recommended: learn and use a text-processing language like Perl, Python, awk… (a small example follows below)
- Can you derive Bayes' equation in your sleep?
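As a quick refresher on the Bayes question above (a standard result, not specific to this course), Bayes' rule can be written as

    P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

And here is a minimal sketch of the kind of text processing the first bullet has in mind, written in Python. The script is an illustrative assumption, not part of the course materials; it counts word frequencies from standard input:

    # count_words.py -- hypothetical example, not part of the course materials.
    # Reads text from stdin, lowercases it, splits on whitespace, and prints
    # the 20 most frequent word types with their token counts.
    import sys
    from collections import Counter

    counts = Counter()
    for line in sys.stdin:
        counts.update(line.lower().split())

    for word, count in counts.most_common(20):
        print(f"{count}\t{word}")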
Background Material
- No single book covers the course material.
- “Foundations of Statistical NLP”, Manning & Schutze (computational linguistics perspective)
- “Statistical Methods for Speech Recognition”, Jelinek
- “Text Compression”, Bell, Cleary & Witten (first 4 chapters; the rest is mostly text compression)
- “Probability and Statistics”, DeGroot
- “All of Statistics” and “All of Nonparametric Statistics”, Wasserman
- Lots of individual articles
Syllabus (subject to change)
- Overview and Grand Thoughts: What Is All This Good For? The source-channel formulation
- Words, Words, Words: type vs. token, Zipf, Mandelbrot, heterogeneity of language
- Modeling word distributions - the unigram: [estimators, ML, zero frequency, G-T]
- N-grams: deleted interpolation model, backoff, toolkit
- Measuring Success: perplexity [entropy, KL-div, MI], the entropy of English, alternatives (a small sketch of unigram estimation and perplexity follows this list)
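A minimal sketch, not from the course materials, of two items above: a unigram model with add-one smoothing as one crude answer to the zero-frequency problem, and perplexity on a held-out text. The toy corpora and the choice of add-one smoothing (rather than Good-Turing) are assumptions made for brevity:

    # Hypothetical illustration: smoothed unigram estimation and perplexity.
    import math
    from collections import Counter

    train = "the cat sat on the mat".split()
    heldout = "the dog sat on the mat".split()

    counts = Counter(train)
    vocab = set(train) | set(heldout)      # closed toy vocabulary
    V, N = len(vocab), len(train)

    def prob(word):
        # Add-one (Laplace) smoothed unigram probability.
        return (counts[word] + 1) / (N + V)

    # Perplexity = exp of the average negative log-probability per token.
    log_prob = sum(math.log(prob(w)) for w in heldout)
    print("held-out perplexity:", round(math.exp(-log_prob / len(heldout)), 2))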
Syllabus (continued)
- Clustering: class-based N-grams, hierarchical clustering
- Latent Variable Models, EM
- Hidden Markov Models: revisiting interpolated and class n-grams; Part-of-Speech tagging, Word Sense Disambiguation (a small tagging sketch follows this list)
- Decision & Regression Trees
- Stochastic Grammars (SCFG, inside-outside algorithm, Link grammar)
- Maximum Entropy Modeling: exponential models, ME principle, feature induction...
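One concrete piece of the HMM and part-of-speech tagging items above, sketched as Viterbi decoding over a tiny hand-built model. The tags, words, and probabilities are invented for illustration; a real tagger would estimate them from data:

    # Hypothetical toy HMM part-of-speech tagger using Viterbi decoding.
    # All tags, words, and probabilities are made up for illustration.
    import math

    tags = ["DET", "NOUN", "VERB"]
    start = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
    trans = {
        "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
        "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
        "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
    }
    emit = {
        "DET":  {"the": 0.9, "a": 0.1},
        "NOUN": {"dog": 0.5, "walk": 0.2, "barks": 0.3},
        "VERB": {"barks": 0.6, "walk": 0.4},
    }
    UNK = 1e-12   # tiny floor probability for unseen (tag, word) pairs

    def viterbi(words):
        # delta[t] maps tag -> best log-probability of any path ending in that tag at position t
        delta = [{t: math.log(start[t]) + math.log(emit[t].get(words[0], UNK)) for t in tags}]
        back = []
        for w in words[1:]:
            scores, pointers = {}, {}
            for t in tags:
                prev = max(tags, key=lambda p: delta[-1][p] + math.log(trans[p][t]))
                scores[t] = (delta[-1][prev] + math.log(trans[prev][t])
                             + math.log(emit[t].get(w, UNK)))
                pointers[t] = prev
            delta.append(scores)
            back.append(pointers)
        # Trace the best path backwards through the stored pointers.
        path = [max(tags, key=lambda t: delta[-1][t])]
        for pointers in reversed(back):
            path.append(pointers[path[-1]])
        return list(reversed(path))

    print(viterbi("the dog barks".split()))   # -> ['DET', 'NOUN', 'VERB']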
Syllabus (continued)
- Language Model Adaptation: caches, backoff
- Dimensionality Reduction: latent semantic analysis (a small sketch follows this list)
- Statistical Parsing
- Statistical Machine Translation
- Statistical Text Segmentation
- Statistical Information Retrieval
- Statistical Information Extraction
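A minimal sketch of the latent semantic analysis item above: a toy term-document count matrix projected to a rank-2 "semantic" space with a truncated SVD. The matrix, vocabulary, and rank are arbitrary choices for illustration, not course data:

    # Hypothetical LSA sketch: rank-2 SVD of a toy term-document count matrix.
    import numpy as np

    terms = ["cat", "dog", "pet", "stock", "market"]
    # Columns are documents; entries are made-up raw term counts.
    X = np.array([
        [2, 1, 0, 0],   # cat
        [1, 2, 0, 0],   # dog
        [1, 1, 0, 1],   # pet
        [0, 0, 3, 1],   # stock
        [0, 0, 2, 2],   # market
    ], dtype=float)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2                                   # keep the top-k latent dimensions
    term_vecs = U[:, :k] * s[:k]            # term coordinates in the latent space
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # document coordinates in the latent space

    for term, vec in zip(terms, term_vecs):
        print(f"{term:8s} {vec.round(2)}")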