Language and Statistics


1 11-761 Language and Statistics
Spring 2014 Roni Rosenfeld

2 Course Goals and Style
Teaching statistical techniques for language technologies.
Plugging gaping holes in LTI/CS grad student education in probability, statistics and information theory.
13 January 2014 © Roni Rosenfeld, 2014

3 Course philosophy
Socratic Method
  participation strongly encouraged (pls state your name)
Highly interactive
Highly adaptable
  based on how fast we move
Lots of Probability, Statistics, Information theory
  not in the abstract, but rather as the need arises
Lectures emphasize intuition, not rigor or detail
  background reading will have rigor & detail

4 Course Mechanics
Highly recommended: learn & use a text processing language like Perl, Python, …
Can you derive Bayes' equation in your sleep?
New this year: 11-661 (masters level): no final project
Hand in assignments via Blackboard
Vigorous enforcement of collaboration & disclosure policy
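
For readers who want to check themselves against the "in your sleep" question, here is the standard derivation of Bayes' rule from the definition of conditional probability. The event names A and B are generic placeholders, not notation taken from the slides.

```latex
\begin{align*}
P(A, B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
  && \text{(definition of conditional probability, both ways)} \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
  && \text{(divide by } P(B) \text{, assuming } P(B) > 0\text{)}
\end{align*}
```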

5 Background Material
No single book covers the course material.
“Foundations of Statistical NLP”, Manning & Schütze
  Computational Linguistics perspective
“Statistical Methods in Speech Recognition”, Jelinek
“Text Compression”, Bell, Cleary & Witten
  first 4 chapters; rest is mostly text compression
“Probability and Statistics”, DeGroot
“All of Statistics” & “All of Nonparametric Statistics”, Wasserman
Lots of individual articles

6 Syllabus (subject to change)
Overview and Grand Thoughts
What Is All This Good For?
  source-channel formulation
Words, Words, Words
  type vs. token, Zipf, Mandelbrot, heterogeneity of language
Modeling Word distributions - the unigram:
  [estimators, ML, zero frequency, smoothing, shrinkage, G-T]
N-grams:
  Deleted Interpolation Model, backoff, toolkit
Measuring Success: perplexity
  [entropy, KL-div, MI], the entropy of English, alternatives
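
As a rough illustration of a few terms on this slide (type vs. token, the zero-frequency problem, smoothing, perplexity), here is a minimal Python sketch. It is not course material: the function names `unigram_model` and `perplexity`, the toy sentences, and the choice of add-alpha smoothing with alpha = 1 are all assumptions made for the example. Add-alpha is only one simple smoothing scheme; the slide also lists shrinkage and G-T.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary.

    With alpha > 0, words never seen in training still get nonzero
    probability, which avoids the zero-frequency problem.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def perplexity(model, tokens):
    """Perplexity = 2 ** (average negative log2 probability per token)."""
    log_prob = sum(math.log2(model[w]) for w in tokens)
    return 2 ** (-log_prob / len(tokens))


# Toy data (hypothetical); a real experiment would use a large corpus.
train = "the cat sat on the mat".split()
test = "the dog sat".split()

# Assumption: a closed vocabulary that is known in advance.
vocab = set(train) | set(test)

n_tokens = len(train)       # tokens: every running word
n_types = len(set(train))   # types: distinct words
print(n_tokens, "tokens,", n_types, "types")

model = unigram_model(train, vocab)
pp = perplexity(model, test)
print("test perplexity:", pp)
```

Setting `alpha=0` recovers the maximum-likelihood estimate, under which the unseen word "dog" gets probability zero and the perplexity computation fails; that failure is the zero-frequency problem smoothing is meant to address.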

7 Syllabus (continued)
Clustering:
  class-based N-grams, hierarchical clustering
  hard and soft clustering
Latent Variable Models, EM
  Hidden Markov Models, revisiting interpolated and class n-grams
Part-Of-Speech tagging, Word Sense Disambiguation
Decision & Regression Trees
  Particularly as applied to language
Stochastic Grammars (SCFG, inside-outside alg., Link grammar)

8 Syllabus (continued)
Maximum Entropy Modeling
  exponential models, ME principle, feature induction...
Language Model Adaptation
  caches, backoff
Dimensionality reduction
  latent semantic analysis
Syntactic Language Models

