Language and Statistics


11-761 Language and Statistics Spring 2010 Roni Rosenfeld http://www.cs.cmu.edu/~roni/11761-s10/

Course Goals and Style
  Teaching statistical techniques for language technologies
  Plugging gaping holes in LTI grad student education in probability, statistics and information theory.
26 December 2018 © Roni Rosenfeld, 2010

Course philosophy
  Socratic Method
  Highly interactive
    participation strongly encouraged (pls state your name)
  Highly adaptable
    based on how fast we move
  Lots of Probability, Statistics, Information theory
    not in the abstract, but rather as the need arises
  Lectures emphasize intuition, not rigor or detail
    background reading will have rigor & detail

Course Mechanics
  Highly recommended: learn & use a text processing language like perl, python, awk…
  Can you derive Bayes' equation in your sleep?
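
For readers who want to check themselves against the Bayes question, the standard derivation is sketched below. The event names A and B are placeholders chosen here for illustration and do not come from the slides; the only assumption is that P(B) > 0.

```latex
% Both factorizations follow from the definition of conditional probability.
\begin{align}
P(A, B) &= P(A \mid B)\, P(B) = P(B \mid A)\, P(A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
\end{align}
```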

Background Material
  No single book exists which covers the course material.
  “Foundations of Statistical NLP”, Manning & Schütze
    Computational Linguistics perspective
  “Statistical Methods in Speech Recognition”, Jelinek
  “Text Compression”, Bell, Cleary & Witten
    first 4 chapters; rest is mostly text compression
  “Probability and Statistics”, DeGroot
  “All of Statistics” & “All of Nonparametric Statistics”, Wasserman
  Lots of individual articles

Syllabus (subject to change)
  Overview and Grand Thoughts
  What Is All This Good For?
    source-channel formulation
  Words, Words, Words
    type vs. token, Zipf, Mandelbrot, heterogeneity of language
  Modeling Word distributions - the unigram:
    [estimators, ML, zero frequency, G-T]
  N-grams:
    Deleted Interpolation Model, backoff, toolkit
  Measuring Success:
    perplexity [entropy, KL-div, MI], the entropy of English, alternatives
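
Several of the syllabus terms above (type vs. token, the maximum-likelihood unigram estimator, the zero-frequency problem, perplexity) can be made concrete with a few lines of Python. This is a minimal sketch, not material from the course: the function names `unigram_mle` and `perplexity` and the toy sentence are assumptions made here for illustration, and no smoothing is applied.

```python
import math
from collections import Counter


def unigram_mle(tokens):
    """Maximum-likelihood unigram estimate: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence, using log base 2.

    Returns infinity if any token has zero probability, which is the
    zero-frequency problem that smoothing is meant to address.
    """
    log_sum = 0.0
    for word in tokens:
        p = model.get(word, 0.0)
        if p == 0.0:
            return math.inf
        log_sum += math.log2(p)
    return 2 ** (-log_sum / len(tokens))


# Toy training text (hypothetical, chosen only for illustration).
train = "the cat sat on the mat".split()
model = unigram_mle(train)

num_tokens = len(train)  # running words, counting repeats
num_types = len(model)   # distinct words
```

With this toy text, "the" occurs twice, so there are more tokens than types, and any test word unseen in training drives the unsmoothed perplexity to infinity.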

Syllabus (continued)
  Clustering:
    class-based N-grams, hierarchical clustering
  Latent Variable Models, EM
    Hidden Markov Models, revisiting interpolated and class n-grams
  Part-Of-Speech tagging, Word Sense Disambiguation
  Decision & Regression Trees
  Stochastic Grammars (SCFG, inside-outside alg., Link grammar)
  Maximum Entropy Modeling
    exponential models, ME principle, feature induction...

Syllabus (continued)
  Language Model Adaptation
    caches, backoff
  Dimensionality reduction
    latent semantic analysis
  Statistical Parsing
  Statistical Machine Translation
  Statistical Text Segmentation
  Statistical Information Retrieval
  Statistical Information Extraction