Shallow Processing: Summary Shallow Processing Techniques for NLP Ling570 December 7, 2011.

1 Shallow Processing: Summary. Shallow Processing Techniques for NLP, Ling570, December 7, 2011

2 Roadmap. Looking back: course units; tools and techniques. Looking forward: upcoming courses.

4 Units #0, #1. Unit #0 (0.5 weeks), HW #1: introduction to NLP & shallow processing; tokenization. Unit #1 (2.25 weeks), HW #2, #3: Formal Languages and Automata (formal languages; finite-state automata; finite-state transducers; morphological analysis).

5 Unit #2 (3.25 weeks), HW #4, #5, #6, #7: Ngram Language Models and HMMs. Ngram language models and smoothing; part-of-speech (POS) tagging: Ngram, Hidden Markov Models.

7 Units #3, #4. Unit #3 (3 weeks), HW #8, #9: Classification (intro to classification & Mallet; POS tagging with classifiers; chunking; Named Entity (NE) recognition). Unit #4 (1.5 weeks), HW #10: Selected Topics (clustering; information extraction; summary).

8 Techniques & Tools

10 Main Techniques. Probability: chain rule; Bayes’ rule.
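
The formulas on this slide did not survive in the transcript. For reference, the standard statements of the two rules named here are given below; the symbols (w_1 … w_n for a word sequence, A and B for events) are assumed notation, not taken from the slide.

```latex
% Chain rule: joint probability of a sequence as a product of conditionals
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})

% Bayes' rule
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```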

15 Main Techniques. Formal languages: Chomsky hierarchy; regular languages, regular expressions, regular grammars, finite state automata; regular relations, finite state transducers; finite state morphology (two-level morphological analysis); cascades of finite state transducers.
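
As a minimal sketch of how a deterministic finite state automaton accepts a regular language, the Python below simulates a hypothetical DFA for the regular expression `ab*`. The state names, transition table, and the function name `dfa_accepts` are illustrative assumptions, not taken from the course materials.

```python
def dfa_accepts(string, transitions, start, accepting):
    """Return True if the DFA described by `transitions` accepts `string`."""
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in transitions:
            # No outgoing arc for this symbol: the string is rejected.
            return False
        state = transitions[key]
    # Accept only if we end in an accepting (final) state.
    return state in accepting


# Hypothetical DFA for the regular expression ab*:
# q0 --a--> q1, q1 --b--> q1, with q1 accepting.
TRANSITIONS = {("q0", "a"): "q1", ("q1", "b"): "q1"}
START = "q0"
ACCEPTING = {"q1"}

if __name__ == "__main__":
    for s in ["a", "abbb", "", "ba"]:
        print(repr(s), dfa_accepts(s, TRANSITIONS, START, ACCEPTING))
```

A transducer extends the same idea by attaching an output symbol to each arc.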

19 Main Techniques. Language Modeling and Hidden Markov Models: N-gram language models (maximum likelihood estimation; smoothing: Laplace, Good-Turing, backoff, interpolation); Hidden Markov Models (Markov assumptions; forward algorithm & Viterbi decoding).
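
To make maximum likelihood estimation and Laplace (add-one) smoothing concrete, here is a small bigram sketch in Python. The function names, the `<s>`/`</s>` boundary markers, and the toy corpus are assumptions for illustration; the course homework may have defined these details differently.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Collect history counts, bigram counts, and the predicted-word vocabulary."""
    histories = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        histories.update(tokens[:-1])            # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    # Words that can be predicted: all corpus words plus the end marker.
    vocab = {tok for s in sentences for tok in s} | {"</s>"}
    return histories, bigrams, vocab


def mle_prob(prev, word, histories, bigrams):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]


def laplace_prob(prev, word, histories, bigrams, vocab):
    """Add-one smoothing: (count(prev, word) + 1) / (count(prev) + |V|)."""
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))


if __name__ == "__main__":
    corpus = [["the", "cat"], ["the", "dog"]]  # toy data, not a course corpus
    h, b, v = train_bigram_counts(corpus)
    print(mle_prob("the", "cat", h, b))
    print(laplace_prob("the", "the", h, b, v))
```

The unsmoothed estimate gives an unseen bigram such as ("the", "the") probability zero, while the add-one estimate reserves some mass for it and still sums to one over the vocabulary for a given history.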

24 Main Techniques. Classification for NLP: modeling NLP tasks as classification problems; developing feature representations for instances; sequence labeling tasks and algorithms; beam search.
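
A rough sketch of beam search for sequence labeling follows: at each token, every hypothesis in the beam is extended with every label, and only the top-scoring `beam_size` hypotheses are kept. The scoring function `toy_score` is a hypothetical stand-in for a trained classifier, and all names here are assumptions for illustration.

```python
import math


def beam_search(tokens, labels, score_fn, beam_size=3):
    """Return (best_label_sequence, log_score) found by beam search.

    score_fn(tokens, i, prev_label, label) must return a log-probability
    for assigning `label` to tokens[i] given the previous label.
    """
    beam = [([], 0.0)]  # each hypothesis: (labels so far, cumulative log score)
    for i in range(len(tokens)):
        candidates = []
        for seq, score in beam:
            prev = seq[-1] if seq else "<s>"
            for label in labels:
                candidates.append(
                    (seq + [label], score + score_fn(tokens, i, prev, label))
                )
        # Prune: keep only the beam_size highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
    return beam[0]


def toy_score(tokens, i, prev_label, label):
    """Hypothetical local model: capitalized tokens favor "NE", others "O".

    prev_label is ignored here; a real classifier would use it as a feature.
    """
    is_cap = tokens[i][0].isupper()
    p = 0.9 if is_cap == (label == "NE") else 0.1
    return math.log(p)


if __name__ == "__main__":
    best_seq, best_score = beam_search(
        ["Seattle", "is", "rainy"], ["NE", "O"], toy_score, beam_size=2
    )
    print(best_seq, best_score)
```

Unlike Viterbi decoding, beam search does not guarantee the globally best sequence, but it lets the classifier use arbitrary features of the label history at bounded cost.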

25 Tools Developed. English tokenizer: HW #1. FSA & FST acceptors: HW #2, #3. FST morphological analyzer: HW #3. N-gram language models with smoothing: HW #4, #5. Authorship identification system: HW #5. Hidden Markov Model training & decoding (HMM POS tagger): HW #6, #7. Classification-based text categorization and POS tagging: HW #8, #9. Unsupervised POS tagger: HW #10.
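
The HMM decoding step listed among these tools is typically done with the Viterbi algorithm. The sketch below is one possible Python formulation in log space, assuming a non-empty observation sequence and fully specified start and transition tables; the function name, table layout, and the toy tag set are illustrative assumptions rather than the homework's actual interface.

```python
import math


def _log(p):
    """Log that maps probability 0 to negative infinity instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + _log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy parameters, invented for illustration only.
    states = ["N", "V"]
    start_p = {"N": 0.8, "V": 0.2}
    trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
    emit_p = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}
    print(viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p))
```

Working in log space avoids numerical underflow on long sentences, since products of many small probabilities become sums.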

29 Corpora & Systems. Data: Penn Treebank Wall Street Journal; Air Travel Information System (ATIS); Project Gutenberg (Federalist Papers, Jane Austen novels). Systems: CARMEL finite state toolkit; Mallet machine learning toolkit.

30 Looking Forward

34 Winter Courses. Ling571, Deep Processing Techniques for NLP: parsing, semantics (lambda calculus), generation. Ling572, Advanced Statistical Methods in NLP: roughly, machine learning for CompLing (Decision Trees, Naïve Bayes, MaxEnt, SVM, CRF, …). Ling567, Knowledge Engineering for Deep NLP: HPSG and MRS for novel languages. Ling575, Spoken Dialog Systems: design, analysis, and implementation of SDS.

36 Tentative Outline for Ling572. Unit #0 (0.5 weeks), Basics: introduction; feature representations; classification review. Unit #1 (3 weeks), Classic Machine Learning: K Nearest Neighbors; Decision Trees; Naïve Bayes.

39 Tentative Outline for Ling572. Unit #3 (4 weeks), Discriminative Classifiers: feature selection; Maximum Entropy models; Support Vector Machines. Unit #4 (1.5 weeks), Sequence Learning: Conditional Random Fields; Transformation-Based Learning. Unit #5 (1 week), Other Topics: semi-supervised learning, …

42 Ling572 Information. No required textbook: online readings and articles. More math/stat content than 570: probability, information theory, optimization. Please try to register at least 2 weeks in advance.

44 Beyond Ling572. Machine learning: graphical models; Bayesian approaches; online learning; reinforcement learning; … Applications: information retrieval; question answering; generation; machine translation; …

45 Ling 575: Spoken Dialog Systems. Design, analysis, and implementation of SDS. Will be offered online. Please make sure you have a microphone; arrive early to test. Loew 202, T 3:30-5:50.

47 Notes. Grades should be submitted by 12/16. For any issues (errors) with grades in Gradebook, please email by 12/14. Graduation planning: students in Group 1 (planning to graduate ~8/12) have due dates in early to mid January for the 1st thesis proposal draft and the 1st pre-internship report.
