Shallow Processing: Summary
Shallow Processing Techniques for NLP
Ling570, December 7, 2011
Roadmap
- Looking back
  - Course units
  - Tools and techniques
- Looking forward
  - Upcoming courses
Units #0, #1
- Unit #0 (0.5 weeks): HW #1
  - Introduction to NLP & shallow processing
  - Tokenization
- Unit #1, Formal Languages and Automata (2.25 weeks): HW #2, #3
  - Formal languages
  - Finite-state automata
  - Finite-state transducers
  - Morphological analysis
Unit #2
- Ngram Language Models and HMMs (3.25 weeks): HW #4, #5, #6, #7
  - Ngram language models and smoothing
  - Part-of-speech (POS) tagging: ngram Hidden Markov Models
Units #3, #4
- Unit #3, Classification (3 weeks): HW #8, #9
  - Intro to classification & Mallet
  - POS tagging with classifiers
  - Chunking
  - Named Entity (NE) recognition
- Unit #4, Selected Topics (1.5 weeks): HW #10
  - Clustering
  - Information extraction
  - Summary
Techniques & Tools
Main Techniques
- Probability:
  - Chain rule (see formula below)
  - Bayes' rule (see formula below)
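For reference, the chain rule and Bayes' rule in their standard textbook forms (restated here, not copied from the slides):

```latex
% Chain rule: a joint probability decomposed into conditionals
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})

% Bayes' rule: inverting a conditional probability
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```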
Main Techniques
- Formal languages: the Chomsky hierarchy
  - Regular languages, regular expressions, regular grammars, finite-state automata (see the sketch after this list)
  - Regular relations, finite-state transducers
  - Finite-state morphology: two-level morphological analysis
  - Cascades of finite-state transducers
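To make the finite-state machinery concrete, here is a minimal sketch of a deterministic finite-state acceptor in Python. The machine, its alphabet, and the example language (strings over {a, b} ending in b) are invented for illustration and are not the HW #2/#3 acceptors.

```python
# Minimal deterministic FSA acceptor (illustrative sketch, not the HW format).
# This toy machine accepts strings over {a, b} that end in 'b'.
fsa = {
    "start": "q0",
    "final": {"q1"},
    "delta": {               # transition function: (state, symbol) -> state
        ("q0", "a"): "q0",
        ("q0", "b"): "q1",
        ("q1", "a"): "q0",
        ("q1", "b"): "q1",
    },
}

def accepts(machine, string):
    """Return True if the deterministic FSA accepts the input string."""
    state = machine["start"]
    for symbol in string:
        if (state, symbol) not in machine["delta"]:
            return False     # undefined transition: reject
        state = machine["delta"][(state, symbol)]
    return state in machine["final"]

print(accepts(fsa, "aab"))   # True
print(accepts(fsa, "abba"))  # False
```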
Main Techniques
- Language modeling and Hidden Markov Models
  - N-gram language models
    - Maximum likelihood estimation
    - Smoothing: Laplace, Good-Turing, backoff, interpolation
  - Hidden Markov Models
    - Markov assumptions
    - Forward algorithm & Viterbi decoding (see the sketch after this list)
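Viterbi decoding, listed above, finds the single best state sequence for an observation sequence under an HMM by dynamic programming. The sketch below is a generic log-space implementation in Python; the toy tags, words, and probabilities are invented for illustration and are not the HW #6/#7 model.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Best state sequence for obs under a first-order HMM (log-space)."""
    # V[t][s]: best log-probability of any path ending in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][obs[t]]))
                 for p in states),
                key=lambda x: x[1],
            )
            V[t][s], back[t][s] = score, prev
    # Follow backpointers from the best final state
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy example with hypothetical probabilities (illustration only)
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.6, "swim": 0.4}, "V": {"fish": 0.3, "swim": 0.7}}
print(viterbi(["fish", "swim"], states, start_p, trans_p, emit_p))  # ['N', 'V']
```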
Main Techniques
- Classification for NLP
  - Modeling NLP tasks as classification problems
  - Developing feature representations for instances
  - Sequence labeling tasks and algorithms
  - Beam search (see the sketch after this list)
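Beam search, mentioned above, keeps classifier-based sequence labeling tractable: at each position it extends every surviving hypothesis with every label but retains only the top-k partial sequences. The sketch below is generic; the scoring function is a hypothetical stand-in, not the classifier used in the homework.

```python
def beam_search(tokens, labels, score, beam_size=3):
    """Label a token sequence, keeping the top-k partial hypotheses per step.

    score(prev_labels, position, label) returns a (log-)score contribution;
    here it is a hypothetical stand-in for a trained classifier.
    """
    beam = [([], 0.0)]                      # hypotheses: (label sequence, total score)
    for i, _token in enumerate(tokens):
        candidates = []
        for seq, total in beam:
            for lab in labels:
                candidates.append((seq + [lab], total + score(seq, i, lab)))
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = candidates[:beam_size]       # prune to the beam width
    return beam[0]                          # best complete sequence and its score

# Toy usage with a hand-written scoring function (illustration only)
def toy_score(prev, i, lab):
    if i == 0:
        return 1.0 if lab == "B" else 0.0   # prefer 'B' sentence-initially
    if prev and prev[-1] == "B":
        return 1.0 if lab == "I" else 0.0   # prefer 'I' right after a 'B'
    return 1.0 if lab == "O" else 0.0       # otherwise prefer 'O'

print(beam_search(["San", "Francisco", "is", "foggy"], ["B", "I", "O"], toy_score))
# -> (['B', 'I', 'O', 'O'], 4.0)
```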
Tools Developed
- English tokenizer: HW #1
- FSA & FST acceptors: HW #2, #3
- FST morphological analyzer: HW #3
- N-gram language models with smoothing: HW #4, #5
- Authorship identification system: HW #5
- Hidden Markov Model training & decoding (HMM POS tagger): HW #6, #7
- Classification-based text categorization and POS tagging: HW #8, #9
- Unsupervised POS tagger: HW #10
Corpora & Systems
- Data:
  - Penn Treebank: Wall Street Journal, Air Travel Information System (ATIS)
  - Project Gutenberg: Federalist Papers, Jane Austen novels
- Systems:
  - CARMEL Finite State Toolkit
  - Mallet Machine Learning Toolkit
Looking Forward
Winter Courses
- Ling571: Deep Processing Techniques for NLP
  - Parsing, semantics (lambda calculus), generation
- Ling572: Advanced Statistical Methods in NLP
  - Roughly, machine learning for CompLing
  - Decision trees, Naïve Bayes, MaxEnt, SVM, CRF, ...
- Ling567: Knowledge Engineering for Deep NLP
  - HPSG and MRS for novel languages
- Ling575: Spoken Dialog Systems
  - Design, analysis, and implementation of SDS
Tentative Outline for Ling572
- Unit #0 (0.5 weeks): Basics
  - Introduction
  - Feature representations
  - Classification review
- Unit #1 (3 weeks): Classic Machine Learning
  - K Nearest Neighbors
  - Decision trees
  - Naïve Bayes
- Unit #3 (4 weeks): Discriminative Classifiers
  - Feature selection
  - Maximum Entropy models
  - Support Vector Machines
- Unit #4 (1.5 weeks): Sequence Learning
  - Conditional Random Fields
  - Transformation-Based Learning
- Unit #5 (1 week): Other Topics
  - Semi-supervised learning, ...
Ling572 Information
- No required textbook: online readings and articles
- More math/stat content than 570: probability, information theory, optimization
- Please try to register at least 2 weeks in advance
Beyond Ling572
- Machine learning:
  - Graphical models
  - Bayesian approaches
  - Online learning
  - Reinforcement learning
  - ...
- Applications:
  - Information retrieval
  - Question answering
  - Generation
  - Machine translation
  - ...
Ling575: Spoken Dialog Systems
- Design, analysis, and implementation of SDS
- Will be offered online: please make sure you have a microphone, and arrive early to test
- Loew 202; T 3:30-5:50
Notes
- Grades should be submitted by 12/16
- For any issues (errors) with grades in the Gradebook, please email by 12/14
- Graduation planning: students in Group 1 (planning to graduate ~8/12)
  - Due dates in early-to-mid January:
    - 1st thesis proposal draft
    - 1st pre-internship report