Introduction to CL, Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers for practical applications.


1 Introduction to CL Session 1: 7/08/2011

2 What is computational linguistics?
Processing natural language text by computers
  – for practical applications
  – ... or linguistic research
Among practical applications
  – Sometimes the computer only needs to classify or transform the text
  – ... but sometimes it needs to “understand”
  – Ex: Watson: winner of ‘Jeopardy’
CL vs. NLP (natural language processing)

3 NLP applications
Automatic speech recognition (ASR): speech → text
Machine translation (MT): L1 → L2
Information retrieval (IR): query + documents → a subset of the documents
Information extraction (IE): document → “database”

4 NLP applications (cont)
Question answering (QA): question + documents → answer
Summarization: documents → summary
Natural language generation (NLG): representation → text

5 Other applications
Call center
Spam filter
Spell checker
Sentiment analysis: product reviews
Bio-NLP: processing clinical data
…

6 Basic NLP tasks: Shallow processing
Tokenization:
  – He visited New York in 2003.
Morphological analysis:
  – visited → visit + -ed
Part-of-speech tagging:
  – He/Pron visited/V New/?? York/N in/Prep 2003/CD
Named-entity tagging:
  – He visited [LOCATION New York] in [YEAR 2003]
Chunking:
  – [NP He] [V visited] [NP New York] in [NP 2003]
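
The first two shallow-processing steps can be illustrated with a minimal sketch. It is not taken from the slides: the regular expression and the suffix rule are simplifying assumptions, and the function names `tokenize` and `strip_past_tense` are hypothetical.

```python
import re

def tokenize(text):
    # Assumption: a token is a run of word characters, or a single
    # punctuation mark. Real tokenizers handle many more cases.
    return re.findall(r"\w+|[^\w\s]", text)

def strip_past_tense(token):
    # Toy morphological rule: split a trailing "ed" off tokens longer
    # than four characters. It over-applies (e.g. to "indeed").
    if token.endswith("ed") and len(token) > 4:
        return token[:-2], "-ed"
    return token, None

tokens = tokenize("He visited New York in 2003.")
analyses = [strip_past_tense(t) for t in tokens]
```

Note that this tokenizer splits "New York" into two tokens; grouping them into one unit is left to later steps such as named-entity tagging or chunking.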

7 Basic NLP tasks: Deep processing
Parsing
  – (S (NP (PRON he)) (VP (V visited) ….)
Semantic analysis
  – Semantic tagging: [AGENT He] visited [DEST New York] ….
  – Meaning: visit (he, New-York)
Discourse
  – Co-reference: “He” refers to “John”
  – Discourse structure
Dialogue
Generation

8 Ambiguity
Phonological ambiguity: (ASR)
  – “too”, “two”, “to”
  – “ice cream” vs. “I scream”
  – “ta” in Mandarin: he, she, or it
Morphological ambiguity: (morphological analysis)
  – unlockable: [[un-lock]-able] vs. [un-[lock-able]]
Syntactic ambiguity: (parsing)
  – John saw a man with a telescope.
  – Time flies like an arrow.

9 Ambiguity (cont)
Lexical ambiguity: (WSD)
  – Ex: “bank”, “saw”, “run”
Semantic ambiguity: (semantic representation)
  – Ex: every boy loves his mother
  – Ex: John and Mary bought a house
Discourse ambiguity:
  – Susan called Mary. She was sick. (coreference resolution)
  – It is pretty hot here. (intention resolution)
Machine translation:
  – “brother”, “cousin”, “uncle”, etc.

10 Ambiguity resolution
Rule-based or knowledge-based:
  – Parsing:
      I saw a man with a hat
      I saw a man with a telescope (in my hand)
  – WSD: “bank”
  – MT: “brother”, “cousin”, “uncle”
Statistical approach:
  – Require training data
  – Build a statistical model
  – Knowledge and rules can be incorporated into the model as features etc.
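
As an illustration of the statistical approach, here is a minimal sketch of word sense disambiguation for “bank” using a Naïve Bayes model with add-one smoothing. The training examples are invented for this sketch and are not from the slides; the sense labels and the names `TRAIN`, `train`, and `classify` are likewise assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-made training data: (context words, sense of "bank").
TRAIN = [
    (["deposit", "money", "account"], "finance"),
    (["loan", "money", "interest"], "finance"),
    (["river", "water", "fishing"], "river"),
    (["river", "muddy", "shore"], "river"),
]

def train(examples):
    # Collect the counts that define the model.
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(context, model):
    # Pick the sense maximizing log P(sense) + sum of log P(word | sense).
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train(TRAIN)
sense = classify(["money", "loan"], model)
```

The three steps on the slide map onto the code: `TRAIN` is the training data, `train` builds the model, and the context words act as features.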

11 Major approaches to NLP
Rule-based approach
Statistical approach
  – Supervised learning
  – Semi-supervised learning
  – Unsupervised learning

12 Supervised learning algorithms
Hidden Markov Model (HMM)
Decision tree
Decision list
Naïve Bayes
Transformation-based Learning (TBL)
Maximum Entropy (MaxEnt)
Support Vector Machine (SVM)
Conditional Random Field (CRF)
…

13 Data
Raw text:
  – Monolingual: English/Chinese/Arabic Gigawords
  – Parallel data: UN data, EuroParl
Treebank:
  – Syntactic treebanks: a set of parse trees
  – Proposition Bank
  – Discourse Treebank
Dictionaries
WordNet
FrameNet
…

14 [Diagram with four layers: Applications; Task1, Task2, …, Task_i; ML1, ML2, …, ML_m; D1, D2, …, D_n]

15 The role of linguistic knowledge in NLP
An NLP system is language-independent. Good or bad?
  – Good: it can be ported to many languages without any changes.
  – Bad: it cannot take advantage of properties of certain languages.
How to incorporate (linguistic) knowledge in statistical systems?
  – the design of models
  – as features
  – as filters
  – …
Building a treebank is an effective way.
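
One of the options listed above, supplying linguistic knowledge “as features”, might look like the following minimal sketch. The feature set is an assumption chosen for illustration (suffixes and capitalization are cues that are useful in English but not in every language), and `token_features` is a hypothetical name, not part of any particular toolkit.

```python
def token_features(tokens, i):
    # Build a feature dictionary for the token at position i.
    # A statistical tagger could take such dictionaries as input.
    word = tokens[i]
    return {
        "lower": word.lower(),
        "suffix2": word[-2:].lower(),          # morphology cue, e.g. "ed"
        "is_capitalized": word[0].isupper(),   # cue for names in English
        "is_digit": word.isdigit(),            # cue for numbers such as years
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

tokens = ["He", "visited", "New", "York", "in", "2003"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

The learning algorithm itself stays language-independent; only the feature function encodes language-specific knowledge, which is one way to weigh the good and bad points above.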

