1 Introduction to Classification. Shallow Processing Techniques for NLP. Ling570, November 9, 2011

2 Roadmap
- Classification problems: definition
- Solutions
- Case studies
- Based on slides by F. Xia

5 Example: Text Classification
- Task: given an article, predict its category
- Categories: sports, entertainment, news, weather, ...; spam/not spam
- What kind of information is useful for this task?

9 Classification Task
- Task: C is a finite set of labels (aka categories, classes); given x, determine its category y in C
- Instance (x, y): x is the thing to be labeled/classified, y is the label/class
- Data: a set of instances. Labeled data: y is known; unlabeled data: y is unknown
- Training data, test data

10 Text Classification Examples
- Spam filtering
- Call routing
- Sentiment classification: positive/negative, or a score from 1 to 5

14 POS Tagging
- Task: given a sentence, predict the tag of each word
- Is this a classification problem? Categories: N, V, Adj, ...
- What information is useful?
- How do POS tagging and text classification differ? POS tagging is a sequence labeling problem

18 Word Segmentation
- Task: given a string, break it into words
- Categories: either B(reak)/NB (no break), or B(eginning)/I(nside)/E(nd)
- e.g. for c1 c2 || c3 c4 c5:
  c1/NB c2/B c3/NB c4/NB c5/B
  c1/B c2/E c3/B c4/I c5/E
- What type of task? Also sequence labeling
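
A minimal sketch of the two tagging schemes above, assuming each word is given as a list of its characters; tagging a single-character word as B in the B/I/E scheme is an assumption, since the slide does not cover that case.

```python
def to_break_tags(words):
    # Per character: B if a word boundary follows it, NB otherwise.
    tags = []
    for w in words:
        tags += ["NB"] * (len(w) - 1) + ["B"]
    return tags

def to_bie_tags(words):
    # Per character: B(eginning), I(nside), or E(nd) of its word.
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("B")  # assumption: a one-character word is tagged B
        else:
            tags += ["B"] + ["I"] * (len(w) - 2) + ["E"]
    return tags

words = [["c1", "c2"], ["c3", "c4", "c5"]]   # i.e., c1 c2 || c3 c4 c5
chars = [c for w in words for c in w]
print(list(zip(chars, to_break_tags(words))))  # c1/NB c2/B c3/NB c4/NB c5/B
print(list(zip(chars, to_bie_tags(words))))    # c1/B c2/E c3/B c4/I c5/E
```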

19 Solving a Classification Problem

22 Two Stages
- Training. Learner: training data → classifier
- Testing. Decoder: test data + classifier → classification output
- Also: preprocessing, postprocessing, evaluation

25 Representing Input
- Potentially infinite values to represent
- Represent the input as a feature vector x = <f1, f2, ..., fn>
- What are good features?

26 Example I: Spam Tagging
- Classes: Spam / Not Spam
- Input: email messages

27 Doc1 Western Union Money Transfer office29@yahoo.com.ph One Bishops Square Akpakpa E1 6AO, Cotonou Benin Republic Website: http://www.westernunion.com/ info/selectCountry.asP Phone: +229 99388639 Attention Beneficiary, This to inform you that the federal ministry of finance Benin Republic has started releasing scam victim compensation fund mandated by United Nation Organization through our office. I am contacting you because our agent have sent you the first payment of $5,000 for your compensation funds total amount of $500 000 USD (Five hundred thousand united state dollar) We need your urgent response so that we shall release your payment information to you. You can call our office hot line for urgent attention(+22999388639)

28 Doc2 Hello! my dear. How are you today and your family? I hope all is good, kindly pay Attention and understand my aim of communicating you today through this Letter, My names is Saif al-Islam al-Gaddafi the Son of former Libyan President. i was born on 1972 in Tripoli Libya,By Gaddafi’s second wive. I want you to help me clear this fund in your name which i deposited in Europe please i would like this money to be transferred into your account before they find it. the amount is 20.300,000 million GBP British Pounds sterling through a

29 Doc3 from: web.25.5.office@att.net Apply for loan at 3% interest Rate..Contact us for details.

30 Doc4 from: acl@aclweb.org REMINDER: If you have not received a PIN number to vote in the elections and have not already contacted us, please contact either Drago Radev (radev@umich.edu) or Priscilla Rasmussen (acl@aclweb.org) right away. Everyone who has not received a pin but who has contacted us already will get a new pin over the weekend. Anyone who still wants to join for 2011 needs to do this by Monday (November 7th) in order to be eligible to vote. And, if you do have your PIN number and have not voted yet, remember every vote counts!

31 What are good features?

37 Possible Features
- Words! One feature per word:
  - Binary: presence/absence
  - Integer: occurrence count
- Particular word types (money/sex terms), e.g. the pattern [Vv].*gr.*
- Errors: spelling, grammar
- Images
- Header info
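
A hedged sketch of extracting such features from one email. Only the [Vv].*gr.* pattern comes from the slide; the feature names and the free-mail-sender heuristic are invented here for illustration.

```python
import re
from collections import Counter

def extract_features(text, sender="", binary=False):
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # One feature per word: presence/absence or occurrence count.
    feats = {f"word={w}": (1 if binary else n) for w, n in counts.items()}
    # Particular word types, e.g. disguised drug names matching [Vv].*gr.*
    feats["type=vgr"] = sum(bool(re.match(r"[Vv].*gr.*", t)) for t in tokens)
    # Header info (hypothetical feature): does the sender use a free-mail domain?
    feats["hdr=freemail"] = int(sender.endswith(("yahoo.com.ph", "att.net")))
    return feats

f = extract_features("Western Union Money Transfer money money",
                     sender="office29@yahoo.com.ph")
print(f["word=money"], f["hdr=freemail"])  # 3 1
```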

42 Representing Input: Attribute-Value Matrix

           f1 Currency  f2 Country  ...   fm Date  Label
x1 = Doc1  1            1           0.3   0        Spam
x2 = Doc2  1            1           1.75  1        Spam
...
xn = Doc4  0            0           0     2        NotSpam
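
A small follow-on sketch (same caveats as above): assembling per-document feature dicts into such a matrix, with one column per feature and 0 for features a document lacks.

```python
def to_avm(feature_dicts):
    # Columns: the union of all feature names, in a fixed order.
    columns = sorted({f for d in feature_dicts for f in d})
    rows = [[d.get(c, 0) for c in columns] for d in feature_dicts]
    return columns, rows

columns, rows = to_avm([{"word=money": 2, "hdr=freemail": 1},
                        {"word=vote": 3}])
print(columns)  # ['hdr=freemail', 'word=money', 'word=vote']
print(rows)     # [[1, 2, 0], [0, 0, 3]]
```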

45 Classifier
- Result of training on input data, with or without class labels
- Formal perspective: f(x) = y, where x is the input and y is in C
- More generally: f(x) = {(c_i, score_i)}, where x is the input, c_i is in C, and score_i is the score for that category assignment

49 Testing
- Input: test data (e.g. an AVM) and the classifier
- Output: a decision matrix; can assign the highest-scoring class to each input

      x1    x2    x3   ...
c1    0.1   0.1   0.2  ...
c2    0     0.8   0    ...
c3    0.2   0     0.7  ...
c4    0.7   0.1   0.1  ...

Assignments: x1 → c4, x2 → c2, x3 → c3
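
The decoding step is a column-wise argmax over the decision matrix; a minimal sketch, using the scores reconstructed above:

```python
scores = {
    "x1": {"c1": 0.1, "c2": 0.0, "c3": 0.2, "c4": 0.7},
    "x2": {"c1": 0.1, "c2": 0.8, "c3": 0.0, "c4": 0.1},
    "x3": {"c1": 0.2, "c2": 0.0, "c3": 0.7, "c4": 0.1},
}

# For each input, pick the class with the highest score.
assignments = {x: max(cs, key=cs.get) for x, cs in scores.items()}
print(assignments)  # {'x1': 'c4', 'x2': 'c2', 'x3': 'c3'}
```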

54 Evaluation
Confusion matrix:

         Gold
System   +    -
   +     TP   FP
   -     FN   TN

- Precision: P = TP/(TP+FP)
- Recall: R = TP/(TP+FN)
- F-score: 2PR/(P+R)
- Accuracy: (TP+TN)/(TP+FP+FN+TN)
- Why F-score rather than accuracy?

59 Evaluation Example
Confusion matrix:

         Gold
System   +    -
   +     1    4
   -     5    90

- Precision: 1/(1+4) = 1/5
- Recall: 1/(1+5) = 1/6
- F-score: 2 * (1/5) * (1/6) / (1/5 + 1/6) = 2/11
- Accuracy: (1+90)/100 = 91%
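
The same computation as a small sketch; running it on the example above reproduces P = 1/5, R = 1/6, F = 2/11, accuracy = 91%.

```python
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_score, accuracy

p, r, f, a = metrics(tp=1, fp=4, fn=5, tn=90)
print(f"P={p:.3f} R={r:.3f} F={f:.3f} acc={a:.0%}")
# P=0.200 R=0.167 F=0.182 acc=91%
```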

63 Classification Problem Steps
- Input processing:
  - Split data into training/dev/test
  - Convert data into an Attribute-Value Matrix: identify candidate features, perform feature selection, create the AVM representation
- Training
- Testing
- Evaluation
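
A minimal sketch of the first step; the 80/10/10 proportions and the fixed seed are assumptions, not from the slides.

```python
import random

def split(instances, dev_frac=0.1, test_frac=0.1, seed=0):
    data = instances[:]
    random.Random(seed).shuffle(data)  # shuffle a copy, reproducibly
    n_dev, n_test = int(len(data) * dev_frac), int(len(data) * test_frac)
    return data[n_dev + n_test:], data[:n_dev], data[n_dev:n_dev + n_test]

train, dev, test = split([(f"doc{i}", "spam") for i in range(100)])
print(len(train), len(dev), len(test))  # 80 10 10
```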

64 Classification Algorithms
Will be covered in detail in 572:
- Nearest Neighbor
- Naïve Bayes
- Decision Trees
- Neural Networks
- Maximum Entropy

68 Feature Design & Representation
- What sorts of information do we want to encode? Words, frequencies, n-grams, morphology, sentence length, etc.
- Issue: learning algorithms work on numbers. Many work only on binary values (0/1); others work on any real-valued input
- How can we represent different information numerically? In binary?

72 Representation
- Words/tags/n-grams/etc. One feature per item:
  - Binary: presence/absence
  - Real: counts
- Binarizing numeric features:
  - Single threshold
  - Multiple thresholds
  - Binning: one binary feature per bin
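
Sketches of the three binarization strategies just listed; the threshold values and bin edges are illustrative assumptions.

```python
def single_threshold(value, threshold):
    return {f"f>{threshold}": int(value > threshold)}

def multiple_thresholds(value, thresholds):
    return {f"f>{t}": int(value > t) for t in thresholds}

def binning(value, edges):
    # One binary feature per bin; exactly one fires per in-range value.
    return {f"{lo}<=f<{hi}": int(lo <= value < hi)
            for lo, hi in zip(edges, edges[1:])}

print(single_threshold(7, 5))              # {'f>5': 1}
print(multiple_thresholds(7, [0, 5, 10]))  # {'f>0': 1, 'f>5': 1, 'f>10': 0}
print(binning(7, [0, 5, 10, 100]))         # {'0<=f<5': 0, '5<=f<10': 1, '10<=f<100': 0}
```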

76 Feature Template
- Example: Prevword (or w-1)
- A template corresponds to many features; e.g. for "time flies like an arrow", the w-1 template expands to w-1=time, w-1=flies, w-1=like, w-1=an, ...
- Shorthand: each instantiated feature such as w-1=time is binary, taking the value 0 or 1 on a given instance
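
A small sketch of expanding the prevword template over the example sentence; using "<s>" for the missing previous word of the first token is an assumption.

```python
def prevword_features(tokens):
    rows = []
    for i in range(len(tokens)):
        prev = tokens[i - 1] if i > 0 else "<s>"
        rows.append({f"w-1={prev}": 1})  # every other w-1=... feature is 0
    return rows

for row in prevword_features("time flies like an arrow".split()):
    print(row)
# {'w-1=<s>': 1} {'w-1=time': 1} {'w-1=flies': 1} {'w-1=like': 1} {'w-1=an': 1}
```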

81 AVM Example: Time flies like an arrow
Note: this is a compact form of the true sparse vector, in which each w-1=w, for w in V, is a separate 0/1 feature.

      w-1    w0     w-1w0        w+1    label
x1           Time   Time         flies  N
x2    Time   flies  Time flies   like   V
x3    flies  like   flies like   an     P

85 Example: NER
- Named Entity tagging: John visited New York last Friday → [person John] visited [location New York] [time last Friday]
- As a classification problem: John/PER-B visited/O New/LOC-B York/LOC-I last/TIME-B Friday/TIME-I
- Input? Categories?
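
A hedged sketch of casting NER as per-token classification: converting labeled spans into the slide's B/I/O-style tags. The (start, end, type) span representation is an assumed input format, not from the slides.

```python
def spans_to_tags(tokens, spans):
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:      # end is exclusive
        tags[start] = f"{etype}-B"
        for i in range(start + 1, end):
            tags[i] = f"{etype}-I"
    return tags

tokens = "John visited New York last Friday".split()
spans = [(0, 1, "PER"), (2, 4, "LOC"), (4, 6, "TIME")]
print(["/".join(p) for p in zip(tokens, spans_to_tags(tokens, spans))])
# ['John/PER-B', 'visited/O', 'New/LOC-B', 'York/LOC-I', 'last/TIME-B', 'Friday/TIME-I']
```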

90 Example: Coreference
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...
- Can be viewed as a classification problem
- What are the inputs?
- What are the categories?
- What features would be useful?

91 HW#7: Viterbi!
- Implement the Viterbi algorithm
- Use a standard, precomputed, presmoothed model (made available after HW#6 is handed in)
- Testing & evaluation: convert the output format to enable comparison with the gold standard, then compare to the gold standard and produce a score
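
A minimal Viterbi sketch over a bigram HMM, decoding in log space. It assumes smoothed (nonzero) probabilities; the toy model below is invented for illustration and is not the course's precomputed model or data format.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # delta[t][s] = (best log-prob of a path ending in state s at time t, backpointer)
    delta = [{s: (math.log(start_p[s] * emit_p[s][obs[0]]), None) for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            prev = max(states, key=lambda p: delta[t - 1][p][0] + math.log(trans_p[p][s]))
            row[s] = (delta[t - 1][prev][0]
                      + math.log(trans_p[prev][s] * emit_p[s][obs[t]]), prev)
        delta.append(row)
    # Recover the best path by following backpointers from the best final state.
    best = max(states, key=lambda s: delta[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        best = delta[t][best][1]
        path.append(best)
    return path[::-1]

states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.7, "V": 0.3}}
emit_p = {"N": {"time": 0.6, "flies": 0.4}, "V": {"time": 0.1, "flies": 0.9}}
print(viterbi(["time", "flies"], states, start_p, trans_p, emit_p))  # ['N', 'V']
```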

