WORDS Lab. CSC 9010: Special Topics, Natural Language Processing. Paula Matuszek, Mary-Angela Papalaskari. Spring, 2005. Examples taken from Bird, Klein and Loper, NLTK Tutorial: Tagging, nltk.sourceforge.net/tutorial/tagging/index.html. CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari
Words, Words, Words. So far we have covered methods that operate largely on tokens: tokenizing text; stemming words and determining lemmas; POS tagging; language models based on n-gram frequencies.
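The token-level methods listed above can be illustrated without any library. The sketch below is only an illustration: its regular-expression tokenizer is a crude assumption (NLTK provides proper tokenizers), and the function names are hypothetical. It counts bigrams, the raw material of an n-gram language model, and turns them into a maximum-likelihood bigram probability.

```python
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, then keep runs of letters, digits, apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())

def bigram_counts(tokens):
    # Count each pair of adjacent tokens
    return Counter(zip(tokens, tokens[1:]))

def bigram_prob(w1, w2, tokens):
    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1 as a history)
    history = Counter(tokens[:-1])
    if history[w1] == 0:
        return 0.0
    return bigram_counts(tokens)[(w1, w2)] / history[w1]

tokens = tokenize("The cat sat on the mat. The cat ran.")
print(bigram_prob("the", "cat", tokens))  # 0.6666666666666666
```

"the" is followed by "cat" in two of its three appearances as a history, hence 2/3. A real language model would also need smoothing for unseen bigrams.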
Every time I fire a linguist, my performance goes up [1]. None of this involves much of what could be considered "linguistic" knowledge or "understanding": no parsing, not much domain knowledge, no "meaning". For the next two sections of the course we will talk extensively about syntax and semantics. [1] Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).
What's in a Word? For this lab, we will focus on some of the things that can be done by applying the techniques we have already studied. The format will be: try a demo; discuss what techniques were needed to implement it; discuss some of what would be needed to improve it.
Gender Genie www.bookblog.net/gender/genie.html Techniques: How good is it? What might improve it? Reference: www.cs.biu.ac.il/~koppel/papers/male-female-text-final.pdf
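As a starting point for the "Techniques" question, one simple approach is to score a text by summing weights of selected words and compare two totals. This is only a discussion sketch: the word lists and weights below are hypothetical, invented for illustration, and are not the Gender Genie's actual lists or those of the referenced paper. The labels class_a and class_b are placeholders for the two author classes.

```python
import re
from collections import Counter

# Hypothetical weights, invented for illustration only.
# They are NOT the Gender Genie's word lists and NOT taken from Koppel et al.
WEIGHTS_A = {"she": 2.0, "her": 2.0, "with": 1.0}
WEIGHTS_B = {"the": 1.0, "a": 1.0, "around": 2.0}

def score(text, weights):
    # Sum weight * frequency over every weighted word that occurs in the text
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return sum(w * counts[word] for word, w in weights.items())

def classify_author(text):
    # Compare the two weighted totals; equal totals give no decision
    a, b = score(text, WEIGHTS_A), score(text, WEIGHTS_B)
    if a == b:
        return "undecided"
    return "class_a" if a > b else "class_b"

print(classify_author("She went with her friend"))  # class_a
```

Questions for discussion follow directly from the design: where the weights come from, whether raw counts should be normalized by text length, and how short texts should be handled.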
Pearson Knowledge Technologies Text Classification Demo www.k-a-t.com:8080/classify/ Techniques: How good is it? What might improve it? Reference: www.k-a-t.com/publications.shtml
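For comparison with the demo, here is a generic bag-of-words classifier that assigns a document to the category whose summed training vector is closest by cosine similarity. It is an assumption about one workable technique, not a description of how the Pearson demo actually classifies text; the function names, categories, and training sentences are made up for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Term-frequency vector for the text
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(u, v):
    # Cosine similarity between two sparse term-count vectors
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def train(labelled_docs):
    # Sum term counts per category (same direction as the centroid, so cosine is unchanged)
    centroids = {}
    for label, text in labelled_docs:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids

def nearest_category(text, centroids):
    # Pick the category whose vector is most similar to the document
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

docs = [
    ("sports", "the team won the game"),
    ("sports", "a great goal in the match"),
    ("finance", "the stock market fell"),
    ("finance", "shares and bonds rose in the market"),
]
model = train(docs)
print(nearest_category("the team scored a goal", model))  # sports
```

Because raw counts are used, frequent words like "the" carry a lot of weight; stopword removal or tf-idf weighting would be natural improvements to discuss.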
Google Sets labs.google.com/sets Techniques: How good is it? What might improve it? Reference: if you find one, let me know. Possibly something like this: www.arxiv.org/pdf/cs.CL/0412098
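Google Sets expands a few seed items into a larger set. One plausible approach, assumed here for illustration rather than taken from any Google documentation, is to rank candidates by how often they appear in the same lists as the seeds. The function name expand_set and the input lists (standing in for lists extracted from web pages) are hypothetical.

```python
from collections import Counter

def expand_set(seeds, lists, top_n=3):
    # Score each non-seed item by the number of seeds it shares a list with, summed over lists
    seeds = set(seeds)
    scores = Counter()
    for items in lists:
        items = set(items)
        overlap = len(seeds & items)
        if overlap == 0:
            continue  # list is unrelated to the seeds
        for item in items - seeds:
            scores[item] += overlap
    # Highest-scoring candidates first (ties come back in arbitrary order)
    return [item for item, _ in scores.most_common(top_n)]

lists = [
    ["red", "green", "blue", "yellow"],
    ["red", "blue", "orange"],
    ["apple", "orange", "banana"],
    ["green", "blue", "purple"],
]
print(expand_set(["red", "blue"], lists))  # "green" comes first
```

Note that "orange" scores well only because it appears in a color list; the fruit list is ignored because it shares no seeds. Ambiguous items like this are one place where such a method can go wrong.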
AT&T Text to Speech www.research.att.com/projects/tts/demo.html Techniques: How good is it? What might improve it? Reference: www.research.att.com/projects/tts/pubs.html