Lecture 1: Introduction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/NLP16 CS6501– Natural Language Processing
CS6501– Natural Language Processing Announcements Waiting list: Start attending the first few meetings of the class as if you are registered. Given that some students will drop the class, some space will free up. We will use Piazza as an online discussion platform. Please enroll. CS6501– Natural Language Processing
CS6501– Natural Language Processing Staff Instructor: Kai-Wei Chang Email: nlp16@kwchang.net Office: R412 Rice Hall Office hour: 2:00 – 3:00, Tue (after class). Additional office hour: 3:00 – 4:00, Thu TA: Wasi Ahmad Email: wua4nw@virginia.edu Office: R432 Rice Hall Office hour: 4:00 – 5:00, Mon CS6501– Natural Language Processing
CS6501– Natural Language Processing This lecture Course Overview What is NLP? Why it is important? What will you learn from this course? Course Information What are the challenges? Key NLP components CS6501– Natural Language Processing
CS6501– Natural Language Processing What is NLP Wiki: Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. CS6501– Natural Language Processing
Go beyond the keyword matching Identify the structure and meaning of words, sentences, texts and conversations Deep understanding of broad language NLP is all around us CS6501– Natural Language Processing
CS6501– Natural Language Processing Machine translation Facebook translation, image credit: Meedan.org CS6501– Natural Language Processing
Statistical machine translation Image credit: Julia Hockenmaier, Intro to NLP CS6501– Natural Language Processing
CS6501– Natural Language Processing Dialog Systems CS6501– Natural Language Processing
Sentiment/Opinion Analysis CS6501– Natural Language Processing
CS6501– Natural Language Processing Text Classification Other applications? www.wired.com CS6501– Natural Language Processing
CS6501– Natural Language Processing Question answering credit: ifunny.com 'Watson' computer wins at 'Jeopardy' CS6501– Natural Language Processing
CS6501– Natural Language Processing Question answering Go beyond search CS6501– Natural Language Processing
Natural language instruction https://youtu.be/KkOCeAtKHIc?t=1m28s CS6501– Natural Language Processing
Digital personal assistant More on natural language instruction Semantic parsing – understand tasks Entity linking – “my wife” = “Kellie” in the phone book credit: techspot.com CS6501– Natural Language Processing
Information Extraction Unstructured text to database entries Yoav Artzi: Natural language processing CS6501– Natural Language Processing
Language Comprehension Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Q: who wrote Winnie the Pooh? Q: where is Chris lived? CS6501– Natural Language Processing
What will you learn from this course The NLP Pipeline Key components for understanding text NLP systems/applications Current techniques & limitation Build realistic NLP tools CS6501– Natural Language Processing
What’s not covered by this course Speech recognition – no signal processing Natural language generation Details of ML algorithms / theory Text mining / information retrieval CS6501– Natural Language Processing
CS6501– Natural Language Processing This lecture Course Overview What is NLP? Why it is important? What will you learn from this course? Course Information What are the challenges? Key NLP components CS6501– Natural Language Processing
CS6501– Natural Language Processing Overview New course, first time being offered Comments are welcomed Aimed at first- or second- year PhD students Lecture + Seminar No course prerequisites, but I assume programming experience (for the final project) basics of probability calculus, and linear algebra (HW0) CS6501– Natural Language Processing
CS6501– Natural Language Processing Grading No exam & HW -- hooray Lectures & forum Participate in discussion (additional credits) Review quizzes (25%): 3 quizzes Critical review report (10%) Paper presentation (15%) Final project (50%) CS6501– Natural Language Processing
CS6501– Natural Language Processing Quizzes Format Multiple choice questions Fill-in-the-blank Short answer questions Each quiz: ~20 min in class Schedule: see course website Closed book, Closed notes, Closed laptop CS6501– Natural Language Processing
Critical review report 1 page maximum Pick one paper from the suggested list Summarize the paper (use you own words) Provide detailed comments What can be improved Potential future directions Other related work Some students will be selected to present their critical reviews CS6501– Natural Language Processing
CS6501– Natural Language Processing Paper presentation Each group has 2~3 students Picked one paper from the suggested readings, or your favorite paper Cannot be the same as critical review report Can be related to your final project Register your choice early 15 min presentation + 2 mins Q&A Will be graded by the instructor, TA, other students CS6501– Natural Language Processing
CS6501– Natural Language Processing Final Project Work in groups (2~3 students) Project proposal Written report, 2 page maximum Project report (35%) < 8 pages, ACL format Due 2 days before the final presentation Project presentation (15%) 5-min in-class presentation (tentative) CS6501– Natural Language Processing
CS6501– Natural Language Processing Late Policy Credit of 48 hours for all the assignments Including proposal and final project No accumulation No more grace period No make-up exam unless under emergency situation CS6501– Natural Language Processing
CS6501– Natural Language Processing Cheating/Plagiarism No. Ask if you have concerns UVA Honor Code: http://www.virginia.edu/honor/ CS6501– Natural Language Processing
Lectures and office hours Participation is highly appreciated! Ask questions if you are still confusing Feedbacks are welcomed Lead the discussion in this class Enroll Piazza https://piazza.com/virginia/fall2016/cs6501004 CS6501– Natural Language Processing
CS6501– Natural Language Processing Topics of this class Fundamental NLP problems Machine learning & statistical approaches for NLP NLP applications Recent trend in NLP CS6501– Natural Language Processing
CS6501– Natural Language Processing What to Read? Natural Language Processing ACL, NAACL, EACL, EMNLP, CoNLL, Coling, TACL aclweb.org/anthology Machine learning ICML, NIPS, ECML, AISTATS, ICLR, JMLR, MLJ Artificial Intelligence AAAI, IJCAI, UAI, JAIR CS6501– Natural Language Processing
CS6501– Natural Language Processing Questions? CS6501– Natural Language Processing
CS6501– Natural Language Processing This lecture Course Overview What is NLP? Why it is important? What will you learn from this course? Course Information What are the challenges? Key NLP components CS6501– Natural Language Processing
Challenges – ambiguity Word sense ambiguity CS6501– Natural Language Processing
Challenges – ambiguity Word sense / meaning ambiguity Credit: http://stuffsirisaid.com CS6501– Natural Language Processing
Challenges – ambiguity PP attachment ambiguity Credit: Mark Liberman, http://languagelog.ldc.upenn.edu/nll/?p=17711 CS6501– Natural Language Processing
Challenges -- ambiguity Ambiguous headlines: Include your children when baking cookies Local High School Dropouts Cut in Half Hospitals are Sued by 7 Foot Doctors Iraqi Head Seeks Arms Safety Experts Say School Bus Passengers Should Be Belted Teacher Strikes Idle Kids CS6501– Natural Language Processing
Challenges – ambiguity Pronoun reference ambiguity Credit: http://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakes CS6501– Natural Language Processing
Challenges – language is not static Language grows and changes e.g., cyber lingo LOL Laugh out loud G2G Got to go BFN Bye for now B4N Idk I don’t know FWIW For what it’s worth LUWAMH Love you with all my heart CS6501– Natural Language Processing
Challenges--language is compositional Carefully Slide CS6501– Natural Language Processing
Challenges--language is compositional 小心: Carefully Careful Take Care Caution 地滑: Slide Landslip Wet Floor Smooth CS6501– Natural Language Processing
CS6501– Natural Language Processing Challenges – scale Examples: Bible (King James version): ~700K Penn Tree bank ~1M from Wall street journal Newswire collection: 500M+ Wikipedia: 2.9 billion word (English) Web: several billions of words CS6501– Natural Language Processing
CS6501– Natural Language Processing This lecture Course Overview What is NLP? Why it is important? What will you learn from this course? Course Information What are the challenges? Key NLP components CS6501– Natural Language Processing
CS6501– Natural Language Processing Part of speech tagging CS6501– Natural Language Processing
Syntactic (Constituency) parsing CS6501– Natural Language Processing
Syntactic structure => meaning Image credit: Julia Hockenmaier, Intro to NLP CS6501– Natural Language Processing
CS6501– Natural Language Processing Dependency Parsing CS6501– Natural Language Processing
CS6501– Natural Language Processing Semantic analysis Word sense disambiguation Semantic role labeling Credit: Ivan Titov CS6501– Natural Language Processing
Q: [Chris] = [Mr. Robin] ? Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Slide modified from Dan Roth
Co-reference Resolution Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
CS6501– Natural Language Processing Questions? CS6501– Natural Language Processing