LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 1: 8/21.



Part 1 Administrivia

Where
  – S SCI 224
When
  – TR 12:30–1:45PM (Computer Lab)
No Class Scheduled For
  – Thursday October 18th
  – Thursday November 22nd (Thanksgiving)
Office Hours
  – catch me after class, or
  – by appointment
  – Location: Douglass 311

Administrivia
Map
  – Office (Douglass)
  – Classroom (S SCI)

Administrivia
Homepage
Lecture slides
  – available on homepage after each class
  – in both PowerPoint (.ppt) and Adobe PDF formats
      animation: in PowerPoint

Administrivia
Course Objectives
  – Theoretical
      Introduction to a broad selection of natural language processing techniques
      Survey course
  – Practical
      Acquire some expertise
      – Use of tools
      – Parsing algorithms
      – Write grammars and machines

Administrivia
Reference Textbook
  Speech and Language Processing, Jurafsky & Martin, Prentice-Hall 2000
  – 21 chapters (900 pages)
  – Concepts, algorithms, heuristics
  – This course concentrates on the sentence level stuff
Sound/speech side
  Prof. Y. Lin, Speech Tech, LING 578 (this semester)
  Prof. Y. Lin, Statistical NLP, LING 539 (Spring 2008)
More advanced course
  – LING 581: Advanced Computational Linguistics
  – required for HLT Master’s Program students

Administrivia
Laboratory Exercises
  – To run tools and write grammars
  – you need access to computational facilities
      use your PC or Mac
      run Windows, Linux or MacOSX
  – Homework exercises

Administrivia
Grading
  – 3 homeworks
  – Exams
      a mid-term
      a final
      mix of theoretical and practical exercises

Administrivia
Homeworks
  – Homeworks will be presented/explained in class (good chance to ask questions)
  – Please attempt homeworks early (then you can ask questions before the deadline)
  – you have one week to do the homework (midnight deadline) (submission to me)
      e.g. homework comes out on Thursday, it is due in my mailbox by next Thursday midnight

Administrivia
Homework Policy
  – You may discuss your homework with others
  – You must write up your homework by yourself
  – You must cite sources and references
      Code of Academic Integrity
  – Late homeworks are subject to points deduction
  – Really late homeworks, e.g. a week late, will not be accepted
  – Emergencies and scheduled absences: inform instructor to make alternative arrangements

Administrivia
Requirements: 438 vs. 538
  – 538: classroom presentation of a selected chapter from the textbook
  – 538: extra credit homework and exam questions are obligatory

Administrivia Requirements: 538

Class Questionnaire
I’ll pass my laptop around...
  – Use PhotoBooth
      click on red button to take a picture of yourself
Fill in Excel spreadsheet
  – Name
  – PhotoBooth #
  – Major
  – Any programming expertise?
  – Have a laptop?
  – Knowledge of Linguistics?

Part 2 Introduction

Human Language Technology (HLT)... is everywhere information is organized and accessed using language

Human Language Technology (HLT)
Beginnings c (just after WWII)
  – Electronic computers invented for
      numerical analysis
      code breaking
Grand Challenges for Computers... Killer Apps:
  – Language comprehension tasks and Machine Translation (MT)
References
  – Readings in Machine Translation
  – Eds. Nirenburg, S. et al. MIT Press
  – (Part 1: Historical Perspective)
Read Chapter 1 of the textbook

Human Language Technology (HLT)
Cryptanalysis Basis
  – early optimism
[Translation. Weaver, W.]
Citing Shannon’s work, he asks:
  “If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?”

Human Language Technology (HLT)
Statistical approaches: popular in the early days, and they have undergone a modern revival
The Present Status of Automatic Translation of Languages (Bar-Hillel, 1960)
  – “I believe this overestimation is a remnant of the time, seven or eight years ago, when many people thought that the statistical theory of communication would solve many, if not all, of the problems of communication”
  – Much valuable time spent on gathering statistics

Human Language Technology (HLT)
uneasy relationship between linguistics and statistical analysis
Statistical Methods and Linguistics (Abney, 1996)
  – Chomsky vs. Shannon
      Statistics and low (zero) frequency items
      – Smoothing
      No relation between order of approximation and grammaticality
      Parameter estimation problem is intractable (for humans)
      – IBM (17 million parameters)
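
The zero-frequency problem and smoothing mentioned above can be illustrated with a small sketch. It is not taken from the lecture or from Abney's paper: the toy corpus, the function name add_k_probability, and the choice of add-one smoothing are all assumptions made for illustration, and add-one is only one of many smoothing methods.

```python
from collections import Counter


def add_k_probability(bigram_counts, history_counts, vocab_size, w1, w2, k=1.0):
    """Estimate P(w2 | w1) with add-k smoothing (k=1 is add-one smoothing)."""
    return (bigram_counts[(w1, w2)] + k) / (history_counts[w1] + k * vocab_size)


# Toy corpus (an assumption for illustration; real estimates need far more data).
tokens = "the girls will read the paper".split()
vocabulary = sorted(set(tokens))
bigram_counts = Counter(zip(tokens, tokens[1:]))
history_counts = Counter(tokens[:-1])  # how often each word starts a bigram

seen = add_k_probability(bigram_counts, history_counts, len(vocabulary), "the", "girls")
unseen = add_k_probability(bigram_counts, history_counts, len(vocabulary), "the", "read")
print(f"P(girls | the) = {seen:.3f}")
print(f"P(read | the)  = {unseen:.3f}  (unseen bigram, but no longer zero)")
```

Without smoothing, the unseen bigram "the read" would get probability zero, and so would any sentence containing it. Adding k to every count moves a little probability mass from seen to unseen events while the estimates for each history still sum to one.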

Human Language Technology (HLT)
recent exciting developments in HLT
  – precipitated by progress in
      computers: stochastic machine learning methods
      storage: large amounts of training data
  – general availability of corpora (Linguistic Data Consortium)
      University of Arizona Library System is a subscriber
      you can borrow many CD-ROMs of data

Human Language Technology (HLT) Killer Application?

Natural Language Processing (NLP)
Computational Linguistics
Question:
  – How to process natural languages on a computer
Intersects with:
  – Computer science (CS)
  – Mathematics/Statistics
  – Artificial intelligence (AI)
  – Linguistic Theory
  – Psychology: Psycholinguistics
      e.g. the human sentence processor

Natural Language Properties
which properties are going to be difficult for computers to deal with?
Grammar (Rules for putting words together into sentences)
  – How many rules are there?
      100, 1000, 10000, more …
  – Portions learnt or innate
  – Do we have all the rules written down somewhere?
Lexicon (Dictionary)
  – How many words do we need to know?
      1000, 10000, …

Computers vs. Humans
Knowledge of language
  – Computers are way faster than humans
      They kill us at arithmetic and chess (IBM’s Deep Blue)
  – But human beings are so good at language, we often take our ability for granted
      Processed without conscious thought
      Exhibit complex behavior

Examples
Innate Knowledge?
  – Which report did you file without reading?
  – (Parasitic gap sentence)
  – file(x,y)
  – read(u,v)
      x = you
      y = report
      u = x = you
      v = y = report
    and there are no other possible interpretations
*the report was filed without reading

Examples
Changes in interpretation
  (1) John is too stubborn to talk to
  (2) John is too stubborn to talk to Bill
talk_to(x,y)
  (1) x = arbitrary person, y = John
  (2) x = John, y = Bill

Examples
Ambiguity
  – Where can I see the bus stop?
  – stop: verb or part of the noun-noun compound bus stop
  – Context (Discourse or situation)
  – Where can I see [the [NN bus stop]]?
  – Where can I see [[the bus] [V stop]]?

Examples
Ungrammaticality
  – *Which book did you file the report without reading?
  – ?*Which book did you file it without reading?
  – * = ungrammatical
  – ungrammatical vs. incomprehensible

Example
The human parser has quirks
  Ian told the man that he hired a secretary
  Ian told the man that he hired a story
Garden-pathing: a temporary ambiguity
tell: multiple syntactic frames for the verb
  Ian told [the man that he hired] [a story]
  Ian told [the man] [that he hired a secretary]
Ian told the agent that he unmasked a secret

Frequently Asked Questions from the Linguistic Society of America (LSA) g/info/ling-faqs.cfm

LSA (Linguistic Society of America) pamphlet by Ray Jackendoff A Linguist’s Perspective on What’s Hard for Computers to Do … –is he right?

If computers are so smart, why can't they use simple English?
Consider, for instance, the four letters read; they can be pronounced as either reed or red. How does the machine know in each case which is the correct pronunciation? Suppose it comes across the following sentences:
  (1) The girls will read the paper. (reed)
  (2) The girls have read the paper. (red)
We might program the machine to pronounce read as reed if it comes right after will, and red if it comes right after have. But then sentences (3) through (5) would cause trouble.
  (3) Will the girls read the paper? (reed)
  (4) Have any men of good will read the paper? (red)
  (5) Have the executors of the will read the paper? (red)
How can we program the machine to make this come out right?
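
The "right after will / right after have" rule from the pamphlet can be sketched in a few lines to see where it breaks. This is only an illustrative sketch: the function name naive_read_pronunciation and the "unknown" fallback are assumptions, not something given in the pamphlet or the lecture.

```python
def naive_read_pronunciation(sentence):
    """Guess the pronunciation of 'read' from the immediately preceding word only."""
    words = sentence.lower().rstrip("?.!").split()
    for i, word in enumerate(words):
        if word == "read":
            previous = words[i - 1] if i > 0 else ""
            if previous == "will":
                return "reed"
            if previous == "have":
                return "red"
            return "unknown"  # the rule says nothing about other contexts
    return "no 'read' found"


# Sentences (1)-(5) with the pronunciations given in the pamphlet.
EXAMPLES = [
    ("The girls will read the paper.", "reed"),
    ("The girls have read the paper.", "red"),
    ("Will the girls read the paper?", "reed"),
    ("Have any men of good will read the paper?", "red"),
    ("Have the executors of the will read the paper?", "red"),
]

for number, (sentence, expected) in enumerate(EXAMPLES, start=1):
    guess = naive_read_pronunciation(sentence)
    status = "ok" if guess == expected else "WRONG"
    print(f"({number}) expected {expected:4} got {guess:7} {status}")
```

The rule handles (1) and (2) but has no answer for (3), and it is fooled in (4) and (5), where will is a noun that happens to sit directly before read. Adjacency alone is not enough; the machine needs to know the structure of the sentence.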

If computers are so smart, why can't they use simple English?
  (6) Have the girls who will be on vacation next week read the paper yet? (red)
  (7) Please have the girls read the paper. (reed)
  (8) Have the girls read the paper? (red)
Sentence (6) contains both have and will before read, and both of them are auxiliary verbs. But will modifies be, and have modifies read. In order to match up the verbs with their auxiliaries, the machine needs to know that the girls who will be on vacation next week is a separate phrase inside the sentence.
In sentence (7), have is not an auxiliary verb at all, but a main verb that means something like 'cause' or 'bring about'. To get the pronunciation right, the machine would have to be able to recognize the difference between a command like (7) and the very similar question in (8), which requires the pronunciation red.

Berkeley Parser The Berkeley Parser is the most accurate and one of the fastest parsers for a variety of languages.

Berkeley Parser
(1) The girls will read the paper. (reed)
Verb Tags (Part of Speech Labels)
  VB - Verb, base form
  VBD - Verb, past tense
  VBG - Verb, gerund or present participle
  VBN - Verb, past participle
  VBP - Verb, non-3rd person singular present
  VBZ - Verb, 3rd person singular present

Berkeley Parser (2) The girls have read the paper. (red)

Berkeley Parser (3) Will the girls read the paper? (reed)

Berkeley Parser (4) Have any men of good will read the paper? (red)

Berkeley Parser (5) Have the executors of the will read the paper? (red)
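
Once a parser has assigned a verb tag to read, choosing the pronunciation reduces to a table lookup. The sketch below assumes the tags are already available: the tags on read are written by hand for illustration and are not actual Berkeley Parser output, and the names READ_PRONUNCIATION and pronounce_read are hypothetical.

```python
# Verb tag on "read" -> pronunciation. Base and present forms are "reed";
# past tense and past participle are "red".
READ_PRONUNCIATION = {
    "VB": "reed",
    "VBP": "reed",
    "VBD": "red",
    "VBN": "red",
}


def pronounce_read(tagged_tokens):
    """Return the pronunciation of the first 'read' in a list of (word, tag) pairs."""
    for word, tag in tagged_tokens:
        if word.lower() == "read":
            return READ_PRONUNCIATION.get(tag, "unknown")
    return None


# Hand-assigned tags (an assumption): only the tag on "read" matters here.
sentence_3 = [("Will", "MD"), ("the", "DT"), ("girls", "NNS"),
              ("read", "VB"), ("the", "DT"), ("paper", "NN")]
sentence_4 = [("Have", "VBP"), ("any", "DT"), ("men", "NNS"), ("of", "IN"),
              ("good", "JJ"), ("will", "NN"), ("read", "VBN"),
              ("the", "DT"), ("paper", "NN")]

print(pronounce_read(sentence_3))
print(pronounce_read(sentence_4))
```

The hard part is getting the tag right in the first place. Whether the parser really outputs VB for (3) and VBN for (4) and (5) is exactly what the slides above put to the test.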

Part 3 software already installed here

Your Homework for Today
Download and Install Perl
  – ActiveState Perl
Install SWI-Prolog

Perl Resources –tutorials etc.

Perl Resources Google is your friend: many resources out there!

Prolog Resources
Useful Online Tutorials
  – An introduction to Prolog (Michel Loiseleur & Nicolas Vigier)
      attacks.org/~boklm/prolog/
  – Learn Prolog Now! (Patrick Blackburn, Johan Bos & Kristina Striegnitz)
      prolog-now/lpnpage.php?pageid=online