Introduction to the course Day 1 LING 681.02 Computational Linguistics Harry Howard Tulane University.


24-Aug-2009, LING 681.02, Prof. Howard, Tulane University

Objectives
• This course shows you how to make a computer perform various useful tasks with natural language.
• Through it you'll learn
  - some linguistics,
  - some algorithms,
  - some statistics,
  - and some computer programming in Python.
• I do not require that you know anything in particular about these areas beforehand.

Objectives, cont.
• Hopefully you'll finish the semester with an appreciation for the intricacies of modeling human languages,
• plus some practical knowledge about solving linguistic problems, such as techniques for
  - filtering junk email,
  - automatically discovering different meanings of the word "run",
  - efficiently encoding spelling rules,
  - tagging words according to their part of speech,
  - parsing English sentences,
  - and automatically translating from one language to another,
  - among other things.

Objectives, cont.
• Our work will be a combination of learning new algorithms, discussing linguistics, and programming useful systems that operate on real data.
• It is great training if you are interested in doing natural language processing work in industry, either in a research lab or in a startup.

Why should you care?
• Trends
  - An enormous amount of information is now available in machine-readable form as natural language text.
  - Conversational agents (automated voices that answer the phone) are becoming an important form of human-computer communication.
  - Much of human-human communication is now mediated by computers.

Commercial world
• Lots of exciting stuff is going on…
• Powerset

Google Translate

Web Q/A

Weblog analytics
• Data-mining of Weblogs, discussion forums, message boards, user groups, and other forms of user-generated media
  - Product marketing information
  - Political opinion tracking
  - Social network analysis
  - Buzz analysis (what’s hot, what topics are people talking about right now)

Web analytics

Intended audience
• Students of
  - linguistics,
  - cognitive science,
  - psychology,
  - neuroscience,
  - mathematics,
  - and any other discipline with an interest in how to process natural language by computer.

Outcomes
• For you to demonstrate how well you have attained the objectives, you will perform the following tasks:
• Take a quiz or turn in a project almost every week, usually on Monday. [(11 - 1) × 7.5% = 75%]
  - No quiz/project can be accepted late.
  - Even though these look like a lot of small grades, missing just one lowers your final grade almost an entire letter, as an unfortunate few of my students have found out the hard way.
  - If you know ahead of time that you will miss a quiz/project, send me an email beforehand, and I will excuse you with no penalty.
• Present a final project to the class on the final exam day (Dec 14) and turn in a report of your project within two days. [25%]
  - This may be a group effort, but the entire group will receive the same grade.

Participation
• Note that there is no credit for class participation,
• but I will change any high Y- into a low X+ if I notice you participating in class.

Prerequisites
• There aren't any.
• I do not take anything for granted and so will explain all background information, or at least suggest sources where you can find it on your own.

Code of academic integrity
“The integrity of Newcomb-Tulane College is based on the absolute honesty of the entire community in all academic endeavors. As part of the Tulane University community, students have certain responsibilities regarding work that forms the basis for the evaluation of their academic achievement. Students are expected to be familiar with these responsibilities at all times. No member of the university community should tolerate any form of academic dishonesty, because the scholarly community of the university depends on the willingness of both instructors and students to uphold the Code of Academic Conduct. When a violation of the Code of Academic Conduct is observed it is the duty of every member of the academic community who has evidence of the violation to take action. Students should take steps to uphold the code by reporting any suspected offense to the instructor or the associate dean of the college. Students should under no circumstances tolerate any form of academic dishonesty.”
For further information, point your browser at

Students with disabilities
• Students with disabilities who need academic accommodation should:
  - Contact and register with the Office of Disability Services (ODS). For more information, visit the ODS website.
  - Bring official notice to me from the ODS indicating that you need academic accommodation. This should be done within the first week of class.

Electronic communications
• I will send you email on a regular basis; you must check your email on a regular basis!
• If you want to use a non-Tulane address, email me a message to that effect from the address.
• I will record and podcast every class.
  - The mp3 files will be available on the course website above.
• I will post my PowerPoint presentation to the course website after every class.

The textbooks
• Speech and Language Processing, 2e, (2008) by Daniel Jurafsky and James H. Martin [SLP]
• Natural Language Processing with Python, 1e, (2009) by Steven Bird, Ewan Klein, and Edward Loper [NLPP]
  - Free online.

Natural Language Toolkit
• The choice of Python as the programming language for the course was motivated by the availability of the excellent tools in the Natural Language Toolkit (NLTK), which are programmed in Python,
• as well as the just-published textbook that goes with it.
• The authors of the NLTK chose Python, in turn, for the ease with which it lets you create NLP applications.

Python for beginners 1
• Python for Software Design: How to Think Like a Computer Scientist. $39
• Think Python: How to Think Like a Computer Scientist. Free online.

Python for beginners, cont.
• 3e, $35.
• 4e released Oct 2

Python for those who know another language
• Everyone should read "Introduction".

If you really want to use Perl (and Prolog)
• An Introduction to Language Processing with Perl and Prolog: An Outline of Theories, Implementation, and Application with Special Consideration of English, French, and German (2006) by Pierre Nugues
• Free from library through SpringerLink: proxy.tulane.edu:2048/content/ m34655/?p= d897b4a7 09b8cb0af2352f96a&pi=0
• But see the NLPP Preface.

Schedule of readings and class preparation
• You should come to class having completed the assignment for that day listed in the schedule.
• We will spend the class going over the exercises in the assignment, answering questions that may have come up in the readings, and perhaps doing new exercises.
• We will cover about 15 pages a day in SLP, plus a varying number of exercises.
• This could take a considerable amount of time.

Computers
• If you have a laptop, you will probably want to bring it to class.

Final exam day (Mon, Dec 14)
• There is no final exam, but you must present your final project to the class on the final exam day.
• You CANNOT leave town before then!
• Tell your parents NOW!
• You are hereby warned.
• Do not tell me at the end of the semester that your parents bought you a ticket home without knowing.

Aesthetics
• I know my slides are ugly and boring, but that is so that they print accurately.
• They only use two fonts, Arial Bold and Times New Roman.

Contact
• Prof. Harry Howard
• howard at tulane dot edu
• (voice mail 24 hours a day)
• Newcomb Hall 322-D
• Office hours: T 3-5, W 4-5 and by appointment (the link goes to my home page, which displays my Google calendar)

Questions?

What is the name of this course?

Speech & Language Processing §1. Introduction

Psychology
• It should be noted that much recent research uses psychologically and even neurologically plausible algorithms for learning patterns from natural language texts,
• so that we will emphasize the acquisition of linguistic knowledge, temporal processing, and the relation between perception and grammar/vocabulary.

Natural Language Processing
• We’re going to study what goes into getting computers to perform useful and interesting tasks involving human languages.
• We are also concerned with the insights that such computational work gives us into human processing of language.

Major topics of book
I. Words: §2-6
II. Speech: §7-11
III. Syntax: §12-16
IV. Meaning: §17-20
V. Discourse: §21
VI. Applications exploiting each: §22-25

Applications
• First, what makes an application a language processing application (as opposed to any other piece of software)?
  - An application that requires the use of knowledge about human language
  - Example: Is Unix wc (word count) an example of a language processing application?

Applications
• Word count?
  - When it counts words: Yes
    To count words you need to know what a word is. That’s knowledge of language.
  - When it counts lines and bytes: No
    Lines and bytes are computer artifacts, not linguistic entities.
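To make the wc point concrete, here is a minimal Python sketch of a wc-like counter. The function name `wc` and the sample text are made up for illustration, and the definition of a word (a maximal run of non-whitespace characters) is an assumption that roughly mirrors what Unix wc does; it is not a linguistically informed one.

```python
def wc(text: str) -> dict:
    """Return line, word, and byte counts for a string, roughly like Unix wc."""
    data = text.encode("utf-8")
    return {
        "lines": data.count(b"\n"),   # computer artifact: newline characters
        "words": len(text.split()),   # assumption: word = run of non-whitespace
        "bytes": len(data),           # computer artifact: encoded length
    }


# Hypothetical sample text, invented for this sketch.
sample = "I made her duck.\nDon't you think so?\n"
counts = wc(sample)
print(counts)
```

Note that this definition counts "duck." (with its period) as one word and "Don't" as one word. Deciding whether those are right is exactly the kind of knowledge of language the slide refers to, whereas the line and byte counts need no such decision.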

Big applications
• Question answering
• Conversational agents
• Summarization
• Machine translation

Big applications
• These kinds of applications require a tremendous amount of knowledge of language.
• Consider the interaction with HAL the computer from 2001: A Space Odyssey, on the next slide.

HAL from 2001
• Dave: Open the pod bay doors, Hal.
• HAL: I’m sorry Dave, I’m afraid I can’t do that.

What’s needed?
• Speech recognition and synthesis
• Knowledge of the English words spoken
  - What they mean
  - How groups of words clump
  - What the clumps mean
• Dialog
  - It is polite to respond, even if you’re planning to kill someone.
  - It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)

Caveat
• NLP has an AI aspect to it.
  - We often deal with ill-defined problems.
  - We don’t often come up with exact solutions/algorithms.
• We can’t let either of those facts get in the way of making progress.

Course material
• We’ll be intermingling discussions of:
  - Linguistic topics, e.g. morphology, syntax, discourse structure
  - Formal systems, e.g. regular languages, context-free grammars
  - Applications, e.g. machine translation, information extraction

Topics: Linguistics
• Word-level processing
• Syntactic processing
• Lexical and compositional semantics
• Discourse processing
• Dialogue structure

Topics: Techniques
• Finite-state methods
• Context-free methods
• Augmented grammars
  - Unification
  - Lambda calculus
• First order logic
• Probability models
• Supervised machine learning methods

Topics: Applications
• Small
  - Spelling correction
  - Hyphenation
• Medium
  - Word-sense disambiguation
  - Named entity recognition
  - Information retrieval
• Large
  - Question answering
  - Conversational agents
  - Machine translation
• Stand-alone
• Enabling applications
• Funding/Business plans

Categories of knowledge
• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics
• Discourse
Each kind of knowledge has associated with it an encapsulated set of processes that make use of it. Interfaces are defined that allow the various levels to communicate. This usually leads to a pipeline architecture.

Ambiguity
• Computational linguists are obsessed with ambiguity.
• Ambiguity is a fundamental problem of computational linguistics.
• Resolving ambiguity is a crucial goal.

Ambiguity
• Find at least 5 meanings of this sentence:
  - I made her duck

Ambiguity
• Find at least 5 meanings of this sentence:
  - I made her duck
• I cooked waterfowl for her benefit (to eat)
• I cooked waterfowl belonging to her
• I created the (plaster?) duck she owns
• I caused her to quickly lower her head or body
• I waved my magic wand and turned her into undifferentiated waterfowl

Ambiguity is pervasive
• I caused her to quickly lower her head or body
  - Lexical category: “duck” can be a N or V
• I cooked waterfowl belonging to her.
  - Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
• I made the (plaster) duck statue she owns
  - Lexical semantics: “make” can mean “create” or “cook”
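The lexical-category ambiguity above can be made visible by enumerating every combination of categories a toy lexicon allows. This is only a sketch: the `LEXICON` dictionary, its tag names, and the function `tag_sequences` are invented for illustration and do not come from NLTK or the textbooks.

```python
from itertools import product

# Toy lexicon (an assumption for this sketch): each word lists the
# lexical categories it can take, following the slide.
LEXICON = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["POSS", "PRON"],   # possessive vs. dative/object pronoun
    "duck": ["NOUN", "VERB"],  # waterfowl vs. lowering the head
}


def tag_sequences(sentence):
    """Return every assignment of lexical categories to the words."""
    words = sentence.split()
    choices = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*choices)]


readings = tag_sequences("I made her duck")
for reading in readings:
    print(reading)
```

Two ambiguous words with two categories each already give four candidate analyses. Not all of them are grammatical (a possessive followed by a verb is not), so a later level such as syntax has to rule some out, and lexical semantics ("create" vs. "cook") multiplies the readings further.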

Ambiguity is pervasive
• Grammar: “make” can be:
  - Transitive (verb has a noun direct object)
    I cooked [waterfowl belonging to her]
  - Ditransitive (verb has 2 noun objects)
    I made [her] (into) [undifferentiated waterfowl]
  - Action-transitive (verb has a direct object and another verb)
    I caused [her] [to move her body]

Ambiguity is pervasive
• Phonetics!
  - I mate or duck
  - I’m eight or duck
  - Eye maid; her duck
  - Aye mate, her duck
  - I maid her duck
  - I’m aid her duck
  - I mate her duck
  - I’m ate her duck
  - I’m ate or duck
  - I mate or duck

Dealing with ambiguity
• Four possible approaches:
1. Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.

Dealing with ambiguity
3. Probabilistic approaches based on making the most likely choices.
4. Don’t do anything; maybe it won’t matter.
   1. We’ll leave when the duck is ready to eat.
   2. The duck is ready to eat now.
• Does the “duck” ambiguity matter with respect to whether we can leave?
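Approach 3 can be sketched in its simplest form as "pick the category a word has most often had". The tiny hand-tagged corpus below is invented purely for illustration (it is not real data), and the names `train` and `most_likely_tag` are hypothetical; real systems estimate such counts from large annotated corpora and also take context into account.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus, invented for this sketch only.
TAGGED = [
    ("the", "DET"), ("duck", "NOUN"), ("swam", "VERB"),
    ("a", "DET"), ("duck", "NOUN"), ("quacked", "VERB"),
    ("they", "PRON"), ("duck", "VERB"), ("quickly", "ADV"),
]


def train(tagged):
    """Count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    return counts


def most_likely_tag(counts, word):
    """Return the most frequent tag for a word, or None if it was never seen."""
    seen = counts.get(word)
    if not seen:
        return None
    return seen.most_common(1)[0][0]


counts = train(TAGGED)
print(most_likely_tag(counts, "duck"))   # NOUN: 2 of the 3 occurrences above
```

This always answers NOUN for "duck", so it gets "I caused her to duck" wrong. That trade-off (usually right, sometimes wrong, never stuck) is what "making the most likely choices" amounts to.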

Models and algorithms
• By models we mean the formalisms that are used to capture the various kinds of linguistic knowledge we need.
• Algorithms are then used to manipulate the knowledge representations needed to tackle the task at hand.

Models
• State machines
• Rule-based approaches
• Logical formalisms
• Probabilistic models

Algorithms
• Many of the algorithms that we’ll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.
• Unfortunately, ambiguity makes this process difficult. This leads us to employ algorithms that are designed to handle ambiguity of various kinds.
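As a small illustration of a transducer in the loose sense used above, the sketch below turns one kind of structure (a string of characters) into another (a list of word and punctuation tokens) using two states, IN a word and OUT of a word. The function name `tokenize` and the rule that letters, digits, and apostrophes belong to words are assumptions made for this example; the buffer that collects the current word is an implementation convenience rather than part of a strict finite-state machine.

```python
def tokenize(text):
    """Two-state transducer sketch: characters in, list of tokens out."""
    tokens, current, state = [], [], "OUT"
    for ch in text:
        if ch.isalnum() or ch == "'":
            # Assumption: letters, digits, and apostrophes are word-internal.
            current.append(ch)
            state = "IN"
        else:
            if state == "IN":
                # Leaving a word: emit it.
                tokens.append("".join(current))
                current = []
            if not ch.isspace():
                # Punctuation becomes its own token.
                tokens.append(ch)
            state = "OUT"
    if state == "IN":
        tokens.append("".join(current))
    return tokens


print(tokenize("I made her duck."))
```

Even this tiny machine embodies decisions about language (is "Don't" one token or two?), and it commits to a single output. Handling inputs with several plausible outputs is where the ambiguity-aware algorithms come in.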

Paradigms
• In particular...
• State-space search
  - To manage the problem of making choices during processing when we lack the information needed to make the right choice
• Dynamic programming
  - To avoid having to redo work during the course of a state-space search
  - CKY, Earley, Minimum Edit Distance, Viterbi, Baum-Welch
• Classifiers
  - Machine learning based classifiers that are trained to make decisions based on features extracted from the local context

State space search
• States represent pairings of partially processed inputs with partially constructed representations.
• Goals are inputs paired with completed representations that satisfy some criteria.
• As with most interesting problems, the spaces are normally too large to explore exhaustively.
  - We need heuristics to guide the search
  - Criteria to trim the space
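A state-space search of this kind can be sketched with word segmentation: splitting an unspaced string into known words. Each state pairs the partially processed input (how many characters have been consumed) with the partially constructed representation (the words found so far). The vocabulary `VOCAB` and the function `segment` are invented for this example, and the search is exhaustive, which only works because the space here is tiny.

```python
# Toy vocabulary, an assumption for this sketch.
VOCAB = {"i", "made", "ma", "de", "he", "her", "duck"}


def segment(text, vocab):
    """Return every way to split text into words from vocab."""
    agenda = [(0, [])]   # state: (characters consumed, words built so far)
    goals = []
    while agenda:
        pos, words = agenda.pop()          # depth-first: newest state first
        if pos == len(text):
            goals.append(words)            # goal: all input consumed
            continue
        for end in range(pos + 1, len(text) + 1):
            piece = text[pos:end]
            if piece in vocab:             # trim: only extend with known words
                agenda.append((end, words + [piece]))
    return goals


results = segment("imadeherduck", VOCAB)
for words in results:
    print(" ".join(words))
```

The state that chooses "he" after "made" is a dead end, because nothing in the vocabulary starts at "rduck", so it is simply abandoned. With a realistic vocabulary and longer inputs the agenda grows quickly, which is why heuristics (for example, preferring longer or more frequent words) are needed to decide which states to expand first.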

Dynamic programming
• Don’t do the same work over and over.
• Avoid this by building and making use of solutions to sub-problems that must be invariant across all parts of the space.
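Minimum edit distance, one of the algorithms named among the paradigms, is a compact example of this idea. The sketch below fills a table in which each cell holds the cheapest way to turn a prefix of the source into a prefix of the target, so every sub-problem is solved once and then reused. The function name and the `sub_cost` parameter are choices made for this example; insertion and deletion are assumed to cost 1.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance via dynamic programming.

    Insertions and deletions cost 1; a substitution costs sub_cost.
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                       # delete i characters
    for j in range(1, m + 1):
        dist[0][j] = j                       # insert j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost) # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

Without the table, the same prefix pairs would be recomputed many times over in a naive recursive search; with it, the work is proportional to the product of the two string lengths.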

Next time
• Download and install Python and NLTK
  - See info at > Download
• SLP 2.1 & Ex ; NLPP & Ex