Introduction to Natural Language Processing (JHU CS 600.465, Jan Hajič). AI-Lab, 2003.09.03. Slides dated 9/13/1999.


Slide 2: Intro to NLP
Instructor: Jan Hajič (Visiting Assistant Professor)
– CS Dept., JHU; office: NEB 324A
– Hours: Mon 10-11, Tue 3-4
– Preferred contact: by e-mail
Teaching Assistant: Gideon Mann
– CS Dept., JHU; office: NEB 332
– Hours: TBA
Room: NEB 36, MTW 2-3 (50 mins.)

Slide 3: Textbooks you need
– Manning, C. D., Schütze, H.: Foundations of Statistical Natural Language Processing. The MIT Press. [required; on order]
– Allen, J.: Natural Language Understanding. The Benjamin/Cummings Publishing Co. [required; available]
– Wall, L. et al.: Programming Perl. 2nd ed. O'Reilly. [recommended; available (main store)/on order]

Slide 4: Other reading
– Charniak, E.: Statistical Language Learning. The MIT Press.
– Cover, T. M., Thomas, J. A.: Elements of Information Theory. Wiley.
– Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press.
Proceedings of major conferences:
– ACL (Association for Computational Linguistics)
– EACL (European Chapter of the ACL)
– ANLP (Applied NLP)
– COLING (International Committee on Computational Linguistics)

Slide 5: Course requirements
Grade components (requirements & weights):
– Class participation: 7%
– Midterm: 20%
– Final exam: 25%
– Homeworks (4): 48%
Exams: approx. 15 questions; mostly explanatory answers (1/4 page or so), only a few multiple-choice questions.
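A quick sanity check on the grade weights above, sketched in Python (the student scores are hypothetical, purely for illustration):

```python
# Grade component weights from the slide; they must sum to 100%.
WEIGHTS = {"participation": 0.07, "midterm": 0.20, "final": 0.25, "homeworks": 0.48}

def final_grade(scores):
    """Weighted average of per-component scores, each on a 0-100 scale."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# Hypothetical student (made-up numbers):
print(round(final_grade({"participation": 100, "midterm": 80,
                         "final": 90, "homeworks": 85}), 2))  # -> 86.3
```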

Slide 6: Homeworks
Topics:
1. Entropy, language modeling
2. Word classes
3. Classification (POS tagging, ...)
4. Parsing (syntactic)
Organization:
– a few paper-and-pencil exercises, a lot of programming
– strict deadlines (2 pm on the due date); only one homework may be at most 5 days late
– turning-in mechanism: TBA
– absolutely no plagiarism

Slide 7: Course segments
Intro & Probability & Information Theory (3)
– the very basics: definitions, formulas, examples
Language Modeling (3)
– n-gram models, parameter estimation
– smoothing (EM algorithm)
A Bit of Linguistics (3)
– phonology, morphology, syntax, semantics, discourse
Words and the Lexicon (3)
– word classes, mutual information, a bit of lexicography
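The Language Modeling segment above covers n-gram models, parameter estimation and smoothing; as a minimal preview, here is a toy bigram model with add-one (Laplace) smoothing (the two-sentence corpus is made up, and the course's EM-based smoothing is far more sophisticated than this):

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, with boundary markers."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def bigram_prob(w_prev, w, uni, bi, vocab_size):
    """Add-one smoothed estimate P(w | w_prev) = (c(w_prev, w) + 1) / (c(w_prev) + V)."""
    return (bi[(w_prev, w)] + 1) / (uni[w_prev] + vocab_size)

corpus = [["she", "books", "flights"], ["you", "need", "books"]]
uni, bi = train_bigram(corpus)
V = len(uni)
print(bigram_prob("she", "books", uni, bi, V))  # seen bigram: relatively high
print(bigram_prob("need", "she", uni, bi, V))   # unseen bigram: small but nonzero
```

The add-one step is exactly what keeps unseen bigrams from getting probability zero, which is the point of smoothing.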

Slide 8: Course segments (cont.)
Hidden Markov Models (3)
– background, algorithms, parameter estimation
Tagging: Methods, Algorithms, Evaluation (8)
– tagsets, morphology, lemmatization
– HMM tagging, transformation-based, feature-based
NL Grammars and Parsing: Data, Algorithms (9)
– grammars and automata, deterministic parsing
– statistical parsing: algorithms, parameterization, evaluation
Applications (MT, ASR, IR, Q&A, ...) (4)

Slide 9: NLP: The Main Issues
Why is NLP difficult?
– many "words", many "phenomena" --> many "rules"
  – OED: 400k words; the Finnish lexicon of word forms is larger still
  – sentences, clauses, phrases, constituents, coordination, negation, imperatives/questions, inflections, parts of speech, pronunciation, topic/focus, and much more!
– irregularity (exceptions, exceptions to the exceptions, ...)
  – potato --> potatoes (tomato, hero, ...); photo --> photos; and even both: mango --> mangos or mangoes
– adjective/noun order: new book, electrical engineering, general regulations, flower garden, garden flower, ...; but: Governor General
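The plural examples above can be turned into a toy rule, which shows how quickly exception lists pile up (the word sets below are just the slide's own examples, nothing more):

```python
# Deliberately incomplete English pluralizer: exceptions must be listed by hand.
ES_AFTER_O = {"potato", "tomato", "hero"}  # -o -> -oes
EITHER = {"mango"}                         # both forms attested: mangos / mangoes

def pluralize(noun):
    if noun in ES_AFTER_O:
        return noun + "es"
    if noun in EITHER:
        return [noun + "s", noun + "es"]   # both are acceptable
    return noun + "s"                      # default: photo -> photos

print(pluralize("potato"))  # potatoes
print(pluralize("photo"))   # photos
print(pluralize("mango"))   # ['mangos', 'mangoes']
```

Every new exception (and exception to the exception) means another hand-maintained list, which is the slide's point.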

Slide 10: Difficulties in NLP (cont.)
– ambiguity:
  – books: NOUN or VERB? ("you need many books" vs. "she books her flights online")
  – "No left turn weekdays 4-6 pm / except transit vehicles" (Charles Street at Cold Spring): when may transit vehicles turn? Always? Never?
  – "Thank you for not smoking, drinking, eating or playing radios without earphones." (MTA bus): thank you for not eating without earphones?? or even: thank you for not drinking without earphones!?
  – "My neighbor's hat was taken by wind. He tried to catch it.": ...catch the wind, or ...catch the hat?

Slide 11: (Categorical) Rules or Statistics?
Preferences:
– clear cases, context clues: "she books" --> books is a verb
  – rule: if an ambiguous word (verb/non-verb) is preceded by a matching personal pronoun --> the word is a verb
– less clear cases: pronoun reference
  – she/he/it refers to the most recent noun or pronoun (?) (but maybe we can specify exceptions)
– selectional preferences: catching hat >> catching wind (but why not?)
– semantic preferences: never thank for drinking in a bus! (but what about the earphones?)
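The "matching personal pronoun" rule above can be sketched as a tiny categorical disambiguator (the word lists are hypothetical stand-ins for a real lexicon):

```python
# Toy word lists: assumptions for illustration, not a real lexicon.
PERSONAL_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
VERB_OR_NOUN = {"books", "flies", "walks"}  # words ambiguous between noun and verb

def tag(tokens):
    """Tag each token; the rule: an ambiguous word right after a
    personal pronoun is a verb, otherwise default to noun."""
    tags = []
    for i, tok in enumerate(tokens):
        w = tok.lower()
        if w in PERSONAL_PRONOUNS:
            tags.append("PRON")
        elif w in VERB_OR_NOUN:
            prev = tokens[i - 1].lower() if i > 0 else None
            tags.append("VERB" if prev in PERSONAL_PRONOUNS else "NOUN")
        else:
            tags.append("OTHER")
    return tags

print(tag(["she", "books", "her", "flights"]))  # "books" after a pronoun -> VERB
print(tag(["you", "need", "many", "books"]))    # "books" not after a pronoun -> NOUN
```

This handles the clear cases on the slide; the less clear cases (pronoun reference, selectional and semantic preferences) are exactly where such categorical rules start to break down.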

Slide 12: Solutions
Don't guess if you know:
– morphology (inflections)
– lexicons (lists of words)
– unambiguous names
– perhaps some (really) fixed phrases
– syntactic rules?
Use statistics (based on real-world data) for preferences (only?)
– no doubt about using statistics; whether only for preferences is the big question!

Slide 13: Statistical NLP
Imagine:
– each sentence W = {w1, w2, ..., wn} gets a probability P(W|X) in a context X (think of it in the intuitive sense for now)
– for every possible context X, sort all the imaginable sentences W according to P(W|X)
– ideal situation: the best sentence is the most probable one in context X; "ungrammatical" sentences end up at the bottom; NB: the same holds for interpretation
[diagram: sentences sorted by P(W), from the best sentence down to the "ungrammatical" ones]

Slide 14: Real World Situation
– We are unable to specify the set of grammatical sentences today using fixed "categorical" rules (maybe never; cf. the arguments in MS [Manning & Schütze])
– Use a statistical "model" based on REAL-WORLD DATA, and care about the best sentence only (disregarding the "grammaticality" issue)
[diagram: candidate sentences ranked by P(W), from W_best down to W_worst]
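Picking the best sentence as the one maximizing P(W) under a model trained on real-world data can be sketched with a toy unigram model (the corpus and candidates are made up; note that such a weak model cannot even penalize scrambled word order):

```python
import math
from collections import Counter

# Toy "real-world data": a made-up training corpus.
corpus = "the cat sat on the mat the dog sat on the log".split()
counts = Counter(corpus)
total = sum(counts.values())

def log_prob(sentence, alpha=1.0):
    """Unigram log P(W) with add-alpha smoothing, so unseen words keep P(W) > 0."""
    V = len(counts)
    return sum(math.log((counts[w] + alpha) / (total + alpha * V)) for w in sentence)

candidates = [
    "the cat sat on the mat".split(),
    "mat the on sat cat the".split(),  # same words: a unigram model cannot tell these apart!
    "colorless green ideas sleep".split(),
]
best = max(candidates, key=log_prob)   # W_best = argmax over candidates of log P(W)
print(best)
```

The scrambled candidate ties with the fluent one, which is precisely why the course moves on to n-gram models, HMMs and statistical parsing: richer models make P(W) sensitive to order and structure.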