Statistical Methods and Linguistics - Steven Abney 1998. 09. 24. Thur. POSTECH Computer Science NLP Lab 9425021 Shim Jun-Hyuk.

Slide 2: Contents (CS730B - Statistical NLP)
o Introduction
o Linguistics Review under Statistical Methods
  - Language Acquisition
  - Language Change
  - Language Variation
o Language Structure and Performance
  - Language Property
  - Grammaticality and Ambiguity v. Performance
  - Non-Linguistic Factors for Performance
  - Grammaticality and Acceptability
  - Grammar and Computation
  - The Frictionless Plane, Autonomy and Isolation
  - Holy Grail

Slide 3: Contents (continued)
o How Statistics Helps
  - Disambiguation
  - Degrees of Grammaticality
  - Naturalness
  - Structure Preferences
  - Error Tolerance
  - Learning on the Fly
  - Lexical Acquisition
o Objections
  - Are stochastic methods only for engineers?
  - Did not Chomsky debunk all this ages ago?
o Conclusion

Slide 4: Introduction
o Linguistics
  - Computational Linguistics
    * Performance
    * Practical applications
    * Little concerned with human language processing
    * Rationale provided by statistical methods
  - Theoretical Linguistics
    * Competence
    * Theoretical research on grammars and structures
    * Concerned with human language processing
o Objectives
  - Theoretical background of statistical analyses
  - Review from the viewpoint of linguistics
  - Importance of weighted grammars

Slide 5: 1. Linguistics Review under Statistical Models (1)
o Objective
  - Linguistic issues framed in terms of populations of grammars
  - A general population of grammars can be usefully examined with statistical models
o Language Acquisition (LA)
  - Probabilistic (stochastic) or weighted grammars in children's LA process
  - Co-existence and decay of grammars
  - Algebraic (non-stochastic) grammars as a supplement

Slide 6: 1. Linguistics Review under Statistical Models (2)
o Language Change
  - Change in the probability of a language construction (e.g., a rule or a parameter setting)
  - Not "abrupt", but "gradual"
  - Statistical co-existence and decay
  - "Adult monolingual speaker": in the end, the grammar is stochastic across the community
o Language Variation
  - Dialectology
    * Arbitrary continuum of language created by geographic distance
    * Contact frequency and intelligibility
  - Typology (e.g., language features, conditional probability distributions)
  - Statistical modeling using stochastic grammars

Slide 7: 2. Language Structure and Performance (1)
o Language
  - Algebraic properties
    * Idealization: the adult monolingual speaker
    * Theoretical syntax: linguistic data
    * Structure judgments for competence
  - Statistical properties
    * Stochastic model: performance data
    * Adjustments to structure-judgment data for "performance effects"
    * Grammaticality and ambiguity judgments about sentences, as opposed to structures

Slide 8: 2. Language Structure and Performance (2)
o Grammaticality and Ambiguity v. Performance
  - Examples
    * "The a are of I"
    * "The cows are grazing in the meadow"
    * "John saw Mary"
  - Ambiguity problem under grammatical structures
  - Genuine ambiguities and spurious ambiguities
    * The problem is not ungrammatical analyses but undesired ones
    * Case 1: elided sentences
    * Case 2: rare usages
  - The problem is how to identify the correct structure from among the possible ones
  - It can be addressed with weighted grammars in computational linguistics

Slide 9: 2. Language Structure and Performance (3)
o Non-Linguistic Factors for Performance
  - Perception is a matter of performance; it involves non-linguistic factors as well as grammaticality
  - Grammaticality and Acceptability
    * Perceptions of grammaticality and ambiguity are performance data
    * What is "performance data"? Finding some choice of words and context that yields a clear positive judgment (acceptability)
  - Grammar and Computation
    * The problem: how can we compute the linguistic data simply and absolutely?
    * Competence v. computation
  - Autonomy of syntax: not the same as isolation, and syntax cannot be reduced to semantics
  - Holy Grail: the larger picture and ultimate goal of generative linguistics is to make sense of language production, comprehension, acquisition, variation, and change

Slide 10: 3. How Statistics Helps (1)
o Disambiguation
  - Describing an algorithm that computes the correct parse among the possible parses
  - Correct parse: the parse that humans perceive
  - Various statistical methods exist
  - Example: "John walks", using a context-free grammar with weighted rules
o Degrees of Grammaticality
  - Gradations of acceptability
  - Degrees of error in speech production
  - The measure of goodness is a global measure that combines degrees of grammaticality with naturalness and structural preference
  - Through parameter estimation, we can obtain a measure of "degrees of grammaticality"
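
The "John walks" example can be sketched roughly as follows. The slide does not list the grammar, so the rule weights and the two candidate parses below are assumptions chosen only for illustration: each parse is scored by multiplying the weights of the rules it uses, and the highest-scoring parse is taken as the preferred reading.

```python
from math import prod

# Hypothetical rule weights (illustrative only; not given on the slide).
# Each key is (left-hand side, right-hand side).
RULE_WEIGHTS = {
    ("S", ("NP", "V")): 0.7,
    ("S", ("NP",)): 0.3,
    ("NP", ("N",)): 0.8,
    ("NP", ("N", "N")): 0.2,
    ("N", ("John",)): 0.6,
    ("N", ("walks",)): 0.4,
    ("V", ("walks",)): 1.0,
}

# Two candidate analyses of "John walks", each listed as the rules it uses.
PARSES = {
    # [S [NP [N John]] [V walks]]: a sentence
    "sentence": [
        ("S", ("NP", "V")),
        ("NP", ("N",)),
        ("N", ("John",)),
        ("V", ("walks",)),
    ],
    # [S [NP [N John] [N walks]]]: a bare noun-noun compound
    "noun_phrase": [
        ("S", ("NP",)),
        ("NP", ("N", "N")),
        ("N", ("John",)),
        ("N", ("walks",)),
    ],
}


def parse_weight(rules):
    """Weight of a parse: the product of the weights of its rules."""
    return prod(RULE_WEIGHTS[rule] for rule in rules)


def best_parse(parses):
    """Name of the parse with the highest weight."""
    return max(parses, key=lambda name: parse_weight(parses[name]))


if __name__ == "__main__":
    for name, rules in PARSES.items():
        print(name, parse_weight(rules))
    print("preferred:", best_parse(PARSES))
```

Both analyses are grammatical under this toy grammar; the weights only rank them, which is the sense in which a weighted grammar separates the perceived parse from a spurious one.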

Slide 11: 3. How Statistics Helps (2)
o Naturalness
  - Plausibility, in the sense of selectional preferences
  - Collocational knowledge: "how do you say it"
  - Statistical methods are applied to collocations and selectional restrictions
o Structural Preference
  - One of the parsing strategies
  - Longest-match preference
  - Plays an important role in the dispreference for certain structures
o Error Tolerance
  - Detecting errors in sentences and selecting the best analysis
  - A primary motivation for Shannon's noisy channel model
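
A minimal sketch of noisy-channel decoding for error tolerance, assuming a toy vocabulary: the observed (possibly corrupted) word is mapped to the intended word that maximizes P(intended) * P(observed | intended). The names `PRIOR`, `CHANNEL`, and `decode`, and all probabilities, are hypothetical and not taken from the slides.

```python
# Hypothetical prior over intended words, P(intended).
PRIOR = {"the": 0.6, "then": 0.3, "ten": 0.1}

# Hypothetical channel model, P(observed | intended), keyed as
# (observed, intended). Missing pairs are treated as probability 0.
CHANNEL = {
    ("teh", "the"): 0.5,
    ("teh", "then"): 0.1,
    ("teh", "ten"): 0.2,
}


def score(observed, intended):
    """Joint score P(intended) * P(observed | intended)."""
    return PRIOR[intended] * CHANNEL.get((observed, intended), 0.0)


def decode(observed):
    """Return the intended word with the highest noisy-channel score."""
    return max(PRIOR, key=lambda intended: score(observed, intended))


if __name__ == "__main__":
    print(decode("teh"))
```

The same argmax applies when the "words" are whole sentences and the prior comes from a weighted grammar; the toy table is only meant to make the two factors visible.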

Slide 12: 3. How Statistics Helps (3)
o Learning on the Fly
  - Much like error correction
  - Admits a space of learning operations
    * Assigning a new part of speech to a word
    * Adding a new subcategorization frame to a verb, etc.
o Lexical Acquisition
  - The sheer richness of natural language grammars and lexica
  - Primary area of application for distributional and statistical approaches to acquisition
  - Examples of distributional approaches: acquisition of parts of speech, collocations, selectional restrictions, etc.
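
As one illustration of a distributional approach to collocations, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is a common choice but is an assumption here; the slides do not name a specific measure, and the function name `pmi_scores` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (in bits) for every adjacent word pair in `tokens`.

    Probabilities are plain relative frequencies, with no smoothing.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


if __name__ == "__main__":
    toy_tokens = ["strong", "tea", "strong", "tea", "the", "tea"]
    for pair, value in sorted(pmi_scores(toy_tokens).items()):
        print(pair, round(value, 3))
```

A positive score means the pair co-occurs more often than independence would predict. On a corpus this small the numbers are not meaningful; real collocation extraction needs large counts and usually a frequency cutoff.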

Slide 13: 4. Objections to Statistical Methods
o Are Stochastic Models Only for Engineers?
  - Are stochastic models always just a practical stopgap approximation?
  - Given a complex deterministic system and its initial conditions, we could in principle compute the state at any time
  - In fact, a stochastic model can give more insight and success than identifying every deterministic factor
o What Did Chomsky Really Prove?
  - Syntactic Structures (1957)
    * Chomsky: there is no choice of $n$ and $\epsilon$ such that $\mathrm{grammatical}(s) \iff P_n(s) > \epsilon$
    * $P_n(s)$: the best $n$-th order approximation to English
    * Shannon's Markov model: $\mathrm{grammatical}(s) \iff \lim_{n \to \infty} P_n(s) > \epsilon$
    * As $n$ increases, the non-zero probability erroneously assigned (to ungrammatical strings) decreases
  - Handbook of Mathematical Psychology (1963)
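
To make the criterion P_n(s) > epsilon concrete, here is a small sketch with n = 2: an unsmoothed bigram model estimated from a two-sentence toy corpus. The corpus, the function names, and the choice of maximum-likelihood estimation are all assumptions for illustration. It shows the kind of case behind Chomsky's objection: a grammatical but unseen sentence receives probability 0, so no threshold epsilon can accept it.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary marks."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def sentence_probability(sentence, model):
    """Unsmoothed maximum-likelihood bigram probability of a sentence."""
    context_counts, bigram_counts = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: the model assigns no mass
        probability *= bigram_counts[(prev, cur)] / context_counts[prev]
    return probability


# Hypothetical toy corpus.
MODEL = train_bigram(["the cows graze", "the dogs sleep"])

if __name__ == "__main__":
    print(sentence_probability("the cows graze", MODEL))  # seen sentence
    print(sentence_probability("the cows sleep", MODEL))  # grammatical, unseen
```

Smoothing or a larger n changes the numbers but not the shape of the problem for any fixed n, which is why the limit as n grows matters in the Shannon formulation.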

Slide 14: 5. Conclusion
o Statistical Methods
  - Weighted grammars, distributional induction methods
  - Relevant to linguistics
o Performance v. Competence
  - Performance is not a goal but a useful tool of computational linguistics
  - Competence is needed to understand the algebraic properties of language
  - Algebraic methods alone are inadequate for understanding human language
  - The age of computational linguistics using statistical technology