The Web in Theoretical Linguistics Research: Two Case Studies Using the Linguist’s Search Engine Philip Resnik, Aaron Elkiss, Heather Taylor, and Ellen Lau.

Presentation transcript:

The Web in Theoretical Linguistics Research: Two Case Studies Using the Linguist’s Search Engine. Philip Resnik, Aaron Elkiss, Heather Taylor, and Ellen Lau, University of Maryland. Berkeley Linguistics Society, February 20, 2005

Did that sound ok to you? * Theje dberk eobbfid dbeonc kdoeb “a small, imperfect experiment…”

Nature of Elicitation: Conventional / Binary {__,?,??,?*,*,**}; Magnitude estimation; Contrasts. Nature of Grammar: Probabilistic; Hard / Categorical; Ordered constraints; Data-oriented. Schütze (1996); Cowart (1997); Bard, Robertson, and Sorace (1996); Crocker and Keller (2005); Sorace and Keller (2005)

Nature of Elicitation Source of Language Sample Naturally occurring Linguist Nature of Grammar Probabilistic Hard / Categorical Ordered constraints Data-oriented ? Corpora Part-of-speech taggers Treebanks Statistical parsers Semantic role labeling …etc.

If you build it, they will come… Manning (2003): “…it remains fair to say that these tools have not yet made the transition to the Ordinary Working Linguist without considerable computer skills.” % export TGREP_CORPUS=wsj_mrg.crp % tgrep -n __ | grep. | gzip > wsj_mrg.txt.gz % tgrep2 -C -p wsj_mrg.txt wsj_mrg.t2c.g NP ! NP | >> VP]

Roadmap Motivations The Linguist’s Search Engine Case Study 1: Psycholinguistics Case Study 2: Syntax Conclusions

A Brief Illustration of the LSE. Pollard and Sag (1994); discussion in Manning (2003) –(a) We consider Kim to be an acceptable candidate –(b) We consider Kim an acceptable candidate –(c) We consider Kim quite acceptable –(d) We consider Kim among the most acceptable candidates –(e) *We consider Kim as an acceptable candidate –(f) *We consider Kim as quite acceptable –(g) *We consider Kim as among the most acceptable candidates –(h) *We consider Kim as being among the most acceptable candidates

Query By Example: Type an example of the structure you’re interested in. The LSE generates an automatic analysis. (You don’t have to agree with the analysis!)

Use the mouse to edit the tree.

A few mouseclicks later, you have a description of the structure you’re looking for. The LSE creates the query for you.

You can choose to match all morphological forms of a word.

Hit ‘search’ and the LSE retrieves sentences whose analysis matches the structure you specified.

One more click to look at a sentence in context…

… or to see the entire Web page where it occurred.
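The query-by-example steps above amount to matching a partial tree description against automatically parsed sentences. The following is a minimal sketch of that idea, not the LSE's actual query engine: the bracketed parses, the Penn-Treebank-style labels, and the helper names `parse`, `matches`, and `contains` are all assumptions made for illustration.

```python
# Hypothetical sketch of structural search; not the LSE's real implementation.

def parse(bracketed):
    """Turn '(S (NP (PRP It)) ...)' into nested (label, children) tuples.
    Leaves (words) are plain strings."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # skip the closing ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def matches(node, pattern):
    """True if `node` fits `pattern`: same label, and the pattern's children
    occur among the node's children in order (extra children are allowed)."""
    if isinstance(pattern, str):  # a word in the pattern must match a word
        return isinstance(node, str) and node.lower() == pattern.lower()
    if isinstance(node, str):
        return False
    label, kids = node
    pattern_label, pattern_kids = pattern
    if label != pattern_label:
        return False
    i = 0
    for pattern_kid in pattern_kids:
        while i < len(kids) and not matches(kids[i], pattern_kid):
            i += 1
        if i == len(kids):
            return False
        i += 1
    return True


def contains(node, pattern):
    """True if `pattern` matches `node` or any subtree of `node`."""
    if matches(node, pattern):
        return True
    if isinstance(node, str):
        return False
    return any(contains(kid, pattern) for kid in node[1])


# Toy parsed sentences (invented for this sketch).
sentences = [
    "(S (NP (PRP It)) (VP (VBD was) (ADJP (JJ clear) (PP (TO to) (NP (PRP him))))))",
    "(S (NP (PRP She)) (VP (VBD was) (VP (VBG cooking) (NP (NN dinner)))))",
]

# Partial description: an ADJP headed by "clear" with a "to"-PP over any NP.
pattern = ("ADJP", [("JJ", ["clear"]), ("PP", [("TO", ["to"]), ("NP", [])])])

hits = [s for s in sentences if contains(parse(s), pattern)]
```

Leaving a pattern node's child list empty, as with `("NP", [])`, plays the role of the "any NP" slot that the tree editor lets you leave unspecified.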

Two Case Studies Focus in this talk: –What was the study about? –How was the LSE useful? In both cases, my co-authors were naïve users of the Linguist’s Search Engine. I didn’t discover the LSE had been useful to them until after the fact.

Case Study I: Psycholinguistics Nina Kazanina, Ellen Lau, Moti Lieberman, Colin Phillips and Masaya Yoshida, “Active Dependency Formation in the Processing of Backwards Anaphora”. 17th Annual CUNY Sentence Processing Conference, University of Maryland, College Park. March

Active Dependency Formation. Wh-word signals upcoming dependency formation: The teacher asked what the team was laughing about __. Active processing of dependency observed → filled gap effect. Dependency formation constrained by grammar → island constraints. Early pronoun signals upcoming dependency formation: While he was watching TV, John heard the phone ring. Active processing of dependency observed? Dependency formation constrained by grammar?

Original data for testing prediction: While she was cooking dinner, John listened to the radio. (Gender mismatch effect.) She was cooking dinner while John listened to the radio. (Principle C rules out coreference in c-commanded position, so no mismatch effect should be observed.) Results looked good, but there was a confound! Needed a construction where the target position is expected; otherwise processor might simply have stopped looking for target.

Active Dependency Formation. Possible solution: expletive constructions. It was clear to his mother that John should go. (No Principle C) It was clear to him that John should go. (Principle C) Question: does this construction really have the right properties? Is the second clause consistently expected? Is it consistently expletive rather than referential? Options: Rely on experimenter intuition; do a pilot study; sift through a corpus.

Query by example: It was clear to him Becomes It AUX [clear to NP]

Active Dependency Formation Result: Verified that virtually all results of the search did involve expletive it with a following clause. Obtained reassurance in designing the follow-up study Later double-checked using an off-line completion study The LSE made it easy to start with linguists’ intuitions and find relevant evidence in naturally occurring text. The LSE also makes it easy to look for additional relevant data that may not have occurred to the experimenter.

Any adjective PP with any preposition Query by example: It AUX Adj PP that…

clear important vital manifest interesting necessary obvious
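Generalizing the example to "It AUX Adj PP that…" and listing the adjectives that turn up can be pictured as a small tallying step over part-of-speech-tagged hits. The sketch below is only an illustration under stated assumptions: the tagged sentences are invented, the tags follow Penn Treebank conventions, and `BE_FORMS` and `adjective_in_frame` are hypothetical names, not part of the LSE.

```python
from collections import Counter

# Assumed stand-in for "AUX": a few forms of "be".
BE_FORMS = {"is", "was", "be", "been", "being", "are", "were"}


def adjective_in_frame(tagged):
    """Return the adjective if the sentence opens with
    'It AUX Adj Prep ... that', otherwise None."""
    if len(tagged) < 5:
        return None
    words = [word.lower() for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    if words[0] != "it" or words[1] not in BE_FORMS:
        return None
    if tags[2] != "JJ" or tags[3] not in ("IN", "TO"):
        return None
    if "that" not in words[4:]:
        return None
    return words[2]


# Invented (word, tag) sentences standing in for search results.
corpus = [
    [("It", "PRP"), ("was", "VBD"), ("clear", "JJ"), ("to", "TO"),
     ("him", "PRP"), ("that", "IN"), ("John", "NNP"), ("should", "MD"), ("go", "VB")],
    [("It", "PRP"), ("is", "VBZ"), ("important", "JJ"), ("for", "IN"),
     ("us", "PRP"), ("that", "IN"), ("they", "PRP"), ("agree", "VBP")],
    [("It", "PRP"), ("was", "VBD"), ("clear", "JJ"), ("to", "TO"),
     ("everyone", "NN"), ("that", "IN"), ("she", "PRP"), ("won", "VBD")],
    [("It", "PRP"), ("was", "VBD"), ("raining", "VBG"), ("in", "IN"),
     ("Paris", "NNP")],
]

counts = Counter(
    adj for adj in (adjective_in_frame(sentence) for sentence in corpus)
    if adj is not None
)
```

A flat token pattern like this is cruder than a tree query, but it shows how relaxing one slot (the adjective) while holding the rest of the frame fixed surfaces candidates the experimenter may not have thought of.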

Case Study II: Syntax Heather Taylor, “Interclausal (co)dependency: the case of the comparative correlative”, Proc. Michigan Linguistics Society, October

Comparative Correlatives* The Xer …, the Yer … –Highlighted in recent debates about the UG approach –Central question: are these constructions amenable to an analysis based on UG principles, or do they present a challenge to the UG view? Central claim here: the LSE is useful regardless of which side of the debate you’re on. *A.k.a. Conditional correlatives, correlative conditionals, “more-more” constructions

Comparative Correlatives [Diagram contrasting Culicover and Jackendoff (1999) with Taylor (2004). Tree labels: IP/CP; CP [the more XP]_i (that) IP … t_i …; CP [the more XP]_j (that) IP … t_j …; CP. Annotations: “Sui generis”; “Interclausal relationships accounted for outside the syntax”; “UG analysis relating CCs to conditionals”.]

Comparative Correlatives McCawley’s generalization (1988, 1998): Deletion of copular main verbs in CCs is sensitive to semantic properties of the subject (generic/specific) –The better an advisor, the more successful a student is –The more obnoxious Fred {is / *Ø}, the less attention you should pay But analysis of LSE data exposes the role of: –Phonological weight of the subject –Parallelism (copula in both clauses, deletion in both clauses) casting doubt on the generalization’s validity

Comparative Correlatives *The more obnoxious Fred, the less attention you should pay to him. ?The more obnoxious Fred’s younger brother, the less attention you should pay to him. ?The longer the day’s activities are, the sleepier the campers. ?The longer the day’s activities, the sleepier the campers are. √The longer the day’s activities, the sleepier the campers. Informant judgments confirm the tendencies indicated by naturally occurring data.

Comparative Correlatives Overt then? –The hungrier Romeo gets, then the more pizza he eats. –Cf. If Romeo gets hungrier, then he eats more pizza.

Comparative Correlatives Overt then –The hungrier Romeo gets, then the more pizza he eats. –Cf. If Romeo gets hungrier, then he eats more pizza. LSE searches suggest that overt then is not anomalous. Might this support a UG account that provides a unified treatment of CCs and conditionals? One more fact to add to the theoretical debate!

Conclusions The LSE is useful to traditional linguists –Confirming/disconfirming intuitions (theory → data) –Exposing a wider range of data (data → theory) The LSE complements new methodological trends –Magnitude estimation, etc. The LSE is available for anyone to use – Traditional?!

Backup slides

Conclusions Chomsky (1979): “You can also collect butterflies and make many observations. If you like butterflies, that’s fine; but such work must not be confounded with research, which is concerned to discover explanatory principles of some depth and fails if it does not do so.” Einstein (1940): “Science is the attempt to make the chaotic diversity of our sense-experience correspond to a logically uniform system of thought [in which] experience must be correlated with the theoretical structure… What we call physics comprises that group of natural sciences which base their concepts on measurements…”

A Web Search Tool for the Ordinary Working Linguist Must have linguist-friendly “look and feel” Must minimize learning/ramp-up time Must permit real-time interaction Must permit large-scale searches Must allow search on linguistic criteria Must be reliable Must evolve with real use

LSE Example: Text in Parallel Translation Example: seeing how English “completive particle” usages (eat up versus simply eat, indicating a telic event) are rendered in different languages.

LSE Example: Implicit Objects Resnik (1993, 1996): –Information-theoretic model of selectional constraints –Model makes predictions with respect to implicit objects Implicit objects –John ate Ø (= John ate something edible) –*John found Ø (can’t mean John found something findable). Question from audience: –“Doesn’t your model then predict that the verb titrate should permit implicit objects?” –Options Find informants for whom titrate is in the working vocabulary Slog through corpora looking for titrate used “intransitively”
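The information-theoretic model mentioned above is commonly summarized in terms of selectional preference strength: the relative entropy between the distribution of object classes given a verb, P(c | v), and the prior distribution P(c). Verbs that select strongly for their objects are the ones predicted to tolerate implicit objects more readily. The sketch below computes that quantity for toy numbers; the class inventory and every probability are invented for illustration, and `preference_strength` is a hypothetical helper name.

```python
import math


def preference_strength(p_class_given_verb, p_class):
    """Relative entropy D(P(c|v) || P(c)) in bits.
    Classes with zero conditional probability contribute nothing."""
    return sum(
        p * math.log2(p / p_class[c])
        for c, p in p_class_given_verb.items()
        if p > 0
    )


# Invented prior over object classes.
prior = {"food": 0.25, "artifact": 0.25, "person": 0.25, "abstraction": 0.25}

# Invented conditional distributions: "eat" concentrates on food,
# "find" takes objects of every class equally often.
eat = {"food": 0.85, "artifact": 0.05, "person": 0.05, "abstraction": 0.05}
find = {"food": 0.25, "artifact": 0.25, "person": 0.25, "abstraction": 0.25}

strength_eat = preference_strength(eat, prior)
strength_find = preference_strength(find, prior)
```

With these toy figures `eat` comes out far more selective than `find`, mirroring the contrast between "John ate Ø" and "*John found Ø". Estimating the real distributions for a rare verb like titrate is where a large searchable collection of naturally occurring sentences becomes useful.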

Custom collection of sentences from the Web

Active Dependency Formation Gender mismatch effect (van Gompel and Liversedge, 2003) When she wasn’t busy, the girl visited the boy very often. When she wasn’t busy, the boy visited the girl very often. Gender mismatch effect reveals active processing Can grammatical information constrain the process? [Diagram: coreference between she and the boy marked *.] Principle C: pronoun can’t co-refer with antecedent that it c-commands. Prediction: no gender mismatch effect with c-commanded positions

More on Comparative Correlatives (see Taylor, 2004) The two clauses behave like a subordinate and matrix clause, respectively –Tag questions form on clause2 and not clause1 –Only clause2 can host subjunctive case –In German, the word order is consistent with clause1 being subordinate to matrix clause2 –In Dutch there is flexibility in the word order of clause2 characteristic of matrix clauses NPI licensed in clause1 but not in clause2 Extraction is equally permissible from both

Conditionals –Presence of then –Tag questions form on clause2 and not clause1 –NPI licensed in clause1 but not in clause2 –Extraction from both clauses –Variable binding facts “shadow” each other –Lack of Condition C binding between clauses –Codependence Each clause depends on the presence of the other The licit values of X in the “comparative strings” are determined by each other Parallelism in copula deletion

the ADJer …