The Web in Theoretical Linguistics Research: Two Case Studies Using the Linguist’s Search Engine Philip Resnik, Aaron Elkiss, Heather Taylor, and Ellen Lau University of Maryland Berkeley Linguistics Society February 20, 2005
Did that sound ok to you? * Theje dberk eobbfid dbeonc kdoeb “a small, imperfect experiment…”
Nature of Elicitation Conventional / Binary {__,?,??,?*,*,**} Magnitude estimation Contrasts Nature of Grammar Probabilistic Hard / Categorical Ordered constraints Data-oriented Schütze (1996) Cowart (1997) Bard, Robertson, and Sorace (1996) Crocker and Keller (2005) Sorace and Keller (2005)
Nature of Elicitation Source of Language Sample Naturally occurring Linguist Nature of Grammar Probabilistic Hard / Categorical Ordered constraints Data-oriented ? Corpora Part-of-speech taggers Treebanks Statistical parsers Semantic role labeling …etc.
If you build it, they will come… Manning (2003): “…it remains fair to say that these tools have not yet made the transition to the Ordinary Working Linguist without considerable computer skills.” % export TGREP_CORPUS=wsj_mrg.crp % tgrep -n __ | grep. | gzip > wsj_mrg.txt.gz % tgrep2 -C -p wsj_mrg.txt wsj_mrg.t2c.g NP ! NP | >> VP]
Roadmap Motivations The Linguist’s Search Engine Case Study 1: Psycholinguistics Case Study 2: Syntax Conclusions
Pollard and Sag (1994); discussion in Manning (2003) –(a) We consider Kim to be an acceptable candidate –(b) We consider Kim an acceptable candidate –(c) We consider Kim quite acceptable –(d) We consider Kim among the most acceptable candidates –(e) *We consider Kim as an acceptable candidate –(f) *We consider Kim as quite acceptable –(g) *We consider Kim as among the most acceptable candidates –(h) *We consider Kim as being among the most acceptable candidates A Brief Illustration of the LSE
Type an example of the structure you’re interested in. LSE generates an automatic analysis (You don’t have to agree with the analysis!) Query By Example
Use the mouse to edit the tree.
A few mouseclicks later, you have a description of the structure you’re looking for. The LSE creates the query for you.
You can choose to match all morphological forms of a word.
Hit ‘search’ and the LSE retrieves sentences whose analysis matches the structure you specified.
One more click to look at a sentence in context…
… or to see the entire Web page where it occurred.
Two Case Studies Focus in this talk: –What was the study about? –How was the LSE useful? In both cases, my co-authors were naïve users of the Linguist’s Search Engine. I didn’t discover the LSE had been useful to them until after the fact.
Case Study I: Psycholinguistics Nina Kazanina, Ellen Lau, Moti Lieberman, Colin Phillips and Masaya Yoshida, “Active Dependency Formation in the Processing of Backwards Anaphora”. 17th Annual CUNY Sentence Processing Conference, University of Maryland, College Park. March
Wh-word signals upcoming dependency formation Active processing of dependency observed filled gap effect Dependency formation constrained by grammar island constraints Early pronoun signals upcoming dependency formation Active processing of dependency observed? Dependency formation constrained by grammar? While he was watching TV, John heard the phone ring. The teacher asked what the team was laughing about __. Active Dependency Formation
Original data for testing prediction While she was cooking dinner, John listened to the radio. She was cooking dinner while John listened to the radio. Results looked good, but there was a confound! She was cooking dinner while John listened to the radio. Needed a construction where the target position is expected; otherwise processor might simply have stopped looking for target. She was cooking dinner while John listened to the radio. Principle C rules out coreference in c-commanded position, so no mismatch effect should be observed Gender mismatch effect
Active Dependency Formation Possible solution: expletive constructions It was clear to his mother that John should go. It was clear to him that John should go. No Principle C Principle C Question: does this construction really have the right properties? Options: Rely on experimenter intuition Do a pilot study Sift through a corpus Is the second clause consistently expected? Is it consistently expletive rather than referential? It was clear to his mother that John should go. It was clear to him that John should go.
Query by example: It was clear to him Becomes It AUX [clear to NP]
Active Dependency Formation Result: Verified that virtually all results of the search did involve expletive it with a following clause. Obtained reassurance in designing the follow-up study Later double-checked using an off-line completion study The LSE made it easy to start with linguists’ intuitions and find relevant evidence in naturally occurring text. The LSE also makes it easy to look for additional relevant data that may not have occurred to the experimenter.
Any adjective PP with any preposition Query by example: It AUX Adj PP that…
clear important vital manifest interesting necessary obvious
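A minimal sketch of how the generalized query above (It AUX Adj PP that…) could be approximated over part-of-speech-tagged text. This is not the LSE's implementation: the LSE matches against parse trees, whereas this toy version checks a flat tag sequence. The function name, the list of auxiliary forms, and the hand-assigned Penn Treebank-style tags are all assumptions made for illustration.

```python
# Toy approximation of the query "It AUX Adj PP that ...".
# Assumption: input is a list of (word, tag) pairs with Penn Treebank-style tags.
# Assumption: AUX is restricted to a few forms of "be" for simplicity.
AUX_FORMS = {"is", "was", "are", "were", "be", "been", "being"}


def match_it_aux_adj_pp_that(tagged):
    """Return the adjective if the sentence fits the pattern, else None."""
    if len(tagged) < 6:
        return None
    words = [word.lower() for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    if words[0] != "it":
        return None
    if words[1] not in AUX_FORMS:
        return None
    if tags[2] != "JJ":            # any adjective
        return None
    if tags[3] not in ("IN", "TO"):  # any preposition
        return None
    # The PP's noun phrase may be any length, so only require at least
    # one token after the preposition before a later "that".
    if "that" not in words[5:]:
        return None
    return words[2]


expletive = [("It", "PRP"), ("was", "VBD"), ("clear", "JJ"), ("to", "TO"),
             ("him", "PRP"), ("that", "IN"), ("John", "NNP"),
             ("should", "MD"), ("go", "VB")]
other = [("She", "PRP"), ("was", "VBD"), ("cooking", "VBG"),
         ("dinner", "NN"), ("while", "IN"), ("John", "NNP"),
         ("listened", "VBD"), ("to", "TO"), ("the", "DT"), ("radio", "NN")]

print(match_it_aux_adj_pp_that(expletive))
print(match_it_aux_adj_pp_that(other))
```

Collecting the returned adjective over many retrieved sentences would give a list like the one on this slide. A flat tag match cannot tell expletive it from referential it, which is why the hits still need to be inspected by hand.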
Case Study II: Syntax Heather Taylor, “Interclausal (co)dependency: the case of the comparative correlative”, Proc. Michigan Linguistics Society, October
Comparative Correlatives* The Xer …, the Yer … –Highlighted in recent debates about the UG approach –Central question: are these constructions amenable to an analysis based on UG principles, or do they present a challenge to the UG view? Central claim here: the LSE is useful regardless of which side of the debate you’re on. *A.k.a. Conditional correlatives, correlative conditionals, “more-more” constructions
Comparative Correlatives Culicover and Jackendoff (1999) Taylor (2004) IP/CP CP [the more XP] i (that) IP … t i … CP [the more XP] j (that) IP … t j … Interclausal relationships accounted for outside the syntax CP UG analysis relating CCs to conditionals Sui generis
Comparative Correlatives McCawley’s generalization (1988, 1998): Deletion of copular main verbs in CCs is sensitive to semantic properties of the subject (generic/specific) –The better an advisor, the more successful a student is –The more obnoxious Fred {is / *Ø}, the less attention you should pay But analysis of LSE data exposes the role of: –Phonological weight of the subject –Parallelism (copula in both clauses, deletion in both clauses), casting doubt on the generalization’s validity
Comparative Correlatives *The more obnoxious Fred, the less attention you should pay to him. ?The more obnoxious Fred’s younger brother, the less attention you should pay to him. ?The longer the day’s activities are, the sleepier the campers. ?The longer the day’s activities, the sleepier the campers are. √The longer the day’s activities, the sleepier the campers. Informant judgments confirm the tendencies indicated by naturally occurring data.
Comparative Correlatives Overt then? –The hungrier Romeo gets, then the more pizza he eats. –Cf. If Romeo gets hungrier, then he eats more pizza.
Comparative Correlatives Overt then –The hungrier Romeo gets, then the more pizza he eats. –Cf. If Romeo gets hungrier, then he eats more pizza. LSE searches suggest that overt then is not anomalous. Might this support a UG account that provides a unified treatment of CCs and conditionals? One more fact to add to the theoretical debate!
Conclusions The LSE is useful to traditional linguists –Confirming/disconfirming intuitions (theory → data) –Exposing a wider range of data (data → theory) The LSE complements new methodological trends –Magnitude estimation, etc. The LSE is available for anyone to use – Traditional?!
Backup slides
Conclusions Chomsky (1979): “You can also collect butterflies and make many observations. If you like butterflies, that’s fine; but such work must not be confounded with research, which is concerned to discover explanatory principles of some depth and fails if it does not do so.” Einstein (1940): “Science is the attempt to make the chaotic diversity of our sense-experience correspond to a logically uniform system of thought [in which] experience must be correlated with the theoretical structure… What we call physics comprises that group of natural sciences which base their concepts on measurements…”
A Web Search Tool for the Ordinary Working Linguist Must have linguist-friendly “look and feel” Must minimize learning/ramp-up time Must permit real-time interaction Must permit large-scale searches Must allow search on linguistic criteria Must be reliable Must evolve with real use
LSE Example: Text in Parallel Translation Example: seeing how English “completive particle” usages (eat up versus simply eat, indicating a telic event) are rendered in different languages.
LSE Example: Implicit Objects Resnik (1993, 1996): –Information-theoretic model of selectional constraints –Model makes predictions with respect to implicit objects Implicit objects –John ate Ø (= John ate something edible) –*John found Ø (can’t mean John found something findable). Question from audience: –“Doesn’t your model then predict that the verb titrate should permit implicit objects?” –Options Find informants for whom titrate is in the working vocabulary Slog through corpora looking for titrate used “intransitively”
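A hedged sketch of the quantity at the core of the information-theoretic model mentioned above: selectional preference strength, usually stated as the relative entropy between the distribution of object classes given the verb and the prior distribution of object classes. The class inventory and all probabilities below are invented toy values, not data from Resnik (1993, 1996), and the function name is hypothetical.

```python
import math


def selectional_preference_strength(p_class_given_verb, p_class):
    """Relative entropy D(P(c|v) || P(c)) in bits.

    Both arguments map an object class to a probability.
    """
    return sum(
        p * math.log2(p / p_class[c])
        for c, p in p_class_given_verb.items()
        if p > 0
    )


# Toy prior over four made-up object classes.
prior = {"food": 0.25, "object": 0.25, "person": 0.25, "idea": 0.25}

# Toy conditionals: "eat" concentrates on one class, "find" mirrors the prior.
eat = {"food": 0.97, "object": 0.01, "person": 0.01, "idea": 0.01}
find = {"food": 0.25, "object": 0.25, "person": 0.25, "idea": 0.25}

s_eat = selectional_preference_strength(eat, prior)
s_find = selectional_preference_strength(find, prior)
print(s_eat > s_find)
```

On these toy numbers eat selects far more strongly than find, in line with the contrast between John ate Ø and *John found Ø. A verb like titrate, whose objects are highly constrained, would also score high, which is the point of the audience question.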
Custom collection of sentences from the Web
Active Dependency Formation Gender mismatch effect (van Gompel and Liversedge, 2003) When she wasn’t busy, the girl visited the boy very often. When she wasn’t busy, the boy visited the girl very often. Gender mismatch effect reveals active processing Can grammatical information constrain the process? she the boy * Principle C: pronoun can’t co-refer with antecedent that it c-commands. Prediction: no gender mismatch effect with c-commanded positions
More on Comparative Correlatives (see Taylor, 2004) The two clauses behave like a subordinate and matrix clause, respectively –Tag questions form on clause2 and not clause1 –Only clause2 can host subjunctive case –In German, the word order is consistent with clause1 being subordinate to matrix clause2 –In Dutch there is flexibility in the word order of clause2 characteristic of matrix clauses NPI licensed in clause1 but not in clause2 Extraction is equally permissible from both
Conditionals –Presence of then –Tag questions form on clause2 and not clause1 –NPI licensed in clause1 but not in clause2 –Extraction from both clauses –Variable binding facts “shadow” each other –Lack of Condition C binding between clauses –Codependence Each clause depends on the presence of the other The licit values of X in the “comparative strings” are determined by each other Parallelism in copula deletion
the ADJer …