Published by Carmella Waters. Modified over 9 years ago.
1
Without data, nothing
Adam Kilgarriff
Lexical Computing Ltd
University of Leeds
2
Generative Lexicon
  Account of non-standard uses of words
  So: we need a dataset
Gasteiz-Vitoria, 2012. Kilgarriff: Without Data, Nothing
3
Method
  Sample of words
  Sample of corpus instances for each
  Choose a dictionary
  Sense-tag
  Identify mismatches to dictionary senses
  For each: does it fit the GL model?
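The sampling step in the method above might look roughly like the sketch below. It is only an illustration: the function name `sample_instances`, the whitespace tokenisation, and the toy corpus are assumptions, not the procedure used in the study. The first two toy lines are examples quoted later in the talk; the rest are invented.

```python
import random

def sample_instances(corpus_lines, word, k, seed=0):
    """Return up to k corpus lines that contain `word` as a whitespace token.

    Hypothetical helper: real corpus tools match lemmas, not raw tokens.
    """
    hits = [line for line in corpus_lines if word in line.lower().split()]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(hits, min(k, len(hits)))

# Toy corpus (assumption, for illustration only).
toy_corpus = [
    "his steering was careless",
    "they overhauled the steering",
    "she chopped an onion for the soup",
    "the steering wheel was worn",
]

picked = sample_instances(toy_corpus, "steering", k=2)
print(len(picked))  # 2
```

Each sampled line would then go to the sense-tagging step.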
4
Resources
  Words (random sample): modest, disability, steering, seize, sack (v), sack (n), onion, rabbit, handbag
  Corpus instances: between 82 and 718 for each word; total 2276
  Dictionary: HECTOR (OUP/Xerox project in corpus lexicography)
5
Tagging
  Three professional lexicographers
  Assign a sense to each corpus instance
  For this exercise: if anything other than 3-way agreement, re-examine
  390 of 2276 cases (17%)
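The re-examination rule (anything other than 3-way agreement) can be sketched as follows. The sense labels and the toy data are invented for illustration and are not HECTOR sense identifiers. Only the final figure, 390 of 2276, comes from the slide.

```python
from typing import Sequence

def needs_reexamination(tags: Sequence[str]) -> bool:
    """True unless every tagger assigned the same sense label."""
    return len(set(tags)) != 1

def reexamination_rate(tagged: Sequence[Sequence[str]]) -> float:
    """Share of instances flagged for re-examination."""
    flagged = sum(needs_reexamination(t) for t in tagged)
    return flagged / len(tagged)

# Hypothetical tags from three lexicographers for four corpus instances.
toy = [
    ("steering/activity", "steering/activity", "steering/activity"),
    ("steering/mechanism", "steering/activity", "steering/mechanism"),
    ("onion/plant", "onion/food", "onion/plant"),
    ("onion/food", "onion/food", "onion/food"),
]

flagged = [i for i, tags in enumerate(toy) if needs_reexamination(tags)]
print(flagged)                           # [1, 2]
print(f"{reexamination_rate(toy):.0%}")  # 50%
print(f"{390 / 2276:.0%}")               # 17% (the figure reported on the slide)
```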
6
modest
  Any two dictionaries divide up the space differently
  HECTOR: 9, CIDE: 3, LDOCE: 4, COBUILD: 5
  Tagger agreement: less than half
  Messy, but no GL-like cases
7
What is language?
(Slide footer: Szeged, Jan 2008. Kilgarriff, Global WordNet)
8
steering
  2 senses
    Activity: his steering was careless
    Mechanism: they overhauled the steering
  16 re-examined, most underspecified
    it has the Peugeot’s steering feel
  One more complex case
    After nearly fifty years [as a bus driver] Mr. Hannis stepped down from behind the steering wheel
9
onion
  Two senses: plant and food
  34 cases re-examined
  10 bridged the divide
    Plant the sets two inches apart to produce a good yield of medium-sized onions
  Others: medicine, decorative feature, dye, cliché of Frenchness
    It’s not all frogs legs and strings of onions in the South of France
10
sack (n)
  2 x sack race
  One metaphor: Santa Claus
    Ridley pulled another doubtful gift from his sack
    (Ridley: British politician)
11
sack (v)
  And Labour MP, Mr. Bruce George, has called for the firm to be sacked from duty at Prince Andrew’s £5 million home at Sunningwell Park near Windsor
  Non-standard because end-employment needs PERSON as direct object
  Candidate for GL treatment
12
handbag
  She moved from handbags through gifts to the flower shop
  [handbag department in department store]
  Candidate for GL treatment
13
Results
  2276 corpus instances
  390 re-examined
  41 non-standard uses
  2 potentially accounted for by GL
Conclusion
  GL will never account for a large share of non-standard word use
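The proportions behind the conclusion can be checked with a few lines of arithmetic. The four counts are taken from the slide. The helper `pct` and the variable names are just for this sketch.

```python
# Counts reported on the Results slide.
total = 2276          # corpus instances
reexamined = 390      # not 3-way agreement
non_standard = 41     # non-standard uses
gl_candidates = 2     # potentially accounted for by GL

def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole

print(f"re-examined:            {pct(reexamined, total):.1f}% of instances")      # 17.1%
print(f"non-standard:           {pct(non_standard, total):.1f}% of instances")    # 1.8%
print(f"GL candidates:          {pct(gl_candidates, non_standard):.1f}% of non-standard uses")  # 4.9%
```

On these figures, GL covers about one in twenty of the non-standard uses found, which is the basis for the conclusion on the slide.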
17
What is language?
  In our heads
  In texts and sound signals
  Both
18
Methodology
  Study language in our heads
    Introspection
    Semantic analysis
    Experiments with human subjects
  “rationalist” (Leibniz, Chomsky)
  Problems: coverage, arbitrariness
19
Methodology
  Study text
  “empiricist” (Locke, Hume)
  Physics: forces, matter
  Chemistry: chemicals, bonds
  Language: text, speech signals
20
Empiricist linguistics
  A new way to find out about language
  20 years of rapid ascent
  Computers
  Corpora: bigger and bigger data sets available
  Language technology tools: lemmatizers, POS-taggers, parsers, machine learning for pattern finding
24
Preliminaries over
  What is a word sense (my PhD in 5 slides)
  Where do you find them?
  Dictionaries!
25
The lexicographers
  They create them
  Methods
    Introspection
    Other dictionaries
    Corpus
  Atkins, Hanks
26
What is a word sense (1)
  SFIP: sufficiently frequent, insufficiently predictable
  (a glass of) whisky x (a glass of) tequila
27
What is a word sense (2)
  homonymy, analogy, polysemy, rules, phraseology
28
What is a word sense (3)
  A cluster
    Of instances of use
    Operationalised as: corpus lines
    Clustered by lexicographers
33
What is a word sense (3)
  A cluster
    Of instances of use
    Operationalised as: corpus lines
    Clustered by lexicographers
  Makes sense of
    Overlapping senses
    Different dictionaries, different senses
    Lumping and splitting
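Treating a sense as a cluster of corpus lines makes "lumping and splitting" easy to state: two dictionaries partition the same lines differently, and a splitter's partition refines a lumper's. The sketch below illustrates that idea only. The line ids, sense labels, and the three toy "dictionaries" are invented, and `is_refinement` is a hypothetical helper.

```python
from collections import defaultdict

def clusters(assignment):
    """Group corpus-line ids by the sense label a dictionary assigns them."""
    groups = defaultdict(set)
    for line_id, sense in assignment.items():
        groups[sense].add(line_id)
    return {frozenset(g) for g in groups.values()}

def is_refinement(fine, coarse):
    """True if every cluster of `fine` sits inside one cluster of `coarse`."""
    coarse_clusters = clusters(coarse)
    return all(any(f <= c for c in coarse_clusters) for f in clusters(fine))

# Three hypothetical dictionaries tagging the same four corpus lines.
lumper = {1: "A", 2: "A", 3: "A", 4: "B"}
splitter = {1: "a1", 2: "a1", 3: "a2", 4: "b"}   # splits sense A in two
crosscut = {1: "x", 2: "y", 3: "y", 4: "y"}      # draws the boundary elsewhere

print(is_refinement(splitter, lumper))  # True: pure splitting
print(is_refinement(crosscut, lumper))  # False: the senses overlap
```

When neither partition refines the other, as with `crosscut` and `lumper`, the dictionaries have overlapping senses instead of a simple lump/split relationship.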
34
Theory
  Hanks: norms and exploitations
  Task of lexicographer: record the norms
  Speakers may always exploit norms to say something new
35
Boring question
  Homonymy or polysemy
  We all know it’s a cline
Interesting question
  Norm or exploitation
36
metaphor
  “see” meaning “understand”: Norm
  I travelled the path / From life towards art / Desire the horse / Depression the cart (Leonard Cohen): Exploitation
37
How do they do it?
  honeymoon
43
The Sketch Engine
  Corpus query tool
  Used for making dictionaries at OUP, CUP, Collins, Macmillan, Le Robert, Cornelsen, Elhuyar Foundation
  Also universities
    Linguistic research
    Teaching: linguistics, also languages
44
60 languages covered
46
Individual licences (£4.99/month)
University site licences
Free trial: self-register
47
Build instant corpora from the web: WebBootCaT
Install your corpora
Compare corpora
http://www.sketchengine.co.uk
48
Thank you
  homonymy, analogy, polysemy, rules, phraseology