Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 The Long Road from Text to Meaning Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.

Similar presentations


Presentation on theme: "1 The Long Road from Text to Meaning Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex."— Presentation transcript:

1 1 The Long Road from Text to Meaning Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex

2 May 2007 Adam Kilgarriff 2 Overview Research programme Examples: Corpus lexicography Word sketching Collocationality Thesauruses The long road

3 May 2007 Adam Kilgarriff 3 What is language?

4 May 2007 Adam Kilgarriff 4 What is language? In our heads

5 May 2007 Adam Kilgarriff 5 What is language? In our heads In texts and sound signals

6 May 2007 Adam Kilgarriff 6 What is language? In our heads In texts and sound signals Both

7 May 2007 Adam Kilgarriff 7 Methodology Study language in our heads Competence Chomsky “rationalist” (Descartes, Leibniz)

8 May 2007 Adam Kilgarriff 8 Methodology Study language in our heads Competence Chomsky “rationalist” (Descartes, Leibniz) Odd method for objective science Practical problems: coverage, arbitrariness

9 May 2007 Adam Kilgarriff 9 Methodology Study text “empiricist” (Locke, Hume) Physics: forces, matter Chemistry: chemicals, bonds Language: text, speech signals

10 May 2007 Adam Kilgarriff 10 It goes against the grain What is important about a sentence? its meaning Corpus methodology: Throw away individual sentence meaning Find patterns

11 May 2007 Adam Kilgarriff 11 Computer power Corpora bigger and bigger data sets Language technology tools lemmatizers, POS-taggers, parsers Machine learning, pattern-finding 18 years of rapid ascent

12 May 2007 Adam Kilgarriff 12 A virtuous circle Pattern finding Lexicon Linguistic processing Corpus Part-of-speech tagging Parsing Lemmatizing More data → gets richer each time round

13 May 2007 Adam Kilgarriff 13 Example: corpus lexicography - four ages

14 May 2007 Adam Kilgarriff 14 Age 1: Pre-computer Oxford English Dictionary: 20 million index cards

15 May 2007 Adam Kilgarriff 15 Age 2: KWIC Concordances From 1980 Computerised

16 May 2007 Adam Kilgarriff 16 Age 2: KWIC Concordance

17 May 2007 Adam Kilgarriff 17 Age 2: KWIC Concordances From 1980 Computerised COBUILD project was innovator the coloured-pens method

18 May 2007 Adam Kilgarriff 18 1 political association 4 person in an agreement/dispute 2 social event 5 to be party to something... 3 group of people The coloured pens method

19 May 2007 Adam Kilgarriff 19 Age 2: limitations as corpora get bigger: too much data 50 lines for a word: read all 500 lines: could read all, takes a long time 5000 lines: no

20 May 2007 Adam Kilgarriff 20 Age 3: Collocation statistics Problem: too much data - how to summarise? Solution: list of words occurring in neighbourhood of headword, with frequencies Sorted by salience

21 May 2007 Adam Kilgarriff 21 Collocation listing For collocates of save (>5 hits), window 1-5 words to right of nodeword word forestslife $1.2dollars livescosts enormousthousands annuallyface jobsestimated moneyyour

22 May 2007 Adam Kilgarriff 22 Age 4: The word sketch A corpus-derived one-page summary of a word’s grammatical and collocational behaviour

23 May 2007 Adam Kilgarriff 23 Age 4: The word sketch Large well-balanced corpus Parse to find subjects, objects, heads, modifiers etc One list for each grammatical relation Statistics to sort each list, as before

24 May 2007 Adam Kilgarriff 24 Macmillan English Dictionary For Advanced Learners Ed: Rundell, 2002

25 May 2007 Adam Kilgarriff 25 Euralex 2002

26 May 2007 Adam Kilgarriff 26 Euralex 2002 Can I have them for my language please

27 May 2007 Adam Kilgarriff 27 The Sketch Engine Input: any corpus, any language Lemmatised, part-of-speech tagged specification of grammatical relations Word sketches integrated with corpus query system Developer: Pavel Rychly, Brno

28 May 2007 Adam Kilgarriff 28 Users: Dictionary publishers Oxford UP, Collins, Chambers, Macmillan Universities Teaching, research Framenet Language teaching http://www.sketchengine.co.uk/ Self-registration for free trial account

29 May 2007 Adam Kilgarriff 29 Collocationality Which words are most ‘collocational’ Dictionary publishers Where to put ‘collocation boxes’ Language learners

30 May 2007 Adam Kilgarriff 30 VerbFreqMLE Prob x log = entropy Take2084-.469 Gain131-.169 Offer117-.157 See110-.150 Enjoy67-.104 ……… Clarify1-0.0031 ……… Total3730-3.909 Calculation of entropy for advantage (object relation).

31 May 2007 Adam Kilgarriff 31

32 May 2007 Adam Kilgarriff 32 place (17881), attention (8476), door (8426), care (4884), step (4277), advantage (3730), rise (3334), attempt (2825), impression (2596), notice (2462), chapter (2318), mistake (2205), breath (2140), hold (1949), birth (1016), living (953), indication (812), tribute (720), debut (714), button (661), eyebrow (649), anniversary (637), mention (615), glimpse (531), suicide (486), toll (472), refuge (470), spokesman (453), sigh (436), birthday (429), wicket (412), appendix (410), pardon (399), precaution (396), temptation (374), goodbye (372), fuss (366), resemblance (350), goodness (288), precedence (285), havoc (270), tennis (266), comeback (260), farewell (228), prominence (228), go-ahead (202), sip (198),

33 May 2007 Adam Kilgarriff 33 Thesaurus a resource that groups words according to similarity

34 May 2007 Adam Kilgarriff 34 Manual and automatic Manual Roget, WordNets, many publishers Automatic Sparck Jones (1960s), Lin (1998) aka distributional two words are similar if they occur in same contexts Are they comparable?

35 May 2007 Adam Kilgarriff 35 Thesauruses in NLP sparse data

36 May 2007 Adam Kilgarriff 36 Thesauruses in NLP sparse data does x go with y? don’t know, they have never been seen together New question: does x+friends go with y+friends indirect evidence for x and y thesaurus tells us who friends are “backing off”

37 May 2007 Adam Kilgarriff 37 Relevant in: Parsing PP-attachment conjunction scope Bridging anaphors Text cohesion Word sense disambiguation (WSD) Speech understanding Spelling correction

38 May 2007 Adam Kilgarriff 38 Conjunction scope Compare old boots and shoes old boots and apples

39 May 2007 Adam Kilgarriff 39 Conjunction scope Compare old boots and shoes old boots and apples Are the shoes old?

40 May 2007 Adam Kilgarriff 40 Conjunction scope Compare old boots and shoes old boots and apples Are the shoes old? Are the apples old?

41 May 2007 Adam Kilgarriff 41 Conjunction scope Compare old boots and shoes old boots and apples Are the shoes old? Are the apples old? Hypothesis: wide scope only when words are similar

42 May 2007 Adam Kilgarriff 42 (demo)

43 May 2007 Adam Kilgarriff 43 Words and word senses automatic thesauruses words

44 May 2007 Adam Kilgarriff 44 Words and word senses automatic thesauruses words manual thesauruses simple hierarchy is appealing homonyms

45 May 2007 Adam Kilgarriff 45 Words and word senses automatic thesauruses words manual thesauruses simple hierarchy is appealing homonyms “aha! objects must be word senses”

46 May 2007 Adam Kilgarriff 46 Problems Theoretical Practical

47 May 2007 Adam Kilgarriff 47 Theoretical

48 May 2007 Adam Kilgarriff 48

49 May 2007 Adam Kilgarriff 49

50 May 2007 Adam Kilgarriff 50 Wittgenstein Don’t ask for the meaning, ask for the use

51 May 2007 Adam Kilgarriff 51 Practical

52 May 2007 Adam Kilgarriff 52 Problems Practical a thesaurus is a tool if the tool organises words senses you must do WSD before you can use it WSD: state of the art, optimal conditions: 80%

53 May 2007 Adam Kilgarriff 53 Problems “To use this tool, first replace one fifth of your input with junk”

54 May 2007 Adam Kilgarriff 54 Avoid word senses

55 May 2007 Adam Kilgarriff 55 Avoid word senses This word has three meanings/senses

56 May 2007 Adam Kilgarriff 56 Avoid word senses This word has three meanings/senses This word has three kinds of use well founded empirical we can build on it

57 May 2007 Adam Kilgarriff 57 Avoid word senses This word has three meanings/senses This word has three kinds of use well founded empirical we can build on it

58 May 2007 Adam Kilgarriff 58 sorry, roget

59 May 2007 Adam Kilgarriff 59 sorry, AI

60 May 2007 Adam Kilgarriff 60 sorry, AI AI model for NLP: NLP turns text into meanings AI reasons over meanings word meanings are concepts in an ontology a Roget-like thesaurus is (to a good approximation) an ontology Guarino: “cleansing” WordNet If a thesaurus groups words in their various uses (not meanings) not the sort of thing AI can reason over

61 May 2007 Adam Kilgarriff 61 “The Future of Search” Berkeley, 4 May 2007 NLP panel Pell, Hodjat, Russell The AI dream for 50 years Interpret Search on meanings Implicit: word senses

62 May 2007 Adam Kilgarriff 62 Quine “No entity without identity” Foundations for sound science Identity conditions for word senses? No word uses? Yes

63 May 2007 Adam Kilgarriff 63 “linguistics expressions prompt for meanings rather than express meanings” Fauconnier and Turner 2003 It would be nice if … But …

64 May 2007 Adam Kilgarriff 64 A virtuous circle Pattern finding Lexicon Linguistic processing Corpus Part-of-speech tagging Parsing Lemmatizing More data → gets richer each time round Word sketching Thesaurus

65 May 2007 Adam Kilgarriff 65 The long journey from text towards meaning Raw text Pure meaning Rationalists Empiricists

66 May 2007 Adam Kilgarriff 66 The long journey from text towards meaning Raw text Pure meaning Rationalists Empiricists lemmatizer POS-tagger parser thesaurus thematic relations/frame elements

67 May 2007 Adam Kilgarriff 67 Thank you http://www.sketchengine.co.uk


Download ppt "1 The Long Road from Text to Meaning Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex."

Similar presentations


Ads by Google