Using Corpora in Language Research. Adam Kilgarriff, Lexical Computing Ltd, Universities of Leeds. January 2013.

1 Using Corpora in Language Research. Adam Kilgarriff, Lexical Computing Ltd, Universities of Leeds. January 2013.

2 May 2011 Adam Kilgarriff What is language?

3 What is language? In our heads.

4 What is language? In our heads. In texts and sound signals.

5 What is language? In our heads. In texts and sound signals. Both.

6 Methodology: study language in our heads. Competence. Chomsky. “Rationalist” (Descartes, Leibniz).

7 Methodology: study language in our heads. Competence. Chomsky. “Rationalist” (Descartes, Leibniz). Odd method for objective science. Practical problems: coverage, arbitrariness.

8 Methodology: study text. “Empiricist” (Locke, Hume). Physics: forces, matter. Chemistry: chemicals, bonds. Language: text, speech signals.

9 It goes against the grain. What is important about a sentence? Its meaning. Corpus methodology: throw away individual sentence meaning; find patterns.

10 Computer power. Corpora: bigger and bigger data sets. Language technology tools: lemmatizers, POS-taggers, parsers. Machine learning, pattern-finding. 20 years of rapid ascent.

11 All the linguisticses: Theoretical, Socio-, Psycho-, Developmental, Law and, Computational, Contrastive, Applied ... linguistics.

12 Developmental: CHILDES, TalkBank. How children learn language. Parents record all interactions. Since 1980s. Prof. Brian MacWhinney, Carnegie Mellon. Many languages. Largest chunk: English, 23m words.

20 Language change: Brown family. Small but perfectly formed. 1m words: 500 x 2000-word samples, the same 15 text types. Supports comparison. American and British English. 1931, 1961, 1991, 2006.

26 Language and gender. When you see a dentist... What is now normal? Recent study: "they" now the norm; "themself" now needed, despite what spellcheck says. BNC (most text from 1989): 0.2/million. EnTenTen (mostly 2009): 0.4/million.
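
The per-million figures on this slide are normalized frequencies, which is what makes corpora of very different sizes comparable. A minimal sketch of that normalization follows; the function name `per_million` and the hit counts and corpus sizes are hypothetical, chosen only so the arithmetic lands on 0.2 and 0.4, and are not the real BNC or enTenTen numbers.

```python
def per_million(hits: int, corpus_size: int) -> float:
    """Normalize a raw hit count to occurrences per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 1_000_000 / corpus_size


# Made-up counts for illustration only (NOT the real corpus figures).
older = per_million(hits=20, corpus_size=100_000_000)
newer = per_million(hits=4_000, corpus_size=10_000_000_000)

print(f"older corpus: {older:.1f}/million")
print(f"newer corpus: {newer:.1f}/million")
print(f"relative change: x{newer / older:.1f}")
```

Raw counts alone would suggest a 200-fold increase here; only the normalized rates show that the form has roughly doubled in relative frequency.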

27 Language and law: trade marks. Hoover and similar: trademark or generic? Cases: sabatier, botox, kettle chips. Key evidence: do people tend to capitalize?

28 English nouns: % capitalized
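
One way such a "% capitalized" figure could be computed is sketched below. This is an assumption about the method, not the procedure used for the chart: the function name `capitalized_share` is hypothetical, the sample text is invented, and sentence-initial occurrences are skipped with a crude punctuation check because capitalization there says nothing about trademark status.

```python
import re


def capitalized_share(text: str, word: str) -> float:
    """Fraction of non-sentence-initial occurrences of `word` that start
    with a capital letter. Returns 0.0 if there are no such occurrences."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    total = 0
    capitalized = 0
    for match in pattern.finditer(text):
        before = text[:match.start()].rstrip()
        # Crude sentence-boundary test: start of text or after . ! ?
        if not before or before[-1] in ".!?":
            continue
        total += 1
        if match.group()[0].isupper():
            capitalized += 1
    return capitalized / total if total else 0.0


# Invented sample text, for illustration only.
sample = (
    "I bought a Hoover. Then I had to hoover the stairs. "
    "Hoover sales rose, but my hoover broke. She prefers her Hoover."
)
print(f"{capitalized_share(sample, 'hoover'):.0%} capitalized")
```

On a real corpus the same count would normally be restricted to noun uses via POS tags, which this sketch does not attempt.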

29 Syntax and semantics

32 DANTE: detailed account of English lexis. Corpus-driven. From word sketches. Lexicographers assign to senses. High precision. Available at http://webdante.com

33 What data shall I use?

34 Think hard

35 Sometimes... a just-in-time corpus from the web. Use case: translator, French-to-English. Translation task: volcanoes. "In French I understand it OK, but I'm no vulcanologist, I don't know the English terminology." BootCaT (Baroni and Bernardini).
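
The BootCaT approach starts from a handful of domain seed terms, combines them into small tuples, and uses each tuple as a web search query whose result pages become the corpus. The sketch below covers only the tuple-building step and is not BootCaT's own code: the function name `seed_tuples`, its parameters, and the seed list are all hypothetical, and the search and download steps are left out.

```python
import itertools
import random


def seed_tuples(seeds, tuple_size=3, n_queries=5, rng_seed=0):
    """Build up to `n_queries` query strings, each joining `tuple_size`
    distinct seed terms, sampled without repetition."""
    unique_seeds = sorted(set(seeds))
    combos = list(itertools.combinations(unique_seeds, tuple_size))
    rng = random.Random(rng_seed)  # fixed seed so runs are repeatable
    rng.shuffle(combos)
    return [" ".join(combo) for combo in combos[:n_queries]]


# Hypothetical English seed terms for the volcano translation task.
seeds = ["volcano", "lava", "magma", "eruption", "crater"]
for query in seed_tuples(seeds):
    print(query)
```

Using several terms per query is what keeps the retrieved pages on topic: a page that mentions three of the seeds together is far more likely to be about volcanoes than one that mentions only "crater".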

44 Corpora in Sketch Engine. Access-to-all: 60 languages; all major world languages; mostly large, web-crawled; various other (CHILDES, Brown, ...). “My corpora”: BootCaT and other.

45 Thank you. http://www.sketchengine.co.uk

