Labels: automation Adam Kilgarriff. Auckland 2012Kilgarriff / Labels: automation2 Which words are:  Most distinctive of business English?  Most often.

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING A comparative study of the tagging of adverbs in modern English corpora.
Advertisements

Terminology-finding in the Sketch Engine Miloš Jakubíček, Adam Kilgarriff, Vojtěch Kovář, Pavel Rychlý, Vit Suchomel Lexical Computing Ltd., Brighton,
Finding multiwords of more than two words Adam Kilgarriff, Pavel Rychly, Vojtech Kovar, Vıt Baisa Lexical Computing Ltd; Masaryk Univ., Cz.
Corpus Processing and NLP
Using Corpus Tools in Discourse Analysis Discourse and Pragmatics Week 12.
1 Corpora for all Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Evaluating the Waspbench A Lexicography Tool Incorporating Word Sense Disambiguation Rob Koeling, Adam Kilgarriff, David Tugwell, Roger Evans ITRI, University.
Sorting CMSC 201. Sorting In computer science, there is often more than one way to do something. Sorting is a good example of this!
Measuring Distance between Language Varieties Adam Kilgarriff, Jan Pomikalek, Pavel Rychly, Vit Suchomel Supported by EU Project PRESEMT.
Linking Dictionary and Corpus Adam Kilgarriff Lexicography MasterClass Ltd Lexical Computing Ltd University of Sussex UK.
Corpus Creation for Lexicography Adam Kilgarriff, Michael Rundell Lexicography MasterClass, UK Elaine Ui Dhonnchadha ITE (Linguistics Institute of Ireland)
Bag-of-Words Methods for Text Mining CSCI-GA.2590 – Lecture 2A
Using Corpora for Teaching Chinese Dr. Adam Kilgarriff Lexical Computing Ltd Leeds University UK.
The Sketch Engine -What is The Sketch Engine? -What is a corpus? -Looking at the BASE and the BAWE corpora. -How can this help.
Labels: automation Adam Kilgarriff. Kivik 2013Kilgarriff / Labels: automation2 Which words are:  Most distinctive of business English? Keywords, already.
Making useful wordlists for ELT Topical vocabulary from the WWW Simon Smith & Scott Sommers Ming Chuan University, Taipei Adam Kilgarriff, Lexical Computing.
Talking about your homework News story? –What made you choose…? One of your words? –What made you choose…? (Give your vocabulary books to another student.
1 Corpora for the coming decade Adam Kilgarriff Lexical Computing Ltd.
Pedagogic uses of a corpus of student writing and their implications for sampling and annotation Alois Heuboeck University of Reading, UK.
Today Writing: using the comma –Writing task Corpus linguistics talk, Part 2 Re-organize groups –Group news discussion.
Corpus Linguistics Lexicography. Questions for lexicography in corpus linguistics How common are different words? How common are the different senese.
The Parts of the Sentence.  Every complete sentence must have at least one subject and one verb.  Although it is not necessary to have one in a sentence,
What's on the Web? The Web as a Linguistic Corpus Adam Kilgarriff Lexical Computing Ltd University of Leeds.
Simple Maths for Keywords Adam Kilgarriff Lexical Computing Ltd.
Albert Gatt LIN 3098 Corpus Linguistics. In this lecture Some more on corpora and grammar Construction Grammar as a theoretical framework Collostructional.
1 Evaluating word sketches Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Tomaž Erjavec 1, Adam Kilgarriff 2, Irena Srdanović Erjavec 3 1 Jožef Stefan Institute, Slovenia 2 Lexical Computing Ltd. and University of Leeds, UK 3.
Using Corpora for Teaching Chinese Dr. Adam Kilgarriff Lexical Computing Ltd Leeds University UK.
1 Linguistic evidence within and across languages, word frequency lists and language learning Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass.
1 Corpora, Language Technology and Maltese Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd University of Sussex.
Index Spiral Items # B Topics This PowerPoint will let you know the information that I expect to see on each card. REQUIRED: Each card should include.
Terminology, translation, and PRESEMT; word frequency lists and KELLY 1 Adam Kilgarriff Lexical Computing Ltd SKEW-2, March 2011Kilgarriff: PRESEMT and.
GDEX: Automatically finding good dictionary examples in a corpus Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, Pavel Rychlý Lexical Computing.
1 Corpora, Dictionaries, and points in between in the age of the web Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of.
1 Corpora, Language Technology and Maltese Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd University of Sussex.
Why We Need Corpora and the Sketch Engine Adam Kilgarriff Lexical Computing Ltd, UK Universities of Leeds and Sussex.
Using a Lemmatizer to Support the Development and Validation of the Greek WordNet Harry Kornilakis 1, Maria Grigoriadou 1, Eleni Galiotou 1,2, Evangelos.
1 Using Corpora in Language Research -also Introduction to the Sketch Engine (WS15) part 1 Adam Kilgarriff Lexical Computing Ltd Universities of Leeds.
Exploring Text: Zipf’s Law and Heaps’ Law. (a) (b) (a) Distribution of sorted word frequencies (Zipf’s law) (b) Distribution of size of the vocabulary.
1 Evaluating word sketches and corpora Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Corpus Evaluation Adam Kilgarriff Lexical Computing Ltd Corpus evaluationPortsmouth Nov
Using Corpora in Language Research Adam Kilgarriff Lexical Computing Ltd Universities of Leeds January 2013Adam Kilgarriff.
Malta, May 2010Kilgarriff: Corpora by Web Services1 Corpora by Web Services Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities.
Terminology-finding in the Sketch Engine Miloš Jakubíček, Adam Kilgarriff, Vojtěch Kovář, Pavel Rychlý, Vit Suchomel Lexical Computing Ltd., Brighton,
CL 2005, Birmingham Web as Corpus Workshop Intro: Adam Kilgarriff 1 Web as Corpus Workshop Co-chairs: Marco Baroni Adam Kilgarriff Sebastian Hoffman.
Interface Guidelines & Principles Consider Function First Presentation Later.
Subcorpus configuration Adam Kilgarriff. Feb 2010Kilgarriff: IWSG: Subcorpora2 “you can’t get away from genre” Bonnie Weber, Keynote Lecture ICON (Indian.
SQL Select Statement IST359.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Corpus Linguistics MOHAMMAD ALIPOUR ISLAMIC AZAD UNIVERSITY, AHVAZ BRANCH.
GDEX: Automatically finding good dictionary examples in a corpus Auckland 2012Kilgarriff: GDEX1.
Exploring Variation in Lexis and Genre in the Sketch Engine Adam Kilgarriff Lexical Computing Ltd., UK Supported by EU Project PRESEMT.
How to copy and paste. BY Zachary Hamer. Step One First find the document or writing passage you want to copy and paste. Then RIGHT click on the item.
1 SQL Chapter 9 – 8 th edition With help from Chapter 2 – 10 th edition.
Requirements for Specification of workflows in CLARIN Scenario design driven.
GDEX: Automatically finding good dictionary examples in a corpus Kivik 2013Kilgarriff: GDEX1.
Making trouble-free corpus tasks in 10 minutes Jennie Wright.
Use of Concordancers A corpus (plural corpora) – a large collection of texts, written or spoken, stored on a computer. A concordancer – a computer programme.
GDEX: Automatically finding good dictionary examples in a corpus.
Using the Correct Order Template Copy the presentation to your hard drive. Open the slides using slide sorter and copy slides #3, 4, and 5 for each question.
Definite and indefinite articles
Lecture 7 Summary Survey of English morphology
Making useful wordlists for ELT
Evaluating word sketches and corpora
Verbs.
Tomaž Erjavec1, Adam Kilgarriff2, Irena Srdanović Erjavec3
  30 A 30 B 30 C 30 D 30 E 77 TOTALS ORIGINAL COUNT CURRENT COUNT
Corpora, Language Technology and Maltese
Using Ss in English Rules.
(Type Answer Here) (Type Answer Here) (Type Answer Here)
Presentation transcript:

Labels: automation Adam Kilgarriff

Auckland 2012Kilgarriff / Labels: automation2 Which words are:  Most distinctive of business English?  Most often in plural? For eg English nouns  Most often used in gerund? For eg Spanish verbs

Auckland 2012Kilgarriff / Labels: automation3 Common issue for lexicographers  Ordinary cases No need to say anything in dictionary  Extreme cases (“most X”) Needs saying

Auckland 2012Kilgarriff / Labels: automation4 Not hard in principle  Given the right corpus For each word  Count, under condition 1 Eg plural instances  Count, under condition 2 Eg all instances  Compute ratio Sort all words according to ratio Words at top of list are most X

Auckland 2012Kilgarriff / Labels: automation5 In practice  Programming task  Big corpora: big and slow  Slightly different each time  Very rarely done (except Keywords in WordSmith)  Now: automated in Sketch Engine demo

Auckland 2012Kilgarriff / Labels: automation6 FindX specification file: type 1 =plural name Q1 [lempos="%s" & tag="NN2"] Q2 [lempos="%s" & tag="NN[10]"] The two queries to compare frequencies for lempos="%s" means the list we want is a list of lempos (lemma + pos) RE ^[a-z]+-n$ items matching this RegExp only (here: all-lower-case nouns)

Auckland 2012Kilgarriff / Labels: automation7 FindX specification file: type 2 =passive name HR passives human-readable name (optional) WS passive use the word sketch relation ‘passive’ RE -v$ only for items matching this RegExp (here: only verbs) (optional)