Some more corpora 13.10.2016.


Bulgarian
Bulgarian National Corpus, BulPosCor, BulSemCor: http://search.dcl.bas.bg/
Bulgarian Treebank (available after registration)
  Dependency: http://www.bultreebank.org/dpbtb/
  Morphologically tagged, HPSG: http://www.bultreebank.org/btbmorf/
On-line: Bulgarian National Reference Corpus + BulTreebank: http://www.webclark.org/

Ukrainian
http://unlc.icybcluster.org.ua/virt_unlc/ - (2005-2010), 100 000 tokens, different styles and registers, lemmatized, no morphosyntactic information
http://www.mova.info/corpus.aspx?l1=209 - 13 000, 4 subcorpora, poetry, literature, folklore, journalism, official texts. Morphosyntactically annotated. No KWIC, no statistics, no filtering
http://corpora.informatik.uni-leipzig.de/en?corpusId=ukr_mixed_2014
http://www.corpora.heliohost.org/download.html
http://corpus.leeds.ac.uk/internet2.html
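Where a corpus interface offers no KWIC view or statistics, both can be approximated offline on downloaded text. The sketch below is only an illustration: the sample sentence is invented, and the names tokenize and kwic are hypothetical helpers, not part of any of the corpora or tools listed here.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (\\w is Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text.lower())

def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample text, used only to demonstrate the functions.
sample = "Мова є засобом спілкування. Українська мова має багату історію."
tokens = tokenize(sample)

# Simple frequency statistics over the tokens.
freq = Counter(tokens)

# Concordance lines for one keyword, two tokens of context on each side.
lines = kwic(tokens, "мова", window=2)
for left, kw, right in lines:
    print(f"{left:>25} [{kw}] {right}")
```

This matches exact word forms only; on a lemmatized corpus the same loop could compare lemmas instead of surface tokens.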

Serbian
Tuebinger BKS-Korpus - Bosnisch/Kroatisch/Serbisches Korpus (Bosnian/Croatian/Serbian corpus) - TUSNELDA electronic version: http://tusnelda.sfb.uni-tuebingen.de/tusnelda-query.html#b8
http://parcolab.univ-tlse2.fr/en/about/resources/
http://korpus.matf.bg.ac.rs/prezentacija/paralelni.html
https://catalog.ldc.upenn.edu/LDC94T5

Montenegro
http://www.eiprevod.gov.me/korpus/