NooJ international Conference, Komotini, May 2010 Portability of Armenian Corpus by Nooj Anaid Donabedian & Victoria Khurshudian Institut National des.


NooJ International Conference, Komotini, May 2010
Portability of Armenian Corpus by NooJ
Anaid Donabedian & Victoria Khurshudian
Institut National des Langues et Civilisations Orientales (INALCO), Paris

Armenian: preliminaries
• an Indo-European language
• right-branching
• of an accusative type
• typically with an SOV structure
• dominantly with an agglutinative morphology

Historical Armenia

Republic of Armenia

Periodization
• prealphabetical
• alphabetical (405 A.D. to the present):
  1. Old Armenian or Grabar (V-XI)
  2. Middle Armenian (XII-XVI)
  3. Modern Armenian (XVII to the present)
     Western (based on the Constantinople dialect): dialects…
     Eastern (based on the Ararat dialect): dialects…

Objective
Provide data compatibility and portability between NooJ and the Eastern Armenian National Corpus (EANC) platform.

What is the Eastern Armenian National Corpus?
Corpus Technologies
Michael Daniel, Victoria Khurshudian, Dmitri Levonian, Vladimir Plungian, Alexey Polyakov, Sergey Rubakov

[Diagram labels: Source texts, PARSER, Annotated texts, Annotation algorithm, Grammatical dictionary]
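The flow named on this slide (source texts go through a parser that consults a grammatical dictionary and produces annotated texts) can be illustrated with a toy lookup annotator. This is only a sketch under stated assumptions: the dictionary entries, the tag strings and the function name `annotate` are invented for illustration and do not reflect the actual EANC parser, its annotation algorithm or its dictionary format.

```python
import re

# Hypothetical grammatical dictionary: wordform -> list of (lexeme, tags).
# Entries and tag names are illustrative only, not EANC data.
GRAM_DICT = {
    "tun": [("tun", "N,nom,sg")],
    "tner": [("tun", "N,nom,pl")],
    "gnum": [("gnal", "V,ipfv")],
}

def annotate(text):
    """Tokenize the text and attach every dictionary analysis to each token.

    Tokens missing from the dictionary get an empty analysis list.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [(tok, GRAM_DICT.get(tok, [])) for tok in tokens]

annotated = annotate("Tner, tun")
for token, analyses in annotated:
    print(token, analyses)
```

Keeping a list of analyses per token leaves room for homonymous wordforms, which a real annotator would have to keep or disambiguate.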

EANC History
Moscow, Russia
• March 2006: project launch
• July 2007: 1st release
• May 2008: 2nd release
• March 2009: 3rd release

Eastern Armenian National Corpus (EANC) is:
• about 110 million tokens
• morphological and other markup
• English translations for frequent tokens
• coverage of Standard Eastern Armenian (SEA) from the mid-19th century to the present
• both written and oral discourse
• full-text view for over 100 Armenian classic titles
• open internet access

Written Discourse
• over 106 million tokens
• 510 authors ( )
• 1039 fiction texts (including 206 translated texts)
• 7858 press issues
• non-fiction (scientific and other) texts

Oral Discourse (3.5 million tokens)
• Spontaneous discourse
• Polylogues
• Task-oriented discourse
• TV show transcripts
• Movies
• …
☼ The EANC oral corpus has all been recorded and transcribed by the project.

EANC Functionality

Search Functionality
• Token queries
• Context queries
• Subcorpus selection

Search Functionality
Simple token queries:
• lexeme search
• wordform search
• gram search
• translation search
• lexeme + gram search

Search Functionality
Advanced options for token queries:
• case-sensitivity
• punctuation marks
• position in the sentence
• wildcard (*)
• logical functions (e.g. 'or': |)
• negated features
• grammatical/lexical homonymy inclusion/exclusion
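To make a few of these options concrete (wildcard, 'or' over features, negated features, case-sensitivity), here is a minimal sketch of how such a token query could be evaluated over annotated tokens. The token list, the tag names and the function `match` are hypothetical; this is not the EANC query engine or its syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical annotated tokens: (wordform, lexeme, set of grammatical tags).
TOKENS = [
    ("tun", "tun", {"N", "nom", "sg"}),
    ("tner", "tun", {"N", "nom", "pl"}),
    ("tan", "tun", {"N", "gen", "sg"}),
    ("gnum", "gnal", {"V", "ipfv"}),
]

def match(token, pattern="*", any_of=(), none_of=(), case_sensitive=False):
    """Return True if the token satisfies the query.

    pattern  -- wordform pattern, where * matches any run of characters
    any_of   -- 'or': at least one of these tags must be present
    none_of  -- negated features: none of these tags may be present
    """
    wordform, _lexeme, tags = token
    if not case_sensitive:
        wordform, pattern = wordform.lower(), pattern.lower()
    if not fnmatchcase(wordform, pattern):
        return False
    if any_of and not tags & set(any_of):
        return False
    return not (tags & set(none_of))

# Wordforms starting with "t" that are nominative OR genitive, but NOT plural.
hits = [t[0] for t in TOKENS if match(t, "T*", any_of=("nom", "gen"), none_of=("pl",))]
print(hits)
```

Representing the features of a token as a set keeps 'or' and negation down to simple set intersections.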

Search Functionality
Subcorpus selection by:
• time
• author(s) / title(s)
• genres
• types of texts (translated vs. original)
• superposition of any of the above
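Subcorpus selection by superposed criteria amounts to filtering texts on their metadata, with every active criterion required to hold. The following sketch assumes a hypothetical `Text` record and `select` function with made-up sample data; the real EANC metadata schema is not shown in these slides.

```python
from dataclasses import dataclass

@dataclass
class Text:
    # Hypothetical metadata fields, chosen to mirror the criteria listed above.
    title: str
    author: str
    year: int
    genre: str
    translated: bool

TEXTS = [
    Text("A", "Author 1", 1890, "fiction", False),
    Text("B", "Author 2", 1975, "press", False),
    Text("C", "Author 1", 2001, "fiction", True),
]

def select(texts, years=None, authors=None, genres=None, translated=None):
    """Return the texts that meet every criterion that is not None."""
    result = []
    for t in texts:
        if years is not None and not (years[0] <= t.year <= years[1]):
            continue
        if authors is not None and t.author not in authors:
            continue
        if genres is not None and t.genre not in genres:
            continue
        if translated is not None and t.translated != translated:
            continue
        result.append(t)
    return result

# Original (non-translated) fiction from 1850 to 1950.
subcorpus = select(TEXTS, years=(1850, 1950), genres={"fiction"}, translated=False)
print([t.title for t in subcorpus])
```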

Search Functionality
Display options:
• context expanding
• 'sort by' (time, lexeme, wordform etc.)
• Latin transliteration
• glossed display
• KWIC (key word in context)
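The KWIC display option lines up each hit with a fixed window of context on both sides. A minimal sketch of the idea, using a hypothetical `kwic` function and placeholder tokens rather than corpus data:

```python
def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

sample = "a b c d c e".split()
concordance = kwic(sample, "c", width=2)
for left, key, right in concordance:
    # Right-align the left context so the keywords form a column.
    print(f"{left:>10} [{key}] {right}")
```

Expanding the context then corresponds to calling the same function with a larger `width`.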

Transliterated samples:

Glossed samples:

KWIC samples:

Main Current Tasks:
• Make the NooJ-based Western Armenian morphological annotation compatible with the EANC grammatical dictionary structure
• Make the EANC and NooJ Western Armenian platforms mutually portable
• Achieve mutual full coverage of NooJ and EANC capacities (e.g. the syntactic annotation available in NooJ)
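Making two annotation schemes compatible typically involves an explicit tag-to-tag mapping that fails loudly on anything without an equivalent. The sketch below assumes a simplified "category+feature" tag string loosely modeled on NooJ-style notation and an invented comma-separated target format; the mapping table, both tag inventories and the function `convert` are hypothetical and are not taken from the NooJ Western Armenian module or from the EANC dictionary.

```python
# Hypothetical mapping from NooJ-style feature labels to EANC-style tags.
NOOJ_TO_EANC = {
    "N": "N",
    "V": "V",
    "Nom": "nom",
    "Gen": "gen",
    "s": "sg",
    "p": "pl",
}

def convert(nooj_tag):
    """Convert e.g. 'N+Nom+p' into 'N,nom,pl'; raise on unmapped features."""
    parts = nooj_tag.split("+")
    unknown = [p for p in parts if p not in NOOJ_TO_EANC]
    if unknown:
        # Surfacing gaps is the point: they show where the schemes diverge.
        raise KeyError(f"no EANC equivalent for: {unknown}")
    return ",".join(NOOJ_TO_EANC[p] for p in parts)

converted = convert("N+Nom+p")
print(converted)
```

Raising on unmapped features, instead of silently dropping them, gives a running list of the places where one platform's annotation has no counterpart in the other.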