A new web-based corpus management and analysis platform

http://clarino.uib.no/corpuscle
Paul Meurer, Uni Research Computing, Bergen, Norway, paul.meurer@uni.no

Analysis functionality
- Powerful query syntax, with queries written as text or built with a graphical tool
- Wordlists, concordances and collocations
- Manual corpus annotation, which is itself searchable
- Distribution statistics, showing the frequencies of query results relative to chosen parameters, including annotations

Corpus management
- Support for hierarchically structured data, parallel corpora and multimedia data (audio, video and images aligned with text)
- Handling of large corpora (more than 2 billion tokens)
- CLARIN integration: harvestable metadata and persistent identifiers, so that corpora are sustained over time and easy to find
- Federated login (eduGAIN, OpenIdP) with fine-grained authorization

[Screenshots: Query; Corpus metadata and overview; Basic search; Advanced search; Designing an advanced query with the graphical tool; Concordance; Distribution; Collocations; Annotation]

ICAME has been distributing corpora since 1979, and it will now continue to do so through Corpuscle: ICAME resources are being made available both for online analysis and for download. The further development of Corpuscle and the ongoing integration of ICAME resources have been made possible through CLARINO, the Norwegian part of CLARIN (Common Language Resources and Technology Infrastructure).
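The poster lists concordances and distribution statistics without showing what they compute. The following is a minimal sketch, not Corpuscle's implementation or query language. It assumes a toy corpus of tokenized texts that each carry one metadata attribute (the names `Text`, `genre`, `concordance` and `distribution` are hypothetical). A case-insensitive exact word match stands in for a real query. The code produces keyword-in-context (KWIC) concordance lines and the distribution of hits over the metadata values. Normalising per million tokens is a common convention and is assumed here as the meaning of "relative frequency".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical corpus text: a token list plus one metadata attribute."""
    tokens: list[str]
    genre: str


def concordance(texts, query, width=3):
    """Return KWIC lines (left context, hit, right context) for every
    case-insensitive exact match of `query`; a stand-in for a query language."""
    q = query.lower()
    lines = []
    for text in texts:
        for i, tok in enumerate(text.tokens):
            if tok.lower() == q:
                left = " ".join(text.tokens[max(0, i - width):i])
                right = " ".join(text.tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


def distribution(texts, query):
    """Map each metadata value to (raw hit count, hits per million tokens
    of the texts carrying that value)."""
    q = query.lower()
    hits = Counter()
    sizes = Counter()
    for text in texts:
        sizes[text.genre] += len(text.tokens)
        hits[text.genre] += sum(tok.lower() == q for tok in text.tokens)
    return {
        genre: (hits[genre],
                hits[genre] / sizes[genre] * 1_000_000 if sizes[genre] else 0.0)
        for genre in sizes
    }


if __name__ == "__main__":
    corpus = [
        Text("the corpus is large and the corpus is tagged".split(), "academic"),
        Text("we read the news today".split(), "news"),
    ]
    for left, hit, right in concordance(corpus, "corpus"):
        print(f"{left:>20} [{hit}] {right}")
    print(distribution(corpus, "corpus"))
```

Computing the raw count together with the normalised rate matters because subcorpora defined by a metadata value usually differ in size. A raw count alone would then exaggerate hits from the larger subcorpora.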