1
USP workshop
Using the Corpógrafo
Belinda Maia & Luís Sarmento
PoloFLUP LINGUATECA
2
First steps
Get a username and password
You will receive one automatically
5
Working with the Corpógrafo
Corpógrafo is a suite of integrated tools for INDIVIDUAL or GROUP research
All research is done ONLINE
Each username/password = a separate space on our server
At present, anyone can work with it using 10 MB of space for FREE
BUT you get an empty space + tools + tutorial!
6
Help Files
Introdução à utilização do Corpógrafo - um pequeno tutorial (Introduction to using the Corpógrafo: a short tutorial)
 –A tutorial, to be translated into English, describing the whole process of terminology research using the Corpógrafo. Available in PDF.
Corpógrafo Roadmap
 –In English and Portuguese: a panoramic view of the Corpógrafo and how it works. Available in PDF.
The Corpógrafo in Easy Stages
 –In English and Portuguese: user’s guide to the Corpógrafo and FAQ. Available in PDF.
Also note: on the entry page there is a glossary of terms and instructions, PT > EN
7
File Manager
Area where each individual or group can:
 –upload texts to space on server
 –convert various text formats to .txt
 –‘clean’ them of unnecessary material
 –check tokenization and sentence divisions
 –register full information on source, domain and text type
 –group, and re-group, texts into corpora
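The "check tokenization and sentence divisions" step can be pictured with a minimal Python sketch. This is not the Corpógrafo's own code: the function names and the splitting rules below are assumptions chosen for illustration, and a real splitter would also need to handle abbreviations, numbers and quotations.

```python
import re

def split_sentences(text):
    """Naive sentence division: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Words (accented letters and internal hyphens included) or single punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)

sample = "O Corpógrafo é uma ferramenta online. It supports term extraction!"
for sentence in split_sentences(sample):
    # Print one token list per sentence so the divisions can be checked by eye.
    print(tokenize(sentence))
```

Printing one token list per sentence mirrors the manual check on the slide: the user looks at the divisions and corrects the text where they are wrong.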
8
File Manager
1. Files
 > List Files on Server
 > Add Files
 > Add Files from URL (Experimental!)
2. Corpora
 > List Corpora
 > Compile New Corpus
11
EXTEX
Tool for converting file formats to .txt, at: http://poloclup.linguateca.pt/ferramentas
17
General corpus analysis
Corpora analysis area:
Concordancing tools for regular expressions
 –at sentence level
 –KWIC concordancing
 –Collocations
N-gram tool
 –Case-sensitive
 –Alphabetical or frequency ordering
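As a rough illustration of what these tools compute, here is a short Python sketch of KWIC concordancing over a regular expression and of n-gram counting with optional case sensitivity and alphabetical or frequency ordering. The function names, parameters and defaults are hypothetical and do not describe the Corpógrafo's implementation.

```python
import re
from collections import Counter

def kwic(text, pattern, width=20, case_sensitive=False):
    """Return key-word-in-context lines for every regex match in text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    lines = []
    for m in re.finditer(pattern, text, flags):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both contexts so the matched key word lines up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

def ngrams(tokens, n, case_sensitive=True):
    """Count n-grams (as tuples of tokens) in a token list."""
    if not case_sensitive:
        tokens = [t.lower() for t in tokens]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ordered(counts, by="frequency"):
    """Order n-gram counts alphabetically or by descending frequency."""
    if by == "alphabetical":
        return sorted(counts.items())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

text = "The corpus is small. A corpus of texts. Corpora are useful."
for line in kwic(text, r"\bcorp(?:us|ora)\b"):
    print(line)

tokens = "the term base the term list".split()
for gram, freq in ordered(ngrams(tokens, 2)):
    print(" ".join(gram), freq)
```

Collocation statistics are left out of the sketch; they would typically be derived from the same n-gram counts.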
22
Corpora + TDB
Choose a corpus
Choose the related TDB (terminology database)
 = All terms, examples and definitions extracted from the corpus are (semi-)automatically transferred to the TDB
 = All metadata on the texts in the corpus can be automatically transferred to the TDB
23
Term extraction
N-grams
 –Unfiltered
 –Filtered with restrictions on term in PT, EN, FR, IT, ES, DE
 –Filtered with restrictions on term and context in PT, EN, FR, IT, ES, DE
 –Singular + plural terms can be combined
 –Existing terms in TDB need not appear
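The slides do not say what the "restrictions on term" are. One common kind of restriction, assumed here purely for illustration, is that a candidate n-gram may not begin or end with a function word. The Python sketch below applies that assumed rule and also leaves out terms already in the TDB; the stopword list, the function name and its parameters are all hypothetical, and the combination of singular and plural forms is not covered.

```python
from collections import Counter

# Hypothetical, deliberately tiny English stopword list; a real one would be
# much longer and would exist separately for PT, EN, FR, IT, ES and DE.
STOPWORDS_EN = {"the", "of", "a", "an", "in", "and", "is", "to"}

def candidate_terms(tokens, n, stopwords, known_terms=frozenset()):
    """Count n-grams that neither start nor end with a stopword.

    known_terms stands in for terms already stored in the TDB,
    which are left out of the candidate list.
    """
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(t.lower() for t in tokens[i:i + n])
        if gram[0] in stopwords or gram[-1] in stopwords:
            continue  # assumed restriction on the term itself
        term = " ".join(gram)
        if term in known_terms:
            continue  # existing TDB terms need not appear
        counts[term] += 1
    return counts

tokens = "the term extraction of the term extraction tool".split()
print(candidate_terms(tokens, 2, STOPWORDS_EN))
print(candidate_terms(tokens, 2, STOPWORDS_EN, known_terms={"term extraction"}))
```

Restrictions on the context would work the same way, with the test applied to the tokens on either side of the n-gram instead of to its first and last token.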
25
Term selection from n-grams
Consultation of list of n-grams
Check term status of each n-gram via underlying concordances
Check sources
Send to TDB
29
Search for definition candidates
Already possible via TDB
Under development
Research area for Master's dissertations and research grant holders
30
TDB - Terminology database
Databases are designed to be multilingual
 –Terms listed alphabetically + language tag
 –General data
 –Morphological data
 –Source metadata: authors, texts, etc.
 –Definitions + search for candidates
 –Translation equivalents
 –Semantic relations
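To make the shape of a TDB entry concrete, the fields listed above can be sketched as a small Python data structure. This is only an illustration: the class names, field names and types are assumptions, not the Corpógrafo's actual database schema.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    """Source metadata for one text (hypothetical fields)."""
    author: str
    text_title: str

@dataclass
class TermEntry:
    """One term in a multilingual terminology database (hypothetical layout)."""
    term: str
    language: str                                      # language tag, e.g. "PT" or "EN"
    part_of_speech: str = ""                           # morphological data
    sources: list = field(default_factory=list)        # Source objects
    definitions: list = field(default_factory=list)    # definition strings
    equivalents: dict = field(default_factory=dict)    # language tag -> list of terms
    relations: list = field(default_factory=list)      # (relation name, other term) pairs

def listing(entries):
    """Terms listed alphabetically, each followed by its language tag."""
    ordered = sorted(entries, key=lambda e: (e.term.lower(), e.language))
    return [f"{e.term} [{e.language}]" for e in ordered]

entries = [
    TermEntry("corpus", "EN", equivalents={"PT": ["corpus"]}),
    TermEntry("base de dados", "PT", equivalents={"EN": ["database"]}),
]
print(listing(entries))
```

Keeping translation equivalents keyed by language tag is one simple way to let a single entry point to terms in several languages.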
43
Future developments – general policy
General testing and improvement
Development of new ideas or functions, matching researchers’ needs to what we are able to offer
Coordination of individual corpus projects into bigger projects, when possible or necessary