Published by Luke Webb. Modified over 10 years ago.
1
Digital Italian: An overview of Italian corpora
2
A linguistic corpus: a body of texts / transcripts collected for linguistic purposes, computerized, representative of the variety studied, balanced, annotated.
3
Annotation
Linguistic annotation can be useful or restrictive.
Extra-linguistic annotation is useful for sociolinguistic research.
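As a minimal sketch of the two kinds of annotation named on this slide, the Python below attaches linguistic annotation (lemma, part of speech) to each token and extra-linguistic annotation (speaker metadata) to each utterance. The class names, field names, tag labels, and sample data are all hypothetical and do not reflect the annotation scheme of any corpus in this presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str   # the word as it appears in the text
    lemma: str  # linguistic annotation: dictionary form
    pos: str    # linguistic annotation: part of speech (hypothetical tag set)


@dataclass
class Utterance:
    tokens: list
    # Extra-linguistic annotation: facts about the speaker, not the language.
    speaker: dict = field(default_factory=dict)


# Invented sample data, for illustration only.
corpus = [
    Utterance(
        [Token("parlo", "parlare", "VERB"), Token("italiano", "italiano", "NOUN")],
        {"age": 34, "region": "Toscana"},
    ),
    Utterance(
        [Token("mangiamo", "mangiare", "VERB")],
        {"age": 61, "region": "Veneto"},
    ),
]


def lemmas_by_region(utterances, region):
    """Collect the lemmas used by speakers from one region."""
    return [
        token.lemma
        for utt in utterances
        if utt.speaker.get("region") == region
        for token in utt.tokens
    ]


tuscan_lemmas = lemmas_by_region(corpus, "Toscana")
```

A sociolinguistic query of this kind is only possible because the speaker metadata was recorded alongside the text.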
4
Italian corpora
General / Specialized
Written / Spoken
Diachronic / Synchronic
5
General corpora: Written Italian
Corpus e lessico di frequenza dell'italiano scritto (COLFIS)
Corpus di riferimento dell'italiano scritto / Corpus dinamico dell'italiano scritto (CORIS/CODIS)
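A "lessico di frequenza" is a frequency lexicon: a list of words with how often each occurs in the corpus. The sketch below shows the basic idea on an invented sentence. The tokenizer, the function name `frequency_list`, and the sample text are assumptions for illustration; this is not how COLFIS or any other corpus here was actually compiled, and a real lexicon would normally count lemmas rather than raw word forms.

```python
import re
from collections import Counter


def frequency_list(text):
    """Count lowercase word forms in a text (naive tokenizer: runs of letters)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


# Invented sample text, for illustration only.
sample = "La casa e il giardino. La casa grande."
freq = frequency_list(sample)

# The two most frequent word forms in the sample.
top_two = freq.most_common(2)
```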
6
COLFIS - structure
COLFIS (over three and a half million words)
Newspapers: Il Corriere della Sera, La Repubblica, La Stampa. Economy, news of local interest, society, crime news, internal / external affairs, science, show biz and sports.
Periodicals: Other, arts, science and technology, cars and boats, children and youngsters, home and hobby, women's magazines, photo love story, general information, society, radio and television, sport, travels and ecology.
Books: Other, arts, children, SF, detective and spy stories, hobby and travel, classics, modern narrative, romance, essays, natural and exact sciences, human and social sciences, theatre and poetry.
7
CORIS/CODIS – structure
CORIS / CODIS (one hundred million words)
Press: Newspaper, periodical, supplement. National, local / specialist, non-specialist / connotated, non-connotated.
Fiction: Novels, short stories. Italian, foreign, for adults, for children, crime, adventure, SF, women's literature.
Academic Prose: Human sciences, natural sciences, physics, experimental sciences. Books, reviews, scientific, popular history, philosophy, arts, literary criticism, law, economy, biology, etc.
Legal and Administrative Prose: Legal, bureaucratic, administrative. Books, reviews.
Miscellanea: Books on religion, travel, cookery, hobbies, etc.
Ephemera: Letters, leaflets, instructions. Private, public / Printed form, electronic form.
8
General corpora: Spoken Italian
Lessico di frequenza dell'italiano parlato (LIP) -> Banca dati dell'italiano parlato (BADIP)
Archivio delle varietà dell'italiano parlato (AVIP)
LABLITA
9
Spoken and written Italian: Corpora e lessici dell'italiano parlato e scritto (CLIPS)
CLIPS (the spoken corpus)
Radio and television speech: Entertainment, informative transmissions, cultural and educational transmissions, commercials.
Field recordings: Map task dialogues and spot the difference game.
Readings: Readings by the speakers themselves or by professional dubbing actors.
Telephone speech: Conversations between a fake tour-operator and three hundred people.
10
Specialized corpora
Corpus di italiano televisivo (CIT)
La Repubblica
11
CIT – structure
CIT
Current affairs / Entertainment (games, talk-show, varieties) / Commercials / Sports news / Newscast
Commentaries. Play-by-play. / Studio broadcast. On-field broadcast. / Text / Text. Slogans. / Studio broadcast. On-field broadcast. / Text / Headlines. Studio broadcast. On-field broadcast.
12
Corpus di italiano televisivo
13
La Repubblica – structure
La Repubblica
Year: 1985 - 2000
Genre: News, Comment
Topic: Religion, Culture, Economics, Education, News, Politics, Science, Society, Sport, Weather, Unclassified
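The year, genre, and topic categories on this slide can be read as metadata fields used to cut a subcorpus out of the whole. The sketch below assumes a hypothetical `Article` record and hypothetical `is_valid` and `select` helpers; only the category values and the 1985 - 2000 range come from the slide, and the sample articles are invented. It does not describe the actual query interface of the La Repubblica corpus.

```python
from typing import NamedTuple

# Category values as listed on the slide.
GENRES = {"News", "Comment"}
TOPICS = {
    "Religion", "Culture", "Economics", "Education", "News", "Politics",
    "Science", "Society", "Sport", "Weather", "Unclassified",
}
FIRST_YEAR, LAST_YEAR = 1985, 2000


class Article(NamedTuple):
    year: int
    genre: str
    topic: str
    text: str


def is_valid(article):
    """Check that an article's metadata fits the categories on the slide."""
    return (
        FIRST_YEAR <= article.year <= LAST_YEAR
        and article.genre in GENRES
        and article.topic in TOPICS
    )


def select(articles, genre=None, topic=None, years=None):
    """Return the articles matching every filter that was given."""
    result = []
    for art in articles:
        if genre is not None and art.genre != genre:
            continue
        if topic is not None and art.topic != topic:
            continue
        if years is not None and not (years[0] <= art.year <= years[1]):
            continue
        result.append(art)
    return result


# Invented sample records, for illustration only.
sample = [
    Article(1992, "News", "Sport", "..."),
    Article(1999, "Comment", "Politics", "..."),
    Article(2004, "News", "Sport", "..."),  # outside the 1985 - 2000 range
]

sport_news_1990s = select(sample, genre="News", topic="Sport", years=(1990, 1999))
```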
14
La Repubblica
15
Thank you!
Anne-Marie OBRETIN
MRes in European Languages and Cultures
University of Exeter
ao231@exeter.ac.uk