Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents
Andrea Bozzi
Istituto di Linguistica Computazionale, Pisa
NEH/CNR Meeting, Washington DC, October 5, 2007
Presentation contents
1. An EU-supported system for Greek papyrology;
2. A special application for browsing and searching demotic documents on ostraka;
3. A philological workstation for digital medieval manuscripts;
4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;
5. How to integrate all these modules in a web-based open source application.
The philological workstation: image and text transcription
Image segmentation and semi-automatic word linking
Annotations and critical apparatus
Wordforms list and specific indexes
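As an illustration of what a wordform list with positional references involves, here is a minimal sketch in Java, the language the deck names for the platform. It is not the workstation's actual code: the class `WordformIndex`, its `build` method, and the tokenization rule (split on anything that is not a Unicode letter) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: builds an alphabetical wordform list with token positions. */
class WordformIndex {

    /** Maps each lower-cased wordform to the 0-based positions of its occurrences. */
    static Map<String, List<Integer>> build(String transcription) {
        // TreeMap keeps the wordforms in alphabetical order, as a printed index would.
        Map<String, List<Integer>> index = new TreeMap<>();
        int position = 0;
        // Assumption: any run of non-letter characters separates two tokens.
        for (String token : transcription.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            String form = token.toLowerCase(Locale.ROOT);
            index.computeIfAbsent(form, k -> new ArrayList<>()).add(position);
            position++;
        }
        return index;
    }

    public static void main(String[] args) {
        // Sample input only; any transcribed text could be passed in.
        Map<String, List<Integer>> index = build("Omnis homines, qui sese student praestare");
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

A specific index (for example, proper names only) could be derived by filtering the keys of the returned map.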
The web-based philological workstation for managing documents of the Istituto Papirologico Vitelli in Florence (restricted access)
Special system for teaching and retrieving linguistic information from demotic texts on ostraka
OMM 1381: E. Bresciani, S. Pernigotti, M.C. Betrò, Ostraka demotici da Narmuti, Pisa, 1983, pp ;
OMM 300: P. Gallo, Ostraca demotici e ieratici dall’archivio bilingue di Narmouthis, Pisa, 1997, pp ;
OMM 393: R. Pintaudi, P.J. Sijpesteijn, Ostraka greci da Narmuthis, Pisa, 1993, p. 40.
The digital image archive and the table of demotic signs
Search results: the blue areas (arrow) mark where the selected sign was found
Textual criticism for medieval manuscripts. Link to the list of collated sources.
Selection of the variant eixens. Evaluation of the variant reading in the collated source.
Recording of the variant eixens in the critical apparatus
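Recording a variant in a critical apparatus amounts to attaching a reading and the witness that attests it to a place in the text. The Java sketch below shows one possible in-memory shape for that. The class `CriticalApparatus`, the locus label `"12"`, and the witness sigla `A` and `B` are all hypothetical and are not taken from the workstation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory critical apparatus; names and layout are illustrative. */
class CriticalApparatus {

    /** One variant reading attested by one collated witness. */
    static final class Variant {
        final String reading;
        final String witness;

        Variant(String reading, String witness) {
            this.reading = reading;
            this.witness = witness;
        }
    }

    // Variants grouped by locus (for example a line reference), in recording order.
    private final Map<String, List<Variant>> byLocus = new LinkedHashMap<>();

    /** Stores a variant reading for the given locus. */
    void record(String locus, String reading, String witness) {
        byLocus.computeIfAbsent(locus, k -> new ArrayList<>()).add(new Variant(reading, witness));
    }

    /** Returns the variants recorded at a locus, or an empty list if there are none. */
    List<Variant> variantsAt(String locus) {
        List<Variant> found = byLocus.get(locus);
        return found == null ? Collections.<Variant>emptyList() : found;
    }

    /** Renders one apparatus entry as "reading WITNESS; reading WITNESS". */
    String entry(String locus) {
        StringBuilder sb = new StringBuilder();
        for (Variant v : variantsAt(locus)) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(v.reading).append(' ').append(v.witness);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CriticalApparatus apparatus = new CriticalApparatus();
        apparatus.record("12", "eixens", "A"); // locus and siglum are made up for the example
        System.out.println(apparatus.entry("12"));
    }
}
```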
Variant search across different early printed editions of the same work. Link to the list of collated books.
Image of the corresponding page
Lemmatization results (C. Sallustius Crispus, De coniuratione Catilinae, 1-2)
Lemmatization results of selected wordforms
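To make the idea of lemmatizing selected wordforms concrete, here is a toy Java sketch that maps a wordform to its candidate lemmas. It is a hand-made lookup table, not the LEMLAT engine, its data, or its interface; the class `ToyLemmatizer` and its three entries are assumptions chosen for illustration. The form "est" is included because it is ambiguous between "sum" (to be) and "edo" (to eat), which is the kind of case a lemmatizer has to report with more than one result.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy lemmatizer: a hand-made lookup table, not the LEMLAT engine or its data. */
class ToyLemmatizer {

    private final Map<String, List<String>> lexicon = new HashMap<>();

    ToyLemmatizer() {
        // Illustrative entries only; a real lemmatizer derives these by morphological analysis.
        lexicon.put("homines", Arrays.asList("homo"));
        lexicon.put("student", Arrays.asList("studeo"));
        lexicon.put("est", Arrays.asList("sum", "edo")); // ambiguous form: two candidate lemmas
    }

    /** Returns every candidate lemma for a wordform, or an empty list if it is unknown. */
    List<String> lemmatize(String wordform) {
        List<String> lemmas = lexicon.get(wordform.toLowerCase(Locale.ROOT));
        return lemmas == null ? Collections.<String>emptyList() : lemmas;
    }

    public static void main(String[] args) {
        ToyLemmatizer lemmatizer = new ToyLemmatizer();
        for (String form : new String[] {"Homines", "student", "est", "praestare"}) {
            System.out.println(form + " -> " + lemmatizer.lemmatize(form));
        }
    }
}
```

Returning a list rather than a single lemma keeps ambiguous and unknown forms visible to the user instead of silently picking one analysis.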
Pinakes
Aim: web-based open source application to manage historical cultural heritage data in digital format.
Partners:
– Fondazione Rinascimento Digitale, Florence;
– Istituto e Museo della Storia della Scienza, Florence;
– Ministero per i Beni Culturali, Rome;
– CNR, Istituto di Linguistica Computazionale, Pisa.
Technology
– Programming language: Java (JDK 1.5)
– Servlet engine: Tomcat 5.5.x + Apache HTTP Connectors
– Web server: Apache httpd server 2.2.x
– Web application framework: Jakarta Struts
– Web service framework: Apache Axis 1.4
– Database engine: PostgreSQL 8.1
– Programming environment: NetBeans
– Final development: Hibernate
Standards
– DCMI (Dublin Core Metadata Initiative)
– TEI (Text Encoding Initiative)
– OWL (Web Ontology Language)
– RDF/XML (Resource Description Framework, XML syntax)
– SPARQL (query language for RDF)
– UTF-8 (Unicode Transformation Format, 8-bit)