Digitisation and Access to Archival Collections: A Case Study of the Sofia Municipal Government (1878 – 1879) Maria Nisheva-Pavlova, Pavel Pavlov Faculty.

Slides:



Advertisements
Similar presentations
The Application of Machine Translation in CADAL Huang Chen, Chen Haiying Zhejiang University Libraries, Hangzhou, China
Advertisements

METS: An Introduction Structuring Digital Content.
Designing Multimedia with Fuzzy Logic Enrique Diaz de Leon * Rene V. Mayorga ** Paul D. Guild *** * ITESM, Guadalajara Campus, Mexico ** Faculty of Engineering,
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
Service-based architecture for personalized and adaptive access to the knowledge in digital library Desislava Paneva Institute of Mathematics and Informatics.
Chapter 2. Slide 1 CULTURAL SUBJECT GATEWAYS CULTURAL SUBJECT GATEWAYS Subject Gateways  Started as links of lists  Continued as Web directories  Culminated.
Galia Angelova Institute for Parallel Processing, Bulgarian Academy of Sciences Visualisation and Semantic Structuring of Content (some.
Knowledge-Based Search in Collections of Digitized Manuscripts: First Results Pavel Pavlov, Maria Nisheva-Pavlova Faculty of Mathematics and Informatics,
EAD in A2A Bill Stockting, Senior Editor A2A and EAD Working Group: Central Archives of Historical Records, Warsaw, 26 April 2003.
Adaptability of learning objects by appropriate knowledge representation Anastas Misev Institute of Informatics Faculty of Natural Science and Mathematics.
Metadata: An Introduction By Wendy Duff October 13, 2001 ECURE.
Creating Architectural Descriptions. Outline Standardizing architectural descriptions: The IEEE has published, “Recommended Practice for Architectural.
IMT530- Organization of Information Resources1 Feedback Like exercises –But want more instructions and feedback on them –Wondering about grading on these.
University of Jyväskylä – Department of Mathematical Information Technology Computer Science Teacher Education ICNEE 2004 Topic Case Driven Approach for.
EMu and Archives NA EMu Users Conference – Oct Slide 1 EMu and Archives Experiences from the Canada Science and Technology Museum Corporation.
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
GL12 Conf. Dec. 6-7, 2010NTL, Prague, Czech Republic Extending the “Facets” concept by applying NLP tools to catalog records of scientific literature *E.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Guest Lecture LIS 656, Spring 2011 Kathryn Lybarger.
By Carrie Moran. To examine the Metadata Object Description Schema (MODS) metadata scheme to determine its utility based on structure, interoperability.
Carlos Lamsfus. ISWDS 2005 Galway, November 7th 2005 CENTRO DE TECNOLOGÍAS DE INTERACCIÓN VISUAL Y COMUNICACIONES VISUAL INTERACTION AND COMMUNICATIONS.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Teaching Metadata and Networked Information Organization & Retrieval The UNT SLIS Experience William E. Moen School of Library and Information Sciences.
Towards Online Accessibility of Valuable Phenomena of the Bulgarian Folklore Heritage Radoslav Pavlov 1 Konstantin Rangochev 1 Desislava Paneva-Marinova.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Galina Bogdanova, Konstantin Rangochev, Desislava Paneva-Marinova, Nikolay Noev Institute of Mathematics and Informatics, Bulgarian Academy of Sciences.
ONTOLOGICAL MODEL OF THE KNOWLEDGE IN FOLKLORE DIGITAL LIBRARY Desislava Paneva Institute of Mathematics and Informatics – Bulgarian Academy of Sciences.
Methods and Tools Supporting the Lifecycle of Rich Digital Content Maria Nisheva-Pavlova, Pavel Pavlov Faculty of Mathematics and Informatics, Sofia University.
Information Technologies for Presentation of Bulgarian Folk Songs with Music, Notes and Text in a Digital Library Lozanka Peycheva, Nikolay Kirov, Maria.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Metadata: Essential Standards for Management of Digital Libraries ALI Digital Library Workshop Linda Cantara, Metadata Librarian Indiana University, Bloomington.
XML and Digital Libraries M. Zubair Department of Computer Science Old Dominion University.
Aligning library-domain metadata with the Europeana Data Model Sally CHAMBERS Valentine CHARLES ELAG 2011, Prague.
RCDL Conference, Petrozavodsk, Russia Context-Based Retrieval in Digital Libraries: Approach and Technological Framework Kurt Sandkuhl, Alexander Smirnov,
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Knowledge Representation of Statistic Domain For CBR Application Supervisor : Dr. Aslina Saad Dr. Mashitoh Hashim PM Dr. Nor Hasbiah Ubaidullah.
Introduction to metadata
Lifecycle Metadata for Digital Objects November 1, 2004 Descriptive Metadata: “Modeling the World”
C. Lawrence Zitnick Microsoft Research, Redmond Devi Parikh Virginia Tech Bringing Semantics Into Focus Using Visual.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
Examples for Open Access Scholar Electronic Repository by New Bulgarian University IP LibCMASS Sofia 2011 Contract № 2011-ERA-IP-7 Sofia, September,
Metadata Common Vocabulary a journey from a glossary to an ontology of statistical metadata, and back Sérgio Bacelar
Intellectual Works and their Manifestations Representation of Information Objects IR Systems & Information objects Spring January, 2006 Bharat.
Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Volgograd State Technical University Applied Computational Linguistic Society Undergraduate and post-graduate scientific researches under the direction.
1 MedAT: Medical Resources Annotation Tool Monika Žáková *, Olga Štěpánková *, Taťána Maříková * Department of Cybernetics, CTU Prague Institute of Biology.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Overviews of the Library of Texas & ZLOT Project Dr. William E. Moen Principal Investigator.
Lesson No:12 Introduction to Internet CHBT-01 Basic Micro process & Computer Operatio.
Page 1 Development of Metadata System at Croatian Bureau of Statistics Development of Metadata System at Croatian Bureau of Statistics Presented by Maja.
The AECID digital library and its possibilities to contribute to sustainable international development Araceli GARCÍA IFLA Satellite Conference. August.
EAD 101: An Introduction to Encoded Archival Description XML and the Encoded Archival Description: Providing Access to Collections Oregon Library Association.
Knowledge Support for Modeling and Simulation Michal Ševčenko Czech Technical University in Prague.
Presented by: Amy Carson, Trisha Hansen and Jonathan Sears.
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
Radoslav Pavlov, Galina Bogdanova, Desislava Paneva- Marinova, Todor Todorov, Konstantin Rangochev
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Informatics for Scientific Data Bio-informatics and Medical Informatics Week 9 Lecture notes INF 380E: Perspectives on Information.
Online Information and Education Conference 2004, Bangkok Dr. Britta Woldering, German National Library Metadata development in The European Library.
Database Principles: Fundamentals of Design, Implementation, and Management Chapter 1 The Database Approach.
7th Annual Hong Kong Innovative Users Group Meeting
Utility of an OAI Service Provider Search Portal
Prepared by: Galya STATEVA, Chief expert
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Active Data Management in Space 20m DG
VI-SEEM Data Repository
Metadata to fit your needs... How much is too much?
BUILDING A DIGITAL REPOSITORY FOR LEARNING RESOURCES
Presentation transcript:

Digitisation and Access to Archival Collections: A Case Study of the Sofia Municipal Government (1878 – 1879) Maria Nisheva-Pavlova, Pavel Pavlov Faculty of Mathematics and Informatics, Sofia University and Institute of Mathematics and Informatics, Bulgarian Academy of Sciences Nikolay Markov, Maya Nedeva Nikolay Markov, Maya Nedeva General Department of Archives at the Council of Ministers of Republic of Bulgaria

Introduction The paper presents an ongoing project aimed at the development of a methodology and corresponding software tools intended for building of proper environments giving up means for semantics oriented, web-based access to digitized archival collections. We suppose that these collections are heterogeneous, i.e. they may include diverse types of materials (official handwritten, typewritten or printed documents, letters, photographs, newspapers, maps etc.) and the texts of the documents within them may be written in different languages.

The practical experiments have been performed on a collection of archival documents from the period of the organization of the Sofia Municipal Government (1878 – 1879).

International encoding standards as well as Semantic web methods and technologies have been used. The main difference with other similar projects is in the exploration of the idea that the usage of proper general-purpose and domain-specific ontologies can minimize the resources necessary for the development of tools for adequate, semantics oriented access to heterogeneous (including distributed) multilingual archival collections.

Main objectives of the project To define suitable metadata to accompany digitised documents from archival collections in accordance with the international standards, the Bulgarian traditional experience and the needs of the target groups of users. To study the various aspects of creation of an appropriate ontology for the mentioned collection (e.g. the scope of the ontology, the corresponding linguistic problems etc.).

To explore the necessities of the typical users of the discussed archival collection (experts in various domains and general public) in order to give proper kinds of access to this collection. In particular, providing versatile, user-friendly access to the collection based on the semantics of its content. To develop a framework (that will be intended for users who are professional archivists) for application of Semantic Web methods and technologies to digitised collections of archival documents.

Representation of the archival documents The discussed archival collection consists of approximately 980 original handwritten documents from the period around and after the end of the Russo-Turkish war (1877 – 1878). This is the period when the building of the fundamentals of the Bulgarian state and municipal institutions has been initiated and the basic rules of the contemporary Bulgarian language have yet to be drawn up.

Thus the documents within the collection are of great scientific, historical and social value and are of interest to archivists, historians, linguists etc. Because of these reasons we consider it expedient to include in the electronic version of our collection not only digital images of the chosen archival documents but also structured electronic transcriptions of their full texts and proper descriptions of the collection as a whole as well as descriptions of its parts (known as archival units) and all particular documents in it.

Description of the structural parts of the archival collection These descriptions have been prepared in conformity with the traditional practice of Bulgarian archivists. The structure of Bulgarian archives consists of four levels of hierarchy: archival funds, inventory lists, archival units and individual documents. The descriptions at all levels have been structured and accompanied with proper sets of metadata according to the requirements of the EAD encoding scheme.

For example, the EAD – compliant description of an archival fund contains data about the type of the fund, the dates (starting and final years) of creation of its documents, its logical structure and physical extent, the genre(s) and language(s) of its documents, the substances, technologies and methods of creation of documents and other materials in it as well as some short information about the administrative history of the corresponding corporate body, the history of the fund etc.

Part of the description of archival fund 1K according to EAD standard

Representation of electronic transcriptions of full texts of archival documents We maintain two different digital forms of each original archive document: its digital image and an electronic transcription of its full text (in XML format). The digital images of the original documents are intended mainly for visualization purposes while the electronic transcripts of the documents and their EAD encoded descriptions will be used to support various types of search and document retrieval activities.

For the representation of the structured electronic transcriptions of the full texts of archival documents we use the TEI standard. The structure and the contents of the various kinds of documents within the collection (instructions, orders, reports, records of sessions, letters, requests, petitions etc.) were explored and a generalized model of these documents was created. A proper set of elements and attributes from the TEI document type definition was adopted to describe this model.

Part of the element of the electronic transcription of an archival document

TEI – conformant representation of the text of an archival document

Access to the collection The user has the opportunity to switch between two types of interface to the chosen collection: The first one is based on the principles of the “standard” archivist’s view to an archival collection. The second type of provided on-line access to the collection may be described as the semantics oriented one.

The homepage providing various types of access to the collection

The interface to the archival collection oriented to the standard archivist’s point of view allows the user to browse the hierarchical structure of the collection as a whole. At the archival fund and inventory list levels the user has an access to the EAD encoded description of the corresponding unit (in XML format) and to a properly visualized form of the same metadata (in PDF format).

Interface to the collection supporting the standard archivist’s view (at archival fund level)

The user interface at archival unit level allows one to browse five different forms of each particular document in the corresponding archival unit: the EAD encoded description of the document (in XML format), a proper visualization of this description (in PDF format), the TEI encoded electronic transcription of the full text of the document (in XML format), a proper visualization of the electronic transcription of the document (in PDF format) and a digital image of the original document (again in PDF format).

Interface to the collection supporting the standard archivist’s view (at archival unit level)

The other type of provided access to the discussed archival collection is based on the use of explicitly represented knowledge describing different aspects of the semantics of the collection as a whole and its structural parts. A set of access tools (often called “finding aids”) realizing various types of document search and retrieval (chronological, oriented to the kinds of documents within the collection, subject oriented etc.) has been under development for the purpose.

The search engines of most of these tools use the values of the corresponding elements of the TEI encoded versions of archival documents. In particular, the subject oriented document retrieval is based on the use of the semantic annotation of the documents. The semantic annotation consists of appropriate words and phrases (chosen from a subject ontology) which describe the content of the document.

In our project we use a subject ontology (covering the main types of municipal activities) especially developed for the purpose. This ontology is prepared using Protégé/OWL.

Interface to the collection supporting the subject oriented document retrieval

The topics viewed on the last screenshot belong to a subset of the concepts at the highest two levels of the mentioned ontology based on the assumption for typical requests for information according to the characteristics of the discussed historical period. Our future plans include some ideas to generate automatically the list of searchable topics using the results of the preliminary examination of the professional needs of the main groups of potential users.

The semantic annotations of the documents within the collection contain proper terms from all levels of the same ontology. When the user chooses a topic from the list shown on the last figure, the corresponding access tool finds all documents which contain in their semantic annotations terms matching the user query (i.e. identical to the term chosen by the user or semantically related with it).

A tool for search in the full texts of the document transcriptions is provided as well. We intend to use in its implementation some our former results concerning the development of tools for ontology driven search in collections of digitised manuscripts.

Conclusion The most valuable expected results of our project could be formulated as follows: A methodology for application of international standards, ontological knowledge and Semantic web technologies for the development of software tools providing semantics oriented access to heterogeneous multilingual collections of archival documents.

A model and a prototype of a website which gives the users an interface supporting various types of access to a chosen archival collection.

The main advantage of our approach is the proper use of ontological knowledge describing the semantics of the individual documents in the archival collection as well as the semantics of the collection as a whole and the semantics of its structural parts. It allows users with different profiles to study and analyze the documents within the corresponding collection from multiple points of view using a single environment for the purpose.

Acknowledgements This work has been funded by the EC FP6 Project “Knowledge Transfer for Digitisation of Cultural and Scientific Heritage in Bulgaria” (KT- DigiCULT-BG) coordinated by the Institute of Mathematics and Informatics of the Bulgarian Academy of Sciences. The authors are thankful to Dr. Matthew Driscoll from Copenhagen University, Denmark, for the useful advices concerning the TEI encoding of the electronic copies of archival documents.