ICS-FORTH July,
Classifying Historical Documents
Maria Theodoridou, Martin Doerr
Foundation for Research and Technology - Hellas
Institute of Computer Science
Heraklion - Crete
The classification problem
- Automatic transcription is not possible
  - inaccurate OCR software
  - interpretation-dependent
- Manual keyword assignment
  - time-consuming process
  - keywords are not necessarily unique
  - inconsistent between users
  - not obvious to users at retrieval time
- Complete classification only on parts of the database
  - by different aspects
  - at different times
  - by different people
Archival standards
- Dublin Core Metadata Elements
- EAD: Encoded Archival Description Document Type Definition
- ISAD(G): General International Standard Archival Description
Task Analysis
- The archivist maintains the inventory
  - organizes fonds and subfonds (manageable units and provenance)
  - assigns identification numbers to ensure integrity
  - documents the provenance and chronology of collective units
- Handling the material is hazardous to health and to the material
  - replace access with electronic surrogates
  - preserve electronic copies to preserve the contents
- Researchers are granted access to study parts of the archive
  - focused studies, resulting in publications
  - the primary information partially overlaps between studies
Idea of Operation
- Scanned images replace access to the originals.
- Researchers should leave core documentation on the parts of the contents they study.
- Ergonomic classification user interface (minutes per document).
- Thesauri assist classification.
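One way a thesaurus can assist classification is by resolving whatever term the researcher types to a preferred concept. The following is a minimal Python sketch, not the system's actual interface: the `Concept` record, the concept IDs and the synonym lists are hypothetical, and simple prefix matching stands in for whatever lookup a real interface would use.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A thesaurus concept: stable ID, preferred label, non-preferred synonyms."""
    concept_id: str
    preferred: str
    synonyms: list[str] = field(default_factory=list)


# Hypothetical thesaurus entries (IDs and synonyms are illustrative only).
THESAURUS = [
    Concept("act:sale", "selling", ["sale", "sell", "purchase"]),
    Concept("act:marriage", "marriage", ["wedding"]),
    Concept("place:monastery", "monastery", ["convent", "cloister"]),
]


def suggest(typed: str, thesaurus: list[Concept]) -> list[Concept]:
    """Return concepts whose preferred label or any synonym starts with the typed text."""
    prefix = typed.strip().lower()
    hits = []
    for concept in thesaurus:
        labels = [concept.preferred, *concept.synonyms]
        if any(label.lower().startswith(prefix) for label in labels):
            hits.append(concept)
    return hits
```

For example, `suggest("wed", THESAURUS)` returns only the marriage concept, so the document is tagged with `act:marriage` regardless of the word the researcher typed.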
Classification structure
- Classification by a semantic net of metadata
  - analysis of the entities in the archive material
  - classification of documents by:
    (1) date and type of administrative act
    (2) described activities
  - a syntactic structure to describe multiple and nested activities
  - a notion of identity of persons, places and objects
  - coherent classification on the instance and concept level
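A minimal sketch of such a semantic net in Python, assuming hypothetical `Entity`, `Activity` and `Document` records: nesting `sub_activities` gives the syntactic structure for multiple and nested activities, and entities compare by their ID only, so two spellings of the same person remain one entity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A person, place or object. Equality and hashing use entity_id only,
    so variant spellings of a name still denote the same entity."""
    entity_id: str
    kind: str = field(compare=False)  # "person", "place" or "object"
    name: str = field(compare=False)


@dataclass
class Activity:
    """A described activity; sub_activities carries the nested structure."""
    activity_type: str
    participants: list[Entity] = field(default_factory=list)
    sub_activities: list["Activity"] = field(default_factory=list)


@dataclass
class Document:
    """Classified by (1) date and type of administrative act, (2) described activities."""
    doc_id: str
    date: str
    act_type: str
    activities: list[Activity] = field(default_factory=list)


def entities_of(doc: Document) -> set[Entity]:
    """Collect every entity mentioned anywhere in the document's nested activities."""
    found: set[Entity] = set()
    stack = list(doc.activities)
    while stack:
        activity = stack.pop()
        found.update(activity.participants)
        stack.extend(activity.sub_activities)
    return found
```

If a condemnation contains a sale and both mention the same person ID under different spellings, `entities_of` still reports that person once.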
Historical Archives Modelling: collections (diagram)
- Classes: ArchivalType, ArchivalDescription, Fonds, CurrentFonds, HistoricalFonds, Subfonds, Current Subfonds, Historical Subfonds, FilmArchive
- Relations: subfonds_of, belongs_to, part_of, copy_of, copy_of_part, derived_from, corresponding
- Legend: classification, attribute and generalisation links; relation categories structural, derived_from, corresponding
Historical Archives Modelling: collections and objects (diagram)
- Classes: ArchivalType, Physical ArchivalType, Conceptual ArchivalType, ArchivalDescription, UnitOfDescription, Item, Fonds, CurrentFonds, HistoricalFonds, Subfonds, Current Subfonds, Historical Subfonds, FilmArchive
- Relations: subfonds_of (s), belongs_to (s), copy_of (d), copy_of_part (d), originates_from (c), kept_in (c), part_of (c)
- Legend: classification, attribute and generalisation links; relation categories structural (s), derived_from (d), corresponding (c)
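The (s), (d) and (c) markers can be read as a category attached to each relation name. The sketch below assumes links stored as `(source, relation, target)` triples with hypothetical identifiers, and follows only structural links to locate a unit within its fonds; derived and corresponding links such as `copy_of` or `kept_in` are ignored for that purpose.

```python
from enum import Enum


class Kind(Enum):
    """Relation categories from the slide legend."""
    STRUCTURAL = "s"
    DERIVED_FROM = "d"
    CORRESPONDING = "c"


# Relation names from the slide with their category markers.
RELATION_KIND = {
    "subfonds_of": Kind.STRUCTURAL,
    "belongs_to": Kind.STRUCTURAL,
    "copy_of": Kind.DERIVED_FROM,
    "copy_of_part": Kind.DERIVED_FROM,
    "originates_from": Kind.CORRESPONDING,
    "kept_in": Kind.CORRESPONDING,
    "part_of": Kind.CORRESPONDING,
}


def structural_path(start: str, links: list[tuple[str, str, str]]) -> list[str]:
    """Follow structural links upward from `start`.

    Assumes each unit has at most one structural parent; stops on a cycle.
    """
    parent = {src: dst for src, rel, dst in links
              if RELATION_KIND[rel] is Kind.STRUCTURAL}
    path = [start]
    while path[-1] in parent and parent[path[-1]] not in path:
        path.append(parent[path[-1]])
    return path
```

With the links `item:12 belongs_to subfonds:court` and `subfonds:court subfonds_of fonds:hist`, `structural_path("item:12", links)` returns `["item:12", "subfonds:court", "fonds:hist"]`, while a microfilm linked only by `copy_of` stays outside the structure.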
Historical Archives Modelling: objects vs. contents (diagram)
- Classes: ArchivalType, Physical ArchivalType, Conceptual ArchivalType, ArchivalDescription, UnitOfDescription, Series, Item, ItemUnit, Document, Picture, Microfilm, Sheet, SheetPage, File, Book, BookPage, Shot, Photograph
- Relations: contains (s), contains_first (s), contains_second (s), copy_of (d), corresponds_to (c)
- Legend: classification, attribute and generalisation links; relation categories structural (s), derived_from (d), corresponding (c)
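Separating physical objects from their contents makes the mapping many-to-many: one sheet can carry several documents, and one document can run across several sheets. The sketch assumes, without confirmation from the slide, that `contains_first` and `contains_second` link a sheet to its two pages (recto and verso); all identifiers are hypothetical.

```python
# Physical unit -> its pages (assumed: contains_first, contains_second).
CONTAINS = {
    "sheet:1": ["sheet:1/p1", "sheet:1/p2"],
    "sheet:2": ["sheet:2/p1", "sheet:2/p2"],
}

# corresponds_to links: (physical page, conceptual document).
CORRESPONDS_TO = [
    ("sheet:1/p1", "doc:A"),
    ("sheet:1/p2", "doc:A"),  # doc:A continues on the back of sheet 1
    ("sheet:1/p2", "doc:B"),  # ... where doc:B also begins
    ("sheet:2/p1", "doc:B"),
]


def documents_on(sheet: str) -> set[str]:
    """Conceptual documents carried by any page of a physical sheet."""
    pages = set(CONTAINS.get(sheet, []))
    return {doc for page, doc in CORRESPONDS_TO if page in pages}


def sheets_of(document: str) -> set[str]:
    """Physical sheets on which a conceptual document appears."""
    page_to_sheet = {page: sheet for sheet, pages in CONTAINS.items() for page in pages}
    return {page_to_sheet[page] for page, doc in CORRESPONDS_TO if doc == document}
```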
Historical Archives Modelling: processes (diagram)
- Classes: ArchivalType, Physical ArchivalType, Conceptual ArchivalType, ArchivalDescription, UnitOfDescription, Fonds, ItemUnit, Item, Document, Picture, SheetPage, EventType, DescriptionType, ActionType, Occurrence, ElectronicProcessingType, ElectronicProcessing, Scanning, Editing, Transcription, Translation, ElectronicDocumentType, ElectronicDocument, ScannedPage
- Relations: history, product, produced_from, result, corresponds_to
- Legend: classification, attribute and generalisation links; relation categories structural, derived_from, corresponding
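Electronic processing steps chain together, each producing an electronic document from an item or from an earlier electronic document. This sketch, with a hypothetical `Processing` record and identifiers, walks those `produced_from` links back to the original item; it assumes each electronic document results from exactly one processing event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processing:
    """One electronic processing event: `result` was produced from `source`."""
    kind: str    # e.g. "Scanning", "Editing", "Transcription", "Translation"
    source: str  # the item or electronic document that was processed
    result: str  # the electronic document produced


def provenance(doc: str, events: list[Processing]) -> list[Processing]:
    """Walk produced_from links back from `doc`, most recent event first."""
    by_result = {event.result: event for event in events}
    chain = []
    seen = {doc}
    while doc in by_result:
        event = by_result[doc]
        chain.append(event)
        doc = event.source
        if doc in seen:  # guard against accidental cycles
            break
        seen.add(doc)
    return chain
```

For a scan, a transcription and a translation of `item:12`, calling `provenance` on the translation returns the Translation, Transcription and Scanning events in that order, ending at `item:12`.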
Historical Archives: The Facets
- Four levels:
  - the act of documentation
  - the act of administration
  - the targeted social activity
  - other related activities and items
- Questions that need to be answered:
  - Who? Persons and organizations
  - Where? Places
  - When? Time
  - What? Objects
  - How? Activities and actions
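The facets can be read as a grid of levels against questions. A minimal sketch, assuming a classification record stored as a dictionary per level, reports which of the five questions remain unanswered; the `Level` names and the record layout are illustrative, not taken from the system.

```python
from enum import Enum


class Level(Enum):
    """The four levels at which a document is classified."""
    DOCUMENTATION = "act of documentation"
    ADMINISTRATION = "act of administration"
    SOCIAL_ACTIVITY = "targeted social activity"
    RELATED = "other related activities and items"


# Facet questions: who = persons and organizations, where = places,
# when = time, what = objects, how = activities and actions.
FACETS = ("who", "where", "when", "what", "how")


def unanswered(record: dict[Level, dict[str, list[str]]]) -> dict[Level, list[str]]:
    """For each level in the record, list the facet questions that have no value yet."""
    gaps = {}
    for level, answers in record.items():
        missing = [facet for facet in FACETS if not answers.get(facet)]
        if missing:
            gaps[level] = missing
    return gaps
```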
Historical Archives: Faceted classification by concepts (diagram)
- Facet polyhierarchies linked to the instances (metadata) of the Manuscripts' Digital Library
Historical Archives: Faceted classification by concepts, an example (diagram)
- Facet polyhierarchies: Persons and Organisations > Individuals; Places > Houses
- Instances (metadata, Manuscripts' Digital Library): Martin (an individual); house nr.415 (a house)
- Relations between instances: Martin lives in house nr.415; house nr.415 is Martin's
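A sketch of retrieval over a facet polyhierarchy: an instance classified under a narrow concept is found when searching any broader concept. To show the "poly" in polyhierarchy, the sketch places Houses under both Places and a hypothetical Objects facet; the slide itself only shows the example terms.

```python
# Facet polyhierarchy: each concept lists its broader concepts.
# The "Objects" parent of "Houses" is an assumption for illustration.
BROADER = {
    "Individuals": ["Persons and Organisations"],
    "Houses": ["Places", "Objects"],
}

# Instances (metadata records) and the concepts they are classified under.
INSTANCE_OF = {
    "Martin": ["Individuals"],
    "house nr.415": ["Houses"],
}


def ancestors(concept: str) -> set[str]:
    """Return `concept` plus every concept reachable through broader links."""
    seen: set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, []))
    return seen


def instances_under(concept: str) -> set[str]:
    """Instances classified under `concept` or any narrower concept."""
    return {
        instance
        for instance, concepts in INSTANCE_OF.items()
        if any(concept in ancestors(c) for c in concepts)
    }
```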
Historical Archives: The ARCHON classification
Item
  has type: Document Type
  has publication date: Date
  has creation date: Date
  has description: Activity
Activity
  has activity type: Activity Type
  has actor type: Actor Type
  has object type: Object Type
  has place type: Place Type
  happened at: Date
  has actor: Actor (has type: Actor Type)
  has place: Place (has type: Place Type)
  has object: Object (has type: Object Type)
  has related activity: Activity
Historical Archives: The ARCHON classification
- where:
  - Activity Type = marriage, selling, condemnation, tax regulation, statistics, ...
  - Actor Type = Pasha, judge, farmer, ..., but also: witness
  - Place Type = city, village, monastery, prefecture, ...
  - Object Type = house, payment, privilege, ...
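The ARCHON schema maps fairly directly onto nested records. The sketch below uses hypothetical Python dataclasses for `Item`, `Activity` and the typed actor, place and object instances, and checks type values against small vocabularies built from the examples on this slide. For brevity it records actor, place and object types only on the linked instances, not also directly on the activity as the schema allows, and it assumes related activities form a tree.

```python
from dataclasses import dataclass, field
from typing import Optional

# Small controlled vocabularies built from the example values on the slide.
ACTIVITY_TYPES = {"marriage", "selling", "condemnation", "tax regulation", "statistics"}
ACTOR_TYPES = {"Pasha", "judge", "farmer", "witness"}
PLACE_TYPES = {"city", "village", "monastery", "prefecture"}
OBJECT_TYPES = {"house", "payment", "privilege"}


@dataclass
class Typed:
    """An actor, place or object instance together with its type ("has type")."""
    name: str
    kind: str


@dataclass
class Activity:
    activity_type: str
    happened_at: Optional[str] = None
    actors: list[Typed] = field(default_factory=list)
    places: list[Typed] = field(default_factory=list)
    objects: list[Typed] = field(default_factory=list)
    related: list["Activity"] = field(default_factory=list)  # has related activity


@dataclass
class Item:
    document_type: str
    publication_date: Optional[str] = None
    creation_date: Optional[str] = None
    description: Optional[Activity] = None  # has description


def vocabulary_errors(activity: Activity) -> list[str]:
    """Check every type value in an activity and its related activities."""
    errors: list[str] = []
    stack = [activity]
    while stack:
        a = stack.pop()
        if a.activity_type not in ACTIVITY_TYPES:
            errors.append(f"activity type: {a.activity_type}")
        for label, instances, vocab in (
            ("actor", a.actors, ACTOR_TYPES),
            ("place", a.places, PLACE_TYPES),
            ("object", a.objects, OBJECT_TYPES),
        ):
            errors += [f"{label} type: {x.kind}" for x in instances if x.kind not in vocab]
        stack.extend(a.related)
    return errors
```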
ARXON hierarchy (diagram; Greek labels translated)
- ARXONHierarchy, ARXONFacet
- Facets: Description, Place, Document, Object, Activity, Time, Actor, Type Facet
- Terms: Buildings, Christian Month, Muslim Month, Movable, Administrative Place, Immovable, Immaterial, Physical Place, Content, Administrative Acts, Judicial Cases, Role in the Case, Person, Organization, Presence in the Case, Issuer/Recipient, Other
- Legend: classification, attribute and generalization links
Classifying Historical Documents: Conclusions
- Faceted classification by concepts
  - has high precision
  - maintains the identity of concepts rather than keywords
  - creates a base of domain knowledge
  - preserves the syntactic structure of the expression used for the classification
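The precision claim can be illustrated by contrasting keyword matching with concept matching. In this hypothetical index two different people are both named Martin, and a sale is described once as "sells" and once as "sale"; all document texts and concept IDs are invented for illustration.

```python
# Hypothetical index: each document has its text and the concept IDs it was
# classified under. Two different people are both called "Martin".
DOCS = {
    "doc:1": {"text": "Martin sells house nr.415",
              "concepts": {"person:martin-1", "act:selling"}},
    "doc:2": {"text": "Martin is condemned",
              "concepts": {"person:martin-2", "act:condemnation"}},
    "doc:3": {"text": "sale of a house by the farmer",
              "concepts": {"person:martin-1", "act:selling"}},
}


def keyword_search(word: str) -> set[str]:
    """Documents whose text contains `word` as a whole token (case-insensitive)."""
    return {doc for doc, rec in DOCS.items()
            if word.lower() in rec["text"].lower().split()}


def concept_search(concept_id: str) -> set[str]:
    """Documents classified under the given concept ID."""
    return {doc for doc, rec in DOCS.items() if concept_id in rec["concepts"]}
```

Here `keyword_search("Martin")` returns `{"doc:1", "doc:2"}`, mixing two people, and `keyword_search("selling")` finds nothing, whereas `concept_search("act:selling")` returns `{"doc:1", "doc:3"}`.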