Archaeology and Terminology
Ceri Binding
Hypermedia Research Unit, University of Glamorgan, Wales, UK
Translation: “I am not in the office at the moment. Send any work to be translated”
Translation: “Pedestrians look left”
[Pictures from BBC news website]
Any process needs (human) validation...
STAR project - overview
AHRC funded project in collaboration with English Heritage Centre for Archaeology, Portsmouth
Aim: to investigate the potential of semantic technologies for widening access to digital archaeology resources, including disparate datasets and associated grey literature.
STAR - general architecture
– Applications: Server Side, Rich Client, Browser
– Data access layer: Web Services, SQL, SPARQL
– RDF Based Common Ontology Data Layer (CRM / CRMEH / SKOS)
– Ingest: Data Mapping / Normalisation, Conversion (SKOS), Indexing
– Sources: Archaeological Datasets (RRAD, RPRE, LEAP, STAN, IADB); EH thesauri, glossaries; Grey Literature reports
The Archaeological Archipelagos [Keith May, English Heritage]
English Heritage controlled vocabularies
27 glossaries, from English Heritage recording manuals (2006)
6 main thesauri used:
– Monument Types thesaurus
– Archaeological Sciences thesaurus
– Evidence thesaurus
– Main Building Materials thesaurus
– MDA Object Types thesaurus
– Timelines thesaurus
Converted to SKOS format for use within STAR
Expressive vs. controlled vocabulary
“…how many of those writing [grey literature] reports would think to describe what they are recording/writing about using the same thesauri? […] it would have been a lot quicker and easier if standardised terminology had been used in the report text when describing types of monument, event and artefact, as well as dates/periods etc.” [G. Falkingham]
“Grey Literature is very often the only place where field workers have any opportunity to engage in creating their own narrative of the site, both of the archaeological event and of the archaeological story of the site itself. I think it would be throwing the baby out with the bath water to concentrate solely on the data without continuing to offer highly skilled and experienced fieldworkers the opportunity to actually tell us what they think the data means...” [S. Jeffrey]
Descriptive, semi-controlled vocabulary…
Fields: Deposit Colour, Deposit Texture, Deposit Compaction
Recorded colour values: (Reddy) Brown; 9Reddy) brown; Brown; Brown red; Brown/reddy; Dark brown; Dark brown/orange; Dark grey brown; Dark orange brown; Dark orange brown with darker patches; Dark orange loam; Dark orange/brown; Dark red brown; Grey brown; Grey/brown; Light brown; Light yellow brown; Medium brown; Mid brown; Mid red brown; Orange brown; Orange/brown; Orangy brown; Orangy brown, very light brown on edges and sides of profile; Red /brown; Red brown; Red/brown; Reddish brown; Reddy brown; Varies; Very light brown; White; Yellow brown; Yellow/orange brown
Recorded texture / compaction values: Firm; Friable; Friable to loose; Friable/loose; Friable-loose; Loose; Loose/friabe; Loose/friable; Plastic; Sticky; Sticky (wet); Sticky/firm; Varies
Worst of all worlds?
“…another of my examples has something about some flint that is ‘snuff coloured’ & I don’t know if I’ve ever seen snuff, let alone know what colour it is, or might have been over 150 years ago, and I would think it would make sense to take some kind of integrated approach from the outset, rather than the usual ‘bricolage’ of having one route for the archivists, another for those interested in searching spreadsheets, another for people interested in googling graphics, etc.” [G. Carver]
Terminology control for time periods
– Centuries
– BC / AD years
– 3 age system
– Monarchs / Roman emperors
– Cultural styles
– Geological periods
– Prefixes: pre, post, mid etc.
– Any combinations of these
Time period alignment – data cleansing / semantic enrichment
[Table: Object No | Period | MIN YEAR | MAX YEAR]
Example free-text Period values: Mid 1st century AD; Mid first century AD; First half 1st centu; First half first cen; ?1st century AD; Medieval; Romano-British; Modern?; post-mediaeval
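A minimal sketch of how free-text century values like those in the Period column might be normalised to a MIN YEAR / MAX YEAR pair. The function name `century_to_years` is hypothetical, and the conventions are assumptions rather than the STAR project's rules: a century runs from year 1 to year 100, BC years are negative, a missing era means AD, and "mid" means the middle 50 years. Values without a century (for example "Medieval") return None and would need a thesaurus lookup and human validation instead.

```python
import re

# Assumption: only a few spelled-out ordinals are supported in this sketch.
ORDINAL_WORDS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}

CENTURY_RE = re.compile(
    r"(?P<qual>mid|first half|second half)?\s*"
    r"(?P<num>\d+(?:st|nd|rd|th)|[a-z]+)\s+century(?:\s+(?P<era>ad|bc))?"
)

def century_to_years(text):
    """Return (min_year, max_year) for a century expression, or None."""
    # Drop uncertainty markers such as "?" and "c." before matching.
    cleaned = text.lower().replace("?", " ").replace("c.", " ").strip()
    m = CENTURY_RE.search(cleaned)
    if m is None:
        return None
    num = m.group("num")
    if num[0].isdigit():
        n = int(re.match(r"\d+", num).group())
    else:
        n = ORDINAL_WORDS.get(num)
    if not n:
        return None
    if m.group("era") == "bc":
        start, end = -n * 100, -((n - 1) * 100 + 1)
    else:  # assumption: no era given means AD
        start, end = (n - 1) * 100 + 1, n * 100
    qual = m.group("qual")
    if qual == "first half":
        return (start, start + 49)
    if qual == "second half":
        return (start + 50, end)
    if qual == "mid":  # assumption: the middle 50 years
        return (start + 25, end - 25)
    return (start, end)

for value in ["Mid 1st century AD", "First half first century AD", "Medieval"]:
    print(value, "->", century_to_years(value))
```

The original text value should be kept alongside the derived years, since the derived range is an interpretation of the record rather than part of it.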
Time period relationships
[Diagram: periods positioned along a time axis relative to Period P1]
– occurs before P1*
– meets P1
– overlaps P1
– starts P1*
– equal to P1*
– occurs during P1*
– finishes P1*
– overlapped by P1
– met by P1
– occurs after P1*
– includes P1*
– started by P1*
– finished by P1*
[*Transitive]
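The thirteen relationships listed above can be derived from the start and end years of two periods. The sketch below is illustrative only: the function name `period_relation` is hypothetical, and it assumes each period has a start strictly before its end and that "meets" means one period ends exactly where the other starts.

```python
def period_relation(a_start, a_end, b_start, b_end):
    """Name the relationship of period A to period B.

    Assumes a_start < a_end and b_start < b_end.
    """
    if a_end < b_start:
        return "occurs before"
    if a_end == b_start:
        return "meets"
    if a_start > b_end:
        return "occurs after"
    if a_start == b_end:
        return "met by"
    if a_start == b_start and a_end == b_end:
        return "equal to"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished by"
    if a_start > b_start and a_end < b_end:
        return "occurs during"
    if a_start < b_start and a_end > b_end:
        return "includes"
    # Only the two partial-overlap cases remain.
    return "overlaps" if a_start < b_start else "overlapped by"

print(period_relation(1, 50, 1, 100))     # starts
print(period_relation(20, 30, 1, 100))    # occurs during
print(period_relation(1, 100, 50, 150))   # overlaps
```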
Time Period Comparison – Closeness Calculation
[Diagram: Periods P1, P2 and P3 on a time axis, with spans labelled IU, MP, NMP and D]
Match(P1, P2) = W1 (MP / IU) + W2 (IU / (NMP + IU)) + W3 (IU / (D + IU))
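As a worked illustration, the closeness formula can be evaluated directly once the four spans are known. This sketch takes MP, IU, NMP and D as given inputs measured from the diagram; it does not attempt to derive them from period dates. The function name `match_score` and the equal default weights are assumptions, not values from the STAR project.

```python
def match_score(mp, iu, nmp, d, w1=1/3, w2=1/3, w3=1/3):
    """Evaluate Match = W1 (MP/IU) + W2 (IU/(NMP+IU)) + W3 (IU/(D+IU)).

    Assumes iu > 0 and nmp, d >= 0, all in the same unit (e.g. years).
    The default weights are placeholders chosen for illustration.
    """
    if iu <= 0:
        raise ValueError("IU must be positive")
    if nmp < 0 or d < 0:
        raise ValueError("NMP and D must be non-negative")
    return w1 * (mp / iu) + w2 * (iu / (nmp + iu)) + w3 * (iu / (d + iu))

# With MP equal to IU and no NMP or D, every term is 1,
# so equal weights summing to 1 give a score of 1.
print(match_score(mp=100, iu=100, nmp=0, d=0))
# Increasing NMP or D shrinks the second and third terms.
print(match_score(mp=50, iu=100, nmp=100, d=100))
```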
SKOS Concepts + CRM Entities
Time period concepts also have implicit spatio-temporal context
[Diagram]
– Classes: skos:Concept, crm:E4.Period, crm:E2.TemporalEntity, crm:E52.Time-Span, crm:E53.Place
– Properties: rdf:type, rdfs:subClassOf, skos:broader, crm:P4F.has_time-span, crm:P7F.took_place_at, crm:P115F.finishes, crm:P116F.starts, crm:P118F.overlaps, crm:P119F.meets
Time period alignment – data processing
Align data relative to closest period concepts from English Heritage ‘Timelines’ thesaurus
Time period alignment - results
Data records relative to closest ‘known’ periods
Data aligned to closest ‘known’ periods
Timeline service test client
Semantic enrichment
Borderline between data cleansing and data creation…
“Possibly fragment of belt buckle or nail”
– BELT
– Belt Clasp -> use STRAP FITTING
– BUCKLE
– Buckle Plate -> use BUCKLE
– NAIL
– HOBNAIL
– SHOEING NAIL
“The single most useful thing you can do to ensure the long-term preservation of your data is to plan for it to be re-used” [Archaeology Data Service]
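The kind of lookup shown on this slide can be sketched as a scan of the free-text description for thesaurus labels, following "use" references from non-preferred to preferred terms. The term lists below contain only the entries named on the slide, the function name `suggest_terms` is hypothetical, and the whole-word matching rule is an assumption. The output is a list of candidates for a person to confirm, not an automatic classification.

```python
import re

# Preferred terms and "use" references taken from the slide;
# a real thesaurus would supply far more of both.
PREFERRED = {"BELT", "BUCKLE", "NAIL", "HOBNAIL", "SHOEING NAIL", "STRAP FITTING"}
USE = {"BELT CLASP": "STRAP FITTING", "BUCKLE PLATE": "BUCKLE"}

def suggest_terms(description):
    """Return candidate preferred terms found as whole words in the text."""
    upper = description.upper()
    found = set()
    for label in PREFERRED | set(USE):
        if re.search(r"\b" + re.escape(label) + r"\b", upper):
            # Replace a non-preferred label with its preferred term.
            found.add(USE.get(label, label))
    return sorted(found)

print(suggest_terms("Possibly fragment of belt buckle or nail"))
```

Whole-word matching keeps "nail" from also suggesting HOBNAIL, but it cannot tell whether "belt buckle" describes one object or two, which is why the result still needs human validation.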
Aligning controlled vocabularies
Different scope notes, same concepts? Different thesauri, same concepts?
– RCHME Monument Types: SARCOPHAGUS, SUNDIAL, WALL PAINTING, WHIPPING POST
– Archaeological Objects: SARCOPHAGUS, SUNDIAL, WALL PAINTING, WHIPPING POST
– RCHMS Monument Types
– RCHMW Monument Types
STAR general architecture
– STAR datasets: English Heritage thesauri (SKOS); Archaeological Datasets (CRM); Grey literature indexing
– STAR web services
– STAR client applications: Windows applications; Browser components; Full text search; Browse concept space; Navigate via expansion; Cross search archaeological datasets
Windows Client Applications
– Browse available thesauri
– Search across multiple thesauri
– Navigate via semantic expansion
Interactive tools to aid data entry
Interactive selection from glossary/thesaurus concepts
Filtered to concepts actually used in indexing
– Group / context types: from (enhanced) cuts and deposits glossary
– Context find materials: from building materials thesaurus
– Context find types: from MDA Object types thesaurus
– Context sample types: from existing data values...
Controlled types used in main search interface
Interactive tools to aid data entry
Summary
– Tension between expressive vs. controlled vocabulary, but general agreement on benefits of control
– Better coordination and alignment of controlled vocabularies would be beneficial
– Web services and interactive tools to aid data entry and search
– Issues encountered are not about particular technologies; they are more fundamental knowledge organisation (KO) issues
Accommodating Approximation
crm:E52.Time-Span – modelling uncertainty
Approximate time period, with uncertainty at each end:
– Earliest start date
– Latest start date
– Earliest end date
– Latest end date
crm:P81F.ongoing_throughout
crm:P82F.at_some_time_within
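One way to read the four dates is as an outer and an inner bound on the period. The sketch below assumes the usual CIDOC CRM reading, in which at_some_time_within spans earliest start to latest end and ongoing_throughout spans latest start to earliest end. The class name `ApproximateTimeSpan` and the use of plain integer years are illustrative choices, not part of the STAR data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApproximateTimeSpan:
    """A period whose start and end are each known only within a range."""
    earliest_start: int
    latest_start: int
    earliest_end: int
    latest_end: int

    def __post_init__(self):
        if self.earliest_start > self.latest_start:
            raise ValueError("earliest start must not follow latest start")
        if self.earliest_end > self.latest_end:
            raise ValueError("earliest end must not follow latest end")

    def at_some_time_within(self):
        """Outer bound: the period lies somewhere inside this range."""
        return (self.earliest_start, self.latest_end)

    def ongoing_throughout(self):
        """Inner bound: the period certainly covers this range.

        Returns None when the uncertainty is too large for any year
        to be certain (latest start falls after earliest end).
        """
        if self.latest_start > self.earliest_end:
            return None
        return (self.latest_start, self.earliest_end)

span = ApproximateTimeSpan(40, 60, 390, 420)
print(span.at_some_time_within())   # (40, 420)
print(span.ongoing_throughout())    # (60, 390)
```

Keeping both bounds lets a search choose between a generous match against the outer range and a strict match against the inner range.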