© University of South Wales Classical Art Semantics Information Extraction: CASIE Pilot Project Dr. Andreas Vlachidis Hypermedia Research Unit University.


Classical Art Semantics Information Extraction: CASIE Pilot Project
Dr. Andreas Vlachidis, Hypermedia Research Unit, University of South Wales
The Beazley Archive – University of Oxford
Classical Art Research Online Services (CLAROS)
© University of South Wales

Introduction
Motivation: apply semantic technologies to make digital humanities material easily discoverable and available for reuse and comparative analysis.
Aims:
Automatic extraction of information about cultural objects from classical art scholarly texts.
Representation of the extracted information in terms of the CIDOC-CRM (ISO 21127:2006) metadata standard for cultural heritage.
Focus scholarly text: Corpus Vasorum Antiquorum (CVA).

Background: Corpus Vasorum Antiquorum (CVA)
The oldest research project of the Union Académique Internationale, initiated in 1922.
Contains 300 high-quality catalogues (fascicules) of ancient Greek painted pottery.
Covers 100,000 vases, with illustrations, from 120 collections in 26 different countries.
2004: the Beazley Archive completed the digitisation of the CVA fascicules.
The digitised result is available from CVA online, but only in bitmap format.

Background: The CLAROS Project
Classical Art Research Online Services (CLAROS) is an international, interdisciplinary research initiative.
Focused on the semantic integration of world classical art records.
Partners: the Beazley Archive, the German Archaeological Institute, the Ashmolean Museum, the Eastern Art, Jameel Collection, the National Archaeological Museum of Greece, and others.
Delivers a searchable semantic web interface.
Uses CIDOC-CRM to enable semantic interoperability.

Method
Semantic annotation: specific metadata, usually generated with respect to a given ontology, that aim to automate the identification of concepts and their relationships in documents.
Development approach: a semantic annotation process driven by rule-based Information Extraction (IE) techniques and supported by domain-oriented vocabulary.

Method: components (diagram)
General Architecture for Text Engineering (GATE)
Java Annotation Patterns Engine (JAPE)
CVA fascicules
Ontology: CIDOC CRM-EH
Domain vocabulary

CASIE Pilot Project: CVA Fascicules (high-quality catalogues)
12 fascicules originating from:
The British Museum (8)
The Ashmolean Museum (3)
Thessaloniki Archaeological Museum (1)
Published between 1925 and 1998.
Structure: reasonably consistent among fascicules of the same origin.
Inconsistencies in:
Dimension abbreviations
Catalogue reference format
Size of descriptive passages of artefacts

CVA Fascicules (figure panels: British Museum, Ashmolean Museum)

Information Extraction Focus
Sample text (figure) annotated with the targeted CIDOC-CRM elements: E22.Man-Made_Object, E54.Dimension, E42.Identifier, P3_has_note

Development Phases
Pre-processing:
Prepared images for OCR (Photoshop)
Performed OCR (ABBYY FineReader 9)
Main Information Extraction phase:
Developed the main IE pipeline: GATE gazetteer development and IE rules development
Iterative process with necessary fascicule-oriented adjustments
Conversion of semantic annotations to RDF triples:
Bespoke PHP script using DOM
Delivery of RDF expressions consistent with the CLAROS (CIDOC-CRM) format

Pre-processing Phase
Adjustment of image levels (Photoshop): improve the contrast between white background and black text to minimise OCR errors.
OCR result (raw output):
    2a and 2b. Amphora, (a) Athena ; on 1. Hermes ; on r. bearded man with staff, perhaps Zeus, (b) Winner of horse-race ; procession of youth bearing wreath and tripod^ mounted youth and bearded herald announcing A V N EI -KETV:HIPOZ:NIKAI, Av(o>eu?jT(°)" ttr(ii)os wkS. Ht From Vulci ; 1849.—Bibl. Cat. B 144
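The levels adjustment described on this slide was done by hand in Photoshop. As a rough illustration of the underlying idea only, the sketch below stretches grey values linearly between a chosen black point and white point. The function name, the default thresholds, and the sample pixel row are all assumptions made for the example and are not taken from the project.

```python
def adjust_levels(pixels, black_point=60, white_point=200):
    """Stretch 8-bit grey values so black_point maps to 0 and white_point to 255.

    Values outside the [black_point, white_point] range are clipped first.
    The default thresholds are illustrative, not the project's settings.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("expected 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    adjusted = []
    for value in pixels:
        clipped = min(max(value, black_point), white_point)
        # Integer arithmetic keeps the result deterministic (floor division).
        adjusted.append((clipped - black_point) * 255 // span)
    return adjusted


# Hypothetical row of grey values from a scanned page:
# dark ink, mid-grey smudge, and off-white paper.
scan_row = [30, 60, 130, 200, 240]
print(adjust_levels(scan_row))
```

Dark ink is pushed to pure black and off-white paper to pure white, which is the kind of separation an OCR engine generally benefits from.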

Main Information Extraction Phase: GATE Gazetteer
Gazetteers support the IE with domain vocabulary.
The vase form listing originates from CVA online.
Supporting project-specific lists were also created for:
Extraction of dimensions
Extraction of catalogue references
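A gazetteer lookup can be pictured as matching tokens against term lists and tagging each hit with a type. The sketch below is a plain Python approximation, not the GATE gazetteer component. The term entries and the tokenisation pattern are hypothetical, and the `majorType` label is borrowed from the Lookup feature used in the project's JAPE rule.

```python
import re

# Hypothetical entries. The project's vase form list came from CVA online,
# and its dimension list is not reproduced in the slides.
GAZETTEER = {
    "amphora": "shape",
    "neck-amphora": "shape",
    "hydria": "shape",
    "ht": "dimension",
    "diam": "dimension",
}

# Words, optionally hyphenated (an assumed, simplified tokenisation).
TOKEN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def lookup(text):
    """Return (start, end, surface, majorType) for each token found in the gazetteer."""
    hits = []
    for match in TOKEN.finditer(text):
        major_type = GAZETTEER.get(match.group().lower())
        if major_type is not None:
            hits.append((match.start(), match.end(), match.group(), major_type))
    return hits


for hit in lookup("2a and 2b. Amphora, (a) Athena"):
    print(hit)
```

Matching is case-insensitive here so that "Amphora" at the start of a catalogue entry and "amphora" in running text resolve to the same entry.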

Main Information Extraction Phase: CASIE Pipeline
A cascading mechanism of NLP components, including GATE modules and bespoke JAPE rules.

Main Information Extraction Phase: JAPE Rules
JAPE: an advanced finite state transducer implementing elaborate regular expressions over annotations.

    {Lookup.majorType==shape} |
    {Token contains Lookup.majorType==shape} |
    (
      (({Token.category==RB} | {Token.category==NNP} | {Token.category==JJ})
       ({SpaceToken.kind==space})?)[1,3]
      ({SpaceToken.kind==space})*
      {Lookup.majorType==shape}
    )

The above rule will match cases such as "Amphora", "Neck-Amphora" and "Fragment of belly of amphora".
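To make the three alternatives of the pattern easier to follow, here is a rough Python approximation that works on tokens already tagged with part-of-speech categories. It is a sketch under stated assumptions, not a reimplementation of JAPE: the shape list is hypothetical, spaces are assumed to have been consumed by tokenisation, and JAPE's own matching styles are not modelled.

```python
SHAPES = {"amphora", "hydria", "kylix"}   # hypothetical gazetteer entries
MODIFIER_TAGS = {"RB", "NNP", "JJ"}       # categories named in the rule
MAX_MODIFIERS = 3                         # upper bound of the [1,3] repetition


def match_shapes(tagged_tokens):
    """Return matched phrases from a list of (word, pos_tag) pairs."""
    matches = []
    for i, (word, _tag) in enumerate(tagged_tokens):
        lower = word.lower()
        if lower in SHAPES:
            # Alternatives 1 and 3: a shape term, optionally preceded by
            # up to three adverb, proper-noun, or adjective tokens.
            start = i
            while (
                start > 0
                and i - start < MAX_MODIFIERS
                and tagged_tokens[start - 1][1] in MODIFIER_TAGS
            ):
                start -= 1
            matches.append(" ".join(w for w, _ in tagged_tokens[start : i + 1]))
        elif any(part in SHAPES for part in lower.split("-")):
            # Alternative 2: a single token that contains a shape term.
            matches.append(word)
    return matches


print(match_shapes([("Amphora", "NN")]))
print(match_shapes([("Neck-Amphora", "NNP")]))
print(match_shapes([("Large", "JJ"), ("black", "JJ"), ("amphora", "NN")]))
```

The leftward extension is greedy, so the longest run of qualifying modifiers (up to three) is attached to the shape term.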

Main Information Extraction Phase: Semantic Annotation Result in GATE (screenshots)

RDF Conversion Phase: PHP Script
Converted the semantic annotations to RDF expressions consistent with the CLAROS (CIDOC-CRM) format.
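The project's converter was a bespoke PHP script and is not shown in the slides. Purely as an illustration of the mapping step, the sketch below turns one extracted object into N-Triples lines using the CIDOC-CRM elements named earlier (E22, E54, P3). The base URI, the namespace, the exact local names, the use of P43_has_dimension and P90_has_value, and the sample values are assumptions and may differ from the CLAROS format.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"   # assumed namespace
BASE = "http://example.org/casie/"            # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap an absolute IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value):
    """Serialise a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def annotation_to_triples(object_id, note, height):
    """Map one extracted object to a list of N-Triples lines."""
    obj = uri(BASE + "object/" + object_id)
    dim = uri(BASE + "object/" + object_id + "/height")
    return [
        f"{obj} {uri(RDF_TYPE)} {uri(CRM + 'E22_Man-Made_Object')} .",
        f"{obj} {uri(CRM + 'P3_has_note')} {literal(note)} .",
        f"{obj} {uri(CRM + 'P43_has_dimension')} {dim} .",
        f"{dim} {uri(RDF_TYPE)} {uri(CRM + 'E54_Dimension')} .",
        f"{dim} {uri(CRM + 'P90_has_value')} {literal(str(height))} .",
    ]


# Invented identifier, note, and height, for illustration only.
for line in annotation_to_triples("B144", "Amphora", 40.5):
    print(line)
```

Modelling the dimension as its own node, rather than as a literal on the object, leaves room to attach a unit or a dimension type later.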


RDF Conversion Phase: RDF Expressions (figures)
Man-made Object, Note
Fascicule (Document), Catalogue Reference
Dimension

Conclusion
The CASIE pilot project delivered reasonably good results.
Rule-based IE can support the task.
Domain vocabulary is available (in English).
The pilot investigation paved the way for a potential large-scale project.
Future development should address:
Multilingual characteristics
Writing style of individual fascicules
