
Stuart Jeffrey, Julian Richards, Fabio Ciravegna, Stewart Waller, Sam Chapman, Ziqi Zhang, Tony Austin. STAR/Archaeotools Workshop, York, 9th May 2008.

1 The Archaeotools project: faceted classification and natural language processing in an archaeological context. Stuart Jeffrey, Julian Richards, Fabio Ciravegna, Stewart Waller, Sam Chapman, Ziqi Zhang, Tony Austin. STAR/Archaeotools Workshop, York, 9th May 2008.

2 AHRC-EPSRC-JISC eScience research grants scheme
AIM: To allow archaeologists to discover, share and analyse datasets and legacy publications which have hitherto been very difficult to integrate into existing digital frameworks.
BUILDS UPON: Common Information Environment Enhanced Geospatial browser
PARTNERS: Natural Language Processing Research Group, Department of Computer Science, University of Sheffield; Joint Information Systems Committee

3 Three distinct Workpackages:
–Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).
–Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging
–Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk
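As a rough illustration of what browsing by the four facets in Workpackage 1 could look like, here is a minimal sketch. The sample records, the field names and the helpers `refine` and `facet_counts` are all hypothetical and are not taken from the Archaeotools system.

```python
from collections import Counter

# Hypothetical records; the keys mirror the four facets named on the slide.
RECORDS = [
    {"what": "barrow", "where": "Yorkshire", "when": "Bronze Age", "media": "text"},
    {"what": "villa", "where": "Yorkshire", "when": "Roman", "media": "image"},
    {"what": "barrow", "where": "Wiltshire", "when": "Bronze Age", "media": "text"},
]

def refine(records, **selected):
    """Keep only records whose facet values match every selected facet."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]

def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)

# Narrow by one facet, then see which values remain under another facet.
barrows = refine(RECORDS, what="barrow")
remaining_places = facet_counts(barrows, "where")
```

Each further facet selection simply narrows the previous result, which is what lets a user combine What, Where, When and Media in any order.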

4 Datasets include:
–National Monuments Records (Scotland, Wales, England)
–Excavation Index (EH)
–Archive Holdings
–Local Authority Historic Environment Records
Thesauri include:
–Thesaurus of Monument Types (TMT)
–Thesaurus of Object Types
–MIDAS Period list
–UK Government list of administrative areas: County, District, Parish (CDP) – not MIDAS

5 [Architecture diagram. Labels: Input; Oracle RDBMS; MIDAS XML Record; Information Extraction; RDF Resource; Knowledge triple store; XML Docs of Thesaurus; Information Extraction; When, Where, What ontologies as entries to faceted index; Query; User Interface]

10 –Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).
–Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging
–Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk

16 –Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).
–Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging
–Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk

19 http://ads.ahds.ac.uk/project/archaeotools/

20 “WHAT” (1,001,407 records in total)
–Records that have no subject information: 19,269 records (2%)
–Records that use terms not found in TMT, so these records cannot be indexed (6,442 unique terms): 101,507 records (10.1%)
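The two problem categories on the WHAT slide (no subject information, and terms missing from the thesaurus) can be sketched as a simple lookup against a controlled vocabulary. This is only an illustration: the tiny `THESAURUS` set, the sample records and the `classify` helper are hypothetical stand-ins, not the TMT or the project's own indexing code.

```python
from collections import Counter

# Hypothetical, tiny stand-in for a controlled vocabulary such as the TMT.
THESAURUS = {"barrow", "hillfort", "villa"}

def classify(record_terms, thesaurus):
    """Return 'no_information', 'indexed' or 'unresolved' for one record."""
    # Normalise case and whitespace, and drop empty entries.
    terms = [t.strip().lower() for t in record_terms if t and t.strip()]
    if not terms:
        return "no_information"   # the record carries no subject terms at all
    if any(t in thesaurus for t in terms):
        return "indexed"          # at least one term resolves to the thesaurus
    return "unresolved"           # terms exist, but none is in the thesaurus

# Hypothetical subject-term lists, one per record.
records = [["Barrow"], [], ["round howe"], ["VILLA", "bath house"], [" "]]
summary = Counter(classify(r, THESAURUS) for r in records)
```

Treating a record as indexed when any one of its terms resolves is a design choice made for this sketch; a stricter rule would require every term to resolve.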

21 “WHEN” (1,001,407 records in total)
–Records that have no temporal information: 292,793 records (29.2%)
–Records that use period terms not found in MIDAS, so these records cannot be indexed (457 types of irresolvable dates): 114,505 records (11.4%)
–Example date forms: 1066, 1001-1100, 11th Centuary, C11, 11C, Eleventh Century
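The date forms quoted on the WHEN slide all seem to point at the same century. As an illustration only (the slides do not describe how the project handled such values), a small normaliser might map each variant to a century number. The function names and the patterns below are assumptions made for this sketch, and real period data would need far more cases.

```python
import re

# Ordinal words for centuries 1 to 20 (a deliberately limited, hypothetical list).
_ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
             "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
             "fourteenth", "fifteenth", "sixteenth", "seventeenth",
             "eighteenth", "nineteenth", "twentieth"]
ORDINAL_WORDS = {word: i + 1 for i, word in enumerate(_ORDINALS)}

def century_of_year(year):
    """AD year -> century number, e.g. 1066 -> 11 and 1100 -> 11."""
    return (year - 1) // 100 + 1

def to_century(text):
    """Return the century a date string refers to, or None if unresolved."""
    s = text.strip().lower()
    # Year range such as "1001-1100": accept only if both ends share a century.
    m = re.fullmatch(r"(\d{3,4})\s*-\s*(\d{3,4})", s)
    if m:
        start = century_of_year(int(m.group(1)))
        end = century_of_year(int(m.group(2)))
        return start if start == end else None
    # Single year such as "1066".
    if re.fullmatch(r"\d{3,4}", s):
        return century_of_year(int(s))
    # Abbreviations such as "C11" or "11C".
    m = re.fullmatch(r"c(\d{1,2})", s) or re.fullmatch(r"(\d{1,2})c", s)
    if m:
        return int(m.group(1))
    # "11th century", tolerating spelling variants such as "Centuary".
    m = re.fullmatch(r"(\d{1,2})\s*(?:st|nd|rd|th)\s+cent\w*", s)
    if m:
        return int(m.group(1))
    # "Eleventh Century".
    m = re.fullmatch(r"([a-z]+)\s+cent\w*", s)
    if m:
        return ORDINAL_WORDS.get(m.group(1))
    return None

variants = ["1066", "1001-1100", "11th Centuary", "C11", "11C", "Eleventh Century"]
resolved = [to_century(v) for v in variants]
```

Anything the patterns do not recognise comes back as `None`, which corresponds to the slide's notion of a date that cannot be indexed.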

22 “WHERE” (1,001,407 records in total)
–Records that have no spatial information: 11,126 records (1.1%)
–Records that use terms not found in CDP, so these records cannot be indexed: 245,601 records (24.5%)
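The percentages on the WHAT, WHEN and WHERE slides can be checked against the stated total of 1,001,407 records. The counts below are copied from the slides in the order they appear; the dictionary layout and the `pct` helper are just a convenience for this sketch.

```python
TOTAL = 1_001_407  # total number of records, as stated on each slide

# Record counts per facet, in the order the slides give them.
counts = {
    "WHAT": (19_269, 101_507),
    "WHEN": (292_793, 114_505),
    "WHERE": (11_126, 245_601),
}

def pct(n, total=TOTAL):
    """Share of the total as a percentage, rounded to one decimal place."""
    return round(100 * n / total, 1)

shares = {facet: tuple(pct(n) for n in pair) for facet, pair in counts.items()}
```

The computed shares agree with the slides to one decimal place, except that the first WHAT figure comes out at 1.9%, which the slide rounds to 2%.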

