A Semantic Knowledge Base for the UK Government Web Archive
Tom Storrar & Claire Newing
Applying records management processes and principles to the open government record

Overview
The National Archives’ Digital Strategy: an overview of the Semantic Knowledge Base (SKB) project, including:
1. The Problem
2. The Solution
3. Next Steps

Introducing the UK Government Web Archive
More than 18,000 crawls of over 3,000 websites
Approximately 90 TB of data and 3.5 billion resources
More than 875,000 ARC files
More than 20 million page views and 2-3 million visits per month
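These headline figures also give a rough sense of the archive's granularity. The following is a back-of-the-envelope check only. It assumes decimal units (1 TB = 10^12 bytes) and treats the "approximately" and "more than" figures as if they were exact:

```latex
\begin{aligned}
\frac{90 \times 10^{12}\ \text{bytes}}{3.5 \times 10^{9}\ \text{resources}} &\approx 2.6 \times 10^{4}\ \text{bytes} \approx 26\ \text{kB per resource},\\
\frac{3.5 \times 10^{9}\ \text{resources}}{8.75 \times 10^{5}\ \text{ARC files}} &= 4{,}000\ \text{resources per ARC file},\\
\frac{90 \times 10^{12}\ \text{bytes}}{8.75 \times 10^{5}\ \text{ARC files}} &\approx 1.03 \times 10^{8}\ \text{bytes} \approx 103\ \text{MB per ARC file}.
\end{aligned}
```

Because there are more than 875,000 ARC files, the true per-file averages are somewhat lower than these estimates.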


Who are our users and what do they want?
User surveys on the website (all banners and index pages) established that the UKGWA is regularly visited by a great variety of users.
The biggest area of dissatisfaction was the existing search functionality.
We constructed user stories so we could test the improvements.

Full-text search: its limitations
Our full-text search is very useful and heavily used, but it has several limitations. It is limited by how the live sites were at crawl time. It is noisy, because it contains much duplicate or near-duplicate material. It relies on keyword matching. It is most useful when combined with specialist knowledge.
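The "noisy" limitation comes from near-duplicate captures. A common way to detect them is w-shingling with Jaccard similarity. This is shown purely as an illustration of the technique, not as something the UKGWA search is known to do. The function names, the choice of 3-word shingles and the 0.8 threshold are all assumptions for this sketch:

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles (contiguous word sequences), case-folded."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.8,
                    k: int = 3) -> list[tuple[str, str, float]]:
    """All pairs of documents whose shingle sets have Jaccard >= threshold.

    Compares every pair (O(n^2)), which is fine for a sketch; at archive
    scale an approximation such as MinHash with LSH would be needed.
    """
    sigs = {key: shingles(text, k) for key, text in docs.items()}
    keys = sorted(sigs)
    pairs = []
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            score = jaccard(sigs[a], sigs[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical captures: two near-identical pages and an error page.
    captures = {
        "a": "the minister announced new funding for schools in england",
        "b": "the minister announced new funding for schools in wales",
        "c": "page not found",
    }
    print(near_duplicates(captures, threshold=0.7))
```

The two nine-word captures share 6 of their 8 distinct trigrams, so their Jaccard score is 0.75. They are flagged at a threshold of 0.7 but not at the default of 0.8. Where to set the threshold is a tuning decision.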

Semantic search: what it allows
The aim was to improve access to information in the UKGWA by providing far richer information about what it contains.
The semantic web is a start to tackling a limitation of the web.
The archive becomes a dataset in its own right, one that borrows from and contributes to the web.
The technology is open and machine-readable: APIs allow the data to be easily queried and integrated with other services.
The project was awarded to a consortium led by Ontotext AD, the University of Sheffield and System Simulation.
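Semantic web data is usually expressed as subject-predicate-object triples, and a query is a pattern in which some positions are left open. That is what makes the data machine-readable. The toy in-memory sketch below illustrates the idea; it is not the SKB's actual store. The `ex:` identifiers, predicate names and `match` helper are hypothetical:

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def match(store, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return every triple matching the pattern.

    None acts as a wildcard, much like an unbound variable in a SPARQL
    triple pattern.
    """
    return [
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Hypothetical identifiers for illustration only.
STORE = [
    Triple("ex:page1", "ex:title", "Foot and mouth disease: latest situation"),
    Triple("ex:page1", "ex:mentions", "ex:Defra"),
    Triple("ex:page2", "ex:title", "Statement by the Prime Minister"),
    Triple("ex:page2", "ex:mentions", "ex:PrimeMinister"),
]

if __name__ == "__main__":
    # "Which pages mention Defra?"
    print([t.subject for t in match(STORE, p="ex:mentions", o="ex:Defra")])
```

A real triple store indexes these patterns rather than scanning a list. However, the query model it exposes is the same: fix some positions and ask for everything that fits.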

UKGWA: a good candidate for semantic search?
Each resource already has a persistent HTTP URI.
UKGWA is both limited in scope and diverse.
Generic and domain-specific meanings can be attributed to otherwise loose terms.
Facts can be modelled and refined to show the linkages between entities and how they change over time.
The 2010 general election was an opportunity to demonstrate the concept.
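Modelling how facts "change over time" usually means attaching a validity interval to each fact. The sketch below assumes a simple start/end-date representation, which is not necessarily how the SKB models time. It uses the change of Prime Minister after the 2010 general election as its example. The `ex:` identifiers and the `holds_at` helper are hypothetical; the dates are the real terms of office:

```python
from datetime import date
from typing import NamedTuple, Optional


class TemporalFact(NamedTuple):
    """A subject-predicate-object fact valid over [start, end)."""
    subject: str
    predicate: str
    obj: str
    start: date
    end: Optional[date] = None  # None means the fact is still valid


def holds_at(facts, predicate: str, obj: str, when: date) -> list[str]:
    """Return the subjects for which (subject, predicate, obj) held on `when`."""
    return [
        f.subject
        for f in facts
        if f.predicate == predicate
        and f.obj == obj
        and f.start <= when
        and (f.end is None or when < f.end)
    ]


# Hypothetical identifiers (ex:...); the dates are the actual terms of office.
FACTS = [
    TemporalFact("ex:GordonBrown", "ex:holdsOffice", "ex:PrimeMinister",
                 date(2007, 6, 27), date(2010, 5, 11)),
    TemporalFact("ex:DavidCameron", "ex:holdsOffice", "ex:PrimeMinister",
                 date(2010, 5, 11), date(2016, 7, 13)),
]

if __name__ == "__main__":
    # Polling day of the 2010 general election.
    print(holds_at(FACTS, "ex:holdsOffice", "ex:PrimeMinister", date(2010, 5, 6)))
```

The half-open interval is a deliberate choice. On the handover day, 11 May 2010, exactly one office-holder is returned instead of two.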

Making UKGWA semantic: how?
[Image: Ontotext AD, University of Sheffield and System Simulation]

What we learned and next steps
We will deliver it as an internal system to develop further.
It’s not AI! 60-70% annotation accuracy is not bad at this scale.
The concept can be difficult to explain, and even harder for those unfamiliar with computer science to use (SPARQL etc.). For example, this query finds pages whose titles mention both "Foot and Mouth" and "Prime Minister" (the URIs were lost from the original slide and are shown as <…>):

    prefix skb: <…>
    prefix xsd: <…>
    select distinct ?URL ?title where {
      ?page <…> ?doc_feature .
      ?doc_feature <…> ?URL .
      ?doc_feature <…> "WEBARCHIVEURL" .
      ?page <…> ?title .
      FILTER regex(str(?title), "Foot and Mouth", "i") .
      FILTER regex(str(?title), "Prime Minister", "i") .
      ?page <…>
    }

So, integrating the system with other services is a must.
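Integration with other services could plausibly go through the standard SPARQL 1.1 Protocol, so that end users never type queries like the one above. This assumes the knowledge base exposes a standard SPARQL endpoint. The minimal standard-library sketch below makes that assumption. The endpoint URL is hypothetical (the SKB's endpoint is not given here), and the helper names are illustrative:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol: a query sent by GET goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one {variable: value} dict per row.

    Variables left unbound in a row are simply absent from that row's dict.
    """
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query and request JSON results (requires network access)."""
    req = Request(build_query_url(endpoint, query),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))


# A canned response in the SPARQL JSON results format, so the parser can be
# exercised without a live endpoint; the URL and title are made up.
SAMPLE = json.dumps({
    "head": {"vars": ["URL", "title"]},
    "results": {"bindings": [
        {"URL": {"type": "uri", "value": "https://example.org/page"},
         "title": {"type": "literal", "value": "Prime Minister on Foot and Mouth"}},
    ]},
})

if __name__ == "__main__":
    print(parse_bindings(SAMPLE))
```

A web front end or another archive service could call something like `run_query("https://example.org/sparql", query)` and work with plain dictionaries. The SPARQL stays behind the interface.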

Any questions?
Contact us. Visit: nationalarchives.gov.uk/webarchive