Published by Megan Donna York. Modified over 9 years ago.
1
http://www.sekt-project.com
The BT Digital Library
A case study in intelligent content management
Paul Warren
paul.w.warren@bt.com
2
- Semantics in content management
- limitations of conventional technology
- the users’ view
- using the technology
- enhancing the experience
- the starting point
3
Semantics in content management
Intelligent content management
4
The need for semantics
Content management systems need to:
- index by meaning, not just text
- combine information from heterogeneous sources
Users need information:
- identified by semantics, not just keywords
- precise and complete
- selected by their interests and their task context, defined semantically
- from heterogeneous sources, accessed uniformly
5
Higher precision, greater recall
Precision
- Find me information about Washington the man, not the state or city
- Find me information about a company called X which operates in industry Y
Recall
- Finding all relevant documents
- E.g. ask for information about ‘George W Bush’ and be given documents on ‘the President’
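A minimal sketch of the recall example above, assuming documents are matched against a hand-built alias table. The entity identifier, the alias table, and the sample documents are all hypothetical and are not taken from the BT Digital Library or SEKT; a real system would draw them from an ontology and from automatic annotation.

```python
# Hypothetical alias table: one entity identifier mapped to the surface
# forms that may refer to it. Illustrative only, not from the BT library.
ALIASES = {
    "person:george_w_bush": {"george w bush", "the president"},
}

# Hypothetical sample documents keyed by document id.
DOCS = {
    "d1": "George W Bush signed the bill on Tuesday.",
    "d2": "The President met industry leaders in Texas.",
    "d3": "Washington state introduced new road tolls.",
}


def keyword_search(query):
    """Return ids of documents whose text contains the literal query string."""
    needle = query.lower()
    return {doc_id for doc_id, text in DOCS.items() if needle in text.lower()}


def concept_search(entity_id):
    """Return ids of documents that mention any known alias of the entity."""
    aliases = ALIASES[entity_id]
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if any(alias in text.lower() for alias in aliases)
    }


print(sorted(keyword_search("George W Bush")))
print(sorted(concept_search("person:george_w_bush")))
```

With this sample data the keyword search finds only d1, while the concept search also finds d2. Alias matching on its own raises recall only; telling one ‘Washington’ or one ‘President’ from another still needs disambiguation against context.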
6
Interests and context
Need information about Jaguar?
- interested in cars, the natural world, South America …
- with a context defined by current activities
Not just about searching
- interest & context to share information …
- … and to push information to user
- … plus many integrated applications
7
Too much relevant information
Documents with duplicate information.
Goal to:
- extract what is unique from each document
- help users prioritise their reading
Need to:
- aggregate from disparate sources
- remove duplication
- present meaningfully
  - classified
  - summarised
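One way the "remove duplication" step could be approximated is sketched below. It uses Jaccard similarity over word shingles, a common near-duplicate technique chosen here for illustration; the slides do not say which method was used. The function names, the shingle size, and the 0.8 threshold are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(docs, threshold=0.8):
    """Keep each document unless it is too similar to one already kept.

    The threshold of 0.8 is an arbitrary illustrative value.
    """
    kept = []
    kept_shingles = []
    for doc in docs:
        current = shingles(doc)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept


sample = [
    "BT launches a new digital library service",
    "BT launches a new digital library service",
    "Semantic search improves precision and recall",
]
print(drop_near_duplicates(sample))
```

This compares every document with every kept document, which is fine for a small result list but would need an index (for example MinHash) at the scale of millions of articles.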
8
The starting point
The BT digital library before SEKT
9
The BT digital library
- Two major document databases
- 5 million articles – abstracts plus some full text
- Originally text-based with some attribute-based querying: e.g. author, date
- information spaces defined by queries
10
An information space
Query-defined alerts
- Emailed weekly as database updated
Public info spaces
- anyone can subscribe
- forming communities
Private info spaces
- defined by user
11
Personalisation
Personalised entry page shows user’s info spaces, journals of interest, recent reading and ‘jottings’ (bookmarks)
12
Limitations of conventional technology
Why we need semantics
13
Queries
Text string ‘knowledge management’
- 4161 ABI + 5029 Inspec records
Descriptor ‘knowledge management’
- 3213 ABI + 2783 Inspec
So careful query formulation needed …
- … but average query length is 1.8 words
Little use of ‘advanced’ functions …
- … 80% queries use no query modifier
14
Poor relevancy of results
A simple keyword search tends to offer high recall and low precision.
Ambiguity in the query, e.g.
- synonymy, where several terms could describe the same concept
- homonymy, where a word has many different meanings
[Figure: overlapping sets of relevant documents and retrieved documents]
- |A| relevant documents retrieved
- |B| non-relevant documents retrieved
- |C| non-relevant documents (not retrieved)
- |D| relevant documents (not retrieved)
Recall = |A| / (|A| + |D|) (proportion of relevant documents retrieved)
Precision = |A| / (|A| + |B|) (proportion of retrieved documents that are relevant)
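The two formulas above can be checked with a short sketch. The function name and the sample document ids are illustrative assumptions; the variables a, b and d correspond to |A|, |B| and |D| on the slide.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from sets of document ids.

    a = relevant documents retrieved        (|A|)
    b = non-relevant documents retrieved    (|B|)
    d = relevant documents not retrieved    (|D|)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    a = len(retrieved & relevant)
    b = len(retrieved - relevant)
    d = len(relevant - retrieved)
    # Guard against empty sets to avoid division by zero.
    precision = a / (a + b) if (a + b) else 0.0
    recall = a / (a + d) if (a + d) else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 6 documents actually relevant,
# 2 of the retrieved documents are among the relevant ones.
retrieved_ids = {1, 2, 3, 4}
relevant_ids = {3, 4, 5, 6, 7, 8}
print(precision_recall(retrieved_ids, relevant_ids))
```

In this example half of the retrieved documents are relevant (precision 0.5) and one third of the relevant documents are retrieved (recall about 0.33).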
15
Presenting results
Searches
- Only 17% results read after 1st page …
- … no more than 10 results checked
- Same query, same results regardless of user’s preference & context
Document descriptors
- Lots – many irrelevant to readership
- Where relevant, not fine-grained, e.g. knowledge management
16
Enhancing the experience
What semantics can offer a digital library
17
A new experience
Hybrid searching
- concepts, instances, information spaces, and text
- search results meaningfully classified
Automatic annotation
- identifying companies, people, …
- hyperlinked to a knowledgebase
Topics – finer grained than document descriptors
- semi-automatically generated
- automatic document classification
An extended corpus
- crawling the Web for related pages
- Web pages added to share knowledge
18
A better experience
Semantics to improve precision & recall
- Washington the man, not city or state
- references to the President not just George W Bush
Information spaces defined on semantic queries
- not just text queries
Taking account of interests and context
- semantically defined
Natural language results
19
The users’ view
What users want
20
Initial questionnaire & focus group
Users want:
- Improved searching and indexing
  - based on a user’s profile
  - integrated into working environment
- To stay in control
  - advise but not decide
  - frustrated by too many email alerts
21
Features – what the users think
very important / important
- summarising results of search with personal interests and preferences
- advanced attribute-based search
- looking beyond the library
- suggesting candidate topic areas
- highlighting & hyperlinking named entities
- natural language queries
22
After that …
Important / minor importance
- retrieving similar articles
- re-using old queries
- agent searches
- access from a range of devices
23
Using the technology
Applying semantics to the BT Digital Library
24
Search: knowledge management
knowledge management as:
- info space
- topic
- term
With clustered results
25
A complex query
- microsoft: 2 companies, term
- semantic web: info space, topic, term
- sem web info space
- Microsoft-authored
- Microsoft as term
26
Querying a concept
alloy: a term, but also a concept in ontology …
- … with properties
- … definition
- … sub-concepts
27
Document with markup
Identified: Bhargava, Waterbury, Connecticut, USA, IEE
Click for related documents, e.g. by Bhargava
28
Categorising results …
29
... and more categories
30
In summary
Semantic technology
- provides intelligence in content management
- enhances the user experience
- satisfies proven user needs