Improving Navigation and Findability
Tom Reamy, Chief Knowledge Architect
KAPS Group: Knowledge Architecture Professional Services
Agenda
Introduction
Semantics, Taxonomy, and Faceted Navigation: Key Ideas
Review of Media Sites
  – Key Elements
  – Common Themes
  – What Works and What Doesn't
Development Guide: Semantics and Faceted Navigation
Conclusion
KAPS Group: General
Knowledge Architecture Professional Services
Virtual company: network of consultants
  – Partners: Business Objects SA, Endeca, Interwoven, FAST, etc.
Consulting, strategy, knowledge architecture audit
Taxonomies: Enterprise, Marketing, Insurance, etc.
Services:
  – Taxonomy development, consulting, customization
  – Technology consulting: Search, CMS, Portals, etc.
  – Metadata standards and implementation
  – Knowledge Management: collaboration, expertise, e-learning
  – Applied theory: faceted taxonomies, complexity theory, natural categories
Semantics and Facets: Key Ideas
The real key: all of the below, working together
Facet: an orthogonal dimension of metadata
Taxonomy: subject matter / aboutness
Ontology: relationships / facts
  – Subject – Verb – Object
Software: text analytics, auto-categorization
People: tagging, evaluating tags, fine-tuning rules and taxonomy, social tagging, suggestions
Enterprise Search Summit Sourcebook: "A Knowledge Architecture Approach to Search"
Essentials of Facets
Facets are not categories
  – Categories are what a document is about; there is a limited number of them
  – Facets are types of metadata attributes
Facets are orthogonal: mutually exclusive dimensions
  – An event is not a person is not a document is not a place
Facets vary in units and in structure
  – Numerical range (price); location (big to small)
  – Alphabetical; hierarchical (taxonomic)
Facets are designed to be used in combination
  – Wine where color = red, price = excessive, location = California, and sentiment = snotty
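The wine example can be sketched as a filter in which every selected facet must hold at once (a logical AND across orthogonal dimensions). This is a minimal illustration, not any product's query engine: the catalog entries are hypothetical, "excessive" is assumed to mean a price range of 100 to 1000, and location is modeled as a broad-to-narrow tuple so that selecting a region matches everything beneath it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Wine:
    name: str
    color: str       # enumerated facet
    price: float     # numeric-range facet
    location: tuple  # hierarchical facet, ordered broad to narrow
    sentiment: str   # enumerated facet

def matches(wine, color=None, price_range=None, location_prefix=None, sentiment=None):
    """True if the wine satisfies every facet constraint that is set (logical AND)."""
    if color is not None and wine.color != color:
        return False
    if price_range is not None:
        low, high = price_range
        if not (low <= wine.price <= high):
            return False
    # A location selection matches the chosen node and everything below it.
    if location_prefix is not None and wine.location[:len(location_prefix)] != tuple(location_prefix):
        return False
    if sentiment is not None and wine.sentiment != sentiment:
        return False
    return True

# Hypothetical catalog used only for illustration.
CATALOG = [
    Wine("Hypothetical Cabernet", "red", 250.0, ("US", "California", "Napa"), "snotty"),
    Wine("Hypothetical Pinot", "red", 18.0, ("US", "Oregon", "Willamette"), "friendly"),
    Wine("Hypothetical Chardonnay", "white", 300.0, ("US", "California", "Sonoma"), "snotty"),
]

def facet_filter(catalog, **constraints):
    return [w.name for w in catalog if matches(w, **constraints)]

# "Excessive" is assumed here to mean 100 to 1000.
print(facet_filter(CATALOG, color="red", price_range=(100, 1000),
                   location_prefix=("US", "California"), sentiment="snotty"))
# ['Hypothetical Cabernet']
```

Each facet alone is weak (two wines are red, two are from California), but their intersection narrows the result to one item.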
Advantages of Faceted Navigation
More intuitive: easy to guess what is behind each door
Simplicity of internal organization
20 questions: a game we already know and use
Dynamic selection of categories
Allows multiple perspectives
Systematic advantages
  – Fewer elements: 4 facets of 10 nodes each (40 nodes) combine into 10 × 10 × 10 × 10 = 10,000 categories, the reach of a 10,000-node taxonomy
  – Ability to handle compound subjects
Flexible: can be combined with other navigation elements
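The "fewer elements" arithmetic can be checked directly. This sketch assumes four hypothetical facets with 10 values each, as on the slide, and compares the number of values editors maintain with the number of compound categories users can reach.

```python
from itertools import product
from math import prod

# Hypothetical facet names; 4 facets with 10 values each, as on the slide.
facet_sizes = {"topic": 10, "region": 10, "industry": 10, "doc_type": 10}

nodes_to_maintain = sum(facet_sizes.values())  # values shown to users and maintained by editors
combinations = prod(facet_sizes.values())      # distinct compound categories reachable

# Cross-check by enumerating every combination of one value per facet.
enumerated = sum(1 for _ in product(*(range(n) for n in facet_sizes.values())))

print(nodes_to_maintain, combinations, enumerated)  # 40 10000 10000
```

If a user may also leave any facet unselected, each facet contributes 11 states instead of 10, so the reachable states grow to 11^4 = 14,641 while the maintained values stay at 40.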
Essentials of Taxonomies
Formal taxonomy: parent – child relationship
  – Is-A-Kind-Of: Animal – Mammal – Zebra
  – Partonomy, Is-A-Part-Of: US – California – Oakland
Browse classification: cluster of related concepts
  – Food and Dining – Catering – Restaurants
Taxonomies deal with semantics and documents
  – Multiple meanings and purposes
  – Essential attributes of documents are not single-valued
Taxonomies combined with facets
  – Support an essential way of thinking
  – Can get value with smaller taxonomies
  – Formal taxonomies tend to work better
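A formal taxonomy can be stored as child-to-parent links, which is enough to let a selection of a broad node match documents tagged with narrower ones. The sketch below is a minimal illustration assuming acyclic, single-parent hierarchies; the dictionaries and the `docs` mapping are hypothetical.

```python
# Hypothetical parent maps (child -> parent), one per relation type from the slide.
IS_A_KIND_OF = {"Mammal": "Animal", "Zebra": "Mammal"}
IS_A_PART_OF = {"California": "US", "Oakland": "California"}

def ancestors(node, parent_of):
    """Walk child -> parent links up to the root, nearest ancestor first.
    Assumes the hierarchy has no cycles."""
    path = []
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path

def is_under(node, broader, parent_of):
    """True if `broader` is `node` itself or one of its ancestors."""
    return node == broader or broader in ancestors(node, parent_of)

print(ancestors("Zebra", IS_A_KIND_OF))    # ['Mammal', 'Animal']
print(ancestors("Oakland", IS_A_PART_OF))  # ['California', 'US']

# Hypothetical documents tagged with a place; selecting "California" includes Oakland.
docs = {"doc1": "Oakland", "doc2": "US", "doc3": "California"}
in_california = sorted(d for d, place in docs.items()
                       if is_under(place, "California", IS_A_PART_OF))
print(in_california)  # ['doc1', 'doc3']
```

Keeping the two relation types in separate maps reflects the slide's distinction: a zebra is a kind of mammal, while Oakland is a part of California, and the two should not be mixed in one hierarchy.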
Essentials of Ontologies
Facts: Subject – Verb – Object
  – Fred isa Vice-President
Relationships
  – Vice-Presidents have employees and bosses
Implications
  – Vice-Presidents make more than managers
Knowledge representation: XML, RDF / OWL, inference rules
Knowledge-based reasoning applications
Technology in search of a business model
  – Knowledge is really hard
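The facts, relationships, and implications on this slide can be illustrated with a toy triple store and one round of forward chaining. This is a plain-Python sketch, not RDF, OWL, or any reasoner's API; every name except Fred, and the relation names `manages`, `has_boss`, and `earns_more_than`, are hypothetical.

```python
# Toy triple store of (subject, verb, object) facts. Names other than Fred are hypothetical.
facts = {
    ("Fred", "isa", "Vice-President"),
    ("Ann", "isa", "Manager"),
    ("Bob", "isa", "Manager"),
    ("Fred", "manages", "Ann"),
}

def members(cls, triples):
    """All subjects asserted to be an instance of `cls`."""
    return {s for (s, v, o) in triples if v == "isa" and o == cls}

def apply_rules(triples):
    """One round of forward chaining with two hand-written (hypothetical) rules."""
    inferred = set()
    # Implication from the slide: Vice-Presidents make more than managers.
    for vp in members("Vice-President", triples):
        for mgr in members("Manager", triples):
            inferred.add((vp, "earns_more_than", mgr))
    # Relationship from the slide: whoever manages you is your boss.
    for (s, v, o) in triples:
        if v == "manages":
            inferred.add((o, "has_boss", s))
    return triples | inferred

kb = apply_rules(facts)
print(sorted(t for t in kb if t[1] == "earns_more_than"))
# [('Fred', 'earns_more_than', 'Ann'), ('Fred', 'earns_more_than', 'Bob')]
```

Nothing in `facts` says Fred earns more than Bob; that triple exists only because a rule derived it, which is the step that separates an ontology from a flat list of tags.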
Dynamic Classification / Faceted Navigation
Search and browse together are better than either alone
  – Categorized search: results shown in context
  – Browse as an advanced search
Dynamic search and browse is best
  – Can't predict all the ways people think: Panda, Monkey, Banana
  – Can't predict all the questions and activities: China and Biotech; Economics and Regulatory
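Dynamic search and browse can be sketched as: run a search, count facet values within the hits only, and let the user refine by any combination of those values. The corpus, the naive title-only keyword match, and the facet names below are hypothetical stand-ins for a real search engine.

```python
from collections import Counter

# Hypothetical mini-corpus; each document carries a title plus facet metadata.
DOCS = [
    {"id": 1, "title": "Biotech startups in Shanghai", "region": "China", "industry": "Biotech", "topic": "Economics"},
    {"id": 2, "title": "New rules for biotech trials", "region": "China", "industry": "Biotech", "topic": "Regulatory"},
    {"id": 3, "title": "Biotech funding in Boston", "region": "US", "industry": "Biotech", "topic": "Economics"},
    {"id": 4, "title": "Steel exports from China", "region": "China", "industry": "Steel", "topic": "Economics"},
]

def search(query, docs):
    """Naive keyword search: every query word must appear in the title."""
    words = query.lower().split()
    return [d for d in docs if all(w in d["title"].lower() for w in words)]

def facet_counts(hits, facets):
    """Count facet values within the current result set only, not the whole corpus."""
    return {f: Counter(d[f] for d in hits) for f in facets}

def refine(hits, **selected):
    """Keep hits matching every selected facet value."""
    return [d for d in hits if all(d[f] == v for f, v in selected.items())]

hits = search("biotech", DOCS)
counts = facet_counts(hits, ["region", "topic"])
print(counts["region"])  # Counter({'China': 2, 'US': 1})
print([d["id"] for d in refine(hits, region="China", topic="Regulatory")])  # [2]
```

Because counts are computed from the hits rather than fixed in advance, the refinements offered adapt to whatever question the user actually asked.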
Sample eCommerce Sites
Pure facets
  – Product catalogs
  – Library catalogs
Traditional search
Search and categories
Facets, taxonomies, and semantics
Three Environments: E-Commerce [screenshots of sample sites]
eCommerce Common Themes
Balance of commerce and information
Source and type are the basics
Standard facets: people, companies, place, industry
Interactive interface: sliders, date ranges
Taxonomy: just another facet?
  – Keywords vs. simple taxonomy
Semantics is still the hardest: summaries, related items, ranking
Tag clouds / clusters: how useful?
eCommerce: Issues
Balance of information and ads
  – Advertiser dominance: no
  – Auto-ads: Obituary for Obama
1 or 2 filters (source / type): no
  – The intersection of facets is the source of power
Facets not orthogonal: topics and issues
Good information architecture
  – Space wars: summary or full facet display
  – Simplicity vs. research power
Integrated design
  – Complex, not complicated
Integrated Design: Facets and Semantics
Design Issues: General
What is the right combination of elements?
  – Faceted navigation, metadata, browse, search, categorized search results, file plan
What is the right balance of elements?
  – Dominant dimension or equal facets
  – Browse topics and filter by facet
When to combine search, topics, and facets?
  – Search first and then filter by topic / facet
  – Browse / facet front end with a search box
Semantics and Facets: Development
Elements: more metadata!
Text analytics software
  – Entity / noun phrase extraction: supplies metadata values for facets and feeds signatures and ontologies
  – Taxonomy and categorization rules
Auto-categorization: feeds subject facets
Variation between eCommerce and enterprise
  – When and how to add metadata and additional facets
  – Content management: a hybrid of taggers, software, and policy
  – Software offers suggested categorization and facet values
  – Relevance: from best bets to ontology-based relevance
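The hybrid of taggers, software, and policy can be sketched as a routing rule over software-suggested facet values: confident suggestions are applied automatically, borderline ones go to a human tagger, and weak ones are dropped. The `Suggestion` structure, the confidence scores, and the 0.85 / 0.5 thresholds are all hypothetical; in practice the thresholds are a policy decision tuned per facet.

```python
from dataclasses import dataclass, field

@dataclass
class Suggestion:
    facet: str
    value: str
    confidence: float  # 0.0 to 1.0 score from a hypothetical categorizer

@dataclass
class TaggingDecision:
    applied: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)
    discarded: list = field(default_factory=list)

def route(suggestions, auto_apply=0.85, review=0.5):
    """Hypothetical policy: auto-apply confident suggestions, queue borderline ones for a human."""
    decision = TaggingDecision()
    for s in suggestions:
        if s.confidence >= auto_apply:
            decision.applied.append(s)
        elif s.confidence >= review:
            decision.review_queue.append(s)
        else:
            decision.discarded.append(s)
    return decision

d = route([
    Suggestion("company", "Endeca", 0.95),
    Suggestion("industry", "Software", 0.62),
    Suggestion("place", "Oakland", 0.20),
])
print([s.value for s in d.applied], [s.value for s in d.review_queue], [s.value for s in d.discarded])
# ['Endeca'] ['Software'] ['Oakland']
```

Moving the two thresholds shifts work between software and people: raising `auto_apply` sends more items to human review, lowering `review` discards fewer suggestions.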
Semantics and Facets: Development
Software Tools: Auto-categorization
Auto-categorization
  – Training sets: Bayesian, Support Vector Machine
  – Terms: literal strings, stemming, dictionary of related terms
  – Rules, simple: position in text (title, body, URL)
  – Rules, advanced: saved search queries (full search syntax) with NEAR, SENTENCE, PARAGRAPH
  – Boolean: X NEAR Y AND NOT Z
Advanced features
  – Facts / ontologies / Semantic Web: RDF+
  – Sentiment analysis: positive, negative, neutral
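Two of the rule types above can be sketched in a few lines: a Boolean rule of the form X NEAR Y AND NOT Z, and a positional rule that weights a term in the title more than one in the body. This is an illustration only; real categorization engines have their own rule syntax, and the tokenizer, the 5-token window, the 3:1 title weighting, and the sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(tokens, x, y, window=5):
    """True if terms x and y occur within `window` tokens of each other."""
    xs = [i for i, t in enumerate(tokens) if t == x]
    ys = [i for i, t in enumerate(tokens) if t == y]
    return any(abs(i - j) <= window for i in xs for j in ys)

def rule_matches(doc, x, y, z, window=5):
    """Hypothetical rule X NEAR Y AND NOT Z, checked over title plus body."""
    tokens = tokenize(doc["title"] + " " + doc["body"])
    return near(tokens, x, y, window) and z not in tokens

def score(doc, term, title_weight=3.0, body_weight=1.0):
    """Positional rule: an occurrence in the title counts more than one in the body."""
    return (title_weight * tokenize(doc["title"]).count(term)
            + body_weight * tokenize(doc["body"]).count(term))

doc = {"title": "Gene therapy approval",
       "body": "Regulators approved the gene therapy after a long review. Pricing remains unclear."}
doc2 = {"title": "Veterinary gene therapy", "body": "A gene therapy for dogs."}

print(rule_matches(doc, "gene", "therapy", "veterinary"))   # True
print(rule_matches(doc2, "gene", "therapy", "veterinary"))  # False
print(score(doc, "gene"))                                   # 4.0
```

The NOT clause is what keeps the second document out of a human-medicine category even though its NEAR condition holds, which is why such rules need people to keep refining them.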
Semantics and Facets: Development
Software Tools: Entity Extraction
Dictionaries: variety of entities, coverage, specialty
  – Cost of update: service or in-house
  – Inxight: 50+ predefined entity types
  – Nstein: 800,000 people, 700,000 locations, 400,000 organizations
Rules
  – Capitalization, text cues (Mr., Inc.)
  – Advanced: proximity and frequency of actions, associations
  – Need people to continually refine the rules
Entities and categorization
  – The total number and pattern of entities is a type of aboutness of the document
  – Bar code, fingerprint
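The simple rules (capitalization plus cues such as "Mr." and "Inc.") and the idea of an entity fingerprint can be sketched with regular expressions and a counter. This is a toy: the patterns and the three-entry place list are hypothetical stand-ins for a vendor dictionary, and they will miss or over-capture names in ways that show why people must keep refining the rules.

```python
import re
from collections import Counter

# Hypothetical, tiny gazetteer standing in for a vendor place dictionary.
PLACES = {"Oakland", "California", "China"}

# Person: an honorific cue followed by one or more capitalized words.
PERSON = re.compile(r"\b(?:Mr|Ms|Dr)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")
# Organization: capitalized words followed by a corporate suffix cue.
ORG = re.compile(r"\b((?:[A-Z][A-Za-z]*\s+)+)(?:Inc|Corp|Ltd)\.")
# Capitalized words, checked against the dictionary.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")

def extract_entities(text):
    """Return (entity_type, text) pairs found by the cue and dictionary rules."""
    ents = [("person", m.group(1)) for m in PERSON.finditer(text)]
    ents += [("organization", m.group(1).strip()) for m in ORG.finditer(text)]
    ents += [("place", w) for w in CAPITALIZED.findall(text) if w in PLACES]
    return ents

def fingerprint(entities):
    """Count of entities per type: a rough signature of what the document is about."""
    return Counter(kind for kind, _ in entities)

ents = extract_entities("Mr. John Smith joined Acme Widgets Inc. in Oakland, California last year.")
print(ents)
# [('person', 'John Smith'), ('organization', 'Acme Widgets'), ('place', 'Oakland'), ('place', 'California')]
print(fingerprint(ents))  # Counter({'place': 2, 'person': 1, 'organization': 1})
```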
Conclusions
Documents are more complicated than products, and faceted navigation for them got a later start
  – Need facets plus taxonomies and semantics
Integrated design is essential: not facets as an add-on
Semantics is still not there: hardest, but some progress
Text analytics (entity extraction and auto-categorization) is essential
Future: new kinds of applications
  – Text mining, research tools, sentiment
Future of search: smart ways to refine results, not better relevance
  – The real problem with 10 million hits: no way to get to the target
  – Include facets, taxonomies, semantics, and lots of metadata
Questions?
Tom Reamy
KAPS Group: Knowledge Architecture Professional Services