1
Automatic Facets: Faceted Navigation and Entity Extraction
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
2
Agenda
Introduction: Elements – Facets, Taxonomies, Software, People
3 Environments – E-Commerce, Enterprise, Internet
Design Issues – Facets and Entities
Conclusion – Integrated Solution
3
KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 12-15
Partners – Inxight, FAST, etc.
Consulting, Strategy, Knowledge architecture audit
Taxonomies: Enterprise, Marketing, Insurance, etc.
Services:
  – Taxonomy development, consulting, customization
  – Technology Consulting – Search, CMS, Portals, etc.
  – Metadata standards and implementation
  – Knowledge Management: Collaboration, Expertise, e-learning
  – Applied Theory – Faceted taxonomies, complexity theory, natural categories
4
Elements
Facet – orthogonal dimension of metadata
Entity / Noun Phrase – metadata value of a facet
Entity extraction – feeds facets, signature, ontologies
Taxonomy and categorization rules
Auto-categorization – aboutness, subject facets
People – tagging, evaluating tags, fine tune rules and taxonomy
5
Essentials of Facets
Facets are not categories
  – Categories are what a document is about – limited number
  – Entities are contained within a document – any number
Facets are orthogonal – mutually exclusive – dimensions
  – An event is not a person is not a document is not a place.
Facets – variety – of units, of structure
  – Numerical range (price), Location – big to small
  – Alphabetical, Hierarchical – taxonomic
Facets are designed to be used in combination
  – Wine where color = red, price = excessive, location = California, and sentiment = snotty
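A minimal sketch of facets used in combination, assuming a hypothetical in-memory catalog. The `Wine` records, the `facet_filter` helper, and the 100.0 threshold standing in for an "excessive" price are illustrative assumptions, not something the presentation specifies. Each facet is an independent dimension, so a query is the logical AND of one selection per facet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    name: str
    color: str
    price: float
    location: str
    sentiment: str


def facet_filter(items, **selected):
    """Keep items that match every selected facet value (logical AND)."""
    def matches(item):
        for facet, wanted in selected.items():
            value = getattr(item, facet)
            # A callable lets a facet express a range, e.g. a price band.
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [item for item in items if matches(item)]


# Hypothetical catalog entries, invented for illustration only.
WINES = [
    Wine("Hypothetical Cabernet", "red", 120.0, "California", "snotty"),
    Wine("Hypothetical Merlot", "red", 18.0, "California", "friendly"),
    Wine("Hypothetical Chardonnay", "white", 95.0, "California", "snotty"),
    Wine("Hypothetical Syrah", "red", 110.0, "Washington", "snotty"),
]

EXCESSIVE = 100.0  # assumed cut-off for an "excessive" price

hits = facet_filter(
    WINES,
    color="red",
    price=lambda p: p >= EXCESSIVE,
    location="California",
    sentiment="snotty",
)
print([w.name for w in hits])  # ['Hypothetical Cabernet']
```

Dropping any keyword argument widens the result set, which is how a faceted interface lets users add or remove one dimension at a time.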
6
Advantages of Faceted Navigation
More intuitive – easy to guess what is behind each door
Simplicity of internal organization
20 questions – we know and use
Dynamic selection of categories
Allow multiple perspectives
Ability to Handle Compound Subjects
Systematic Advantages – fewer elements
  – 4 facets of 10 nodes each (40 nodes) cover the combinations of a 10,000 node taxonomy
Flexible – can be combined with other navigation elements
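The "fewer elements" arithmetic can be checked directly. This is a small sketch with placeholder facet and value names (all hypothetical): four facets of ten values each require maintaining 40 nodes, while an enumerated taxonomy covering every combination would need 10 x 10 x 10 x 10 leaf nodes.

```python
from itertools import product
from math import prod

# Four placeholder facets with ten placeholder values each.
facets = {
    f"facet_{i}": [f"f{i}_value_{j}" for j in range(10)]
    for i in range(1, 5)
}

# Nodes someone has to maintain in the faceted design.
nodes_to_maintain = sum(len(values) for values in facets.values())

# Distinct combinations a user can express by picking one value per facet.
combinations = prod(len(values) for values in facets.values())

# Enumerating the combinations explicitly gives the same count.
enumerated = sum(1 for _ in product(*facets.values()))

print(nodes_to_maintain, combinations, enumerated)  # 40 10000 10000
```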
7
Essentials of Taxonomies
Internal Organization
Formal Taxonomy – parent – child relationship
  – Is-A-Kind-Of: Animal – Mammal – Zebra
  – Partonomy – Is-A-Part-Of: US-California-Oakland
Browse Classification – cluster of related concepts
  – Food and Dining – Catering – Restaurants
Taxonomies deal with complex, not compound
  – Conceptual relationships – category membership
  – Contextual relationships – Computers & Software
Taxonomies deal with semantics & documents
  – Multiple meanings and purposes
  – Essential attributes of documents are not single value
8
Developing Facets: Tools and Techniques
Software Tools
Text Analytics – Taxonomy management, entity extraction, categorization, sentiment
Search – Integrated features, at index, Internet sources
CM – Enterprise environment, taggers and policy
Programmable Rules
  – Business and Subject matter expertise
  – Auto-populate variety of metadata – author, title, date, etc.
  – Relevance – best bets to weights and classes of documents
People – refine, monitor – it’s not automatic
9
Developing Facets: Tools and Techniques
Software Tools – Auto-categorization
Auto-categorization
  – Training sets – Bayesian, Vector Machine
  – Terms – literal strings, stemming, dictionary of related terms
  – Rules – simple – position in text (Title, body, url)
  – Advanced – saved search queries (full search syntax)
  – NEAR, SENTENCE, PARAGRAPH
  – Boolean – X NEAR Y and Not-Z
Advanced Features
  – Facts / ontologies / Semantic Web – RDF +
  – Sentiment Analysis – positive, negative, neutral
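A rule of the form "X NEAR Y and Not-Z" might be sketched as below. This is not the syntax of any particular categorization product; the tokenizer, the `near` helper, and the five-token default window are assumptions chosen to keep the example small.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(tokens, x, y, window=5):
    """True when some occurrence of x is within `window` tokens of some y."""
    xs = [i for i, tok in enumerate(tokens) if tok == x]
    ys = [i for i, tok in enumerate(tokens) if tok == y]
    return any(abs(i - j) <= window for i in xs for j in ys)


def rule_matches(text, x, y, not_z, window=5):
    """Evaluate the Boolean rule: (x NEAR y) AND NOT z."""
    tokens = tokenize(text)
    return near(tokens, x, y, window) and not_z not in tokens


doc = "The marine toxins study sampled coastal water."
print(rule_matches(doc, "toxins", "water", not_z="freshwater"))  # True
```

A document that satisfies the proximity test but also contains the excluded term is rejected, which is the behaviour the Not-Z clause is meant to give.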
10
Developing Facets: Tools and Techniques
Software Tools – Entity Extraction
Dictionaries – variety of entities, coverage, specialty
  – Cost of update – service or in-house
  – Inxight – 50+ predefined entity types
  – Nstein – 800,000 people, 700,000 locations, 400,000 organizations
Rules
  – Capitalization, text – Mr., Inc.
  – Advanced – proximity and frequency of actions, associations
  – Need people to continually refine the rules
Entities and Categorization
  – Total number and pattern of entities = a type of aboutness of the document
  – Bar Code, Fingerprint
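The two extraction approaches, dictionaries and surface rules, can be combined in a few lines. The sketch below is a toy, assuming a three-entry hypothetical dictionary and two regular-expression rules keyed on capitalization plus cue text such as "Mr." and "Inc."; commercial extractors are far richer, and the sample sentence is invented.

```python
import re

# Hypothetical miniature dictionary: surface form -> entity type.
DICTIONARY = {
    "Oakland": "location",
    "California": "location",
    "EPA": "organization",
}

CAP = r"[A-Z][A-Za-z]+"  # one capitalized word

# Rule: an honorific followed by capitalized words suggests a person.
PERSON_RULE = re.compile(rf"\b(?:Mr|Ms|Mrs|Dr)\.\s+({CAP}(?:\s+{CAP})*)")

# Rule: capitalized words followed by a corporate suffix suggest an organization.
ORG_RULE = re.compile(rf"\b({CAP}(?:\s+{CAP})*),?\s+(?:Inc|Corp|Ltd)\.")


def extract_entities(text):
    """Return a sorted list of (entity_type, entity_text) pairs."""
    found = set()
    for term, entity_type in DICTIONARY.items():
        if re.search(rf"\b{re.escape(term)}\b", text):
            found.add((entity_type, term))
    for match in PERSON_RULE.finditer(text):
        found.add(("person", match.group(1)))
    for match in ORG_RULE.finditer(text):
        found.add(("organization", match.group(1)))
    return sorted(found)


sample = "Mr. Reamy met staff from Acme Widgets Inc. and the EPA in Oakland."
for entity in extract_entities(sample):
    print(entity)
```

Rules like these over-match and under-match easily, which is why the slide stresses that people must keep refining them.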
11
Elements: People
Programmers, Librarians, Taxonomists, Metadata specialists
  – Integrate, design, develop rules, monitor activity & quality
Authors, Subject Matter Experts
  – Input into design (important facets), rules, activity meaning
Users – Web 2.0
  – Feedback – quality and usability
  – Suggestions – missing terms, bad categorization & entity
  – Tag Clouds & folksonomy – for social networking features, not for information retrieval
12
Three Environments
E-Commerce
  – Catalogs, small uniform collections of entities
  – Uniform behavior – buy this
Enterprise
  – More content, more types of content
  – Enterprise Tools – Search, ECM
  – Publishing Process – tagging, metadata standards
Internet
  – Wildly different amount and type of content, no taggers
  – General Purpose – Flickr, Yahoo
  – Vertical Portal – selected content, no taggers
13
Three Environments: E-Commerce
14
Three Environments: E-Commerce
15
Enterprise Environment – When and how to add metadata
Enterprise Content – different world than eCommerce
  – More Content, more kinds, more unstructured
  – Not a catalog to start – less metadata and structured content
  – Complexity – not just content but variety of users and activities
Combination of human and automatic metadata – ECM
  – Software aided – suggestions, entities, ontologies
Enterprise – Question of Balance / strategy
  – More facets = more findability (up to a point)
  – Fewer facets = lower cost to tag documents
Issues
  – Not enough facets
  – Wrong set of facets – business not information
  – Ill-defined facets – too complex internal structure
16
Facets and Taxonomies
Enterprise Environment – Case One – Taxonomy, 7 facets
Taxonomy of Subjects / Disciplines:
  – Science > Marine Science > Marine microbiology > Marine toxins
Facets:
  – Organization > Division > Group
  – Clients > Federal > EPA
  – Instruments > Environmental Testing > Ocean Analysis > Vehicle
  – Facilities > Division > Location > Building X
  – Methods > Social > Population Study
  – Materials > Compounds > Chemicals
  – Content Type – Knowledge Asset > Proposals
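Hierarchical facet values written as "A > B > C" paths lend themselves to prefix matching: selecting a node should also return everything filed beneath it. The sketch below assumes that representation; the document IDs and the assignment of paths to documents are hypothetical, and only the path strings themselves come from the case above.

```python
def parse_path(path):
    """Split 'A > B > C' into the tuple ('A', 'B', 'C')."""
    return tuple(part.strip() for part in path.split(">"))


def is_under(path, ancestor):
    """True when `path` equals `ancestor` or sits below it in the hierarchy."""
    return path[:len(ancestor)] == ancestor


# Hypothetical documents tagged with a subject taxonomy path plus facet paths.
DOCUMENTS = {
    "doc-1": {
        "subject": parse_path(
            "Science > Marine Science > Marine microbiology > Marine toxins"
        ),
        "clients": parse_path("Clients > Federal > EPA"),
    },
    "doc-2": {
        "subject": parse_path("Science > Marine Science"),
        "methods": parse_path("Methods > Social > Population Study"),
    },
    "doc-3": {
        "subject": parse_path("Science"),
        "clients": parse_path("Clients > Federal"),
    },
}


def select(docs, **wanted):
    """Return IDs of documents that fall under every requested path."""
    ancestors = {facet: parse_path(p) for facet, p in wanted.items()}
    return sorted(
        doc_id
        for doc_id, facets in docs.items()
        if all(
            facet in facets and is_under(facets[facet], ancestor)
            for facet, ancestor in ancestors.items()
        )
    )


print(select(DOCUMENTS, subject="Science > Marine Science"))
print(select(DOCUMENTS, subject="Science > Marine Science",
             clients="Clients > Federal"))
```

A document with no value for a requested facet is excluded here; a real system might instead treat missing metadata as "unknown" and surface it separately.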
17
External Environment – Text Mining, Vertical Portals
Internet Content
  – Scale – impacts design and technology – speed of indexing
  – Limited control – Association of publishers to selection of content to none
  – Major subtypes – different rules – metadata and results
Complex queries and alerts
  – Terrorism taxonomy + geography + people + organizations
Text Mining
  – General or specific content and facets and categories
  – Dedicated tools or component of Portal – internal or external
Vertical Portal
  – Relatively homogeneous content and users
  – General range of questions
18
Internet Design
Subject Matter taxonomy – Business Topics
  – Finance > Currency > Exchange Rates
Facets
  – Location > Western World > United States
  – People – Alphabetical and/or Topical – Organization
  – Organization > Corporation > Car Manufacturing > Ford
  – Date – Absolute or range (1-1-01 to 1-1-08, last 30 days)
  – Publisher – Alphabetical and/or Topical – Organization
  – Content Type – list – newspapers, financial reports, etc.
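The Date facet mixes two kinds of selection: an absolute range and a window relative to today. A small sketch of both as reusable predicates follows. Reading "1-1-01 to 1-1-08" as 1 January 2001 through 1 January 2008 is an assumption, as are the helper names and the fixed `today` used to keep the example reproducible.

```python
from datetime import date, timedelta


def absolute_range(start, end):
    """Predicate for dates between start and end, inclusive."""
    return lambda d: start <= d <= end


def last_n_days(n, today):
    """Predicate for dates within the n days up to and including today."""
    earliest = today - timedelta(days=n)
    return lambda d: earliest <= d <= today


in_archive = absolute_range(date(2001, 1, 1), date(2008, 1, 1))
is_recent = last_n_days(30, today=date(2008, 1, 31))

print(in_archive(date(2005, 6, 15)))  # True
print(is_recent(date(2007, 12, 31)))  # False
```

Passing `today` explicitly, rather than calling `date.today()` inside the helper, keeps a relative facet testable.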
22
Integrated Facet Application
Design Issues – General
What is the right combination of elements?
  – Faceted navigation, metadata, browse, search, categorized search results, file plan
What is the right balance of elements?
  – Dominant dimension or equal facets
  – Browse topics and filter by facet
When to combine search, topics, and facets?
  – Search first and then filter by topics / facet
  – Browse/facet front end with a search box
23
Integrated Facet Application
Design Issues – General
Homogeneity of Audience and Content
Model of the Domain – broad
  – How many facets do you need?
  – More facets and let users decide
  – Allow for customization – can’t define a single set
User Analysis – tasks, labeling, communities
Issue – the labels that people use to describe their business differ from the labels that they use to find information
Match the structure to domain and task
  – Users can understand different structures
24
Automatic Facets – Special Issues
Scale requires more automated solutions
  – More sophisticated rules
Rules to find and populate existing metadata
  – Variety of types of existing metadata – Publisher, title, date
  – Multiple implementation Standards – Last Name, First / First Name, Last
Issue of disambiguation:
  – Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
  – Same word, different entity – Ford and Ford
Number of entities and thresholds per results set / document
  – Usability, audience needs
Relevance Ranking – number of entities, rank of facets
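The name-format and disambiguation issues can be illustrated with a deliberately simple normalizer. It is a sketch under strong assumptions: names are Western-style, honorifics come from a short hypothetical list, and "same person" is only ever a candidate match on last name plus a compatible first name. The helper names are invented for this example.

```python
HONORIFICS = {"Mr", "Ms", "Mrs", "Dr"}  # assumed short list


def normalize_name(raw):
    """Return (first, last) from 'Last, First' or 'First [Middle] Last'."""
    raw = raw.strip()
    if "," in raw:
        last, first = [part.strip() for part in raw.split(",", 1)]
        first_parts = first.split()
        return (first_parts[0] if first_parts else "", last)
    parts = raw.split()
    if not parts:
        return ("", "")
    if parts[0].rstrip(".") in HONORIFICS:
        # 'Mr. Ford' carries no first name.
        return ("", parts[-1])
    return (parts[0], parts[-1])


def same_person_candidate(a, b):
    """True when two name strings could plausibly refer to one person."""
    first_a, last_a = normalize_name(a)
    first_b, last_b = normalize_name(b)
    if last_a.lower() != last_b.lower():
        return False
    # A missing first name is compatible with any first name.
    return not first_a or not first_b or first_a.lower() == first_b.lower()


for variant in ["Henry Ford", "Ford, Henry", "Mr. Ford", "Henry X. Ford"]:
    print(variant, "->", normalize_name(variant))
```

String normalization handles "same person, different name" only partially, and it does nothing for "same word, different entity": telling Ford the person from Ford the company needs context, such as the surrounding entities or the document's category.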
25
Putting it all together – Infrastructure Solution
Facets, Taxonomies, Software, People
Combine formal power with ability to support multiple user perspectives
Facet System – interdependent, map of domain
Entity extraction – feeds facets, signatures, ontologies
Taxonomy & Auto-categorization – aboutness, subject
People – tagging, evaluating tags, fine tune rules and taxonomy
The future is the combination of simple facets with rich taxonomies with complex semantics / ontologies
26
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com