Knowledge Retrieval Taxonomies & Auto-Categorization Tom Reamy Knowledge Architect Intranet Consultant.

1 Knowledge Retrieval Taxonomies & Auto-Categorization Tom Reamy Knowledge Architect Intranet Consultant

2 Knowledge Retrieval
- Taxonomy: What, Why, How?
- Taxonomy and Auto-Categorization
  - Approaches and Companies
- Applied Taxonomies:
  - Content Management, Search
- Future Directions
  - Information Retrieval to Knowledge Retrieval

3 Taxonomy: What
- What is a Taxonomy?
  - Organization: Hierarchical, web, etc.
  - Card Catalog, Yahoo
  - Creates a context within which facts are related
  - Find, Identify, Describe information, relations, context.

4 Taxonomy: What
- Is this a Taxonomy?
  - Things that begin with the letter A
  - Things that have 4 legs
  - Things that are used to write with
  - Fantasy Animals
  - Large Orange Objects
  - Objects used by non-humans for undisclosed purposes.
- Jorge Luis Borges

5 Taxonomy: What
- What makes a good taxonomy?
- The Library of Congress catalog?
  - No. Not unless your intranet contains as much information as the LC.
- An understandable organization of content that enables people to find information and which supports knowledge discovery.

6 Taxonomy: Why
- Search Stinks
- Professionals spend more time looking for information than using it.
- Solution: Browse and Search
- Need a Taxonomy
- It ain’t easy, so why do it?

7 Taxonomy: Why
- Cost of poor Search and Content Management
  - If it’s not organized, you can’t find it.
  - If you can’t find it, you can’t use it.
  - If you can’t find it, you waste a lot of time.
  - If you can’t find it, you could lose an account.
  - If you can’t find it, you could look stupid.
  - If you can’t find it, it doesn’t exist.

8 Taxonomy: Why
- How does a Taxonomy improve Search and Content Management?
  - Browse and Search works better than Search
    - E-commerce: 56% of all searches fail = lost income
    - Intranet: lost time, lost business, lost ideas
  - Improved Publishing Model: By category, not department
  - Rich semantic web of concepts, not an unstructured collection of documents

9 Taxonomy: Why
- How does Content Management improve Taxonomies?
  - CM supports intelligent distributed categorization:
    - Work Flow: Central and local
    - Multiple roles: IA, SME, author, editor
  - CM supports automatic meta data and categorization

10 Taxonomy: How
- Old Answer: Manual
  - Hire a bunch of librarians and IAs
  - Costly, difficult to maintain
- New Answer:
  - Cyborg: Manual and Automatic Categorization
  - Integrate Content Management and Taxonomy
  - Integrate central IAs and local authors

11 Automatic vs. Humanatic
- Humans are better, but not as consistent
  - General bin, understandable mistakes
  - Bring outside contexts to the document
    - Purpose, similar documents, common sense
- Computers are faster and cheaper.
  - Faster yes, Cheaper?
  - Cost of poorer quality categorization
    - Intranet: 20,000 users taking 60 seconds longer = $20,000 a week
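The $20,000-a-week figure on this slide can be reproduced with simple arithmetic. The sketch below is one possible reading, not the presenter's stated calculation: the slide gives only the user count and the extra 60 seconds, so the frequency (one slower lookup per user per week) and the loaded labor cost ($60 per hour) are assumptions chosen because they reproduce the slide's number. The function name `weekly_search_cost` is hypothetical.

```python
def weekly_search_cost(users, extra_seconds, lookups_per_week, hourly_cost):
    """Estimate the weekly cost of time lost to slower information finding."""
    lost_hours = users * extra_seconds * lookups_per_week / 3600
    return lost_hours * hourly_cost


# 20,000 users and 60 extra seconds come from the slide.
# ASSUMPTIONS (not in the slide): one affected lookup per user per week,
# and a loaded labor cost of $60 per hour.
cost = weekly_search_cost(
    users=20_000,
    extra_seconds=60,
    lookups_per_week=1,
    hourly_cost=60.0,
)
print(f"${cost:,.0f} per week")
```

Changing either assumed input scales the result linearly, so a workforce that hits the slower lookup five times a week would lose roughly five times as much under the same model.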

12 News Feeds - Corporate Intranets
- News Feeds and Content providers
  - Uniform content, size and structure
  - Professional writers
  - Simple or standard vocabulary
- Corporate intranet
  - Wildly varied content
  - Mix of good, bad, and ugly writers
  - Tower of Babel: Acronyms, special meanings

13 Auto-Categorization: the How
- Automatic Methods
  - Catalog by Example
    - Training Sets (5-500)
    - Bag of Words or language and concepts
  - Statistical Clustering
    - Set of Documents & Taxonomy Level
- Semi-Automatic: Rules
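To make "Catalog by Example" with a bag of words concrete, here is a minimal sketch: each category is represented by the combined word counts of its training documents, and a new document goes to the category whose word counts it most resembles (cosine similarity). This is one simple way to do it, not the method of any vendor named in the deck. The training texts are invented for illustration, and a real training set would sit in the 5-500 documents range the slide mentions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Build one combined bag of words per category from (category, text) pairs."""
    profiles = {}
    for category, text in examples:
        profiles.setdefault(category, Counter()).update(bag_of_words(text))
    return profiles


def categorize(profiles, text):
    """Return (best_category, similarity) for a new document."""
    bag = bag_of_words(text)
    scored = ((cat, cosine(bag, profile)) for cat, profile in profiles.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical training set, far smaller than a real one would be.
TRAINING = [
    ("HR / Benefits", "enroll in the health plan and dental benefits during open enrollment"),
    ("HR / Benefits", "vacation policy and retirement benefits for employees"),
    ("Education & Training", "register for the training course and online class schedule"),
    ("Education & Training", "new manager training workshop and course catalog"),
]

profiles = train(TRAINING)
category, score = categorize(profiles, "How do I sign up for a training class?")
print(category, round(score, 2))
```

Because the approach only counts words, it has no notion of "language and concepts"; synonyms and acronyms that never appear in the training set contribute nothing to the score.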

14 Auto-Categorization: the How
- Next Generation
  - Support Vector Machines
  - Machine Learning
  - World Knowledge
- Incremental Improvement
  - From 75% to 85%
- Critical Issue: Integration

15 Categorization Explosion
- Autonomy
- Semio
- Verity
- Inxight
- Topical Net
- Mohomine
- Simile
- H5Technologies
- YellowBrix
- GammaSite
- MetaTagger
- Applied Semantics
- Sageware
- SmartLogik
- Quiver
- Stratify
- Vivisimo
- Other - Tacit

16 Auto-Categorization: Features
- The Categorization Algorithm
  - SVM – Vector space is an improvement
  - Higher Accuracy
  - Fewer documents for training set
  - White Box – customize recall & precision
  - Categorize multiple file types & sizes
- Clustering – Taxonomy Builder
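One way to read "customize recall & precision" is as a tunable confidence threshold: a category is assigned only when the categorizer's score clears the threshold, so raising it trades recall for precision. The sketch below assumes that reading; the slide does not say how any product exposes this control. The scores and relevance labels are invented for illustration.

```python
def precision_recall(scored_docs, threshold):
    """Compute precision and recall for one category at a given threshold.

    scored_docs is a list of (score, truly_in_category) pairs.
    """
    tp = sum(1 for score, relevant in scored_docs if score >= threshold and relevant)
    fp = sum(1 for score, relevant in scored_docs if score >= threshold and not relevant)
    fn = sum(1 for score, relevant in scored_docs if score < threshold and relevant)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical categorizer scores paired with human judgments.
SCORED = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.55, False), (0.40, True), (0.30, False), (0.10, False),
]

for threshold in (0.85, 0.50, 0.25):
    p, r = precision_recall(SCORED, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this sample a strict threshold assigns only documents that truly belong but misses half of them, while a loose one finds every relevant document at the price of more wrong assignments.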

17 Auto-Categorization: Features
- Support Distributed Activities
  - Distributed work flow: authors, subject matter experts, information architects
  - Provisional categorization, keywords, meta data
  - Automatic summarization
  - Ease of Use, Integration with CM and Search
- Integrate with Rules, Meta Data
  - Content to Context

18 Auto-Categorization: Features
- Platform for Knowledge Retrieval
  - World Knowledge
    - Pre-Built Categories
    - Rich Semantic Net (WordNet+)
    - Entity Extraction
  - Integration
    - Specialized Audiences & Vocabularies
    - Content, Expertise, Communities, Activities

19 The Answer is Cyborg
- Automatic Categorization is Not.
- Professional Services: Initial Taxonomy
- Cyborg: Human and Automatic Integration
  - Distributed Work Flow
- Cyborg Integration with Content Management, Search
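The "Cyborg" work flow can be pictured as a routing step: the software proposes a category for every document, confident proposals are accepted provisionally, and the rest are queued for an author, SME, or information architect. The sketch below is an illustration under assumptions, not a description of any specific product. The `Suggestion` type, the `route` function, and the 0.85 cut-off are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    doc_id: str
    category: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical categorizer


def route(suggestions, auto_accept=0.85):
    """Split suggestions into an auto-accepted list and a human-review queue."""
    accepted, review = [], []
    for suggestion in suggestions:
        if suggestion.confidence >= auto_accept:
            accepted.append(suggestion)
        else:
            review.append(suggestion)
    return accepted, review


# Invented sample output from an automatic categorizer.
suggestions = [
    Suggestion("doc-1", "HR / Benefits", 0.93),
    Suggestion("doc-2", "Products", 0.61),
    Suggestion("doc-3", "News", 0.88),
]

accepted, review = route(suggestions)
print("auto-accepted:", [s.doc_id for s in accepted])
print("needs human review:", [s.doc_id for s in review])
```

Lowering `auto_accept` shifts work from people to the software, which is the speed-versus-quality trade-off the deck raises when comparing automatic and human categorization.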

20 Content Management and Taxonomy
- Taxonomic Publishing Model
  - Publish by Category, not web site
- Web site is the wrong unit of organization
  - 10 pages to 10,000 pages
  - 10 users to 20,000 users
  - 1 activity to 100s of activities

21 Content Management and Taxonomy
- Content Re-Organization
  - Support Browse by Topic, Type, Task
  - Rich Web of Related Content
    - Product information
      - Basic Info + background contexts
      - Legal / Policy contexts
      - Technical Contexts
      - Customer / Task contexts

22 Content Management and Taxonomy
- Content Re-Organization: Next Steps
  - Document can be the wrong unit of organization
  - Information / Learning objects
  - XML-based objects: reuse, combine (relations and contexts) in more flexible and sophisticated ways.

23 Content Management: Re-organize Authoring
- Streamline Authoring
  - Minimize IT / Web Developer Bottleneck
- Integrated Work Flow & Categorization
  - Central: Librarian and/or Information Architects
  - Distributed: content owners, authors, SMEs
  - Distributed Categorization, Meta Data

24 Applied Taxonomy: Search
- Intranet Environments
- Case Studies:
  - Meta Data
  - Browse / Search Model

25 Intranet Environments
- Global, Distributed
- Variety of Documents, People, Activities
- 100s of independent Web Sites
- Documents, Databases, Applications
© 2001 Charles Schwab & Co., Inc., member NYSE/SIPC. All rights reserved. (0401-6450)

26 Meta Data: Dublin Core+
- Title
- Description
- Keywords
- Creator
- Publisher
- ContentType
- Audience
- SectionName
- Language
- Contributor
- Contributor.Technical
- Date.Created
- Date.Review
- Format
- Identifier
- Rights
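As an illustration of how such an element set might be attached to an intranet page, the sketch below renders a record as HTML `<meta>` tags and rejects element names that are not on the list. The element names are taken from the slide; everything else is an assumption, including the choice of `<meta>` tags as the carrier, the `to_meta_tags` helper, and the sample values.

```python
from html import escape

# Element names as listed on the slide.
ELEMENTS = [
    "Title", "Description", "Keywords", "Creator", "Publisher",
    "ContentType", "Audience", "SectionName", "Language", "Contributor",
    "Contributor.Technical", "Date.Created", "Date.Review", "Format",
    "Identifier", "Rights",
]


def to_meta_tags(record):
    """Render a metadata record as HTML meta tags, in element-list order.

    Raises ValueError if the record uses an element that is not in ELEMENTS.
    """
    unknown = sorted(set(record) - set(ELEMENTS))
    if unknown:
        raise ValueError(f"unknown metadata elements: {unknown}")
    return [
        f'<meta name="{name}" content="{escape(record[name], quote=True)}">'
        for name in ELEMENTS
        if name in record
    ]


# Hypothetical record for a single intranet page.
record = {
    "Title": "Benefits & Enrollment FAQ",
    "ContentType": "FAQ",
    "Audience": "Officer",
    "Date.Created": "2001-04-01",
}

tags = to_meta_tags(record)
for tag in tags:
    print(tag)
```

Keeping the element list in one place means authoring tools, the categorizer, and the search engine can all check against the same definition.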

27 Controlled Vocabularies
- ContentType
  - Application
  - Calendar
  - Form
  - FAQ
  - Mission
  - Reference
  - Training
- Audience
  - Function
    - Project Manager
    - Trainer
  - Enterprise
    - Retail
    - Technology
  - Role
    - Admin Assistant
    - Officer
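A controlled vocabulary only helps if values are checked against it when content is tagged. The sketch below shows one minimal way to do that, using the terms from this slide as the vocabulary. The terms shown are presumably samples rather than the full lists, and the `validate` function and its error messages are hypothetical.

```python
# Vocabulary terms as they appear on the slide (likely a sample, not the full set).
CONTENT_TYPES = {
    "Application", "Calendar", "Form", "FAQ", "Mission", "Reference", "Training",
}
AUDIENCE = {
    "Function": {"Project Manager", "Trainer"},
    "Enterprise": {"Retail", "Technology"},
    "Role": {"Admin Assistant", "Officer"},
}


def validate(content_type, audience):
    """Check tags against the vocabularies and return a list of error messages.

    audience maps an Audience facet (Function, Enterprise, Role) to a value.
    """
    errors = []
    if content_type not in CONTENT_TYPES:
        errors.append(f"ContentType {content_type!r} is not in the vocabulary")
    for facet, value in audience.items():
        allowed = AUDIENCE.get(facet)
        if allowed is None:
            errors.append(f"Audience facet {facet!r} is not defined")
        elif value not in allowed:
            errors.append(f"Audience.{facet} value {value!r} is not in the vocabulary")
    return errors


print(validate("FAQ", {"Role": "Officer"}))
print(validate("Memo", {"Role": "Intern", "Region": "West"}))
```

Returning a list of messages rather than raising on the first problem lets an authoring form show every tagging error at once.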

28 First Generation Browse Taxonomy
- News
- Education & Training
- HR / Benefits
- Employee Services & Programs
- Departments
- Communities
- Tools, Forms, Calendars
- How To / FAQs
- Products
- Reference & Resources

33 Future Directions
- Extending Taxonomies
  - Richer World Knowledge
  - Smarter Learning
  - Additional Content: Databases, Word Docs on network drive, Email
  - Integration of external content

34 Future Directions
- Integration: Creation to Retrieval
  - Collaborative Filtering and Categorization
- Integration throughout the Enterprise
  - People, Communities, Expertise
- Contextualizing content
  - Related topics and related contexts
- Categories for Stories

