Semantic Infrastructure Workshop Applications
Tom Reamy, Chief Knowledge Architect
KAPS Group, Knowledge Architecture Professional Services


2 Agenda
• Search and Semantic Infrastructure
  – Elements / Rich Dynamic Results
  – Different Environments
  – Design Issues
• Platform for Information Applications
  – Multiple Applications
  – Case Study – Categorization & Sentiment
  – Case Study – Taxonomy Development
  – Case Study – Expertise & Sentiment & Beyond
• Conclusions

3 A Semantic Infrastructure Approach to Search: Elements
• Multiple Knowledge Structures
  – Facet – orthogonal dimension of metadata
  – Taxonomy – subject matter / aboutness
  – Ontology – relationships / facts (Subject – Verb – Object)
• Software – search, ECM, auto-categorization, entity extraction, text analytics and text mining
• People – tagging, evaluating tags, fine-tuning rules and taxonomy
• People – users, social tagging, suggestions
• Rich Search Results – context and conversation

4 A Semantic Infrastructure Approach to Search: Rich Results
• Elements
  – Faceted Navigation
  – Categorization – metadata and/or dynamic
  – Tag Clouds – clustering
  – User Tags, personalization
  – Related topics – discovery
• Supports all manner of search behaviors and needs
  – Find known items – zero in with facets
  – Discovery – tag clouds, user tags, related topics
  – Deep dive – categorization


8 A Semantic Infrastructure Approach to Search: Three Environments
• E-Commerce
  – Catalogs, small uniform collections of entities
  – Conflict of information and selling
  – Uniform behavior – buy this
• Enterprise
  – More content, more types of content
  – Enterprise Tools – Search, ECM
  – Publishing Process – tagging, metadata standards
• Internet
  – Wildly different amount and type of content, no taggers
  – General Purpose – Flickr, Yahoo
  – Vertical Portal – selected content, no taggers

9 A Semantic Infrastructure Approach to Search: Enterprise Environment – Taxonomy, 7 Facets
• Taxonomy of Subjects / Disciplines:
  – Science > Marine Science > Marine microbiology > Marine toxins
• Facets:
  – Organization > Division > Group
  – Clients > Federal > EPA
  – Instruments > Environmental Testing > Ocean Analysis > Vehicle
  – Facilities > Division > Location > Building X
  – Methods > Social > Population Study
  – Materials > Compounds > Chemicals
  – Content Type – Knowledge Asset > Proposals
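One way to picture the taxonomy-plus-facets design on this slide is as documents that carry one subject path and several facet paths, with filtering done by path prefix. This is a minimal sketch only: the `Document` class, the `filter_docs` helper, the document titles, and the "Commercial" client value are all hypothetical and not part of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    subject: tuple                               # path in the subject taxonomy
    facets: dict = field(default_factory=dict)   # facet name -> path tuple


def matches(doc, facet, prefix):
    """True if the document's path for this facet starts with the given prefix."""
    path = doc.facets.get(facet, ())
    return path[:len(prefix)] == tuple(prefix)


def filter_docs(docs, **selections):
    """Keep documents that match every selected facet prefix."""
    return [d for d in docs
            if all(matches(d, f, p) for f, p in selections.items())]


# Hypothetical sample documents using facet values from the slide.
docs = [
    Document("Toxin survey",
             ("Science", "Marine Science", "Marine microbiology", "Marine toxins"),
             {"Clients": ("Federal", "EPA"),
              "Methods": ("Social", "Population Study")}),
    Document("Internal memo",
             ("Science",),
             {"Clients": ("Commercial",)}),
]

federal = filter_docs(docs, Clients=("Federal",))
print([d.title for d in federal])
```

Selecting a higher node such as `("Federal",)` returns everything beneath it, which is the "zero in with facets" behavior described earlier in the deck.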

10 A Semantic Infrastructure Approach to Search: Internet Design
• Subject Matter taxonomy – Business Topics
  – Finance > Currency > Exchange Rates
• Facets
  – Location > Western World > United States
  – People – Alphabetical and/or Topical – Organization
  – Organization > Corporation > Car Manufacturing > Ford
  – Date – Absolute or range ( to , last 30 days)
  – Publisher – Alphabetical and/or Topical – Organization
  – Content Type – list – newspapers, financial reports, etc.


12 Rich Search Results: Design Issues – General
• What is the right combination of elements?
  – Faceted navigation, metadata, browse, search, categorized search results, file plan
• What is the right balance of elements?
  – Dominant dimension or equal facets
  – Browse topics and filter by facet
• When to combine search, topics, and facets?
  – Search first and then filter by topics / facet
  – Browse/facet front end with a search box

13 Rich Search Results: Design Issues – General
• Homogeneity of Audience and Content
• Model of the Domain – broad
  – How many facets do you need?
  – More facets and let users decide
  – Allow for customization – can’t define a single set
• User Analysis – tasks, labeling, communities
  – Issue – labels that people use to describe their business and labels that they use to find information
• Match the structure to domain and task
  – Users can understand different structures

14 Rich Search Results: Automatic Facets – Special Issues
• Scale requires more automated solutions
  – More sophisticated rules
• Rules to find and populate existing metadata
  – Variety of types of existing metadata – publisher, title, date
  – Multiple implementation standards – Last Name, First / First Name, Last
• Issue of disambiguation:
  – Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
  – Same word, different entity – Ford and Ford
• Number of entities and thresholds per results set / document
  – Usability, audience needs
• Relevance Ranking – number of entities, rank of facets
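The "Last Name, First / First Name, Last" and "same person, different name" issues can be illustrated with a small normalization step. This is a sketch under stated assumptions: the function names and the honorific list are hypothetical, and matching on surname alone is only weak evidence. It does nothing for the "Ford and Ford" case, where a person and a company share a word and context rules are needed.

```python
# Hypothetical honorifics to strip; a real system would need a fuller list.
TITLES = {"mr", "mrs", "ms", "dr"}


def normalize_person_name(raw: str) -> str:
    """Return a name in 'First [Middle] Last' order with honorifics removed."""
    raw = raw.strip()
    # Convert "Last, First" into "First Last".
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        raw = f"{first} {last}"
    tokens = [t for t in raw.split() if t.rstrip(".").lower() not in TITLES]
    return " ".join(tokens)


def same_surname(a: str, b: str) -> bool:
    """Weak signal that two mentions may refer to the same person."""
    last_a = normalize_person_name(a).split()[-1].lower()
    last_b = normalize_person_name(b).split()[-1].lower()
    return last_a == last_b


for mention in ["Ford, Henry", "Mr. Ford", "Henry X. Ford"]:
    print(normalize_person_name(mention))
```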

15 Semantic Infrastructure for Search Based Apps: Multiple Applications
• Platform for Information Applications
  – Content Aggregation
  – Duplicate Documents – save millions!
  – Text Mining – BI, CI – sentiment analysis
  – Combine with Data Mining – disease symptoms, new Predictive Analytics
  – Social – Hybrid folksonomy / taxonomy / auto-metadata
  – Social – expertise, categorize tweets and blogs, reputation
  – Ontology – travel assistant – SIRI
• Use your Imagination!

16 Semantic Infrastructure for Search Apps: Multiple Applications
• SIRI – Travel Assistant

17 Semantic Infrastructure for Search Apps: Case Study – Categorization & Sentiment
• Call Motivation – Categorization
  – Motivation Taxonomy
  – Purpose of previous calls to understand current call
  – Issues of scale, small size of documents, jargon, spelling
• Customer Sentiment – Telecom Forums
  – Feature level – not just products
  – Issue of context – sarcasm, jargon
• Knowledge Base
  – Categorization, product extraction, expertise-sentiment analysis
  – Social Media as source for solutions

18 Case Study – Categorization & Sentiment

19 Case Study – Categorization & Sentiment

20 Sentiment Analysis Development Process
• Combination of statistical and categorization rules
• Start with training sets – examples of positive, negative, neutral documents
• Develop a statistical model
• Generate domain positive and negative words and phrases
• Develop a taxonomy of Products & Features
• Develop rules for positive and negative statements
• Test and refine
• Test and refine again
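The rule-based half of this process (domain word lists, a taxonomy of features, and rules for positive and negative statements) can be sketched as follows. The word lists, feature names, and the simple "negator directly before a polarity word" rule are all hypothetical stand-ins, and the statistical model built from training sets is not shown.

```python
# Hypothetical domain vocabulary; in practice generated from training sets.
POSITIVE = {"great", "love", "fast", "reliable"}
NEGATIVE = {"slow", "dropped", "terrible", "broken"}
NEGATORS = {"not", "never", "no"}

# Hypothetical slice of a Products & Features taxonomy: term -> feature label.
FEATURES = {"battery": "Battery", "screen": "Screen", "coverage": "Coverage"}


def feature_sentiment(sentence: str) -> dict:
    """Score a sentence and attach the score to each feature it mentions."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        # Rule: a negator immediately before a polarity word flips it.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return {FEATURES[t]: score for t in tokens if t in FEATURES}


print(feature_sentiment("Coverage is not reliable."))
```

The "test and refine" steps on the slide correspond to running rules like these against held-out examples and adjusting the lists and rules where they misfire, for instance on sarcasm or jargon.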


26 Semantic Infrastructure for Search Apps: Case Study – Taxonomy Development
• Problem – 200,000 new uncategorized documents
• Old taxonomy – need one that reflects change in corpus
• Text mining, entity extraction, categorization
• Content – 250,000 large documents, search logs, etc.
• Bottom Up – terms in documents – frequency, date
• Clustering – suggested categories
• Clustering – chunking for editors
• Entity Extraction – people, organizations, programming languages
• Time savings – only feasible way to scan documents
• Quality – important terms, co-occurring terms
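The bottom-up step (term frequency and co-occurring terms) can be sketched with standard-library counters. This is an illustration only: the tokenization, the stopword list, and the sample documents are hypothetical, and the case study's actual text mining tools are not named in the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}


def term_stats(documents):
    """Return (document frequency per term, co-occurrence count per term pair)."""
    doc_freq = Counter()
    cooccur = Counter()
    for text in documents:
        terms = {w.strip(".,").lower() for w in text.split()} - STOPWORDS
        terms.discard("")
        doc_freq.update(terms)
        # Count each unordered pair of terms once per document.
        cooccur.update(combinations(sorted(terms), 2))
    return doc_freq, cooccur


# Hypothetical sample corpus.
sample = [
    "Java security patch",
    "Java performance tuning",
    "Security audit for Java",
]
doc_freq, cooccur = term_stats(sample)
print(doc_freq.most_common(2))
print(cooccur[("java", "security")])
```

Frequent terms suggest candidate taxonomy nodes, and frequently co-occurring pairs suggest terms an editor might group under the same node.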

27 Case Study – Taxonomy Development

28 Case Study – Taxonomy Development

29 Case Study – Taxonomy Development

30 Semantic Infrastructure Applications: Expertise Analysis
• Sentiment Analysis to Expertise Analysis (KnowHow)
  – Know-how, skills, “tacit” knowledge
• No single correct categorization
  – Women, Fire, and Dangerous Things
  – Types of Animals
      Those that belong to the Emperor
      Embalmed Ones
      Suckling Pigs
      Fabulous Ones
      Those that are included in this classification
      Those that tremble as if they were mad
      Other

31 Semantic Infrastructure Applications: Expertise Analysis – Basic Level Categories
• Mid-level in a taxonomy / hierarchy
• Short and easy words
• Maximum distinctness and expressiveness
• First level named and understood by children
• Level at which most of our knowledge is organized
• Levels: Superordinate – Basic – Subordinate
  – Mammal – Dog – Golden Retriever
  – Furniture – Chair – Kitchen chair

32 Semantic Infrastructure Applications: Expertise Analysis
• Experts prefer lower, subordinate levels
  – In their domain, they (almost) never use the superordinate level
• Novices prefer higher, superordinate levels
• General populace prefers basic level
• Not just individuals but whole societies / communities differ in their preferred levels
• Issue – artificial languages – e.g. science discipline
• Issue – difference between child and adult learning – adults start with high level

33 Semantic Infrastructure Applications: Expertise Analysis
• What is basic level is context(s) dependent
  – Document/author expert in news health care, not research
• Hybrid – simple high level taxonomy (superordinate), short words – basic, longer words – expert
• Plus
• Develop expertise rules – similar to categorization rules
  – Use basic level for subject
  – Superordinate for general, subordinate for expert
• Also contextual rules
  – “Tests” is general, high level
  – “Predictive value of tests” is lower, more expert
  – If terms appear in same sentence – expert
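The contextual rules on this slide can be sketched as a small classifier. The slide gives only the "tests" versus "predictive value of tests" example and the same-sentence rule, so everything else is an assumption: the function name, the reading of "terms appear in same sentence" as "two or more expert terms in one sentence", and the tiny term lists (the expert terms are borrowed from the Healthcare Terms slide).

```python
import re

# From the slide: a general term and a more expert phrase built on it.
GENERAL_TERMS = {"tests"}
EXPERT_PHRASES = ["predictive value of tests"]

# Hypothetical expert vocabulary, taken from the Healthcare Terms slide.
EXPERT_TERMS = {"pharmacokinetic", "metabolite", "polymorphism"}


def expertise_level(sentence: str) -> str:
    """Label one sentence as 'expert', 'general', or 'unknown'."""
    text = sentence.lower()
    words = set(re.findall(r"[a-z\-]+", text))
    # Contextual rule: the longer, more specific phrase signals expertise.
    if any(phrase in text for phrase in EXPERT_PHRASES):
        return "expert"
    # Assumed reading of the same-sentence rule: two or more expert terms.
    if len(words & EXPERT_TERMS) >= 2:
        return "expert"
    if words & GENERAL_TERMS:
        return "general"
    return "unknown"


print(expertise_level("The predictive value of tests was low."))
print(expertise_level("More tests are needed."))
```

Checking the expert phrase before the general term matters here, because "predictive value of tests" also contains the general word "tests".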

34 Education Terms

Expert                                      General
Research (context dependent)                Kid
Statistical                                 Pay
Program performance                         Classroom
Protocol                                    Fail
Adolescent Attitudes                        Attendance
Key academic outcomes                       School year
Job training program                        Closing
American Educational Research Association   Counselor
Graduate management education               Discipline

35 Healthcare Terms

Expert                  General
Mouse                   Cancer
Dose                    Scientific
Toxicity                Physical
Diagnostic              Consumer
Mammography             Cigarette
Sampling                Smoking
Inhibitor               Weight gain
Edema                   Correct
Neoplasms               Empirical
Isotretinoin            Drinking
Ethylene                Testing
Significantly           Lesson
Population-base         Knowledge
Pharmacokinetic         Medicine
Metabolite              Sociology
Polymorphism            Theory
Subsyndromic            Experience
Radionuclide            Services
Etiology                Hospital
Oxidase                 Social
Captopril               Domestic
Pharmacological agents
Dermatotoxicity
Mammary cancer model
Biosynthesis

36 Expertise Analysis: Expertise – Application Areas
• Taxonomy / Ontology development / design – audience focus
  – Card sorting – non-experts use superficial similarities
• Business & Customer intelligence – add expertise to sentiment
  – Deeper research into communities, customers
• Text Mining – expertise characterization of writer, corpus
• eCommerce – organization/presentation of information – expert, novice
• Expertise location – generate automatic expertise characterization based on documents
• Experiments
  – Pronoun Analysis – personality types
  – Essay Evaluation Software – apply to expertise characterization
      Model levels of chunking, procedure words over content

37 Beyond Sentiment: Behavior Prediction – Case Study – Telecom Customer Service
• Problem – distinguish customers likely to cancel from mere threats
• Analyze customer support notes
• General issues – creative spelling, second hand reports
• Develop categorization rules
  – First – distinguish cancellation calls – not simple
  – Second – distinguish cancel what – one line or all
  – Third – distinguish real threats

38 Beyond Sentiment: Behavior Prediction – Case Study
• Basic Rule
    (START_20, (AND,
      (DIST_7, "[cancel]", "[cancel-what-cust]"),
      (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
  – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
  – ask about the contract expiration date as she wanted to cxl teh acct
• Combine sophisticated rules with sentiment statistical training and Predictive Analytics
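The basic rule above can be approximated in a few lines, under assumptions the slides do not confirm: `START_20` is read as "look only at the first 20 words", `DIST_n` as "the two concepts occur within n words of each other, in either order", and each bracketed concept as a list of spelling variants. The variant lists in `CONCEPTS` are hypothetical, and this is not the rule syntax or engine of any particular text analytics product.

```python
import re

# Hypothetical variant lists standing in for the bracketed concepts.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act"},
    "one-line": {"line"},
    "restore": {"restore"},
    "if": {"if"},
}


def positions(tokens, concept):
    """Indexes of tokens that belong to the given concept."""
    return [i for i, t in enumerate(tokens) if t in CONCEPTS[concept]]


def dist(tokens, a, b, n):
    """True if concept a occurs within n tokens of concept b (assumed DIST_n)."""
    return any(abs(i - j) <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def is_real_cancellation(note, window=20):
    """Assumed reading of the rule: cancel near account, with no nearby hedge."""
    tokens = re.findall(r"[a-z]+", note.lower())[:window]  # assumed START_20
    wants_cancel = dist(tokens, "cancel", "cancel-what-cust", 7)
    hedged = any(dist(tokens, "cancel", c, 10)
                 for c in ("one-line", "restore", "if"))
    return wants_cancel and not hedged


print(is_real_cancellation(
    "ask about the contract expiration date as she wanted to cxl teh acct"))
```

Under these assumptions the conditional note ("will cancell his account if ...") is rejected because "if" sits close to the cancel term, which is the kind of mere threat the rule is meant to filter out.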

39 Beyond Sentiment – Wisdom of Crowds: Crowd Sourcing Technical Support
• Example – Android User Forum
• Develop a taxonomy of products, features, problem areas
• Develop Categorization Rules:
  – “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
  – Find product & feature – forum structure
  – Find problem areas in response, nearby text for solution
• Automatic – simply expose lists of “solutions”
  – Search Based application
• Human mediated – experts scan and clean up solutions

40 Semantic Infrastructure: A Platform for KM Applications
• Expertise Location – Individuals and Communities
• Knowledge Sharing – Communities of Practice
  – Find right person better
  – Knowledge representation to support better sharing
  – Enhance sharing as well as substitute for person
• Knowledge Base / Portal
  – Greatly improved – find what you are looking for
  – New kinds of presentations – rich search to dynamic graphs
• Process – deliver rich knowledge representation in work flow
  – SIRI+

41 Text Analytics: Future Directions
• Start with the 80% of significant content that is not data
  – Enterprise search, content management, search based applications
• Text Analytics and Text Mining – Text Analytics turns text into data
  – Build better TM apps – better extraction and add Subject / Concepts
  – Sentiment and Beyond – Behavior, Expertise
• Text Mining and Text Analytics – TM enriching TA
  – Taxonomy development
  – New content structures, ensemble models
• Text Analytics and Predictive Analytics – more content, new content
  – Social, interactive – CSR
  – New sources of content/data = new & better apps

42 Semantic Infrastructure Approach: Conclusions
• Semantic Infrastructure solution (people, policy, technology, semantics) with feedback is the best approach
• Foundation – Hybrid ECM model with text analytics, search
• Integrated information, knowledge, and semantics
• Semantic Infrastructure as a platform for multiple applications
  – Build on infrastructure for economy and quality
• Text Analytics (entity extraction, auto-categorization, sentiment analysis) is essential
• Future – new kinds of applications:
  – Text Mining and Data Mining, research tools, sentiment
  – Beyond Sentiment – expertise applications
  – NeuroAnalytics – cognitive science meets search and more
• Watson is just the start

Questions?
Tom Reamy
KAPS Group, Knowledge Architecture Professional Services

44 Resources
• Books
  – Women, Fire, and Dangerous Things. George Lakoff
  – Knowledge, Concepts, and Categories. Koen Lamberts and David Shanks
  – Formal Approaches in Categorization. Ed. Emmanuel Pothos and Andy Wills
  – The Mind. Ed. John Brockman. Good introduction to a variety of cognitive science theories, issues, and new ideas
  – Any cognitive science book written after 2009

45 Resources
• Conferences – Web Sites
  – Text Analytics World
  – Text Analytics Summit
  – Semtech

46 Resources
• Blogs
  – SAS
• Web Sites
  – Taxonomy Community of Practice
  – LinkedIn – Text Analytics Summit Group
  – Whitepaper – CM and Text Analytics – eetstextanalytics.pdf
  – Whitepaper – Enterprise Content Categorization strategy and development

47 Resources
• Articles
  – Malt, B. C. Category coherence in cross-cultural perspective. Cognitive Psychology 29.
  – Rifkin, A. Evidence for a basic level in event taxonomies. Memory & Cognition 13.
  – Shaver, P., J. Schwarz, D. Kirson, D. O’Conner. Emotion Knowledge: further explorations of prototype approach. Journal of Personality and Social Psychology 52.
  – Tanaka, J. W. & M. E. Taylor. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23.