Semantic Infrastructure Workshop: Development
Tom Reamy, Chief Knowledge Architect
KAPS Group – Knowledge Architecture Professional Services
Agenda
– Text Analytics
  – Foundation
  – Features and Capabilities
– Evaluation of Text Analytics
  – Start with Self-Knowledge
  – Features and Capabilities
  – Filter, Proof of Concept / Pilot
– Text Analytics Development
  – Progressive Refinement
  – Categorization, Extraction, Sentiment
  – Case Studies
  – Best Practices
Semantic Infrastructure – Foundation: Text Analytics Features
– Noun Phrase Extraction
  – Catalogs with variants, plus rule-based dynamic extraction (sketched below)
  – Multiple types and custom classes – entities, concepts, events
  – Feeds facets
– Summarization
  – Customizable rules, mapped to different content
– Fact Extraction
  – Relationships of entities – people-organizations-activities
  – Ontologies – triples, RDF, etc.
– Sentiment Analysis
  – Rules
  – Objects and phrases – positive and negative
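To make the catalog idea concrete, here is a minimal Python sketch of catalog-based entity extraction with variants. The catalog entries and the sample sentence are illustrative, not drawn from any vendor's product.

```python
import re

# Hypothetical mini-catalog: canonical entity name -> surface variants.
CATALOG = {
    "International Business Machines": ["IBM", "International Business Machines"],
    "SAS Institute": ["SAS", "SAS Institute"],
}

def extract_entities(text):
    """Return (canonical_name, matched_variant, position) for each catalog hit."""
    hits = []
    for canonical, variants in CATALOG.items():
        for variant in variants:
            for m in re.finditer(r"\b" + re.escape(variant) + r"\b", text):
                hits.append((canonical, variant, m.start()))
    return hits

print(extract_entities("The POC compared IBM and SAS on the same content."))
```

A production catalog would also normalize overlapping variants and feed the resulting entities into facets.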
Semantic Infrastructure – Foundation: Text Analytics Features
– Auto-categorization
  – Training sets – Bayesian, vector space
  – Terms – literal strings, stemming, dictionaries of related terms
  – Rules – simple – position in text (title, body, URL)
  – Semantic networks – predefined relationships, sets of rules
  – Boolean – full search syntax – AND, OR, NOT
  – Advanced – NEAR (#), PARAGRAPH, SENTENCE (see the sketch below)
– This is the most difficult capability to develop
– Build on a taxonomy
– Combine with extraction – e.g., match if any of a list of entities occurs with other qualifying words
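A minimal sketch of how a Boolean categorization rule with a NEAR-style proximity operator might behave, assuming simple whitespace tokenization; the rule, category, and sample text are invented, and commercial rule languages are far richer.

```python
def near(tokens, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance tokens of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t.lower() == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t.lower() == term_b]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)

def matches_billing_dispute(text):
    # Hypothetical rule: (billing NEAR/5 dispute) AND NOT resolved
    tokens = text.split()
    return near(tokens, "billing", "dispute") and "resolved" not in text.lower()

print(matches_billing_dispute("Customer called about a billing dispute on the invoice"))  # True
```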
Evaluating Text Analytics Software: Start with Self-Knowledge
– Strategic and business context
  – Information problems – what they are, how severe
  – Strategic questions – why, what value you expect from the taxonomy / text analytics, how you are going to use it
– Formal process – a knowledge architecture audit: content, users, technology, business and information behaviors, applications – or an informal version for a smaller organization
– Text analytics strategy/model – forms, technology, people
  – Existing taxonomic resources and software
– You need this foundation both to evaluate and to develop
Evaluating Text Analytics Software: Start with Self-Knowledge
– Do you need it – and if so, what blend?
– Stand-alone taxonomy management – multiple taxonomies, languages, authors-editors
– Technology environment – ECM, enterprise search – where is it embedded?
– Publishing process – where and how is metadata being added, now and in the projected future?
  – Can it utilize auto-categorization, entity extraction, summarization?
– Is the current search adequate – can it utilize text analytics?
– Applications – text mining, BI, CI, alerts?
Evaluating Text Analytics Software: Team – Interdisciplinary
– IT – experienced with large software purchases and needs assessments
  – But text analytics is different – it is about semantics
  – Like a construction company designing your house
– Business – understands the business needs
  – But doesn't understand information
  – Like a restaurant owner doing the cooking
– Library – knows information and search
  – But doesn't understand the business; business users are non-information experts
  – Like an accountant doing financial strategy
– Team – a combination of consulting and internal staff
Semantic Infrastructure – Foundation: Design of the Text Analytics Selection Team
– An interdisciplinary team, led by information professionals
  – IT – software experience, budget, support for tests
  – Business – understands the business and the requirements
  – Library – understands information structure, search semantics, and functionality
– Much more likely to make a good decision
  – This is not a traditional IT software evaluation – it is about semantics
– Creates the foundation for implementation
Evaluating Text Analytics Software: Evaluation Process and Methodology – Two Phases
– Phase I – traditional software evaluation
  – Filter One – ask the experts – reputation, research (Gartner, etc.); market strength of the vendor, platforms, etc.
  – Filter Two – feature scorecard – minimum and must-have features; filter to the top 3 (see the sketch below)
  – Filter Three – technology filter – match to your overall scope and capabilities – a filter, not a focus
  – Filter Four – in-depth demos – 3-6 vendors
– Phase II – deep POC with two finalists – advanced features, integration, semantics
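A minimal sketch of a Filter Two feature scorecard; the features, weights, and vendor ratings below are made up for illustration.

```python
# Hypothetical feature weights (how much each capability matters to us).
WEIGHTS = {"categorization": 5, "entity_extraction": 4, "sentiment": 3, "multi_language": 2}

# Hypothetical 1-5 ratings per vendor, from demos and documentation.
vendors = {
    "Vendor A": {"categorization": 4, "entity_extraction": 3, "sentiment": 5, "multi_language": 2},
    "Vendor B": {"categorization": 5, "entity_extraction": 4, "sentiment": 2, "multi_language": 4},
}

def score(ratings):
    """Weighted sum of feature ratings."""
    return sum(WEIGHTS[feature] * rating for feature, rating in ratings.items())

shortlist = sorted(vendors, key=lambda v: score(vendors[v]), reverse=True)
print([(v, score(vendors[v])) for v in shortlist])  # rank, then keep the top vendors
```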
Evaluating Text Analytics Software: Phase II – Proof of Concept (POC)
– A 4-6 week POC – a bake-off, or a short pilot
– Measurable quality of results is the essential factor
  – Real-life scenarios, categorization with your own content
  – 2-3 rounds of develop-test-refine – not out-of-the-box
– Need SMEs as test evaluators – also to do an initial categorization of the content
– The majority of the time goes to auto-categorization
– Need to balance uniformity of results with vendor-unique capabilities – this has to be determined at POC time
– Taxonomy developers – expert consultants plus internal taxonomists
Evaluating Text Analytics Software: Phase II – POC: Range of Evaluations
– The basic question – can this stuff work at all?
– Auto-categorization against an existing taxonomy – a variety of content
  – The essential issue is the complexity of language
– Clustering – automatic node generation
– Summarization
– Entity extraction – build a number of catalogs, chosen by projected needs – for example, privacy information (SSN, phone, etc.; see the sketch below)
  – Entity examples – people, organizations, methods, etc.
  – The essential issues are scale and disambiguation
– Evaluate usability in action by taxonomists
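A minimal sketch of privacy-information extraction with regular expressions, assuming US-style SSN and phone formats; a real catalog would also validate hits and disambiguate (phone numbers vs. part numbers, for instance).

```python
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def extract_privacy_info(text):
    """Map each privacy-entity type to the matches found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

print(extract_privacy_info("Call 555-867-5309; the SSN on file is 078-05-1120."))
```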
Text Analytics Evaluation: Case Study – Self-Knowledge
– Platform – a range of capabilities – categorization, sentiment analysis, etc.
– Technical – APIs, Java-based, Linux runtime
  – Scalability – millions of documents a day
  – Import/export – XML, RDF
– Total cost of ownership
– Vendor relationship – OEM
– Usability, multiple-language support
– Team – 3 from KAPS (information) and 5-8 from Amdocs (SMEs – business, technical)
Text Analytics Evaluation: Case Study – Phase I
Vendors evaluated:
– Attensity
– SAP – Inxight
– Clarabridge
– ClearForest
– Concept Searching
– Data Harmony / Access Innovations
– Expert Systems
– GATE (open source)
– IBM
– Lexalytics
– Multi-Tes
– Nstein
– SAS
– SchemaLogic
– Smart Logic
Adjacent categories: content management, enterprise search, sentiment analysis specialty, ontology platforms
Text Analytics Evaluation: Case Study – Telecom Service Company
Selection criteria:
– History, reputation
– Full platform – categorization, extraction, sentiment
– Integration – Java, API/SDK, Linux
– Multiple languages
– Scale – millions of documents a day
– Total cost of ownership
– Ease of development – a new criterion
– Vendor relationship – OEM, etc.
Shortlist: Expert Systems, IBM, SAS (Teragram), Smart Logic
– Option – multiple vendors – sentiment plus platform
– IBM and SAS were the finalists
Text Analytics Evaluation: Case Study – POC Design Discussion: Evaluation Criteria
– Basic test design – categorize a test set
  – Scored by file name, with human testers (see the scoring sketch below)
– Categorization – call motivation
  – Accuracy level – 80-90%
  – Effort level per accuracy level
– Sentiment analysis
  – Accuracy level – 80-90%
  – Effort level per accuracy level
– Quantify development time – the main elements
– Comparing the two vendors – how to score?
  – A combination of scores and a report
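A minimal sketch of the scoring step: compare engine output to an SME "gold" categorization keyed by file name. The file names, categories, and results here are illustrative.

```python
# SME gold labels and hypothetical engine output, keyed by file name.
gold = {"call_001.txt": "billing", "call_002.txt": "outage", "call_003.txt": "billing"}
engine = {"call_001.txt": "billing", "call_002.txt": "billing"}  # call_003 left uncategorized

categorized = [f for f in gold if f in engine]
correct = [f for f in categorized if engine[f] == gold[f]]

precision = len(correct) / len(categorized)   # of what it tagged, how much was right
recall = len(correct) / len(gold)             # of all documents, how many were tagged correctly
uncategorized = 1 - len(categorized) / len(gold)

print(f"precision={precision:.1%}  recall={recall:.1%}  uncategorized={uncategorized:.1%}")
```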
Text Analytics Evaluation: Case Study – Phase II POC: Risks
– The CIO/CTO problem – this is not a regular software process
  – Language is messy, not just complex – 30% accuracy isn't 30% done; it could be 90% done
  – Variability of human categorization and expression – even among professional writers (journalists, for example)
– Categorization is iterative, not a matter of "the program works"
  – Need a realistic budget and a flexible project plan
– "Anyone can do categorization"
  – Librarians often overdo it; SMEs often get lost (keywords)
– Meta-language issues – understanding the results
  – Need to educate IT and business in their own language
Text Analytics POC Outcomes: Categorization Results (SAS vs. IBM)
– Recall – Motivation
– Recall – Actions
– Precision – Motivation: 84.3
– Precision – Actions: 100
– Uncategorized: 87.5
– Raw Precision
Text Analytics POC Outcomes: Vendor Comparisons
– Categorization results – both good; edge to SAS on precision
  – Use of relevancy scores to set thresholds
– Development environment – IBM, as a toolkit, provides more flexibility, but it also increases development effort
– Methodology – IBM enforces a good method, but takes more time
  – SAS can be used in exactly the same way
– SAS has a much more complete set of operators – NOT, DIST, START (a DIST-style check is sketched below)
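To illustrate what a distance operator buys you, here is a generic Python re-implementation of a DIST-style check (all terms within a window of n tokens, in any order); this is not SAS's or IBM's actual rule syntax, and the terms and sample sentence are invented.

```python
from itertools import product

def dist(tokens, terms, n):
    """True if every term occurs and all of them fit within a window of n tokens."""
    positions = []
    for term in terms:
        hits = [i for i, t in enumerate(tokens) if t.lower() == term]
        if not hits:
            return False
        positions.append(hits)
    # Try every combination of one occurrence per term.
    return any(max(combo) - min(combo) <= n for combo in product(*positions))

tokens = "the customer wants to cancel the premium service plan".split()
print(dist(tokens, ["cancel", "service"], 4))  # True: the terms are 3 tokens apart
```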
Text Analytics POC Outcomes: Vendor Comparisons – Functionality
– Sentiment analysis
  – SAS has a workbench; IBM would require more development
  – SAS also has statistical modeling capabilities
– Entity and fact extraction – seems basically the same
  – SAS can use operators for improved disambiguation
– Summarization
  – SAS has it built in
  – IBM could develop it using categorization rules – but it is not clear that would be as effective without the operators
– Conclusion: both can do the job; edge to SAS
– Now the fun begins – development
Text Analytics Development: Foundation
– An articulated information management strategy (knowledge map)
  – Content, structures, and metadata
  – Search, ECM, applications – and how they are used across the enterprise
  – Community information needs
– The text analytics team
– The POC establishes the preliminary foundation – it needs to be expanded and deepened
  – Content – the full range, as the basis for rules and training
  – Additional SMEs – for content selection and refinement
– Taxonomy – is the starting point for categorization suitable?
– Databases – the starting point for entity catalogs
Text Analytics Development: Enterprise Environment – Case Studies
– A tale of two taxonomies – it was the best of times, it was the worst of times
– Basic approach:
  – Initial meetings – project planning
  – High-level knowledge map – content, people, technology
  – Contextual and information interviews
  – Content analysis
  – Draft taxonomy – validation interviews, refinement
  – Integration and governance plans
Text Analytics Development: Enterprise Environment – Case One – Taxonomy Plus 7 Facets
– Taxonomy of subjects/disciplines:
  – Science > Marine Science > Marine Microbiology > Marine Toxins
– Facets (a data-structure sketch follows):
  – Organization > Division > Group
  – Clients > Federal > EPA
  – Instruments > Environmental Testing > Ocean Analysis > Vehicle
  – Facilities > Division > Location > Building X
  – Methods > Social > Population Study
  – Materials > Compounds > Chemicals
  – Content Type > Knowledge Asset > Proposals
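A minimal sketch of how faceted metadata like this might be represented: each document carries one path per facet. The facet names follow the Case One example; the document and its values are hypothetical.

```python
# One hypothetical document's faceted metadata: facet name -> path.
doc_facets = {
    "Subjects": ["Science", "Marine Science", "Marine Microbiology", "Marine Toxins"],
    "Clients": ["Federal", "EPA"],
    "Facilities": ["Division", "Location", "Building X"],
    "Content Type": ["Knowledge Asset", "Proposals"],
}

def facet_path(facets, name):
    """Render one facet as a ' > ' delimited path string."""
    return " > ".join(facets.get(name, []))

print(facet_path(doc_facets, "Subjects"))
# Science > Marine Science > Marine Microbiology > Marine Toxins
```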
Text Analytics Development: Enterprise Environment – Case One – Taxonomy Plus 7 Facets
– Project owner – the KM department – which included RM and business process
– Involvement of the library – critical
– A realistic budget and a flexible project plan
– Successful interviews – built on context
  – An overall information strategy – where the taxonomy fits
– A good draft taxonomy and extended refinement
  – Software, process, team – trained the library staff
  – A good selection and number of facets
– Final plans and hand-off to the client
Text Analytics Development: Enterprise Environment – Case Two – Taxonomy Plus 4 Facets
– Taxonomy of subjects/disciplines:
  – Geology > Petrology
– Facets:
  – Organization > Division > Group
  – Process > Drill a Well > File Test Plan
  – Assets > Platforms > Platform A
  – Content Type > Communication > Presentations
Text Analytics Development: Enterprise Environment – Case Two – Taxonomy Plus 4 Facets
– Environment issues
  – The value of the taxonomy was understood, but not its complexity and scope
  – Under-budgeted and under-staffed
  – Location – not in KM – tied to RM and software
– A solution looking for the right problem
  – The importance of an internal library staff
  – The difficulty of merging internal expertise with taxonomy expertise
Text Analytics Development: Enterprise Environment – Case Two – Taxonomy Plus 4 Facets
– Project issues
  – A project mindset – not an infrastructure mindset
  – The wrong kind of project management
    – The special needs of a taxonomy project
    – The importance of integration – with the team and the company
  – The project plan treated as more important than the results
    – Rushing to meet deadlines doesn't work as well with semantics as it does with software
Text Analytics Development: Enterprise Environment – Case Two – Taxonomy Plus 4 Facets
– Research issues
  – Not enough research – and the wrong people
  – Interference from non-taxonomy concerns – communication
  – Misunderstanding of research – they wanted tinker-toy connections ("interview 1 implies conclusion A")
– Design issues
  – Not enough facets
  – The wrong set of facets – business-oriented, not information-oriented
  – Ill-defined facets – too complex an internal structure
Text Analytics Development: Conclusion – Risk Factors
– The political-cultural-semantic environment
  – Not simple resistance – something more subtle: re-interpretation of specific conclusions and their sequence, and of the relative importance of specific recommendations
– Understanding the project scope
– Access to content and people – it needs to be enthusiastic access
– The importance of a unified project team
  – Working communication, not just weekly meetings
Text Analytics Development: Case Study 2 – POC – Telecom Client
– Demo of SAS Enterprise Content Categorization
Text Analytics Development: Best Practices – Principles
– The importance of ongoing maintenance and refinement
– Need a dedicated taxonomy team working with SMEs
– Work with application developers to incorporate text analytics into new applications
– The importance of metrics and feedback – software and social
– Questions:
  – What are the important subjects (and how are they changing)?
  – What information do users need?
  – How is their information related to other silos?
Text Analytics Development: Best Practices – Principles
– Process
  – A realistic budget – this is not a nice-to-have add-on
  – A flexible project plan – semantics are complex and messy
    – Time estimates are difficult; objective success measures are too
  – The transition from development to maintenance is fluid
– Resources
  – An interdisciplinary team is essential
  – The importance of communication – across the different languages
  – Merging internal and external expertise
Text Analytics Development: Best Practices – Principles
– Categorization taxonomy structure
  – The trade-off between depth and complexity of rules
  – Multiple avenues – facets, terms, rules, etc. – there is no single right balance
  – The recall-precision balance is application-specific
  – Training sets are starting points; rules rule
  – The need for custom development
– Technology
  – Basic integration – XML (see the sketch below)
  – Advanced – combine unstructured and structured data in new ways
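A minimal sketch of basic XML integration: emitting categorization results for a downstream system. The element names and scores are illustrative, not any vendor's schema.

```python
import xml.etree.ElementTree as ET

def to_xml(doc_id, categories):
    """Serialize (category, relevancy) pairs for one document as XML."""
    doc = ET.Element("document", id=doc_id)
    for name, relevancy in categories:
        ET.SubElement(doc, "category", name=name, relevancy=str(relevancy))
    return ET.tostring(doc, encoding="unicode")

print(to_xml("call_001.txt", [("billing", 0.92), ("cancellation", 0.41)]))
# <document id="call_001.txt"><category name="billing" relevancy="0.92" /> ... </document>
```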
Text Analytics Development: Best Practices – Risk Factors
– Value understood, but not the complexity and scope
– A project mindset – a software project, and then you're done
– Not enough research on user information needs and behaviors
  – Talking to the right people and asking the right questions
  – Getting beyond "all of the above" surveys
– Not enough resources, or the wrong resources
– Lack of enthusiastic access to content and people
– Bad design – starting with the wrong type of taxonomy
– Categorization is not library science – it is more like cognitive anthropology
Semantic Infrastructure Development: Conclusion
– Text analytics is the foundation for semantic infrastructure
– Evaluation of text analytics is different from IT software evaluation
  – The POC is essential – it is the foundation of development
  – The difference between taxonomy and categorization: concepts vs. text in documents
– Enterprise context – strategic self-knowledge
  – An infrastructure resource, not a project
  – An interdisciplinary team and applications
– Integration with other initiatives and technologies
  – Text mining, data mining, sentiment and beyond – everything!
Questions?
Tom Reamy
KAPS Group – Knowledge Architecture Professional Services