Text Analytics Workshop Applications Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services


1 Text Analytics Workshop Applications Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com

2 Agenda
• Text Analytics Applications
  – Integration with Search
  – Faceted Navigation
  – Integration with ECM: Metadata, Auto-categorization
  – Platform for Information Applications: Enterprise (internal and external), Commercial, Structure for Social

3 Text Analytics and Search - Elements
• Facet – orthogonal dimension of metadata
• Entity / Noun Phrase – metadata value of a facet
• Entity extraction – feeds facets, signature, ontologies
• Taxonomy and categorization rules
• Auto-categorization – aboutness, subject facets
• People – tagging, evaluating tags, fine-tuning rules and taxonomy

4 Essentials of Facets
• Facets are not categories
  – Categories are what a document is about – limited number
  – Entities are contained within a document – any number
• Facets are orthogonal – mutually exclusive – dimensions
  – An event is not a person is not a document is not a place.
• Facets – variety – of units, of structure
  – Numerical range (price), Location – big to small
  – Alphabetical, Hierarchical – taxonomic
• Facets are designed to be used in combination
  – Wine where color = red, price = excessive, location = California, and sentiment = snotty
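
The wine query on this slide can be read as a conjunction of independent facet filters. The following is a minimal sketch of that idea, not anything taken from the workshop: the `Wine` record, the sample catalog, and the threshold that turns "excessive" into a number (100) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    # Each field is one facet; all sample values below are invented.
    name: str
    color: str
    price: float
    location: str
    sentiment: str


CATALOG = [
    Wine("Hypothetical Cabernet", "red", 120.0, "California", "snotty"),
    Wine("Hypothetical Merlot", "red", 18.0, "California", "friendly"),
    Wine("Hypothetical Chardonnay", "white", 95.0, "California", "snotty"),
    Wine("Hypothetical Rioja", "red", 110.0, "Spain", "snotty"),
]


def facet_filter(items, **selected):
    """Keep items that satisfy every selected facet.

    A facet constraint is either a literal value (exact match) or a
    predicate function (useful for numerical ranges such as price).
    """
    def matches(item):
        for facet, wanted in selected.items():
            value = getattr(item, facet)
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [item for item in items if matches(item)]


# "excessive" is modelled as price >= 100, which is an assumption.
results = facet_filter(
    CATALOG,
    color="red",
    price=lambda p: p >= 100,
    location="California",
    sentiment="snotty",
)
names = [wine.name for wine in results]
print(names)
```

Because each facet is checked independently, adding or dropping a constraint never requires restructuring the others, which is the practical meaning of "orthogonal" here.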

5 Advantages of Faceted Navigation
• More intuitive – easy to guess what is behind each door
  – Simplicity of internal organization
  – 20 questions – we know and use
• Dynamic selection of categories
  – Allow multiple perspectives
  – Ability to Handle Compound Subjects
• Systematic Advantages – fewer elements
  – 4 facets of 10 nodes each (40 nodes) cover the combinations of a 10,000 node taxonomy
  – Ability to Handle Compound Subjects
• Flexible – can be combined with other navigation elements
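
The "4 facets of 10 nodes = 10,000" figure is the product of the facet sizes (10 × 10 × 10 × 10), while the number of nodes someone has to maintain is only their sum (40). A short sketch that checks this arithmetic, using placeholder facet and value names that are purely illustrative:

```python
from itertools import product
from math import prod

# Four hypothetical facets with ten values each.
facets = {
    f"facet_{i}": [f"f{i}_v{j}" for j in range(10)]
    for i in range(1, 5)
}

# Nodes that must be created and maintained: the sum of facet sizes.
elements_to_maintain = sum(len(values) for values in facets.values())

# Distinct compound subjects that can be expressed: the product of facet sizes.
combinations = prod(len(values) for values in facets.values())

# Cross-check by enumerating every combination explicitly.
enumerated = sum(1 for _ in product(*facets.values()))

print(elements_to_maintain, combinations, enumerated)
```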

6 Developing Facets: Tools and Techniques
Software Tools – Entity Extraction
• Dictionaries – variety of entities, coverage, specialty
  – Cost of update – service or in-house
  – 50+ predefined entity types
  – 800,000 people, 700,000 locations, 400,000 organizations
• Rules
  – Capitalization, text – Mr., Inc.
  – Advanced – proximity and frequency of actions, associations
  – Need people to continually refine the rules
• Entities and Categorization
  – Total number and pattern of entities = a type of aboutness of the document
  – Bar Code, Fingerprint
  – SAS – integration of entities (concepts) and categorization
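
As an illustration of the two basic techniques on this slide, dictionary lookup and simple capitalization/text rules (Mr., Inc.), here is a minimal sketch. It does not reproduce any vendor's extraction engine: the tiny `LOCATIONS` dictionary, the regular expressions, and the sample sentence are all assumptions made for the example.

```python
import re

# Stand-in for a real location dictionary (which the slide sizes at 700,000 entries).
LOCATIONS = {"California", "Boston"}

# Rule: an honorific followed by one or more capitalized words is a person.
PERSON_RULE = re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")

# Rule: capitalized words followed by "Inc." are an organization.
ORG_RULE = re.compile(r"\b([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*),? Inc\.")


def extract_entities(text):
    """Return a mapping of facet name to the sorted entities found in text."""
    entities = {"person": set(), "organization": set(), "location": set()}
    entities["person"].update(m.group(1) for m in PERSON_RULE.finditer(text))
    entities["organization"].update(m.group(1) for m in ORG_RULE.finditer(text))
    for name in LOCATIONS:
        # Dictionary lookup on whole words only.
        if re.search(rf"\b{re.escape(name)}\b", text):
            entities["location"].add(name)
    return {facet: sorted(values) for facet, values in entities.items()}


sample = "Mr. John Smith joined Acme Widgets, Inc. in California."
found = extract_entities(sample)
print(found)
```

Rules this simple miss many cases and produce false positives, which is consistent with the slide's point that people are needed to keep refining them.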

7 Three Environments
• E-Commerce
  – Catalogs, small uniform collections of entities
  – Uniform behavior – buy this
• Enterprise
  – More content, more types of content
  – Enterprise Tools – Search, ECM
  – Publishing Process – tagging, metadata standards
• Internet
  – Wildly different amount and type of content, no taggers
  – General Purpose – Flickr, Yahoo
  – Vertical Portal – selected content, no taggers

8 Three Environments: E-Commerce

10 Enterprise Environment – When and how to add metadata
• Enterprise Content – different world than eCommerce
  – More Content, more kinds, more unstructured
  – Not a catalog to start – less metadata and structured content
  – Complexity – not just content but variety of users and activities
• Combination of human and automatic metadata – ECM
  – Software aided – suggestions, entities, ontologies
• Enterprise – Question of Balance / strategy
  – More facets = more findability (up to a point)
  – Fewer facets = lower cost to tag documents
• Issues
  – Not enough facets
  – Wrong set of facets – business not information
  – Ill-defined facets – too complex internal structure

11 Facets and Taxonomies
Enterprise Environment – Taxonomy, 7 facets
• Taxonomy of Subjects / Disciplines:
  – Science > Marine Science > Marine microbiology > Marine toxins
• Facets:
  – Organization > Division > Group
  – Clients > Federal > EPA
  – Instruments > Environmental Testing > Ocean Analysis > Vehicle
  – Facilities > Division > Location > Building X
  – Methods > Social > Population Study
  – Materials > Compounds > Chemicals
  – Content Type – Knowledge Asset > Proposals

12 External Environment – Text Mining, Vertical Portals
• Internet Content
  – Scale – impacts design and technology – speed of indexing
  – Limited control – Association of publishers to selection of content to none
  – Major subtypes – different rules – metadata and results
• Complex queries and alerts
  – Terrorism taxonomy + geography + people + organizations
• Text Mining
  – General or specific content and facets and categories
  – Dedicated tools or component of Portal – internal or external
• Vertical Portal
  – Relatively homogeneous content and users
  – General range of questions
  – More specific targets – the document, not a web site

13 Internet Design
• Subject Matter taxonomy – Business Topics
  – Finance > Currency > Exchange Rates
• Facets
  – Location > Western World > United States
  – People – Alphabetical and/or Topical – Organization
  – Organization > Corporation > Car Manufacturing > Ford
  – Date – Absolute or range (1-1-01 to 1-1-08, last 30 days)
  – Publisher – Alphabetical and/or Topical – Organization
  – Content Type – list – newspapers, financial reports, etc.

17 Integrated Facet Application Design Issues - General
• What is the right combination of elements?
  – Faceted navigation, metadata, browse, search, categorized search results, file plan
• What is the right balance of elements?
  – Dominant dimension or equal facets
  – Browse topics and filter by facet
• When to combine search, topics, and facets?
  – Search first and then filter by topics / facet
  – Browse/facet front end with a search box

18 Integrated Facet Application Design Issues - General
• Homogeneity of Audience and Content
• Model of the Domain – broad
  – How many facets do you need?
  – More facets and let users decide
  – Allow for customization – can’t define a single set
• User Analysis – tasks, labeling, communities
  – Issue – the labels people use to describe their business differ from the labels they use to find information
• Match the structure to domain and task
  – Users can understand different structures

19 Automatic Facets – Special Issues
• Scale requires more automated solutions
  – More sophisticated rules
• Rules to find and populate existing metadata
  – Variety of types of existing metadata – Publisher, title, date
  – Multiple implementation Standards – Last Name, First / First Name, Last
• Issue of disambiguation:
  – Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
  – Same word, different entity – Ford and Ford
• Number of entities and thresholds per results set / document
  – Usability, audience needs
• Relevance Ranking – number of entities, rank of facets
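
The "same person, different name" case and the mixed "Last, First" / "First Last" conventions can be sketched as a normalization step followed by a simple resolution pass. This is an illustrative approach under stated assumptions, not the method used in the workshop: the honorific list, the middle-initial heuristic, and the rule that a bare surname resolves only when exactly one full name shares it are all choices made for this example. It does not address the harder "Ford and Ford" case (a person versus a company), which needs context.

```python
HONORIFICS = {"mr", "mrs", "ms", "dr"}


def normalize(name):
    """Return (first, last); first is None when only a surname is given."""
    name = name.strip()
    if "," in name:
        # "Last, First" form: swap into "First Last".
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    tokens = [tok.rstrip(".") for tok in name.split()]
    tokens = [tok for tok in tokens if tok.lower() not in HONORIFICS]
    if not tokens:
        raise ValueError("no name tokens found")
    if len(tokens) > 2:
        # Drop single-letter middle initials such as the "X" in "Henry X. Ford".
        middle = [tok for tok in tokens[1:-1] if len(tok) > 1]
        tokens = [tokens[0]] + middle + [tokens[-1]]
    if len(tokens) == 1:
        return (None, tokens[0])
    return (tokens[0], tokens[-1])


def resolve(mentions):
    """Map each mention to a canonical "First Last" string where possible."""
    keys = [normalize(mention) for mention in mentions]
    firsts_by_last = {}
    for first, last in keys:
        if first is not None:
            firsts_by_last.setdefault(last, set()).add(first)
    resolved = {}
    for mention, (first, last) in zip(mentions, keys):
        if first is None:
            candidates = firsts_by_last.get(last, set())
            # Only resolve a bare surname when it is unambiguous.
            if len(candidates) == 1:
                first = next(iter(candidates))
        resolved[mention] = f"{first} {last}" if first else last
    return resolved


mentions = ["Henry Ford", "Mr. Ford", "Henry X. Ford", "Ford, Henry"]
canonical = resolve(mentions)
print(canonical)
```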

20 Putting it all together – Infrastructure Solution
• Facets, Taxonomies, Software, People
• Combine formal power with ability to support multiple user perspectives
• Facet System – interdependent, map of domain
• Entity extraction – feeds facets, signatures, ontologies
• Taxonomy & Auto-categorization – aboutness, subject
• People – tagging, evaluating tags, fine-tuning rules and taxonomy
• The future is the combination of simple facets with rich taxonomies with complex semantics / ontologies

21 Putting it all together – Infrastructure Solution
• Integration with ECM
  – Central Team – Metadata
    Create dictionaries of entities
    Develop text analytics catalogs
  – Publishing Process
    Software suggests entities, categorization
    Author's task is simple – answer yes or no rather than think of keywords
• Enterprise Search
  – Integrate at metadata level – build advanced presentation and refine results
  – Integrate into relevance

22 Text Analytics Platform – Multiple Applications
• Platform for Information Applications
  – Content Aggregation
  – Duplicate Documents – save millions!
  – Text Mining – BI, CI – sentiment analysis
  – Social – Hybrid folksonomy / taxonomy / auto-metadata
  – Social – expertise, categorize tweets and blogs, reputation
  – Ontology – travel assistant – SIRI
• Integrate with Applications
• Text into data – predictive analytics
• Use your Imagination!

23 New Applications in Social Media
Behavior Prediction – Telecom Customer Service
• Problem – distinguish customers likely to cancel from mere threats
• Analyze customer support notes
• General issues – creative spelling, second hand reports
• Develop categorization rules
  – First – distinguish cancellation calls – not simple
  – Second – distinguish cancel what – one line or all
  – Third – distinguish real threats

24 New Applications in Social Media
Behavior Prediction – Telecom Customer Service
• Basic Rule
  (START_20, (AND,
    (DIST_7,"[cancel]", "[cancel-what-cust]"),
    (NOT,(DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
  – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
  – ask about the contract expiration date as she wanted to cxl teh acct
• Combine sophisticated rules with sentiment statistical training and Predictive Analytics and behavior monitoring
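
To make the shape of the Basic Rule concrete, here is a rough Python emulation. It is a sketch under several assumptions and not the categorization engine's actual behavior: START_20 is read as "the cancel term appears within the first 20 tokens", DIST_n as "within n tokens of the cancel term", the NOT clause is evaluated per cancel occurrence, and the word lists behind each bracketed concept are invented for the example (the real concept definitions are not given on the slide).

```python
import re

# Hypothetical word lists for each bracketed concept in the rule.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl", "canceling", "cancelling"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if", "unless"},
}


def positions(tokens, concept):
    """Token indexes at which any word of the concept occurs."""
    words = CONCEPTS[concept]
    return [i for i, tok in enumerate(tokens) if tok in words]


def within(tokens, anchor, concept, max_dist):
    """True if the concept occurs within max_dist tokens of the anchor index."""
    return any(abs(i - anchor) <= max_dist for i in positions(tokens, concept))


def is_cancellation(text, start=20, near=7, exclude=10):
    """Assumed reading of START_20 / DIST_7 / NOT DIST_10 from the slide."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    for anchor in positions(tokens, "cancel"):
        if anchor >= start:
            continue  # START_20: cancel term must fall in the first 20 tokens
        has_target = within(tokens, anchor, "cancel-what-cust", near)
        hedged = any(
            within(tokens, anchor, concept, exclude)
            for concept in ("one-line", "restore", "if")
        )
        if has_target and not hedged:
            return True
    return False


conditional = ("customer called to say he will cancell his account if the "
               "does not stop receiving a call from the ad agency.")
direct = "ask about the contract expiration date as she wanted to cxl teh acct"

print(is_cancellation(conditional), is_cancellation(direct))
```

Under these assumptions the note containing "if" near the cancel term is excluded by the NOT clause, while the "cxl teh acct" note matches. Including misspellings such as "cancell" and "cxl" in the word list is one way to cope with the "creative spelling" the previous slide mentions.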

25 New Applications: Wisdom of Crowds
Crowd Sourcing Technical Support
• Example – Android User Forum
• Develop a taxonomy of products, features, problem areas
• Develop Categorization Rules:
  – “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
  – Find product & feature – forum structure
  – Find problem areas in response, nearby text for solution
• Automatic – simply expose lists of “solutions”
  – Search Based application
• Human mediated – experts scan and clean up solutions

26 New Directions in Social Media
Text Analytics, Text Mining, and Predictive Analytics
• Two Systems of the Brain
  – Fast, System 1, Immediate patterns (TM)
  – Slow, System 2, Conceptual, reasoning (TA)
• Text Analytics – pre-processing for TM
  – Discover additional structure in unstructured text
  – Behavior Prediction – adding depth in individual documents
  – New variables for Predictive Analytics, Social Media Analytics
  – New dimensions – 90% of information
• Text Mining for TA – Semi-automated taxonomy development
  – Bottom Up – terms in documents – frequency, date, clustering
  – Improve speed and quality – semi-automatic
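
The "bottom up" step, surfacing frequent terms in documents as candidates for a taxonomy, can be sketched with a plain document-frequency count. This is only one possible starting point (the slide also mentions date and clustering, which are not shown); the stopword list, the `min_docs` threshold, and the sample forum posts are assumptions for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "my", "after", "a", "is", "and", "to"}


def candidate_terms(docs, min_docs=2):
    """Return (term, document_count) pairs for terms found in at least min_docs documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency, not raw frequency).
        tokens = set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS
        doc_freq.update(tokens)
    frequent = [(term, n) for term, n in doc_freq.items() if n >= min_docs]
    # Most widespread terms first; ties broken alphabetically.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


posts = [
    "Battery drains fast after the update",
    "The update broke my battery widget",
    "Camera crashes after update",
]
candidates = candidate_terms(posts)
print(candidates)
```

A human editor would then review such a list and decide which terms become taxonomy nodes, which matches the "semi-automatic" framing on the slide.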

27 Questions? Tom Reamy tomr@kapsgroup.com KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com

