Program Chair: Tom Reamy Chief Knowledge Architect

1 Text Analytics Forum
Program Chair: Tom Reamy, Chief Knowledge Architect, KAPS Group
Author: Deep Text

2 Agenda
Introduction – Welcome
Overview of Conference
Text Analytics Introduction
  What is it?
  What is it good for?
Results of TAF Survey
Key Ideas – Present and Future
Questions

3 Text Analytics Forum (TAF) Introduction Conference Highlights and Themes
Newest member of the Information Today family of conferences – first year of many?
KMWorld – TA is a means of enriching KM: enriched content, expertise, collaboration
Taxonomy Boot Camp (TBC) – TA minds the gap between taxonomy and content; new knowledge organizations, cognitive-based
Enterprise Search & Discovery (ESD) – TA is the best means of improving search: faceted search, semi-automated subject tagging
SharePoint – all major TA vendors integrate with it
Hybrid model – software characterizes a document, which is then sent to the author/editor for a human check

4 Text Analytics Forum (TAF) Introduction Conference Highlights and Themes
Overview of the field of text analytics – general and current market, by Seth Grimes
Two tracks – technical / business & applications (could be development and applications)
  Technical: AI and TA, Cognitive computing, graph databases, Text and Data, ML vs. Rules, taxonomy, Auto-categorization
  Business / Applications: Search & TA, Fake News & Ads, TA and Taxonomy Case Studies, New Applications, Issues in Applications
Ask the Experts Panel – some questions about the field of TA
  We want your questions

5 Text Analytics Forum (TAF) Introduction Deep Text: The Book – Who Am I?
Professional student / independent consultant – all but 6 years
From History of Ideas to programmer – AI (“only 2 years away”)
Games – Galactic Gladiators/Adventures – still available
KAPS Group – 13 years, network of consultants (“hiring”)
  Taxonomy to text analytics
  Consulting, development – platform and applications
  Strategy, Smart Start, Search, Smart Social Media
  TA Training (1 day to 1 month), TA Audit
Partners – Synaptica, SAS, IBM, Expert System, Smartlogic, etc.
Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
Presentations, Articles, White Papers –

6 “A treasure trove of technical detail, likely to become a definitive source on text analytics.” – Kirkus Reviews

7 Text Analytics Forum (TAF) Introduction What is Text Analytics?
Text analytics is the use of software and knowledge models to analyze and utilize structures in poly-structured text.
Text Mining – NLP, statistical, predictive, machine learning
  Different skills and mindset – math & data, not language
Annotation/Extraction – entities and facts (known and unknown), concepts, events
  Catalogs with variants, rule-based
Sentiment Analysis – entities and sentiment words; statistics & rules
Summarization
  Dynamic – based on a search query term
  Document – based on primary topics, position in document
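The catalog-with-variants style of extraction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `CATALOG` contents (borrowed from the client list purely as sample data) and the helper names `build_patterns` and `extract` are hypothetical. Matching is kept case-sensitive so that an acronym such as DOT does not fire on the ordinary word "dot".

```python
import re

# Hypothetical catalog: canonical entity -> known surface variants.
CATALOG = {
    "Department of Transportation": ["Department of Transportation", "Dept. of Transportation", "DOT"],
    "Food and Drug Administration": ["Food and Drug Administration", "FDA"],
}


def build_patterns(catalog):
    """Compile one case-sensitive, word-bounded regex per canonical entity."""
    patterns = {}
    for canonical, variants in catalog.items():
        # Longest variants first, so a full name is preferred over a shorter variant at the same position.
        alternatives = sorted(variants, key=len, reverse=True)
        patterns[canonical] = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    return patterns


def extract(text, patterns):
    """Return (canonical entity, matched variant, start offset) for every hit, in text order."""
    hits = []
    for canonical, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append((canonical, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


patterns = build_patterns(CATALOG)
hits = extract("The FDA and the Dept. of Transportation met; DOT agreed.", patterns)
for canonical, variant, offset in hits:
    print(offset, variant, "->", canonical)
```

Every variant resolves to one canonical entity, which is what lets later steps (counting, categorization, sentiment) treat "DOT" and "Dept. of Transportation" as the same thing.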

8 Text Analytics Forum (TAF) Introduction What is Text Analytics?
Auto-categorization = the brains of the outfit
  Training sets – Bayesian, Vector space
  Terms – literal strings, stemming, dictionary of related terms
  Boolean – full search syntax: AND, OR, NOT
  Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
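To make the Boolean and proximity operators concrete, here is a small Python sketch. The semantics given to DIST, ORDDIST and SENTENCE below are assumptions for illustration (token-distance windows and a naive sentence split); real categorization engines define their own syntax and behavior. The rule `payment_fraud_rule` and its terms are hypothetical.

```python
import re

# Hypothetical operator semantics, loosely modeled on the slide's operator names.


def tokens(text):
    """Lowercased word tokens from a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(toks, term):
    return [i for i, tok in enumerate(toks) if tok == term]


def dist(toks, a, b, n):
    """DIST(n): a and b occur within n token positions of each other, in either order."""
    return any(abs(i - j) <= n for i in positions(toks, a) for j in positions(toks, b))


def orddist(toks, a, b, n):
    """ORDDIST(n): b follows a within n token positions."""
    return any(0 < j - i <= n for i in positions(toks, a) for j in positions(toks, b))


def same_sentence(text, a, b):
    """SENTENCE: a and b co-occur in one sentence (naive split on . ! ?)."""
    return any(a in tokens(s) and b in tokens(s) for s in re.split(r"[.!?]+", text))


def payment_fraud_rule(text):
    """Hypothetical rule: (ORDDIST(2) wire, transfer OR embezzlement) AND NOT simulation."""
    toks = tokens(text)
    return (orddist(toks, "wire", "transfer", 2) or "embezzlement" in toks) and "simulation" not in toks


print(payment_fraud_rule("The wire transfer was flagged."))           # True
print(payment_fraud_rule("A wire transfer simulation for training."))  # False
```

The point of the ordered and windowed operators is precision: "transfer the wire" does not satisfy ORDDIST(2) wire, transfer, even though both words are present.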

9 Text Analytics Forum (TAF) Introduction What is Text Analytics Good For?
Just about anything textual you can think of
Enterprise: Search, BI, CI, Financial Services, eDiscovery, etc.
  Fraud – function word patterns
  Adding text (depth and intelligence) to all data-based applications
  Whole new applications – customers likely to cancel, new?
Social: Social Media analysis – adding text to data
  Sentiment analysis – beyond positive and negative
  Fake news – multiple module model
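One common way to capture function word patterns is a relative-frequency profile of function words, compared across texts with cosine similarity. The sketch below assumes that approach; the word list and helper names are hypothetical, and it shows only the feature extraction, not a validated fraud detector.

```python
import math
import re
from collections import Counter

# Illustrative list only; stylometric studies typically use far larger function-word sets.
FUNCTION_WORDS = ["i", "we", "you", "the", "a", "and", "but", "not", "never", "to", "of"]


def profile(text):
    """Relative frequency of each function word among all word tokens."""
    toks = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tok for tok in toks if tok in FUNCTION_WORDS)
    total = len(toks) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


baseline = profile("We reviewed the claim and we paid it.")
statement = profile("I never did it. I was not there.")
print(round(cosine(baseline, statement), 3))
```

Function words are useful here because writers rarely control them consciously, so a sharp shift from a person's usual profile can be a signal worth a human look.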

10 Text Analytics Forum (TAF) Introduction Future Directions: Survey Results – 2017
Important areas:
  Business Intelligence – 87%
  Decision Support – 83%
  Financial Intelligence – 81%
  KM-Productivity – 80%
  Search / Search Apps – 78%
  Security – 77%
  Compliance – 76%
  Voice of Customer – 73%
  Social Media Analysis – 69%

11 Text Analytics Forum (TAF) Introduction Future Directions: Survey Results – 2017
Who is driving TA?
  R&D – 25%
  IT – 22%
  Rest are minor
Factors slowing adoption of TA
  Lack of knowledge/value – 43%
  Financial – 18%
  Lack of in-house expertise – 11%
What new capabilities?
  Deep Learning, ML, AI – 23%

12 Text Analytics Forum (TAF) Introduction Future Directions: Survey Results – 2017
What do you like about TA software?
  Ease of use
  Configurability
  Accuracy, quality of results
What don’t you like?
  Difficult
  No one solution – domain specific
Most difficult aspect of TA initiatives?
  Data preparation
  Language complexity
  Understanding business needs, domain resources

13 Text Analytics Forum (TAF) Introduction
Key Ideas / Trends in Text Analytics

14 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics
AI / Deep Learning to the Rescue?
  Humans obsolete or empowered?
Machine Learning vs. Rules-based
Poly-structured text – Content Types and Sections
New Knowledge Structures – Cognitive & Social

15 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics: Deep Learning
Neural networks – from the 1980s
New = size and speed
  Larger networks can learn better and faster
  Multiple networks = more “intelligence” – one network’s output is fed to other networks
Strongest in areas like image recognition, physical patterns
Weakest – concepts, subjects, deep language, metaphors, etc.

16 Text Analytics Forum (TAF) Introduction Deep Text vs. Deep Learning
Deep Learning is a dead end
  Accuracy – 60-70%
  Black box – no way to improve it except indirect manipulation of the input
    Watson – “We don’t know how or why it works”
  Susceptible to bias – hard to fix
  Domain specific – data, not deep understanding
  No common sense (things fall, don’t wink in and out of existence)
  No strategy to get there (faster is not enough)
Major – loss of quality – who is training whom?
  Project personality and intelligence – on everything!
Extra benefits of a Deep Text approach – multiple InfoApps

17 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics: Automatic Taxonomy
Most text analytics vendors offered it – very poor results, dropped
New techniques – getting better, but don’t give up your taxonomist day job
Automatic – but not a taxonomy: clusters of co-occurring terms
  Suggest terms and relationships
  Text mining on steroids
“Automatic” – huge human effort to design the approach, mathematics, select content, seed taxonomies, keyword selection, data prep – then voila!
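The "clusters of co-occurring terms" idea can be illustrated with simple document-level pair counting: pairs that keep appearing together become candidate related terms for a taxonomist to review. This is a deliberately minimal sketch (hypothetical stopword list, raw counts instead of a statistical association measure), not how any particular vendor builds automatic taxonomies.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "for", "on", "is"}


def terms(doc):
    """Distinct non-stopword terms in a document."""
    return {t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS}


def suggest_related(docs, min_docs=2):
    """Count document-level co-occurrence of term pairs; keep pairs seen in >= min_docs documents."""
    pair_counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(terms(doc)), 2):
            pair_counts[pair] += 1
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_docs]


docs = ["Taxonomy and ontology design", "Ontology and taxonomy tools", "Search relevance tuning"]
print(suggest_related(docs))  # [(('ontology', 'taxonomy'), 2)]
```

Note what the output is: a suggested relationship with no type and no direction. Deciding whether it is broader, narrower, or merely related is still the taxonomist's job.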

18 AI and Taxonomy AI: Past and Present

19 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics
AI / Deep Learning to the Rescue?
Machine Learning vs. Rules-based
  Right kind of rules – general structure
  Learning with rules, ML with structure
Poly-structured text – Content Types and Sections
New Knowledge Structures – Cognitive & Social

20 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics
Machine Learning – Deep Learning, but less
  Limited granularity – high-level categories, very orthogonal
  Faster to get started and get to 60% – then the wall
ML – scale – can do millions of texts
  But both require upfront development – and once done, both can handle the same amount of content
Do rules take more effort to develop?
  Some studies show it is less: “A rule-based system recoups its value in one month, compared with almost five years under the statistics-based approach”

21 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics
Rules-based
  Rule-based system reported 92 percent accuracy and a fourfold increase in productivity
  Less up-front cost, and less time spent refining
Statistical approach
  Maximum accuracy achieved, 72 percent; productivity doubled
Why do IT departments favor ML?
  ML uses programmers and statisticians – more of them available than librarians, taxonomists, metadata specialists, puzzle people
Future = combine machine learning and rules
  Application level to categorization language level
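A combination of machine learning and rules at the application level might look like the sketch below: hand-written rules take precedence when they fire, and a model score decides otherwise. The precedence scheme, the rules, and `stub_model` (a stand-in for a trained classifier's probability output) are all hypothetical.

```python
from typing import Callable, List, Optional

# A rule returns True (force the category), False (veto it), or None (no opinion).
Rule = Callable[[str], Optional[bool]]


def hybrid_decide(text: str, rules: List[Rule], ml_score: Callable[[str], float], threshold: float = 0.5) -> bool:
    """Rules decide when they fire (first firing rule wins); otherwise fall back to the ML score."""
    for rule in rules:
        verdict = rule(text)
        if verdict is not None:
            return verdict
    return ml_score(text) >= threshold


# Hypothetical rules and model stand-in.
def veto_simulation(text: str) -> Optional[bool]:
    return False if "simulation" in text.lower() else None


def force_embezzlement(text: str) -> Optional[bool]:
    return True if "embezzlement" in text.lower() else None


def stub_model(text: str) -> float:
    return 0.7 if "fraud" in text.lower() else 0.2


rules = [veto_simulation, force_embezzlement]
print(hybrid_decide("Possible fraud in Q3", rules, stub_model))       # True (model score 0.7)
print(hybrid_decide("Fraud simulation exercise", rules, stub_model))  # False (vetoed by rule)
```

In this sketch list order is the only conflict-resolution mechanism, so placing vetoes first is a deliberate design choice; a combination at the categorization-language level would instead mix rule terms and learned weights inside a single category definition.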

22 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics
AI / Deep Learning to the Rescue?
Machine Learning vs. Rules-based
Poly-structured text – Content Types and Sections
  Deep Text as a foundation for multiple applications
  Using sections for better auto rules
New Knowledge Structures – Cognitive & Social

23 Text Analytics Forum (TAF) Introduction Adding Structure to Unstructured Content
Content Type – defined by sections
  Blogs, Announcements, Articles, Press Releases, News, Case Reports, Correspondence
Sections
  Metadata and text indicators – rules to find them
  Document level: Title-Keywords, Abstract, Summary, etc.
  Special sections – Methods, Objectives, Results, etc.
  Data patterns – dates, addresses – need context rules
Weights – from ignoring all but section text to sophisticated weighting
Clusters and machine learning – at the section level, not the document level
  Clusters as sections, clusters within sections
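Section-level weighting can be sketched as a weighted term count, assuming the sections have already been identified (for example, by metadata and text-indicator rules). The section names and weight values are hypothetical; setting every weight but one to 0.0 reproduces the "ignore all but section text" end of the range.

```python
import re

# Hypothetical weights; a weight of 0.0 ignores a section entirely.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "methods": 0.5, "body": 1.0}


def term_count(text, term):
    """Whole-word, case-insensitive occurrences of term in text."""
    return len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", text.lower()))


def weighted_score(sections, term, weights=SECTION_WEIGHTS):
    """Sum of per-section term counts times section weights (unknown sections default to 1.0)."""
    return sum(weights.get(name, 1.0) * term_count(text, term) for name, text in sections.items())


doc = {
    "title": "Taxonomy design",
    "abstract": "A taxonomy study",
    "methods": "We built a taxonomy and a taxonomy tool",
    "body": "No further mention",
}
print(weighted_score(doc, "taxonomy"))  # 6.0 = 3.0*1 + 2.0*1 + 0.5*2 + 1.0*0
print(weighted_score(doc, "taxonomy", {"title": 1.0, "abstract": 0.0, "methods": 0.0, "body": 0.0}))  # 1.0
```

Down-weighting a Methods section is one example of why sections matter: a term mentioned only as a tool used is weaker evidence of what the document is about than the same term in its title.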

25 Text Analytics Forum (TAF) Introduction Key Ideas in Text Analytics
AI / Deep Learning to the Rescue?
Machine Learning vs. Rules-based
Poly-structured text – Content Types and Sections
New Knowledge Structures – Cognitive & Social
  Relational Frame Theory
  Deep Psychology/marketing

26 Text Analytics Forum (TAF) Introduction New Knowledge Structures
Multiple types of knowledge organization
  Taxonomy – concepts, hierarchical
  Ontology – any type of relationship, things and concepts
  Knowledge graphs – triples, unlimited, no overall structure, best for facts
Knowledge graphs and hierarchies – best way to merge them?
  Modules, facets
  Hierarchical network models
New types – cognitive science – RFT, other?
  The brain is more than a network – universal language detector
  A child at 6-9 months can tell the difference between words – forwards and backwards – in any language
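To contrast the two structures, the sketch below stores knowledge as triples and recovers a taxonomy-style hierarchy by following a single predicate transitively. The class, the `broader` predicate name, and the sample triples are illustrative assumptions, not any graph database's API.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.index = defaultdict(set)  # (subject, predicate) -> set of objects

    def add(self, subject, predicate, obj):
        self.index[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        return set(self.index.get((subject, predicate), set()))

    def ancestors(self, subject, predicate="broader"):
        """Follow one hierarchical predicate transitively: the taxonomy view of the graph."""
        seen, stack = set(), [subject]
        while stack:
            for parent in self.objects(stack.pop(), predicate):
                if parent not in seen:  # also guards against cycles
                    seen.add(parent)
                    stack.append(parent)
        return seen


kg = TripleStore()
kg.add("dog", "broader", "mammal")         # taxonomy-style hierarchy
kg.add("mammal", "broader", "animal")
kg.add("Tom Reamy", "wrote", "Deep Text")  # a fact, the kind of triple knowledge graphs handle best
print(sorted(kg.ancestors("dog")))  # ['animal', 'mammal']
```

The graph itself imposes no overall structure; the hierarchy exists only because one predicate is treated as hierarchical, which is one simple answer to how the two models can be merged.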

27 Text Analytics Forum (TAF) Introduction AI and Taxonomy
Relational Frame Theory (RFT)
  Coordination (similarity) – a dog is the same as a hound – types of similarity? A taxonomy of similarities?
  Distinction (difference) – a white dog is different from a black dog
  Opposition – a black dog versus a white cat
  Comparison – this dog is bigger than that dog
  Spatial – this dog is on the left
  Temporal – I fed the dog before the cat
  Hierarchical – a dog is a sort of mammal
  Causal – a dog bite causes me to cry
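The RFT frames can be modeled as typed relations. The sketch below also derives the reverse of a relation where that is straightforward, reflecting what RFT calls mutual entailment (if a dog is the same as a hound, a hound is the same as a dog). Which frames are treated as symmetric, and the qualifier inverses (`bigger`/`smaller`, `before`/`after`), are simplifying assumptions of this sketch.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Frame(Enum):
    COORDINATION = "coordination"  # similarity
    DISTINCTION = "distinction"    # difference
    OPPOSITION = "opposition"
    COMPARISON = "comparison"
    SPATIAL = "spatial"
    TEMPORAL = "temporal"
    HIERARCHICAL = "hierarchical"
    CAUSAL = "causal"


SYMMETRIC = {Frame.COORDINATION, Frame.DISTINCTION, Frame.OPPOSITION}
# Hypothetical qualifier inverses used to derive the reverse relation.
INVERSE = {"bigger": "smaller", "smaller": "bigger", "before": "after", "after": "before"}


class Relation(NamedTuple):
    subject: str
    frame: Frame
    obj: str
    qualifier: Optional[str] = None


def entail(r: Relation) -> Optional[Relation]:
    """Derive the reverse relation where this sketch knows how; otherwise return None."""
    if r.frame in SYMMETRIC:
        return Relation(r.obj, r.frame, r.subject, r.qualifier)
    if r.qualifier in INVERSE:
        return Relation(r.obj, r.frame, r.subject, INVERSE[r.qualifier])
    return None  # e.g. hierarchical or causal: no simple reverse here


print(entail(Relation("this dog", Frame.COMPARISON, "that dog", "bigger")))
print(entail(Relation("dog", Frame.HIERARCHICAL, "mammal")))  # None
```

Unlike a taxonomy, which mainly encodes the hierarchical frame, this keeps all eight relation types side by side, each with its own inference behavior.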

28 Text Analytics Forum (TAF) Introduction Conclusions
AI-Deep Learning – still “two years away”
Deep Text
  Linguistic and cognitive depth – human-like learning
  Integration of multiple techniques and modules
  Infrastructure – move fast with a stable infrastructure
Enjoy the conference!
Stay tuned – next year, TAF season II – better than ever!
  New generation of text analytics software?

29 Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group – Knowledge Architecture Professional Services