Text Analytics in Action: Using Text Analytics as a Toolset TBC 4:15 p.m. - 5:00 p.m. Marjorie Hlava Semantic enrichment / Semantic Fingerprinting.


1 Text Analytics in Action: Using Text Analytics as a Toolset TBC 4:15 p.m. - 5:00 p.m. Marjorie Hlava Semantic enrichment / Semantic Fingerprinting

2 Abstract Big data inferences are increasingly used to mine huge heaps of data. The applications are endless. However, those inferences do not work well when many lines go to a single bubble. The lines and relationships must be drawn between concepts, not simply between words. Text analytics is a powerful tool, but it is a means to an end, not the end itself. The important work is in the interpretation of the data. This session outlines a highly accurate and efficient approach and provides a case study of the application.

3 Outline of the talk Using text analytics in term extraction – 3 examples – Pattern recognition – String tagging – Taxonomy control Achieving Synonymy Now what do I do with it?

4 Term clouds Good place to start Show concept landscape Basis = – Levenshtein distances – N-grams Redundant concepts, separately shown No disambiguation Not direct XML tagging
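
The two measures named on the term-cloud slide can be sketched in a few lines of Python. This is a minimal illustration, not the tool shown in the talk: the function names `levenshtein` and `word_ngrams` are chosen here for illustration, and the sample strings are taken from later slides.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def word_ngrams(text: str, n: int) -> Counter:
    """Count word n-grams (lower-cased, whitespace-tokenized)."""
    words = text.lower().split()
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


distance = levenshtein("Colombia", "Columbia")
bigrams = word_ngrams("thin film sputtering of thin film", 2)
```

A small edit distance flags strings that look alike, and n-gram counts surface frequent phrases. Neither one knows whether two strings name the same concept, which is the limitation the slide lists as "No disambiguation".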

5 Sample article

6 Normal text extraction

7 Near conceptual synonyms

8 Nonsensical suggestions

9 Small Taxonomy Near synonym, conceptual duplicate

10 Refined presentation

11 Dependent concepts

12 Ontological dependencies

13 Achieving Synonymy Find like concepts Merge the terms Choose a preferred form Build term record – Hierarchy – Equivalence – Associative
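
As a rough sketch of the term record described above, the three relationship types can be held in one small data structure. The class name `TermRecord`, its field names, and the helper `merge_terms` are assumptions made for illustration; they are not the record format of any particular thesaurus tool.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    preferred: str                                      # the chosen preferred form
    broader: list[str] = field(default_factory=list)    # hierarchy (up)
    narrower: list[str] = field(default_factory=list)   # hierarchy (down)
    used_for: list[str] = field(default_factory=list)   # equivalence (synonyms)
    related: list[str] = field(default_factory=list)    # associative


def merge_terms(preferred: str, variants: list[str]) -> TermRecord:
    """Merge like concepts: keep one preferred form and file every other
    variant as an equivalent (non-preferred) term, without duplicates."""
    seen = {preferred.lower()}
    used_for = []
    for variant in variants:
        if variant.lower() not in seen:
            seen.add(variant.lower())
            used_for.append(variant)
    return TermRecord(preferred=preferred, used_for=used_for)


record = merge_terms("FARC", ["People's Armed Forces of Colombia", "farc"])
```

Here `record.used_for` holds the spelled-out name, so a search or tagging step can treat both strings as the same concept.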

14 Overview, Upload 7K documents, search for text string, add a tag, “Columbia”

15 “Colombian” – no stemming Same document – different terms

16 Colombiana – record overlap

17 “FARC” – No Synonymy

18 “People’s Armed Forces of Colombia”, i.e., FARC, lacks synonymy, some doc overlap

19 Tag suite, no hierarchy, no equivalence, no combining tags for synonymy
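
The gap shown in slides 14 to 19 can be sketched as the difference between tagging by exact string and tagging by concept. The three documents below are invented toy data, and `tag_by_string` and `tag_by_concept` are hypothetical helper names, not functions of the tool in the screenshots.

```python
import re


def tag_by_string(docs: dict[str, str], needle: str) -> set[str]:
    """Whole-word, case-insensitive string match: no stemming, no synonymy."""
    pattern = re.compile(r"\b" + re.escape(needle) + r"\b", re.IGNORECASE)
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}


def tag_by_concept(docs: dict[str, str], variants: list[str]) -> set[str]:
    """Combine the hits of every equivalent term into one concept tag."""
    hits: set[str] = set()
    for variant in variants:
        hits |= tag_by_string(docs, variant)
    return hits


# Toy documents, invented for illustration only.
docs = {
    "d1": "Statement attributed to FARC.",
    "d2": "Report on the People's Armed Forces of Colombia.",
    "d3": "Colombian coffee exports.",
}

farc_only = tag_by_string(docs, "FARC")
colombia_only = tag_by_string(docs, "Colombia")
farc_concept = tag_by_concept(docs, ["FARC", "People's Armed Forces of Colombia"])
```

With string tagging, "FARC" and the spelled-out name land on different documents, and "Colombia" misses the document that only says "Colombian". Grouping the variants under one concept recovers the overlap.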

20 Disambiguation Bridge – Structure Bridge – Dentistry Bridge – Game Bridge – Concept

21 Now what do I do with it? Tag documents – Consistently – Even depth of treatment – Full breadth of conceptual area Insert concepts in full text or as linked data Implement in search Use for internal statistics and analysis Track industry trends Create semantic fingerprints

22

23 The AIP Thesaurus Hierarchy Term Record

24 The AIP Thesaurus: Rulebase This article is about (among other things) degenerate stars. The text string “degenerate stars” occurs zero times in the text of the article. But because the rulebase is tuned to recognize when certain other words appear near the text “star” or “stars”, the article was correctly indexed.

25 The AIP Thesaurus: Rulebase If the word “star” or “stars” appears in the same sentence as “degenerate” or “compact”, MAI applies the term “Degenerate stars” instead of just using “Stars”.
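
The rule on this slide can be sketched in Python as follows. This is only an illustration of the logic as stated on the slide; it is not MAI's rule syntax, and the function name `index_stars` and the naive sentence splitter are assumptions.

```python
import re

STAR_WORDS = {"star", "stars"}
TRIGGER_WORDS = {"degenerate", "compact"}


def index_stars(text: str) -> set[str]:
    """Apply 'Degenerate stars' when a star word shares a sentence with a
    trigger word; otherwise fall back to the broader term 'Stars'."""
    terms: set[str] = set()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & STAR_WORDS:
            if words & TRIGGER_WORDS:
                terms.add("Degenerate stars")
            else:
                terms.add("Stars")
    return terms


terms = index_stars("White dwarfs are compact stars. They cool slowly.")
```

The sample text never contains the string "degenerate stars", yet the rule still assigns that term, which is the behavior described on the previous slide.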

26 The AIP Thesaurus: Applications

27 Listing of the AIP Thesaurus terms in JATS. Includes the term, keyword-ID, weight, code.
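
The screenshot itself is not reproduced in this transcript, so the following is only a guess at what such a listing could look like. It uses the standard JATS elements `kwd-group`, `compound-kwd`, and `compound-kwd-part`. The `kwd-group-type` and `content-type` values, the weight, and the code are placeholders, and the pairing of the ID `kwd1.4` with this term is assumed for illustration.

```python
import xml.etree.ElementTree as ET


def build_kwd_group(terms: list[dict]) -> ET.Element:
    """Build a JATS keyword group with one compound keyword per term."""
    # "AIP-thesaurus" is a hypothetical group type, not taken from the slide.
    group = ET.Element("kwd-group", {"kwd-group-type": "AIP-thesaurus"})
    for term in terms:
        compound = ET.SubElement(group, "compound-kwd", {"id": term["id"]})
        for part in ("term", "weight", "code"):
            element = ET.SubElement(compound, "compound-kwd-part",
                                    {"content-type": part})
            element.text = str(term[part])
    return group


# Placeholder values: weight and code are invented for illustration.
group = build_kwd_group([
    {"id": "kwd1.4", "term": "Degenerate stars", "weight": 5, "code": "X00"},
])
xml_text = ET.tostring(group, encoding="unicode")
```

The keyword ID is what lets an inline tag in the article body point back to the matching entry in this listing.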

28 Inline tagged terms (denoted by the highlighting). The keyword ID (kwd1.4) corresponds with the name in the previous screenshot.

29 HTML Header Copyright © 2013 Access Innovations, Inc.

30 7. Content Recommender More Articles on the same topic Selected Article Search “thin film sputtering” Grants available Upcoming conferences on this topic Authors working in this space
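
One plausible way to drive the "more articles on the same topic" panel is to rank other articles by how many thesaurus terms they share with the selected one. The sketch below uses Jaccard overlap, which is a technique chosen here for illustration and not necessarily what the recommender in the talk uses. The article IDs and term sets are toy data.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared terms divided by all terms across the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(selected: str, tagged: dict[str, set[str]], k: int = 3) -> list[str]:
    """Return up to k other articles, best term overlap first."""
    target = tagged[selected]
    scored = [(jaccard(target, terms), doc_id)
              for doc_id, terms in tagged.items() if doc_id != selected]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by ID
    return [doc_id for _, doc_id in scored[:k]]


# Toy tagging results, invented for illustration only.
tagged = {
    "a1": {"Thin films", "Sputtering"},
    "a2": {"Thin films", "Sputtering", "Deposition"},
    "a3": {"Thin films"},
    "a4": {"Degenerate stars"},
}
suggestions = recommend("a1", tagged)
```

The same term overlap could in principle rank grants, conferences, or authors, provided those records are tagged with the same vocabulary.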

31 Taxonomy Driven Search Presentation

32 Copyright © 2005 - Access Innovations, Inc. Taxonomy view Thesaurus Term Record view

33 Suggested taxonomy descriptors

34 Visualization Strategies Matrix Visualization Software

35 Pattern Analysis Domain Associations

36 Pattern Analysis Gap Analyses
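
A gap analysis can be approximated by counting how often each taxonomy term is applied across the collection. The sketch below is an assumption about what such an analysis might compute: the function name `coverage_report`, the `max_share` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def coverage_report(taxonomy: set[str], tagged: dict[str, set[str]],
                    max_share: float = 0.8):
    """Return (gaps, over_covered, unmapped):
    gaps         = taxonomy terms applied to no document,
    over_covered = terms applied to more than max_share of the documents,
    unmapped     = applied terms that are missing from the taxonomy."""
    counts = Counter(term for terms in tagged.values() for term in terms)
    total = len(tagged)
    gaps = sorted(term for term in taxonomy if counts[term] == 0)
    over_covered = sorted(term for term, n in counts.items()
                          if total and n / total > max_share)
    unmapped = sorted(term for term in counts if term not in taxonomy)
    return gaps, over_covered, unmapped


# Toy data, invented for illustration only.
taxonomy = {"Stars", "Degenerate stars", "Thin films", "Sputtering"}
tagged = {
    "a1": {"Thin films", "Sputtering"},
    "a2": {"Thin films"},
    "a3": {"Thin films", "Plasma"},
}
gaps, over_covered, unmapped = coverage_report(taxonomy, tagged)
```

Unused terms point to areas the collection does not cover, while a term applied to nearly every document may be too broad and a candidate for narrower terms.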

37 Summary Taxonomy tool box Text extraction / mining for terms Gather synonyms Disambiguate terms Look for gaps and over coverage Map all conceptual groupings – Hierarchical, Associative, Equivalence Apply to content Leverage knowledge of the collection

38 Thank you Marjorie M.K. Hlava, President Access Innovations 505-998-0800 mhlava@accessinn.com

39 About Access Innovations Access Innovations are experts in content creation, enrichment, and conversion services. We provide services to semantically enrich and tag raw text into highly structured data. We deliver clean, well-formed, metadata-enriched content so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for your information. Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found! Quick Facts Founded in 1978 Headquartered in Albuquerque, NM Privately held Delivered more than 2000 engagements

