Published by David Oliver. Modified over 9 years ago.
1
Text Analytics in Action: Using Text Analytics as a Toolset TBC 4:15 p.m. - 5:00 p.m. Marjorie Hlava Semantic enrichment / Semantic Fingerprinting
2
Abstract Big data inferences are increasingly used to mine huge heaps of data, and the applications are endless. However, those inferences do not work well when many lines go to a single bubble. The lines and relationships must be drawn between concepts, not simply between words. Text analytics is a powerful tool, but it is a means to an end, not the end itself. The important work is in the interpretation of the data. This session outlines a highly accurate and efficient approach and provides a case study of the application.
3
Outline of the talk Using text analytics in term extraction – 3 examples – Pattern recognition – String tagging – Taxonomy control Achieving Synonymy Now what do I do with it?
4
Term clouds Good place to start Show concept landscape Basis = – Levenshtein distances – N-grams Redundant concepts, separately shown No disambiguation Not direct XML tagging
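The two measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tool shown in the talk; the function names `levenshtein` and `word_ngrams` are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def word_ngrams(text: str, n: int) -> list[str]:
    """All runs of n consecutive words, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(levenshtein("colombia", "colombian"))
print(word_ngrams("thin film sputtering", 2))
```

Both measures work on characters and words only. Two strings can be one edit apart and still be listed as separate entries, which is consistent with the slide's points about redundant concepts and no disambiguation.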
5
Sample article
6
Normal text extraction
7
Near conceptual synonyms
8
Nonsensical suggestions
9
Small Taxonomy Near synonym, conceptual duplicate
10
Refined presentation
11
Dependent concepts
12
Ontological dependencies
13
Achieving Synonymy Find like concepts Merge the terms Choose a preferred form Build term record – Hierarchy – Equivalence – Associative
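One way to picture the term record is as a small data structure with a slot for each of the three relationship types. The sketch below is an assumption about shape only; the field names, the `merge_terms` helper, and the broader term "Armed groups" are hypothetical and not taken from any thesaurus tool.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    preferred: str                                          # the chosen preferred form
    broader: list[str] = field(default_factory=list)        # hierarchy (up)
    narrower: list[str] = field(default_factory=list)       # hierarchy (down)
    non_preferred: list[str] = field(default_factory=list)  # equivalence (synonyms)
    related: list[str] = field(default_factory=list)        # associative


def merge_terms(preferred: str, variants: list[str]) -> TermRecord:
    """Fold like terms into one record under the preferred form."""
    synonyms = sorted({v for v in variants if v.lower() != preferred.lower()})
    return TermRecord(preferred=preferred, non_preferred=synonyms)


record = merge_terms("FARC", ["FARC", "farc", "People's Armed Forces of Colombia"])
record.broader.append("Armed groups")  # hypothetical broader term, for illustration
print(record)
```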
14
Overview, Upload 7K documents, search for text string, add a tag, “Columbia”
15
“Colombian” – no stemming Same document – different terms
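The effect of tagging by text string without stemming can be reproduced with a small sketch. The three documents and the helper names are invented for illustration, and listing variants by hand is a stand-in for real stemming or a synonym ring.

```python
import re

# Hypothetical mini-collection standing in for the uploaded documents.
documents = {
    "doc1": "Talks resumed in Colombia this week.",
    "doc2": "The Colombian government issued a statement.",
    "doc3": "FARC negotiators arrived on Monday.",
}


def tag_exact(docs: dict[str, str], search_string: str) -> set[str]:
    """IDs of documents containing the search string as a whole word."""
    pattern = re.compile(r"\b" + re.escape(search_string) + r"\b", re.IGNORECASE)
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}


def tag_with_variants(docs: dict[str, str], variants: list[str]) -> set[str]:
    """IDs of documents containing any of the listed variant forms."""
    alternatives = "|".join(re.escape(v) for v in variants)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}


print(tag_exact(documents, "Colombia"))
print(tag_with_variants(documents, ["Colombia", "Colombian"]))
```

The exact-string search misses the document that only says "Colombian", and neither search reaches the FARC document. Each variant ends up needing its own tag unless the variants are combined under one concept.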
16
Colombiana – record overlap
17
“FARC” – No Synonymy
18
“People’s Armed Forces of Colombia”, i.e., FARC, lacks synonymy, some doc overlap
19
Tag suite, no hierarchy, no equivalence, no combining tags for synonymy
20
Disambiguation – Bridge: Structure – Bridge: Dentistry – Bridge: Game – Bridge: Concept
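A rough sketch of how context can separate the senses of "bridge": count the cue words that appear alongside it and pick the sense with the most hits. The cue lists are invented for the example, only three of the senses are covered, and a production rulebase would be considerably richer.

```python
import re

# Hypothetical cue words per sense; not drawn from any real rulebase.
SENSE_CUES = {
    "Bridge (structure)": {"span", "river", "suspension", "traffic", "girder"},
    "Bridge (dentistry)": {"tooth", "teeth", "crown", "dental", "dentist"},
    "Bridge (game)": {"cards", "bid", "trump", "trick", "partner"},
}


def disambiguate(text: str, ambiguous_word: str = "bridge"):
    """Return the best-supported sense, or None if the context gives no evidence."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if ambiguous_word not in words:
        return None
    scores = {sense: len(cues & words) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("The dentist fitted a bridge over the missing tooth."))
print(disambiguate("The suspension bridge crosses the river."))
print(disambiguate("We crossed the bridge."))
```

Returning None when no cue matches leaves the term untagged rather than guessing, which keeps a wrong sense out of the index.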
21
Now what do I do with it? Tag documents – Consistently – Even depth of treatment – Full breadth of conceptual area Insert concepts in full text or as linked data Implement in search Use for internal statistics and analysis Track industry trends Create semantic fingerprints
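The slide does not define "semantic fingerprint". One plausible reading, assumed here, is the weighted set of taxonomy terms applied to a document, which can then be compared with another document's set. The helper names and sample tags below are illustrative only.

```python
import math
from collections import Counter


def fingerprint(tags: list[str]) -> Counter:
    """Weight each taxonomy term by how often it was applied to the document."""
    return Counter(tags)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Similarity of two fingerprints: 0.0 for no shared terms, 1.0 for identical profiles."""
    dot = sum(a[term] * b[term] for term in set(a) & set(b))
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


article = fingerprint(["Thin films", "Sputtering", "Thin films"])
candidate = fingerprint(["Thin films", "Sputtering"])
unrelated = fingerprint(["Degenerate stars"])

print(cosine_similarity(article, candidate))
print(cosine_similarity(article, unrelated))
```

Because the comparison runs on controlled terms rather than raw words, two documents that use different wording for the same concept can still match, provided both were tagged consistently.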
23
The AIP Thesaurus Hierarchy Term Record
24
The AIP Thesaurus: Rulebase This article is about (among other things) degenerate stars. The text string “degenerate stars” occurs zero times in the text of the article, but because the rulebase is tuned to recognize when certain other words appear near “star” or “stars”, the article was correctly indexed.
25
The AIP Thesaurus: Rulebase If the word “star” or “stars” appears in the same sentence as “degenerate” or “compact”, MAI applies the term “Degenerate stars” instead of just using “Stars”.
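The rule on this slide can be written out as a short sketch. It is not MAI's rule syntax, which the slides do not show; the function name and the simple sentence splitter are assumptions made for the example.

```python
import re


def index_star_terms(text: str) -> set[str]:
    """Apply the slide's rule sentence by sentence and collect the suggested terms."""
    terms = set()
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & {"star", "stars"}:
            if words & {"degenerate", "compact"}:
                terms.add("Degenerate stars")  # the more specific term wins
            else:
                terms.add("Stars")
    return terms


sample = "White dwarfs are compact remnants of low-mass stars. The cluster was imaged twice."
print(index_star_terms(sample))
```

The sample never contains the string "degenerate stars", yet the rule still assigns that term because "compact" and "stars" share a sentence.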
26
The AIP Thesaurus: Applications
27
Listing of the AIP Thesaurus terms in JATS. Includes the term, keyword-ID, weight, code.
28
Inline tagged terms (denoted by the highlighting). The keyword ID (kwd1.4) corresponds with the name in the previous screenshot.
29
HTML Header Copyright © 2013 Access Innovations, Inc.
30
7. Content Recommender More Articles on the same topic Selected Article Search “thin film sputtering” Grants available Upcoming conferences on this topic Authors working in this space
31
Taxonomy Driven Search Presentation
32
Copyright © 2005 - Access Innovations, Inc. Taxonomy view Thesaurus Term Record view
33
Suggested taxonomy descriptors
34
Visualization Strategies Matrix Visualization Software
35
Pattern Analysis Domain Associations
36
Pattern Analysis Gap Analyses
37
Summary Taxonomy tool box Text extraction / mining for terms Gather synonyms Disambiguate terms Look for gaps and over coverage Map all conceptual groupings – Hierarchical, Associative, Equivalence Apply to content Leverage knowledge of the collection
38
Thank you Marjorie M.K. Hlava, President Access Innovations 505-998-0800 mhlava@accessinn.com
39
About Access Innovations Access Innovations are experts in content creation, enrichment, and conversion services. We provide services to semantically enrich and tag raw text into highly structured data. We deliver clean, well-formed, metadata-enriched content so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for your information. Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found! Quick Facts – Founded in 1978 – Headquartered in Albuquerque, NM – Privately held – Delivered more than 2000 engagements