Text Analytics in Action: Using Text Analytics as a Toolset
TBC, 4:15 p.m. - 5:00 p.m.
Marjorie Hlava
Semantic enrichment / Semantic Fingerprinting

Abstract
Big data inference is increasingly used to mine huge heaps of data, and the applications are endless. However, those inferences do not work well when many lines converge on a single bubble. The lines and relationships must be drawn between concepts, not simply between words. Text analytics is a powerful tool, but it is a means to an end, not the end itself; the important work lies in interpreting the data. This session outlines a highly accurate and efficient approach and presents a case study of its application.

Outline of the talk
Using text analytics in term extraction: 3 examples
– Pattern recognition
– String tagging
– Taxonomy control
Achieving synonymy
Now what do I do with it?

Term clouds
Good place to start
Show the concept landscape
Basis:
– Levenshtein distances
– N-grams
Redundant concepts shown separately
No disambiguation
No direct XML tagging
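The slide names Levenshtein distance and n-grams as the basis for grouping cloud terms. Below is a minimal Python sketch of both measures used to flag near-duplicate terms; the function names and the thresholds (`max_edits=2`, `min_jaccard=0.5`) are illustrative assumptions, not values from the talk. Neither measure can link "FARC" to its spelled-out form, which is why synonymy has to be handled separately.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions (cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def char_ngrams(term: str, n: int = 3) -> set:
    """Character n-grams of the lower-cased term, space-padded so word edges count."""
    padded = f" {term.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Share of n-grams the two terms have in common."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def near_duplicates(terms, max_edits=2, min_jaccard=0.5):
    """Pairs of cloud terms that look like spelling or inflection variants."""
    pairs = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if (levenshtein(a.lower(), b.lower()) <= max_edits
                    or ngram_jaccard(a, b) >= min_jaccard):
                pairs.append((a, b))
    return pairs


print(near_duplicates(["Colombia", "Colombian", "Colombiana", "Columbia", "FARC"]))
```

String similarity like this only proposes candidates; an editor still decides whether "Columbia" is a misspelling or a different concept.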

Sample article

Normal text extraction

Near conceptual synonyms

Nonsensical suggestions

Small taxonomy: near synonym, conceptual duplicate

Refined presentation

Dependent concepts

Ontological dependencies

Achieving synonymy
Find like concepts
Merge the terms
Choose a preferred form
Build the term record
– Hierarchy
– Equivalence
– Associative
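One way to picture the resulting term record is a small data structure holding the three relationship types listed on the slide. The sketch is hypothetical: `TermRecord`, its field names (modelled loosely on thesaurus broader/narrower/use-for/related conventions) and the helper functions do not come from any Access Innovations product.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """Hypothetical thesaurus term record with the three relationship types."""
    preferred: str                                # chosen preferred form
    broader: set = field(default_factory=set)     # hierarchy: broader terms
    narrower: set = field(default_factory=set)    # hierarchy: narrower terms
    use_for: set = field(default_factory=set)     # equivalence: non-preferred forms
    related: set = field(default_factory=set)     # associative: related terms


def merge_synonyms(variants, preferred):
    """Merge like concepts into one record under the chosen preferred form."""
    if preferred not in variants:
        raise ValueError("the preferred form must be one of the variants")
    return TermRecord(preferred=preferred,
                      use_for={v for v in variants if v != preferred})


def build_lookup(records):
    """Map every entry form, lower-cased, to its preferred term."""
    lookup = {}
    for rec in records:
        lookup[rec.preferred.lower()] = rec.preferred
        for alt in rec.use_for:
            lookup[alt.lower()] = rec.preferred
    return lookup


farc = merge_synonyms(["FARC", "People's Armed Forces of Colombia"], preferred="FARC")
farc.related.add("Colombia")
lookup = build_lookup([farc])
print(lookup["people's armed forces of colombia"])
```

With a lookup like this, a tagger that meets either form assigns the single preferred term, which is exactly what the plain string tags in the case study below lack.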

Overview: upload 7K documents, search for a text string, and add a tag (“Columbia”)

“Colombian”: no stemming; same document, different terms

“Colombiana”: record overlap

“FARC”: no synonymy

“People’s Armed Forces of Colombia” (i.e., FARC): lacks synonymy; some document overlap

Tag suite: no hierarchy, no equivalence, no combining of tags for synonymy
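The "record overlap" and "some doc overlap" observations suggest a cheap way to surface candidate synonyms from an unstructured tag suite: compare the document sets behind each pair of string tags. This sketch uses Jaccard overlap. The document IDs and the 0.3 review threshold are made up for illustration, and a high score only flags a pair for an editor; it does not prove synonymy.

```python
def tag_overlap(tag_docs):
    """Jaccard overlap of the document sets behind each pair of string tags."""
    tags = sorted(tag_docs)
    scores = []
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            union = tag_docs[a] | tag_docs[b]
            if union:
                scores.append((a, b, len(tag_docs[a] & tag_docs[b]) / len(union)))
    return sorted(scores, key=lambda s: s[2], reverse=True)


# Hypothetical document IDs standing in for the 7K-document demo collection.
tag_docs = {
    "Colombian": {1, 2, 3, 4},
    "Colombiana": {3, 4, 5},
    "FARC": {2, 6, 7},
    "People's Armed Forces of Colombia": {6, 7, 8},
}
for a, b, score in tag_overlap(tag_docs):
    if score >= 0.3:  # arbitrary review threshold
        print(f"candidate synonyms? {a!r} ~ {b!r} ({score:.2f})")
```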

Disambiguation
Bridge (structure)
Bridge (dentistry)
Bridge (game)
Bridge (concept)
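Here is a toy illustration of context-based disambiguation for "bridge". Each sense gets a set of cue words, and the sense with the most cue hits in the passage wins. The cue lists are hypothetical, ties go to the first sense listed, and real rulebases use much richer conditions than a bag of words.

```python
import re

# Hypothetical cue words for each sense of "bridge".
SENSES = {
    "Bridge (structure)": {"span", "river", "steel", "suspension", "traffic"},
    "Bridge (dentistry)": {"tooth", "teeth", "crown", "dental", "dentist"},
    "Bridge (game)": {"cards", "trump", "bid", "contract", "partner"},
}


def disambiguate(text, senses=SENSES):
    """Return the sense with the most cue-word hits, or None if no cue occurs."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {sense: sum(w in cues for w in words) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)  # ties: first sense listed wins
    return best if scores[best] > 0 else None


print(disambiguate("The dentist fitted a bridge over the gap in her teeth."))
```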

Now what do I do with it?
Tag documents
– Consistently
– Even depth of treatment
– Full breadth of conceptual area
Insert concepts in full text or as linked data
Implement in search
Use for internal statistics and analysis
Track industry trends
Create semantic fingerprints
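The talk does not say how a semantic fingerprint is computed. One simple interpretation, assumed here, is a unit-length weight vector over the thesaurus terms an indexer matched in a document. Under that interpretation, documents with similar subject profiles get similar vectors. The function name and the counting scheme are hypothetical.

```python
import math
from collections import Counter


def semantic_fingerprint(term_hits):
    """Unit-length term-weight vector; term_hits has one entry per indexer match."""
    counts = Counter(term_hits)  # how often each term was matched
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


fp = semantic_fingerprint(["Stars", "Degenerate stars", "Stars"])
print(fp)
```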

The AIP Thesaurus: Hierarchy, Term Record

The AIP Thesaurus: Rulebase
This article is about (among other things) degenerate stars, yet the text string “degenerate stars” occurs zero times in it. Because the rulebase is tuned to recognize that certain other words appearing near “star” or “stars” signal this concept, the article was nevertheless correctly indexed.

The AIP Thesaurus: Rulebase
If the word “star” or “stars” appears in the same sentence as “degenerate” or “compact”, MAI applies the term “Degenerate stars” instead of just using “Stars”.
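The rule on this slide translates fairly directly into code. The sketch below is not MAI's rule syntax. It assumes a naive regex sentence splitter and applies the rule sentence by sentence, so a document can receive both "Degenerate stars" and "Stars" if different sentences trigger different outcomes.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def apply_star_rule(text):
    """Per sentence: 'star(s)' with 'degenerate'/'compact' -> 'Degenerate stars', else 'Stars'."""
    terms = set()
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if not words & {"star", "stars"}:
            continue  # the rule only fires on star/stars
        if words & {"degenerate", "compact"}:
            terms.add("Degenerate stars")
        else:
            terms.add("Stars")
    return terms


print(apply_star_rule("The white dwarf is a compact star. Its spectrum was measured."))
```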

The AIP Thesaurus: Applications

Listing of the AIP Thesaurus terms in JATS, including the term, keyword ID, weight, and code.
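Here is one possible way to emit such a listing with Python's standard `xml.etree.ElementTree`. The element names `kwd-group`, `compound-kwd` and `compound-kwd-part` exist in JATS. The `content-type` values `term`, `code` and `weight` are an assumption, because the transcript does not show the actual AIP markup. The term, code and weight below are placeholders.

```python
import xml.etree.ElementTree as ET


def jats_kwd_group(terms, vocab="AIP Thesaurus"):
    """Serialise thesaurus terms as a JATS-style keyword group (one encoding choice)."""
    group = ET.Element("kwd-group", {"vocab": vocab})
    for t in terms:
        kwd = ET.SubElement(group, "compound-kwd", {"id": t["id"]})
        for part in ("term", "code", "weight"):
            el = ET.SubElement(kwd, "compound-kwd-part", {"content-type": part})
            el.text = str(t[part])
    return ET.tostring(group, encoding="unicode")


# Placeholder values: the real AIP codes and weights are not shown in the transcript.
snippet = jats_kwd_group([{"id": "kwd1.4", "term": "Thin films", "code": "X0000", "weight": 0.8}])
print(snippet)
```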

Inline-tagged terms (denoted by the highlighting). The keyword ID (kwd1.4) corresponds to the name in the previous screenshot.
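Inline tagging can be sketched as a single regex pass that wraps each known term and points it back to its keyword ID. The `<term kwd-id="...">` element is hypothetical, since the transcript does not name the inline element used. Matching is case-insensitive, and longer terms are tried first so that a shorter term does not split a multiword one.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def tag_inline(text, terms):
    """Wrap known terms in a hypothetical <term kwd-id="..."> element, escaping the rest."""
    ids = {k.lower(): v for k, v in terms.items()}  # case-insensitive lookup
    if not ids:
        return escape(text)
    # Try longer terms first so "degenerate stars" is not split by "stars".
    alternatives = sorted(ids, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
                         re.IGNORECASE)
    out, last = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[last:m.start()]))
        out.append(f"<term kwd-id={quoteattr(ids[m.group(1).lower()])}>"
                   f"{escape(m.group(1))}</term>")
        last = m.end()
    out.append(escape(text[last:]))
    return "".join(out)


print(tag_inline("Thin film sputtering of oxides.", {"thin film sputtering": "kwd1.4"}))
```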

HTML header
Copyright © 2013 Access Innovations, Inc.
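The transcript does not reproduce the header itself. As an assumption, one common place to expose assigned terms to web search engines is the standard `<meta name="keywords">` element. The helper below is hypothetical; it simply escapes and joins the terms.

```python
from html import escape


def html_head_meta(title, terms):
    """Render a <head> fragment that exposes assigned terms via meta keywords."""
    return "\n".join([
        "<head>",
        f"  <title>{escape(title)}</title>",
        f'  <meta name="keywords" content="{escape(", ".join(terms))}">',
        "</head>",
    ])


print(html_head_meta("Sputtered oxide films", ["Thin films", "Sputtering"]))
```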

7. Content recommender
More articles on the same topic
Selected article
Search: “thin film sputtering”
Grants available
Upcoming conferences on this topic
Authors working in this space
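Recommending "more articles on the same topic" can be approximated by ranking articles by the similarity of their term profiles. This sketch assumes each article already has a term-weight dictionary (a fingerprint) and uses cosine similarity. The article IDs and weights in the usage are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(selected, fingerprints, k=3):
    """Other articles ranked by similarity to the selected one (zero scores dropped)."""
    target = fingerprints[selected]
    scored = [(doc, cosine(target, fp)) for doc, fp in fingerprints.items()
              if doc != selected]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


fingerprints = {  # invented article IDs and weights
    "A": {"Thin films": 1.0, "Sputtering": 1.0},
    "B": {"Thin films": 1.0},
    "C": {"Degenerate stars": 1.0},
    "D": {"Thin films": 1.0, "Sputtering": 1.0, "Plasma": 1.0},
}
print(recommend("A", fingerprints))
```

If grants, conferences or authors were given fingerprints of their own, the same scoring could populate those panels too.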

Taxonomy-driven search presentation

Taxonomy view
Thesaurus term record view

Suggested taxonomy descriptors
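To suggest descriptors as a user types, a search box can combine the equivalence relationships (entry term to preferred term) with simple substring matching on preferred terms. The function and the sample vocabulary are hypothetical. Treating "compact stars" as an entry term for "Degenerate stars" echoes the rulebase slide but is not taken from the AIP Thesaurus.

```python
def suggest_descriptors(query, preferred_terms, entry_terms, limit=5):
    """Suggest taxonomy descriptors: synonym hit first, then substring matches."""
    q = query.strip().lower()
    if not q:
        return []
    suggestions = []
    if q in entry_terms:  # equivalence: entry term -> preferred term
        suggestions.append(entry_terms[q])
    for term in sorted(preferred_terms):
        if q in term.lower() and term not in suggestions:
            suggestions.append(term)
    return suggestions[:limit]


preferred = ["Thin films", "Sputtering", "Sputter deposition", "Degenerate stars"]
entry = {"compact stars": "Degenerate stars"}  # keys lower-cased
print(suggest_descriptors("sputter", preferred, entry))
```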

Visualization strategies: matrix visualization software

Pattern analysis: domain associations
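Domain associations can be read off a term co-occurrence matrix, which is the kind of matrix that visualization tools display. Here is a minimal counting sketch with invented document term sets.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_terms):
    """Count how often each pair of terms is assigned to the same document."""
    pairs = Counter()
    for terms in doc_terms:
        for a, b in combinations(sorted(terms), 2):  # sorted: one key per unordered pair
            pairs[(a, b)] += 1
    return pairs


docs = [{"Thin films", "Sputtering"},
        {"Thin films", "Sputtering", "Plasma"},
        {"Degenerate stars"}]
print(cooccurrence(docs).most_common(3))
```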

Pattern analysis: gap analyses
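A gap analysis can start from simple counts of how many documents carry each taxonomy term. The sketch flags three things: unused terms (gaps), terms applied to more than a chosen share of the collection (over-coverage), and tags that are not yet in the taxonomy. Both thresholds and all the sample data are arbitrary assumptions.

```python
from collections import Counter


def coverage_report(taxonomy, doc_terms, min_docs=1, max_share=0.5):
    """Return (gaps, over_covered, untracked) for a taxonomy against tagged documents."""
    counts = Counter(t for terms in doc_terms for t in set(terms))  # documents per term
    n_docs = len(doc_terms) or 1  # avoid division by zero on an empty collection
    gaps = [t for t in taxonomy if counts[t] < min_docs]
    over = [t for t in taxonomy if counts[t] / n_docs > max_share]
    untracked = sorted(set(counts) - set(taxonomy))  # candidate new terms
    return gaps, over, untracked


taxonomy = ["Thin films", "Sputtering", "Plasma", "Degenerate stars"]
docs = [{"Thin films", "Sputtering"}, {"Thin films"}, {"Thin films", "Plasma"}, {"Graphene"}]
print(coverage_report(taxonomy, docs))
```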

Summary: taxonomy tool box
Text extraction / mining for terms
Gather synonyms
Disambiguate terms
Look for gaps and over-coverage
Map all conceptual groupings
– Hierarchical, associative, equivalence
Apply to content
Leverage knowledge of the collection

Thank you
Marjorie M.K. Hlava, President, Access Innovations

About Access Innovations
Access Innovations specializes in content creation, enrichment, and conversion services. We provide services that transform raw text into highly structured data through semantic enrichment and tagging. We deliver clean, well-formed, metadata-enriched content so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for your information. Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found!
Quick facts
– Founded in 1978
– Headquartered in Albuquerque, NM
– Privately held
– Delivered more than 2000 engagements