Published by Cuthbert Patrick. Modified over 9 years ago.
1
Text Analytics: A Tool for Taxonomy Development
Tom Reamy
Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
2
Agenda
Introduction
Project: Update ACM taxonomy – after 12+ years
Information Environment
Text Mining / Text Analytics
Multiple Methods / Reports
Conclusion
3
Introduction: KAPS Group
Knowledge Architecture Professional Services – Network of Consultants
Applied Theory – Faceted & emotion taxonomies, natural categories
Services:
  – Strategy – IM & KM – Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics, Social Media development, consulting
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
Program Chair – Text Analytics World – March 29-April 1 – SF
Presentations, Articles, White Papers – www.kapsgroup.com
Current – Book – Text Analytics: How to Conquer Information Overload, Get Real Value from Social Media, and Add Smart Text to Big Data
4
Introduction: Approach
Is Automatic Taxonomy Development Here Yet?
Not yet, but it is getting closer
Hybrid:
  – Taxonomists, SMEs, database analysts, text analysts
  – Text mining software – basic text analysis – power
  – Text analytics software – brains
New taxonomy terms & structure
  – Old = indexing, authors adding tags & keywords
  – New = auto-tagging, applications
5
Information Environment
Existing Taxonomy: Computing Classification System
Content:
  – Database export of Guide to the Computing Literature bibliographic records (.txt; approximately 7GB in 58 files)
  – Statistical distribution of CCS categories across the Digital Library and Guide to Computing Literature (Excel; 4 files)
  – ACM Digital Library full text files (PDFs and XML metadata, including CCS categories; approximately 170GB in 240,000 files)
  – Ralston Encyclopedia of Computer Science (PDFs and HTML of each article with XML metadata, including CCS categories; approximately 350MB in 1,850 files)
6
Text Analytics in Taxonomy Development
Case Study – Multiple Methods
Text Mining – terms in documents – frequency, date, source, etc.
  – Text Preparation – Create multiple filters
Quality – important terms, co-occurring terms
Time savings – only feasible way to scan documents
Clustering – suggested categories, chunking for editors
  – Clustering within clusters – explore
Entity Extraction – people, organizations, programming languages, hardware/devices, etc.
Joint Work Sessions – interactive exploration
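The term-frequency and co-occurrence passes listed above can be illustrated with a minimal sketch. This is not the tooling used in the project (the slides do not name it); the noise-word list, function names, and sample documents are all hypothetical, and a real filter would be refined over several passes.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical noise-word filter; a real project would refine this list iteratively.
NOISE_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not noise words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in NOISE_WORDS]

def term_frequencies(docs):
    """Count how many times each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def cooccurrences(docs):
    """Count how many documents contain each unordered pair of terms."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(terms, 2))
    return pairs

# Made-up sample titles, for illustration only.
SAMPLE_DOCS = [
    "Clustering of text documents",
    "Text mining and clustering for taxonomy development",
    "Entity extraction in text mining",
]

if __name__ == "__main__":
    print(term_frequencies(SAMPLE_DOCS).most_common(3))
    print(cooccurrences(SAMPLE_DOCS).most_common(3))
```

Pairs are stored in alphabetical order, so ("clustering", "text") and ("text", "clustering") are counted as the same co-occurrence.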
7
Case Study – Taxonomy Development
11
Case Study – Taxonomy Development
12
Multiple Sets of Reports
Keyword Frequency
  – First Pass – 3,026
  – Total – 508,941 (Get from Big Database)
  – Sub-Totals – Year: Pre-1998, By Year, By 5-year blocks
Map to other variables – Journals, Authors – basis for communities
Keywords in Abstract/Title
Cluster analysis of keyword-abstract-title
Search Terms in keyword-abstract-title
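As a rough illustration of the keyword sub-totals described above, the sketch below groups keyword counts into a pre-1998 bucket and 5-year blocks. The record layout, the function names, and the choice to start the blocks at 1998 are assumptions made for the example; the slides do not say how the blocks were aligned.

```python
from collections import Counter, defaultdict

def period_label(year, cutoff=1998, block=5):
    """Map a year to 'pre-1998' or to a 5-year block such as '1998-2002'."""
    if year < cutoff:
        return f"pre-{cutoff}"
    # Assumption: blocks are aligned to the cutoff year.
    start = cutoff + ((year - cutoff) // block) * block
    return f"{start}-{start + block - 1}"

def keyword_subtotals(records):
    """Return {period: Counter(keyword -> count)} from (year, keywords) records."""
    totals = defaultdict(Counter)
    for year, keywords in records:
        totals[period_label(year)].update(k.lower() for k in keywords)
    return totals

# Made-up bibliographic records: (publication year, author keywords).
SAMPLE_RECORDS = [
    (1995, ["Databases", "Indexing"]),
    (1999, ["Data Mining", "Databases"]),
    (2001, ["Data Mining"]),
    (2004, ["Web Services", "Data Mining"]),
]

if __name__ == "__main__":
    for period, counts in sorted(keyword_subtotals(SAMPLE_RECORDS).items()):
        print(period, dict(counts))
```

The same grouping key could be swapped for a journal or author field to produce the "map to other variables" reports.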
13
Entity Extraction – Company, Internet, Organization, Title
14
Multiple Methods – Reports
Spreadsheets – static reports
Database query reports – Create multiple slices, views, filters
Working reports – eliminate more noise words
Multiple mapping – extractions, author tags & keywords
Map – frequency in abstracts, titles, articles
Search logs – terms and phrases
Date ranges – trend reports – per terms, new words
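One way to read the "new words" trend report is as a list of terms whose first appearance falls inside a date range. The sketch below shows that reading only; the record layout, function names, and sample data are hypothetical and not taken from the project.

```python
def first_seen(records):
    """Map each term to the earliest year in which it appears."""
    earliest = {}
    for year, terms in records:
        for term in terms:
            key = term.lower()
            if key not in earliest or year < earliest[key]:
                earliest[key] = year
    return earliest

def new_terms(records, start, end):
    """Return the sorted terms whose first appearance falls within [start, end]."""
    return sorted(t for t, y in first_seen(records).items() if start <= y <= end)

# Made-up (year, terms) records, e.g. drawn from search logs or author keywords.
SAMPLE_LOG = [
    (1996, ["hypertext", "databases"]),
    (2003, ["web services", "databases"]),
    (2009, ["cloud computing", "web services"]),
]

if __name__ == "__main__":
    print(new_terms(SAMPLE_LOG, 2000, 2005))
    print(new_terms(SAMPLE_LOG, 2006, 2010))
```

Terms surfaced this way are candidates for new taxonomy nodes; an editor would still need to review them.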
17
Conclusions
Auto-taxonomy not here – yet
Scale requires semi-automated solution
Human effort – initial design, text preparation
  – Now would add more auto-categorization
Human effort – analysis & refinement – of queries, text mining, and taxonomy
Simple taxonomies are better – part of information ecosystem
  – Lower levels of terms – into auto-tagging rules
Early 2015: New Book:
  – Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
  – Title might be shorter, but it will cover all you need to know
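The idea of moving lower-level terms out of the taxonomy and into auto-tagging rules could look something like the sketch below. The rule table, node names, trigger terms, and the `min_hits` threshold are all invented for illustration; commercial text analytics tools express such rules in their own syntax.

```python
import re

# Hypothetical rules: each taxonomy node lists lower-level terms that trigger it.
TAGGING_RULES = {
    "Information retrieval": ["search engine", "query expansion", "relevance ranking"],
    "Machine learning": ["neural network", "decision tree", "clustering"],
}

def auto_tag(text, rules=TAGGING_RULES, min_hits=1):
    """Return the sorted taxonomy nodes whose trigger terms occur in the text."""
    lowered = text.lower()
    tags = []
    for node, terms in rules.items():
        # Count trigger terms that appear as whole words or phrases.
        hits = sum(
            1 for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        )
        if hits >= min_hits:
            tags.append(node)
    return sorted(tags)

if __name__ == "__main__":
    print(auto_tag("A neural network approach to query expansion"))
```

This keeps the taxonomy itself small while the detailed vocabulary lives in the rules. Exact phrase matching is a simplification: plurals and other variants would need stemming or extra trigger terms.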
18
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com