Published by Colin Lawrence. Modified over 9 years ago.
1
Introduction to Content Analytics
Ömer Sever, IBM SWG Enterprise Content Management
2
IBM Content Analytics Delivers:
– Deep, valuable business insight extracted from unstructured content
– Linguistic modeling and concept extraction through classification, NLP and text analytics
– Dynamic, secure analytics-driven search with accurate and timely results
Extract value and business success from unstructured content with Content Analytics!
Analytics Text Miner
– Organize, analyze and visualize structured, semi-structured and unstructured enterprise content
– Identify trends, patterns, correlations, anomalies and business context from collections
Natural Language Processing (NLP)
– Utilize linguistic modeling to enable semantic / entity / concept extraction from business content
– Employ advanced classification to categorize content and deliver business insight
Secure Enterprise Search
– Crawl, index and analyze data and content for secure enterprise search
– Search and explore to derive business value
Business Integration
– Integrate Content Analytics with business processes and other analytics solutions to deliver full business context
3
ICA Delivers Insight to Multiple Lines of Business
[Diagram labels: Customer Service; Product Management; Service Management; Marketing; Churn Alerts; Monitor FAQs; Market Sentiment; CSR Logs; Market Research; Transcripts; Web Blogs; Email; Sales Partners; Industry Reports; Internal Docs and Reports; Corporate Reputation; Voice of the Customer; CSR Training and Monitoring]
4
IBM Content Analytics adds value to…
Automotive Quality Insight
– Analyzing: Tech notes, call logs, online media
– For: Warranty analysis, quality assurance
– Benefits: Reduce warranty costs, improve customer satisfaction, marketing campaigns
Crime Analytics
– Analyzing: Case files, police records, 911 calls…
– For: Rapid crime solving and crime trend analysis
– Benefits: Safer communities and optimized force deployment
Healthcare Analytics
– Analyzing: E-medical records, hospital reports
– For: Clinical analysis; treatment protocol optimization
– Benefits: Better management of chronic diseases; optimized drug formularies; improved patient outcomes
Insurance Fraud
– Analyzing: Insurance claims
– For: Detecting fraudulent activity and patterns
– Benefits: Reduced losses, faster detection, more efficient claims processes
Customer Care
– Analyzing: Call center logs, emails, online media
– For: Buyer behavior, churn prediction
– Benefits: Improve customer satisfaction / retention, marketing campaigns, new revenue opportunities
Social Media for Marketing
– Analyzing: Call center notes, SharePoint, multiple content repositories
– For: Churn prediction, product/brand quality
– Benefits: Improve consumer satisfaction, marketing campaigns, find new revenue opportunities or product/brand quality issues
5
IBM Content Analytics v2.2 Highlights: A platform for rapid insight
Dynamically search and explore content for new business insight
– New Connections and Dashboard views to easily detect insights; plus add your own custom views
– Interactively assess for content preservation and decommissioning to reduce storage costs and risk
Powerful solution modeling and support for advanced classification tools for more accurate and deeper insight
– Enhanced analytics configuration tools
Deliver rapid insight to other systems, users and applications for complete business view
– Quickly generate Cognos BI reports, link between Cognos reports and ICA views; deliver analysis to IBM Case Manager solutions
6
Search and Content Analytics: How it works
Source Information: Corporate (Contact Center, Test Data, Dealer notes, ECM, etc.) and External (NHTSA, Edmunds, Consumer Reports, MotorTrend, etc.)
UIMA Pipeline + Annotators
Example text: …"car" that "smelled" "like smoke" and had "half a tank of gas"...
– Parts of speech: Noun, Verb, Noun Phrase, Prep Phrase
– Concept labels: Vehicle, Scent, 1st issue, 2nd issue
Extracted Concepts: Primary Issue: "Odor"; Related Issue: "Fuel Level"
Analyzed Content (and Data): Fine-grained control over the entities and facets that are created
[Other diagram labels: IBM Master Data Mgmt; RDB]
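The slide's example can be sketched in a few lines of plain Python. This is only an illustration of the idea of mapping phrases to concepts; it does not use UIMA or ICA, and the `CONCEPTS` table, the facet names, and the `extract_concepts` helper are all hypothetical.

```python
import re

# Hypothetical phrase-to-concept table. A real deployment would build
# dictionaries and rules in tooling rather than hard-code them like this.
CONCEPTS = {
    "car": ("Vehicle", "car"),
    "smelled like smoke": ("Issue", "Odor"),
    "half a tank of gas": ("Issue", "Fuel Level"),
}

def extract_concepts(text):
    """Return (facet, value, start, end) tuples for each phrase found."""
    found = []
    lowered = text.lower()
    for phrase, (facet, value) in CONCEPTS.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, lowered):
            found.append((facet, value, m.start(), m.end()))
    # Order annotations by where they occur in the text.
    return sorted(found, key=lambda a: a[2])

note = "Customer returned a car that smelled like smoke and had half a tank of gas."
annotations = extract_concepts(note)
for facet, value, start, end in annotations:
    print(facet, value, repr(note[start:end]))
```

Each tuple keeps the character span it came from, so the extracted concept can always be traced back to the source text.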
7
Tailor your text analysis with flexible, easy-to-use tooling
1 Develop your Custom Text Analysis with Tooling
– Build language and domain resources into a LanguageWare dictionary.
– Develop rules to spot facts, entities and relationships.
– Create and test UIMA annotators with a collection of documents.
2 Export your Custom Text Analysis
– Easily generate the annotators to be Content Analytics ready
3 Deploy your Custom Text Analysis within ICA
– Import newly created annotators via the Content Analytics administration console and associate them with a collection.
[Screenshot captions: View of Project Resources; Easy to test and verify your tailored text analysis; Easy to export your custom text analysis]
8
Text Analytics / Natural Language Processing (NLP)
The simplest text analytics scenario is to scan a set of documents written in a natural language, then:
– model the document set for predictive classification purposes, or
– populate a database or search index with the information extracted.
Text analytics also describes the application of these techniques to respond to business problems, whether independently or in conjunction with query and analysis of fielded, numerical data.
It is a truism that 80 percent of business-relevant information originates in unstructured form, primarily text.
These techniques and processes discover and present knowledge (facts, business rules, and relationships) that is otherwise locked in textual form, impenetrable to automated processing.
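As a rough sketch of the "populate a database with the information extracted" scenario, the snippet below pulls e-mail addresses out of two made-up documents and stores them in an in-memory SQLite table. The documents, the simplified `EMAIL` pattern, and the `extracted` table layout are assumptions chosen for illustration, not part of any IBM product.

```python
import re
import sqlite3

# Hypothetical input documents.
docs = {
    1: "Order 1123 arrived damaged. Contact: ann@example.com",
    2: "Please call me back, bob@example.com, about order 2207.",
}

# Deliberately simplified e-mail pattern, good enough for this sketch only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extracted (doc_id INTEGER, kind TEXT, value TEXT)")

for doc_id, text in docs.items():
    for m in EMAIL.finditer(text):
        # Store each extracted fact as a structured row.
        conn.execute(
            "INSERT INTO extracted VALUES (?, ?, ?)",
            (doc_id, "email", m.group()),
        )

rows = conn.execute(
    "SELECT doc_id, value FROM extracted ORDER BY doc_id"
).fetchall()
print(rows)
```

Once the facts are rows, they can be queried together with ordinary fielded data, which is the combination the slide describes.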
9
Text Analytics
The term text analytics describes a set of techniques:
– Linguistic
– Statistical
– Machine learning
These techniques model and structure the information content of textual sources for
– business intelligence
– exploratory data analysis
– research
– investigation.
The term is roughly synonymous with text mining.
"Text analytics" is now used more frequently in business settings, while "text mining" is used in
– life-sciences research
– government intelligence.
10
Text Analytics
Text analytics involves
– information retrieval
– lexical analysis to study word frequency distributions
– pattern recognition
– tagging/annotation
– information extraction
– data mining, including link and association analysis
– visualization
– predictive analytics
The overarching goal is, essentially, to turn text into data for analysis via application of
– natural language processing (NLP)
– analytical methods.
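Lexical analysis of word frequency distributions is the easiest item in this list to illustrate. The following is a minimal sketch using only the Python standard library; the sample sentences and the `word_frequencies` helper are made up for the example.

```python
import re
from collections import Counter

def word_frequencies(documents):
    """Count lower-cased word occurrences across a list of texts."""
    counts = Counter()
    for text in documents:
        # Treat runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

docs = [
    "The brake pedal feels soft.",
    "Soft brake pedal after service.",
    "The radio is fine.",
]
freq = word_frequencies(docs)
print(freq.most_common(3))
```

Even this crude count turns free text into numbers that can be charted or compared across collections.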
11
What is Text Mining? Text mining technology has been developed to acquire useful knowledge from large amounts of textual data
12
Natural Language Processing (NLP)
Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages.
– Natural language generation systems convert information from computer databases into readable human language.
– Natural language understanding systems convert samples of human language into more formal representations, such as parse trees or first-order logic structures, that are easier for computer programs to manipulate.
Many problems within NLP apply to both generation and understanding:
– a computer must be able to model morphology (the structure of words) in order to understand an English sentence
– a model of morphology is also needed for producing a grammatically correct English sentence.
13
Terminology and Definitions
annotation – Information about a span of text. For example, an annotation could indicate that a span of text represents a company name.
annotator – A software component that performs linguistic analysis tasks and produces and records annotations.
character rules – LRW rules to recognize sequences of characters (the LRW way to do regular expressions).
concept extraction / entity extraction – A text analysis function that identifies significant vocabulary items (such as people, places, or products) in text documents and produces a list of those items.
dictionary – A list of words that document processing uses to create annotations.
lexical analysis – The overall process by which LanguageWare segments and normalizes text.
metadata – Data about a document, such as size and modified date.
normalization – Determining a single string representation for a word or term found in text. This single string representation may also be called lemma, citation form, or canonical form. In LRW, since we include semantic normalization (IBM = International Business Machines, Big Blue), we use the term Normal Form.
parsing rules – LRW rules to recognize patterns of words; they run in the LRW rules engine.
regular expressions – A flexible means of identifying sequences of characters (such as URLs), written in a formal language that can be interpreted by a regular expression processor.
tokenization – The simple mechanical process of breaking up white-space-delimited text into words.
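Several of these terms can be shown together in a short, hedged sketch: whitespace tokenization, a dictionary that maps variants to a normal form, a regular expression for URLs, and annotations recorded as spans. This is plain Python, not LanguageWare; the `Annotation` class, the `NORMAL_FORMS` table, and the sample sentence are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    begin: int        # start offset of the span
    end: int          # end offset of the span (exclusive)
    kind: str         # what the span represents
    normal_form: str  # single canonical string for the span

# Hypothetical dictionary for semantic normalization.
NORMAL_FORMS = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "big blue": "IBM",
}
URL = re.compile(r"https?://\S+")

def tokenize(text):
    """Break whitespace-delimited text into (begin, end, token) triples."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]

def annotate(text):
    """Produce Company and URL annotations, ordered by position."""
    annotations = []
    lowered = text.lower()
    for variant, normal in NORMAL_FORMS.items():
        for m in re.finditer(r"\b" + re.escape(variant) + r"\b", lowered):
            annotations.append(Annotation(m.start(), m.end(), "Company", normal))
    for m in URL.finditer(text):
        annotations.append(Annotation(m.start(), m.end(), "URL", m.group()))
    return sorted(annotations, key=lambda a: a.begin)

text = ("Big Blue, also known as International Business Machines, "
        "publishes docs at https://example.com/docs")
tokens = tokenize(text)
anns = annotate(text)
for a in anns:
    print(a.kind, repr(text[a.begin:a.end]), "->", a.normal_form)
```

Both company mentions end up with the same normal form, which is what lets later analysis count them as one entity.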
14
Unstructured Information Mgmt Architecture (UIMA)
UIMA stands for Unstructured Information Management Architecture.
Structure of UIMA
The UIMA architecture can be thought of in four dimensions:
– It specifies component interfaces in an analytics pipeline
– It describes a set of design patterns
– It suggests two data representations:
  – an in-memory representation of annotations for high-performance analytics
  – an XML representation of annotations for integration with remote web services
– It suggests development roles allowing tools to be used by users with diverse skills
[Diagram labels: UIMA; Classification; Custom Analytics; Tokenization; Word Analytics; Named Entity Recognition; Multi-word Analytics; Language Identification]
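The pipeline idea and the two data representations can be approximated in plain Python. The sketch below is not the UIMA API and its XML output is not UIMA's own serialization format; the shared `cas` dictionary, the three toy annotators, and the element names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Each annotator shares one interface: it receives the analysis
# structure and adds its results to it in place.
def language_identifier(cas):
    # Toy heuristic, for illustration only.
    is_english = re.search(r"\b(the|and|is)\b", cas["text"].lower())
    cas["language"] = "en" if is_english else "unknown"

def tokenizer(cas):
    for m in re.finditer(r"\w+", cas["text"]):
        cas["annotations"].append(
            {"type": "Token", "begin": m.start(), "end": m.end()})

def named_entity_recognizer(cas):
    for m in re.finditer(r"\bIBM\b", cas["text"]):
        cas["annotations"].append(
            {"type": "Organization", "begin": m.start(), "end": m.end()})

PIPELINE = [language_identifier, tokenizer, named_entity_recognizer]

def run_pipeline(text):
    """In-memory representation: a dict holding text plus annotations."""
    cas = {"text": text, "language": None, "annotations": []}
    for annotator in PIPELINE:
        annotator(cas)
    return cas

def to_xml(cas):
    """XML representation of the same annotations, e.g. for a remote service."""
    root = ET.Element("document", language=cas["language"])
    for a in cas["annotations"]:
        ET.SubElement(root, "annotation", type=a["type"],
                      begin=str(a["begin"]), end=str(a["end"]))
    return ET.tostring(root, encoding="unicode")

cas = run_pipeline("IBM is the vendor.")
xml_text = to_xml(cas)
print(xml_text)
```

Because every stage has the same call signature, stages can be added, removed, or reordered by editing the `PIPELINE` list alone.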
15
ICA, NLP, LRW & UIMA? annotator- a software component that performs linguistic analysis tasks and produces and records annotations
16
Annotators
IBM® Content Analytics provides a number of UIMA annotators for advanced text analysis.
When documents are processed through the document processing pipeline, the annotators extract from unstructured content
– concepts
– words
– phrases
– classifications
– named entities
The annotators mark these extractions as annotations.
The annotations are added to the index as tokens or facets.
The annotations are used as the source for content analysis.
Some annotators support user-defined dictionaries, user-defined rules, and custom configurations.
When configuring the document processing pipeline for a collection, an administrator selects the annotators to be used.
Some of the key functions the annotators support include:
– Populating the common analysis structure to a relational database with specific text analysis results.
– Capturing special words of interest as the subject of text analytics.
– Capturing patterns of words as the subject of text analytics.
– Capturing named entities, such as persons, places, and organization names.
– Categorizing documents.
– Fundamental text analytics, such as parsing content to identify parts of speech.
– Multilingual text analytics capabilities. The results of analytics can vary based on the language of the input document.
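To make "annotations are added to the index as facets" more concrete, here is a minimal sketch of a facet index and facet counts in plain Python. It does not reflect how ICA stores its index; the `annotated_docs` sample data and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical annotator output: document id -> list of (facet, value) pairs.
annotated_docs = {
    "call-001": [("Issue", "Odor"), ("Component", "Fuel Tank")],
    "call-002": [("Issue", "Odor")],
    "call-003": [("Issue", "Noise"), ("Component", "Brake")],
}

def build_facet_index(docs):
    """Map each (facet, value) pair to the set of documents carrying it."""
    index = defaultdict(set)
    for doc_id, annotations in docs.items():
        for facet, value in annotations:
            index[(facet, value)].add(doc_id)
    return index

def facet_counts(index, facet):
    """Count documents per value for one facet."""
    return Counter(
        {value: len(ids) for (f, value), ids in index.items() if f == facet}
    )

index = build_facet_index(annotated_docs)
issue_counts = facet_counts(index, "Issue")
print(issue_counts.most_common())
```

Counts like these are the raw material for the trend and correlation views that content analysis builds on top of the annotations.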
17
LRW develops UIMA annotators, isn't that enough?
18
Developing a model is an iterative process between LRW and ICA
[Diagram labels: ICA Crawlers; ICA Document Processing; ICA Text Miner; ICA Custom Annotator Configuration]
ICA is the platform for finding insights using the annotations!
19
LanguageWare Resource Workbench 7.2 – Type Catalog
A Type Catalog has been added, allowing you to:
– Search for a UIMA type in a project or the whole workspace
– Find the resources that create and use a UIMA type
– Easily rename a type, ensuring that all resources using the UIMA type are updated together.
To open: right click, LanguageWare Type Catalog -> Find type
20
Communities
On-line communities, User Groups, Technical Forums, Blogs, Social networks, and more
– Find the community that interests you …
  – Information Management: ibm.com/software/data/community
  – Business Analytics: ibm.com/software/analytics/community
  – Enterprise Content Management: ibm.com/software/data/content-management/usernet.html
IBM Champions
– Recognizing individuals who have made the most outstanding contributions to Information Management, Business Analytics, and Enterprise Content Management communities: ibm.com/champion