1
Course Lab Introduction to IBM Watson Explorer
University of Rome «La Sapienza», Course of Business Intelligence
Ing. Vittorio Carullo, IBM Italia
2
Our Target
- Become familiar with a «real» software product used in large organizations
- Accomplish small but significant use cases in the BI arena
- Introduce advanced topics in BI, such as the use of “unstructured” information
3
Lab Schedule
Lab sessions on Tuesdays from October 17, 2017, 4-6 pm:
- Presentation of the Watson Explorer tool and its basic features (1-1.5 sessions)
- Use of the tool for conducting standard BI use cases (2-2.5 sessions)
- Use of the tool for Advanced Content Analytics (2 sessions)
4
Reference Materials
- IBM Redbook on Watson Content Analytics. Suggested chapters: 1-6; further chapters are more «technical».
- IBM Knowledge Center: m.ibm.discovery.es.nav.doc/explorer_analytics.htm. Use it just as a technical reference for product features.
5
Today’s Contents: Content Analytics
1. Text Analytics Basics in WEX
2. Linguistic Analysis (Lab 5.1)
3. Discover More with Custom Annotators (Lab 5.2)
6
1. Text Analytics Basics in WEX
7
Remember the «body»?
During the last lesson, we focused on the need to define as the text body the most significant descriptive part of our data. Examples:
- Tweets -> body = message
- Customer complaints -> body = problem description
- Quality assurance -> body = product feedback
- …
The body is automatically set as an Analyzable index field during Watson Explorer collection configuration.
8
What happens to Analyzable text?
Index fields set as analyzable are processed in a special way during the indexing phase by a WEX component called the Document Processor. To provide text analysis, the Document Processor uses a pipeline of processing «blocks» called annotators, each one specialized for a particular analytical task. The final purpose is to enrich text documents with the most complete possible set of analytical facets, i.e. metadata related to the syntax, semantics and meaning of the text. Analytical facets may be used, together with other structured facets, to search, filter and examine content and to discover insights even more powerfully.
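The idea of a pipeline of annotators that progressively enrich a document with facets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the WEX Document Processor is implemented: the `Document` class, the `run_pipeline` function and the two toy annotators are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    """A document with an analyzable body and the facets added by annotators."""
    body: str
    facets: dict = field(default_factory=dict)

    def add_facet(self, name: str, value: str) -> None:
        self.facets.setdefault(name, []).append(value)


# Hypothetical: an annotator is any function that reads a document and enriches it in place.
Annotator = Callable[[Document], None]


def run_pipeline(doc: Document, annotators: list) -> Document:
    """Apply the annotators in order, so later blocks can reuse earlier facets."""
    for annotate in annotators:
        annotate(doc)
    return doc


def token_annotator(doc: Document) -> None:
    # Toy block: split the body into lowercase tokens without punctuation.
    for token in doc.body.lower().split():
        doc.add_facet("token", token.strip(".,!?"))


def mortgage_topic_annotator(doc: Document) -> None:
    # Toy block that depends on the "token" facet produced by the previous block.
    if "mortgage" in doc.facets.get("token", []):
        doc.add_facet("topic", "mortgage")


doc = run_pipeline(Document("My mortgage payment was late."),
                   [token_annotator, mortgage_topic_annotator])
print(doc.facets["topic"])  # ['mortgage']
```

The order of the list matters: swapping the two annotators would leave the topic facet empty, which mirrors why the WEX pipeline runs language identification and linguistic analysis before the blocks that build on them.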
9
WEX Document processing pipeline
10
Document processing pipeline / 1
- Language Identification Annotator: identifies and records the language used. A wide variety of languages are supported.
- Linguistic Analysis Annotator: based on the identified language, linguistic structures (nouns, verbs, etc.) are identified and lemmatized, i.e. reduced to their canonical form.
- Dictionary Lookup Annotator: if present, this annotator searches for occurrences of particular terms contained in a specific lexicon. This can be useful for technical or specialized jargon whose words are not commonly found in the vocabulary.
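The slides do not describe how WEX identifies the language or lemmatizes words. As a rough intuition only, the sketch below swaps in two naive techniques: language identification by counting stopwords, and lemmatization through a lookup table. The stopword lists and the `LEMMAS_EN` table are tiny hypothetical samples; real annotators rely on full lexicons and statistical models.

```python
# Hypothetical, tiny stopword lists: real identifiers use much richer statistics.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "my", "was"},
    "it": {"il", "e", "di", "che", "la", "non", "mio"},
    "es": {"el", "y", "de", "que", "la", "no", "mi"},
}

# Hypothetical English lemma table: inflected form -> canonical form.
LEMMAS_EN = {"was": "be", "is": "be", "were": "be", "payments": "payment", "sent": "send"}


def identify_language(text: str) -> str:
    """Pick the language whose stopwords occur most often in the text."""
    words = text.lower().split()
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    # On a tie, max() keeps the first language in dictionary order.
    return max(scores, key=lambda lang: scores[lang])


def lemmatize(words: list) -> list:
    """Replace each known inflected form with its lemma; unknown words pass through."""
    return [LEMMAS_EN.get(w.lower(), w.lower()) for w in words]


print(identify_language("The bank closed my account and the card was blocked"))  # en
print(identify_language("Il mio conto non è stato chiuso"))  # it
print(lemmatize(["Payments", "were", "sent"]))  # ['payment', 'be', 'send']
```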
11
Document processing pipeline / 2
- Named Entity Recognition Annotator: this block recognizes words like names of persons, places or organizations, specific to the language.
- Pattern Matcher Annotator: this annotator recognizes specific expressions based on a literal pattern, like license plates, credit card numbers, social security numbers, etc.
- Content Classification Annotator: if present, this annotator classifies text content into one or more given categories, according to the topics discussed.
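A pattern matcher is essentially a set of named literal patterns applied to the text. The sketch below uses Python regular expressions for three of the examples on the slide; the pattern names and the exact formats (US social security number, 16-digit card number, Italian-style plate) are simplified assumptions for illustration, not the rules shipped with WEX.

```python
import re

# Simplified, hypothetical patterns; production rules would be stricter
# (for example, a Luhn checksum for card numbers).
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    # Italian-style plate: two letters, three digits, two letters (e.g. AB123CD).
    "it_license_plate": re.compile(r"\b[A-Z]{2} ?\d{3} ?[A-Z]{2}\b"),
}


def match_patterns(text):
    """Return (pattern name, matched text) pairs, grouped by pattern."""
    hits = []
    for name, pattern in PATTERNS.items():
        hits.extend((name, m.group()) for m in pattern.finditer(text))
    return hits


sample = "Card 4111 1111 1111 1111 was charged; SSN 123-45-6789; plate AB123CD."
for name, value in match_patterns(sample):
    print(name, value)
```

Each match would then become an annotation (and possibly a facet) on the document, just like the output of any other block in the pipeline.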
12
Document processing pipeline / 3
- Machine-Learning Annotator: if present, this block can be trained to recognize specific entities and relations. Training is performed by providing some examples of such entities (more details in the next lesson).
- Rule-Based Annotator: this annotator can combine annotations previously discovered in the pipeline and build more complex annotations using a sort of «proximity rules» (for example, a date followed by a signature).
- Custom Annotator: if present, this block contains an annotator built using coding tools. Software code may create a wide set of annotations of various types.
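The «date followed by a signature» example can be made concrete with character offsets. This is a minimal sketch, assuming annotations are stored as (type, start, end) spans and that a rule fires when the second annotation starts within a hypothetical `max_gap` characters after the first one ends; WEX rule syntax is not shown here and may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str   # annotation type, e.g. "Date" or "Signature"
    start: int  # character offset where the annotation begins
    end: int    # character offset just past the annotation


def proximity_rule(annotations, first, second, new_kind, max_gap):
    """Create a `new_kind` annotation wherever a `first` annotation is followed
    by a `second` annotation that starts at most `max_gap` characters later."""
    firsts = [a for a in annotations if a.kind == first]
    seconds = [a for a in annotations if a.kind == second]
    combined = []
    for a in firsts:
        for b in seconds:
            gap = b.start - a.end
            if 0 <= gap <= max_gap:
                combined.append(Annotation(new_kind, a.start, b.end))
    return combined


# Annotations assumed to come from earlier blocks in the pipeline.
found = [
    Annotation("Date", 0, 10),
    Annotation("Signature", 15, 27),    # 5 characters after the date
    Annotation("Signature", 200, 212),  # too far away to be combined
]
print(proximity_rule(found, "Date", "Signature", "SignedDate", max_gap=20))
# [Annotation(kind='SignedDate', start=0, end=27)]
```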
13
2. Linguistic Analysis
14
Linguistic analysis
Linguistic analysis annotators build a «grammar analysis» of the text and can identify POS (parts of speech) and phrase constituents. The result of this analysis becomes a set of «linguistic facets» that are shown together with other facets in the Facet tree:
- The Part of Speech branch contains single words divided by type (Noun, Verb, etc.)
- The Phrase Constituent branch contains sequences of words (Noun Phrase, Predicate Phrase, etc.)
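Once each word has a lemma and a part of speech, building the Part of Speech branch is a grouping-and-counting step. The sketch below assumes the tagging has already been done (the `tagged_docs` input is made-up sample data) and counts, for each lemma, how many documents contain it, which is an assumption about what the facet counts represent.

```python
from collections import Counter, defaultdict

# Hypothetical tagger output: one list of (lemma, part of speech) pairs per document.
tagged_docs = [
    [("bank", "Noun"), ("refuse", "Verb"), ("payment", "Noun")],
    [("payment", "Noun"), ("be", "Verb"), ("late", "Adjective")],
    [("card", "Noun"), ("be", "Verb"), ("block", "Verb")],
]


def part_of_speech_branch(docs):
    """Build a {POS: Counter(lemma -> number of documents containing it)} mapping."""
    branch = defaultdict(Counter)
    for doc in docs:
        for lemma, pos in set(doc):  # set(): count each lemma once per document
            branch[pos][lemma] += 1
    return branch


branch = part_of_speech_branch(tagged_docs)
print(branch["Noun"].most_common(1))  # [('payment', 2)]
print(branch["Verb"].most_common(1))  # [('be', 2)]
```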
15
Part of Speech (list)
Facet Name: Example
- Noun (general): account, payment, loan, …
- Noun (others): Equifax, Fargo, FCRA, FDCPA, XXXX
- Verb: be, have, do, make, tell, get, send, …
- Adjective: any, other, new, several, first, full, good, …
- Adverb: now, never, then, also, even, again, …
- Conjunction: and, or, but, so, for, …
- Interjection: please, yes, hi, oh, …
- Numeral: 2016, 1, 1000, 5, …
16
Phrase Constituent (list)
Facet Name: Example
- Noun Sequence: credit card, bank account, …
- Modified Noun: first payment, additional information, …
- Preposition with Noun: on time, for years, …
- Noun-Predicate: company has, I do, …
- Verb-Noun: check account, make payment, …
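To see how a phrase constituent like Noun Sequence can be derived from POS tags, the sketch below collects maximal runs of two or more consecutive nouns. It is a simplified stand-in under the assumption that the input is already tagged; the WEX phrase analysis is more sophisticated than this single rule.

```python
def noun_sequences(tagged):
    """Return maximal runs of two or more consecutive nouns, joined by spaces."""
    sequences, run = [], []
    for word, pos in tagged + [("", "END")]:  # sentinel flushes the last run
        if pos == "Noun":
            run.append(word)
        else:
            if len(run) >= 2:
                sequences.append(" ".join(run))
            run = []
    return sequences


# Hypothetical tagged sentence: "I closed my bank account and credit card".
sentence = [("I", "Pronoun"), ("closed", "Verb"), ("my", "Pronoun"),
            ("bank", "Noun"), ("account", "Noun"), ("and", "Conjunction"),
            ("credit", "Noun"), ("card", "Noun")]
print(noun_sequences(sentence))  # ['bank account', 'credit card']
```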
17
A Linguistic Facet is NOT the same as a text search
What are the «others» twos?
18
Watch out for the dots!
19
Why is Linguistic analysis useful?
This analysis can be very useful as a «first inquiry» to discover facts and clues inside the body text:
- Looking at nouns and verbs, it is possible to answer the basic question: «what are they talking about here?»
- Noun Sequences, Modified Nouns and Predicate Phrases may further clarify concepts and suggest typical patterns.
20
Let’s Have a Look
Let us open Content Miner and explore the Linguistic Facets for the Consumer Complaints 2016 collection. What can we discover?
21
Hands On: Lab 5.1
Use Linguistic Analysis in Content Miner to analyze and discover the type of activity of various companies:
- Identify at least five companies and try to understand what kind of financial activity they are usually involved in.
- Do not use the Product facet.
- Report your findings in Exercise Form 4.2.
22
Lab 5.1 Hints & Tips
Use the Noun facet in the Facets view:
- Make a list of words that are significant for the banking context (loan, mortgage, credit, etc.).
- Try to identify some groups of «activity types» by grouping similar words. Note: the activity type is an insight that you are discovering!
Use the Facet Pairs view (Company vs. Noun):
- Which nouns are most related to certain companies, and vice versa?
- Use these company-noun associations to link each company to an activity type.
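The slides do not say how the Facet Pairs view scores a company-noun association. As an assumed stand-in, the sketch below computes lift, a standard co-occurrence measure: it compares how often a company and a noun appear together with how often they would appear together by chance. The four documents and company names are made-up sample data.

```python
from collections import Counter
from itertools import product

# Hypothetical mini-collection: (company facet, set of noun facets) per document.
docs = [
    ("Bank A", {"mortgage", "payment"}),
    ("Bank A", {"mortgage", "escrow"}),
    ("Bank B", {"card", "payment"}),
    ("Bank B", {"card", "fee"}),
]


def lift_table(docs):
    """Lift = P(company, noun) / (P(company) * P(noun)).

    Values above 1 mean the pair co-occurs more often than chance would predict,
    values below 1 less often, and 0 means they never co-occur.
    """
    n = len(docs)
    company_count = Counter(company for company, _ in docs)
    noun_count = Counter(noun for _, nouns in docs for noun in nouns)
    pair_count = Counter((company, noun) for company, nouns in docs for noun in nouns)
    # Algebraically equal to the ratio of probabilities in the docstring.
    return {
        (c, w): pair_count[(c, w)] * n / (company_count[c] * noun_count[w])
        for c, w in product(company_count, noun_count)
    }


table = lift_table(docs)
print(table[("Bank A", "mortgage")])  # 2.0
print(table[("Bank A", "payment")])   # 1.0
print(table[("Bank B", "mortgage")])  # 0.0
```

Reading the table row by row suggests an activity type per company: here «mortgage» points to Bank A and «card» to Bank B, while «payment» is shared and therefore not distinctive.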
23
3. Discover More with Custom Annotators
24
Limits of Linguistic Facets
Linguistic facets are very useful to approach the analysis of unstructured text, but they are too generic. It is usually important to recognize more specific things:
- Isolate concepts belonging to a well-defined jargon/lexicon
- Catch the occurrences of certain entities that are interesting for our domain
- Understand relationships between concepts
- Understand sentiment-related facts (praise, criticism, anger, complaint, …)
More sophisticated annotators are needed!
25
Example: Car accident report analysis
Source text: This on-site investigation focused on the performance of the Certified Advanced 208-Compliant air bag system in a 2005 Ford Escape XLT 4x4 sport utility vehicle. This two-vehicle crash occurred in July 2014 at 1539 hours in the state of Colorado.
Fields extracted by Content Analytics:
- Part of car: Certified Advanced 208-Compliant air bag system
- Model year: 2005
- Manufacturer: Ford
- Model: Escape XLT 4x4
- Incident: Two-vehicle crash
- Date of incident: July 2014
- Time of incident: 1539
26
Example: Police Report
These are annotation types built using annotator blocks. Different types are shown with different colors. Annotation results are shown here as text highlights.
27
WEX Main Custom Annotators
- Dictionary Lookup Annotator
- Content Classification Annotator
- Character Rule Annotator
- Parsing Rule Annotator
- Machine Learning Annotator
- Sentiment Analysis Annotator
28
What is a ‘dictionary’ for WEX?
A dictionary consists of a list of ‘keywords’ (relevant words and phrases). In the dictionary, you may put words that are related to a certain aspect of your data that you want to investigate and analyze with new facets. For example, if you want to create a facet to analyze colors, you can select a list of keywords like ‘yellow’, ‘blue’, and ‘red’ and put them into a Color dictionary. WEX can be instructed to use dictionaries for selecting documents and enriching them with new facets. For instance, when documents are processed in WEX, words coming from POS analysis are checked against the Color dictionary; if a match occurs, the document is associated with a new Color facet.
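The Color example can be sketched as a whole-word, case-insensitive lookup of each dictionary keyword in the text. This is a simplified assumption: WEX checks dictionary entries against the words produced by linguistic analysis (so inflected forms can match their lemma), whereas this sketch matches the surface text only. The `DICTIONARIES` content, including the «Legal term» entries, is hypothetical.

```python
import re

# Hypothetical dictionaries: facet name -> keywords (single words or phrases).
DICTIONARIES = {
    "Color": {"yellow", "blue", "red"},
    "Legal term": {"power of attorney", "arbitration"},
}


def dictionary_lookup(text, dictionaries=DICTIONARIES):
    """Return {facet: sorted matching keywords} for every dictionary with a hit."""
    facets = {}
    for facet, keywords in dictionaries.items():
        found = sorted(
            kw for kw in keywords
            # \b...\b avoids matching "red" inside "credit" or "blue" inside "bluetooth".
            if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
        )
        if found:
            facets[facet] = found
    return facets


print(dictionary_lookup("The red car hit a blue van."))
# {'Color': ['blue', 'red']}
print(dictionary_lookup("I signed a power of attorney for my father."))
# {'Legal term': ['power of attorney']}
```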
29
Dictionary Examples
- Legal terms
- World countries
- IT brands
30
Define new Facets using Dictionary
Once you have created your word list, you have to map the dictionary to a new facet. There are two possible ways:
- Use the Administration Console: you have to enable the Dictionary Lookup Annotator in the annotator pipeline.
- Use Watson Explorer Content Analytics Studio: a separate tool to create complex custom annotators and deploy them into your collection.
31
Dictionary-based facet: Results
32
Hands On: Lab 5.2
Looking at the Consumer Complaints database, try to identify a possible «dictionary» that could help you discover something interesting. List some words for this dictionary, and report your list in Exercise Form 5.2.