Course Lab Introduction to IBM Watson Explorer

University of Rome «La Sapienza», Course of Business Intelligence, 2017
Ing. Vittorio Carullo, IBM Italia (v.carullo@it.ibm.com)

Our Target
Become familiar with «real» software used in large organizations.
Accomplish small but significant use cases in the BI arena.
Introduce advanced BI topics, such as the use of “unstructured” information.

Lab Schedule
Lab sessions on Tuesdays, starting October 17, 2017, from 4 to 6 pm:
Presentation of the Watson Explorer tool and its basic features (1 to 1.5 sessions)
Use of the tool for standard BI use cases (2 to 2.5 sessions)
Use of the tool for Advanced Content Analytics (2 sessions)

Reference Materials
IBM Redbook on Watson Content Analytics: http://www.redbooks.ibm.com/abstracts/sg247877.html?Open
Suggested chapters: 1-6. Later chapters are more «technical».
IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/SS8NLW_11.0.2/com.ibm.discovery.es.nav.doc/explorer_analytics.htm
Use it only as a technical reference for product features.

Today’s Contents: Content Analytics
1. Text Analytics Basics in WEX
2. Linguistic Analysis (Lab 5.1)
3. Discover More with Custom Annotators (Lab 5.2)

1. Text Analytics Basics in WEX

Remember the «body»?
In the last lesson, we focused on the need to define the text body as the most significant descriptive part of our data. Examples:
Tweets -> body = message
Customer complaints -> body = problem description
Quality assurance -> body = product feedback
…
The body is automatically set as an Analyzable index field when the Watson Explorer collection is configured.

What happens to Analyzable text?
Index fields set as analyzable are processed in a special way during the indexing phase by a WEX component called the Document Processor. To provide text analysis, the Document Processor uses a pipeline of processing «blocks» called annotators, each specialized for a particular analytical task. The goal is to enrich text documents with the most complete possible set of analytical facets, i.e. metadata about the syntax, semantics and meaning of the text. Analytical facets can then be used, together with other structured facets, to search, filter and examine content and to discover insights even more effectively.
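The annotator pipeline idea can be illustrated with a minimal Python sketch. This is not the WEX Document Processor API: the Document class, the two toy annotators and run_pipeline are hypothetical names chosen for illustration. It only shows the principle described above: annotators run in sequence, and each one adds facets to the document, possibly using facets produced by earlier stages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for an indexed document: body text plus facets."""
    body: str
    facets: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, facet: str, value: str) -> None:
        self.facets.setdefault(facet, []).append(value)


# Hypothetical annotator signature: read the document, add facet values in place.
Annotator = Callable[[Document], None]


def token_annotator(doc: Document) -> None:
    """Split the body on whitespace and strip surrounding punctuation."""
    for raw in doc.body.split():
        token = raw.strip(".,;:!?()\"'")
        if token:
            doc.add("token", token)


def acronym_annotator(doc: Document) -> None:
    """Build on the previous stage: tag all-uppercase tokens as acronyms."""
    for token in doc.facets.get("token", []):
        if len(token) >= 2 and token.isupper():
            doc.add("acronym", token)


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    # Annotators run in order, so later ones can use facets added earlier.
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(Document("Equifax ignored my FCRA dispute."),
                   [token_annotator, acronym_annotator])
print(doc.facets["acronym"])  # ['FCRA']
```

Changing the order of the annotators changes the result, because the acronym annotator depends on the tokens produced before it.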

WEX Document processing pipeline

Document processing pipeline / 1
Language Identification Annotator: detects and records the language of the text. A wide variety of languages is supported.
Linguistic Analysis Annotator: based on the identified language, linguistic structures (nouns, verbs, etc.) are identified and lemmatized, i.e. reduced to their canonical form.
Dictionary Lookup Annotator: if present, this annotator searches for occurrences of particular terms contained in a specific lexicon. This is useful for technical or domain-specific jargon whose words are not commonly found in a general vocabulary.
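As an illustration of what a language identification step does, here is a deliberately naive sketch that scores a text against small stopword lists. This is a swapped-in toy technique, not the one used by WEX; the stopword lists and the identify_language function are assumptions for the example only.

```python
import re

# Hypothetical, tiny stopword lists; real identifiers use much richer models.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "my", "was", "not"},
    "it": {"il", "e", "di", "che", "non", "la", "per", "un"},
    "fr": {"le", "et", "est", "de", "que", "pas", "la", "un"},
}


def identify_language(text: str) -> str:
    """Return the language whose stopwords occur most often, or 'unknown'."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stops for w in words)
              for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(identify_language("The payment was not credited to my account"))  # en
print(identify_language("Il pagamento non è stato accreditato"))  # it
```

Counting stopwords works only on reasonably long texts; short or mixed-language bodies would need a more robust model.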

Document processing pipeline / 2
Named Entity Recognition Annotator: this block recognizes words like names of persons, places or organizations, specific to the language.
Pattern Matcher Annotator: this annotator recognizes specific expressions that follow a literal pattern, like license plates, credit card numbers, social security numbers, etc.
Content Classification Annotator: if present, this annotator classifies text content into one or more given categories, according to the topics it deals with.
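A pattern matcher can be approximated with regular expressions. The sketch below assumes two hypothetical patterns (a US-style social security number and a 16-digit card number in groups of four); they are illustrative only and are not the patterns shipped with WEX. Each match becomes an annotation with its type, covered text and character offsets.

```python
import re
from typing import List, Tuple

# Hypothetical literal patterns, for illustration only.
PATTERNS = {
    "SocialSecurityNumber": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CreditCardNumber": re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b"),
}


def match_patterns(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (annotation type, covered text, begin, end) for every match,
    sorted by position in the text."""
    annotations = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            annotations.append((name, m.group(), m.start(), m.end()))
    return sorted(annotations, key=lambda a: a[2])


text = "Card 1234 5678 9012 3456 was charged; SSN 123-45-6789."
for annotation in match_patterns(text):
    print(annotation)
```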

Document processing pipeline / 3
Machine-Learning Annotator: if present, this block can be trained to recognize specific entities and relations. Training is performed by providing some examples of such entities (more details in the next lesson).
Rule-Based Annotator: this annotator combines annotations discovered earlier in the pipeline and builds more complex annotations using a kind of «proximity rules» (for example, a date followed by a signature).
Custom Annotator: if present, this block contains an annotator built with coding tools. Software code can create a wide set of annotations of various types.
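The «proximity rule» idea of the Rule-Based Annotator can be sketched as follows, assuming earlier annotators have already produced Date and Signature annotations with token offsets. The Annotation type, the proximity_rule function and the max_gap parameter are hypothetical; WEX rules are defined with its own tools, not with this code.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    kind: str    # annotation type, e.g. "Date" or "Signature" (hypothetical)
    begin: int   # token offset where the annotation starts
    end: int     # token offset just past the annotation


def proximity_rule(annotations: List[Annotation], first: str, second: str,
                   new_kind: str, max_gap: int = 5) -> List[Annotation]:
    """Emit `new_kind` spanning a `first` annotation and a following `second`
    annotation that starts at most `max_gap` tokens after the first one ends."""
    results = []
    for a in annotations:
        if a.kind != first:
            continue
        for b in annotations:
            if b.kind == second and 0 <= b.begin - a.end <= max_gap:
                results.append(Annotation(new_kind, a.begin, b.end))
    return results


# Hypothetical output of earlier annotators in the pipeline.
found = [
    Annotation("Signature", 3, 5),    # precedes the date: ignored
    Annotation("Date", 10, 13),
    Annotation("Signature", 15, 17),  # 2 tokens after the date: matches
    Annotation("Date", 40, 43),       # no signature nearby
]
print(proximity_rule(found, "Date", "Signature", "SignedDate"))
```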

2. Linguistic Analysis

Linguistic analysis
Linguistic analysis annotators build a «grammar analysis» of the text and can identify POS (parts of speech) and phrase constituents. The result of this analysis is a set of «linguistic facets» that are shown together with the other facets in the Facet tree:
Part of Speech branch: single words divided by type (Noun, Verb, etc.)
Phrase Constituent branch: sequences of words (Noun Phrase, Predicate Phrase, etc.)

Part of Speech (list)
Facet name: examples
Noun (general): account, payment, loan, …
Noun (others): Equifax, Fargo, FCRA, FDCPA, XXXX
Verb: be, have, do, make, tell, get, send, …
Adjective: any, other, new, several, first, full, good, …
Adverb: now, never, then, also, even, again, …
Conjunction: and, or, but, so, for, …
Interjection: please, yes, hi, oh, …
Numeral: 2016, 1, 1000, 5, …

Phrase Constituent (list)
Facet name: examples
Noun Sequence: credit card, bank account, …
Modified Noun: first payment, additional information, …
Preposition with Noun: on time, for years, …
Noun – Predicate: company has, I do, …
Verb – Noun: check account, make payment, …
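To make the Noun Sequence constituent concrete, the sketch below extracts runs of two or more consecutive nouns from text that is assumed to be already POS-tagged. The tag names (NOUN, VERB, etc.) and the hand-tagged sentence are assumptions for illustration; WEX produces its own tags internally.

```python
from typing import List, Tuple


def noun_sequences(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return maximal runs of two or more consecutive NOUN-tagged words."""
    sequences, run = [], []
    for word, pos in tagged + [("", "END")]:  # sentinel flushes the last run
        if pos == "NOUN":
            run.append(word)
        else:
            if len(run) >= 2:
                sequences.append(" ".join(run))
            run = []
    return sequences


# Hypothetical hand-tagged sentence (tag names are an assumption).
sentence = [("I", "PRON"), ("closed", "VERB"), ("my", "DET"),
            ("bank", "NOUN"), ("account", "NOUN"), ("and", "CONJ"),
            ("credit", "NOUN"), ("card", "NOUN")]
print(noun_sequences(sentence))  # ['bank account', 'credit card']
```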

A linguistic facet is NOT the same as a text search. What are the «others» twos?

Watch out for the dots!

Why is linguistic analysis useful?
This analysis is very useful as a «first inquiry» to discover facts and clues inside the body text. Looking at nouns and verbs, it is possible to answer the basic question: «what are they talking about here?» Noun Sequences, Modified Nouns and Predicate Phrases can further clarify concepts and suggest typical patterns.

Let’s Have a Look
Let us open Content Miner and explore the Linguistic Facets of the Consumer Complaints 2016 collection. What can we discover?

Hands On: Lab 5.1
Use Linguistic Analysis in Content Miner to discover the type of activity of various companies. Identify at least five companies and try to understand what kind of financial activity they are usually involved in. Do not use the Product facet. Report your findings in Exercise Form 4.2.
Content Miner link: http://172.31.1.2:8393/ui/analytics

Lab 5.1 Hints & Tips
Use the Noun facet in the Facets view. Make a list of words that are significant in the banking context (loan, mortgage, credit, etc.) and try to identify some groups of «activity types» by grouping similar words. Note: the activity type is an insight that you are discovering!
Use the Facet Pairs view (Company vs Nouns): which nouns are most related to certain companies, and vice versa? Using these associations with nouns, associate companies with activity types.
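The Facet Pairs view relates two facets by how often their values occur together. One common way to express such an association is lift, P(noun | company) / P(noun): values above 1 mean the noun is more frequent in that company's complaints than in the collection overall. This measure is an assumption for illustration and not necessarily the exact correlation value that Content Miner displays; the company names and nouns below are made up.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def facet_pairs(docs: List[Tuple[str, Set[str]]]) -> Dict[Tuple[str, str], float]:
    """docs holds one (company, set of nouns) entry per complaint.
    Returns lift = P(noun | company) / P(noun) for every observed pair."""
    n_docs = len(docs)
    company_count = Counter(company for company, _ in docs)
    noun_count = Counter(noun for _, nouns in docs for noun in nouns)
    pair_count = Counter((company, noun)
                         for company, nouns in docs for noun in nouns)
    return {
        (c, n): (k / company_count[c]) / (noun_count[n] / n_docs)
        for (c, n), k in pair_count.items()
    }


# Hypothetical complaints: (company, nouns found by linguistic analysis).
complaints = [
    ("Bank A", {"account", "fee"}),
    ("Bank A", {"account", "overdraft"}),
    ("Lender B", {"mortgage", "payment"}),
    ("Lender B", {"mortgage", "fee"}),
]
pairs = facet_pairs(complaints)
print(pairs[("Bank A", "account")])  # 2.0
```

Here «account» points to Bank A and «mortgage» to Lender B, while «fee» (lift 1.0) is shared and says little about the type of activity.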

3. Discover More With Custom Annotators

Limits of Linguistic Facets
Linguistic facets are very useful for approaching the analysis of unstructured text, but they are too generic. It is usually important to recognize more specific things:
Isolate concepts belonging to a well-defined jargon or lexicon
Catch the occurrences of certain entities that are interesting for our domain
Understand relationships between concepts
Understand sentiment-related facts (praise, criticism, anger, complaint, …)
More sophisticated annotators are needed!

Example: Car Accident Report Analysis
Part of car: Certified Advanced 208-Compliant air bag system
Model year: 2005
Manufacturer: Ford
Model: Escape XLT 4x4
Incident: Two-vehicle crash
Date of incident: July 2014
Time of incident: 1539
This on-site investigation focused on the performance of the Certified Advanced 208-Compliant air bag system in a 2005 Ford Escape XLT 4x4 sport utility vehicle. This two-vehicle crash occurred in July 2014 at 1539 hours in the state of Colorado.

Example: Police Report
These are annotation types built using annotator blocks. Different types are shown in different colors, and annotation results are shown as text highlights.

WEX Main Custom Annotators
Dictionary Lookup Annotator
Content Classification Annotator
Character Rule Annotator
Parsing Rule Annotator
Machine Learning Annotator
Sentiment Analysis Annotator

What is a ‘dictionary’ for WEX?
A dictionary consists of a list of ‘keywords’ (relevant words and phrases). In the dictionary, you put words related to a certain aspect of your data that you want to investigate and analyze with new facets. For example, to create a facet for analyzing colors, you can select keywords like ‘yellow’, ‘blue’ and ‘red’ and put them into a Color dictionary.
WEX can be instructed to use dictionaries to select documents and enrich them with new facets. For instance, when documents are processed in WEX, words coming from POS analysis are checked against the Color dictionary. If a match occurs, the document is associated with a new Color facet.
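A dictionary lookup can be sketched as a longest-match scan of the text against the keyword list. The sketch assumes a hypothetical DICTIONARIES mapping from facet name to keywords and uses simple regex tokenization instead of the WEX POS analysis; multi-word entries such as «dark blue» are tried before single words.

```python
import re
from typing import Dict, List

# Hypothetical dictionary: facet name -> keywords (single words or phrases).
DICTIONARIES: Dict[str, List[str]] = {
    "Color": ["yellow", "blue", "red", "dark blue"],
}


def dictionary_facets(text: str,
                      dictionaries: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return facet -> matched keywords, preferring the longest match."""
    tokens = re.findall(r"\w+", text.lower())
    facets: Dict[str, List[str]] = {}
    for facet, keywords in dictionaries.items():
        # Try multi-word entries first so "dark blue" wins over "blue".
        entries = sorted((kw.lower().split() for kw in keywords),
                         key=len, reverse=True)
        i = 0
        while i < len(tokens):
            for entry in entries:
                if tokens[i:i + len(entry)] == entry:
                    facets.setdefault(facet, []).append(" ".join(entry))
                    i += len(entry)
                    break
            else:  # no entry matched at this position
                i += 1
    return facets


print(dictionary_facets("The dark blue car hit a red truck.", DICTIONARIES))
# {'Color': ['dark blue', 'red']}
```

A document with no match simply gets no Color facet, which mirrors the behavior described above.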

Dictionary Examples: legal terms, world countries, IT brands

Define new Facets using a Dictionary
Once you have created your word list, you have to map the dictionary to a new facet. There are two possible ways:
Use the Administration Console: you have to enable the Dictionary Lookup Annotator in the annotator pipeline.
Use Watson Explorer Content Analytics Studio: a separate tool to create complex custom annotators and deploy them into your collection.

Dictionary-based Facet: Results

Hands On: Lab 5.2
Looking at the Consumer Complaints database, try to identify a possible «dictionary» that could help discover something interesting. Try to list some words for this dictionary and report your list in Exercise Form 5.2.