Course Lab: Introduction to IBM Watson Explorer
University of Rome «La Sapienza», Course of Business Intelligence - 2017
Ing. Vittorio Carullo, IBM Italia, v.carullo@it.ibm.com
Our Target
- Become familiar with «real» software used in large organizations
- Work through small but significant use cases in the BI arena
- Introduce advanced BI topics such as the use of unstructured information
Lab Schedule
Lab sessions on Tuesdays, starting October 17, 2017, 4 - 6 pm
- Presentation of the Watson Explorer tool and its basic features (1 – 1.5 sessions)
- Use of the tool for standard BI use cases (2 – 2.5 sessions)
- Use of the tool for Advanced Content Analytics (2 sessions)
Reference Materials
- IBM Redbook on Watson Content Analytics: http://www.redbooks.ibm.com/abstracts/sg247877.html?Open
  Suggested chapters: 1-6. Further chapters are more «technical»
- IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/SS8NLW_11.0.2/com.ibm.discovery.es.nav.doc/explorer_analytics.htm
  Use it just as a technical reference for product features
Today’s Contents: Working with a Real Data Set
- Customer Complaints Database
- Description of Data Attributes
- Raw Data Analysis
- Lab 3.1 – Browse and Analyze
- 4. Data Mining
1. Customer Complaints Database
Our Data Provider
- The CFPB (Consumer Financial Protection Bureau) is a US federal agency that collects consumers’ complaints about financial products and services
- Each week it sends complaints to companies for response
- By adding their voice, consumers help improve the financial marketplace
https://www.consumerfinance.gov/data-research/consumer-complaints/
Data and Privacy Policy
- Customer complaints are published on the CFPB site after the company responds or after 15 days, whichever comes first
- The CFPB publishes the consumer’s description of what happened only if the consumer opts to share it, and only after taking steps to remove personal information
- For more details, refer to the scrubbing policy document: http://files.consumerfinance.gov/a/assets/201503_cfpb_Narrative-Scrubbing-Standard.pdf
Example of «scrubbed» Information
- Personal information: names, ages, race, nationality, medical conditions, etc.
- Location and contact information: addresses, phone and fax numbers, emails, IP addresses, etc.
- Company names: names of businesses not directly connected to the complaint
- Employment: names of employers, occupations, student status
- Sensitive numbers: social security, account, credit or debit card, license plate, etc.
- Dollar amounts: rounded to prevent others from using exact dollar amounts to identify people
- Offensive language: abusive, vulgar, offensive, threatening, or harassing expressions
2. Description of Data Attributes
Record Example
Record Field Explanation
Field Name | Field Description | Data type
Date Received | The date the CFPB received the complaint | Date & Time
Product | The type of product the consumer identified in the complaint | Plain Text (Category)
Sub-product | The type of sub-product the consumer identified in the complaint |
Issue | The issue the consumer identified in the complaint |
Sub-issue | The sub-issue the consumer identified in the complaint |
Record Field Explanation (cont’d)
Field Name | Field Description | Data type
Consumer complaint narrative | The consumer-submitted description of "what happened" from the complaint. Consumers must opt in to share their narrative. We will not publish the narrative unless the consumer consents, and consumers can opt out at any time. The CFPB takes reasonable steps to scrub personal information from each complaint that could be used to identify the consumer. | Plain Text (Free text)
Company public response | The company's optional, public-facing response to a consumer's complaint. Companies provide a public response to the CFPB, for posting on the public database, by selecting a response from a set list of options. |
Record Field Explanation (cont’d)
Field Name | Field Description | Data type
Company | The complaint is about this company | Plain Text (Entity Name)
State | The consumer’s reported mailing state for the complaint |
ZIP code | Mailing ZIP code provided by the consumer. This field may: i) include the first five digits of a ZIP code; ii) include the first three digits of a ZIP code (if the consumer consented to publication of their complaint narrative); or iii) be blank (if the ZIP code was submitted with non-numeric values, if there are fewer than 20,000 people in the given ZIP code, or if the complaint has an address outside of the United States). |
Record Field Explanation (cont’d)
Field Name | Field Description | Data type
Tags | Data that supports easier searching and sorting of complaints submitted by or on behalf of consumers | Plain Text (Free text)
Submitted via | How the complaint was submitted to the CFPB | Plain Text (Category)
Date sent to company | The date the CFPB sent the complaint to the company | Date & Time
Company response to consumer | This is how the company responded |
Timely response? | Whether the company gave a timely response | Plain Text (Yes / No)
Consumer disputed? | Whether the consumer disputed the company’s response |
Complaint ID | The unique identification number for a complaint | Number
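To make the data types above concrete, here is a minimal Python sketch that converts one CSV row into typed values. It assumes that the CSV header uses the field names listed in the tables and that dates follow the MM/DD/YYYY pattern. The sample row is made up, and `parse_record` is a hypothetical helper, not part of Watson Explorer.

```python
import csv
import io
from datetime import datetime

# Made-up sample data. Assumption: header names match the field tables.
SAMPLE = (
    "Date Received,Product,Timely response?,Complaint ID\n"
    "10/31/2016,Mortgage,Yes,1234567\n"
)

def parse_record(row):
    """Convert the raw strings of one CSV row into typed Python values."""
    return {
        # Date & Time field (assumed MM/DD/YYYY format)
        "date_received": datetime.strptime(row["Date Received"], "%m/%d/%Y").date(),
        # Plain Text (Category) field, kept as a string
        "product": row["Product"],
        # Plain Text (Yes / No) field mapped to a boolean
        "timely_response": row["Timely response?"] == "Yes",
        # Number field
        "complaint_id": int(row["Complaint ID"]),
    }

records = [parse_record(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```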
3. Data Analysis
Data preparation on WEX: Review of Process
- Use the «Create Collection» button to create a new collection
- Type the collection name and set the collection type to «Content analytics collection»
Data preparation on WEX: Review of Process
- Click on the newly created collection name to open its control section
- Hover over the «Import» bar and click on ‘+’
- Select CSV Import
Data preparation on WEX: Review of Process
- Specify the path to the CSV file to be imported and where to find the import settings, if any
- Note: the CSV data file and the import settings file can be either on your local workstation or on the server machine
Data preparation on WEX: Review of Process
- Review the import options to ensure that records are imported correctly
- Examine your data file beforehand so that you know which field delimiter and which text quotation character it uses
- Check the preview to ensure everything is OK
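If you are unsure which delimiter and quotation character a file uses, you can inspect it outside WEX before importing. The sketch below uses `csv.Sniffer` from the Python standard library on a made-up sample; the semicolon delimiter is only an illustration, not a statement about the lab data file.

```python
import csv
import io

# Made-up sample: semicolon-delimited, double-quoted text fields.
sample = (
    "Product;Issue;Narrative\n"
    '"Mortgage";"Late fee";"They charged me; twice"\n'
    '"Credit card";"Billing";"Wrong amount on statement"\n'
)

# Let the sniffer guess the dialect among a few candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

# Parse the sample with the detected dialect as a preview check.
rows = list(csv.reader(io.StringIO(sample), dialect))

print("delimiter:", repr(dialect.delimiter))
print("quote character:", repr(dialect.quotechar))
print("first data row:", rows[1])
```

Note how the quotation character protects the semicolon inside the narrative: without it, the last field would be split in two.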
Data preparation on WEX: Review of Process
- Decide which fields to import as table attributes
- Specify key properties (Returnable, Faceted search, Free text search, etc.: see next slide)
Data Type vs Field Attribute
Data type | Data features | Field type to specify | Field attributes to specify
Date & Time | A date expressed in a canonical format like 10/31/2017 | Date | None (automatically set)
Plain Text (Free Text) | A textual description of something without any constraint | (new) body | Free text search. Use the reserved «body» field for the part of the text that you want to process with text analytics
Plain Text (Category), Plain Text (Entity), Plain Text (Yes/No) | A textual description that can be traced back to a «lexicon» containing a finite list of expressions | | Faceted search. Free text search if you want to be able to search through the categories
Number | A numeric value | | None
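The rules on this slide can be written down as a small lookup, which may help when filling in the exercise form. This is only a sketch of the slide's rules in Python: `suggest_attributes` and its `searchable` flag are hypothetical names, not a Watson Explorer API.

```python
# Data types that map to a finite «lexicon» of values.
CATEGORICAL_TYPES = {
    "Plain Text (Category)",
    "Plain Text (Entity)",
    "Plain Text (Yes/No)",
}

def suggest_attributes(data_type, searchable=False):
    """Return the field attributes suggested by the slide for a data type.

    searchable: set to True for categorical fields that should also be
    reachable through free text search.
    """
    if data_type in ("Date & Time", "Number"):
        return []  # nothing to specify
    if data_type == "Plain Text (Free Text)":
        return ["Free text search"]
    if data_type in CATEGORICAL_TYPES:
        attributes = ["Faceted search"]
        if searchable:
            attributes.append("Free text search")
        return attributes
    raise ValueError(f"unknown data type: {data_type}")

print(suggest_attributes("Plain Text (Category)", searchable=True))
```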
Hands On: Lab 3.1
- Review the data specification provided for the Customer Complaints database and decide which features should be set for every field
- Fill in exercise form 3.1, adding the missing information
- Compare your configuration with the one set for the existing collection on the WEX Admin Console
- Console link: http://172.31.1.2:8390/ESAdmin
- Connect with credentials esadmin/uniroma1
4. Browse and Analyze Data
Screen elements: query input text field; tabs for selecting views; operator tools to execute a query on the selected object; checkboxes for facet selection
How to filter data
Data filtering can be done by:
- Running a query using the query input text field (available everywhere)
- Selecting a facet and applying an operator tool (views: Facets, Deviations, Trends, Facet pairs)
- Selecting a time interval and applying an operator tool (views: Time Series, Deviations, Trends)
Example: Facet selection
1. Select the facet of interest by clicking on its checkbox
2. Click on the appropriate tool (AND / AND NOT / OR) to run the query
Example: Time interval selection
1. Select the interval of interest by dragging the mouse over the bars
2. Click on the appropriate tool (AND / AND NOT / OR) to run the query
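Conceptually, the three operator tools combine the current result set with the documents matching the new selection. The sketch below models this with Python sets of complaint IDs; the IDs are made up and `apply_operator` is a hypothetical helper that only illustrates the logic, not how the tool is implemented.

```python
def apply_operator(current, selection, operator):
    """Combine the current result set with a new selection of document IDs."""
    if operator == "AND":
        return current & selection   # keep documents present in both
    if operator == "AND NOT":
        return current - selection   # drop documents in the selection
    if operator == "OR":
        return current | selection   # keep documents present in either
    raise ValueError(f"unknown operator: {operator}")

# Made-up complaint IDs for illustration.
current_results = {1, 2, 3, 4}
selected_facet = {3, 4, 5}

narrowed = apply_operator(current_results, selected_facet, "AND")
excluded = apply_operator(current_results, selected_facet, "AND NOT")
widened = apply_operator(current_results, selected_facet, "OR")
```

AND narrows the current results, AND NOT removes the selected documents from them, and OR widens them.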
Facets view (all data)
- Facet tree
- Values for the selected facet
- Frequency (count for each value)
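The frequency column is simply the number of documents carrying each value of the selected facet. A minimal sketch with made-up records follows; `facet_frequency` is a hypothetical helper name.

```python
from collections import Counter

# Made-up records for illustration.
complaints = [
    {"Product": "Mortgage"},
    {"Product": "Credit card"},
    {"Product": "Mortgage"},
    {"Product": "Student loan"},
    {"Product": "Mortgage"},
]

def facet_frequency(docs, facet):
    """Count how many documents carry each value of the given facet."""
    return Counter(d[facet] for d in docs if d.get(facet))

frequency = facet_frequency(complaints, "Product")

# Print the values sorted by decreasing count, as a facet list would.
for value, count in frequency.most_common():
    print(value, count)
```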
Facets view (filtered data)
- Current selection
- Frequency count
- Correlation values (with respect to the selected query)
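One simple way to think about a correlation value is as a ratio of densities: how common a facet value is within the query results, compared with how common it is in the whole collection. The sketch below implements that simplified ratio on made-up records. It is an assumption made for illustration; the value shown by the tool may be computed differently (for example with statistical corrections), and `correlation_index` is a hypothetical name.

```python
def correlation_index(docs, facet, value, matches_query):
    """Simplified ratio: density in the query results / density overall."""
    total = len(docs)
    with_value = sum(1 for d in docs if d.get(facet) == value)
    query_docs = [d for d in docs if matches_query(d)]
    both = sum(1 for d in query_docs if d.get(facet) == value)
    if not total or not with_value or not query_docs:
        return 0.0
    return (both / len(query_docs)) / (with_value / total)

# Made-up records: 4 mortgage and 4 credit card complaints.
docs = (
    [{"Product": "Mortgage", "narrative": "loan payment issue"}] * 3
    + [{"Product": "Mortgage", "narrative": "escrow problem"}]
    + [{"Product": "Credit card", "narrative": "loan offer confusion"}]
    + [{"Product": "Credit card", "narrative": "billing dispute"}] * 3
)

def mentions_loan(doc):
    return "loan" in doc["narrative"]

mortgage_corr = correlation_index(docs, "Product", "Mortgage", mentions_loan)
card_corr = correlation_index(docs, "Product", "Credit card", mentions_loan)
```

With this definition, a value above 1 means the facet value is over-represented in the query results, and a value below 1 means it is under-represented.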
Time Series
- Set the time unit here
- Count for each time unit
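The time series view counts documents per time unit. The sketch below shows the same idea on made-up dates, switching between year and month as the unit; `time_series` is a hypothetical helper.

```python
from collections import Counter
from datetime import date

# Made-up "Date Received" values for illustration.
received = [
    date(2016, 1, 5),
    date(2016, 1, 20),
    date(2016, 3, 1),
    date(2016, 3, 15),
    date(2016, 3, 30),
]

def time_series(dates, unit="month"):
    """Count documents per time unit ('year' or 'month')."""
    patterns = {"year": "%Y", "month": "%Y-%m"}
    return Counter(d.strftime(patterns[unit]) for d in dates)

by_year = time_series(received, unit="year")
by_month = time_series(received, unit="month")

for bucket in sorted(by_month):
    print(bucket, by_month[bucket])
```

Moving from year to month spreads the same documents over finer buckets, which is what makes short-term peaks visible.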
Deviations / Trends
- Visualization criteria
- Facet selected
Facet Pairs
Hands On: Lab 3.2
- Use Content Miner
- Select the collection «Customer Complaints 2016»
- Use data views and query operators to answer the questions in Exercise Form 3.2
- Content Miner link: http://172.31.1.2:8393/ui/analytics
- No authentication required
Lab 3.2 Hints & Tips
Facet View
- Examine the frequency for various facets
- Focus the search with a keyword (e.g. «loan») and see how the correlation changes
- Where is the «loan» string searched?
Check Time Series
- Change the time scale from year to month
- Restrict the scope to a few months