NLP for business process automation: practical cases
Deloitte Ukraine
Deloitte Ukraine Digital Solution Lab
Deloitte Global: 150 countries; 286,000 people worldwide; 150 years of history
Ukrainian AI team: Data Scientists / NLP Specialists; Developers; UI/UX designers; Management team
Deloitte named a leader by Gartner in Data and Analytics Service Providers, Worldwide
Common NLP use cases in automation
Information extraction; Semantic search; Machine translation; Data summarization; Template filling; Dialogue support and navigation; Document classification and management
Deloitte use case 1: analysis of a large number of web-sites
Business need: review, extract and summarize information from a large number of web-sites
Issue: manual, time-consuming process; subjective decisions; human-factor errors
Challenges: multi-language content; non-standard web-site structure; human-factor errors in data labelling
Tasks: text summarization; text generation; activity comparison
Text summarization and generation
Ideas: extract key words (wheat, sell); combine them into a sentence (The company sells wheat)
Text summarization and generation
[Diagram: an LSTM outputs three probabilities: p(product), p(activity), p(none)]
Text summarization and generation
Database of URLs and summaries used
Summary from DB: Development of robotic solutions
2/3 non-stop words found!
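The "2/3 non-stop words found" check can be sketched as a simple word-overlap count. This is a minimal illustration, not the team's implementation: the stop-word list, the tokenizer, the function names and the choice of page text (the robotic-solutions sentence shown later in the deck) are all assumptions.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real list would be larger).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokens(text):
    """Return the lowercase alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def summary_coverage(summary, page_text, stop_words=STOP_WORDS):
    """Return (found, total): how many distinct non-stop words of the
    database summary also occur in the page text."""
    page_vocab = set(tokens(page_text))
    content = {w for w in tokens(summary) if w not in stop_words}
    found = sum(1 for w in content if w in page_vocab)
    return found, len(content)


# Page text assumed for illustration only.
page = ("Our 100% Made in Italy robotic solutions ensure cutting edge "
        "performance that will last over time.")
found, total = summary_coverage("Development of robotic solutions", page)
print(f"{found}/{total} non-stop words found")  # prints "2/3 non-stop words found"
```

Here "robotic" and "solutions" occur in the page and "development" does not, which gives the 2/3 on the slide.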
Text summarization and generation
Two approaches for text summarization
Extractive summarization: select parts (typically sentences) of the original text to form a summary; easier; too restrictive; most past work is extractive
Abstractive summarization: generate novel sentences using natural language generation techniques; more difficult; more flexible and human; necessary for future progress
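Text summarization and generation
[Diagram: the input sequence goes through an encoder (Bi-LSTM); attention over the encoder states gives an attention distribution and a context vector for the decoder (RNN); the final predicted vocabulary distribution combines the predicted vocabulary distribution (weight Pgen) with the attention distribution (weight 1-Pgen)]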
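The labels on this slide (Pgen, 1-Pgen, attention distribution, vocabulary distribution) match a pointer-generator style decoder. Assuming that reading, the final distribution is P(w) = Pgen * Pvocab(w) + (1 - Pgen) * (sum of attention weights on source positions holding w). The sketch below shows only this mixing step; the function name and all numbers are invented for illustration.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the decoder's vocabulary distribution with the copy
    (attention) distribution over the source tokens."""
    # Generation part: scale every vocabulary probability by p_gen.
    final = {word: p_gen * p for word, p in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention weight to each source token,
    # summing over repeated occurrences of the same token.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Illustrative numbers only.
vocab_dist = {"the": 0.5, "company": 0.3, "sells": 0.2}
source_tokens = ["wheat", "sells", "wheat"]
attention = [0.5, 0.25, 0.25]

dist = final_distribution(0.75, vocab_dist, attention, source_tokens)
print(dist)
```

The copy part is what lets the model output "wheat" even though it is missing from the decoder vocabulary in this toy example.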
Text summarization and generation
Joining two approaches: Our 100% <activity> Made </activity> in Italy <product> robotic solutions </product> ensure cutting edge performance that will last over time.
Results: Text summarization and generation
43% of correct summaries
More experiments are on the way
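The tagged sentence above can be read back into key words with a short regular expression. This is only a sketch of consuming such tagged output; the tag names come from the slide, while the function name and the returned structure are assumptions.

```python
import re

# Matches <activity> ... </activity> and <product> ... </product> spans.
TAG_RE = re.compile(r"<(activity|product)>\s*(.*?)\s*</\1>", re.DOTALL)


def extract_tagged(text):
    """Collect the text inside each activity/product tag."""
    spans = {"activity": [], "product": []}
    for tag, value in TAG_RE.findall(text):
        spans[tag].append(value)
    return spans


tagged = ("Our 100% <activity> Made </activity> in Italy <product> robotic "
          "solutions </product> ensure cutting edge performance that will "
          "last over time.")
print(extract_tagged(tagged))
# prints {'activity': ['Made'], 'product': ['robotic solutions']}
```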
Activity comparison Idea: Compare with GloVe or something better
Activity comparison
Input criteria: grain, wheat storage; oil manufacturing; machinery; wholesale, trade
Text from web-site: We purchase, we accept for storage, we render services of completion and logistics of grain crops and sunflower.
Each-with-each word comparison using the GloVe algorithm, e.g. wheat (0.08, 0.27), crop (0.24, 0.1), machinery (0.32, 0.6); distance ~ 0.236
Distances: 0.51, 0.23 0.34 0.39 0.42 0.38 0.85, 0.31
Selection of the minimum distance for each criteria type
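The each-with-each comparison followed by a minimum per criteria type can be sketched as below. The 2-D vectors are the toy values from the slide (real GloVe vectors have tens to hundreds of dimensions). The slide does not name the distance metric, so Euclidean distance is an assumption here; for wheat and crop it gives about 0.233, close to the slide's ~0.236. The function names and the criteria-to-word mapping are hypothetical.

```python
import math

# Toy 2-D vectors copied from the slide.
VECTORS = {
    "wheat": (0.08, 0.27),
    "crop": (0.24, 0.10),
    "machinery": (0.32, 0.60),
}


def distance(word_a, word_b):
    """Euclidean distance between two word vectors (assumed metric)."""
    return math.dist(VECTORS[word_a], VECTORS[word_b])


def min_distance(criterion_words, site_words):
    """Compare every criterion word with every site word and keep the
    smallest distance; words without a vector are skipped."""
    return min(
        (distance(c, s)
         for c in criterion_words if c in VECTORS
         for s in site_words if s in VECTORS),
        default=None,
    )


# Hypothetical mapping of criteria types to words that have toy vectors.
criteria = {"grain, wheat storage": ["grain", "wheat"], "machinery": ["machinery"]}
site_words = ["grain", "crop", "sunflower"]

features = {name: min_distance(words, site_words) for name, words in criteria.items()}
print(features)
```

Each company then gets one number per criteria type, which is the feature vector used in the classification step.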
Activity comparison
Company 1: 0.31, 0.2, 0.45, 0.3
…..
Company 7999: 0.51, 0.1, 0.1, 0.36
Company 8000: 0.3, 0.22, 0.35, 0.67
Random forest algorithm on 100 decision trees
[Diagram: decision tree with splits X>0.3, Y<0.5, Z<0.8, W<0.8, True/False branches and Accepted/Rejected leaves; input distances 0.31, 0.23, 0.34, 0.39, 0.42, 0.38]
Simple voting process
Probability of acceptance: ?
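The "simple voting process" can be illustrated without any library: every tree votes Accepted or Rejected, and the share of Accepted votes is the probability of acceptance. The three trees below are hypothetical stand-ins that only echo the threshold labels on the slide; the actual forest has 100 trained trees whose splits are not given.

```python
# Hypothetical one-rule "trees"; each returns True for Accepted.
def tree_1(x):
    return x[0] > 0.3 and x[1] < 0.5


def tree_2(x):
    return x[2] < 0.8


def tree_3(x):
    return x[3] < 0.8


def acceptance_probability(trees, features):
    """Simple voting: fraction of trees that vote Accepted."""
    votes = [1 if tree(features) else 0 for tree in trees]
    return sum(votes) / len(votes)


forest = [tree_1, tree_2, tree_3]
company_7999 = (0.51, 0.1, 0.1, 0.36)
company_8000 = (0.3, 0.22, 0.35, 0.67)

p_7999 = acceptance_probability(forest, company_7999)
p_8000 = acceptance_probability(forest, company_8000)
print(p_7999, p_8000)
```

With a trained forest the same averaging is what a library's probability output typically reports; only the trees differ.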
Results: Activity comparison
Random forest algorithm on 100 decision trees, probability of acceptance
92% recall
50% FPR
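For reference, recall is TP / (TP + FN) and the false positive rate (FPR) is FP / (FP + TN). The sketch below computes both from predicted acceptance probabilities at a chosen threshold; the labels, probabilities and threshold are made up for illustration and are not the project's data.

```python
def recall_and_fpr(y_true, y_prob, threshold=0.5):
    """Return (recall, false positive rate) at a probability threshold."""
    tp = fn = fp = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if label and predicted:
            tp += 1
        elif label and not predicted:
            fn += 1
        elif not label and predicted:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Illustrative labels (1 = should be accepted) and predicted probabilities.
y_true = [1, 1, 1, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2]
recall, fpr = recall_and_fpr(y_true, y_prob)
print(recall, fpr)
```

Lowering the threshold raises recall at the cost of a higher FPR, which is the trade-off behind a high-recall, high-FPR operating point.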
Deloitte use case 2: document analysis
Business need: store a large number of documents in a database with attributes that enable search
Issue: manual, time-consuming process; human-factor errors causing loss of documents
Challenges: multiple document formats, including scanned PDFs; tough deadlines; poor labeling
Tasks: document classification; attribute extraction
Attributes extraction Ideas: Annotate texts Use LSTM
Attributes extraction
Receiving data from client: documents and general information; key phrases: "The client hereby agrees to use…", "Confidential work prepared for..."
Conversion to wildcard expressions: "client * agrees to…", "confidential work prepared * for…"
tf-idf on limited vocabulary with logistic regression
Models blending
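One way to match such wildcard expressions is to compile them into regular expressions. The sketch below assumes that `*` stands for zero or more intervening words (capped by a hypothetical `max_gap` parameter) and that matching is case-insensitive; the slide does not define the wildcard semantics, so treat this as one possible reading. The trailing ellipsis of the slide's expressions is left out.

```python
import re


def wildcard_to_regex(expr, max_gap=3):
    """Compile a phrase with '*' wildcards into a regex in which each '*'
    matches between zero and max_gap arbitrary words."""
    parts = [part.split() for part in expr.split("*")]
    chunks = [r"\s+".join(re.escape(word) for word in words)
              for words in parts if words]
    gap = r"\s+(?:\S+\s+){0,%d}?" % max_gap
    return re.compile(gap.join(chunks), re.IGNORECASE)


agrees = wildcard_to_regex("client * agrees to")
prepared = wildcard_to_regex("confidential work prepared * for")

print(bool(agrees.search("The client hereby agrees to use the software")))
print(bool(prepared.search("Confidential work prepared for the board")))
print(bool(agrees.search("The client list is attached")))
```

Hits of such patterns could then serve as binary features next to the tf-idf features, though how the models were actually blended is not stated on the slide.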
Results: Attribute extraction
95% F0.5
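Assuming "F 0.5" denotes the F-beta score with beta = 0.5, the metric weights precision more heavily than recall: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below is a direct transcription of that formula; the precision and recall values are illustrative and are not the project's figures.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Same pair of values, swapped: with beta = 0.5 high precision is rewarded more.
print(f_beta(1.0, 0.5))  # high precision, low recall
print(f_beta(0.5, 1.0))  # low precision, high recall
```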
Possible solutions
Advanced web-crawling: customized search and information scraping via the internet for marketing research, reputation analysis, threat intel, KYC etc.
Data capturing: analysis, summarization and capturing of relevant information from web-sites, text documents, annual reports, 10-K reports etc.
Templates pre-populating: filling templates with text or quantitative data, selected or generated using a cognitive algorithm.
Data summarization & classification: classification based on textual data in the content of e-mails, documents, agreements etc.
Q&A
About Deloitte Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”), its network of member firms, and their related entities. DTTL and each of its member firms are legally separate and independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients. Please see www.deloitte.com/about for a detailed description of DTTL and its member firms. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting. Copyright © 2018 Deloitte Development LLC. All rights reserved. Member of Deloitte Touche Tohmatsu Limited