Technology Assisted Review (TAR)

Technology Assisted Review (TAR) Turning unstructured data into structured data with NLP and Machine Learning

Technology Assisted Review (TAR)
Use case: Obituary fielding
- Very sensitive content
- No room for error
- SLA must be near 100% accurate
- Can't completely leave to automation
- But interested in speeding up the process
- And reducing costs

Basic Task: Text blob to Fielded Data Entity Type Sub Type Value Place Service <name, addr> Date <date> Time <time> Death Birth Relation Spouse <name> Child Siblings Text blob Conversion with Accuracy
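To make the "text blob to fielded data" target concrete, here is a minimal sketch of what one fielded record could look like in Python. The class name `FieldedEntity`, the helper `to_rows`, and the exact pairing of entity types with sub types are assumptions made for illustration; the slide only names the three columns (Entity Type, Sub Type, Value).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FieldedEntity:
    """One row of fielded data (hypothetical structure, not taken from the slides)."""
    entity_type: str            # e.g. "Place", "Date", "Time", "Relation"
    sub_type: Optional[str]     # e.g. "Service", "Death", "Spouse"
    value: str


def to_rows(entities: List[FieldedEntity]) -> List[Tuple[str, str, str]]:
    """Flatten entities into (type, sub type, value) tuples, e.g. for a database insert."""
    return [(e.entity_type, e.sub_type or "", e.value) for e in entities]


# Values copied from the Richard J. Baske obituary used as the full example in this deck.
record = [
    FieldedEntity("Place", "Service",
                  "Vandenberg Funeral Home, 19604 S. Wolf Road, Mokena, IL 60448"),
    FieldedEntity("Date", "Service", "October 18, 2017"),
    FieldedEntity("Time", "Service", "12:00 PM"),
    FieldedEntity("Relation", "Spouse", "Irene"),
]

for row in to_rows(record):
    print(row)
```

Keeping every extracted value in the same three-column shape means the manual tool and a model could write to the same table.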

Agenda
- Original Method: Break down text blobs into useful fielded data
- Road to a new way: Use of NLP algorithms and an annotation tool
- Modern Method: Implementation of a Machine Learning model

Components
- Unstructured text blobs
- Fielded data
- Labor: The Screeners
- Brat: A data annotation tool
- NLP open source Python libraries from Stanford
- Training Set and Control Set
- Machine Learning Model: creation and development, evolution
- A new "Technology Assisted Review" process

Full Obituary Example Obituary for Mr. Richard J. Baske Richard J. Baske age 82 late of Mokena, IL. Beloved husband of the late Irene nee Kwilosz. Loving father of Susan Baske, Karen (Gary) Hanko, Michele (James) Creed and Michael Baske. Proud grandfather of Kaitlyn, Jimmy, Jonathan and Nash. Special friend of Mary Marino. Caring brother in law of Martin Kwilosz. Dear friend to many. Funeral Wednesday October 18, 2017 12:00 PM from the Vandenberg Funeral Home 19604 S. Wolf Road Mokena, IL 60448 to St. Mary Church Mass 12:30 PM. Interment in Good Shepherd Cemetery. Visitation Wednesday October 18, 2017 from 10:00 AM to 12:00 PM. For information on services 708-479-1210 or www.vandenbergfuneralhome.com

Examples of Service Info in Text Blobs
Example 1: Funeral Wednesday October 18, 2017 12:00 PM from the Vandenberg Funeral Home 19604 S. Wolf Road Mokena, IL 60448 to St. Mary Church Mass 12:30 PM. Interment in Good Shepherd Cemetery. Visitation Wednesday October 18, 2017 from 10:00 AM to 12:00 PM.
Example 2: Visitation will be Saturday from 8:45 AM until the time of the prayers at 10:45 AM at Ahlgrim & Sons Funeral Home, 330 W. Golf Road, Schaumburg. Mass Saturday at 11:30 AM at St. Hubert Church, 729 Grand Canyon St., Hoffman Estates. Interment will be in St. Michael Cem.
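The service examples above show why this is hard to field by hand: dates and times appear in different shapes and positions. As a rough illustration only, the sketch below pulls explicit dates and clock times out of such text with regular expressions and normalizes the dates to ISO format. The patterns and function names are assumptions for this example; they are not the Stanford tooling the deck describes, and they deliberately ignore relative dates such as a bare "Saturday".

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches dates like "October 18, 2017".
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")
# Matches clock times like "12:00 PM" or "8:45 am".
TIME_RE = re.compile(r"\b\d{1,2}:\d{2} ?(?:AM|PM)\b", re.IGNORECASE)


def find_dates(text):
    """Return every explicit 'Month D, YYYY' date found in the text."""
    return DATE_RE.findall(text)


def find_times(text):
    """Return every 'H:MM AM/PM' time found in the text."""
    return TIME_RE.findall(text)


def normalize_date(date_text):
    """Convert 'October 18, 2017' to '2017-10-18' (assumes English month names)."""
    return datetime.strptime(date_text, "%B %d, %Y").date().isoformat()


sample_1 = ("Funeral Wednesday October 18, 2017 12:00 PM from the Vandenberg "
            "Funeral Home 19604 S. Wolf Road Mokena, IL 60448 to St. Mary Church "
            "Mass 12:30 PM. Interment in Good Shepherd Cemetery. Visitation "
            "Wednesday October 18, 2017 from 10:00 AM to 12:00 PM.")
sample_2 = ("Visitation will be Saturday from 8:45 AM until the time of the "
            "prayers at 10:45 AM at Ahlgrim & Sons Funeral Home, 330 W. Golf Road, "
            "Schaumburg. Mass Saturday at 11:30 AM at St. Hubert Church.")

print([normalize_date(d) for d in find_dates(sample_1)])
print(find_times(sample_1))
print(find_dates(sample_2), find_times(sample_2))
```

The second sample yields times but no dates, because "Saturday" alone needs context (the publication date) to resolve. Gaps like that are one reason pattern matching alone is not enough for this task.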

Today: Manual Tool
Obituary text: Charles A. Wessersmith of Cortlandt Manor, NY died on April 6, 2004. He was 53. Mr. Castrovinci was self-employed and worked as a machine mechanic for Handi Rental in Carmel for five years. He was born August 5,1950 in Peekskill to Charles and Antoinette Maselli Castrovinci. On November 19, 1994, he married Debra Schweigert in Cold Spring, NY. He is survived by his wife, Debra (Leslie) Castrovinci; his mother, Antoinette Castrovinci of Cortlandt Manor; two sons: Jason Castrovinci of Conesusus Lake, NY and Charles Castrovinci of New Britian, CT; one sister, Marie Colucci of Hopewell Junction and three grandchildren. Calling hours are Thursday 2-4 and 7-9 pm at Joseph F. Nardone Funeral Home. The Funeral Service is Friday at 10:00 am at the funeral home. Cremation will be private. JOSEPH F. NARDONE FUNERAL HOME 414 Washington St., Peekskill (914)737-1363
Form fields: Name, Place of Death, Date of Death, Date of Birth, Service Name, Service Address, Service Date, Service Time
Buttons: Save, Cancel

Time-consuming Process: Copy and paste
Each value is copied from the obituary text shown on the previous slide (the Charles A. Wessersmith example) and pasted into the matching form field: Name, Place of Death, Date of Death, Date of Birth, Service Name, Service Address, Service Date, Service Time. Then Save or Cancel.

Human Model (diagram): Text blob, then Human, then Database

The End Game: Machine Learning Model (diagram): Text blob, then Machine Learning Model, then Database

Supervised Learning
f(X) = Y + e
X = Training Set. An Answer Set.
Y = Predictions
Obituary Text Blob | Entity Type | Entity Value | Entity Relationships
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore | Date | Monday July 8 | Date of Funeral
 | Time | From 8pm to 10 | Time Service
 | Place | Holy Cross Cemetery | Place of Burial
 | Person | Tom | Son of Decedent
 | Person | James Monroe | Brother of Decedent
For a new text blob: Entity Type, Entity Value, Entity Relationship = ?

Model Creation Process
- Delineate sentences and normalize certain date and time stamps
- Identify entities and entity relationships
- Custom-built model using several known algorithms such as: Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, K-Means …
- Then tweak f(X) to make it as "accurate as possible"
- The code keeps encountering scenarios it is familiar with but under different circumstances, so it adds new rules to refine the algorithm
- Repeated over and over with the Training Set until e is minimized
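As a toy illustration of the "learn from labeled examples" step, the sketch below trains a tiny multinomial Naive Bayes classifier (one of the algorithm families named above) to decide whether a sentence is about a service or about a relation. This is not the presenters' custom model: the class, the two labels, and the four training sentences (adapted from obituary phrases in this deck) are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set: (sentence, label) pairs.
training_set = [
    ("Visitation Wednesday from 10:00 AM to 12:00 PM", "service"),
    ("Mass Saturday at 11:30 AM at St. Hubert Church", "service"),
    ("Beloved husband of the late Irene", "relation"),
    ("Loving father of Susan and Michael", "relation"),
]

model = NaiveBayes()
model.fit(training_set)
print(model.predict("Funeral service is Friday at 10:00 am"))
print(model.predict("Dear brother of Martin"))
```

With only four examples the model leans on a handful of tokens ("at", "am", "of"), which loosely mirrors the deck's observation that accuracy depends heavily on how many varied training examples are available.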

Stanford libraries
Broad range of Natural Language Processing tools such as:
- *Sentence Delineation
- *Entity Extraction
- *Dependency Parsing
- Stemming
- Lemmatization
- Part-Of-Speech Tagging
- Sentiment Analysis
Well maintained and updated
Can run as a web service
Written in Java
Connectors for other languages: Ruby, Perl, JavaScript, .NET and Python
https://nlp.stanford.edu/software/

Training Set (TS) and Control Set (CS)
Training Set
- A set of examples of the raw data and answers
- Represents a variety of input types; not random
- Originally several hundred examples with answers: 20% success rate
- Eventually several thousand training examples: 95%-98% success rate
Control Set
- Validates the model
- Each test produces a score to determine progress (or not)
- 1,000 rows
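The deck does not say how the control-set score is computed. One plausible reading, sketched below, is exact-match accuracy: the share of control rows where the model's fielded output equals the human answer, plus a per-field breakdown to show where errors concentrate. The function names, field names, and the four sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def control_set_score(predicted: List[Row], expected: List[Row]) -> float:
    """Fraction of rows where every predicted field matches the answer."""
    if len(predicted) != len(expected):
        raise ValueError("predictions and answers must align row by row")
    if not expected:
        return 0.0
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


def field_accuracy(predicted: List[Row], expected: List[Row], field: str) -> float:
    """Fraction of rows where a single field matches the answer."""
    if len(predicted) != len(expected):
        raise ValueError("predictions and answers must align row by row")
    if not expected:
        return 0.0
    correct = sum(1 for p, e in zip(predicted, expected)
                  if p.get(field) == e.get(field))
    return correct / len(expected)


# Hypothetical control rows (human answers) and model output.
expected = [
    {"service_date": "2017-10-18", "service_time": "12:00"},
    {"service_date": "2017-10-18", "service_time": "10:00"},
    {"service_date": "2017-10-21", "service_time": "11:30"},
    {"service_date": "2017-10-21", "service_time": "08:45"},
]
predicted = [dict(row) for row in expected]
predicted[3]["service_time"] = "10:45"   # one wrong field in one row

print(control_set_score(predicted, expected))
print(field_accuracy(predicted, expected, "service_date"))
print(field_accuracy(predicted, expected, "service_time"))
```

Running the same scoring after every model change gives the "progress (or not)" signal the slide describes, provided the control rows are never used for training.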

Brat Annotation Tool
- Visual tool for structured text annotation
- Creates fixed-form results that can be processed by a program
- Uses labels to identify entities: people, places, times, etc.
- Their types come from a pre-determined list: city, country, hospital, etc.
- And the relations between them: family, events that occurred, etc.
- Used to create the Training Set (TS) and Control Set (CS)
Brat home page: http://brat.nlplab.org/

Brat example (raw text): Tom Smith (Age 80) passed away on February 17, 2016 after a short illness at Mount Sinai West in New York City. Tom was born and raised in Brooklyn, NY by Sicilian immigrants.

Brat example (annotated): Tom Smith (Age 80) passed away on February 17, 2016 after a short illness at Mount Sinai West in New York City. Tom was born and raised in Brooklyn, NY by Sicilian immigrants.
Labels shown on the annotated text: decedent, died, hospital, city, family, born, lived, date
Legend: Entities (<people>, <places>, <events>, <dates>) and Relationships
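Brat stores annotations in a tab-separated "standoff" file alongside the text: `T` lines hold an entity type with character offsets, and `R` lines link two entities. The sketch below parses a small hand-written annotation for the Tom Smith sentence. The label names (`decedent`, `date`, `died`) follow the slide, but which entities the `died` relation connects is an assumption, and the parser skips features such as discontinuous spans, events, and attributes.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    text: str


@dataclass
class Relation:
    id: str
    type: str
    arg1: str    # id of the first entity
    arg2: str    # id of the second entity


def parse_ann(ann_text):
    """Parse text-bound (T) and relation (R) lines of a brat standoff file."""
    entities, relations = {}, []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        ann_id = parts[0]
        if ann_id.startswith("T"):
            type_, start, end = parts[1].split(" ")
            entities[ann_id] = Entity(ann_id, type_, int(start), int(end), parts[2])
        elif ann_id.startswith("R"):
            type_, arg1, arg2 = parts[1].split(" ")
            relations.append(Relation(ann_id, type_,
                                      arg1.split(":", 1)[1],
                                      arg2.split(":", 1)[1]))
    return entities, relations


TEXT = ("Tom Smith (Age 80) passed away on February 17, 2016 after a short "
        "illness at Mount Sinai West in New York City.")

# Hand-written annotation for illustration; not exported from a real brat project.
ANN = (
    "T1\tdecedent 0 9\tTom Smith\n"
    "T2\tdate 34 51\tFebruary 17, 2016\n"
    "R1\tdied Arg1:T1 Arg2:T2\n"
)

entities, relations = parse_ann(ANN)
for rel in relations:
    print(entities[rel.arg1].text, rel.type, entities[rel.arg2].text)
```

Because the offsets point back into the original text, a program can check that each annotation still matches its source span before using it as training or control data.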

Product Creation Overview
1. Define Prediction Goal
2. Set Protocol (TAR)
3. Educate Reviewer
4. Generate Example Sets (TS, CS)

Product Creation Overview (continued)
5. Program Creation & Data Mgt
6. Create Model (M) from the Training Set (TS)
7. Validate Against Control Data (CS)
8. Manually Validate
9. Release to Production

Summary
- This is a supervised machine learning model
- Annotated data sets created with Brat are used to train and test the model
- Training Set: hundreds of examples (20% success rate), thousands (95%-98%)
- Control Set: 1,000 rows
- Stanford libraries programmatically isolate sentences, standardize dates/times and identify the entities and relationships
- Build the Model. Learning logic through context.

Questions?