1
Technology Assisted Review (TAR)
Turning unstructured data into structured data with NLP and Machine Learning
2
Technology Assisted Review (TAR)
Use case: Obituary fielding
Very sensitive content
No room for error
SLA requires near-100% accuracy
Can't be left completely to automation
But interested in speeding up the process and reducing costs
3
Basic Task: Text blob to Fielded Data
Text blob -> Conversion with Accuracy -> Fielded data

Entity Type   Sub Type   Value
Place         Service    <name, addr>
Date                     <date>
Time                     <time>
              Death
              Birth
Relation      Spouse     <name>
              Child
              Siblings
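As a rough illustration of the target format, the fielded output could be represented as one record per extracted value. This is only a sketch: the class name `FieldedEntity` and the helper `to_rows` are hypothetical, and the pairing of entity types with sub-types is an assumption read from the table above. The sample values come from the obituary shown on the "Today: Manual Tool" slide.

```python
from dataclasses import dataclass, asdict


@dataclass
class FieldedEntity:
    """One row of fielded data (hypothetical structure, not the production schema)."""
    entity_type: str  # e.g. "Place", "Date", "Time", "Relation"
    sub_type: str     # e.g. "Service", "Death", "Birth", "Spouse"
    value: str        # the text pulled out of the blob


def to_rows(entities):
    """Convert records to plain dicts, ready to write to a database or CSV."""
    return [asdict(e) for e in entities]


# Values copied by hand from the sample obituary; a model would produce these.
example = [
    FieldedEntity("Date", "Birth", "August 5, 1950"),
    FieldedEntity("Place", "Service", "Joseph F. Nardone Funeral Home"),
    FieldedEntity("Relation", "Spouse", "Debra (Leslie) Castrovinci"),
]

for row in to_rows(example):
    print(row)
```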
4
Agenda
Original Method: Break down text blobs into useful fielded data
Road to a new way: Use of NLP algorithms and an annotation tool
Modern Method: Implementation of a Machine Learning model
5
Components
Unstructured text blobs
Fielded data
Labor – The Screeners
Brat: A data annotation tool
NLP open source Python libraries from Stanford
Training Set and Control Set
Machine Learning Model
  Creation and development
  Evolution
A new “Technology Assisted Review” process
6
Full Obituary Example
Obituary for Mr. Richard J. Baske
Richard J. Baske age 82 late of Mokena, IL. Beloved husband of the late Irene nee Kwilosz. Loving father of Susan Baske, Karen (Gary) Hanko, Michele (James) Creed and Michael Baske. Proud grandfather of Kaitlyn, Jimmy, Jonathan and Nash. Special friend of Mary Marino. Caring brother in law of Martin Kwilosz. Dear friend to many. Funeral Wednesday October 18, :00 PM from the Vandenberg Funeral Home S. Wolf Road Mokena, IL to St. Mary Church Mass 12:30 PM. Interment in Good Shepherd Cemetery. Visitation Wednesday October 18, 2017 from 10:00 AM to 12:00 PM. For information on services or
7
Examples of Service Info in Text Blobs

Example 1: Funeral Wednesday October 18, :00 PM from the Vandenberg Funeral Home S. Wolf Road Mokena, IL to St. Mary Church Mass 12:30 PM. Interment in Good Shepherd Cemetery. Visitation Wednesday October 18, 2017 from 10:00 AM to 12:00 PM.

Example 2: Visitation will be Saturday from 8:45 AM until the time of the prayers at 10:45 AM at Ahlgrim & Sons Funeral Home, 330 W. Golf Road, Schaumburg. Mass Saturday at 11:30 AM at St. Hubert Church, 729 Grand Canyon St., Hoffman Estates. Interment will be in St. Michael Cem.
8
Today: Manual Tool

Obituary text:
Charles A. Wessersmith of Cortlandt Manor, NY died on April 6, He was 53. Mr. Castrovinci was self-employed and worked as a machine mechanic for Handi Rental in Carmel for five years. He was born August 5,1950 in Peekskill to Charles and Antoinette Maselli Castrovinci. On November 19, 1994, he married Debra Schweigert in Cold Spring, NY. He is survived by his wife, Debra (Leslie) Castrovinci; his mother, Antoinette Castrovinci of Cortlandt Manor; two sons: Jason Castrovinci of Conesusus Lake, NY and Charles Castrovinci of New Britian, CT; one sister, Marie Colucci of Hopewell Junction and three grandchildren. Calling hours are Thursday 2-4 and 7-9 pm at Joseph F. Nardone Funeral Home. The Funeral Service is Friday at 10:00 am at the funeral home. Cremation will be private. JOSEPH F. NARDONE FUNERAL HOME 414 Washington St., Peekskill (914)

Form fields:
Name:
Place of Death:
Date of Death:
Date of Birth:
Service Name:
Service Address:
Service Date:
Service Time:
[Save] [Cancel]
9
Time-consuming Process
Copy and paste: each value is copied by hand from the obituary text into the form fields (Name, Place of Death, Date of Death, Date of Birth, Service Name, Service Address, Service Date, Service Time), then saved.
10
Human Model
Text blob -> Human -> Database
11
The End Game: Machine Learning Model
Text blob -> Machine Learning Model -> Database
12
Supervised Learning
f(X) = Y + e
X = Training Set. An Answer Set.
Y = Predictions

Obituary Text Blob: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore

Entity Type   Entity Value          Entity Relationships
Date          Monday July 8         Date of Funeral
Time          From 8pm to 10        Time Service
Place         Holy Cross Cemetery   Place of Burial
Person        Tom                   Son of Decedent
              James Monroe          Brother of Decedent

For a new text blob: Entity Type, Entity Value, Entity Relationship = ?
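To make the f(X) = Y idea concrete, here is a toy supervised learner: it is fit on a few labeled sentences and then predicts a label for an unseen one. This is a minimal sketch, not the model described in these slides. It assumes a multinomial Naive Bayes classifier over bag-of-words features, and the labels `"service"` and `"relation"` are hypothetical. The training sentences are taken from the sample obituaries earlier in the deck.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Smoothed denominator: words seen in this class plus vocabulary size.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# X: example inputs, Y: the human-provided answers (hypothetical labels).
X = [
    "Visitation Wednesday October 18, 2017 from 10:00 AM to 12:00 PM.",
    "Interment in Good Shepherd Cemetery.",
    "Beloved husband of the late Irene nee Kwilosz.",
    "Loving father of Susan Baske.",
]
Y = ["service", "service", "relation", "relation"]

model = NaiveBayes().fit(X, Y)
print(model.predict("Interment will be in St. Michael Cem."))
```

With only four examples the predictions are fragile, which is consistent with the deck's point that accuracy rose as the Training Set grew.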
13
Model Creation Process
Delineate sentences and normalize certain date and time stamps
Identify entities and entity relationships
Custom-built model using several known algorithms such as: Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, K-Means …
Then tweak f(X) to make it as “accurate as possible”
The code keeps encountering familiar scenarios under different circumstances, so it adds new rules to refine the algorithm
Repeated over and over with the Training Set until the error e is minimized
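The normalization step can be illustrated with a small sketch. It assumes ISO dates and 24-hour times as the target format, which the slides do not specify, and the function names `normalize_dates` and `normalize_times` are hypothetical. It handles only "Month D, YYYY" dates and "H:MM AM/PM" times; anything it cannot parse is left unchanged.

```python
import re
from datetime import datetime

# "October 18, 2017" or "August 5,1950" (space after the comma is optional).
DATE_RE = re.compile(r"\b([A-Z][a-z]+ \d{1,2}, ?\d{4})\b")
# "12:30 PM", "10:00 am"
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2}) ?([AaPp][Mm])\b")


def normalize_dates(text):
    """Rewrite recognizable dates as YYYY-MM-DD."""
    def repl(match):
        raw = re.sub(r", ?", ", ", match.group(1))
        try:
            return datetime.strptime(raw, "%B %d, %Y").date().isoformat()
        except ValueError:
            return match.group(0)  # not a real month or day: leave as-is
    return DATE_RE.sub(repl, text)


def normalize_times(text):
    """Rewrite recognizable 12-hour times as 24-hour HH:MM."""
    def repl(match):
        raw = f"{match.group(1)} {match.group(2).upper()}"
        try:
            return datetime.strptime(raw, "%I:%M %p").strftime("%H:%M")
        except ValueError:
            return match.group(0)
    return TIME_RE.sub(repl, text)


sentence = "He was born August 5,1950 in Peekskill."
print(normalize_dates(sentence))
print(normalize_times("Visitation from 10:00 AM to 12:00 PM."))
```

Fragments that lost digits in the source, such as ":00 PM", do not match the time pattern and pass through untouched.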
14
Stanford libraries
Broad range of Natural Language Processing tools such as:
  *Sentence Delineation
  *Entity Extraction
  *Dependency Parsing
  Stemming
  Lemmatization
  Part-Of-Speech Tagging
  Sentiment Analysis
Well maintained and updated
Can run as a web service
Written in Java
Connectors for other languages: Ruby, Perl, JavaScript, .NET and Python
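Since the libraries can run as a web service, a Python client might look like the sketch below. The slides do not show how the service was called, so everything here is an assumption: it presumes a Stanford CoreNLP server on the default `localhost:9000`, a request that passes a JSON `properties` query parameter with the text as the POST body, and a JSON response shaped as `sentences -> tokens -> word / ner`. The helpers `build_request` and `group_entities` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, server="http://localhost:9000"):
    """Prepare (but do not send) an annotation request for a CoreNLP-style server."""
    props = {"annotators": "tokenize,ssplit,ner", "outputFormat": "json"}
    url = server + "/?properties=" + urllib.parse.quote(json.dumps(props))
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def group_entities(annotation):
    """Merge consecutive tokens that share a non-"O" NER tag into one entity."""
    entities = []
    for sentence in annotation["sentences"]:
        words, current_tag = [], "O"
        for token in sentence["tokens"]:
            tag = token.get("ner", "O")
            if tag != current_tag and words:
                entities.append((current_tag, " ".join(words)))
                words = []
            if tag != "O":
                words.append(token["word"])
            current_tag = tag
        if words:
            entities.append((current_tag, " ".join(words)))
    return entities


# Hand-written stand-in for a server response, so the sketch runs offline.
sample_response = {
    "sentences": [{
        "tokens": [
            {"word": "Tom", "ner": "PERSON"},
            {"word": "Smith", "ner": "PERSON"},
            {"word": "died", "ner": "O"},
            {"word": "in", "ner": "O"},
            {"word": "Brooklyn", "ner": "CITY"},
        ]
    }]
}
print(group_entities(sample_response))
```

To call a running server, the prepared request would be passed to `urllib.request.urlopen` and the body decoded with `json.load`.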
15
Training Set. Control Set.

Training Set (TS)
A set of examples of the raw data and answers
Represents a variety of input types. Not random.
Originally several hundred examples with answers yielded a 20% success rate.
Eventually several thousand training examples yielded 95%-98%.

Control Set (CS)
Validates the model
Each test produces a score to determine progress (or not)
1000 rows
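The Control Set check can be sketched as a scoring function. How the real score was computed is not stated in the slides; this sketch assumes a row counts as correct only when every predicted field matches the human answer. The names `score` and `made_progress` are hypothetical, and the stand-in predictor exists only to make the example runnable.

```python
def score(predict, control_set):
    """Fraction of control rows whose prediction exactly matches the answer."""
    if not control_set:
        raise ValueError("control set is empty")
    correct = sum(1 for text, expected in control_set if predict(text) == expected)
    return correct / len(control_set)


def made_progress(previous_score, new_score):
    """True when the latest model version beats the previous one."""
    return new_score > previous_score


# Stand-in predictor: a real one would wrap the trained model.
def toy_predict(text):
    return {"Service Time": "10:00 am"} if "10:00 am" in text else {}


control_set = [
    ("The Funeral Service is Friday at 10:00 am at the funeral home.",
     {"Service Time": "10:00 am"}),
    ("Mass Saturday at 11:30 AM at St. Hubert Church.",
     {"Service Time": "11:30 AM"}),
]

current = score(toy_predict, control_set)
print(current, made_progress(0.2, current))
```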
16
Brat Annotation Tool
Visual tool for structured text annotation
Creates fixed-form results that can be processed by a program
Using labels, it identifies entities: people, places, times, etc.
Their types, from a pre-determined list: city, country, hospital, etc.
And the relations between them: family, events occurred, etc.
Used to create the Training Set (TS) and Control Set (CS)
Brat home page:
17
Brat example
Tom Smith (Age 80) passed away on February 17, 2016 after a short illness at Mount Sinai West in New York City. Tom was born and raised in Brooklyn, NY by Sicilian immigrants.
18
Brat example (annotated)
Tom Smith (Age 80) passed away on February 17, 2016 after a short illness at Mount Sinai West in New York City. Tom was born and raised in Brooklyn, NY by Sicilian immigrants.
Labels shown on the text: decedent, died, date, hospital, city, born, lived, family
Entities: <people> <places> <events> <dates>
Relationships:
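Brat stores annotations in a tab-separated standoff format, which is what makes its output easy for a program to consume. The sketch below parses a hand-written annotation for the first sentence of the example. The label names `decedent`, `date` and `died` follow the slide, but the exact annotation scheme used in the project is an assumption, and `parse_ann` is a hypothetical helper that reads only text-bound (`T`) and relation (`R`) lines.

```python
TEXT = ("Tom Smith (Age 80) passed away on February 17, 2016 after a short "
        "illness at Mount Sinai West in New York City.")

# Standoff lines: ID <tab> label and character offsets <tab> covered text.
ANN = (
    "T1\tdecedent 0 9\tTom Smith\n"
    "T2\tdate 34 51\tFebruary 17, 2016\n"
    "R1\tdied Arg1:T1 Arg2:T2\n"
)


def parse_ann(ann_text):
    """Return (entities by ID, list of (label, arg1_id, arg2_id) relations)."""
    entities, relations = {}, []
    for line in ann_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        ann_id = fields[0]
        if ann_id.startswith("T"):
            label, span = fields[1].split(" ", 1)
            # Discontinuous spans look like "0 5;8 12"; keep the outer bounds.
            fragments = span.split(";")
            start = int(fragments[0].split()[0])
            end = int(fragments[-1].split()[1])
            entities[ann_id] = {
                "label": label,
                "start": start,
                "end": end,
                "text": fields[2] if len(fields) > 2 else "",
            }
        elif ann_id.startswith("R"):
            label, arg1, arg2 = fields[1].split()
            relations.append((label, arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


entities, relations = parse_ann(ANN)
for label, a, b in relations:
    print(entities[a]["text"], f"--{label}-->", entities[b]["text"])
```

Parsed records like these are one plausible way to turn annotated documents into Training Set and Control Set rows.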
19
Product Creation Overview
Define Prediction Goal
Set Protocol (TAR)
Educate Reviewer
Generate Example Sets (TS, CS)
20
Product Creation Overview (continued)
Program Creation & Data Management
Create Model
Validate Against Control Data
Manually Validate
Release to Production
21
Summary
This is a supervised machine learning model
Attribute data sets are provided by Brat to train and test the model
Training Set: hundreds (20%), thousands (95-98%)
Control Set: 1,000
Stanford libraries programmatically isolate sentences, standardize dates/times and identify the entities and relationships.
Build the Model.
Learning logic through context.
22
Questions?