From Research to Reality™: Reducing the Resource Burden of Narrative Text Classification for Large Administrative Databases

Presentation transcript:

Reducing the Resource Burden of Narrative Text Classification for Large Administrative Databases
Wellman HM (1), Corns HL (1), Lehto MR (2)
(1) Liberty Mutual Research Institute for Safety, 71 Frankland Road, Hopkinton, MA 01748, USA
(2) School of Industrial Engineering, Purdue University, 1287 Grissom Hall, West Lafayette, IN 47907, USA

Background
Narrative analysis is used in many different areas of safety research:
- Prevention information (Sorock et al., 1997; Smith, 2001).
- Case identification (Lombardi et al., 2005).
- Identifying pre-event circumstances (Stutts et al., 2001; Sorock et al., 1996).
- Developing hazard scenarios (Smith et al., 2006).

Computerized classification of NHIS injury narratives (Wellman et al., 2004)
Flowchart:
1. Original injury narratives and E-codes
2. Learn on E-code categories assigned by experts: calculations of word/word combination probabilities
3. Predict E-code categories for new narratives: Fuzzy Bayes predictions/assignment of codes

Objectives
Develop and evaluate different filtering techniques (i.e., a semi-automated strategy that combines a computer classification scheme with strategically assigned manual coders) to improve accuracy:
- Receiver Operating Characteristic (ROC) curves
- Linear Predictor method
The prediction and learning datasets are unique (no narrative appears in both), which eliminates optimistic bias.

Methods: Data extraction and gold standard codes assigned
Randomly extracted injury narratives from a large Workers' Compensation insurance database.
Establish 'Gold Standard' injury cause codes:
- 2 expert coders manually classify each narrative
- Gold standard: cases where the 2 coders agreed (N = 10,389)
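The agreement rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`narrative`, `coder_a`, `coder_b`) and the second example narrative are hypothetical and do not come from the study data.

```python
def gold_standard(records):
    """Keep only narratives on which both expert coders assigned the same code."""
    return [
        {"narrative": r["narrative"], "code": r["coder_a"]}
        for r in records
        if r["coder_a"] == r["coder_b"]
    ]

# Hypothetical records: field names and codes are illustrative only.
records = [
    {"narrative": "BATTERY FELL ON FOOT", "coder_a": "02", "coder_b": "02"},
    {"narrative": "SLIPPED ON ICE IN LOT", "coder_a": "13", "coder_b": "11"},
]
gold = gold_standard(records)  # only the first record survives
```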

1-Digit Classification of Injury Events
BLS Occupational Injury and Illness Classification System (OIICS) [1]

Category        Description
0* (01, 02…)    Contact With Objects and Equipment
1*              Falls
2*              Bodily Reaction and Exertion
3*              Exposure to Harmful Substances or Environments
4*              Transportation Accidents
5*              Fires and Explosions
6*              Assaults and Violent Acts
9*              Other Events or Exposures
9999            Non-classifiable

[1] ANSI Z , American National Standard for Information Management for Occupational Safety and Health.

2-Digit Classification of Injury Events

Contact Category ('0*')
2-digit BLS category   Description
00                     Contact with Objects
01                     Struck against
02                     Struck by
03                     Caught in
04                     Caught in collapsing materials
05                     Rubbed/abraded by friction
06                     Rubbed/abraded/jarred by vibration
09                     Contact with objects nec

Fall Category ('1*')
2-digit BLS category   Description
10                     Falls, unspecified
11                     Fall to lower level
12                     Jump to lower level
13                     Fall on same level
19                     Fall nec

Methods Flowchart
1. Original injury narratives & manual codes
2. Learn on coded categories assigned by experts: calculations of word/word combination probabilities (based on 7,389 narratives)
3. Predict code categories for new narratives: Fuzzy Bayes predictions/assignment of codes for 3,000 narratives
4. Filter results:
   - Use ROC method to filter results
   - Use LP method to filter results
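A minimal sketch of keeping the learning and prediction sets disjoint, using the set sizes given on the slide (7,389 and 3,000 out of 10,389 gold standard cases). The seeded random shuffle and the function name are assumptions; the slides do not say how the two sets were drawn.

```python
import random

def split_learning_prediction(cases, n_prediction, seed=0):
    """Shuffle the cases and split them into disjoint learning and prediction sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[n_prediction:], shuffled[:n_prediction]

# Stand-in case identifiers; the real cases would be coded narratives.
cases = list(range(10389))
learning, prediction = split_learning_prediction(cases, 3000)
```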

Assigning Category Probabilities: Learning Phase

Word             Classification            Probability
FELL             02: Struck by             0.70
FELL             13: Fall on same level    0.10
FELL-ABOUT       11: Fall to lower level   0.86
FELL-AGAINST     02: Struck by             0.29
FELL-AGAINST     13: Fall on same level    0.57
FELL-BACK        11: Fall to lower level   0.38
FELL-BACK        13: Fall on same level    0.38
FELL-BACKWARD    11: Fall to lower level   0.20
FELL-BACKWARD    13: Fall on same level    0.50
FELL-BACKWARDS   11: Fall to lower level   0.23
FELL-BACKWARDS   13: Fall on same level    0.73
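One way such a table could be estimated is by relative frequency: the share of narratives containing a word that experts coded into each category. The sketch below assumes that reading of the "Probability" column, applies no smoothing, and uses made-up narratives; it is not the authors' software.

```python
from collections import Counter, defaultdict

def learn_word_probabilities(coded_narratives):
    """Estimate P(category | word) as the fraction of narratives containing the word
    that were coded into that category."""
    word_counts = Counter()
    pair_counts = Counter()
    for text, code in coded_narratives:
        for word in set(text.upper().split()):  # count each word once per narrative
            word_counts[word] += 1
            pair_counts[(word, code)] += 1
    table = defaultdict(dict)
    for (word, code), n in pair_counts.items():
        table[word][code] = n / word_counts[word]
    return table

# Hypothetical training narratives with 2-digit codes (illustrative only).
training = [
    ("BATTERY FELL ON FOOT", "02"),
    ("BOX FELL ON HAND", "02"),
    ("FELL FROM LADDER", "11"),
    ("FELL ON ICE", "13"),
]
table = learn_word_probabilities(training)
```

Word sequences such as FELL-BACK could be handled the same way by adding joined adjacent word pairs to the set of counted tokens.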

Narrative Coding: Prediction Phase Example
Original narrative: 'BATTERY FELL ON FOOT'
- Single word: the single-word predictor "FELL" incorrectly classified the narrative into the "All Falls" category (strength = 0.77).
- Multiple word: the multiple-word predictor "FELL&ON" incorrectly classified the narrative into the "All Falls" category (strength = 0.78).
- Multiple word with 2-word sequences: the multiple-word predictor "FELL-ON&ON-FOOT" correctly classified the narrative into the "Contact with Objects and Equipment" category (strength = 0.92).
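The effect of adding 2-word sequences can be sketched as follows. The sketch assumes the commonly described Fuzzy Bayes rule, in which a category's strength is the maximum probability offered by any single predictor found in the narrative. The probability table is hypothetical (chosen to echo the example), and "&" word combinations are left out for brevity.

```python
def predictors(narrative, use_sequences=True):
    """Single words, optionally plus adjacent 2-word sequences joined with '-'."""
    words = narrative.upper().split()
    keys = list(words)
    if use_sequences:
        keys += [f"{a}-{b}" for a, b in zip(words, words[1:])]
    return keys

def fuzzy_bayes_predict(narrative, table, use_sequences=True):
    """Return (category, strength), where strength is the best single-predictor probability."""
    strengths = {}
    for key in predictors(narrative, use_sequences):
        for category, p in table.get(key, {}).items():
            strengths[category] = max(p, strengths.get(category, 0.0))
    if not strengths:
        return None, 0.0
    best = max(strengths, key=strengths.get)
    return best, strengths[best]

# Hypothetical P(category | predictor) values, not taken from the study.
table = {
    "FELL": {"1*": 0.77, "0*": 0.20},
    "FELL-ON": {"0*": 0.85, "1*": 0.10},
    "ON-FOOT": {"0*": 0.92},
}
words_only = fuzzy_bayes_predict("BATTERY FELL ON FOOT", table, use_sequences=False)
with_sequences = fuzzy_bayes_predict("BATTERY FELL ON FOOT", table)
```

With single words only, FELL pulls the narrative into the falls category; once sequences are included, ON-FOOT supplies stronger evidence for contact with objects.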

Baseline results: category level

BLS Event                          n     Baseline 1-digit        Baseline 2-digit
                                         category sensitivity    category sensitivity
Contact with objects/equipment     557   84%                     64%
Falls                              474   90%                     79%
Bodily Reaction (BR) & Exertion    961   86%                     68%
Exposure to harmful environment    361   94%                     90%
Transportation Accident            370   93%                     81%
Fires & Explosions                 21    10%                     --
Assaults & Violent Acts            117   74%                     73%
Non Classifiable                   139   25%                     --
Total                              --    --                      71%
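Per-category sensitivity, as reported in the table, is the share of gold standard cases in a category that the classifier also placed in that category. A small sketch with made-up codes (not study data):

```python
from collections import Counter

def sensitivity_by_category(gold, predicted):
    """Sensitivity per category: correctly predicted cases / gold standard cases."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {category: hits[category] / totals[category] for category in totals}

# Hypothetical gold standard and predicted 1-digit codes.
gold_codes = ["1*", "1*", "0*", "0*", "0*", "2*"]
predicted_codes = ["1*", "0*", "0*", "0*", "1*", "2*"]
sens = sensitivity_by_category(gold_codes, predicted_codes)
```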

I. Receiver Operating Characteristic (ROC) Filtering

Methods
ROC curves were plotted for categories (sensitivity vs. 1 - specificity).
Determine the optimal cutoff point of prediction strengths:
- Balance the selection of errors and the amount of manual coding required at that filter (threshold).
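A sketch of how ROC points and a cutoff might be derived for one category. The slides do not state the rule used to pick the optimal point; Youden's J (sensitivity minus false-positive rate) is swapped in here purely as an assumption, and the strengths and labels are invented.

```python
def roc_points(strengths, is_positive):
    """For each candidate threshold, return (threshold, sensitivity, 1 - specificity)."""
    pos = sum(is_positive)
    neg = len(is_positive) - pos
    points = []
    for t in sorted(set(strengths), reverse=True):
        tp = sum(1 for s, y in zip(strengths, is_positive) if s >= t and y)
        fp = sum(1 for s, y in zip(strengths, is_positive) if s >= t and not y)
        points.append((t, tp / pos, fp / neg))
    return points

def best_threshold(points):
    """Pick the threshold maximizing Youden's J (an assumed criterion)."""
    return max(points, key=lambda p: p[1] - p[2])[0]

# Hypothetical prediction strengths for one category and whether each
# prediction matched the gold standard code.
strengths = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30, 0.20]
correct = [True, True, False, True, False, False, False]
points = roc_points(strengths, correct)
cutoff = best_threshold(points)
```

Narratives whose prediction strength falls below the chosen cutoff would then be routed to manual coders, which is where the trade-off with manual workload comes from.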

Example ROC for 'Fall to Lower Level'

Effect of Threshold on Accuracy: Fall to Lower Level
A stricter threshold means more manual coding.

Comparison with Baseline of ROC Filtering: 2-digit improvement

II. Linear Predictor Threshold Filtering

Methods
A linear predictor (LP) was created using both:
- The difference in prediction strengths of the top two predictors
- The actual strength of the highest predictor
$LP = [P(A_i \mid n)_1 - P(A_i \mid n)_2] + P(A_i \mid n)_1$
Determine the optimal cutoff point of prediction strengths:
- Balance the selection of errors and the amount of manual coding required at that linear predictor level.
An example might be: $P(A_i \mid n)_1 = 0.76$ and $P(A_i \mid n)_2 = 0.74$, so $LP = [0.76 - 0.74] + 0.76 = 0.78$.
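The LP calculation from this slide can be sketched directly. The filtering direction (send a narrative to manual coding when its LP falls below the cutoff) and the cutoff value of 1.0 are assumptions made for illustration.

```python
def linear_predictor(strengths):
    """LP = (top strength - second strength) + top strength."""
    top, second = sorted(strengths, reverse=True)[:2]
    return (top - second) + top

def needs_manual_review(strengths, cutoff):
    """Flag a narrative for manual coding when its LP is below the cutoff (assumed rule)."""
    return linear_predictor(strengths) < cutoff

lp_close = linear_predictor([0.74, 0.76, 0.10])    # slide example: about 0.78
flag_close = needs_manual_review([0.76, 0.74], cutoff=1.0)
flag_clear = needs_manual_review([0.95, 0.10], cutoff=1.0)
```

A narrative whose top two categories are nearly tied gets a low LP even when the top strength is fairly high, so ambiguous cases are the ones flagged for review.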

Balancing Resources & Sensitivity

Comparison with Baseline of LP Filtering: 2-digit improvement

Conclusions
- Filtering improves accuracy substantially from baseline.
- This approach opens up the opportunity to code narrative data from large administrative databases.
- Other classification protocols can be used because the program learns from manually coded narratives.

Generating knowledge to help people live safer, more secure lives.