Text Mining for Data Quality Analysis of Melanoma Tumor Depth


Text Mining for Data Quality Analysis of Melanoma Tumor Depth
2019 NAACCR/IACR Combined Annual Conference, Vancouver, BC

Outline
- Background
- Objectives
- Case selection criteria
- Source documents
- Algorithm development & testing
- Preliminary analyses
- Next steps

Acknowledgements
- Cancer Registries: Louisiana, Detroit, New Jersey, New York, Utah
- IMS and NCI: Glenn Abastillas, Ariel Brest, Linda Coyle, Jennifer Stevens, Peggy Adamo, Clara Lam, Serban Negoita, Valentina Petkov

Background

Melanoma tumor depth
- Most important determinant of prognosis for melanomas
- Pre-2018 diagnoses: CS SSF1
- Greatest measured thickness from any procedure is recorded
- Recorded in hundredths of millimeters (mm)
- Three-digit field with implied decimal point between 1st & 2nd digits
- Measurement of 2.0 mm coded as 200
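The coding scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the registries' software: the function names are hypothetical, the special codes (000, 980, 999) follow the legend given later in this deck, and rejecting measurements below 0.01 mm is an assumption made only to keep the sketch simple.

```python
from decimal import Decimal, ROUND_HALF_UP

NO_TUMOR = "000"   # no mass/tumor found
MAX_CODE = "980"   # 9.80 mm or larger
UNKNOWN = "999"    # depth unknown


def encode_depth_mm(depth_mm):
    """Turn a depth in mm into the three-digit code (hundredths of mm)."""
    if depth_mm is None:
        return UNKNOWN
    # Decimal avoids float surprises such as 0.29 * 100 == 28.999...
    hundredths = int(
        (Decimal(str(depth_mm)) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    )
    if hundredths < 1:
        # Assumption for this sketch: only measurements >= 0.01 mm are encoded.
        raise ValueError("expected a measurement of at least 0.01 mm")
    if hundredths >= 980:
        return MAX_CODE
    return f"{hundredths:03d}"


def decode_depth_code(code):
    """Return the depth in mm for codes 001-979, otherwise None."""
    if len(code) != 3 or not code.isdigit():
        raise ValueError("expected a three-digit code")
    value = int(code)
    if 1 <= value <= 979:
        return value / 100
    return None  # 000, 980 and 999 do not carry an exact measurement
```

For example, `encode_depth_mm(2.0)` gives `"200"`, matching the slide, and a misplaced decimal point (0.2 mm instead of 2.0 mm) would produce `"020"`.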

Coding concerns
- Decimal point errors
- Transcription errors
- Miscoding of tumor size for tumor depth
- Incomplete information

Objectives

Objectives
- Develop and test an algorithm to identify accurate melanoma depth measurement values from unstructured text
- Conduct assessment of error distribution & effect on stage & survival estimates
- Provide registries with a set of flagged cases with high likelihood of inaccurate depth measurement values for review
- Provide registries with a method for automatic error correction
- Disseminate algorithm logic & query algorithm files to the cancer surveillance & clinical research community
- Provide evidence-based input for registrar training materials

Case selection criteria

Melanoma cases
Must meet all criteria:
- Diagnosed 2010-2014
- Behavior Code = 3 (invasive cancers)
- Primary Site = C44_
- Histology Codes = 8720-8790
- Reportable to SEER
Diagnosis years will be expanded through 2017 for all registries going forward.

Melanoma cases, cont.
Exclude:
- Death-certificate-only diagnoses (Reporting Source = 7)
- Cases with scanned images
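The inclusion and exclusion criteria above amount to a simple record filter. The sketch below assumes each case is a dictionary; all key names (`dx_year`, `behavior`, `primary_site`, and so on) are hypothetical stand-ins, not actual registry field names.

```python
def is_eligible(case):
    """Return True if a case meets every inclusion rule and no exclusion rule.

    `case` is a dict with hypothetical keys chosen for this illustration.
    """
    site = case["primary_site"]
    return (
        2010 <= case["dx_year"] <= 2014              # diagnosis years
        and case["behavior"] == "3"                  # invasive cancers
        and len(site) == 4 and site.startswith("C44")  # Primary Site = C44_
        and 8720 <= int(case["histology"]) <= 8790   # melanoma histologies
        and case["seer_reportable"]                  # reportable to SEER
        and case["reporting_source"] != "7"          # not death-certificate-only
        and not case["has_scanned_images"]           # no scanned images
    )
```

Keeping every rule in one boolean expression makes it easy to widen the diagnosis-year range later without touching the other criteria.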

Source documents

Source documents
Include:
- All NAACCR source abstracts
- E-path reports
Exclude:
- Pathology reports dated before diagnosis date

Source documents, cont.
- E-path reports contain up to 8 different regions known as segments
- 3 of the 8 regions were included in our source documentation to develop the algorithm:
  - Final diagnosis
  - Microscopic diagnosis/description
  - Synoptic report

Algorithm development & testing

Algorithm development & testing
What are we trying to capture?
- Any numerical values relevant to melanoma tumor depth
- Qualifier words that might indicate a measurement (e.g. at least)
- Key words (e.g. Breslow’s, depth, thickness)
Process, process, process
- Checks and verifications put in place to confirm that captured measurements are relevant measurements
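One way to picture the capture step is a regular expression that looks for a key word followed shortly by a number and a unit. This is only an illustrative sketch, not the I2e or SAS logic used in the project; the pattern, the 40-character gap, and the function name are all assumptions, and the only qualifier included is the one the slide names.

```python
import re

QUALIFIERS = ("at least",)  # the slide's example; extend as needed

DEPTH_PATTERN = re.compile(
    r"(?P<keyword>breslow(?:['’]s)?|depth|thickness)"   # key word
    r"[^\d.]{0,40}?"                                     # short gap, no digits or periods
    r"(?P<qualifier>" + "|".join(map(re.escape, QUALIFIERS)) + r")?\s*"
    r"(?P<value>\d+(?:\.\d+)?|\.\d+)\s*"                 # 1.2, 2, or .45
    r"(?P<unit>mm|cm)\b",
    re.IGNORECASE,
)


def find_depth_mentions(text):
    """Return candidate depth measurements found near a key word."""
    hits = []
    for m in DEPTH_PATTERN.finditer(text):
        hits.append({
            "keyword": m.group("keyword").lower(),
            "qualifier": (m.group("qualifier") or "").lower() or None,
            "value": float(m.group("value")),
            "unit": m.group("unit").lower(),
        })
    return hits
```

Requiring a key word before the number is what keeps a phrase such as "Tumor size 2 cm" from being picked up as a depth, which speaks to the concern about tumor size being miscoded as tumor depth.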

Algorithm development & testing, cont.
Building Consolidated Results Data Set
- Process raw measurement data to obtain standardized tumor depth measurement values
- Select best standardized measurement value at source document level
- Select best source document
- Transform standardized measurement from best source document into NAACCR standard code value
- Add new machine-generated code values to original CTC record (from SEER*DMS) to create analytic data set
When multiple measurements are found:
- Take the largest after the dx date (do not use reports dated prior to the dx date)
- Prefer a mm measurement over a cm measurement
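The selection rules for multiple measurements can be sketched as follows. The slide does not say how the "largest" and "prefer mm" rules interact, so this sketch assumes mm measurements are considered first and cm measurements are used only when no mm measurement is available; the dictionary keys and function names are hypothetical.

```python
from datetime import date


def to_mm(value, unit):
    """Standardize a measurement to millimeters."""
    return value * 10 if unit == "cm" else value


def select_best_measurement(measurements, dx_date):
    """Pick one depth (in mm) from several candidate measurements.

    Each measurement is a dict with hypothetical keys:
    'value' (float), 'unit' ('mm' or 'cm'), 'report_date' (datetime.date).
    """
    # Do not use reports dated before the diagnosis date.
    eligible = [m for m in measurements if m["report_date"] >= dx_date]
    if not eligible:
        return None
    # Assumption: prefer mm measurements; fall back to cm only if none exist.
    mm_only = [m for m in eligible if m["unit"] == "mm"]
    pool = mm_only or eligible
    # Take the largest remaining measurement.
    return max(to_mm(m["value"], m["unit"]) for m in pool)
```

With a diagnosis date of 2012-03-01, a 1.5 mm measurement from a February report would be ignored, and the largest mm value from reports dated on or after the diagnosis date would be returned.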

Algorithm development & testing, cont.
Building Gold Standard
- Two experienced CTRs code melanoma depth and reconcile discrepancies
- CTRs use all available data sources to determine the measurement value for each consolidated case (CTC)
- The CTR-reported value is the “gold standard” (GS) value
- Once CTR review of the random sample data is complete, the algorithm/machine-generated (MG) values and the originally reported values (OCTC) are each compared to the GS values
GS development has been done for one registry so far; results shown in subsequent slides are based on this one registry. There are 190 cases in the GS group (240, minus 40 that were coded 000, minus 10 with images only). We will repeat the GS process for each registry.

Preliminary analyses
Results from one registry so far

MG & OCTC Code Values Agreement with GS by Tumor Thickness
Cells show N (PctN).

GS code value   Count   I2e MG vs GS              SAS MG vs GS              OCTC vs GS
                        Match       No Match      Match       No Match      Match       No Match
001-979         139     102 (73.4)  37 (26.6)     106 (76.3)  33 (23.7)     104 (74.8)  35 (25.2)
980             3       3 (100.0)   0 (0.0)       3 (100.0)   0 (0.0)       2 (66.7)    1 (33.3)
999             48      40 (83.3)   8 (16.7)      41 (85.4)   7 (14.6)      32 (66.7)   16 (33.3)
All             190     145 (76.3)  45 (23.7)     150 (78.9)  40 (21.1)     138 (72.6)  52 (27.4)

- 190 cases in GS group (240, minus 40 that were coded 000, minus 10 with images only)
- 000 = no mass/tumor found
- 001-979 = actual measured depth, in hundredths of mm
- 980 = 9.80 mm or larger
- 999 = unknown
- The last row shows overall agreement; the same numbers repeat on the next 2 slides
- I2e and SAS were both run only to make sure SAS is comparable to what was done with I2e, and it is. We will use only SAS going forward. SAS is also being used for other similar projects (HPV, for example).
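The Match / No Match columns are simple counts of exact code agreement with the gold standard. A minimal sketch of that tally, with a hypothetical function name and input layout:

```python
def agreement(pairs):
    """Tally exact agreement between candidate codes and gold-standard codes.

    `pairs` is an iterable of (candidate_code, gs_code) tuples.
    Returns (match_count, no_match_count, percent_match).
    """
    pairs = list(pairs)
    matches = sum(1 for candidate, gs in pairs if candidate == gs)
    total = len(pairs)
    pct = round(100 * matches / total, 1) if total else 0.0
    return matches, total - matches, pct
```

Applied to 190 pairs of which 145 agree, this returns 76.3 percent, the overall I2e figure in the "All" row.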

MG & OCTC Code Values Agreement with GS by T category
Cells show N (PctN).

T category   Count   I2e MG vs GS              SAS MG vs GS              OCTC vs GS
                     Match       No Match      Match       No Match      Match       No Match
T0           8       8 (100.0)   0 (0.0)       8 (100.0)   0 (0.0)       4 (50.0)    4 (50.0)
TX           32      28 (87.5)   4 (12.5)      28 (87.5)   4 (12.5)      27 (84.4)   5 (15.6)
T1           74      59 (79.7)   15 (20.3)     59 (79.7)   15 (20.3)     48 (64.9)   26 (35.1)
T2-T4        76      50 (65.8)   26 (34.2)     55 (72.4)   21 (27.6)     59 (77.6)   17 (22.4)
All          190     145 (76.3)  45 (23.7)     150 (78.9)  40 (21.1)     138 (72.6)  52 (27.4)

MG & OCTC Code Values Agreement with GS by Source Documents
Cells show N (PctN).

Source document    Count   I2e MG vs GS              SAS MG vs GS              OCTC vs GS
                           Match       No Match      Match       No Match      Match       No Match
Path Report only   90      73 (81.1)   17 (18.9)     70 (77.8)   20 (22.2)     66 (73.3)   24 (26.7)
Abstract & Path    57      38 (66.7)   19 (33.3)     45 (78.9)   12 (21.1)     43 (75.4)   14 (24.6)
Abstract Only      43      34 (79.1)   9 (20.9)      35 (81.4)   8 (18.6)      29 (67.4)   14 (32.6)
All                190     145 (76.3)  45 (23.7)     150 (78.9)  40 (21.1)     138 (72.6)  52 (27.4)

Error Analysis – I2e & GS
I2e MG compared to GS

Category                                          N     PctN
All                                               190   100.00
Match                                             145   76.32
Decimal error                                     3     1.58
Both have values < 9.8 but values do not match    13    6.84
GS has value, I2e does not                        13    6.84
GS no value but I2e found value                   8     4.21
GS value < 980, I2e value > 980                   8     4.21

Error Analysis – SAS & GS
SAS MG compared to GS

Category                                          N     PctN
All                                               190   100.00
Match                                             150   78.95
Decimal error                                     1     0.53
Both have values < 9.8 but values do not match    10    5.26
GS has value, SAS does not                        22    11.58
GS no value but SAS found value                   7     3.68

SAS agreement with GS was 78.95% (I2e was 76.32%).

Error Analysis – OCTC & GS
OCTC compared to GS

Category                                          N     PctN
All                                               190   100.00
Match                                             138   72.63
Decimal error                                     17    8.95
Both have values < 9.8 but values do not match    12    6.32
GS has value, CTC does not                        6     3.16
GS no value but CTC has value                     12    6.32
GS value > 980, CTC value < 980                   1     0.53
No record in range                                4     2.11
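The error categories in these tables can be approximated by a small classifier over pairs of three-digit codes. The slides do not define the categories precisely, so the rules below are assumptions made for illustration: a "decimal error" is taken to mean one code is exactly 10 or 100 times the other, codes 001-980 count as "has value", and the order in which rules are checked is arbitrary.

```python
def has_measurement(code):
    """Codes 001-980 carry a depth value; 000 and 999 do not."""
    return code.isdigit() and 1 <= int(code) <= 980


def classify(gs_code, other_code):
    """Assign a GS-versus-other comparison to a rough error category."""
    if gs_code == other_code:
        return "match"
    gs_has, other_has = has_measurement(gs_code), has_measurement(other_code)
    if gs_has and not other_has:
        return "GS has value, other does not"
    if other_has and not gs_has:
        return "GS no value, other has value"
    if not gs_has and not other_has:
        return "other"  # e.g. 000 versus 999
    gs, other = int(gs_code), int(other_code)
    if gs == 980 or other == 980:
        return "980 threshold mismatch"
    larger, smaller = max(gs, other), min(gs, other)
    # Assumed definition: the decimal point is off by one or two places.
    if larger in (smaller * 10, smaller * 100):
        return "decimal error"
    return "values do not match"
```

For example, a gold-standard code of 200 (2.00 mm) against a reported code of 020 (0.20 mm) is classified as a decimal error, the category that accounts for the largest share of OCTC mismatches.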

Next steps

Next steps
- Continue refining algorithm (hope to improve % agreement)
- Develop GS for remaining registries
- Increase from 240 to 480 GS cases
- Run algorithm on all of the registry’s invasive melanoma of skin cases
- Provide registry with reports to use to determine which cases to review
Registry reports: start with decimal errors and cases where the algorithm found a value but the CTC has none; then could look at discrepant values, and at cases where the algorithm value is 980 and the CTC value is not, or vice versa. Reports can be customized for the registries.

Thank you!! Questions?