Information Extraction at Group Health
David Carrell, PhD
Group Health Research Institute
June 29, 2010

David’s background
● BA, Philosophy & Religion (NNC)
● MA, Political Science (SUNY)
● PhD, Political Science (U of Washington)
● Pew Health Policy (UCSF)
● Research, IT (U of Washington)
● IT, Research (Group Health)

Group Health Research Institute (GHRI)
● Group Health (www.ghc.org)
  ● Founded 1947, Seattle, WA
  ● Integrated delivery system (“HMO”)
  ● ~600K patients in WA (some in OR, ID)
  ● Comprehensive EMR & patient portal (2004+)
● GHRI (www.grouphealthresearch.org)
  ● Founded 1983
  ● 300 staff (50 investigators)
  ● 2009: >250 active grants ($39M)

Group Health Research Institute (GHRI)
● Applied research
  ● Epidemiology, health systems, clinical trials, economics...
  ● Limited bioinformatics expertise
● Collaborative
  ● HMO Research Network, Cancer-RN, ..., MH-RN
  ● Federated data systems
● NLP vision
  ● NLP expertise through collaboration
  ● Bring NLP to the text: locally, then at other network sites

GHRI & Research Consortia
● HMO Research Network
  ● Large data repositories
  ● Common EMR platforms
● Virtual Data Warehouse (VDW)

GHRI & Virtual Data Warehouse (VDW)
● Structured data (legacy + Epic EMR)
● Coverage from at least 1990 onward
● Integrated care delivery (some claims)
● Diagnoses, procedures, pharmacy, tumor, vitals, census/geocode, etc.

GHRI & Virtual Data Warehouse (VDW): HMO Research Network

GHRI & NLP Adoption

GHRI & NLP Adoption: HMO Research Network

GHRI & NLP Adoption
● caBIG TBPT adoption proposal, Jun 2006
● caTIES for pathology & radiology text, ~2007
● Chart note text, May 2007
● GWAS (eMERGE) proposal, Aug 2007
● GATE experimentation, Feb 2008
● Strategic planning conference, Dec 2008
● ARRA Challenge Grant, Apr 2009
● UIMA/cTAKES adoption, Aug 2009
● Proposals, e.g., HMORN multi-site, Jan 2010

GHRI & NLP Adoption
● How to bring NLP capacity to clinical text?
  ● “Cookbooks” (SAS → Java programmers)
  ● “Parachuted” hardware
  ● Parachuted virtual machine (?)
  ● Cloud-based processing
    ● Security issues
  ● Other?

GHRI & NLP Adoption
● Challenges of cloud-based solutions:
  ● Unfamiliar technologies
  ● Responsibility sharing (e.g., security)
  ● Patient privacy
  ● Institutional risk
● De-identification
● Graduated adoption?

SHARP: Exploring deployment strategies
● SHARP Cloud Security Workshop, Spring 2011
  ● Educational focus
  ● Challenges of processing clinical text in a novel security space (virtual firewall?)
  ● Security best practices
  ● IRB engagement
  ● Graduated adoption strategies

NLP Challenge Grant
● Natural Language Processing for Cancer Research Network Surveillance Studies
● Aim 1: Deploy open-source NLP software
  ● Develop ETL connective tissue
  ● Build “human capital” (Java, NLP)
● Aim 2: NLP algorithm boot camp: recurrent breast cancer diagnoses
  ● >3000 existing gold-standard cases (human reviewed)
● Approach:
  ● Local deployment/programming support
  ● High-level NLP/bioinformatics expertise via external collaboration
● Participants: GHRI (Carrell, Buist, Chubak), Mayo Clinic/Harvard (Savova), Pittsburgh (Chapman), Vanderbilt (Xu)

NLP Challenge Grant – Aim 1
● Source text from Epic/Clarity: chart notes, radiology reports, pathology reports
● Document Manager (raw and rich documents)
● UIMA/cTAKES NLP
● Normalized NLP SQL Server database, with columns Document_Identifier and Concept_Code (e.g., Radiology_Report_0000012877143, Radiology_Report_0000018600231, Radiology_Report_0000013134988, Radiology_Report_0000015287109)
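The slide shows NLP output landing in a normalized table keyed by Document_Identifier and Concept_Code, one row per document/concept pair. A minimal sketch of that shape follows, using Python's built-in sqlite3 as a stand-in for SQL Server. The table name nlp_concept and the codes CONCEPT_A, CONCEPT_B, CONCEPT_C are hypothetical placeholders; only the column names and document identifiers come from the slide.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE nlp_concept (          -- hypothetical table name
           Document_Identifier TEXT NOT NULL,
           Concept_Code        TEXT NOT NULL,
           PRIMARY KEY (Document_Identifier, Concept_Code)
       )"""
)

# One row per (document, concept) pair: the normalized form of NLP output.
rows = [
    ("Radiology_Report_0000012877143", "CONCEPT_A"),  # hypothetical codes
    ("Radiology_Report_0000012877143", "CONCEPT_B"),
    ("Radiology_Report_0000018600231", "CONCEPT_A"),
    ("Radiology_Report_0000013134988", "CONCEPT_C"),
    ("Radiology_Report_0000015287109", "CONCEPT_A"),
]
conn.executemany("INSERT INTO nlp_concept VALUES (?, ?)", rows)


def documents_with_concept(code: str) -> list[str]:
    """Return identifiers of documents tagged with the given concept code."""
    cur = conn.execute(
        "SELECT Document_Identifier FROM nlp_concept "
        "WHERE Concept_Code = ? ORDER BY Document_Identifier",
        (code,),
    )
    return [r[0] for r in cur.fetchall()]


print(documents_with_concept("CONCEPT_A"))
```

The composite primary key keeps each document/concept pair unique, so a concept-driven cohort query reduces to a simple filter on Concept_Code.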

NLP Challenge Grant – Aim 1

Document type    Available documents    Percent NLP concept-coded
Chart notes      20M                    25%
Radiology        4M                     33%
Pathology        1.2M                   2%
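The percentages imply rough absolute counts. The arithmetic below converts them, treating the slide's rounded figures (20M, 4M, 1.2M) as exact, so the results are approximations: about 5.0M chart notes, 1.32M radiology reports, and 24K pathology reports, roughly 6.3M documents or about 25% of all available text.

```python
# Document counts and coded percentages as listed on the slide (rounded figures).
available = {"Chart notes": 20_000_000, "Radiology": 4_000_000, "Pathology": 1_200_000}
pct_coded = {"Chart notes": 25, "Radiology": 33, "Pathology": 2}

# Implied number of concept-coded documents per type (integer arithmetic).
coded = {k: available[k] * pct_coded[k] // 100 for k in available}

total_available = sum(available.values())
total_coded = sum(coded.values())
overall_pct = 100 * total_coded / total_available

for k in available:
    print(f"{k}: ~{coded[k]:,} of {available[k]:,} coded")
print(f"Overall: ~{total_coded:,} of {total_available:,} ({overall_pct:.1f}%)")
```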

NLP Challenge Grant – Aim 2

NLP Challenge Grant – Aim 2
Recurrent breast cancer (Rec Br Ca)?
● Progress notes: AE1, AE2, AE3
● Oncology notes: AE1, AE2
● Radiology reports: AE1, AE2, AE3
● Pathology reports: AE1

eMERGE consortium
● Vanderbilt, Mayo, Northwestern, Marshfield, Group Health
● Can EMRs from multiple institutions provide comparable phenotype data for GWAS?
● 14 phenotypes
● Group Health:
  ● Structured data
  ● Adoption of NLP algorithms developed by others
  ● “Low-tech” NLP: text explorer, assisted chart abstraction

Clinical Text Explorer
● Select text source (chart notes, radiology, pathology, etc.)
● Search: recurrent NEAR breast NEAR cancer
● Date range
● Sample specifications
● Results: N documents, N patients found
● Search terms highlighted
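The slide does not say how the explorer's NEAR operator is defined; a full-text engine's proximity predicate is one plausible backing (SQL Server, for example, accepts NEAR inside CONTAINS queries). The Python sketch below only illustrates the idea, assuming NEAR means "all terms fall within max_span token positions of each other", with a hypothetical default of 10. It finds the tightest window covering every term with a sliding-window scan over term positions; near_match and tokenize are illustrative names, not part of any Group Health tool.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near_match(text: str, terms: list[str], max_span: int = 10) -> bool:
    """True if all single-word terms occur within max_span positions of each other.

    max_span=10 is an assumed default, not a documented setting of the tool.
    """
    wanted = {t.lower() for t in terms}
    # Positions of every token that is one of the search terms, in text order.
    hits = [(pos, tok) for pos, tok in enumerate(tokenize(text)) if tok in wanted]
    counts = Counter()
    left = 0
    for pos_right, tok_right in hits:
        counts[tok_right] += 1
        # Shrink the window from the left while it still covers every term.
        while len(counts) == len(wanted):
            pos_left, tok_left = hits[left]
            if pos_right - pos_left <= max_span:
                return True
            counts[tok_left] -= 1
            if counts[tok_left] == 0:
                del counts[tok_left]
            left += 1
    return False


print(near_match("History of recurrent left breast cancer.", ["recurrent", "breast", "cancer"]))
```

Proximity matching ignores word order and negation ("breast cancer, not recurrent" also matches), which is one reason highlighted hits still need human review.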

Assisted Chart Abstraction

Assisted Chart Abstraction
● Chart notes in SQL Server: 550K patients, 17M notes, 0.8B lines
● A-Z full-text indexes
● Pre-processed A-Z: ID, date
● Inputs: cohort lists, data warehouse, NLP concept codes, etc.
● Point-and-click GUI, outside the EMR
● Data and text capture
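To put the chart-note store in perspective, the quick arithmetic below derives averages from the slide's rounded totals: roughly 31 notes per patient and about 47 lines per note. These are means over rounded figures, not measured distributions.

```python
# Scale of the chart-note store as listed on the slide (rounded figures).
patients = 550_000
notes = 17_000_000
lines = 800_000_000

notes_per_patient = notes / patients  # average notes per patient
lines_per_note = lines / notes        # average text lines per note

print(f"~{notes_per_patient:.0f} notes per patient")
print(f"~{lines_per_note:.0f} lines per note")
```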

Assisted Chart Abstraction
● Identify cohort: selection criteria applied to the patient (Dx/Px/Rx, visits, demographics)
● Selection criteria applied to the notes (note date, note author, note type, note text)
● Assign note priority
● Traditional chart abstraction vs. assisted chart abstraction → data
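The three steps on this slide (patient-level criteria, note-level criteria, note priority) can be sketched as a small filtering-and-ranking pipeline. Everything below is hypothetical and only illustrates the flow: the Patient and Note fields, the diagnosis code DX_CATARACT, the TYPE_WEIGHT table, and the scoring rule in priority() are assumptions, not the tool's actual schema or ranking.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    patient_id: str
    dx_codes: set[str]  # stand-in for Dx/Px/Rx history
    age: int            # stand-in for demographics


@dataclass
class Note:
    patient_id: str
    note_date: date
    author: str         # "Note By" on the slide
    note_type: str
    text: str


# Hypothetical weights: which note types an abstractor should read first.
TYPE_WEIGHT = {"Op/Ophthal exam": 3, "Progress note": 1}


def select_patients(patients, required_dx, min_age):
    """Patient-level criteria: any required diagnosis plus a minimum age."""
    return [p for p in patients if (required_dx & p.dx_codes) and p.age >= min_age]


def select_notes(notes, patient_ids, start, end):
    """Note-level criteria: cohort members only, within a date window."""
    return [n for n in notes if n.patient_id in patient_ids and start <= n.note_date <= end]


def priority(note, keyword):
    """Hypothetical score: note-type weight plus a bonus if the keyword appears."""
    bonus = 2 if keyword.lower() in note.text.lower() else 0
    return TYPE_WEIGHT.get(note.note_type, 0) + bonus


def abstraction_queue(patients, notes, required_dx, min_age, start, end, keyword):
    """Return candidate notes, highest priority first, ties broken by most recent."""
    ids = {p.patient_id for p in select_patients(patients, required_dx, min_age)}
    candidates = select_notes(notes, ids, start, end)
    return sorted(candidates, key=lambda n: (-priority(n, keyword), -n.note_date.toordinal()))


patients = [
    Patient("p1", {"DX_CATARACT"}, 70),
    Patient("p2", {"DX_OTHER"}, 72),
    Patient("p3", {"DX_CATARACT"}, 40),
]
notes = [
    Note("p1", date(2009, 3, 1), "Dr. A", "Progress note", "Routine follow-up"),
    Note("p1", date(2009, 6, 1), "Dr. B", "Op/Ophthal exam", "Cataract extraction planned"),
    Note("p2", date(2009, 5, 1), "Dr. A", "Op/Ophthal exam", "Cataract noted"),
    Note("p1", date(2008, 1, 1), "Dr. A", "Op/Ophthal exam", "Outside the date window"),
]
queue = abstraction_queue(patients, notes, {"DX_CATARACT"}, 65,
                          date(2009, 1, 1), date(2009, 12, 31), "cataract")
for n in queue:
    print(n.note_date, n.note_type, priority(n, "cataract"))
```

In this toy run only p1 passes the patient criteria, its 2008 note falls outside the window, and the ophthalmology note mentioning cataract is ranked ahead of the routine progress note.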

Assisted Chart Abstraction
Pre-processed text criteria: text “CATARACT”; note type: Op/Ophthal exam; near: cataract procedure

Stage                                        Patients        Chart notes
Initial cohort identification                2,903 (100%)    137,019 (100%)
Inclusion criteria (demog., dx, px, etc.)    671 (23%)       70,119 (51%)
Electronic text                              228 (8%)        28,186 (21%)
Pre-processed text                           122 (4%)        284 (0.2%)
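The percentages on this slide are each stage's count relative to the initial cohort. The check below recomputes them from the raw counts, assuming the stages run in funnel order (initial cohort, inclusion criteria, electronic text, pre-processed text). Rounded, they reproduce the slide's figures, and they show that the pre-processing step cuts the notes to read from 137,019 to 284, roughly a 480-fold reduction.

```python
# Stage counts from the cataract example: (stage, patients, chart notes).
stages = [
    ("Initial cohort identification", 2903, 137_019),
    ("Inclusion criteria", 671, 70_119),
    ("Electronic text", 228, 28_186),
    ("Pre-processed text", 122, 284),
]

base_patients, base_notes = stages[0][1], stages[0][2]


def pct(part: int, whole: int) -> float:
    """Percentage of the initial stage retained at a later stage."""
    return 100 * part / whole


for name, pts, nts in stages:
    print(f"{name}: {pts:,} patients ({pct(pts, base_patients):.0f}%), "
          f"{nts:,} notes ({pct(nts, base_notes):.1f}%)")

print(f"Note reduction: ~{base_notes / stages[-1][2]:.0f}-fold")
```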

Potential SHARP synergy...
● National Cancer Institute FOA: Tools for Electronic Data Extraction
● Funding: NCI contract for software development
● Aim: Enhance/automate existing SEER cancer case identification (largely manual abstraction of EHR/paper charts)
● Approach: Assess, propose, test, modify, develop, and deploy technologies that leverage NLP to automate some aspects of the SEER workflow
● Participants: IMS, Inc., SEER sites (4), Group Health, Harvard

SHARP – NLP research lab

Questions – Discussion