Big Data in Biomedical Engineering Iyad Obeid, PhD November 7, 2014 Temple University Neural Engineering Data Consortium

Slides:



Advertisements
Similar presentations
Software change management
Advertisements

DSpace: the MIT Libraries Institutional Repository MacKenzie Smith, MIT EDUCAUSE 2003, November 5 th Copyright MacKenzie Smith, This work is the.
Get Involved Download and use the data Take the survey Join the user group Neural Engineering Data Consortium Iyad Obeid PhD, Joseph Picone PhDTemple University,
Future Trends in Monitoring Keith J Ruskin, MD Associate Professor of Anesthesiology Yale University School of Medicine.
The Imperial College Tissue Bank A searchable catalogue for tissues, research projects and data outcomes Prof Gerry Thomas - Dept. Surgery & Cancer The.
Take the Survey! Big data needs? How could membership benefit you? Automatic Interpretation of EEGs Statistics Acknowledgements DARPA/MTO (D13AP00065)
Corpus Development EEG signal files and reports had to be manually paired, de-identified and annotated: Corpus Development EEG signal files and reports.
Artificial Intelligence Austin Luczak, Katie Regin, John Trawinski.
Simon Briggs Department of Clinical Pharmacology University of Oxford 13 th June 2008 Data management – A researchers prospective.
Computers: Tools for an Information Age
THE TUH EEG CORPUS: A Big Data Resource for Automated EEG Interpretation A. Harati, S. López, I. Obeid and J. Picone Neural Engineering Data Consortium.
Welcome to CMPE003 Personal Computer Concepts: Hardware and Software Winter 2003 UC Santa Cruz Instructor: Guy Cox.
Institutional Perspective on Credit Systems for Research Data MacKenzie Smith Research Director, MIT Libraries.
 A data processing system is a combination of machines and people that for a set of inputs produces a defined set of outputs. The inputs and outputs.
Basics of Good Documentation Document Control Systems
United Nations Economic Commission for Europe Statistical Division Applying the GSBPM to Business Register Management Steven Vale UNECE
Copyright © 2003 by Prentice Hall Computers: Tools for an Information Age Chapter 14 Systems Analysis and Design: The Big Picture.
Abstract EEGs, which record electrical activity on the scalp using an array of electrodes, are routinely used in clinical settings to.
Complete and Integrated Lifecycle Management. Challenges 1.
Automatic Labeling of EEGs Using Deep Learning M. Golmohammadi, A. Harati, S. Lopez I. Obeid and J. Picone Neural Engineering Data Consortium College of.
Biostatistics Analysis Center Center for Clinical Epidemiology and Biostatistics University of Pennsylvania School of Medicine Minimum Documentation Requirements.
Systems Analysis and Design: The Big Picture
Data Analysis for Evaluation Eric Graig, Ph.D.. Slide 2 Innovation Network, Inc. Purpose of this Training To increase your skills in analysis and interpretation.
Concussion Detection Research Tool Codi-Lee Hayes Samantha Mearns Rebecca Yaffe Dr. Thirimacho Bourlai Dr. Aaron Monseau.
Data Processing Machine Learning Algorithm The data is processed by machine algorithms based on hidden Markov models and deep learning. They are then utilized.
MSS Technologies and the AIIM Grand Canyon Chapter present: Electronic Document Management System Needs Analysis.
Streamlining the Review Cycle Michael Oettli, nlg GmbH Santa Clara, October 10 th.
Document Control Basics of Good Documentation and
PROJECT MULTICASTER Kenneth Brian Gilliam Computer Electronic Networking Dept. of Technology Eastern Kentucky University.
AIAA’s Publications Business Publications New Initiatives Subcommittee Wednesday, 9 January 2008 Rodger Williams.
ACCESS for VALIDITY ACCESS for INNOVATION. Starting January 2011 for NEW proposals Not voluntary – “integral part” of proposal and FastLane Required for.
The Neural Engineering Data Consortium: ‘Déjà Vu All Over Again’ Iyad Obeid, Associate Professor Joseph Picone, Professor Department of Electrical and.
Views The architecture was specifically changed to accommodate multiple views. The used of the QStackedWidget makes it easy to switch between the different.
THE TUH EEG CORPUS: The Largest Open Source Clinical EEG Corpus Iyad Obeid and Joseph Picone Neural Engineering Data Consortium Temple University Philadelphia,
The Indiana Youth Survey Insert Your Name, Title and Organization.
THE TUH EEG CORPUS: A Big Data Resource for Automated EEG Interpretation A. Harati, S. López, I. Obeid and J. Picone Neural Engineering Data Consortium.
Why Big Data is Crucial Overall progress in the field is not commensurate with the scope of investment. The existence of massive corpora has proven to.
Keynote Talks: 10:00 AM to 10:15 AM:Dr. Iyad Obeid, Bringing Big Data to Bioengineering 10:15 AM to 10:45 AM:Dr. Chris Cieri, Data Center Models and Impact.
Capabilities of Software. Object Linking & Embedding (OLE) OLE allows information to be shared between different programs For example, a spreadsheet created.
TUH EEG Corpus Data Analysis 38,437 files from the Corpus were analyzed. 3,738 of these EEGs do not contain the proper channel assignments specified in.
Component 2: The Culture of Health Care Unit 3: Health Care Settings- Where Care is Delivered Unit 3 Objectives and Overview 3.1 a: Outpatient Care.
© 2013 The McGraw-Hill Companies, Inc. All rights reserved. Chapter 10 Productivity Center and Utilities.
The Neural Engineering Data Consortium: ‘Déjà Vu All Over Again’ 1 Iyad Obeid, Associate Professor Joseph Picone, Professor Department of Electrical and.
How To Design a Clinical Trial
Big Mechanism for Processing EEG Clinical Information on Big Data Aim 1: Automatically Recognize and Time-Align Events in EEG Signals Aim 2: Automatically.
Bringing Big Data to Neural Interfaces Iyad Obeid PhD, Joseph Picone PhD Temple University, Philadelphia, Pennsylvania Existing Research & Funding Model.
Automatic Discovery and Processing of EEG Cohorts from Clinical Records Mission: Enable comparative research by automatically uncovering clinical knowledge.
Reduce Waiting & No-Shows  Increase Admissions & Continuation Reduce Waiting & No-Shows  Increase Admissions & Continuation Lessons Learned.
Abstract Automatic detection of sleep state is important to enhance the quick diagnostic of sleep conditions. The analysis of EEGs is a difficult time-consuming.
DOE Data Management Plan Requirements
Demonstration A Python-based user interface: Waveform and spectrogram views are supported. User-configurable montages and filtering. Scrolling by time.
Feature Extraction Find best Alignment between primitives and data Found Alignment? TUH EEG Corpus Supervised Learning Process Reestimate Parameters Recall.
24 Nov 2007Data Management and Exploratory Data Analysis 1 Yongyuth Chaiyapong Ph.D. (Mathematical Statistics) Department of Statistics Faculty of Science.
Data Analysis Generation of the corpus statistics was accomplished through the analysis of information contained in the EDF headers. Figure 4 shows some.
SOFTWARE ARCHIVE WORKING GROUP (SAWG) REPORT TODD KING PDS MANAGEMENT COUNCIL MEETING FEB. 4-5, 2016.
Project Management Strategies Hidden in the CMMI Rick Hefner, Northrop Grumman CMMI Technology Conference & User Group November.
Views The architecture was specifically changed to accommodate multiple views. The used of the QStackedWidget makes it easy to switch between the different.
Abstract Automatic detection of sleep state is an important queue in accurate detection of sleep conditions. The analysis of EEGs is a difficult time-consuming.
CHRISTINA SHEPPLER, PHD Devers Eye Institute Telehealth Alliance of Oregon 2013 Implementing a Telemedicine Program for Diabetic Retinopathy Screening.
Session 6: Data Flow, Data Management, and Data Quality.
Automation Living in a Paper Oriented World and The Steps to Automation.
The Records Management Vision The Records Management Vision: Our Journey Towards Solutions for Everyday Life Ronald G. Smith, CRM Records and Information.
The Neural Engineering Data Consortium Mission: To focus the research community on a progression of research questions and to generate massive data sets.
Constructing Multiple Views The architecture of the window was altered to accommodate switching between waveform, spectrogram, energy, and all combinations.
Scalable EEG interpretation using Deep Learning and Schema Descriptors
Brain operated wheelchair
CLASSIFICATION OF SLEEP EVENTS IN EEG’S USING HIDDEN MARKOV MODELS
G. Suarez, J. Soares, S. Lopez, I. Obeid and J. Picone
Data challenges in the pharmaceutical industry
Objective Pain Measurement “ If you can’t measure it,
Presentation transcript:

Big Data in Biomedical Engineering Iyad Obeid, PhD November 7, 2014 Temple University Neural Engineering Data Consortium

Brain Machine Interfaces Systems that bridge biological tissue with electronic circuits Characterized by the flow of information Direction of information flow – In towards the brain - cochlear implant – Out from the brain - neuroprosthesis – Bi-directional - closed loop neuroprosthesis

Signal Sources Acquire neural signals – Electroencephalogram (EEG) – Electrocorticogram (ECoG) – Cortical electrodes (single / multi unit and LFPs) – Peripheral nerves – Electromyogram (EMG) Derive / Estimate neural intent signal – Kinematic variable – Static variable

Signal Sources Invasiveness Ability to understand the neural code Long-term stability of the recordings

Key Challenges Biocompatibility of electrodes with neural tissue Acquiring and processing multichannel signals Understanding the neural code Packaging – Heat dissipation – Power supply – Form factor

10-20 EEG System

EEG Commercial EEGEmotiv Wireless Headset

Brain Computer Interfaces P300 SpellerCursor Control Wolpaw J R, and McFarland D J PNAS 2004;101:

P300 Signals

EEG

A Decade of Progress Over $200M spent by NIH & NSF alone on “neural engineering” and “brain computer interfaces” Significant additional investment from DARPA, military, state, international agencies Over 1,700 peer review journal papers Conferences, book chapters, etc Excellent academic output

Translation to Marketplace Relatively little commercial technology development Relatively little translation from well-controlled lab environments into general use Signal processing isn’t sufficiently robust – Insufficient training data Lack of common benchmarks – Need for community standards Is there a better way of investing resources?

Existing Funding Model Funding Agencies PI Research Question Money Data Methods Results PI Research Question Money Data Methods Results PI Research Question Money Data Methods Results

Limitations of Existing Model Each PI creates own data using own protocol to answer own question Practically impossible to generate sufficient data to adequately capture signal variability under a variety of experimental conditions Lack of common benchmarks Hard/impossible to cross-validate results or compare results between groups All of these factors impede progress!

Neural Signal Databases NIH and NSF Data Sharing Policy Physionet CRCNS These are good, but they do not address the lack of massive datasets collected on common protocols

Big Data Resources More than just “warehousing” archival data Can the data drive the research? Can basic scientific discovery be accelerated by intelligently designing big data resources? Can scientific discovery be made more cost effective?

Alternative Model Funded PI Funded PI Funded PI Funded PI Funded PI Funded PI Unfunded PI Unfunded PI Unfunded PI Unfunded PI Unfunded PI Unfunded PI Neural Engineering Data Consortium Neural Engineering Data Consortium Data Design Data Generation Results Scoring Results Research Questions Funding Agencies Fees Data Industry PI

Neural Engineering Data Consortium Serves the community (not vice versa) Focuses community on common problems Drives algorithm development Improves amount and quality of data Validates performance claims Tiered membership fees Promotes “technology that works” Does not preclude PI’s individual work!

Neural Engineering Data Consortium Board of Directors Academia Industry Funding Groups Operations External Grants Business Mgr Director Data Design Scientists & Engineers Statisticians Study Designers Data Annotation & Archives Software Developers Scientists & Engineers Archivists Data Delivery and Support IT Personnel Technical Support Data Collection Technicians Graduate Students Medical Staff

TUH-EEG Database Goal: Release 20,000+ clinical EEG recordings from Temple University Hospital ( ) Includes physician EEG reports and patient medical histories Released as free public resource

TUH-EEG Database Task 1: Software Infrastructure and Development – convert data from proprietary formats to an open standard (EDF) Task 2: Data Capture – copy files from CDs and DVDs Task 3: Release Generation – Deidentify data – Resolve physician reports and EEGs – Clean up data

Task 1: Software and Infrastructure Development Major Tasks: Inventory the data (EEGs and physician reports) Develop a process to convert data to an open format Develop a process to deidentify the data Gain necessary system accesses to the source forms of the reports Status and Issues: Efforts to automate.e to.edf conversion failed due to incompatibilities between Nicolet’s NicVue program and ‘hotkeys’ technology. Accessing physician reports required access to 5 different hospital databases and cutting through lots of red tape (e.g., it took months to get access to the primary reporting system). No automated methods for pulling reports from the back-end database. EDF files were not “to spec” according to open source “EDFlib” so additional EDF conversion software had to be written. Patient information appears in EDF annotations.

Task 2: Data Capture Major Tasks: Copy data from media to disk Convert EEG files to EDF Capture Physician Reports Label Generation Status and Issues: 22,000+ EEG sessions have been captured from CDs/DVDs. Approximately 15% of the media were defective and needed multiple reads or some form of repair. Raw data occupies about 2 TBytes of space including video files. Conversions to EDF averaged 1 file per minute with most of the time spent writing data to disk. The process generates three files: an EEG file in EDF format, an impedance report, and a test report that contains preliminary findings. Multiple EDF files per session due to the way physicians annotate EEGs.

Number of Sessions: 22,000+ Number of Patients: ~15,000 (one patient has 42 EEG sessions) Age: 16 years to 90+ Sampling: 16-bit data sampled at 250 Hz, 256 Hz or 512 Hz Number of Channels: variable Task 2: TUH-EEG at a Glance Number of Channels: ranges from [28, 129] (one annotation channel per EDF file) Over 90% of the alternate channel assignments can be mapped to the configuration.

Task 2: Physician Reports Two Types of Reports:  Preliminary Report: contains a summary diagnosis (usually in a spreadsheet format).  EEG Report: the final “signed off” report that triggers billing. Inconsistent Report Formats: The format of reporting has changed several times over the past 12 years. Report Databases:  MedQuist (MS Word.rtf)  Alpha (OCR’ed.pdf)  EPIC (text)  Physician’s (MS Word.doc)  Hardcopies (OCR’ed pdf)

Task 2: Challenges and Technical Risks Missing Physician Reports:  It is unclear how many EEG reports in the standard format will be recovered from the hospital databases.  Coverage for 2013 was good – less than 5% of the EEG Reports were missing (and we are still trying to locate these working with hospital staff).  Coverage pre-2009 could be problematic.  Our backup strategy is to use data available from preliminary reports, which contain basic classifications of normal/abnormal and when abnormal, a preliminary diagnosis. OCR of Physician Reports:  The scanned images are noisy, resulting in OCR errors.  Takes 2 to 3 minutes per image to manually correct.

Task 3: Release Generation Major Tasks: Deidentify and randomly sequence files so patient information can’t be traced. Quality control to verify the integrity of the data. Release data incrementally to the community for feedback. Status and Issues: Patient’s name can appear in the annotations and must be redacted; format is unpredictable. Initially, we will only release standard 20-minute EEGs. Long-term monitoring or ambulatory EEGs will be released separately once we understand the data. Regularization of the physician reports.

Accomplishments and Results 22,000+ EEG signals online and growing (about 3,000 per year). Approximately 2,000 EEGs from 2012 and 2013 have been resolved and prepared for deidentification/release. Anticipated pilot release in January Need community feedback on the value of the data and the preferred formats for the reports. Expect additional incremental releases through 2Q’2014. Acquired 1,400 more EEGs from the last half of 2013 (newer data is can be processed much faster).

Observations General: This project would not have been possible without leveraging three funding sources. Community interest is high, but willingness to fund is low. Project Specific: Recovering the EEG signal data was challenging due to software incompatibilities and media problems. Recovering the EEG reports is proving to be challenging due to the primitive state of the hospital record system. Making the data truly useful to machine learning researchers will require additional data clean up, particularly with linking reports to specific EEG activity.

Automatic Interpretation of EEGs Use TUH-EEG big data resource to train machine learning systems for EEG interpretation Assist neurologists (esp. w/ long rec’s) Potentially missed diagnoses Smaller regional hospitals lacking adequate neurology staff

Funding

How to Help Take the survey! Use the data!