CS 698 | Current Topics in Data Science

CS 698 | Current Topics in Data Science | Dr. Usman Roshan
Deep Convolutional Neural Networks for Lung Cancer Detection
Paper Presentation | Spring 2018 | Fadi G. Farhat | February 15th, 2018 | New Jersey Institute of Technology

Authors: Albert Chon, Peter Lu, Niranjan Balachandar | Department of Computer Science, Stanford University

Introduction: Lung cancer is one of the most common and deadliest cancers, with 225,000 cases, 150,000 deaths, and $12 billion in healthcare costs yearly in the United States. Only 17% of people in the U.S. diagnosed with lung cancer survive five years after the diagnosis. Current diagnostic methods include biopsies and imaging, such as CT scans. Early detection of lung cancer significantly improves the chances of survival, but it is difficult because early-stage disease produces fewer symptoms.

Objective: Frame lung cancer detection as a binary classification problem: detect the presence of lung cancer in patient CT scans of lungs with and without early-stage lung cancer. Build an accurate classifier using 2D and 3D convolutional neural networks. Such a classifier could speed up lung cancer screening, reduce its cost, allow early detection, and improve survival. The computer-aided diagnosis (CAD) system takes patient chest CT scans as input and outputs whether the patient has (early-stage) lung cancer or is likely to develop it.

Challenges: The CAD system must detect the presence of a tiny nodule (less than 10 mm in diameter at an early stage) within a large 3D lung CT scan (around 200 mm x 400 mm x 400 mm). The CT scan is also filled with noise from surrounding tissues, bone, air, water, and blood.
(Figure: example of an early-stage lung cancer nodule, ~5 mm.)
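To put "tiny" in numbers, the back-of-the-envelope arithmetic below compares a nodule's volume with the scan volume quoted on the slide. It is a sketch that assumes a spherical nodule and a box-shaped scan; `sphere_volume_mm3` is a hypothetical helper.

```python
import math

def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a sphere with the given diameter, in cubic millimeters."""
    radius = diameter_mm / 2.0
    return 4.0 / 3.0 * math.pi * radius ** 3

# Scan extent quoted on the slide (approximate): 200 mm x 400 mm x 400 mm.
SCAN_VOLUME_MM3 = 200 * 400 * 400  # 32,000,000 mm^3

for diameter in (5.0, 10.0):
    fraction = sphere_volume_mm3(diameter) / SCAN_VOLUME_MM3
    print(f"{diameter:.0f} mm nodule: {fraction:.1e} of the scan volume")
```

Under these assumptions a 10 mm nodule fills roughly 1.6e-5 of the scan volume (about 0.0016%), and a 5 mm nodule roughly 2e-6.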

Data: The primary dataset is the patient lung CT scan dataset from Kaggle's Data Science Bowl 2017, with labeled data for 2101 patients divided into a training set of 1261, a validation set of 420, and a test set of 420. The data consists of CT scan data (100 to 400 2D slice images per patient) and a label (0 for no cancer, 1 for cancer). The Kaggle dataset does not have labeled nodules!
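The slide gives the split sizes but not how patients were assigned to each subset. A minimal sketch, assuming a random split at the patient level (so all slices of one scan land in the same subset); `split_patients` and the patient IDs are hypothetical.

```python
import random

def split_patients(patient_ids, n_val, n_test, seed=0):
    """Shuffle patient IDs and cut them into disjoint train/validation/test lists.

    Splitting by patient (not by slice) keeps every slice of one scan in a single subset.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    test_ids = ids[:n_test]
    val_ids = ids[n_test:n_test + n_val]
    train_ids = ids[n_test + n_val:]
    return train_ids, val_ids, test_ids

# Hypothetical IDs standing in for the 2101 labeled Kaggle patients.
patients = [f"patient_{i:04d}" for i in range(2101)]
train_ids, val_ids, test_ids = split_patients(patients, n_val=420, n_test=420)
print(len(train_ids), len(val_ids), len(test_ids))  # 1261 420 420
```

The same function would produce a 710/178 split of 888 patients with `n_val=178, n_test=0`.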

Data (cont.): The secondary dataset is patient lung CT scan data with labeled nodules from the LUng Nodule Analysis 2016 (LUNA16) Challenge. LUNA16 has labeled data for 888 patients, divided into a training set of 710 and a validation set of 178. The data consists of CT scan data and a nodule label (a list of nodule center coordinates and diameters).
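The nodule labels are center coordinates plus a diameter. Assuming these are stored in millimeters in the scanner's (world) coordinate system, and that each scan provides an origin and a per-axis voxel spacing, locating a nodule in the volume array is a subtract-and-divide. The helper names, the (z, y, x) axis order, and the example geometry below are assumptions for illustration.

```python
def world_to_voxel(center_mm, origin_mm, spacing_mm):
    """Map a nodule center from scanner (world) millimeters to voxel indices.

    All arguments are (z, y, x) triples. Index = (position - origin) / spacing,
    rounded to the nearest voxel.
    """
    return tuple(round((c - o) / s) for c, o, s in zip(center_mm, origin_mm, spacing_mm))

def diameter_in_voxels(diameter_mm, spacing_mm):
    """Nodule diameter measured in voxels along each (z, y, x) axis."""
    return tuple(diameter_mm / s for s in spacing_mm)

# Hypothetical scan geometry and nodule label (made-up values).
origin = (-300.0, -200.0, -200.0)
spacing = (2.5, 0.7, 0.7)
print(world_to_voxel((-150.0, -60.0, 10.0), origin, spacing))  # (60, 200, 300)
print(diameter_in_voxels(7.0, spacing))
```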

Approach: Preprocess the 3D CT scans using segmentation, normalization, down-sampling, and zero-centering. Train a U-Net (a 2D convolutional network for biomedical image segmentation) for nodule candidate detection. Feed the regions around the nodule candidates detected by the U-Net into 3D CNNs that classify each CT scan as positive or negative for lung cancer.

Preprocessing & Segmentation: Convert the pixel values in each image to Hounsfield units (HU), a measure of radiodensity, then stack the 2D slices into a single 3D image.
(Figure panels: bone; tissue.)
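DICOM CT slices store raw values that map to HU through a linear rescale using the RescaleSlope and RescaleIntercept attributes (available, for example, as `ds.RescaleSlope` and `ds.RescaleIntercept` on a pydicom dataset). The sketch below assumes those two numbers and each slice's position along the body axis have already been read; `to_hounsfield`, `stack_slices`, and the toy values are hypothetical.

```python
import numpy as np

def to_hounsfield(stored_values, slope, intercept):
    """DICOM linear rescale: HU = stored value * RescaleSlope + RescaleIntercept."""
    return stored_values.astype(np.float32) * slope + intercept

def stack_slices(slices):
    """Stack 2D HU slices into one 3D volume of shape (n_slices, rows, cols).

    `slices` is a list of (z_position_mm, hu_array) pairs; sorting by position
    keeps anatomical order even if the files were read out of order.
    """
    ordered = sorted(slices, key=lambda item: item[0])
    return np.stack([hu for _, hu in ordered], axis=0)

# Toy example: slope 1 and intercept -1024 are assumed values.
raw = np.array([[0, 1024], [2048, 24]], dtype=np.int16)
hu = to_hounsfield(raw, slope=1.0, intercept=-1024.0)
volume = stack_slices([(2.5, hu + 10), (0.0, hu)])
print(volume.shape)  # (2, 2, 2)
```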

Preprocessing & Segmentation: Use segmentation to mask out the bone, outside air, and other substances that would make the data noisy, retaining only lung tissue information. Watershed and thresholding segmentation were both tested; thresholding was used.
(Figure panels: original; thresholding; watershed.)
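The slides do not state the threshold or how outside air was removed. A minimal sketch of thresholding for one axial slice, assuming a cutoff of -400 HU (a hypothetical value) and treating any below-threshold region that touches the image border as outside air; `lung_mask_2d` is a hypothetical name.

```python
from collections import deque
import numpy as np

def lung_mask_2d(hu_slice, threshold=-400.0):
    """Rough lung mask for one axial HU slice (the threshold is an assumption).

    1. Pixels below `threshold` (air-like: lungs and outside air) are candidates.
    2. Candidates 4-connected to the image border are outside air and are dropped.
    """
    candidate = hu_slice < threshold
    rows, cols = candidate.shape
    outside = np.zeros_like(candidate, dtype=bool)
    # Seed a breadth-first flood fill with every candidate pixel on the border.
    border = [(r, c) for r in range(rows) for c in (0, cols - 1)]
    border += [(r, c) for c in range(cols) for r in (0, rows - 1)]
    queue = deque()
    for r, c in border:
        if candidate[r, c] and not outside[r, c]:
            outside[r, c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and candidate[nr, nc] and not outside[nr, nc]:
                outside[nr, nc] = True
                queue.append((nr, nc))
    return candidate & ~outside

# Toy slice: outside air, a body of soft tissue, and an enclosed air-filled "lung".
slice_hu = np.full((9, 9), -1000.0)
slice_hu[1:8, 1:8] = 40.0    # soft tissue
slice_hu[3:6, 3:6] = -800.0  # lung
mask = lung_mask_2d(slice_hu)
print(int(mask.sum()))  # 9
```

Applied slice by slice, this keeps the lungs plus any other enclosed air (such as the airways). A library connected-component routine such as `scipy.ndimage.label` would be much faster than a pure-Python flood fill on 512 x 512 slices.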

Preprocessing & Segmentation: Normalize the 3D image by applying linear scaling. Down-sample each 3D image by a factor of 0.5 in each of the three dimensions. Zero-center the data by subtracting the mean of all images in the training set.
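The three steps could look like the following sketch. The HU window used for linear scaling (-1000 to 400) is an assumption, as is down-sampling by averaging 2 x 2 x 2 blocks (the slide only says a factor of 0.5 per dimension); the function names and random toy volumes are hypothetical.

```python
import numpy as np

MIN_HU, MAX_HU = -1000.0, 400.0  # assumed scaling window; not given on the slides

def normalize(volume):
    """Linearly map HU values in [MIN_HU, MAX_HU] to [0, 1], clipping values outside."""
    return np.clip((volume - MIN_HU) / (MAX_HU - MIN_HU), 0.0, 1.0)

def downsample_half(volume):
    """Halve each dimension by averaging non-overlapping 2 x 2 x 2 blocks."""
    d, h, w = (size - size % 2 for size in volume.shape)  # drop a trailing odd slice/row/column
    v = volume[:d, :h, :w]
    return v.reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))

def zero_center(volumes, train_mean):
    """Subtract the training-set mean pixel value from every volume."""
    return [v - train_mean for v in volumes]

rng = np.random.default_rng(0)
train = [downsample_half(normalize(rng.uniform(-1200.0, 600.0, size=(9, 8, 8)))) for _ in range(3)]
train_mean = float(np.mean(np.stack(train)))  # computed on training volumes only
centered = zero_center(train, train_mean)
print(train[0].shape)  # (4, 4, 4)
```

Computing `train_mean` on the training set alone and reusing it for validation and test volumes keeps test statistics out of preprocessing.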

U-Net for Nodule Detection: Find small boxes containing the top cancerous nodule candidates. Train a modified version of the U-Net on the LUNA16 data. The model is trained to output 256x256 images in which each pixel has a value between 0 and 1 indicating the probability that the pixel belongs to a nodule. The trained U-Net is then applied to the segmented Kaggle CT scan slices to generate nodule candidates.
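The slides do not say how the LUNA16 nodule labels were turned into per-pixel U-Net targets. One possible choice, assumed here, is a filled disk with the nodule's diameter drawn at its center on each 256x256 slice, against which the U-Net's [0, 1] outputs are trained. `nodule_target_mask` and the example center and diameter are hypothetical.

```python
import numpy as np

def nodule_target_mask(center_rc, diameter_px, size=256):
    """2D training target: 1.0 inside the nodule's disk, 0.0 elsewhere.

    `center_rc` is the (row, col) of the nodule center on this slice and
    `diameter_px` its diameter in pixels.
    """
    rows, cols = np.ogrid[:size, :size]
    r0, c0 = center_rc
    radius = diameter_px / 2.0
    inside = (rows - r0) ** 2 + (cols - c0) ** 2 <= radius ** 2
    return inside.astype(np.float32)

target = nodule_target_mask(center_rc=(128, 100), diameter_px=10)
print(target.shape, int(target.sum()))  # (256, 256) 81
```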

U-Net for Nodule Detection: The U-Net produces a strong signal for the actual nodule, but it also produces many false positives.
(Figure panels: U-Net labeled input; U-Net predicted output; true nodule location.)

U-Net for Nodule Detection: Solution: Locate the top 8 (most active) nodule candidates as 32x32x32 volumes and save them. The top sectors are not permitted to overlap, which prevents them from simply clustering in the brightest region of the image. Combine these sectors into a single 64x64x64 volume (eight 32x32x32 cubes exactly fill it in a 2x2x2 arrangement) and use it as input to the classifiers, which assign a label (cancer or not cancer).
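One way to pick the top 8 non-overlapping sectors is a greedy scan of voxels in order of decreasing U-Net probability, accepting a voxel as a cube center only if its cube does not overlap any cube already accepted. The greedy strategy, zero padding at the edges, the 2 x 2 x 2 tiling order, and the function names are assumptions; the slide only fixes the cube size, the count, and the 64x64x64 output.

```python
import numpy as np

def top_nonoverlapping_cubes(prob, volume, k=8, cube=32):
    """Greedily cut the k most active, mutually non-overlapping cubes from `volume`.

    `prob` is the per-voxel nodule probability from the U-Net, same shape as `volume`.
    """
    half = cube // 2
    padded = np.pad(volume, half)  # zero-pad so cubes near the edges keep full size
    centers = []
    for flat_index in np.argsort(prob, axis=None)[::-1]:  # most active voxels first
        c = np.unravel_index(flat_index, prob.shape)
        # Axis-aligned cubes of equal edge are disjoint iff their centers
        # differ by at least `cube` along at least one axis.
        if all(max(abs(int(a) - int(b)) for a, b in zip(c, p)) >= cube for p in centers):
            centers.append(c)
            if len(centers) == k:
                break
    # Voxel c sits at c + half in `padded`, so its cube spans [c, c + cube) there.
    return [padded[z:z + cube, y:y + cube, x:x + cube] for z, y, x in centers]

def tile_cubes(cubes, cube=32):
    """Place up to 8 cubes into one (2*cube)^3 volume as a 2 x 2 x 2 arrangement."""
    out = np.zeros((2 * cube,) * 3, dtype=np.float32)
    for i, block in enumerate(cubes[:8]):
        z, y, x = (i >> 2) & 1, (i >> 1) & 1, i & 1
        out[z * cube:(z + 1) * cube, y * cube:(y + 1) * cube, x * cube:(x + 1) * cube] = block
    return out

# Toy sizes (cube=4 on a 16^3 scan) so the example runs instantly.
rng = np.random.default_rng(0)
ct = rng.normal(size=(16, 16, 16))
prob = rng.random((16, 16, 16))
cubes = top_nonoverlapping_cubes(prob, ct, k=8, cube=4)
combined = tile_cubes(cubes, cube=4)
print(len(cubes), combined.shape)  # 8 (8, 8, 8)
```

With the slide's sizes (`cube=32`), the output of `tile_cubes` is the 64x64x64 classifier input.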

Malignancy Classifiers: A linear classifier was used as a baseline, followed by a vanilla 3D CNN and a GoogleNet-based 3D CNN. Each classifier used a weighted loss (the weight for a label is the inverse of that label's frequency in the training set). During training, the CNNs use ReLU (Rectified Linear Unit) activation and dropout after each convolutional layer.
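The inverse-frequency weighting can be sketched with a weighted binary cross-entropy. Cross-entropy as the underlying loss is an assumption (the slide says only "weighted loss"), and the function names and toy labels are hypothetical.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Weight for each label = 1 / (fraction of training examples with that label)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    freq = counts / labels.size
    return {int(c): float(1.0 / f) for c, f in zip(classes, freq)}

def weighted_binary_cross_entropy(y_true, p_pred, weights, eps=1e-7):
    """Mean binary cross-entropy with each example scaled by its label's weight."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    w = np.where(y == 1, weights[1], weights[0])
    losses = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(np.mean(w * losses))

labels = [0, 0, 0, 1]  # toy training labels: 75% ZERO, 25% ONE
weights = inverse_frequency_weights(labels)
print(weights)  # {0: 1.3333333333333333, 1: 4.0}
print(weighted_binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5], weights))
```

The rare ONE label gets three times the weight of ZERO here, so a missed cancer case costs more than a missed healthy one.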

Malignancy Classifiers (cont.): Vanilla 3D CNN (left) and GoogleNet 3D CNN (right) architectures

Results: Kaggle test set accuracy, sensitivity, specificity, and AUC of the ROC curve.
Sensitivity: true positive rate (TPR).
Specificity: true negative rate.
AUC: area under the ROC curve.
ROC (Receiver Operating Characteristic) curve: TPR plotted against the false positive rate (FPR) at different cutoff points.
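These metrics can be computed as in the sketch below. The AUC uses the equivalent pairwise definition (the probability that a randomly chosen positive scan scores higher than a randomly chosen negative one, counting ties as one half) rather than integrating a plotted curve; the function names and toy data are illustrative.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(tp / (tp + fn)), float(tn / (tn + fp))

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [int(s >= 0.5) for s in scores]  # one cutoff point on the ROC curve
print(sensitivity_specificity(y, preds), roc_auc(y, scores))  # (0.5, 1.0) 0.75
```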

Results (cont.): Observation: the activations show that the presence (and location) of a cancerous nodule is detected in some outputs.

Conclusions: The deep 3D CNN models, in particular the GoogleNet-based model, performed best on the test set. The state-of-the-art AUC of 0.83 was not achieved, but the models performed well considering that they used less labeled data than most state-of-the-art CAD systems. The current model could be extended to determine the exact location of cancerous nodules, not only whether the patient has cancer (slide 20).

Future Work: Use the watershed method instead of thresholding for the initial lung segmentation. Make the networks deeper. Perform more extensive hyperparameter tuning. Generalize: extend the models to 3D images of other cancers.

CS 698 | Current Topics in Data Science | Dr. Usman Roshan
CT Scan Classification for Lung Cancer Detection
Course Project | Spring 2018 | Update 1 | Fadi G. Farhat | April 19th, 2018 | New Jersey Institute of Technology

Project Data: The primary dataset is the patient lung CT scan dataset from Kaggle's Data Science Bowl 2017, with labeled data for 1595 patients divided into a training set of 1256, a validation set of 141, and a test set of 198. The data consists of CT scan images (100 to 500+ 2D slice images per patient) and a label (0 for no cancer, 1 for cancer).

Initial Approach & Baseline: Main challenge: the number of slices per patient varies, which makes it difficult to create a uniform dataset to feed to a classifier. Initial approach: average all CT scan images per patient, condensing them into one image; clearly, some information is lost by doing that. Then create one flattened NumPy array from each average image, resulting in a dataset with 1595 rows and 262,144 (512 x 512) columns of HU pixel values (not RGB).
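The averaging-and-flattening step can be sketched as below, assuming each patient's scan is already a (n_slices, 512, 512) NumPy array of HU values; `average_and_flatten` and the random toy volumes are hypothetical.

```python
import numpy as np

def average_and_flatten(patient_volumes):
    """Collapse each patient's (n_slices, 512, 512) HU stack into one flattened row.

    Averaging over the slice axis removes the varying n_slices dimension, so every
    row has 512 * 512 = 262,144 columns regardless of how many slices the scan had.
    """
    rows = [vol.mean(axis=0).ravel() for vol in patient_volumes]
    return np.stack(rows)

rng = np.random.default_rng(0)
volumes = [rng.normal(size=(n, 512, 512)).astype(np.float32) for n in (3, 5)]
X = average_and_flatten(volumes)
print(X.shape)  # (2, 262144)
```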

Initial Approach & Baseline: Visualizing the average image taken from all CT scan slices. The image is still in DICOM format, which yields a smaller array than JPG.

Initial Approach & Baseline: To establish a baseline, the data was classified using SVM and Random Forest, with feature selection. Linear SVC results (accuracy):
1,000 features (training time: 90 minutes): CV: ZERO label 75.00%, ONE label 37.84%; Test: ZERO label 69.50%, ONE label 21.05%
10,000 features (training time: 180 minutes): CV: ZERO label 66.35%, ONE label 35.14%; Test: ZERO label 71.63%, ONE label 38.60%
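The results report a separate accuracy for each label. Assuming "ZERO label" and "ONE label" accuracy mean the fraction of scans with that true label that were classified correctly (specificity and sensitivity, respectively, with cancer as the positive class), they can be computed as in this sketch; the function name and toy labels are hypothetical.

```python
import numpy as np

def per_label_accuracy(y_true, y_pred):
    """For each true label, the fraction of its examples predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {int(label): float(np.mean(y_pred[y_true == label] == label))
            for label in np.unique(y_true)}

# Toy labels: 4 ZERO scans (3 correct) and 2 ONE scans (1 correct).
print(per_label_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))  # {0: 0.75, 1: 0.5}
```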

Initial Approach & Baseline: Random Forest results (accuracy):
1,000 features (training time: 90 minutes): overall 70.70%; ZERO label 96.15%, ONE label 5.40%
10,000 features (training time: 240 minutes): overall 72.34%; ZERO label 97.16%, ONE label 5.26%
The Random Forest classifier does not seem to discriminate between cancer and non-cancer averaged CT scans.

Convolutional Neural Network: Flower Classification 2D CNN results (accuracy):
Architecture: Conv2D → ReLU → AveragePooling2D → Flatten → Dense(2) → Softmax
Overall accuracy: 73.76% (cross-validation); ZERO label: ~100%; ONE label: ~0%
Training time (50 epochs): 1 day 14 hours
The flower classification CNN does not seem to discriminate between cancer and non-cancer averaged CT scans.

Convolutional Neural Network: Image Classification 2D CNN results (accuracy):
Architecture: Conv2D → ReLU → Conv2D → ReLU → MaxPooling2D → Dropout(0.25) → Flatten → Dense(128) → ReLU → Dropout(0.5) → Dense(2) → Softmax
Overall accuracy: 73.76% (cross-validation); ZERO label: ~100%; ONE label: ~0%
Training time (50 epochs): 2 days 4 hours
The image classification CNN does not seem to discriminate between cancer and non-cancer averaged CT scans.
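Both CNNs score ~100% on ZERO and ~0% on ONE, which is the pattern a model produces if it predicts ZERO for every scan. In that case the overall accuracy equals the share of ZERO labels in the evaluation data, as this sketch shows on toy labels (not the project's data); the function name is hypothetical.

```python
import numpy as np

def constant_prediction_accuracy(y_true, constant=0):
    """Overall and per-label accuracy of a model that always predicts `constant`."""
    y_true = np.asarray(y_true)
    y_pred = np.full_like(y_true, constant)
    overall = float(np.mean(y_pred == y_true))
    per_label = {int(l): float(np.mean(y_pred[y_true == l] == l)) for l in np.unique(y_true)}
    return overall, per_label

# Toy labels: 3 of 4 scans are ZERO.
overall, per_label = constant_prediction_accuracy([0, 0, 0, 1])
print(overall, per_label)  # 0.75 {0: 1.0, 1: 0.0}
```

If the networks did collapse to always predicting ZERO, the identical 73.76% for both architectures would reflect the class balance of the cross-validation data rather than anything learned from the images.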

Next steps: Perform lung segmentation before classification and build a new dataset (took ~10 hours to create the set).
(Figure panels: single segmented slice; average segmented slice.)

Next steps: Create a set of 9 or 10 chunks per CT scan instead of one.
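The slides do not specify how the chunks are formed. This sketch assumes each chunk is a run of consecutive slices averaged into one image, which gives every scan the same (n_chunks, H, W) shape regardless of its slice count; `chunk_average` is a hypothetical name.

```python
import numpy as np

def chunk_average(volume, n_chunks=10):
    """Average consecutive slices in n_chunks groups: (n_slices, H, W) -> (n_chunks, H, W).

    np.array_split tolerates uneven division, so scans with any number of
    slices (>= n_chunks) yield the same output shape.
    """
    if volume.shape[0] < n_chunks:
        raise ValueError("scan has fewer slices than requested chunks")
    chunks = np.array_split(volume, n_chunks, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

# Toy scan with 23 slices; slice i is filled with the value i.
scan = np.arange(23, dtype=float)[:, None, None] * np.ones((23, 4, 4))
chunked = chunk_average(scan)
print(chunked.shape)  # (10, 4, 4)
```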

Next steps: Create a flattened dataset based on the segmented average slice and run it through the linear classifiers again. Create a 3D image dataset based on the segmented chunks and run it through a 3D CNN (took ~80 hours to create the set):
Convolution3D → ReLU → MaxPooling3D → Convolution3D → ReLU → MaxPooling3D → Convolution3D → ReLU → MaxPooling3D → Dropout(0.25) → Flatten → Dense(512) → ReLU → Dropout(0.5) → Dense(128) → ReLU → Dropout(0.5) → Dense(2) → Softmax

Thursday Morning Update: The dataset based on 10 averaged (non-segmented) chunks per scan is too large for even 1 TB of RAM! The models run on an Ubuntu VM with 32 logical processors and 1 TB of RAM (a VMware limitation).

Thursday Morning Update: The attempt to run the 3D CNN on the 3D dataset failed: 1 TB of RAM was consumed before the first epoch completed!