Automatic Lung Nodule Detection Using Deep Learning and Hand Crafted Features

wrichey@tulane.edu (Tulane University)
najikh@cs.ucf.edu, bagci@ucf.edu (University of Central Florida)

Problem Overview

[Figure legend: missed lung cancer; cancer detected by CAD; false positive from CAD]

Motivation
- Lung cancer is the leading cause of cancer-related death worldwide.
- 30-35% of lung nodules are missed by radiologists during lung cancer screening.
- Early detection of cancer reduces mortality and morbidity.

Objective
Improve lung cancer screening by developing a novel, accurate, and efficient CAD system.

Dataset
- LIDC/IDRI dataset, publicly available
- LUng Nodule Analysis (LUNA) annotations
- Positive findings: locations of nodules >= 3 mm accepted by at least 3 out of 4 radiologists
- Negative findings: candidates from 3 existing detection algorithms that were rejected by radiologists

Hand Crafted Features

Bag of Frequencies
Characterizes nodules based on morphology and obtains feature vectors from frequency-based spectral signatures.
Input: whole image, nodule radius, nodule center coordinates
Output: feature vector of the frequency profile computed from intensities
Steps:
1. Take 2D surfaces from the 3D voxel volume.
2. Sample intensities at each given radius.
3. Get spectral signatures from the intensity profiles.

Parameters:
Number of slices
  Description: number of 2D surfaces extracted from the nodule
  Example: 10 [4 slices in the XY plane, 3 slices in the YZ plane, 3 slices in the ZX plane]
Size of slices
  Description: important if sampling outside of the radius
  Example: 1.2; slices will be 120% the size of the nodule (in the poster's example image, the slice is 40x40 and the radius is 17)
Sample radii
  Description: determines the number and location of intensity profiles in relation to the radius
  Example: [.5, .9, 1.05]; three intensity profiles will be taken at 50%, 90% and 105% of the radius (each radius is shown in a separate color (RGB) in the example image)
Number of intensity samples
  Description: number of points sampled; determines the angles between points of the intensity profile
  Example: 8; points will be taken every 45 degrees, counter-clockwise (in the example image, the first sample is black and the last is bright)

[Figure: Bag of Frequencies sampling being used to estimate nodule radius (in yellow)]

Taxonomic Diversity & Taxonomic Distinctness
Characterizes nodules based on texture and patterns, analyzing intensity distribution and relationships using phylogenetic trees (from ecology).
Input: whole image, nodule radius, nodule center coordinates
Output: feature vector of taxonomic indices from ROIs
Steps:
1. Determine regions of interest: rings, spheres, whole nodule.
2. Cast intensities into a tree for each ROI. Each HU (intensity) value is a leaf, and each leaf has a property: the number of voxels with that intensity (abundance).
3. Calculate taxonomic indices using the distance between intensities in the tree of intensities and the abundance of intensity values.

[Figure: an example tree. Left nodes = height (index), for simpler calculations; right nodes = intensity value (in Hounsfield units). Each node has a property, the number of voxels with that intensity. Intensity (HU): abundance pairs visible in the example: 63: 1 voxel; 43: 2 voxels; 42: 2 voxels; 29: 2 voxels; -513: 1 voxel.]

Results
For 80% training and 20% testing, percent accuracy is reported for two settings, estimated radius and annotated radius. Negative accuracy reflects the number of true negatives; positive accuracy reflects the number of true positives.
[Results tables: columns Estimated Radius and Annotated Radius]

Discussion
- Nodule radius estimation average error: 1.29 pixels.
- Estimation provides meaningful features for classification; improving estimation accuracy should bring results closer to those obtained with the annotated radius.

Deep Learned Features
[Figure: simplified illustration of the CNN architecture of GoogLeNet]
Pretrained network: GoogLeNet, “Inception-v3”.
Current state-of-the-art CNN architecture for the ILSVRC challenge, trained on ImageNet.
Two convolution layers, two pooling layers, and nine “Inception” layers.
Inception layers of GoogLeNet consist of six convolutional layers with different kernel sizes and one pooling layer (see the inception 3a illustration).
Inception-v3: concatenates filters of different sizes and dimensions into a single new filter.

Nodule Radius Estimation
Iterative algorithm to estimate the nodule radius, based on the assumption that the nodule boundary lies between bright intensities inside the nodule and darker intensities outside the nodule.
[Figure legend: ending point; sampled radius; starting point]
Iteration 1: sampled radius lies inside the nodule; start = sampled radius.
Iteration 2: sampled radius lies outside the nodule; end = sampled radius.

Future Work
- Creating a pipeline to improve time efficiency
- Combining with a custom neural network
- Comparing with commercial software
- Evaluating and testing the software in an independent cohort

References
Armato III, S. G., McLennan, G., Bidaut, L., McNitt-Gray, M. F., Meyer, C. R., Reeves, A. P., ... & Kazerooni, E. A. (2011). The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915-931.
Ciompi, F., Jacobs, C., Scholten, E. T., Wille, M. W., de Jong, P. A., Prokop, M., & van Ginneken, B. (2015). Bag-of-Frequencies: A Descriptor of Pulmonary Nodules in Computed Tomography Images. IEEE Transactions on Medical Imaging, 34(4), 962-973.
de Carvalho Filho, A. O., Silva, A. C., de Paiva, A. C., Nunes, R. A., & Gattass, M. (2016). Lung-Nodule Classification Based on Computed Tomography Using Taxonomic Diversity Indexes and an SVM. Journal of Signal Processing Systems, 1-18.
Shin, H. C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., ... & Summers, R. M. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5), 1285-1298.
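
As a rough illustration of the Bag of Frequencies steps (sample intensities on circles at given fractions of the nodule radius, then derive a spectral signature from each intensity profile), the sketch below uses only the Python standard library. The function names, the nearest-pixel lookup, the use of discrete Fourier transform magnitudes as the "spectral signature", and the synthetic disc slice are all assumptions made for illustration; they are not taken from the poster or from the Ciompi et al. implementation.

```python
import cmath
import math


def make_disc_slice(size=40, radius=17, inside=0.0, outside=-800.0):
    """Hypothetical test slice: a bright disc (nodule) on a dark background."""
    c = size // 2
    return [[inside if (y - c) ** 2 + (x - c) ** 2 <= radius ** 2 else outside
             for x in range(size)] for y in range(size)]


def sample_ring(image, center, radius, n_samples=8):
    """Sample intensities counter-clockwise on a circle (nearest pixel)."""
    cy, cx = center
    profile = []
    for k in range(n_samples):
        angle = 2.0 * math.pi * k / n_samples
        # Rows grow downward, so subtract the sine to move counter-clockwise.
        y = int(round(cy - radius * math.sin(angle)))
        x = int(round(cx + radius * math.cos(angle)))
        # Clamp so rings larger than the slice stay inside the image.
        y = min(max(y, 0), len(image) - 1)
        x = min(max(x, 0), len(image[0]) - 1)
        profile.append(float(image[y][x]))
    return profile


def spectral_signature(profile):
    """Normalized DFT magnitudes of one intensity profile (assumed signature)."""
    n = len(profile)
    return [abs(sum(v * cmath.exp(-2j * math.pi * f * k / n)
                    for k, v in enumerate(profile))) / n
            for f in range(n // 2 + 1)]


def bof_features(image, center, nodule_radius,
                 sample_radii=(0.5, 0.9, 1.05), n_samples=8):
    """Concatenate the spectral signatures of all sampled rings."""
    features = []
    for fraction in sample_radii:
        profile = sample_ring(image, center, fraction * nodule_radius, n_samples)
        features.extend(spectral_signature(profile))
    return features


slice_2d = make_disc_slice(size=40, radius=17)
features = bof_features(slice_2d, center=(20, 20), nodule_radius=17)
print(len(features), "features for one slice")
```

With 8 samples per ring, each ring contributes 5 frequency magnitudes, so three sample radii yield 15 values per slice; a full descriptor would repeat this over every extracted slice.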
Nodule radius estimation procedure (figure panels A and B)
Input: image patch
Initialize: start = image center, end = image edge
Output: estimated radius
Steps:
1. Samples are taken using BoF at the radius halfway between start and end.
2. Pixel value threshold applied: each sampled intensity is classified as inside or outside of the nodule.
3. Percentage threshold applied: the sampled radius is classified as inside or outside of the nodule.
4. Radius inside nodule (sample was too small): end = previous end, start = sampled radius.
   Radius outside nodule (sample was too big): start = previous start, end = sampled radius.

[Figure: the difference, in pixels, between the estimated radius and the true radius, shown for 1000 nodule patches]
[Figure: illustration of the inception 3a layer of GoogLeNet]

Acknowledgements
Funding support from the NSF-REU at the University of Central Florida is greatly appreciated. Special thanks to Dr. Shah and Dr. Lobo.
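
The radius estimation procedure (sample at the radius halfway between start and end, threshold the samples, then move start or end) behaves like a bisection search, and a minimal sketch of that reading follows. The threshold values (-400 HU, 50%), the stopping tolerance, the function names, and the synthetic disc patch are assumptions chosen for illustration; the poster does not state the values actually used.

```python
import math


def make_disc_patch(size=40, radius=12, inside=0.0, outside=-800.0):
    """Hypothetical patch: a bright disc (nodule) centered on a dark background."""
    c = size // 2
    return [[inside if (y - c) ** 2 + (x - c) ** 2 <= radius ** 2 else outside
             for x in range(size)] for y in range(size)]


def ring_samples(image, center, radius, n_samples=8):
    """Nearest-pixel intensities at evenly spaced angles on a circle."""
    cy, cx = center
    values = []
    for k in range(n_samples):
        angle = 2.0 * math.pi * k / n_samples
        y = min(max(int(round(cy - radius * math.sin(angle))), 0), len(image) - 1)
        x = min(max(int(round(cx + radius * math.cos(angle))), 0), len(image[0]) - 1)
        values.append(image[y][x])
    return values


def estimate_radius(image, center, pixel_threshold=-400.0, inside_fraction=0.5,
                    n_samples=8, tolerance=0.5):
    """Bisection on the radius; all thresholds here are assumed values."""
    cy, cx = center
    start = 0.0  # image center
    # Image edge: distance from the center to the nearest border.
    end = float(min(cy, cx, len(image) - 1 - cy, len(image[0]) - 1 - cx))
    while end - start > tolerance:
        radius = (start + end) / 2.0
        samples = ring_samples(image, center, radius, n_samples)
        # Pixel value threshold: count samples classified as inside the nodule.
        bright = sum(1 for v in samples if v > pixel_threshold)
        # Percentage threshold: classify the whole sampled radius.
        if bright / n_samples >= inside_fraction:
            start = radius  # radius inside nodule: sample was too small
        else:
            end = radius    # radius outside nodule: sample was too big
    return (start + end) / 2.0


patch = make_disc_patch(size=40, radius=12)
print("estimated radius:", estimate_radius(patch, center=(20, 20)))
```

Because the interval halves on every iteration, a 40x40 patch needs only a handful of iterations to reach sub-pixel tolerance; on real CT patches the two thresholds would need tuning against annotated radii.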