Camelyon Challenge 2016
Detection of metastases in sentinel lymph nodes

David Tellez Martin (david.tellezmartin@radboudumc.nl)
Diagnostic Image Analysis Group
Radboud University Medical Center, Nijmegen, Netherlands

Hello, everyone, and thank you for attending this presentation. My name is David Tellez and I am a participant in this year’s Camelyon Challenge. I am going to present my solution to the tasks that the organizers introduced earlier.
Tasks

Task 1: Classification of whole-slide images
Target: metastasis yes or no

Task 2: Detection of tumor lesions
Target: x-y coordinates and probability score for each lesion

lesions.csv
x1, y1, p1
x2, y2, p2
x3, y3, p3
…
xn, yn, pn

First, I’d like to remind you of the two tasks of the Challenge:
- Classify each whole-slide image as containing metastases or being tumor free.
- Detect and locate the specific tumor lesions, providing spatial coordinates and a probability score for each one.
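As a small illustration of the Task 2 output, the sketch below serializes detections in the x, y, p row layout shown on the slide. The function name, the absence of a header row and the four-decimal formatting are assumptions made for this example, not details given in the talk.

```python
import csv
import io


def lesions_to_csv(lesions):
    """Serialize (x, y, probability) tuples, one lesion per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for x, y, p in lesions:
        # Four decimals for the probability is an arbitrary choice.
        writer.writerow([x, y, f"{p:.4f}"])
    return buffer.getvalue()


# Hypothetical detections for one slide (made-up numbers).
example = [(1024, 2048, 0.97), (5120, 768, 0.4312)]
csv_text = lesions_to_csv(example)
print(csv_text)
```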
Processing Pipeline

Whole-Slide Images -> System 1 (Convolutional Neural Network) -> Likelihood Maps
Likelihood Maps -> System 2 (Convolutional Neural Network) -> Task 1: Whole-Slide classification (Yes / No)
Likelihood Maps -> System 3 (Post-Processing Routine) -> Task 2: Location and Probability of Tumor Lesions (lesions.csv)

I decided to break the data pipeline down into three main systems:
- The first one, and the most important, determines where the tumor tissue is within the whole-slide images. It produces a heatmap in which red areas are very likely to be tumor tissue. I used a convolutional neural network for this.
- The second one takes those likelihood maps and decides whether the whole-slide image contains metastases or not. This is not trivial, because the likelihood maps sometimes contain false positives, so we need a smart way to decide whether a highlighted area is tumor or a false positive.
- The third one also takes the likelihood maps and identifies specific tumor regions, providing spatial coordinates and a probability score for each detected lesion.
Processing Pipeline

I will start with the first system.
System 1: Patch-Based Classifier

Annotated Whole-Slide Images -> Patches from healthy tissue / Patches from tumor tissue -> Augmented with rotation, flipping and blurring -> Convolutional Neural Network

The task we address here is to classify tissue as healthy or tumor. I take positive and negative patches from annotated regions of the whole-slide images; each patch takes its label from its central pixel. The patches are augmented by randomly rotating and flipping them, and by applying a blur that mimics the out-of-focus effect of the scanner. Patches are extracted online, packed into class-balanced mini-batches and fed into a convolutional neural network.
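The augmentation and balanced batching described above could look roughly like the following sketch. It uses NumPy and SciPy rather than the original Lasagne/Theano code. The function names, the restriction to 90-degree rotations, the Gaussian blur, the `max_sigma` value and the random toy patches are all assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage


def augment_patch(patch, rng, max_sigma=2.0):
    """Randomly rotate, flip and blur one channel-first (C, H, W) patch."""
    # Assumption: rotate by a random multiple of 90 degrees.
    out = np.rot90(patch, k=int(rng.integers(0, 4)), axes=(1, 2))
    if rng.random() < 0.5:
        out = np.flip(out, axis=2)  # horizontal flip
    # Assumption: Gaussian blur stands in for the scanner's out-of-focus effect.
    sigma = rng.uniform(0.0, max_sigma)
    return ndimage.gaussian_filter(out.astype(np.float32), sigma=(0, sigma, sigma))


def balanced_batch(tumor_patches, healthy_patches, rng, batch_size=16):
    """Draw half of the batch from each class, augment, then shuffle."""
    half = batch_size // 2
    tumor_idx = rng.integers(0, len(tumor_patches), size=half)
    healthy_idx = rng.integers(0, len(healthy_patches), size=half)
    patches = [augment_patch(tumor_patches[i], rng) for i in tumor_idx]
    patches += [augment_patch(healthy_patches[i], rng) for i in healthy_idx]
    labels = np.array([1] * half + [0] * half, dtype=np.int32)
    order = rng.permutation(batch_size)
    return np.stack(patches)[order], labels[order]


# Toy data: random arrays standing in for 32x32 RGB tissue patches.
rng = np.random.default_rng(0)
tumor_pool = rng.random((20, 3, 32, 32))
healthy_pool = rng.random((20, 3, 32, 32))
batch, labels = balanced_batch(tumor_pool, healthy_pool, rng)
```

In a real pipeline the two pools would be replaced by on-the-fly reads from the annotated regions of the slides.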
System 1: Network Architecture

Feature volumes: 3x256x256 -> 64x128x128 -> 128x64x64 -> 256x32x32 -> 512x16x16 -> 512x8x8 -> 512x4x4 -> 2048x1x1 -> 2x1x1
Layer types: conv3-64, maxpool, conv3-128, conv3-256, conv3-512, dropout 0.5, conv1-2, softmax, conv4-2048, conv1-2048
Dense layers converted to fully convolutional

Here is the network architecture that I used, with all the technical details, in case you would like to apply it to your own problems. A few details I’d like to mention:
- It is based on the design of the VGG networks, using blocks of two convolutional layers followed by a max-pooling layer.
- It is implemented in the Lasagne library, which is based on the Theano framework.
- Its dense layers were converted to fully convolutional ones, so that we can produce the likelihood maps faster by using bigger patch sizes at test time.
- It was trained on a modest GPU with only 4 GB of memory.

Technical details
- Design guidelines follow VGGNet-13
- Implemented in Lasagne-Theano
- Loss function: categorical cross-entropy
- SGD updates: Adam
- Non-linearity: leaky ReLU
- Weight initialization: Xavier with ReLU scaling
- Regularization: L2 1e-5 and dropout 0.5
- Batch normalization (epsilon 1e-4, alpha 1e-1)
- Convolutions do not shrink spatial dimensions
- Max-pooling halves spatial dimensions

Training details
- Training-validation split: 70%-30%
- On-the-fly patch extraction
- On-the-fly data augmentation
- Mini-batch size: 16 patches
- Learning rate: divided by 10 when validation accuracy plateaus
- Hardware: Nvidia GeForce GTX 970 (4 GB)
- Training time: 12 hours
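To see why the fully convolutional conversion helps at test time, the sketch below traces the feature-volume sizes through the network. The helper name is hypothetical. The sketch assumes one block of two size-preserving 3x3 convolutions plus a max-pooling layer per stage, as read from the volumes on the slide, and a 4x4 convolution without padding for conv4-2048.

```python
def trace_shapes(input_size, in_channels=3, dense_units=2048):
    """Return the (channels, height, width) volume after each stage."""
    shapes = [(in_channels, input_size, input_size)]
    size = input_size
    for filters in (64, 128, 256, 512, 512, 512):
        # The 3x3 convolutions keep the spatial size; max-pooling halves it.
        size //= 2
        shapes.append((filters, size, size))
    # conv4 without padding (assumption): a 4x4 window shrinks the map by 3.
    size = size - 4 + 1
    shapes.append((dense_units, size, size))  # conv4, then conv1 (same volume)
    shapes.append((2, size, size))            # conv1-2 followed by softmax
    return shapes


training_shapes = trace_shapes(256)
for shape in training_shapes:
    print("x".join(str(v) for v in shape))

# A larger test-time patch yields a grid of predictions in one forward pass.
test_output = trace_shapes(1024)[-1]
print(test_output)
```

Under these assumptions a 256x256 patch produces a single 2x1x1 prediction, while a 1024x1024 patch produces a 13x13 grid of predictions in one pass, which is what makes generating likelihood maps faster.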
System 1: Results

Final patch-based classification accuracy in the validation set (unseen): 97%
Right panel: t-SNE projection of features extracted from validation (unseen) patches by the convnet

The top-left plot shows the training and validation loss during training, and the bottom-left plot shows the training and validation accuracy. The network reaches 97% validation accuracy. That seems high, but most patches are trivial to classify: fatty tissue, empty patches, patches full of lymphocytes, and so on. Most of the real learning happens at the borders of tumor lesions.

On the right, I’d like to show you how the network clusters patches from the validation set.
(Demo: open the image and show an example of a negative cluster, a positive cluster and a mixed cluster.)
System 1: Likelihood Maps

Whole-Slide Images -> Trained Convolutional Neural Network -> Likelihood Maps
Healthy WSI: normal37, normal123
Tumor WSI: tumor31, tumor52

Once the network is trained, we can generate the likelihood maps; here you can see a few examples. Notice that even for the negative examples the network flags some very small regions. Because of this noise, the heatmaps need further processing.
Processing Pipeline

Once we have the likelihood maps, we know where most of the tumor tissue is. However, we still need to summarize this information into a binary answer: healthy or tumorous WSI.
System 2: Likelihood Map Classifier

Raw Likelihood Maps -> Pre-processed Likelihood Maps (reshaped to 256x256) -> Augmented with rotation and flipping -> Convolutional Neural Network

For this task, we use a very similar convolutional neural network. As input, we take a reshaped version of the likelihood maps, augment them through rotation and flipping, and feed them to the network. Please notice that classifying these maps is not trivial because of the noise in them.
System 2: Network Architecture

Feature volumes: 1x256x256 -> 64x128x128 -> 128x64x64 -> 256x32x32 -> 512x16x16 -> 512x8x8 -> 512x4x4 -> 1024x1x1 -> 2x1x1
Layer types: conv3-64, maxpool, conv3-128, conv3-256, conv3-512, dropout 0.5, conv1-2, softmax, conv4-1024, conv1-1024
Dense layers converted to fully convolutional

There are only two differences from the first network:
- The input volume is now a grayscale image.
- The dense layers are smaller, since this task is less complex than the previous one.

Technical details
- Design guidelines follow VGGNet-13
- Implemented in Lasagne-Theano
- Loss function: categorical cross-entropy
- SGD updates: Adam
- Non-linearity: leaky ReLU
- Weight initialization: Xavier with ReLU scaling
- Regularization: L2 1e-5 and dropout 0.5
- Batch normalization (epsilon 1e-4, alpha 1e-1)
- Convolutions do not shrink spatial dimensions
- Max-pooling halves spatial dimensions

Training details
- Training-validation split: 70%-30%
- Images loaded from disk
- On-the-fly data augmentation
- Mini-batch size: 16 images
- Learning rate: divided by 10 when validation accuracy plateaus
- Hardware: Nvidia GeForce GTX 970 (4 GB)
- Training time: 3 hours
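Both networks divide the learning rate by 10 when the validation accuracy plateaus. The talk does not say how a plateau was detected, so the sketch below assumes a simple patience rule: a fixed number of epochs without improvement. The class name, the patience value and the starting learning rate are hypothetical.

```python
class PlateauSchedule:
    """Divide the learning rate by `factor` after `patience` stale epochs."""

    def __init__(self, learning_rate, patience=3, factor=10.0):
        self.learning_rate = learning_rate
        self.patience = patience
        self.factor = factor
        self.best_accuracy = float("-inf")
        self.stale_epochs = 0

    def step(self, validation_accuracy):
        """Record one epoch's validation accuracy and return the new rate."""
        if validation_accuracy > self.best_accuracy:
            self.best_accuracy = validation_accuracy
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.learning_rate /= self.factor
                self.stale_epochs = 0
        return self.learning_rate


# Made-up validation accuracies for five epochs.
schedule = PlateauSchedule(learning_rate=1e-3, patience=2)
history = [schedule.step(acc) for acc in (0.70, 0.78, 0.78, 0.77, 0.79)]
print(history)
```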
System 2: Results

Final whole-slide classification accuracy in the validation set (unseen): 80%
Proposed evaluation method: ROC curve
AUC score in the validation set: 0.88

Training converges quickly compared with the previous system, since there is far less data: one likelihood map per slide rather than many patches. The validation accuracy reaches 80%. On the right side, you can see the ROC curve, which is the evaluation method selected for this task.
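For reference, the area under the ROC curve can be computed directly from slide-level scores as the fraction of (tumor, healthy) slide pairs that the classifier ranks correctly. The sketch below is a generic pairwise implementation with made-up scores. It is not the challenge's evaluation script.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one slide of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical slide labels (1 = tumor) and predicted tumor probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```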
Processing Pipeline

In the last part of the presentation, I’ll explain the system used to detect the actual tumor lesions within the likelihood maps. Remember that we need to provide two spatial coordinates and a probability score for each lesion.
System 3: Lesion Detection

Left: Predicted Likelihood Map (red areas are very likely to be tumor)
Right: Ground-truth Tumor Lesions (each color represents a different lesion)

I’d like to illustrate the difficulty of this task. On your left is the likelihood map generated by the network. On your right are the ground-truth tumor lesions that I would have to report to get a perfect detection. The really small ones don’t need to be detected.

Big problem: decomposing large lesions into smaller regions.
Why: to get the maximum sensitivity score, you need to report every single tumor lesion individually.

As you can see, although my network detects most of the tumor tissue, there is no easy way to decompose the big red region into the separate lesions required by the ground truth. I spent most of my time dealing with this problem.
System 3: Algorithm

Panels: Raw Likelihood Map, After Grey-value Erosion, After Thresholding, After Binarization and Labelling, Ground-truth

I came up with this algorithm:
1. Grey-value erosion, to force some areas to detach from the main lesion and to keep detections far enough from the lesion borders. The filter size was selected by running a parameter search and keeping the value with the best final score (window size 4).
2. Thresholding, removing every pixel below a certain probability level. This threshold was also selected by a parameter search (0.9).
3. Binarization and labelling, taking the average probability of each labelled region as its score.

Although the final figure is very similar to the ground truth, the sensitivity is going to be low, since several small lesions are not reported.
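The three steps can be sketched with `scipy.ndimage` as follows. The window size of 4 and the threshold of 0.9 come from the talk. The function name, the use of each region's centroid as the reported coordinate, the averaging of the eroded map (rather than the raw one) and the toy likelihood map are assumptions made for this example.

```python
import numpy as np
from scipy import ndimage


def detect_lesions(likelihood, window=4, threshold=0.9):
    """Return a list of (x, y, probability) tuples, one per detected region."""
    # Step 1: grey-value erosion (minimum over a window x window neighbourhood).
    eroded = ndimage.grey_erosion(likelihood, size=(window, window))
    # Step 2: thresholding, which also binarizes the map.
    mask = eroded >= threshold
    # Step 3: connected-component labelling, one label per candidate lesion.
    labelled, count = ndimage.label(mask)
    lesions = []
    for index in range(1, count + 1):
        rows, cols = np.nonzero(labelled == index)
        # Assumptions: centroid as location, mean eroded probability as score.
        score = float(eroded[rows, cols].mean())
        lesions.append((float(cols.mean()), float(rows.mean()), score))
    return lesions


# Toy map: two confident blobs joined by a thin bridge of tumor-like pixels.
toy_map = np.zeros((40, 40))
toy_map[5:15, 5:15] = 0.95
toy_map[5:15, 25:35] = 0.99
toy_map[9:11, 15:25] = 0.95

lesions = detect_lesions(toy_map)
components_without_erosion = ndimage.label(toy_map >= 0.9)[1]
print(len(lesions), components_without_erosion)
```

On this toy map, thresholding alone yields a single connected region, while erosion removes the thin bridge and the two blobs are reported separately.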
System 3: Results

Proposed evaluation method: average sensitivity at 6 predefined false positive rates of the FROC curve
Final score in the validation set: 0.6096

Finally, the FROC curve was used to assess the performance of the algorithm on this task.
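The final score is an average of sensitivities read off the FROC curve. The sketch below assumes the six operating points used by the Camelyon16 evaluation (1/4, 1/2, 1, 2, 4 and 8 false positives per slide on average; the talk itself does not list them) and linear interpolation between curve points. The function names and the toy curve are hypothetical, and this is not the official evaluation script.

```python
# Average false positives per slide at which sensitivity is sampled (assumption).
FP_RATES = (0.25, 0.5, 1, 2, 4, 8)


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends (xs strictly ascending)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def froc_score(avg_false_positives, sensitivities):
    """Mean sensitivity over the predefined false positive rates."""
    values = [interpolate(r, avg_false_positives, sensitivities) for r in FP_RATES]
    return sum(values) / len(values)


# Made-up FROC curve: (average FPs per slide, lesion-level sensitivity).
curve_fps = [0.0, 0.5, 2.0, 8.0]
curve_sens = [0.2, 0.5, 0.7, 0.9]
score = froc_score(curve_fps, curve_sens)
print(round(score, 4))
```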
Thank You

David Tellez Martin (david.tellezmartin@radboudumc.nl)
Diagnostic Image Analysis Group
Radboud University Medical Center, Nijmegen, Netherlands