MS-SEG Challenge: Results and Analysis on Testing Data

Olivier Commowick, October 21, 2016

Outline
- Testing data overview
- Metrics overview and ranking strategy
- Outlier case results
- Scanner-wise results
- Overall results and ranking
- Discussion

Testing data description
38 MRI scans from three centers: Bordeaux, Lyon, and Rennes
Acquisitions from 4 different scanners:
- Siemens Verio 3T (01), Rennes, 10 cases
- GE Discovery 3T (03), Bordeaux, 8 cases
- Siemens Aera 1.5T (07), Lyon, 10 cases
- Philips Ingenia 3T (08), Lyon, 10 cases
All MRI follow the OFSEP protocol: 3D T1, FLAIR, axial PD and T2
A common protocol, yet some differences:
- In-plane resolution: 0.5 mm for scanners 01, 03, and 08; 1 mm for scanner 07
- Slice thickness varies from scanner to scanner: between 3 and 4 mm for T2 and PD
- Variable lesion loads: from 0.1 cm3 to 71 cm3
One outlier case: a patient with no lesions; 5 of the 7 experts delineated no visible lesions

Example delineated datasets (figures): FLAIR images from centers 01, 03, 07, and 08

Metric categories
Evaluation of MS lesion segmentation is a tough topic: the reference is a consensus rather than a true ground truth, and one must ask what is of interest to the clinician.
Two metric categories:
- Detection: are the lesions detected, independently of the precision of their contours?
- Segmentation: are the lesion contours exact? Assessed with overlap measures and surface-based measures.
Details: https://portal.fli-iam.irisa.fr/msseg-challenge/evaluation

Segmentation quality measures
Overlap measures:
- Sensitivity
- Positive predictive value (PPV)
- Specificity
- Dice score
Surface-based measure:
- Average surface distance
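
The overlap measures above can all be derived from the voxel-wise confusion matrix, and the average surface distance from distance transforms of the mask borders. A minimal sketch, not the official challenge implementation, assuming auto and ref are binary NumPy arrays of identical shape:

```python
import numpy as np
from scipy import ndimage

def overlap_measures(auto, ref):
    auto, ref = auto.astype(bool), ref.astype(bool)
    tp = (auto & ref).sum()     # lesion voxels found by both
    fp = (auto & ~ref).sum()    # false positives
    fn = (~auto & ref).sum()    # false negatives
    tn = (~auto & ~ref).sum()   # true negatives
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }

def average_surface_distance(auto, ref, spacing=(1.0, 1.0, 1.0)):
    """Symmetric mean distance between mask surfaces (in mm, given voxel spacing in mm)."""
    def surface(mask):
        return mask & ~ndimage.binary_erosion(mask)  # border voxels only
    s_auto, s_ref = surface(auto.astype(bool)), surface(ref.astype(bool))
    # Distance of every voxel to the nearest surface voxel of the other mask
    d_to_ref = ndimage.distance_transform_edt(~s_ref, sampling=spacing)
    d_to_auto = ndimage.distance_transform_edt(~s_auto, sampling=spacing)
    return (d_to_ref[s_auto].mean() + d_to_auto[s_ref].mean()) / 2.0
```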

Detection measures
Is a lesion detected? Two criteria:
- Sufficient overlap with the consensus
- The connected components responsible for the overlap must not be too large
Two quantities measured:
- TPG: lesions overlapped in the ground truth
- TPA: lesions overlapped in the automatic segmentation
Metrics: lesion sensitivity, lesion PPV, and F1 score
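
A simplified sketch of this lesion-wise counting, treating connected components as lesions; the actual challenge rules also include the component-size checks mentioned above, and the min_overlap threshold here is an illustrative assumption:

```python
import numpy as np
from scipy import ndimage

def detection_scores(auto, ref, min_overlap=0.1):
    auto, ref = auto.astype(bool), ref.astype(bool)
    ref_labels, n_ref = ndimage.label(ref)      # consensus lesions
    auto_labels, n_auto = ndimage.label(auto)   # automatically detected lesions

    # TPG: consensus lesions sufficiently covered by the automatic mask
    tpg = sum(auto[ref_labels == i].mean() >= min_overlap
              for i in range(1, n_ref + 1))
    # TPA: automatic lesions sufficiently covered by the consensus
    tpa = sum(ref[auto_labels == j].mean() >= min_overlap
              for j in range(1, n_auto + 1))

    lesion_sensitivity = tpg / n_ref if n_ref else float("nan")
    lesion_ppv = tpa / n_auto if n_auto else float("nan")
    denom = lesion_sensitivity + lesion_ppv
    f1 = 2 * lesion_sensitivity * lesion_ppv / denom if denom else 0.0
    return lesion_sensitivity, lesion_ppv, f1
```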

Preamble to results: team naming
13 pipelines evaluated:
- Team 1: Beaumont et al. (graph cuts)
- Team 2: Beaumont et al. (intensity normalization)
- Team 3: Doyle et al.
- Team 4: Knight et al.
- Team 5: Mahbod et al.
- Team 6: McKinley et al.
- Team 7: Muschelli et al.
- Team 8: Roura et al.
- Team 9: Santos et al.
- Team 10 (*): Tomas-Fernandez et al.
- Team 11: Urien et al.
- Team 12: Valverde et al.
- Team 13: Vera-Olmos et al.

Evaluation metrics for the outlier case
Most evaluation metrics are undefined for this case: there is no consensus label.
Two substitute metrics were computed instead:
- Number of lesions detected (number of connected components)
- Total volume of lesions detected
Both scores are optimal at 0.
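
A sketch of these two substitute scores, assuming auto is a binary NumPy mask and voxel_volume_mm3 is the volume of one voxel in mm3:

```python
import numpy as np
from scipy import ndimage

def outlier_scores(auto, voxel_volume_mm3):
    auto = auto.astype(bool)
    _, n_lesions = ndimage.label(auto)                   # connected components
    volume_cm3 = auto.sum() * voxel_volume_mm3 / 1000.0  # 1 cm3 = 1000 mm3
    return n_lesions, volume_cm3                         # both optimal at 0
```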

Evaluation on the outlier case

Evaluated method   Lesion volume (cm3)   Number of lesions
Team 1             8.26                  21
Team 2             0.008                 4
Team 3
Team 4             N/A
Team 5             29.06                 854
Team 6             0.47                  7
Team 7             6.38                  411
Team 8
Team 9             2.66                  102
Team 10 (*)        28.84                 1045
Team 11            3.48                  66
Team 12            0.06                  1
Team 13            0.07

Visual results for center 01 (figures): FLAIR, consensus, and segmentations from Teams 1 through 13

Visual results for center 03 (figures): FLAIR, consensus, and segmentations from Teams 1 through 13

Dice measure, average per center

Evaluated method   Center 01   Center 03   Center 07   Center 08
Team 1             0.335       0.421       0.551       0.507
Team 2             0.548       0.416       0.463       0.499
Team 3             0.572       0.215       0.585       0.538
Team 4             0.528       0.409       0.502       0.506
Team 5             0.454       0.361       0.382       0.503
Team 6             0.653       0.568       0.606       0.536
Team 7             0.367       0.249       0.363
Team 8             0.610       0.555       0.591       0.529
Team 9             0.341       0.359       0.317       0.345
Team 10 (*)        0.298       0.204       0.180       0.218
Team 11            0.321       0.404       0.288       0.379
Team 12            0.613       0.426       0.579       0.526
Team 13            0.664       0.188       0.622
Average            0.510       0.386       0.491       0.492

F1 score, average per center

Evaluated method   Center 01   Center 03   Center 07   Center 08
Team 1             0.314       0.286       0.417       0.343
Team 2             0.352       0.172       0.369       0.267
Team 3             0.443       0.070       0.524       0.360
Team 4             0.295       0.266       0.446       0.271
Team 5             0.220       0.102       0.110       0.219
Team 6             0.377       0.404       0.470       0.306
Team 7             0.125       0.060       0.100       0.233
Team 8             0.465       0.474       0.518       0.359
Team 9             0.179       0.147       0.163
Team 10 (*)        0.050       0.062       0.028       0.057
Team 11            0.190       0.088       0.264       0.198
Team 12            0.542       0.280       0.611       0.498
Team 13            0.499       0.059       0.510       0.514
Average            0.336       0.200       0.392       0.322

Average surface distance per center (mm)

Evaluated method   Center 01   Center 03   Center 07   Center 08
Team 1             2.69        1.91        0.34        3.63
Team 2             2.07        0.57        0.85        0.74
Team 3             0.89        4.34        1.00        1.74
Team 4             1.06        1.38        1.37        2.32
Team 5             1.33        3.27        0.95        1.93
Team 6             1.11        1.24        0.48        1.32
Team 7             1.41        1.29        3.18        4.75
Team 8             2.61        0.43        2.28
Team 9             6.49        0.98        8.89        4.13
Team 10 (*)        2.99        1.66        4.06        4.89
Team 11            3.99        1.82        4.65        7.16
Team 12            1.75        0.36        0.20        1.50
Team 13            1.61        11.18       1.08        1.27
Average            2.12        2.40        1.98        2.76

Computation times (rough estimates)

Evaluated method   Computation time (all 38 patients)
Team 1             1 day 00h14
Team 2             15h37
Team 3             2 days 00h54
Team 4             04h24
Team 5             15h28
Team 6             2 days 05h40
Team 7             10 days 23h19
Team 8             11h17
Team 9             05h40
Team 10 (*)        5 days 18h37
Team 11            2 days 11h14
Team 12            4 days 16h43
Team 13            2 days 01h03

Average results and ranking strategy
Ranking performed separately on two measures, Dice and F1 score, over 37 patients (the outlier case excluded).
Strategy (sketched below):
- Rank each method on each patient
- Compute each method's average rank over all patients
- The best (lowest) average rank wins
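
A minimal sketch of this average-rank strategy, assuming scores is a NumPy array of shape (methods, patients) where higher scores are better; ties are broken arbitrarily here:

```python
import numpy as np

def average_ranks(scores):
    """Return each method's average rank over patients (1 = best)."""
    n_methods, n_patients = scores.shape
    order = np.argsort(-scores, axis=0)   # per patient, best method first
    ranks = np.empty_like(order)
    ranks[order, np.arange(n_patients)] = np.arange(1, n_methods + 1)[:, None]
    return ranks.mean(axis=1)             # the lowest average rank wins
```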

Segmentation ranking: Dice scores

Evaluated method   Average Dice   Average rank
Team 6             0.591          2.97
Team 8             0.572          4.03
Team 12            0.541          4.38
Team 13            0.521          4.95
Team 4             0.490          6.19
Team 2             0.485          6.22
Team 3             0.489          6.35
Team 1             0.453          6.68
Team 5             0.430          7.78
Team 11            0.347          9.11
Team 9             0.340          10.03
Team 7             0.341          10.08
Team 10            0.228          12.24

Detection ranking: F1 scores

Evaluated method   Average F1   Average rank
Team 12            0.490        2.38
Team 8             0.451        3.81
Team 13            0.410        4.59
Team 6             0.386        4.65
Team 3             0.360        5.14
Team 1             0.341        5.86
Team 2             0.294        6.68
Team 4             0.319        6.84
Team 11            0.188        9.14
Team 5             0.167        9.16
Team 9             0.168        9.76
Team 7             0.134        10.38
Team 10            0.049        12.30

Results compared to the experts
Segmentation performance (Dice): "best" expert 0.782, "worst" expert 0.669, best pipeline 0.591
Detection performance (F1 score): "best" expert 0.893, "worst" expert 0.656, best pipeline 0.490
All pipelines rank below the experts in both categories. What about a combination of the pipelines (see the sketch below)?
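
One simple combination to try, given here as an illustration rather than a result from the challenge: voxel-wise majority voting over the pipelines' binary masks. More elaborate label fusion (e.g. STAPLE) is also common.

```python
import numpy as np

def majority_vote(masks):
    """Label a voxel as lesion where more than half of the pipelines agree."""
    votes = np.sum([m.astype(np.uint8) for m in masks], axis=0)
    return votes > len(masks) / 2
```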

Statistics on total lesion load (figures): Dice, detection, and surface distance

Conclusion
All results (including all measures) are available at https://shanoir-challenges.irisa.fr, open to anyone from now on.
Results are still far from the experts', mainly on images with low lesion load.
All methods are sensitive to image quality (center 03).
A great experience, to be continued with a more detailed journal article.