An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21.


1 An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC
Grace Dasovich, Robert Kim
Midterm Presentation, August 21, 2009

2 Outline
Related Work
Data
Modeling Approach and Results
–Similarity Measures
–Artificial Neural Network
–Multivariate Linear Regression
Conclusions
Future Work

3 Related Work
Computer-Aided Diagnosis (CADx) based on low-level image features
–Armato et al. developed a linear discriminant classifier using features of lung nodules
–Need to find the relationship between the image features and radiologists’ ratings

4 Related Work
Image features and the semantic ratings
–Lung Interpretations
  Barb et al. developed the Evolutionary System for Semantic Exchange of Information in Collaborative Environments (ESSENCE)
  Raicu et al. used ensemble classifiers and decision trees to predict semantic ratings
  Samala et al. used several combinations of image features and the radiologists’ ratings to classify nodules

5 Related Work
–Similarity
  Li et al. investigated four different methods to compute similarity measures for lung nodules
  –Feature-based
  –Pixel-value-difference
  –Cross correlation
  –ANN

6 Data
Materials
LIDC Dataset
149 Unique Nodules
–One slice per nodule, largest nodule area
9 Semantic Characteristics
–Calcification and Internal Structure had little variation, thus were not used
64 Content Features
–Shape, size, intensity, and texture

7 Outline
Related Work
Data
Modeling Approach and Results
–Similarity Measures
–Artificial Neural Network
–Multivariate Linear Regression
Conclusions
Future Work

8 Similarity Measures
Cosine Similarity
Jeffrey Divergence
Euclidean Distance
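
The three measures named on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the slides do not give formulas, so the Jeffrey divergence below assumes the common form that compares each vector with the element-wise mean of the two, and it assumes non-negative inputs (for example, rating vectors normalized to sum to 1). All function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Raises ZeroDivisionError for an all-zero vector.
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jeffrey_divergence(p, q):
    """Assumed form: sum of p*log(p/m) + q*log(q/m), with m = (p + q) / 2.

    Terms with a zero entry contribute nothing, so the measure stays finite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        m = (pi + qi) / 2.0
        if pi > 0:
            total += pi * math.log(pi / m)
        if qi > 0:
            total += qi * math.log(qi / m)
    return total
```

Cosine similarity grows with similarity, while the Jeffrey divergence and Euclidean distance grow with dissimilarity, so the sign of any correlation against them has to be read accordingly.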


11 Similarity Measures
Computed feature distance measures

12 Outline
Related Work
Data
Modeling Approach and Results
–Similarity Measures
–Artificial Neural Network
–Multivariate Linear Regression
Conclusions
Future Work

13 Methods
Two three-layer ANNs
–Input (64 neurons), hidden layer (5 neurons), output (1)
–Input (64 neurons), hidden layer (5 neurons), output (7)
Input = 64 feature distances
Output = Semantic similarity or difference in semantic ratings
Hyperbolic tangent function, backpropagation algorithm, 200 iterations
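
The architecture on this slide can be sketched as a small pure-Python network. This is an illustration under stated assumptions rather than the authors' code: it assumes tanh is applied in both the hidden and output layers, a squared-error loss, per-sample gradient updates, and that "200 iterations" means 200 passes over the training pairs. The learning rate, weight range, and all function names are hypothetical.

```python
import math
import random

def init_network(n_in=64, n_hidden=5, n_out=1, seed=0):
    """Small random weights; the last entry of each row is the bias."""
    rng = random.Random(seed)
    def layer(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols + 1)]
                for _ in range(rows)]
    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}

def forward(net, x):
    """Return hidden activations and outputs, both passed through tanh."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
         for row in net["hidden"]]
    y = [math.tanh(sum(w * v for w, v in zip(row, h)) + row[-1])
         for row in net["output"]]
    return h, y

def train_step(net, x, target, lr=0.05):
    """One backpropagation update for a single (input, target) pair."""
    h, y = forward(net, x)
    # Output deltas: error times the tanh derivative (1 - y^2).
    d_out = [(yk - tk) * (1 - yk * yk) for yk, tk in zip(y, target)]
    # Hidden deltas: error propagated back through the output weights.
    d_hid = [(1 - hj * hj) * sum(d_out[k] * net["output"][k][j]
                                 for k in range(len(d_out)))
             for j, hj in enumerate(h)]
    for k, row in enumerate(net["output"]):
        for j in range(len(h)):
            row[j] -= lr * d_out[k] * h[j]
        row[-1] -= lr * d_out[k]
    for j, row in enumerate(net["hidden"]):
        for i in range(len(x)):
            row[i] -= lr * d_hid[j] * x[i]
        row[-1] -= lr * d_hid[j]

def train(net, samples, epochs=200, lr=0.05):
    """samples is a list of (feature_distances, targets) pairs."""
    for _ in range(epochs):
        for x, t in samples:
            train_step(net, x, t, lr)
```

Calling `init_network(n_out=7)` gives the seven-output variant. Because the output passes through tanh, targets would need to be scaled into the open interval (-1, 1) before training.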

14 Methods
ANN with a single output
–640 random pairs from all 109 nodules
–231 pairs from nodules with malignancy > 3
–496 pairs from nodules with area > 122 mm²

15 ANN with seven outputs
–640 random pairs from all 109 nodules
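
The pair counts can be reproduced with a short sketch. The slides do not say how many nodules fall in each subset; the counts 231 and 496 are what you get from taking every unordered pair among 22 and 32 nodules respectively, so those subset sizes are an inference, not a stated fact. The function names and the fixed seed are hypothetical.

```python
import random
from itertools import combinations
from math import comb

def all_pairs(ids):
    """Every unordered pair of distinct nodule ids."""
    return list(combinations(ids, 2))

def random_pairs(ids, n, seed=0):
    """Draw n distinct unordered pairs without replacement."""
    rng = random.Random(seed)
    return rng.sample(all_pairs(ids), n)

# 109 nodules give comb(109, 2) == 5886 possible pairs, of which 640 are drawn.
pairs_640 = random_pairs(range(109), 640)
```

Sampling without replacement keeps any pair from appearing twice in the training set.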

16 Methods
Leave-one-out method
–Cosine similarity, Jeffrey divergence, or difference in semantic ratings used as teaching data
–An ANN trained with the entire dataset minus one image pair
–The pair left out used for testing
–Correlation computed between the calculated radiologists’ similarity and the ANN output
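
The leave-one-out procedure above can be sketched generically. This is an illustration only: the `fit` and `predict` callables stand in for training and running the ANN, and the one-variable line fit used here is a hypothetical stand-in chosen to keep the example short. Pearson correlation is assumed as the correlation measure; the slides do not name one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def leave_one_out(inputs, targets, fit, predict):
    """Train on all pairs but one, test on the held-out pair, repeat for each."""
    outputs = []
    for i in range(len(inputs)):
        train_x = inputs[:i] + inputs[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        model = fit(train_x, train_y)
        outputs.append(predict(model, inputs[i]))
    # Correlate held-out predictions with the teaching data.
    return pearson(outputs, targets), outputs

# Hypothetical stand-in model: least-squares line through scalar inputs.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept
```

With 640 pairs this means training 640 separate models, which is why a small network and a fixed iteration count keep the procedure tractable.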

17 Methods
ANN with a single output
–640 random pairs from all 109 nodules
–231 pairs from nodules with malignancy > 3
–496 pairs from nodules with area > 122 mm²
ANN with seven outputs
–640 random pairs from all 109 nodules

18 Results
ANN using 640 random pairs

19 Results
ANN using 231 pairs with malignancy rating > 3

20 Results
ANN using 496 pairs with area > 122 mm²

21 Results
ANN output vs. target values using Jeffrey divergence for the 640 pairs (r = 0.438)

22 Results
ANN using random 640 pairs and the Jeffrey divergence with seven semantic ratings

23 Outline
Related Work
Data
Modeling Approach and Results
–Similarity Measures
–Artificial Neural Network
–Multivariate Linear Regression
Conclusions
Future Work

24 Methods
Normalization of Features
–Min-Max Technique
–Z-Score Technique
Pair Selection
–Looked for matches between the k most similar images based on semantic similarity and the k most similar based on content
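
The two normalization techniques and the pair selection step can be sketched as follows. This is one reading of the slide, not a confirmed method: the z-score uses the population standard deviation, and pair selection is interpreted as intersecting each nodule's k nearest neighbors under the semantic measure with its k nearest under the content measure. The function names and the distance-matrix layout are hypothetical.

```python
import math

def min_max(values):
    """Rescale a feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center a feature column on 0 with unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def matched_neighbors(semantic_dist, content_dist, query, k):
    """Nodules among the k nearest to `query` under both distance matrices."""
    def nearest(dist):
        others = [j for j in range(len(dist)) if j != query]
        return set(sorted(others, key=lambda j: dist[query][j])[:k])
    return nearest(semantic_dist) & nearest(content_dist)
```

Both distance matrices are assumed to be square lists of lists where a smaller value means more similar; a similarity such as cosine would need to be converted (for example, negated) first.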

25 Methods
Multivariate Regression Analysis
–Select features with highest correlation coefficients
–Feature distance measures
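
A minimal sketch of the two regression steps, assuming each column holds one feature's distance across all nodule pairs and the target is the semantic similarity of those pairs. The ranking by absolute Pearson correlation, the ordinary least-squares fit via the normal equations, and all function names are assumptions made for illustration; the slides do not describe the fitting procedure.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(columns, target, n_keep):
    """Indices of the n_keep columns with the largest |r| against the target."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: abs(pearson(columns[j], target)),
                    reverse=True)
    return ranked[:n_keep]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(columns, target):
    """Least squares with intercept; returns (coefficients, R^2)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(target))]
    size = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(size)]
           for a in range(size)]
    xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(size)]
    coef = solve(xtx, xty)  # coef[0] is the intercept
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for t, p in zip(target, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return coef, 1 - ss_res / ss_tot
```

Collinear or constant columns make the normal equations singular, so in practice such features would need to be dropped before fitting.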

26 Methods
Nodule Analysis
–Determine differences between selected and non-selected nodules
–Define requirements for our model

27 Results

28
           d(i, j)   d²(i, j)   exp(d(i, j))
Cosine     0.871     0.849      0.866
Jeffrey    0.647     0.633      0.608

29 Results
Correlation Coefficient   Feature
0.1175                    Equivalent Diameter
0.1085                    Energy (Haralick)
0.0823                    Gabor Mean 135_05
0.0647                    Convex Area
0.0467                    Gabor STD 135_04
0.0322                    Min Intensity BG
0.0295                    Markov 4
0.0280                    Variance (Haralick)
0.0265                    Gabor STD 45_05
0.0238                    SD Intensity
R² = 0.871

30 Results

31 Results

32 Results

33 A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety

34 Conclusions
Preliminary Issues
The ANN is also not yet sufficient to predict semantic similarity from content
–Best correlation 0.438
–Malignancy correlation 0.521
–Jeffrey divergence performed better, unlike in the linear model
A semantic gap still exists

35 Conclusions
Our linear model applies to a specific type of nodule
–Characteristics: High malignancy, high texture, low lobulation, and low spiculation
–Features: Larger diameter, greater intensity
Linear models are not sufficient for determination of similarities
–R² of 0.871 with chosen nodules

36 Future Work
Reduce variability among radiologists
–Use only nodules with radiologists’ agreement
Find best combination of content features
–64 may be too many
–Currently only using 2D

37 Future Work
Different semantic distance measures
–Some ratings are ordinal; Jeffrey divergence is for categorical data
Different methods of machine learning
–Incorporate radiologists’ feedback into training
–Ensemble of classifiers

38 Thanks for Listening
Any Questions?

