NSF REU Program in Medical Informatics
D. Raicu (1), J. Furst (1), D. Channin (2), S. Armato (3), and K. Suzuki (3)
(1) DePaul University, (2) Northwestern University, and (3) University of Chicago

REU Data Overview

Goal: continue promoting interdisciplinary studies at the frontier between information technology and medicine to undergraduate students, especially students from groups historically underrepresented in the exact sciences.

Duration: 10 weeks over the summer

Example Teaching:
- Interdisciplinary tutorials: image processing, machine learning
- Technology tools tutorials: MATLAB, SPSS
- Presentations by mentors about projects

Example Activities:
- Follow-on activities: bi-weekly group meetings, presentations to the entire MedIX group, final reports (in conference formats), seminars to support student publication
- Special events: "Day in the life of a PhD student", "Developing a research career", "Women in science", tours of medical facilities, etc.

Unique Site Aspect: multi-institution and multi-disciplinary site at the frontier between computer science and medicine

Outcomes (2005-2007):
- 88% of students had at least one research publication
- Over 23 publications (1 journal paper, 15 conference papers, 8 extended abstracts)
- 3 honors theses and senior projects, 4 graduate fellowships, and 1 CRA honorable mention for outstanding undergraduate research

Statistics (2005-2007):
- Student demographics: 8 students per year; female: 46%; first-generation college: 15%; from outside the home institutions: 73%
- Prior research experience: had previously given a visual (poster) research presentation (31%) or an oral research presentation (27%), had (co-)authored a publication in an academic journal (12%), or had been involved in any research project in the previous two years (42%)
- Total number of faculty mentors: 4
- Years of operation: 2005 to 2010
- Example research topics: see on the left side

Content-based versus Semantic-based Similarity Retrieval: A LIDC Case Study
Sarah Jabon (a), Jacob Furst (b), Daniela Raicu (b)
(a) Rose-Hulman Institute of Technology, Terre Haute, IN 47803
(b) Intelligent Multimedia Processing Laboratory, School of Computer Science, Telecommunications, and Information Systems, DePaul University, Chicago, IL, USA, 60604

Introduction

This work investigates ways to predict the results of a semantic-based image retrieval system using only content-based image features. We extend our previous work [1] by studying the relationships between the two types of retrieval, content-based and semantic-based, with the final goal of integrating them into a system that takes advantage of both retrieval approaches. Our results on the Lung Image Database Consortium (LIDC) dataset show that a substantial number of nodules identified as similar based on image features are also identified as similar based on semantic characteristics. Furthermore, by integrating the two types of features, the similarity retrieval improves with respect to certain nodule characteristics.

Methodology

- Computation to best represent semantic-based similarity values using only content-based features.
- The goal is to find nodules similar to the query in order to make a better diagnosis of the query.
- Content-based image retrieval is the goal, as it would involve little human interaction on very large data sets.
- The 149 CT scans, one for each nodule, are from the LIDC.
- Results greatly improve the usefulness of a content-based image retrieval system.
- Up to four radiologists rated the nodules on 9 distinct features. Only 7 features varied enough to incorporate; these are rated on a scale of 1 to 5.
- The radiologist compares similar nodules to aid in the diagnosis. Often, comparing similar nodules can lead to a more certain diagnosis [3].

Figure 1: Methodology

Image Data

The LIDC contains complete thoracic CT scans for 85 patients with lesions. Nodules with a diameter larger than three millimeters were rated by a panel of four radiologists [2]. They rated 9 characteristics of the masses that they considered nodules. Seven of those characteristics, each rated on a scale of one to five, are useful to our analysis: lobulation, malignancy, margin, sphericity, spiculation, subtlety, and texture.

Figure 2: Sample CT Scan with Four Radiologists' Ratings
Columns: Rad., Lob., Mal., Marg., Spher., Spic., Subt., Text. Rows: radiologists A to D, plus a summarized row. Several cells are missing from the source, so the values are listed as they appear:
A: 3 4 2 4 3 4
B: 4 3 4 3 5
C: 4 2 3 4 3 4 5
D and Summarized: 4 3 2 4 3 4 3 4 3 5

For each image, we calculated 64 different content-based features [1]:
- Shape features: circularity, roughness, elongation, compactness, eccentricity, solidity, extent, and standard deviation of radial distance
- Size features: area, convex area, perimeter, convex perimeter, equivalent diameter, major axis length, and minor axis length
- Gray-level intensity features: minimum, maximum, mean, standard deviation, and difference
- Texture features based on co-occurrence matrices, Gabor filters, and Markov random fields

At the end of the feature extraction process, each nodule is represented by a vector in which c stands for a semantic concept and f for an image feature.

Calculating Similarity

Semantic-Based Features

The cosine similarity measure minimized the ceiling effect. The similarity value between a query nodule (Q) and a database nodule (N) is calculated with the cosine formula. Figure 3 is a histogram of the semantic-based similarity values for all 11,026 nodule pairs. Although the values do not follow a perfect normal curve, the ceiling effect is drastically reduced compared with a simple distance on the seven characteristics.

Figure 3: Histogram of Semantic-Based Similarity

Content-Based Features

Figure 4 is a histogram of the content-based similarity values for all 11,026 nodule pairs. The similarity values are calculated with the Euclidean distance, and then min-max normalization is applied [4].

Figure 4: Histogram of Content-Based Similarity

Similarity Comparisons

To assess the correlation between the two similarity measures, we used a round-robin approach in which we extracted one nodule as a query and compared it to the remaining 148 nodules. We took the k most similar nodules from each query's semantic-based ordered list and content-based ordered list and counted how many nodules were common to both lists.

Figure 5 gives an example with nodule 117 as the query, listing its most similar nodules with their attributes. A ranking of i signifies that the nodule was the i-th most similar nodule in the list based on the corresponding feature set. Notice that the semantic similarity values have a much smaller range, from 0 to about 0.3, whereas the content-based similarities range from 0 to 1. Most of the semantic features are very similar.

Figure 5: Example of Image Retrieval Results
The semantic ratings are in the order Lob, Mal, Mar, Sph, Spic, Sub, Tex; several rows show fewer than seven values in the source and are listed as they appear. The nodule images are not reproduced.

No.   Semantic rank   Semantic value   Content rank   Content value   Semantic ratings
117   -               0                -              0               2 3 5 2 4 5
104   2               0.004452         5              0.415918        2 3 4 2 3 4
126   3               0.004596         6              0.421249        2 3 5 1 4 5
98    6               0.006817         17             0.505317        2 3 4 5 2 3 5
28    8               0.009119         16             0.504996        1 3 5 1 4 5
27    11              0.012752         2              0.380517        1 3 5 1 3 5
137   14              0.013072         9              0.430289        1 3 4 5 1 3 4
127   16              0.013606         11             0.474226        2 4 5 4 3 4 5
119   17              0.015268         20             0.538589        3 4 2 3 5
90    20              0.016383         7              0.425751        1 2 3 4 1 2 4

Analysis

Using k Number of Matches

The number of nodules that had 2 to 5 matches was relatively consistent across all image feature sets, but slightly higher for Gabor and Markov. No combination of image features produced more than 10 matches out of the twenty most similar nodules. Figure 7 is a scatter plot of the content-based similarity value versus the semantic-based similarity value.

Figure 6: Match Count in 20 Most Similar Nodules (number of query nodules, out of 149, in each range)

Matches   Gabor   Markov   Co-Occurrence   Gabor, Markov, and Co-Occurrence   All Features
6-10      24      18       31              36                                 43
2-5       107     104      94              98                                 93
0-1       18      27       24              15                                 13

Figure 7: Content-Based Similarity vs. Semantic-Based Similarity

Applying a Threshold

We analyzed the difference in the scales of similarity by counting how many matches there were when a threshold, rather than a fixed k, was applied to the semantic similarity values. Figure 8 graphs the results for two thresholds, 0.02 and 0.04. There were many more matches within these thresholds.

Figure 8: Match Based on All Features and Thresholds

Conclusions

Our preliminary results show that a substantial number of nodules identified as similar based on image features are also identified as similar based on semantic characteristics; therefore, the image features capture properties that radiologists look at when interpreting lung nodules. Many similarity metrics can be used to try to correlate the two retrieval systems. We found the Euclidean distance to be best for the content-based features and the cosine similarity measure to be best for the semantic-based characteristics. In future work, we will try principal component analysis and linear regression on the data. Further research is necessary to investigate the correlations between the two types of features and to integrate them into one retrieval system that will be of clinical use.

References

[1] Lam, M., Disney, T., Pham, M., Raicu, D., Furst, J., "Content-Based Image Retrieval for Pulmonary Computed Tomography Nodule Images", SPIE Medical Imaging Conference, San Diego, CA, February 2007.
[2] The National Cancer Institute, "Lung Imaging Database Consortium (LIDC)", http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC.
[3] Li, Q., Li, F., Shiraishi, J., Katsuragawa, S., Sone, S., Doi, K., "Investigation of New Psychophysical Measures for Evaluation of Similar Images on Thoracic Computed Tomography for Distinction between Benign and Malignant Nodules", Medical Physics 30:2584-2593, 2003.
[4] Han, J., Kamber, M., [Data Mining: Concepts and Techniques], London: Academic Press, 2001.
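
The round-robin comparison described under Similarity Comparisons can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, and the toy vectors are made up. The semantic value is taken to be 1 minus the cosine of the angle between two rating vectors, which is an assumption: the poster names only "the cosine formula", and the example table gives the query itself a value of 0. Min-max normalization is included as a helper for the content-based distances; because it is monotone, it does not change the rankings.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance, used here for the content-based features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_dissimilarity(a: Vector, b: Vector) -> float:
    """1 - cos(angle between a and b), so 0 means identical direction.

    Assumption: the poster's exact formula is not preserved. Ratings lie
    in 1..5, so the norms below are never zero for real rating vectors.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ranked_neighbors(
    vectors: Sequence[Vector],
    query: int,
    dist: Callable[[Vector, Vector], float],
) -> List[int]:
    """Indices of all other nodules, most similar to the query first."""
    others = [i for i in range(len(vectors)) if i != query]
    return sorted(others, key=lambda i: dist(vectors[query], vectors[i]))


def match_counts(
    semantic: Sequence[Vector], content: Sequence[Vector], k: int
) -> List[int]:
    """Round robin: for each query, count nodules in both top-k lists."""
    counts = []
    for q in range(len(semantic)):
        sem_top = set(ranked_neighbors(semantic, q, cosine_dissimilarity)[:k])
        con_top = set(ranked_neighbors(content, q, euclidean_distance)[:k])
        counts.append(len(sem_top & con_top))
    return counts


if __name__ == "__main__":
    # Made-up toy data: 4 nodules, 3 ratings and 2 image features each.
    semantic = [[2, 3, 5], [2, 3, 4], [5, 1, 1], [4, 1, 2]]
    content = [[10, 20], [15, 25], [90, 80], [30, 30]]
    print(match_counts(semantic, content, k=1))
```

With the 149 LIDC nodules and k = 20, binning the per-query counts returned by `match_counts` into 0-1, 2-5, and 6-10 would give a tally of the kind reported in the match count table.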