NSF REU Program in Medical Informatics 1 D. Raicu, 1 J. Furst, 2 D. Channin, 3 S. Armato, and 3 K. Suzuki 1 DePaul University, 2 Northwestern University,

Presentation transcript:

NSF REU Program in Medical Informatics
D. Raicu (1), J. Furst (1), D. Channin (2), S. Armato (3), and K. Suzuki (3)
(1) DePaul University, (2) Northwestern University, and (3) University of Chicago

REU Data Overview

- Goal: continue promoting interdisciplinary studies at the frontier between information technology and medicine to undergraduate students, especially students from groups historically underrepresented in the exact sciences
- Duration: 10 weeks over the summer
- Example Teaching:
  - Interdisciplinary tutorials: image processing, machine learning
  - Technology tools tutorials: MatLab, SPSS
  - Presentations by mentors about projects
- Example Activities:
  - Follow-on activities: bi-weekly group meetings, presentations to the entire MedIX group, final reports (in conference formats), seminars to support student publication
  - Special events: "Day in the life of a PhD student", "Developing a research career", "Women in science", tours of medical facilities, etc.
- Unique Site Aspect: multi-institution and multi-disciplinary, at the frontier between computer science and medicine

Outcomes ( )

- 88% of students had at least one research publication
- Over 23 publications (1 journal paper, 15 conference papers, 8 extended abstracts)
- 3 honor theses and senior projects, 4 graduate fellowships, and 1 CRA honorable mention for outstanding undergraduate research

Statistics ( )

- Student demographics: 8 per year
- Female: 46%; first-generation college: 15%; from outside the home institutions: 73%
- Prior research experience: had given a visual (poster) research presentation (31%) or an oral research presentation (27%), had (co-)authored a publication in an academic journal (12%), or had been involved in a research project in the previous two years (42%)
- Total number of faculty mentors: 4
- Years of operation: 2005 to 2010
- Example research topics: see on the left side

Introduction

This work investigates ways to predict the results of a semantic-based image retrieval system by using solely content-based image features.
We extend our previous work [1] by studying the relationships between the two types of retrieval, content-based and semantic-based, with the final goal of integrating them into a system that takes advantage of both retrieval approaches. Our results on the Lung Image Database Consortium (LIDC) dataset show that a substantial number of nodules identified as similar based on image features are also identified as similar based on semantic characteristics. Furthermore, by integrating the two types of features, the similarity retrieval improves with respect to certain nodule characteristics.

Methodology

- Computation to best represent semantic-based similarity values using only content-based features.
- The goal is to find similar nodules to make a better diagnosis of the query.
- Content-based image retrieval is the goal, as that would involve little human interaction on very large data sets.
- The 149 CT scans, one of each nodule, are from the Lung Imaging Database Consortium (LIDC).
- Results greatly improve the usefulness of a content-based image retrieval system.
- Up to four radiologists rated the nodules on 9 distinct features. Only 7 features varied enough to incorporate; each is rated on a scale of 1 to 5.
- The radiologist compares similar nodules to aid in the diagnosis. Often, comparing similar nodules can lead to a more certain diagnosis. [3]

Figure 1: Methodology

Image Data

The LIDC contains complete thoracic CT scans for 85 patients with lesions. Nodules with a diameter larger than three millimeters were rated by a panel of four radiologists. [2] They rated 9 characteristics of the masses that they considered nodules.
Seven of those characteristics, each rated on a scale of one to five, are useful to our analysis: lobulation, malignancy, margin, sphericity, spiculation, subtlety, and texture.

For each image, we calculated 64 different content-based features [1]:

- Shape features: circularity, roughness, elongation, compactness, eccentricity, solidity, extent, and standard deviation of radial distance
- Size features: area, convex area, perimeter, convex perimeter, equivalent diameter, major axis length, and minor axis length
- Gray-level intensity features: minimum, maximum, mean, standard deviation, and difference
- Texture features based on co-occurrence matrices, Gabor filters, and Markov random fields

Content-based versus Semantic-based Similarity Retrieval: A LIDC Case Study
Sarah Jabon (a), Jacob Furst (b), Daniela Raicu (b)
(a) Rose-Hulman Institute of Technology, Terre Haute, IN 47803
(b) Intelligent Multimedia Processing Laboratory, School of Computer Science, Telecommunications, and Information Systems, DePaul University, Chicago, IL, USA

Analysis

Using k Number of Matches

The number of nodules that had matches was relatively consistent across all image features, but slightly higher for Gabor and Markov. No combination of image features had more than 10 matches out of the twenty most similar. Figure 7 is a scatter plot of the content-based similarity value versus the semantic-based similarity value.
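The match count used in this analysis (for each query nodule, the number of nodules that appear in both the k most similar nodules by semantic ratings and the k most similar by image features) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `top_k` and `match_counts` are hypothetical, the 4-nodule matrices are toy data, and the sketch assumes that similarity values behave like dissimilarities, with smaller meaning more similar.

```python
def top_k(dissimilarity_row, query, k):
    """Indices of the k nodules most similar to `query`, excluding the query itself.

    Assumption: smaller values mean more similar.
    """
    candidates = [j for j in range(len(dissimilarity_row)) if j != query]
    candidates.sort(key=lambda j: dissimilarity_row[j])
    return candidates[:k]


def match_counts(semantic, content, k):
    """Round robin: each nodule is the query once; count nodules in both top-k lists."""
    counts = []
    for query in range(len(semantic)):
        semantic_top = set(top_k(semantic[query], query, k))
        content_top = set(top_k(content[query], query, k))
        counts.append(len(semantic_top & content_top))
    return counts


# Toy symmetric dissimilarity matrices for 4 nodules (illustrative values only).
semantic = [
    [0.0, 0.1, 0.2, 0.3],
    [0.1, 0.0, 0.25, 0.15],
    [0.2, 0.25, 0.0, 0.05],
    [0.3, 0.15, 0.05, 0.0],
]
content = [
    [0.0, 0.9, 0.2, 0.4],
    [0.9, 0.0, 0.5, 0.1],
    [0.2, 0.5, 0.0, 0.3],
    [0.4, 0.1, 0.3, 0.0],
]

print(match_counts(semantic, content, k=2))  # [1, 1, 2, 2]
```

With 149 nodules and k = 20, the same loop would produce one match count per query nodule, which could then be tallied per feature set.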
References

[1] Lam, M., Disney, T., Pham, M., Raicu, D., Furst, J., "Content-Based Image Retrieval for Pulmonary Computed Tomography Nodule Images", SPIE Medical Imaging Conference, San Diego, CA, February.
[2] The National Cancer Institute, "Lung Imaging Database Consortium (LIDC)".
[3] Li, Q., Li, F., Shiraishi, J., Katsuragawa, S., Sone, S., Doi, K., "Investigation of New Psychophysical Measures for Evaluation of Similar Images on Thoracic Computed Tomography for Distinction between Benign and Malignant Nodules", Medical Physics 30.
[4] Han, J., Kamber, M., [Data Mining: Concepts and Techniques], London: Academic Press.

Similarity Comparisons

To assess the correlation between the two similarity measures, we used a round robin approach: we extracted one nodule as a query and compared it to the remaining 148 nodules. We took the k most similar nodules from each query's semantic-based similarity ordered list and content-based similarity ordered list and counted how many nodules were common to both lists.

Figure 5 gives an example with nodule 117 as the query nodule, listing the most similar nodules with their attributes. Notice that the semantic similarity values have a much smaller range, from 0 to about 0.3, whereas the content-based similarities range from 0 to 1. Most of the semantic features are very similar. A ranking of i signifies that the nodule was the i-th most similar nodule in the list of similar nodules based on the appropriate feature set.

Conclusions

Our preliminary results show that a substantial number of nodules identified as similar based on image features are also identified as similar based on semantic characteristics; therefore, the image features capture properties that radiologists look at when interpreting lung nodules. There are many similarity metrics that can be used to try to correlate the two retrieval systems.
We found the Euclidean distance to be better for the content-based features and the cosine similarity measure to be best for the semantic-based characteristics. In our future work, we will try principal component analysis and linear regression on the data. Further research is necessary to investigate the correlations between the two types of features and to integrate them in one retrieval system that will be of clinical use.

[Figure 2: Sample CT Scan with Four Radiologists' Ratings. Table columns: Rad., Lob., Mal., Marg., Spher., Spic., Subt., Text. Rows: radiologists A, B, C, D, and Summarized.]

Calculating Similarity

At the end of the feature extraction process, each nodule is represented by a vector of the form [c1, ..., c7, f1, ..., f64], where c stands for a semantic concept and f for an image feature. Similarity is calculated between a query nodule (Q) and a database nodule (N).

Semantic-Based Features

The semantic-based similarity values are calculated with the cosine formula. Figure 3 is a histogram of the semantic-based similarity values for all 11,026 nodule pairs. The cosine similarity measure minimized the ceiling effect: although the values do not follow a perfect normal curve, the ceiling effect was drastically reduced compared with applying a simple distance to the seven characteristics.

Figure 3: Histogram of Semantic-Based Similarity

Content-Based Features

The content-based similarity values are calculated with the Euclidean distance, and then min-max normalization is applied. [4] Figure 4 is a histogram of the content-based similarity values for all 11,026 nodule pairs.

Figure 4: Histogram of Content-Based Similarity

[Figure 5: Example of Image Retrieval Results. Query nodule: 117. Table columns: No., Image, Semantic-Based (Ranking, Similarity Value), Content-Based (Ranking, Similarity Value), Semantic Feature Vector (Lob, Mal, Mar, Sph, Spic, Sub, Tex).]

Applying a Threshold

We analyzed the difference in the scales of similarity by counting how many matches there were based on thresholds. The graph captioned "Match Based on All Features and Thresholds" shows two different thresholds of similarity, 0.02 and [...]. These thresholds are applied to the semantic similarity values.
There were many more matches within these thresholds.

[Figure 6: Match Count in 20 Most Similar Nodules. Table columns: Matches, Gabor, Markov, Co-Occurrence, "Gabor, Markov, and Co-Occurrence", All Features.]

[Figure 7: Content-Based Similarity vs. Semantic-Based Similarity]

[Figure 6: Match Based on All Features and Thresholds]
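The similarity calculations and the threshold count described in this poster can be sketched as below. Several points are assumptions rather than details from the poster: all function names are hypothetical, the cosine measure is converted to a dissimilarity as 1 - cos (a common choice that happens to give small values for rating vectors on a 1 to 5 scale), min-max normalization is applied over all nodule pairs, and a "match within a threshold" is read as a nodule among the k nearest by image features whose semantic value is at most the threshold. The data are toy values.

```python
import math


def euclidean_distance(q, n):
    """Euclidean distance between query vector q and database vector n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, n)))


def cosine_dissimilarity(q, n):
    """1 - cosine similarity (assumed conversion; 0 means same direction)."""
    dot = sum(a * b for a, b in zip(q, n))
    norms = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in n))
    return 1.0 - dot / norms


def pairwise(vectors, measure):
    """Square matrix of `measure` applied to every pair of vectors."""
    return [[measure(u, v) for v in vectors] for u in vectors]


def min_max_normalize(matrix):
    """Rescale off-diagonal entries to [0, 1]; assumes they are not all equal."""
    size = len(matrix)
    values = [matrix[i][j] for i in range(size) for j in range(size) if i != j]
    low, high = min(values), max(values)
    return [[0.0 if i == j else (matrix[i][j] - low) / (high - low)
             for j in range(size)] for i in range(size)]


def threshold_matches(semantic, content, query, k, threshold):
    """Count the k nearest nodules by content whose semantic value is within threshold."""
    others = [j for j in range(len(content)) if j != query]
    nearest = sorted(others, key=lambda j: content[query][j])[:k]
    return sum(1 for j in nearest if semantic[query][j] <= threshold)


# Toy data: seven semantic ratings (1 to 5) and two image features per nodule.
ratings = [
    [1, 5, 5, 5, 1, 5, 5],
    [1, 5, 4, 5, 1, 5, 5],
    [3, 2, 3, 3, 4, 2, 1],
]
features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

semantic = pairwise(ratings, cosine_dissimilarity)
content = min_max_normalize(pairwise(features, euclidean_distance))

print(threshold_matches(semantic, content, query=0, k=2, threshold=0.02))  # 1
```

In this toy example only nodule 1 is both close to nodule 0 in image features and within 0.02 of it in semantic value, so the count is 1.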