Applications of Machine Learning to Medical Imaging

Applications of Machine Learning to Medical Imaging Daniela S. Raicu, PhD Associate Professor, CDM DePaul University Email: draicu@cs.depaul.edu Lab URL: http://facweb.cs.depaul.edu/research/vc/

About me… BS in Mathematics from University of Bucharest, Romania MS in CS from Wayne State University, Michigan PhD in CS from Oakland University, Michigan

My dissertation work Research areas: Data Mining & Computer Vision Dissertation topic: Content-based image retrieval Research hypothesis: “A picture is worth thousands of words…” “There is enough information in the image content to perform image retrieval whose similarity results correspond to the human perceived similarity”.

My dissertation work (cont) Research hypothesis: “There is enough information in the image content to perform image retrieval whose similarity results correspond to the human perceived similarity”. Methodology: 1) extract color image features, 2) define color-based similarity, 3) cluster images based on color, 4) retrieve similar images Output: Color-based CBIR for general purpose image datasets Proof of hypothesis: Google similar images: http://similar-images.googlelabs.com/

Towards an academic career Assistant Professor at DePaul, 2002-2008 Associate Professor, 2008- Present Teaching areas & research interests: data analysis, data mining, image processing, computer vision & medical informatics Co-director of the Intelligent Multimedia Processing, Medical Informatics lab & the NSF REU Program in Medical Informatics

Outline Part I: Introduction to Medical Informatics Clinical Decision Making Imaging Modalities and Medical Imaging Basic Concepts in Image Processing Part II: Advances in Medical Imaging Research Computer-Aided Diagnosis Computer-Aided Diagnostic Characterization Texture-based Classification Content-based Image Retrieval

Medical informatics research What is medical informatics? Medical informatics is the application of computers, communications and information technology and systems to all fields of medicine - medical care - medical education - medical research. MF Collen, MEDINFO '80, Tokyo

What is medical informatics? Medical informatics is the branch of science concerned with the use of computers and communication technology to acquire, store, analyze, communicate, and display medical information and knowledge to facilitate understanding and improve the accuracy, timeliness, and reliability of decision-making. Warner, Sorenson and Bouhaddou, Knowledge Engineering in Health Informatics, 1997

Clinical decision making Making sound clinical decisions requires: – right information, right time, right format Clinicians face a surplus of information – ambiguous, incomplete, or poorly organized Rising tide of information – Expanding knowledge sources 40K new biomedical articles per month Publicly accessible online health info Hundreds of pictures per scan for one patient

Clinical decision making: What is the problem? Man is an imperfect data processor – We are sensitive to the quantity and organization of information Army officers and pilots commit ‘fatal errors’ when given too many, too few, or poorly organized data The same is true for clinicians who ‘watch’ for events Clinicians are particularly susceptible to errors of omission

Clinical decision making: What is the problem? Humans are “non-perfectable” data processors - Better performance requires more time to process - Irony • Clinicians increasingly face productivity expectations • Clinicians face increasing administrative tasks

Subdomains of medical informatics (by Wikipedia) imaging informatics clinical informatics nursing informatics consumer health informatics public health informatics dental informatics clinical research informatics bioinformatics pharmacy informatics

What is medical imaging (MI)? The study of medical imaging is concerned with the interaction of all forms of radiation with tissue and the development of appropriate technology to extract clinically useful information (usually displayed in an image format) from observation of this technology. Sources of Images: Structural/anatomical information (CT, MRI, US) - within each elemental volume, tissue-differentiating properties are measured. Information about function (PET, SPECT, fMRI).

Examples of medical images

The imaging “chain” Signal acquisition (“raw data”) → Processing (reconstruction, filtering) → Analysis → Quantitative output

Image analysis: Turning an image into data User extracted qualitative features User extracted quantitative features Semi automated Automated Exam Level: Feature 1 Feature 2 Feature 3 . Finding: Feature 1

Major advances in medical imaging Image Segmentation Image Classification Computer-Aided Diagnosis Systems Computer-Aided Diagnostic Characterization Content-based Image Retrieval Image Annotation These major advances can play a major role in early detection, diagnosis, and computerized treatment planning in cancer radiation therapy.

Computer-Aided Diagnosis Computer-Aided Diagnosis (CAD) is diagnosis made by a radiologist when the output of computerized image analysis methods has been incorporated into his or her medical decision-making process. CAD may be interpreted broadly to incorporate both the abnormality detection task and the classification task (the likelihood that the abnormality represents a malignancy).
Classification, comparison, or analysis of images is almost always performed in terms of a set of features extracted from the images. Usually this is necessary for one of the following reasons:
- Reduction of dimensionality: an 8-bit-per-pixel image of size 256x256 pixels has 256^65,536 ≈ 10^157,826 possible realisations. Clearly, it is worthwhile to express structure within and similarities between images in ways that depend on fewer, higher-level representations of their pixels and relationships. It will be important to show that the reduction nevertheless preserves information important to the task.
- Incorporation of cues from human perception: much is known about the effects of basic stimuli on the visual system. In many situations, we have considerable insight into how humans analyse images (essential in the training of radiologists and photo interpreters). Use of the right kinds of features would allow for the incorporation of that experience into automated analysis.
- Transcending the limits of human perception: though we can very easily understand many kinds of images, there are properties of images (e.g. some textures) that we cannot perceive visually, but which could be useful in characterising them. Features can be constructed from various manipulations of the images that make those properties evident.
- Need for invariance: the meaning and the utility of an image are often unchanged when the image is perturbed in various ways. Changes in one or more of scale, location, brightness and orientation, for example, and the presence of noise, artefacts and intrinsic variation are image alterations to which well-designed features are wholly or partially invariant.
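
The dimensionality figure quoted above can be checked with a few lines of arithmetic. This is only a sketch; the function name realization_exponent is made up for illustration and is not part of the original slides.

```python
import math

def realization_exponent(width, height, bits_per_pixel):
    """Return log10 of the number of distinct images of the given size."""
    n_pixels = width * height          # 256 * 256 = 65,536 pixels
    levels = 2 ** bits_per_pixel       # 8 bits -> 256 grey levels per pixel
    # levels ** n_pixels is too large to print, so work with its logarithm.
    return n_pixels * math.log10(levels)

exponent = realization_exponent(256, 256, 8)
print(f"256^{256 * 256:,} is roughly 10^{int(exponent):,}")
```

Working in logarithms avoids building the full integer, which has more than 150,000 decimal digits.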

Motivation for CAD systems The amount of image data acquired during a CT scan is becoming overwhelming for human vision, and the overload of image data for interpretation may result in oversight errors. Computer-Aided Diagnosis for: Breast Cancer. Lung Cancer: a thoracic CT scan generates about 240 section images for radiologists to interpret. Colon Cancer: CT colonography (virtual colonoscopy) is being examined as a potential screening device (400-700 images).

CAD for Breast Cancer A mammogram is an X-ray of breast tissue used as a screening tool searching for cancer when there are no symptoms of anything being wrong. A mammogram detects lumps, changes in breast tissue or calcifications when they're too small to be found in a physical exam. Abnormal tissue shows up a dense white on mammograms. The left scan shows a normal breast while the right one shows malignant calcifications.

CAD for Lung Cancer Identification of lung nodules in thoracic CT scans; the identification is complicated by the blood vessels. Once a nodule has been detected, it may be quantitatively analyzed as follows: the classification of the nodule as benign or malignant; the evaluation of the temporal change in the nodule size.

CAD for Colon Cancer Virtual colonoscopy (CT colonography) is a minimally invasive imaging technique that combines volumetrically acquired helical CT data with advanced graphical software to create two and three-dimensional views of the colon. Three-dimensional endoluminal view of the colon showing the appearance of normal haustral folds and a small rounded polyp.

Role of Image Analysis & Machine Learning for CAD An overall scheme for computer-aided diagnosis systems

SoC Medical imaging research projects 1. Computer-aided characterization for lung nodules Goal: establish the link between computer-based image features of lung nodules in CT scans and visual descriptors defined by human experts (semantic concepts) for automatic interpretation of lung nodules Example: This lung nodule has a “solid” texture and has a “sharp” margin

Why computer-aided characterization? Ratings of the same nodule by four radiologists:
Reader 1: Lobulation=4, Malignancy=5 “highly suspicious”, Sphericity=2
Reader 2: Lobulation=1 “marked”, Malignancy=5 “highly suspicious”, Sphericity=4
Reader 3: Lobulation=2, Malignancy=5 “highly suspicious”, Sphericity=5 “round”
Reader 4: Lobulation=5 “none”, Malignancy=5 “highly suspicious”, Sphericity=3 “ovoid”
Show how outlines can also be different. Explain that, for the same nodule, the slice with the biggest nodule can be different for different radiologists. Start talking about calculating image features of a nodule, then go to the next slide. Ratings and boundaries across radiologists are different!
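
The disagreement on this slide can be made concrete with a small sketch. It assumes the ratings are read as four consecutive (Lobulation, Malignancy, Sphericity) groups, one per reader; the helper name rating_spread is hypothetical.

```python
# Ratings for one nodule, listed in reader order (Reader 1 .. Reader 4).
ratings = {
    "Lobulation": [4, 1, 2, 5],
    "Malignancy": [5, 5, 5, 5],
    "Sphericity": [2, 4, 5, 3],
}

def rating_spread(scores):
    """Difference between the highest and lowest rating given by the readers."""
    return max(scores) - min(scores)

for characteristic, scores in ratings.items():
    print(f"{characteristic}: ratings {scores}, spread {rating_spread(scores)}")
```

A spread of 0 means the readers agree; on a 1 to 5 scale a spread of 4 is the largest possible disagreement.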

Computer-aided characterization Research Hypothesis “The working hypothesis is that certain radiologists’ assessments can be mapped to the most important low-level image features”. Methodology new semi-supervised probabilistic learning approaches that will deal with both the inter-observer variability and the small set of labeled data (annotated lung nodules). Our proposed learning approach will be based on an ensemble of classifiers (instead of a single classifier as with most CAD systems) built to emulate the LIDC ensemble (panel) of radiologists.

Computer-aided characterization (cont.) Expected outcome: an optimal set of quantitative diagnostic features linked to the visual descriptors (semantic concepts). Significance: The derived mappings can serve to show the computer interpretation of the corresponding radiologist rating in terms of a set of standard and objective image features, automatically annotate new images, and augment the lung nodule retrieval results with their probabilistic diagnostic interpretations.

Computer-aided characterization Preliminary results NIH Lung Image Database Consortium (LIDC): 149 distinct nodules from about 85 cases/patients; four radiologists marked the nodules using 9 semantic characteristics on a scale from 1 to 5 except for calcification (1 to 6) and internal structure (1 to 4)

Computer-aided characterization LIDC high-level concepts & ratings
Characteristic: Possible scores
Calcification: 1. Popcorn, 2. Laminated, 3. Solid, 4. Non-central, 5. Central, 6. Absent
Internal structure: 1. Soft Tissue, 2. Fluid, 3. Fat, 4. Air
Lobulation: 1. Marked … 5. None
Malignancy: 1. Highly Unlikely, 2. Moderately Unlikely, 3. Indeterminate, 4. Moderately Suspicious, 5. Highly Suspicious
Margin: 1. Poorly Defined … 5. Sharp
Sphericity: 1. Linear, 3. Ovoid, 5. Round
Spiculation: 1. Marked … 5. None
Subtlety: 1. Extremely Subtle, 2. Moderately Subtle, 3. Fairly Subtle, 4. Moderately Obvious, 5. Obvious
Texture: 1. Non-Solid, 3. Part Solid/(Mixed), 5. Solid
Talk more about interpretation (and interpretation variability) of separate semantic characteristics and move to the next two slides to show a specific example.

Computer-aided characterization Low-level image features
Shape features: Circularity, Roughness, Elongation, Compactness, Eccentricity, Solidity, Extent, RadialDistanceSD
Size features: Area, ConvexArea, Perimeter, ConvexPerimeter, EquivDiameter, MajorAxisLength, MinorAxisLength
Intensity features: MinIntensity, MaxIntensity, SDIntensity, MinIntensityBG, MaxIntensityBG, MeanIntensityBG, SDIntensityBG, IntensityDifference
Texture features: 11 Haralick features calculated from co-occurrence matrices, 24 Gabor features, 5 Markov Random Field features
Describe the 4 types of features used in the study. Explain how features are mapped to the semantic characteristics. Describe the vector representation of a nodule after the mapping is done, {c1…c7, f1…f64}, as input for the automatic interpretation algorithm.
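
To give a feel for the feature names above, here is a minimal sketch of two of them and of the combined vector {c1…c7, f1…f64}. The formulas are common textbook definitions and are an assumption; the slides do not state the exact definitions used in the study, and the function names are illustrative only.

```python
import math

def circularity(area, perimeter):
    """4*pi*A / P^2: equals 1 for a perfect disc, smaller for irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2

def equiv_diameter(area):
    """Diameter of the circle whose area equals the region's area."""
    return math.sqrt(4.0 * area / math.pi)

def nodule_vector(semantic_ratings, image_features):
    """Concatenate the ratings c1..c7 and the image features f1..fn."""
    return list(semantic_ratings) + list(image_features)

# A disc of radius 2 (area 4*pi, perimeter 4*pi) as a sanity check.
area, perimeter = math.pi * 2 ** 2, 2 * math.pi * 2
features = [circularity(area, perimeter), equiv_diameter(area)]
print(nodule_vector([5, 4, 5, 3, 5, 4, 5], features))
```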

Computer-aided characterization Accuracy results
Columns: (A) decision trees; (B) add instances predicted with high confidence (60%); (C) add instances predicted with high confidence (60%) and instances with low margin (5%)
Characteristic: A / B / C
Lobulation: 27.44% / 81.00% / 69.66%
Malignancy: 42.22% / 96.31% / (not given)
Margin: 35.36% / 98.68% / 96.83%
Sphericity: 36.15% / 91.03% / 90.24%
Spiculation: (not given) / 63.06% / 58.84%
Subtlety: 38.79% / 93.14% / 92.88%
Texture: 53.56% / 97.10% / 97.36%
Average: 38.52% / 88.62% / 86.02%
Present the results. Show that both approaches improved the accuracy for all semantic characteristics in comparison with the decision trees. Mention that the differences in accuracy between the two approaches are not significant except for lobulation. Depending on how much time is left, talk about further work (what we are doing right now): either show and explain the next slide or list what we have tried to do.
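
The idea of adding instances predicted with high confidence can be sketched as a self-training loop. This is an assumption-laden illustration, not the study's method: a k-nearest-neighbour vote stands in for the decision trees and classifier ensemble, and the names knn_predict and self_train are hypothetical. Only the 60% confidence threshold comes from the slide.

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Return (label, confidence) from a vote of the k nearest labelled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

def self_train(labeled, unlabeled, threshold=0.6, k=3, max_rounds=10):
    """Repeatedly move confidently predicted points into the labelled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, confidence = knn_predict(labeled, x, k)
            if confidence >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:          # nothing new was added, so stop early
            break
        labeled.extend(confident)  # grow the training set for the next round
        pool = remaining
    return labeled, pool

labeled = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
           ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
grown, leftover = self_train(labeled, [(0.5, 0.5), (5.5, 5.5)])
print(len(grown), leftover)
```

Predictions within a round all use the labelled set as it stood at the start of that round, so the order of the unlabelled points does not affect the result.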

Computer-aided characterization Challenges Small number of training samples and large number of features “curse of dimensionality” problem Nodule size Variation in the nodules’ boundaries Different types of imaging acquisition parameters Clinical evaluation: observer performance studies require collaboration with medical schools or hospitals

SoC Medical imaging research projects - 2. Texture-based Pixel Classification - tissue segmentation - context-sensitive tools for radiology reporting Pixel Level Texture Extraction Pixel Level Classification Organ Segmentation

Texture-based Pixel Classification Texture feature extraction: consider the texture around the pixel of interest. Capture texture characteristics based on an estimate of the joint conditional probability of pixel-pair occurrences Pij(d,θ), where Pij denotes the normalized co-occurrence matrix specified by the displacement (d) and the angle (θ). Neighborhood of a pixel
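
A minimal sketch of the co-occurrence computation follows. It assumes a small integer-valued image stored as nested lists and expresses the displacement and angle as a pixel offset (dx, dy), so d = 1 and θ = 0° become dx=1, dy=0. The function name cooccurrence is illustrative, and the matrix here is the non-symmetric variant.

```python
def cooccurrence(image, dx, dy, levels):
    """Normalized co-occurrence matrix P[i][j] for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair (grey level here, grey level at the offset).
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Divide by the number of pairs so that the entries sum to 1.
    return [[n / total for n in row] for row in counts]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
P = cooccurrence(image, dx=1, dy=0, levels=4)
for row in P:
    print([round(p, 3) for p in row])
```

For pixel-level classification the same computation would be applied to a small neighbourhood window around each pixel rather than to the whole image.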

Haralick Texture Features
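
As an illustration, a few of the statistics that Haralick defined on a normalized co-occurrence matrix can be computed as follows. Which 11 features the study used is not listed here, so this selection (energy, entropy, contrast, inverse difference moment) is an assumption, and the function name haralick_subset is hypothetical.

```python
import math

def haralick_subset(P):
    """Compute four texture statistics from a normalized co-occurrence matrix."""
    energy = entropy = contrast = idm = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            energy += p * p                    # angular second moment
            contrast += (i - j) ** 2 * p       # weights distant grey levels
            idm += p / (1 + (i - j) ** 2)      # inverse difference moment
            if p > 0:
                entropy -= p * math.log(p)     # skip empty cells (log 0)
    return {"energy": energy, "entropy": entropy,
            "contrast": contrast, "idm": idm}

# A texture whose pixel pairs always share the same grey level.
uniform_pairs = [[0.5, 0.0],
                 [0.0, 0.5]]
print(haralick_subset(uniform_pairs))
```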

Examples of Texture Images Texture images: original image, energy and cluster tendency, respectively. M. Kalinin, D. S. Raicu, J. D. Furst, D. S. Channin, "A Classification Approach for Anatomical Regions Segmentation", The IEEE International Conference on Image Processing (ICIP), Genoa, Italy, September 11-14, 2005.

Texture Classification of Tissues in CT Chest/Abdomen Example of Liver Segmentation: (J.D. Furst, R. Susomboon, and D.S. Raicu, "Single Organ Segmentation Filters for Multiple Organ Segmentation", IEEE 2006 International Conference of the Engineering in Medicine and Biology Society (EMBS'06)) Original Image Initial Seed at 90% Split & Merge at 85% Split & Merge at 80% Region growing at 70% Region growing at 60% Segmentation Result
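
The region-growing steps named on this slide can be sketched as below. The sketch assumes the percentages are thresholds on a per-pixel probability of belonging to the organ, which the slide does not state explicitly; the probability map and the name region_grow are invented for illustration, and the split-and-merge stages are not modelled.

```python
from collections import deque

def region_grow(prob, seeds, threshold):
    """Grow a 4-connected region from the seeds over pixels with prob >= threshold."""
    rows, cols = len(prob), len(prob[0])
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and prob[nr][nc] >= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Hypothetical per-pixel organ probabilities.
prob = [[0.95, 0.75, 0.20],
        [0.72, 0.65, 0.10],
        [0.30, 0.62, 0.90]]
at_70 = region_grow(prob, [(0, 0)], 0.70)   # seed chosen where prob >= 0.90
at_60 = region_grow(prob, at_70, 0.60)      # relax the threshold and continue
print(sorted(at_70))
print(sorted(at_60))
```

Lowering the threshold in stages lets the region expand outward from confident pixels first, which is the pattern the 70% and 60% steps suggest.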

Classification models: challenges (a) Optimal selection of an adequate set of textural features is a challenge, especially with the limited data we often have to deal with in clinical problems. Consequently, the effectiveness of any classification system will always be conditional on two things: (i) how well the selected features describe the tissues (ii) how well the study group reflects the overall target patient population for the corresponding diagnosis

Classification models: challenges (b) how other type of information can be incorporated into the classification models: - metadata - image features from other imaging modalities (need of image fusion) (c) how stable and general the classification models are

Content-based medical image retrieval (CBMS) systems Definition of Content-based Image Retrieval: Content-based image retrieval is a technique for retrieving images on the basis of automatically derived image features such as texture and shape. Applications of Content-based Image Retrieval: Teaching Research Diagnosis PACS and Electronic Patient Records

Diagram of a CBIR system: Query Image → Feature Extraction → Image Features [D1, D2, …, Dn] → Similarity Retrieval against the Image Features [D1, D2, …, Dn] of the Image Database → Query Results → User Evaluation → Feedback Algorithm. http://viper.unige.ch/~muellerh/demoCLEFmed/index.php
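
The retrieval step of the diagram can be sketched in a few lines. This is a minimal illustration assuming the features have already been extracted into fixed-length vectors and that Euclidean distance is the similarity measure; the database contents and the name retrieve are hypothetical, and the feedback loop is left out.

```python
import math

def retrieve(query_features, database, top_k=3):
    """Return the names of the top_k database images closest to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: math.dist(query_features, item[1]))
    return [name for name, _ in ranked[:top_k]]

# Hypothetical feature vectors [D1, D2, D3] for four stored images.
database = {
    "nodule_a": [0.10, 0.80, 0.30],
    "nodule_b": [0.90, 0.20, 0.70],
    "nodule_c": [0.15, 0.75, 0.35],
    "nodule_d": [0.50, 0.50, 0.50],
}
print(retrieve([0.12, 0.78, 0.32], database, top_k=2))
```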

CBIR as a Diagnosis Aid An image retrieval system can help when the diagnosis depends strongly on direct visual properties of images in the context of evidence-based medicine or case-based reasoning.

CBIR as a Teaching Tool An image retrieval system will allow students/teachers to browse available data themselves in an easy and straightforward fashion by clicking on “show me similar images”. Advantages: - stimulate self-learning and a comparison of similar cases - find optimal cases for teaching Teaching files: Casimage: http://www.casimage.com myPACS: http://www.mypacs.net

CBIR as a Research Tool Image retrieval systems can be used: to complement text-based retrieval methods for visual knowledge management whereby the images and associated textual data can be analyzed together multimedia data mining can be applied to learn the unknown links between visual features and diagnosis or other patient information for quality control to find images that might have been misclassified

CBIR as a tool for lookup and reference in CT chest/abdomen Case Study: lung nodules retrieval. Lung Imaging Database Resource for Imaging Research http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC/page7 29 cases, 5,756 DICOM images/slices, 1,143 nodule images. 4 radiologists annotated the images using 9 nodule characteristics: calcification, internal structure, lobulation, malignancy, margin, sphericity, spiculation, subtlety, and texture. Goals: Retrieve nodules based on image features: Texture, Shape, and Size. Find the correlations between the image features and the radiologists’ annotations.

Choose a nodule

Choose an image feature & a similarity measure M. Lam, T. Disney, M. Pham, D. Raicu, J. Furst, “Content-Based Image Retrieval for Pulmonary Computed Tomography Nodule Images”, SPIE Medical Imaging Conference, San Diego, CA, February 2007

Retrieved Images

CBIR systems: challenges Type of features image features: - texture features: statistical, structural, model and filter-based - shape features textual features (such as physician annotations) Similarity measures -point-based and distribution based metrics Retrieval performance: precision and recall clinical evaluation
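
The evaluation terms above can be illustrated with a short sketch: precision and recall for one query, plus one point-based measure (Euclidean distance between feature vectors) and one distribution-based measure (a chi-square distance between histograms). The chi-square form used here is one common variant and is an assumption, as are all names and data.

```python
import math

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

def euclidean(a, b):
    """Point-based measure: treats each feature vector as a point."""
    return math.dist(a, b)

def chi_square(h1, h2):
    """Distribution-based measure: compares two histograms bin by bin."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(h1, h2) if x + y > 0)

precision, recall = precision_recall(["n1", "n2", "n3", "n4"], {"n1", "n3", "n7"})
print(f"precision={precision:.2f} recall={recall:.2f}")
print(euclidean([0.0, 3.0], [4.0, 0.0]))
print(chi_square([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))
```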

Questions?