1 Machine Learning in BioMedical Informatics SCE 5095: Special Topics Course Instructor: Jinbo Bi Computer Science and Engineering Dept.


1 Machine Learning in BioMedical Informatics SCE 5095: Special Topics Course Instructor: Jinbo Bi Computer Science and Engineering Dept.

2 Course Information l Instructor: Dr. Jinbo Bi –Office: ITEB 233 –Phone: –Email: –Web: –Time: Mon./Wed. 2:00pm – 3:15pm –Location: CAST 204 –Office hours: Mon. 3:30–4:30pm l HuskyCT –Login with your NetID and password –Illustration

3 Introduction of the instructor l Ph.D in Mathematics l Previous work experience: –Siemens Medical Solutions Inc. –Department of Defense, Bioanalysis –Massachusetts General Hospital l Research Interests: disease subtyping, GWAS, cancer, psychiatric disorders, …

4 Course Information l Prerequisite: Basics of linear algebra, calculus, and basics of programming l Course textbooks (not required): –Introduction to Data Mining (2005) by Pang-Ning Tan, Michael Steinbach, Vipin Kumar –Pattern Recognition and Machine Learning (2006) by Christopher M. Bishop –Pattern Classification (2nd edition, 2000) by Richard O. Duda, Peter E. Hart and David G. Stork l Additional class notes and copied materials will be given l Reading material links will be provided

5 Course Information l Objectives: –Introduce students to the basic concepts of machine learning and the state-of-the-art literature in data mining/machine learning –Get to know some general topics in medical informatics –Focus on some high-demand medical informatics problems, with hands-on experience applying data mining techniques l Format: –Lectures, labs, paper reviews, a term project

6 Survey l Why are you taking this course? l What would you like to gain from this course? l What topics are you most interested in learning about from this course? l Any other suggestions? (Please respond before NEXT THUR. You can also login to HuskyCT, download the MS Word file, fill it in, and shoot me an email.)

7 Grading l In-Class Lab Assignments (3): 30% l Paper review (1): 10% l Term Project (1): 50% l Participation (1): 10%

8 Policy l Computers l Assignments must be submitted electronically via HuskyCT l Make-up policy –If a lab assignment or a paper review assignment is missed, there will be a final take-home exam to make up –If two of these assignments are missed, an additional lab assignment and a final take-home exam will be used to make up

9 Three In-class Lab Assignments l On the day an in-class lab assignment is given, the class meets in a computer lab and there is no lecture l The computer lab will be ITEB 138 (reserved by the TA) l The assignment is due at the beginning of the class one week after it is given l If the assignment is handed in one or two days late, 10 points are deducted for each day late l Assignments will be graded by our teaching assistant

10 Paper review l Topics of papers for review will be discussed l Each student selects 1 paper per assignment, prepares slides, and presents the paper in 8–15 minutes in class l The goal is to look at state-of-the-art research work in the related field l Paper review assignments are on topics of state-of-the-art data mining techniques

11 Term Project l Possible project topics will be provided as links; students are encouraged to propose their own l Teams of 1–2 students can be created l Each team gives a presentation in the last 1–2 weeks of the class (10–15 min) l Each team submits a project report –Definition of the problem –Data mining approaches used to solve the problem –Computational results –Conclusion (success or failure)

12 Final Exam l If you need a make-up final exam, the exam will be provided on May 1st (Wed.) l Take-home exam l Due on May 9th (Thur.)

13 Three In-class Lab Assignments l BioMedical Informatics Topics –Cardiac ultrasound image categorization –Computerized decision support for trauma patient care –Computer-assisted diagnostic coding

14 Cardiac ultrasound view separation

15 Cardiac ultrasound view separation l Classification (or clustering) into views: apical 4-chamber view, parasternal long-axis view, parasternal short-axis view

16 Trauma Patient Care l 25 min of transport time/patient l High-frequency vital-sign waveforms (3 waveforms) –ECG, SpO2, Respiratory l Low-frequency vital-sign time series (9 variables) –Derived variables: ECG heart rate, SpO2 heart rate, SaO2 arterial O2 saturation, respiratory rate –Measured variables: NIBP (systolic, diastolic, MAP), NIBP heart rate, end-tidal CO2 l Discrete patient attribute data (100 variables) –Demographics, injury description, prehospital interventions, etc. l Vital signs used in decision-support algorithms: HR, RR, SaO2, SBP, DBP (Propaq monitor)

17 Trauma Patient Care

18 Trauma Patient Care (figure: heart rate, respiratory rate, oxygen saturation, and blood pressure traces used to make a prediction of major bleeding)

19 Diagnostic coding (workflow figure: patient notes from the Hospital Document DB (Patient 1: notes A–E; Patient 2: notes F, G, …) are matched against diagnosis criteria and looked up as ICD-9 codes in the Diagnostic Code DB (e.g. 250, AMI, SCIP, heart failure, diabetes); the codes feed statistics and insurance reimbursement) SIEMENS


22 Machine Learning / Data Mining l Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information l The ultimate goal of machine learning is the creation and understanding of machine intelligence l The main goal of statistical learning theory is to provide a framework for studying the problem of inference, that is of gaining knowledge, making predictions, and making decisions from a set of data.

23 Traditional Topics in Data Mining /AI l Fuzzy set and fuzzy logic –Fuzzy if-then rules l Evolutionary computation –Genetic algorithms –Evolutionary strategies l Artificial neural networks –Back propagation network (supervised learning) –Self-organization network (unsupervised learning, will not be covered)

24 Next Class l Continue with data mining topics l Review of some basics of linear algebra and probability

25 Last Class l Described the syllabus of this course l Talked about the HuskyCT website (illustration) l Briefly introduced 3 medical informatics topics –Medical images: cardiac echo view recognition –Numerical: trauma patient care –Free text: ICD-9 diagnostic coding l Introduced the definitions of data mining, machine learning, and statistical learning theory

26 Challenges in traditional techniques l Lack of theoretical analysis of the behavior of the algorithms l Traditional techniques may be unsuitable due to –Enormity of data –High dimensionality of data –Heterogeneous, distributed nature of data (diagram: Machine Learning/Pattern Recognition at the intersection of Statistics/AI and Soft Computing)

27 Recent Topics in Data Mining l Supervised learning such as classification and regression (drawn from machine learning domains) –Support vector machines –Regularized least squares –Fisher discriminant analysis (LDA) –Graphical models (Bayesian nets) –Others

28 Recent Topics in Data Mining l Unsupervised learning such as clustering –K-means –Gaussian mixture models –Hierarchical clustering –Graph based clustering (spectral clustering) l Dimension reduction –Feature selection –Compact feature space into low-dimensional space (principal component analysis)

29 Statistical Behavior l Many perspectives to analyze how the algorithm handles uncertainty l Simple examples: –Consistency analysis –Learning bounds (upper bound on the test error of the constructed model or solution) l “Statistical”, not “deterministic” –With probability at least p, the upper bound holds; the bound can fail with probability at most 1 – p
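The learning bounds mentioned on this slide can be made concrete with a generic concentration inequality. A minimal sketch, assuming a Hoeffding-style bound (the specific constant and the Bernoulli simulation are illustrative, not the course's particular bound): with probability at least 1 – delta, the empirical mean of n i.i.d. samples is within sqrt(ln(2/delta)/(2n)) of the true mean.

```python
import math
import random

def hoeffding_bound(n, delta):
    """Hoeffding: with prob. >= 1 - delta, |empirical mean - true mean| <= bound."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Simulate: estimate the mean of a Bernoulli(0.3) variable from n samples,
# and count how often the empirical mean falls outside the bound.
random.seed(0)
p, n, delta, trials = 0.3, 1000, 0.05, 200
bound = hoeffding_bound(n, delta)
violations = sum(
    abs(sum(random.random() < p for _ in range(n)) / n - p) > bound
    for _ in range(trials)
)
# The guarantee says violations occur in at most a delta fraction of runs;
# in practice the bound is conservative and violations are far rarer.
```

This is exactly the "statistical, not deterministic" point: the guarantee is about the probability of failure over random draws of the data, not about any single run.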

30 Tasks in Data Mining l Prediction tasks (supervised problems) –Use some variables to predict unknown or future values of other variables l Description tasks (unsupervised problems) –Find human-interpretable patterns that describe the data From [Fayyad, et al.] Advances in Knowledge Discovery and Data Mining, 1996

31 Problems in Data Mining l Inference l Classification [Predictive] l Regression [Predictive] l Clustering [Descriptive] l Deviation Detection [Predictive]

32 Classification: Definition l Given a collection of examples (training set) –Each example contains a set of attributes; one of the attributes is the class l Find a model for the class attribute as a function of the values of the other attributes l Goal: previously unseen examples should be assigned a class as accurately as possible –A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it
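The train/test workflow described on this slide can be sketched in a few lines. This is a minimal illustration using a 1-nearest-neighbor classifier; the toy data and the choice of classifier are assumptions for the example, not something prescribed by the course.

```python
import math

def nearest_neighbor_predict(train, x):
    """Predict the class of x as the class of its closest training example."""
    best = min(train, key=lambda ex: math.dist(ex[0], x))
    return best[1]

# (features, class) pairs: a toy training set with two well-separated classes
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((4.8, 5.3), "B")]
# Held-out test set, used only to estimate accuracy as described above
test = [((0.9, 1.1), "A"), ((5.1, 4.9), "B")]

correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
```

The key design point is that the test examples never influence the "model" (here, the stored training set), so the measured accuracy estimates performance on previously unseen data.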

33 Classification Example (figure: a table of examples with categorical and continuous attributes plus a class label; the training set is used to learn a classifier model, which is then applied to the test set)

34 Classification: Application 1 l High-Risk Patient Detection –Goal: Predict whether a patient will suffer a major complication after a surgical procedure –Approach:  Use patients’ vital signs before and after the surgical operation –Heart rate, respiratory rate, etc.  Have expert medical professionals monitor patients to label which patients had complications and which did not  Learn a model for the class of after-surgery risk  Use this model to detect potential high-risk patients for a particular surgical procedure

35 Classification: Application 2 l Face recognition –Goal: Predict the identity of a face image –Approach:  Align all images to derive the features  Model the class (identity) based on these features

36 Classification: Application 3 l Cancer Detection –Goal: To predict class (cancer or normal) of a sample (person), based on the microarray gene expression data –Approach:  Use expression levels of all genes as the features  Label each example as cancer or normal  Learn a model for the class of all samples

37 Classification: Application 4 l Alzheimer's Disease Detection –Goal: To predict class (AD or normal) of a sample (person), based on neuroimaging data such as MRI and PET –Approach:  Extract features from neuroimages  Label each example as AD or normal  Learn a model for the class of all samples Reduced gray matter volume (colored areas) detected by MRI voxel-based morphometry in AD patients compared to normal healthy controls.

38 Regression l Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency. l Greatly studied in statistics, neural network fields. l Examples: –Predicting sales amounts of new product based on advertising expenditure. –Predicting wind velocities as a function of temperature, humidity, air pressure, etc. –Time series prediction of stock market indices.
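The one-predictor case of the regression task above has a simple closed-form least-squares solution. A minimal sketch (the advertising-style data is made up for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx          # intercept: the fitted line passes through the means
    return a, b

# Toy data generated from y = 2x + 1 (e.g., advertising expenditure vs. sales)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
```

With noiseless data the fit recovers the generating line exactly; with real, noisy data the same formula gives the best linear fit in the squared-error sense.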

39 Classification algorithms l K-Nearest-Neighbor classifiers l Naïve Bayes classifier l Neural Networks l Linear Discriminant Analysis (LDA) l Support Vector Machines (SVM) l Decision Tree l Logistic Regression l Graphical models

40 Clustering Definition l Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that –Data points in one cluster are more similar to one another. –Data points in separate clusters are less similar to one another. l Similarity Measures: –Euclidean Distance if attributes are continuous. –Other Problem-specific Measures

41 Illustrating Clustering l Euclidean distance based clustering in 3-D space –Intracluster distances are minimized –Intercluster distances are maximized

42 Clustering: Application 1 l High-Risk Patient Detection –Goal: Predict whether a patient will suffer a major complication after a surgical procedure –Approach:  Use patients’ vital signs before and after the surgical operation –Heart rate, respiratory rate, etc.  Find patients whose symptoms are dissimilar from those of most other patients

43 Clustering: Application 2 l Document Clustering: –Goal: To find groups of documents that are similar to each other based on the important terms appearing in them. –Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster. –Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.

44 Illustrating Document Clustering l Clustering Points: 3204 Articles of Los Angeles Times. l Similarity Measure: How many words are common in these documents (after some word filtering).

45 Clustering algorithms l K-Means l Hierarchical clustering l Graph based clustering (Spectral clustering) l Semi-supervised clustering l Others
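The first algorithm in the list above, k-means, can be sketched as a few iterations of Lloyd's algorithm; the 2-D toy points and the fixed initial centers are assumptions for the example.

```python
import math

def kmeans(points, centers, iters=10):
    """A few iterations of Lloyd's algorithm for k-means clustering."""
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center (Euclidean)
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its assigned points
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else ctr
                   for cl, ctr in zip(clusters, centers)]
    return centers, clusters

points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4),   # one tight group
          (5.0, 5.0), (5.2, 4.8), (4.9, 5.3)]   # another tight group
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
```

Each iteration cannot increase the total within-cluster squared distance, which is why the procedure converges; the result can still depend on the initial centers.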

46 Basics of probability l An experiment (random variable) is a well- defined process with observable outcomes. l The set or collection of all outcomes of an experiment is called the sample space, S. l An event E is any subset of outcomes from S. l Probability of an event, P(E) is P(E) = number of outcomes in E / number of outcomes in S.

47 Probability Theory l Apples and Oranges: X is the identity of the fruit (a or o), Y is the identity of the box (r or b) l Prior: P(Y=r) = 40%, P(Y=b) = 60% l Likelihoods: P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%, P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25% l Marginal: P(X=a) = 11/20, P(X=o) = 9/20 l Posterior: P(Y=r|X=o) = 2/3, P(Y=b|X=o) = 1/3
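The marginal and posterior on this slide follow mechanically from the prior and likelihoods via the sum and product rules. A small sketch that reproduces the slide's numbers:

```python
# Prior over boxes and likelihoods of each fruit given the box, from the slide
p_y = {"r": 0.4, "b": 0.6}
p_x_given_y = {("a", "r"): 0.25, ("o", "r"): 0.75,
               ("a", "b"): 0.75, ("o", "b"): 0.25}

# Sum rule: marginal p(X=o) = sum over y of p(X=o|Y=y) p(Y=y)
p_o = sum(p_x_given_y[("o", y)] * p_y[y] for y in p_y)

# Bayes' rule: posterior p(Y=r|X=o) = p(X=o|Y=r) p(Y=r) / p(X=o)
p_r_given_o = p_x_given_y[("o", "r")] * p_y["r"] / p_o
```

Note how the posterior (2/3) differs from the prior (40%): observing an orange shifts belief toward the red box, because oranges are more likely under it.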

48 Probability Theory l Marginal probability l Conditional probability l Joint probability

49 Probability Theory l Sum Rule: p(X) = Σ_Y p(X,Y) –The marginal prob of X equals the sum of the joint prob of X and Y with respect to Y l Product Rule: p(X,Y) = p(Y|X) p(X) –The joint prob of X and Y equals the product of the conditional prob of Y given X and the prob of X

50 Illustration (figure: the joint p(X,Y) for Y=1 and Y=2, the marginals p(X) and p(Y), and the conditional p(X|Y=1))

51 The Rules of Probability l Sum Rule: p(X) = Σ_Y p(X,Y) l Product Rule: p(X,Y) = p(Y|X) p(X) l Bayes’ Rule: p(Y|X) = p(X|Y) p(Y) / p(X), i.e. posterior ∝ likelihood × prior = p(X|Y) p(Y)

52 Mean and Variance l The mean of a random variable X is the average value X takes. l The variance of X is a measure of how dispersed the values that X takes are. l The standard deviation is simply the square root of the variance.

53 Simple Example l X = {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2 l Mean –0.8 × 1 + 0.2 × 2 = 1.2 l Variance –0.8 × (1 – 1.2) × (1 – 1.2) + 0.2 × (2 – 1.2) × (2 – 1.2) = 0.16
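The slide's arithmetic can be checked directly by coding the definitions of mean and variance for a discrete distribution:

```python
# The distribution from the slide: P(X=1) = 0.8, P(X=2) = 0.2
dist = {1: 0.8, 2: 0.2}

mean = sum(x * p for x, p in dist.items())               # E[X]
var = sum(p * (x - mean) ** 2 for x, p in dist.items())  # E[(X - E[X])^2]
std = var ** 0.5                                         # standard deviation
```

This generalizes unchanged to any finite distribution: add more (value, probability) pairs to the dictionary and the same two sums apply.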

54 References l SC_prob_basics1.pdf (necessary) l SC_prob_basic2.pdf l Loaded to HuskyCT

55 Basics of Linear Algebra

56 Matrix Multiplication l The product of two matrices: C = AB l Special case: vector-vector product, matrix-vector product
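The product C = AB can be sketched from its definition: each entry of C is the dot product of a row of A with a column of B. A minimal pure-Python illustration (the example matrices are arbitrary):

```python
def matmul(A, B):
    """C = A B, where C[i][j] is the dot product of row i of A and column j of B."""
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must agree"
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 2],
     [3, 4]]          # 2x2 matrix
B = [[5],
     [6]]             # 2x1: the matrix-vector product as a special case
C = matmul(A, B)      # 2x1 result
```

The dimension check makes the multiplication rule explicit: an n×k matrix times a k×m matrix yields an n×m matrix.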

57 Matrix Multiplication

58 Rules of Matrix Multiplication l C = AB is defined only when the number of columns of A equals the number of rows of B l Matrix multiplication is associative and distributive, but in general not commutative: AB ≠ BA

59 Orthogonal Matrix

60 Square Matrix – Eigenvalue, Eigenvector l Av = λv, where λ is an eigenvalue of A and v ≠ 0 is a corresponding eigenvector

61 Symmetric Matrix – Eigenvalue, Eigenvector l Eigen-decomposition of A: A = QΛQᵀ, where the columns of Q are orthonormal eigenvectors and Λ is diagonal with the eigenvalues
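One standard way to compute the dominant eigenvalue/eigenvector pair defined above is power iteration; this is an illustrative sketch, not a method the slides prescribe, and the 2×2 symmetric test matrix is made up (its eigenvalues are 3 and 1).

```python
import math

def power_iteration(A, iters=100):
    """Dominant eigenpair of a square matrix by repeated multiplication."""
    n = len(A)
    v = [1.0] * n  # starting vector; must not be orthogonal to the eigenvector
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]          # renormalize to unit length
    # Rayleigh quotient v^T A v estimates the eigenvalue (v has unit norm)
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))
    return lam, v

A = [[2.0, 1.0],
     [1.0, 2.0]]      # symmetric: eigenvalues 3 (eigenvector (1,1)) and 1
lam, v = power_iteration(A)
```

Repeatedly applying A stretches any starting vector most along the dominant eigenvector, so the normalized iterates converge to it.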

62 Matrix Norms and Trace l Frobenius norm: ‖A‖_F = sqrt(Σ_ij a_ij²) = sqrt(trace(AᵀA))
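The identity on this slide, ‖A‖_F² = trace(AᵀA), can be verified numerically; the 2×2 example matrix is arbitrary.

```python
import math

def frobenius(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1.0, 2.0],
     [3.0, 4.0]]
# Form A^T A entrywise: (A^T A)[i][j] = sum over k of A[k][i] * A[k][j]
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
```

Here ‖A‖_F² = 1 + 4 + 9 + 16 = 30, and the diagonal of AᵀA sums to the same value.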

63 Singular Value Decomposition l A = UΣVᵀ, where U and V are orthogonal and Σ is diagonal with the singular values

64 References l SC_linearAlg_basics.pdf (necessary) l SVD_basics.pdf l Loaded to HuskyCT

65 Summary l This is the end of the FIRST chapter of this course l Next class: cluster analysis –General topics –K-means l Slides after this one are backup slides; you can check them to learn more

66 Neural Networks l Motivated by the biological brain neuron model introduced by McCulloch and Pitts in 1943 l A neural network consists of –Nodes (mimic neurons) –Links between nodes (pass messages around, represent causal relationships) –All parts of an NN are adaptive (modifiable parameters) –Learning rules specify these parameters to finalize the NN (diagram of a biological neuron: soma, dendrite, nucleus, axon, myelin sheath, node of Ranvier, Schwann cell, axon terminal)

67 Illustration of NN (figure: inputs x1, x2 with weights w11, w12 feeding a unit whose activation function produces the output y)
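The single unit in this illustration can be sketched as a weighted sum passed through an activation function. The logistic (sigmoid) activation and the numeric weights below are assumptions for the example; other activations work the same way.

```python
import math

def neuron(x, w, b):
    """One unit: weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))      # logistic activation, output in (0, 1)

# Two inputs x1, x2 with weights w11, w12 as in the figure (values made up)
y = neuron(x=[1.0, 0.5], w=[0.4, -0.2], b=0.0)
```

A full network chains such units into layers; learning then amounts to adjusting the weights w and bias b, which is what the learning rules on the previous slide specify.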

68 Many Types of NN l Adaptive NN l Single-layer NN (perceptrons) l Multi-layer NN l Self-organizing NN l Different activation functions l Types of problems: –Supervised learning –Unsupervised learning

69 Classification: Additional Application l Sky Survey Cataloging –Goal: To predict the class (star or galaxy) of sky objects, especially visually faint ones, based on telescopic survey images (from Palomar Observatory) –3000 images with 23,040 x 23,040 pixels per image –Approach:  Segment the image  Measure image attributes (features), 40 of them per object  Model the class based on these features  Success story: found 16 new high-redshift quasars, some of the farthest objects that are difficult to find! From [Fayyad, et al.] Advances in Knowledge Discovery and Data Mining, 1996

70 Classifying Galaxies l Class: stages of formation (early, intermediate, late) l Attributes: image features, characteristics of light waves received, etc. l Data size: 72 million stars, 20 million galaxies l Object catalog: 9 GB l Image database: 150 GB

71 Challenges of Data Mining l Scalability l Dimensionality l Complex and Heterogeneous Data l Data Quality l Data Ownership and Distribution l Privacy Preservation

72 Application of Prob Rules l Assume P(Y=r) = 40%, P(Y=b) = 60%; P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%, P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25% l p(X=a) = p(X=a,Y=r) + p(X=a,Y=b) = p(X=a|Y=r)p(Y=r) + p(X=a|Y=b)p(Y=b) = 0.25*0.4 + 0.75*0.6 = 11/20 l p(X=o) = 1 – p(X=a) = 9/20 l p(Y=r|X=o) = p(Y=r,X=o)/p(X=o) = p(X=o|Y=r)p(Y=r)/p(X=o) = 0.75*0.4/(9/20) = 2/3

73 The Gaussian Distribution

74 Gaussian Mean and Variance

75 The Multivariate Gaussian (contour plot over x and y)