1 Machine Learning in BioMedical Informatics
SCE 5095: Special Topics Course
Instructor: Jinbo Bi
Computer Science and Engineering Dept.

2 Course Information
• Instructor: Dr. Jinbo Bi
  – Office: ITEB 233
  – Phone: 860-486-1458
  – Email: jinbo@engr.uconn.edu
  – Web: http://www.engr.uconn.edu/~jinbo/
  – Time: Mon. / Wed. 2:00pm – 3:15pm
  – Location: CAST 204
  – Office hours: Mon. 3:30-4:30pm
• HuskyCT
  – http://learn.uconn.edu
  – Log in with your NetID and password
  – Illustration

3 Introduction of the instructor
• Ph.D. in Mathematics
• Previous work experience:
  – Siemens Medical Solutions Inc.
  – Department of Defense, Bioanalysis
  – Massachusetts General Hospital
• Research interests
[Figure labels: subtyping, GWAS; color of flowers; cancer, psychiatric disorders, …; http://labhealthinfo.uconn.edu/EasyBreathing]

4 Course Information
• Prerequisites: basics of linear algebra, calculus, and programming
• Course textbooks (not required):
  – Introduction to Data Mining (2005) by Pang-Ning Tan, Michael Steinbach, Vipin Kumar
  – Pattern Recognition and Machine Learning (2006) by Christopher M. Bishop
  – Pattern Classification (2nd edition, 2000) by Richard O. Duda, Peter E. Hart and David G. Stork
• Additional class notes and copied materials will be given
• Reading material links will be provided

5 Course Information
• Objectives:
  – Introduce students to the basic concepts of machine learning and the state-of-the-art literature in data mining/machine learning
  – Get to know some general topics in medical informatics
  – Focus on some high-demand medical informatics problems, with hands-on experience of applying data mining techniques
• Format:
  – Lectures, labs, paper reviews, a term project

6 Survey
• Why are you taking this course?
• What would you like to gain from this course?
• What topics are you most interested in learning about from this course?
• Any other suggestions?
(Please respond before NEXT THUR. You can also log in to HuskyCT, download the MS Word file, fill it in, and send me an email.)

7 Grading
• In-class lab assignments (3): 30%
• Paper review (1): 10%
• Term project (1): 50%
• Participation (1): 10%

8 Policy
• Computers
• Assignments must be submitted electronically via HuskyCT
• Make-up policy
  – If a lab assignment or a paper review assignment is missed, there will be a final take-home exam to make up
  – If two of these assignments are missed, an additional lab assignment and a final take-home exam will be used to make up

9 Three In-class Lab Assignments
• When an in-class lab assignment is given, the class meets in a computer lab and there is no lecture
• The computer lab will be ITEB 138 (TA reserve)
• The assignment is due at the beginning of the class one week after the assignment is given
• If the assignment is handed in one to two days late, 10 credits will be deducted for each additional day
• Assignments will be graded by our teaching assistant

10 Paper review
• Topics of papers for review will be discussed
• Each student selects 1 paper in each assignment, prepares slides, and presents the paper in class in 8 – 15 min
• The goal is to take a look at state-of-the-art research work in the related field
• The paper review assignment is on topics of state-of-the-art data mining techniques

11 Term Project
• Possible project topics will be provided as links; students are encouraged to propose their own
• Teams of 1-2 students can be created
• Each team needs to give a presentation in the last 1-2 weeks of the class (10-15 min)
• Each team needs to submit a project report
  – Definition of the problem
  – Data mining approaches used to solve the problem
  – Computational results
  – Conclusion (success or failure)

12 Final Exam
• If you need a make-up final exam, the exam will be provided on May 1st (Wed.)
• Take-home exam
• Due on May 9th (Thur.)

13 Three In-class Lab Assignments
• BioMedical Informatics Topics
  – So many
  – Cardiac ultrasound image categorization
  – Computerized decision support for trauma patient care
  – Computer-assisted diagnostic coding

14 Cardiac ultrasound view separation

15 Cardiac ultrasound view separation
• Classification (or clustering)
[Figure panels: apical 4 chamber view; parasternal long axis view; parasternal short axis view]

16 Trauma Patient Care
• 25 min of transport time/patient
• High-frequency vital-sign waveforms (3 waveforms)
  – ECG, SpO2, Respiratory
• Low-frequency vital-sign time series (9 variables)
  – Derived variables: ECG heart rate, SpO2 heart rate, SaO2 arterial O2 saturation, respiratory rate
  – Measured variables: NIBP (systolic, diastolic, MAP), NIBP heart rate, end tidal CO2
• Discrete patient attribute data (100 variables)
  – Demographics, injury description, prehospital interventions, etc.
• Vital signs used in decision-support algorithms: HR, RR, SaO2, SBP, DBP
[Figure label: Propaq]

17 Trauma Patient Care

18 Trauma Patient Care
• Inputs: heart rate, respiratory rate, saturation of oxygen, blood pressure
• Make a prediction: major bleeding

19 Diagnostic coding
[Figure labels: patient notes (Hospital Document DB); look up ICD-9 codes in a code database (e.g. 428, 250, 429; heart failure, diabetes, AMI, SCIP); patient criteria/diagnosis (Diagnostic Code DB); statistics, reimbursement, insurance]

22 Machine Learning / Data Mining
• Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information
• The ultimate goal of machine learning is the creation and understanding of machine intelligence
• The main goal of statistical learning theory is to provide a framework for studying the problem of inference, that is, of gaining knowledge, making predictions, and making decisions from a set of data

23 Traditional Topics in Data Mining / AI
• Fuzzy set and fuzzy logic
  – Fuzzy if-then rules
• Evolutionary computation
  – Genetic algorithms
  – Evolutionary strategies
• Artificial neural networks
  – Back propagation network (supervised learning)
  – Self-organization network (unsupervised learning, will not be covered)

24 Next Class
• Continue with data mining topics
• Review of some basics of linear algebra and probability

25 Last Class
• Described the syllabus of this course
• Talked about the HuskyCT website (illustration)
• Briefly introduced 3 medical informatics topics
  – Medical images: cardiac echo view recognition
  – Numerical: trauma patient care
  – Free text: ICD-9 diagnostic coding
• Briefly introduced the definitions of data mining, machine learning, and statistical learning theory

26 Challenges in traditional techniques
• Lack of theoretical analysis of the behavior of the algorithms
• Traditional techniques may be unsuitable due to
  – Enormity of data
  – High dimensionality of data
  – Heterogeneous, distributed nature of data
[Figure labels: Machine Learning / Pattern Recognition; Statistics / AI; Soft Computing]

27 Recent Topics in Data Mining
• Supervised learning such as classification and regression
  – Support vector machines
  – Regularized least squares
  – Fisher discriminant analysis (LDA)
  – Graphical models (Bayesian nets)
  – Others
Draw from machine learning domains

28 Recent Topics in Data Mining
• Unsupervised learning such as clustering
  – K-means
  – Gaussian mixture models
  – Hierarchical clustering
  – Graph-based clustering (spectral clustering)
• Dimension reduction
  – Feature selection
  – Compacting the feature space into a low-dimensional space (principal component analysis)

29 Statistical Behavior
• Many perspectives from which to analyze how an algorithm handles uncertainty
• Simple examples:
  – Consistency analysis
  – Learning bounds (upper bound on the test error of the constructed model or solution)
• “Statistical”, not “deterministic”
  – The upper bound holds with probability p rather than with certainty: P(test error <= Upper_bound) >= p

30 Tasks in Data Mining
• Prediction tasks (supervised problem)
  – Use some variables to predict unknown or future values of other variables
• Description tasks (unsupervised problem)
  – Find human-interpretable patterns that describe the data
From [Fayyad et al.] Advances in Knowledge Discovery and Data Mining, 1996

31 Problems in Data Mining
• Inference
• Classification [Predictive]
• Regression [Predictive]
• Clustering [Descriptive]
• Deviation Detection [Predictive]

32 Classification: Definition
• Given a collection of examples (training set)
  – Each example contains a set of attributes; one of the attributes is the class
• Find a model for the class attribute as a function of the values of the other attributes
• Goal: previously unseen examples should be assigned a class as accurately as possible
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it
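
The train/test procedure in this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy vital-sign data, the function names, and the choice of a 1-nearest-neighbor model are all hypothetical and are not taken from the course materials.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle the examples and split them into (training set, test set)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, attributes):
    """Assign the class of the closest training example (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda example: sq_dist(example[0], attributes))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test examples whose predicted class matches the true class."""
    correct = sum(predict_1nn(train, attrs) == label for attrs, label in test)
    return correct / len(test)

# Hypothetical toy data: (heart rate, respiratory rate) -> class attribute
data = [
    ((62, 14), "normal"), ((70, 16), "normal"),
    ((66, 15), "normal"), ((74, 17), "normal"),
    ((118, 28), "high-risk"), ((125, 30), "high-risk"),
    ((110, 27), "high-risk"), ((130, 32), "high-risk"),
]

train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set), accuracy(train_set, test_set))
```

The model never sees the test examples while it is built, so the accuracy on the test set estimates how well it handles previously unseen examples.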

33 Classification Example
[Figure labels: attribute columns (categorical, continuous) and class column; Training Set → Learn Classifier → Model; Test Set]

34 Classification: Application 1
• High-Risk Patient Detection
  – Goal: Predict whether a patient will suffer a major complication after a surgical procedure
  – Approach:
    ▪ Use patients' vital signs before and after the surgical operation (heart rate, respiratory rate, etc.)
    ▪ Have expert medical professionals monitor patients and label which patients have complications and which do not
    ▪ Learn a model for the class of after-surgery risk
    ▪ Use this model to detect potential high-risk patients for a particular surgical procedure

35 Classification: Application 2
• Face recognition
  – Goal: Predict the identity of a face image
  – Approach:
    ▪ Align all images to derive the features
    ▪ Model the class (identity) based on these features

36 Classification: Application 3
• Cancer Detection
  – Goal: To predict the class (cancer or normal) of a sample (person), based on microarray gene expression data
  – Approach:
    ▪ Use expression levels of all genes as the features
    ▪ Label each example as cancer or normal
    ▪ Learn a model for the class of all samples

37 Classification: Application 4
• Alzheimer's Disease Detection
  – Goal: To predict the class (AD or normal) of a sample (person), based on neuroimaging data such as MRI and PET
  – Approach:
    ▪ Extract features from neuroimages
    ▪ Label each example as AD or normal
    ▪ Learn a model for the class of all samples
[Figure: Reduced gray matter volume (colored areas) detected by MRI voxel-based morphometry in AD patients compared to normal healthy controls.]

38 Regression
• Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency
• Extensively studied in the statistics and neural network fields
• Examples:
  – Predicting sales amounts of a new product based on advertising expenditure
  – Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
  – Time series prediction of stock market indices

39 Classification algorithms
• K-Nearest-Neighbor classifiers
• Naïve Bayes classifier
• Neural Networks
• Linear Discriminant Analysis (LDA)
• Support Vector Machines (SVM)
• Decision Tree
• Logistic Regression
• Graphical models

40 Clustering Definition
• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
  – Data points in one cluster are more similar to one another
  – Data points in separate clusters are less similar to one another
• Similarity measures:
  – Euclidean distance if attributes are continuous
  – Other problem-specific measures
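
As a rough illustration of this definition, the sketch below computes Euclidean distances and compares the average distance within a cluster to the average distance between two clusters. The 3-D points and the function names are made up for the example.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_intra_distance(cluster):
    """Average distance over all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(cluster_1, cluster_2):
    """Average distance over all pairs with one point in each cluster."""
    total = sum(euclidean(p, q) for p in cluster_1 for q in cluster_2)
    return total / (len(cluster_1) * len(cluster_2))

# Hypothetical 3-D points forming two well-separated groups
cluster_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
cluster_b = [(10.0, 10.0, 10.0), (11.0, 10.0, 10.0), (10.0, 11.0, 10.0)]

print(mean_intra_distance(cluster_a), mean_inter_distance(cluster_a, cluster_b))
```

A good clustering keeps the first number small and the second number large.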

41 Illustrating Clustering
• Euclidean distance based clustering in 3-D space
[Figure labels: intracluster distances are minimized; intercluster distances are maximized]

42 Clustering: Application 1
• High-Risk Patient Detection
  – Goal: Predict whether a patient will suffer a major complication after a surgical procedure
  – Approach:
    ▪ Use patients' vital signs before and after the surgical operation (heart rate, respiratory rate, etc.)
    ▪ Find patients whose symptoms are dissimilar from those of most other patients

43 Clustering: Application 2
• Document Clustering:
  – Goal: To find groups of documents that are similar to each other based on the important terms appearing in them
  – Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster
  – Gain: Information retrieval can utilize the clusters to relate a new document or search term to clustered documents
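
One common way to turn term frequencies into a similarity measure is cosine similarity. The slide does not say which measure is meant, so the sketch below is an assumption; the documents and function names are also hypothetical, and no word filtering is applied.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lower-cased, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(tf_1, tf_2):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    shared_terms = tf_1.keys() & tf_2.keys()
    dot = sum(tf_1[t] * tf_2[t] for t in shared_terms)
    norm_1 = math.sqrt(sum(v * v for v in tf_1.values()))
    norm_2 = math.sqrt(sum(v * v for v in tf_2.values()))
    if norm_1 == 0 or norm_2 == 0:
        return 0.0
    return dot / (norm_1 * norm_2)

doc_1 = term_frequencies("heart failure patient with chest pain")
doc_2 = term_frequencies("patient admitted with heart failure")
doc_3 = term_frequencies("stock market indices fell sharply")

print(cosine_similarity(doc_1, doc_2), cosine_similarity(doc_1, doc_3))
```

Any clustering algorithm that accepts a pairwise similarity (or a distance such as one minus the similarity) can then group the documents.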

44 Illustrating Document Clustering
• Clustering points: 3204 articles of the Los Angeles Times
• Similarity measure: how many words are common in these documents (after some word filtering)

45 Clustering algorithms
• K-Means
• Hierarchical clustering
• Graph-based clustering (spectral clustering)
• Semi-supervised clustering
• Others

46 Basics of probability
• An experiment (random variable) is a well-defined process with observable outcomes
• The set or collection of all outcomes of an experiment is called the sample space, S
• An event E is any subset of outcomes from S
• When all outcomes in S are equally likely, the probability of an event is P(E) = number of outcomes in E / number of outcomes in S
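
A small counting example may help here. The sketch assumes a fair six-sided die, so that every outcome in S is equally likely; the die and the function name are illustrative choices, not part of the slides.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = |E| / |S|; valid only when all outcomes in S are equally likely."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

S = set(range(1, 7))                              # sample space of one die roll
E = {outcome for outcome in S if outcome % 2 == 0}  # event: the roll is even

print(probability(E, S))  # 1/2
```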

47 Probability Theory: Apples and Oranges
X: identity of the fruit; Y: identity of the box
• Assume P(Y=r) = 40%, P(Y=b) = 60% (prior)
  – P(X=a|Y=r) = 2/8 = 25%
  – P(X=o|Y=r) = 6/8 = 75%
  – P(X=a|Y=b) = 3/4 = 75%
  – P(X=o|Y=b) = 1/4 = 25%
• Marginal: P(X=a) = 11/20, P(X=o) = 9/20
• Posterior: P(Y=r|X=o) = 2/3, P(Y=b|X=o) = 1/3

48 Probability Theory
• Marginal Probability
• Conditional Probability
• Joint Probability

49 Probability Theory
• Sum Rule: the marginal probability of X equals the sum of the joint probability of X and Y with respect to Y
• Product Rule: the joint probability of X and Y equals the product of the conditional probability of Y given X and the probability of X

50 Illustration
[Figure panels: p(X,Y) for Y=1 and Y=2; p(Y); p(X); p(X|Y=1)]

51 The Rules of Probability
• Sum Rule
• Product Rule
• Bayes’ Rule: posterior ∝ likelihood × prior, i.e. p(Y|X) ∝ p(X|Y)p(Y)
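
The slide equations did not survive in this transcript. Written out in the standard form, and consistent with the verbal statements of the sum and product rules above, the three rules are as follows (the normalization of Bayes' rule via the sum rule is the usual textbook form and is assumed here):

```latex
\begin{align}
p(X) &= \sum_{Y} p(X, Y) && \text{(sum rule)} \\
p(X, Y) &= p(Y \mid X)\, p(X) && \text{(product rule)} \\
p(Y \mid X) &= \frac{p(X \mid Y)\, p(Y)}{p(X)},
\qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y) && \text{(Bayes' rule)}
\end{align}
```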

52 Mean and Variance
• The mean of a random variable X is the average value X takes
• The variance of X is a measure of how dispersed the values that X takes are
• The standard deviation is simply the square root of the variance

53 Simple Example
• X = {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2
• Mean
  – 0.8 × 1 + 0.2 × 2 = 1.2
• Variance
  – 0.8 × (1 – 1.2) × (1 – 1.2) + 0.2 × (2 – 1.2) × (2 – 1.2) = 0.16
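
The same calculation can be checked with a short sketch. Exact fractions are used instead of floats to avoid rounding noise; the function names and the dictionary representation of the distribution are illustrative choices.

```python
import math
from fractions import Fraction

def mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

# P(X=1) = 0.8 and P(X=2) = 0.2, written as exact fractions
dist = {1: Fraction(4, 5), 2: Fraction(1, 5)}

print(mean(dist), variance(dist))  # 6/5 4/25
std_dev = math.sqrt(variance(dist))  # standard deviation: square root of the variance
```

Here 6/5 = 1.2 and 4/25 = 0.16, so the standard deviation is 0.4.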

54 References
• SC_prob_basics1.pdf (necessary)
• SC_prob_basic2.pdf
Loaded to HuskyCT

55 Basics of Linear Algebra

56 Matrix Multiplication
• The product of two matrices: C = AB
• Special cases: vector-vector product, matrix-vector product

57 Matrix Multiplication

58 Rules of Matrix Multiplication
[Figure labels: C, A, B]
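
The matrix product C = AB can be sketched in plain Python as below. This is a minimal illustration, not the slide's own material: the function name and the example matrices are assumptions. It also shows two standard rules: the inner dimensions must agree, and AB generally differs from BA.

```python
def matmul(A, B):
    """Return C = AB for matrices given as lists of rows.

    A must be n x k and B must be k x m; C is then n x m with
    C[i][j] = sum over t of A[i][t] * B[t][j].
    """
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("columns of A must equal rows of B")
    m = len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(k)) for j in range(m)]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(matmul(A, B))           # [[19, 22], [43, 50]]
print(matmul(B, A))           # [[23, 34], [31, 46]]
print(matmul(A, [[1], [1]]))  # matrix-vector product: [[3], [7]]
```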

59 Orthogonal Matrix
[Equation residue: identity matrix with ones on the diagonal]

60 Square Matrix – Eigenvalue, Eigenvector

61 Symmetric Matrix – Eigenvalue, Eigenvector
• Eigen-decomposition of A
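
The equations for these slides are missing from the transcript. The standard definitions are given below as a reminder; the symbols v, λ, Q, and Λ are notation chosen here and may differ from what the slides used.

```latex
\begin{align}
Q^{\top} Q &= Q Q^{\top} = I && \text{(orthogonal matrix } Q\text{)} \\
A v &= \lambda v, \quad v \neq 0 && \text{(eigenvalue } \lambda \text{ and eigenvector } v \text{ of a square matrix } A\text{)} \\
A &= Q \Lambda Q^{\top}, \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n) && \text{(eigen-decomposition of a real symmetric } A\text{)}
\end{align}
```

For a real symmetric matrix the eigenvalues are real and the eigenvectors, which form the columns of Q, can be chosen orthonormal.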

62 Matrix Norms and Trace
• Frobenius norm
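
For reference, the usual definitions of the trace and the Frobenius norm are written out below; the entry notation a_ij is assumed, since the slide's formula is not in the transcript.

```latex
\begin{align}
\operatorname{tr}(A) &= \sum_{i} a_{ii} \\
\|A\|_F &= \sqrt{\sum_{i} \sum_{j} a_{ij}^{2}} = \sqrt{\operatorname{tr}\!\left(A^{\top} A\right)}
\end{align}
```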

63 Singular Value Decomposition
[Equation labels: orthogonal, diagonal]
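
The labels "orthogonal" and "diagonal" match the standard form of the decomposition, sketched below for a real m × n matrix A. The symbols U, Σ, V, and σ_i are the conventional ones and are assumed here.

```latex
\begin{align}
A &= U \Sigma V^{\top}, \qquad U^{\top} U = I_m, \quad V^{\top} V = I_n, \\
\Sigma &= \operatorname{diag}(\sigma_1, \dots, \sigma_{\min(m,n)}) \in \mathbb{R}^{m \times n}, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0
\end{align}
```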

64 References
• SC_linearAlg_basics.pdf (necessary)
• SVD_basics.pdf
Loaded to HuskyCT

65 Summary
• This is the end of the FIRST chapter of this course
• Next class: cluster analysis
  – General topics
  – K-means
• Slides after this one are backup slides; you can also check them to learn more

66 Neural Networks
• Motivated by the biological brain neuron model introduced by McCulloch and Pitts in 1943
• A neural network consists of
  ▪ Nodes (mimic neurons)
  ▪ Links between nodes (pass messages around, represent causal relationships)
  ▪ All parts of a NN are adaptive (modifiable parameters)
  ▪ Learning rules specify these parameters to finalize the NN
[Figure labels: soma, dendrite, nucleus, axon, myelin sheath, node of Ranvier, Schwann cell, axon terminal]

67 Illustration of NN
[Figure labels: inputs x1, x2; weights w11, w12; activation function; output y]
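
A single node of this kind can be sketched as a weighted sum followed by an activation function. The sigmoid activation, the bias term b, and the numeric values below are assumptions made for the example; the slide only names x1, x2, w11, w12, y, and "activation function".

```python
import math

def sigmoid(z):
    """Logistic activation; one common choice among many."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b=0.0, activation=sigmoid):
    """Output y = activation(w . x + b) of a single node.

    x: input values (x1, x2, ...), w: matching weights (w11, w12, ...),
    b: optional bias (hypothetical; not shown on the slide).
    """
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return activation(z)

y = neuron(x=[1.0, 2.0], w=[0.5, -0.3])
print(y)
```

Learning then amounts to adjusting the weights (and bias) so that y matches the desired outputs on training examples.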

68 Many Types of NN
• Adaptive NN
• Single-layer NN (perceptrons)
• Multi-layer NN
• Self-organizing NN
• Different activation functions
• Types of problems:
  – Supervised learning
  – Unsupervised learning

69 Classification: Additional Application
• Sky Survey Cataloging
  – Goal: To predict the class (star or galaxy) of sky objects, especially visually faint ones, based on the telescopic survey images (from Palomar Observatory)
  – 3000 images with 23,040 x 23,040 pixels per image
  – Approach:
    ▪ Segment the image
    ▪ Measure image attributes (features): 40 of them per object
    ▪ Model the class based on these features
    ▪ Success story: could find 16 new high red-shift quasars, some of the farthest objects that are difficult to find!
From [Fayyad et al.] Advances in Knowledge Discovery and Data Mining, 1996

70 Classifying Galaxies
• Class: stages of formation (early, intermediate, late)
• Attributes: image features, characteristics of light waves received, etc.
• Data size: 72 million stars, 20 million galaxies
  – Object catalog: 9 GB
  – Image database: 150 GB
Courtesy: http://aps.umn.edu

71 Challenges of Data Mining
• Scalability
• Dimensionality
• Complex and Heterogeneous Data
• Data Quality
• Data Ownership and Distribution
• Privacy Preservation

72 Application of Prob Rules
Assume P(Y=r) = 40%, P(Y=b) = 60%
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
p(X=a) = p(X=a,Y=r) + p(X=a,Y=b)
       = p(X=a|Y=r)p(Y=r) + p(X=a|Y=b)p(Y=b)
       = 0.25*0.4 + 0.75*0.6 = 11/20
p(X=o) = 9/20
p(Y=r|X=o) = p(Y=r,X=o)/p(X=o)
           = p(X=o|Y=r)p(Y=r)/p(X=o)
           = 0.75*0.4 / (9/20) = 2/3
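
The arithmetic above can be reproduced with a short sketch that applies the sum rule for the marginal and Bayes' rule for the posterior. The dictionary layout and function names are illustrative; the probabilities are the ones stated on the slide.

```python
from fractions import Fraction

# Prior over boxes: P(Y=r) = 40%, P(Y=b) = 60%
prior = {"r": Fraction(4, 10), "b": Fraction(6, 10)}

# Likelihood P(X=fruit | Y=box), with a = apple and o = orange
likelihood = {
    "r": {"a": Fraction(2, 8), "o": Fraction(6, 8)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}

def marginal(x):
    """Sum rule combined with product rule: p(X=x) = sum over y of p(X=x|Y=y) p(Y=y)."""
    return sum(likelihood[y][x] * prior[y] for y in prior)

def posterior(y, x):
    """Bayes' rule: p(Y=y|X=x) = p(X=x|Y=y) p(Y=y) / p(X=x)."""
    return likelihood[y][x] * prior[y] / marginal(x)

print(marginal("a"), marginal("o"))              # 11/20 9/20
print(posterior("r", "o"), posterior("b", "o"))  # 2/3 1/3
```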

73 The Gaussian Distribution

74 Gaussian Mean and Variance
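
The formulas for these two backup slides are not in the transcript. For reference, the standard univariate Gaussian density and its mean and variance are given below; the symbols μ and σ² are the conventional ones and are assumed here.

```latex
\begin{align}
\mathcal{N}(x \mid \mu, \sigma^{2}) &= \frac{1}{\sqrt{2\pi\sigma^{2}}}
  \exp\!\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right) \\
\mathbb{E}[x] &= \mu, \qquad \operatorname{var}[x] = \mathbb{E}\!\left[(x - \mu)^{2}\right] = \sigma^{2}
\end{align}
```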

75 The Multivariate Gaussian
[Figure axes: x, y]

