CSE 4705 Artificial Intelligence Jinbo Bi Department of Computer Science & Engineering http://www.engr.uconn.edu/~jinbo
Machine learning (1) Supervised learning algorithms
Topics in machine learning: supervised learning, such as classification and regression; unsupervised learning, such as cluster analysis and outlier/novelty detection; dimension reduction; semi-supervised learning; active learning; online learning.
Common techniques for supervised learning: regularized least squares; least absolute shrinkage and selection operator (LASSO); neural networks; logistic regression; decision trees; Fisher’s discriminant analysis; support vector machines; graphical models.
Common techniques for unsupervised learning: K-means; Gaussian mixture models; hierarchical clustering; graph-based clustering (e.g., spectral clustering).
Common techniques for dimension reduction: principal component analysis; independent component analysis; canonical correlation analysis; feature selection; sparse modeling.
Machine learning / Data mining. Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information (ACM SIGKDD conference: http://www.kdd.org/kdd2016/). The ultimate goal of machine learning is the creation and understanding of machine intelligence; it is heavily related to statistical learning theory (ICML conference: http://icml.cc/2016/). Artificial intelligence is the intelligence exhibited by machines or software; the field studies how to create computers and computer software that are capable of intelligent behavior (AAAI conference: http://www.aaai.org/Conferences/AAAI/aaai16.php).
Supervised learning: definition. Given a collection of examples (the training set), each example contains a set of attributes (independent variables), and one of the attributes is the target (dependent variable). Find a model that predicts the target as a function of the values of the other attributes. Goal: previously unseen examples should be predicted as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into a training set, used to build the model, and a test set, used to validate it.
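The train/test procedure described above can be sketched in a few lines. This is only an illustration: the toy data, the helper names (`train_test_split`, `predict_1nn`, `accuracy`), and the choice of a 1-nearest-neighbor model are assumptions made for the example, not part of the slides.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle the examples and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, x):
    """Predict the target of x as the target of its nearest training example."""
    nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test examples whose target is predicted correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

# Made-up toy data: each example is (attributes, target).
# Class 0 lies near (0, 0) and class 1 lies near (5, 5).
examples = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0),
            ((5.0, 5.1), 1), ((5.2, 4.9), 1), ((4.8, 5.0), 1), ((5.1, 5.3), 1)]

train, test = train_test_split(examples)   # build the model on train only
test_accuracy = accuracy(train, test)      # validate on unseen examples
```

The model never sees the test examples while it is built, so `test_accuracy` estimates how well previously unseen examples are predicted.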
Supervised learning: classification. When the dependent variable is categorical, the task is a classification problem.
Classification: example. Face recognition. Goal: predict the identity of a face image. Approach: align all images to derive the features, then model the class (identity) based on these features.
Supervised learning: regression. When the dependent variable is continuous, the task is a regression problem.
Regression: example. Risk prediction for patients. Goal: predict the likelihood that a patient will suffer a major complication after a surgical procedure. Approach: use the patients' vital signs (heart rate, respiratory rate, etc.) before and after the operation. Have expert medical professionals monitor the patients and rate the likelihood of each patient having a complication. Learn a model that maps patient vital signs to the risk ratings. Use this model to detect potential high-risk patients for a particular surgical procedure.
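As a rough sketch of "learn a model that maps vital signs to risk ratings", the snippet below fits a straight line by ordinary least squares to a single vital sign. The slide does not say which model is used; the one-variable linear model, the function name `fit_line`, and the numbers are all assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: heart rate (beats per minute) and the
# expert-assigned risk rating for five patients. Not real measurements.
heart_rate = [60.0, 70.0, 80.0, 90.0, 100.0]
risk_rating = [0.1, 0.2, 0.3, 0.4, 0.5]

slope, intercept = fit_line(heart_rate, risk_rating)

# Predict the risk rating of a new patient with heart rate 85.
predicted = slope * 85.0 + intercept
```

Because the target (the risk rating) is continuous, this is a regression problem; a realistic model would use several vital signs at once.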
Unsupervised learning: clustering. Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that data points in one cluster are more similar to one another and data points in separate clusters are less similar to one another. Similarity measures: Euclidean distance if the attributes are continuous; other problem-specific measures.
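One way to make this concrete is K-means with Euclidean distance, sketched below. The slide defines clustering in general and does not prescribe this algorithm; the data points, the initial centers, and the function names are assumptions chosen for the example.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centers, n_iter=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Made-up 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = kmeans(points, centers=[(1.0, 1.0), (8.0, 8.0)])
```

After convergence, points in the same cluster are closer to their own center than to the other one, which is the "more similar within, less similar across" criterion expressed with Euclidean distance.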
Clustering: example. High-risk patient detection. Goal: predict whether a patient will suffer a major complication after a surgical procedure. Approach: use the patients' vital signs (heart rate, respiratory rate, etc.) before and after the operation. Find patients whose symptoms are dissimilar from those of most other patients.
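A very simple way to "find patients who are dissimilar from most others" is to measure each patient's distance from the average patient and flag unusually large distances. The rule below (distance greater than three times the median distance), the function name `flag_dissimilar`, and the records are assumptions for illustration only; the slide does not specify a method.

```python
import math
import statistics

def flag_dissimilar(records, factor=3.0):
    """Return indices of records far from the centroid of all records."""
    n = len(records)
    centroid = [sum(col) / n for col in zip(*records)]
    dists = [math.dist(r, centroid) for r in records]
    cutoff = factor * statistics.median(dists)   # assumed threshold rule
    return [i for i, d in enumerate(dists) if d > cutoff]

# Hypothetical (heart rate, respiratory rate) pairs; the last one is unusual.
records = [(70.0, 16.0), (72.0, 15.0), (68.0, 17.0),
           (71.0, 16.0), (69.0, 16.0), (120.0, 30.0)]

flagged = flag_dissimilar(records)
```

No risk labels are used here, which is what makes the approach unsupervised. In practice the attributes would usually be rescaled first so that no single vital sign dominates the distance.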
Practice. Judge what kind of problem each of the following scenarios is. (1) A student collected a couple of online documents about movies and tries to identify which movie the documents discuss. (2) In a cognitive test, a person is asked whether he can recognize the color “red” on a screen. The person presses a button if he thinks he sees red, and does nothing otherwise. An EEG recording is made during the test. A researcher wants to use the EEG recordings to predict whether the red color is recognized. (3) A researcher observed and recorded weather conditions (temperature, wind speed, snow, etc.) over the past month and wants to use the data to predict the next day's temperature.
Review of probability and linear algebra
Basics of probability. An experiment (whose outcome is recorded by a random variable) is a well-defined process with observable outcomes. The set of all outcomes of an experiment is called the sample space, S. An event E is any subset of outcomes from S. When all outcomes in S are equally likely, the probability of an event is P(E) = (number of outcomes in E) / (number of outcomes in S).
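The counting definition can be checked by enumerating a small sample space. The two-dice experiment below is an assumed example, not one taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered outcomes of rolling two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two dice sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# P(E) = |E| / |S|, valid because every outcome is equally likely.
p_event = Fraction(len(event), len(sample_space))
```

Here S has 36 outcomes and E has 6, so P(E) = 6/36 = 1/6.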
Probability theory
Probability theory: joint probability, marginal probability, conditional probability.
Probability theory. Sum rule: $p(X) = \sum_{Y} p(X, Y)$; the marginal probability of X equals the sum of the joint probability of X and Y over all values of Y. Product rule: $p(X, Y) = p(Y \mid X)\, p(X)$; the joint probability of X and Y equals the product of the conditional probability of Y given X and the probability of X.
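Both rules can be verified on a small joint table. The table values and the names `joint`, `marginal_x`, and `conditional_y_given_x` are assumptions chosen for this sketch.

```python
from fractions import Fraction

# Made-up joint distribution p(X, Y) over two values of X and two of Y.
joint = {
    ("x1", "y1"): Fraction(1, 10), ("x1", "y2"): Fraction(3, 10),
    ("x2", "y1"): Fraction(2, 10), ("x2", "y2"): Fraction(4, 10),
}

def marginal_x(x):
    """Sum rule: p(X=x) is the joint probability summed over all values of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(y, x):
    """Conditional probability p(Y=y | X=x) = p(X=x, Y=y) / p(X=x)."""
    return joint[(x, y)] / marginal_x(x)

# Product rule: p(X, Y) = p(Y | X) p(X) for every cell of the table.
product_rule_holds = all(
    conditional_y_given_x(y, x) * marginal_x(x) == p
    for (x, y), p in joint.items()
)
```

Using `Fraction` keeps the arithmetic exact, so the product rule can be checked with `==` rather than a tolerance.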
Illustration [figure: the joint distribution p(X,Y) for Y=1 and Y=2, the marginals p(X) and p(Y), and the conditional p(X|Y=1)]
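The rules of probability. Sum rule: $p(X) = \sum_{Y} p(X, Y)$. Product rule: $p(X, Y) = p(Y \mid X)\, p(X) = p(X \mid Y)\, p(Y)$. Bayes’ rule: $p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$, that is, posterior $\propto$ likelihood $\times$ prior.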
Application of probability rules. Assume P(Y=r) = 40%, P(Y=b) = 60%, P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%, P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%. Sum and product rules: p(X=a) = p(X=a,Y=r) + p(X=a,Y=b) = p(X=a|Y=r)p(Y=r) + p(X=a|Y=b)p(Y=b) = 0.25*0.4 + 0.75*0.6 = 11/20, so P(X=o) = 1 - 11/20 = 9/20. Bayes' rule: p(Y=r|X=o) = p(Y=r,X=o)/p(X=o) = p(X=o|Y=r)p(Y=r)/p(X=o) = 0.75*0.4 / (9/20) = 2/3.
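The arithmetic on this slide can be reproduced exactly with fractions. The probabilities are the ones given above; the dictionary layout and the function names `marginal` and `posterior` are just one assumed way to organize them.

```python
from fractions import Fraction

# Prior p(Y) and likelihood p(X | Y), using the values from the slide.
prior = {"r": Fraction(2, 5), "b": Fraction(3, 5)}
likelihood = {
    "r": {"a": Fraction(1, 4), "o": Fraction(3, 4)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}

def marginal(x):
    """Sum and product rules: p(X=x) = sum over y of p(X=x | Y=y) p(Y=y)."""
    return sum(likelihood[y][x] * prior[y] for y in prior)

def posterior(y, x):
    """Bayes' rule: p(Y=y | X=x) = p(X=x | Y=y) p(Y=y) / p(X=x)."""
    return likelihood[y][x] * prior[y] / marginal(x)

p_a = marginal("a")               # 11/20
p_o = marginal("o")               # 9/20
p_r_given_o = posterior("r", "o") # 2/3
```

Observing X=o raises the probability of Y=r from the prior 2/5 to the posterior 2/3.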
Mean and variance The mean of a random variable X is the average value X takes. The variance of X is a measure of how dispersed the values that X takes are. The standard deviation is simply the square root of the variance.
Simple example. X takes values in {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2. Mean: 0.8 × 1 + 0.2 × 2 = 1.2. Variance: 0.8 × (1 − 1.2)² + 0.2 × (2 − 1.2)² = 0.032 + 0.128 = 0.16. Standard deviation: √0.16 = 0.4.
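The same calculation can be written as two short functions over a discrete distribution. The dictionary representation (value mapped to probability) and the function names are assumptions for this sketch.

```python
import math
from fractions import Fraction

def mean(dist):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Variance: probability-weighted squared deviation from the mean."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

# The distribution from the slide: P(X=1) = 0.8, P(X=2) = 0.2.
dist = {1: Fraction(4, 5), 2: Fraction(1, 5)}

mu = mean(dist)                   # 6/5 = 1.2
var = variance(dist)              # 4/25 = 0.16
std = math.sqrt(float(var))       # square root of the variance
```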
Gaussian distribution
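The formula on this slide did not survive extraction. As a reminder, the standard univariate Gaussian density with mean mu and standard deviation sigma is exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)); the sketch below evaluates it and checks numerically that it integrates to about 1. The function name and the integration grid are assumptions for illustration.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a univariate Gaussian with mean mu and std deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Peak of the standard Gaussian, attained at the mean.
peak = gaussian_pdf(0.0)

# Crude Riemann sum over [-8, 8]; the tails beyond that are negligible.
step = 0.001
area = sum(gaussian_pdf(-8.0 + i * step) * step for i in range(16000))
```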
Multivariate Gaussian [figure: density plotted over axes x and y]
Basics of linear algebra
Matrix multiplication: the product of two matrices, C = AB. Special cases: vector-vector product, matrix-vector product.
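A minimal sketch of the product on nested lists follows, with the two special cases treated as products of suitably shaped matrices. The function name `matmul` and the example matrices are assumptions for illustration.

```python
def matmul(A, B):
    """Return C = AB, where C[i][j] = sum over k of A[i][k] * B[k][j]."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

C = matmul(A, B)                    # matrix-matrix product (2x2 times 2x2)
Ax = matmul(A, [[1], [1]])          # matrix-vector product (2x2 times 2x1)
uv = matmul([[1, 2]], [[3], [4]])   # vector-vector inner product (1x2 times 2x1)
```

The shapes must agree: an n-by-k matrix times a k-by-m matrix gives an n-by-m matrix, so the inner product of two vectors comes out as a 1-by-1 matrix.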
Rules of matrix multiplication
Vector norms
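The definitions on this slide were lost in extraction. The sketch below computes the three most commonly used vector norms (the 1-norm, the Euclidean 2-norm, and the infinity norm); which norms the slide actually listed is an assumption, as are the function names.

```python
def norm_p(x, p):
    """p-norm of a vector: (sum of |x_i|^p)^(1/p), for p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def norm_inf(x):
    """Infinity norm: the largest absolute entry."""
    return max(abs(v) for v in x)

x = [3.0, -4.0]

l1 = norm_p(x, 1)     # |3| + |-4|
l2 = norm_p(x, 2)     # sqrt(3^2 + 4^2)
linf = norm_inf(x)    # max(|3|, |-4|)
```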
Matrix norms and trace
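The formulas on this slide were lost in extraction. One standard link between the two topics is that the squared Frobenius norm of A equals the trace of A^T A; the sketch below checks this on a small matrix. Whether the slide stated this identity is an assumption, and the helper names are illustrative.

```python
import math

def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def frobenius_norm(M):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in M for v in row))

A = [[1.0, 2.0], [3.0, 4.0]]

fro = frobenius_norm(A)                   # sqrt(1 + 4 + 9 + 16)
tr_AtA = trace(matmul(transpose(A), A))   # trace of A^T A
```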
A bit more on matrix
Orthogonal matrix
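The slide body was lost in extraction. A square matrix Q is orthogonal when Q^T Q = I, which implies that multiplying by Q preserves vector lengths. The sketch below checks both facts for a 2-D rotation matrix; the choice of example and the helper names are assumptions.

```python
import math

def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Rotation by 30 degrees: a standard example of an orthogonal matrix.
theta = math.pi / 6
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

QtQ = matmul(transpose(Q), Q)   # should equal the 2x2 identity up to rounding

x = [[3.0], [4.0]]              # column vector of length 5
Qx = matmul(Q, x)
length_before = math.hypot(x[0][0], x[1][0])
length_after = math.hypot(Qx[0][0], Qx[1][0])
```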
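Square matrix – eigenvalue, eigenvector: $\lambda$ is an eigenvalue of a square matrix $A$ with eigenvector $x$ if $A x = \lambda x$, where $x \neq 0$.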
Symmetric matrix eigen-decomposition of A
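For a 2-by-2 symmetric matrix the eigen-decomposition can be written in closed form, which makes it easy to check that A equals the sum of lambda_i v_i v_i^T over its orthonormal eigenvectors. The function name `eig_sym_2x2`, the restriction to a nonzero off-diagonal entry, and the example matrix are assumptions of this sketch.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues and unit eigenvectors of [[a, b], [b, d]], assuming b != 0."""
    mid = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    pairs = []
    for lam in (mid + rad, mid - rad):
        v = (lam - d, b)            # solves (A - lam I) v = 0 when b != 0
        n = math.hypot(*v)
        pairs.append((lam, (v[0] / n, v[1] / n)))
    return pairs

# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1.
pairs = eig_sym_2x2(2.0, 1.0, 2.0)

# Rebuild A from its eigenvalues and eigenvectors.
rebuilt = [[sum(lam * v[i] * v[j] for lam, v in pairs) for j in range(2)]
           for i in range(2)]

# Eigenvectors of a symmetric matrix for distinct eigenvalues are orthogonal.
dot = sum(p * q for p, q in zip(pairs[0][1], pairs[1][1]))
```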
Singular value decomposition: $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal.
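The decomposition can be checked numerically with NumPy's `numpy.linalg.svd`. Using NumPy here is an assumption of this sketch (the slides do not name a library), and the matrix is made up.

```python
import numpy as np

# A made-up 3x2 matrix.
A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 1.0]])

# Reduced SVD: U is 3x2 with orthonormal columns, s holds the singular
# values in descending order, and Vt is the 2x2 orthogonal matrix V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as U * diag(s) * V^T.
rebuilt = U @ np.diag(s) @ Vt
```

With `full_matrices=False` the factor `U` is not square, so only its columns are orthonormal; passing `full_matrices=True` returns a square orthogonal `U` instead.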
Supervised learning – practical issues: underfitting and overfitting. Before introducing these important concepts, let us study a simple regression algorithm: linear regression.
Questions?