
SCE 5820: Machine Learning
Instructor: Jinbo Bi, Computer Science and Engineering Dept.

Course Information
Instructor: Dr. Jinbo Bi
Office: ITEB 233
Phone: 860-486-1458
Email: jinbo@engr.uconn.edu
Web: http://www.engr.uconn.edu/~jinbo/
Time: Tue / Thur. 2:00pm – 3:15pm
Location: BCH 302
Office hours: Thur. 3:15-4:15pm
HuskyCT: http://learn.uconn.edu (login with your NetID and password)

Introduction of the instructor and TA
Ph.D. in Mathematics
Research interests: machine learning, data mining, optimization, biomedical informatics, bioinformatics
[Figure labels: subtyping; GWAS; color of flowers; cancer, psychiatric disorders, …]
http://labhealthinfo.uconn.edu/EasyBreathing

Course Information
Prerequisite: Basics of linear algebra, calculus, optimization, and basics of programming
Course textbooks (not required):
Introduction to Data Mining (2005) by Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Pattern Recognition and Machine Learning (2006) by Christopher M. Bishop
Pattern Classification (2nd edition, 2000) by Richard O. Duda, Peter E. Hart and David G. Stork
Additional class notes and copied materials will be given
Reading material links will be provided

Course Information
Objectives:
Introduce students to the basic concepts of machine learning and to state-of-the-art machine learning algorithms
Focus on some high-demand application domains, with hands-on experience of applying data mining / machine learning techniques
Format: Lectures, micro-teaching assignment, quizzes, a term project

Grading
Micro-teaching assignment (1): 20%
In-class/in-lab open-book, open-notes quizzes (4-5): 40%
Term project (1): 30%
Participation: 10%
There is one term project per team. A team can consist of one or two students. Each student in the team needs to specify his/her roles in the project. Term projects can be chosen from a list of pre-defined projects.

Policy
Computers
Participation in micro-teaching sessions is very important, and by itself accounts for 50% of the credit for the micro-teaching assignment
Quizzes are graded by the instructor
Final term projects will be graded by the instructor
If you miss two quizzes, there will be a take-home quiz to make up the credits (missing one may be OK for your final grade)

Micro-teaching sessions
Students in our class need to form THREE roughly even study groups
The instructor will help to balance the study groups
Each study group will be responsible for teaching one specific topic chosen from the following:
Support Vector Machines
Spectral Clustering
Boosting (PAC learning model)

Term Project
Each team needs to give two presentations: a progress or preparation presentation (10-15 min) and a final presentation in the last week (15-20 min)
Each team needs to submit a project report covering:
Definition of the problem
Data mining approaches used to solve the problem
Computational results
Conclusion (success or failure)

Machine Learning / Data Mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information. ACM SIGKDD conference: http://www.kdd.org/kdd2013/
The ultimate goal of machine learning is the creation and understanding of machine intelligence. ICML conference: http://icml.cc/2013/
The main goal of statistical learning theory is to provide a framework for studying the problem of inference, that is, of gaining knowledge, making predictions, and making decisions from a set of data. NIPS conference: http://nips.cc/Conferences/2012/

Traditional Topics in Data Mining / AI
Fuzzy set and fuzzy logic: fuzzy if-then rules
Evolutionary computation: genetic algorithms, evolutionary strategies
Artificial neural networks: back propagation network (supervised learning), self-organization network (unsupervised learning, will not be covered)

Challenges in traditional techniques
Lack of theoretical analysis of the behavior of the algorithms
Traditional techniques may be unsuitable due to:
Enormity of data
High dimensionality of data
Heterogeneous, distributed nature of data
[Figure labels: Statistics / AI; Machine Learning / Pattern Recognition; Soft Computing]

Recent Topics in Data Mining
Supervised learning such as classification and regression (drawn from Machine Learning domains):
Support vector machines
Regularized least squares
Fisher discriminant analysis (LDA)
Graphical models (Bayesian nets)
Boosting algorithms

Recent Topics in Data Mining
Unsupervised learning such as clustering:
K-means
Gaussian mixture models
Hierarchical clustering
Graph-based clustering (spectral clustering)
Dimension reduction:
Feature selection
Compacting the feature space into a low-dimensional space (principal component analysis)

Statistical Behavior
There are many perspectives from which to analyze how an algorithm handles uncertainty
Simple examples:
Consistency analysis
Learning bounds (upper bound on the test error of the constructed model or solution)
"Statistical", not "deterministic": with probability p, the upper bound holds, i.e. $P(\text{test error} \le \text{Upper\_bound}) \ge p$

Tasks in Data Mining
Prediction tasks (supervised problem): Use some variables to predict unknown or future values of other variables.
Description tasks (unsupervised problem): Find human-interpretable patterns that describe the data.
From [Fayyad et al.] Advances in Knowledge Discovery and Data Mining, 1996

Classification: Definition
Given a collection of examples (the training set), each example contains a set of attributes, one of which is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen examples should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
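The train/test procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy data, the function names, and the choice of a 1-nearest-neighbor model are hypothetical and are not taken from the slides.

```python
import math
import random

def split_train_test(examples, test_fraction=0.3, seed=0):
    """Shuffle the examples and split them into a training and a test set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, attributes):
    """Return the class of the training example closest to `attributes`."""
    nearest = min(train, key=lambda ex: math.dist(ex[0], attributes))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test examples whose class is predicted correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical toy data: (attributes, class), two well-separated groups.
examples = [((0.0, 0.1), "A"), ((0.2, 0.0), "A"), ((0.1, 0.3), "A"),
            ((0.3, 0.2), "A"), ((0.2, 0.2), "A"),
            ((5.0, 5.1), "B"), ((5.2, 4.9), "B"), ((4.8, 5.0), "B"),
            ((5.1, 5.3), "B"), ((4.9, 4.8), "B")]

train_set, test_set = split_train_test(examples)
test_accuracy = accuracy(train_set, test_set)
print(f"accuracy on {len(test_set)} held-out examples: {test_accuracy:.2f}")
```

The model is built only from `train_set`; `test_set` is touched only when measuring accuracy, which is the separation the definition asks for.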

Classification Example
[Figure: a training set with two categorical attributes, one continuous attribute, and a class attribute is used to learn a classifier model, which is then applied to a test set.]

Classification: Application 1
High-Risk Patient Detection
Goal: Predict whether a patient will suffer a major complication after a surgical procedure
Approach:
Use patients' vital signs before and after the surgical operation (heart rate, respiratory rate, etc.)
Have expert medical professionals monitor patients and label which patients have complications and which do not
Learn a model for the class of after-surgery risk
Use this model to detect potential high-risk patients for a particular surgical procedure

Classification: Application 2 Face recognition Goal: Predict the identity of a face image Approach: Align all images to derive the features Model the class (identity) based on these features

Classification: Application 3 Cancer Detection Goal: To predict class (cancer or normal) of a sample (person), based on the microarray gene expression data Approach: Use expression levels of all genes as the features Label each example as cancer or normal Learn a model for the class of all samples

Classification: Application 4 Alzheimer's Disease Detection Goal: To predict class (AD or normal) of a sample (person), based on neuroimaging data such as MRI and PET Approach: Extract features from neuroimages Label each example as AD or normal Learn a model for the class of all samples Reduced gray matter volume (colored areas) detected by MRI voxel-based morphometry in AD patients compared to normal healthy controls.

Regression Predict a value of a real-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency. Extensively studied in statistics, neural network fields. Find a model to predict the dependent variable as a function of the values of independent variables. Goal: previously unseen examples should be predicted as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
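As a minimal sketch of the regression setup above, assuming a linear model of dependency with a single independent variable, the following fits a line by ordinary least squares on a training set and measures the error on a separate test set. The data values and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(slope, intercept, xs, ys):
    """Average squared difference between predictions and true values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, roughly following y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]
test_x = [5.0, 6.0]
test_y = [11.1, 12.9]

slope, intercept = fit_line(train_x, train_y)
test_mse = mean_squared_error(slope, intercept, test_x, test_y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.4f}")
```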

Regression application 1
Goal: Predict the possible loss from a customer
Past transaction records are labeled with the loss and form the training set used to learn a regressor model; the model is then used to predict the loss for current data (the test set).
Attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Loss (continuous target)

Tid  Refund  Marital Status  Taxable Income  Loss
1    Yes     Single          125K            100
2    No      Married         100K            120
3    ?       ?               70K             -200
4    ?       ?               120K            -300
5    ?       Divorced        95K             -400
6    ?       ?               60K             -500
7    ?       ?               220K            -190
8    ?       ?               85K             300
9    ?       ?               75K             -240
10   ?       ?               90K             90

(? marks a cell whose value is missing from this transcript.)

Regression applications Examples: Predicting sales amounts of new product based on advertising expenditure. Predicting wind velocities as a function of temperature, humidity, air pressure, etc. Time series prediction of stock market indices.

Clustering Definition Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that Data points in one cluster are more similar to one another. Data points in separate clusters are less similar to one another. Similarity Measures: Euclidean Distance if attributes are continuous. Other Problem-specific Measures
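One common way to search for such clusters with Euclidean distance is k-means, which alternates between assigning each point to its nearest center and moving each center to the mean of its points. The sketch below is illustrative only; the points, the initial centers, and the fixed iteration count are assumptions chosen for the example.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical 2-D points forming two groups, and two initial guesses for the centers.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)
print(clusters)
```

Points within a cluster end up close to their own center and far from the other one, which matches the "more similar within, less similar across" requirement in the definition.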

Illustrating Clustering Euclidean Distance Based Clustering in 3-D space. Intracluster distances are minimized Intercluster distances are maximized

Clustering: Application 1
High-Risk Patient Detection
Goal: Predict whether a patient will suffer a major complication after a surgical procedure
Approach:
Use patients' vital signs before and after the surgical operation (heart rate, respiratory rate, etc.)
Find patients whose symptoms are dissimilar from those of most other patients

Clustering: Application 2 Document Clustering: Goal: To find groups of documents that are similar to each other based on the important terms appearing in them. Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster. Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
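A term-frequency similarity of the kind described above might look like the following sketch. Cosine similarity is an assumption here (the slide does not name a specific measure), and the three short documents are invented for the example.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lower-cased, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents: two about finance, one about sports.
finance_1 = term_frequencies("stock market shares fall as market fears grow")
finance_2 = term_frequencies("market rally lifts stock prices")
sports_1 = term_frequencies("team wins final match after late goal")

sim_same_topic = cosine_similarity(finance_1, finance_2)
sim_cross_topic = cosine_similarity(finance_1, sports_1)
print(sim_same_topic, sim_cross_topic)
```

A clustering algorithm can then group documents using this similarity in place of Euclidean distance. A real pipeline would usually also filter very common words, as the "word filtering" remark on the next slide suggests.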

Illustrating Document Clustering Clustering Points: 3204 Articles of Los Angeles Times. Similarity Measure: How many words are common in these documents (after some word filtering).

Algorithms to solve these problems

Classification algorithms: K-Nearest-Neighbor classifiers, Naïve Bayes classifier, Neural Networks, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Decision Trees, Logistic Regression, Graphical models

Regression methods: Linear Regression, Ridge Regression, LASSO (Least Absolute Shrinkage and Selection Operator), Neural Networks

Clustering algorithms: K-Means, Hierarchical clustering, Graph-based clustering (Spectral clustering), Semi-supervised clustering, Others

Challenges of Data Mining: Scalability, Dimensionality, Complex and Heterogeneous Data, Data Quality, Data Ownership and Distribution, Privacy Preservation

Basics of probability
An experiment (random variable) is a well-defined process with observable outcomes.
The set or collection of all outcomes of an experiment is called the sample space, S.
An event E is any subset of outcomes from S.
When all outcomes are equally likely, the probability of an event is P(E) = number of outcomes in E / number of outcomes in S.
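The counting definition can be checked on a small example. The experiment below (rolling two fair dice and asking for a sum of 7) is an assumed illustration, not one taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered outcomes of rolling two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two dice sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# P(E) = |E| / |S|, valid because every outcome is equally likely.
p_event = Fraction(len(event), len(sample_space))
print(len(event), len(sample_space), p_event)
```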

Probability Theory
Apples and Oranges
X: identity of the fruit
Y: identity of the box
Assume P(Y=r) = 40%, P(Y=b) = 60% (prior)
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
Marginal: P(X=a) = 11/20, P(X=o) = 9/20
Posterior: P(Y=r|X=o) = 2/3, P(Y=b|X=o) = 1/3

Probability Theory Joint Probability Marginal Probability Conditional Probability Joint Probability

Probability Theory
Sum Rule: $p(X) = \sum_Y p(X, Y)$. The marginal probability of X equals the sum of the joint probability of X and Y with respect to Y.
Product Rule: $p(X, Y) = p(Y|X)\,p(X)$. The joint probability of X and Y equals the product of the conditional probability of Y given X and the probability of X.

Illustration
[Figure panels: p(X,Y) for Y=1 and Y=2; p(Y); p(X); p(X|Y=1)]

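The Rules of Probability
Sum Rule: $p(X) = \sum_Y p(X, Y)$
Product Rule: $p(X, Y) = p(Y|X)\,p(X) = p(X|Y)\,p(Y)$
Bayes' Rule: $p(Y|X) = \dfrac{p(X|Y)\,p(Y)}{p(X)}$
posterior $\propto$ likelihood × prior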

Application of Prob Rules
Assume P(Y=r) = 40%, P(Y=b) = 60%
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
p(X=a) = p(X=a,Y=r) + p(X=a,Y=b) = p(X=a|Y=r)p(Y=r) + p(X=a|Y=b)p(Y=b) = 0.25*0.4 + 0.75*0.6 = 11/20
P(X=o) = 9/20
p(Y=r|X=o) = p(Y=r,X=o)/p(X=o) = p(X=o|Y=r)p(Y=r)/p(X=o) = 0.75*0.4 / (9/20) = 2/3
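The same numbers can be reproduced with exact fractions. The sketch below applies the sum rule for the marginal and Bayes' rule for the posterior; the dictionary layout and function names are choices made for this illustration.

```python
from fractions import Fraction

# Prior over boxes: r = red, b = blue.
prior = {"r": Fraction(2, 5), "b": Fraction(3, 5)}

# Likelihood p(X | Y): fruit a = apple, o = orange, given the box.
likelihood = {
    "r": {"a": Fraction(2, 8), "o": Fraction(6, 8)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}

def marginal(x):
    """Sum rule: p(X=x) = sum over y of p(X=x | Y=y) p(Y=y)."""
    return sum(likelihood[y][x] * prior[y] for y in prior)

def posterior(y, x):
    """Bayes' rule: p(Y=y | X=x) = p(X=x | Y=y) p(Y=y) / p(X=x)."""
    return likelihood[y][x] * prior[y] / marginal(x)

print(marginal("a"), marginal("o"))              # 11/20 9/20
print(posterior("r", "o"), posterior("b", "o"))  # 2/3 1/3
```

Although the red box has the lower prior (40%), observing an orange raises its probability to 2/3, because oranges are much more common in the red box.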

Mean and Variance The mean of a random variable X is the average value X takes. The variance of X is a measure of how dispersed the values that X takes are. The standard deviation is simply the square root of the variance.

Simple Example
X takes values in {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2
Mean: 0.8 × 1 + 0.2 × 2 = 1.2
Variance: 0.8 × (1 – 1.2) × (1 – 1.2) + 0.2 × (2 – 1.2) × (2 – 1.2) = 0.16
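The calculation above generalizes to any discrete distribution. A short sketch, with the distribution stored as a hypothetical value-to-probability dictionary:

```python
import math

# Distribution from the example: P(X=1) = 0.8, P(X=2) = 0.2.
distribution = {1: 0.8, 2: 0.2}

# Mean: probability-weighted average of the values.
mean = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted average squared deviation from the mean.
variance = sum(p * (x - mean) ** 2 for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 6), round(variance, 6), round(std_dev, 6))
```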

The Gaussian Distribution

Gaussian Mean and Variance

The Multivariate Gaussian

References SC_prob_basics1.pdf (necessary) SC_prob_basic2.pdf Loaded to HuskyCT

Basics of Linear Algebra

Matrix Multiplication
The product of two matrices: C = AB
Special cases: vector-vector product, matrix-vector product
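As a small sketch of the definition, assuming matrices are stored as lists of rows, entry (i, j) of C = AB is the sum over k of A[i][k] * B[k][j]. The vector-vector and matrix-vector products then fall out as special cases with a single row or column.

```python
def matmul(A, B):
    """Return C = AB, where C[i][j] = sum_k A[i][k] * B[k][j]."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

matrix_matrix = matmul(A, B)                  # 2x2 times 2x2
matrix_vector = matmul(A, [[1], [1]])         # 2x2 times a column vector
vector_vector = matmul([[1, 2]], [[3], [4]])  # row vector times column vector

print(matrix_matrix, matrix_vector, vector_vector)
```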

Matrix Multiplication

Rules of Matrix Multiplication

Orthogonal Matrix

Square Matrix – EigenValue, EigenVector where

Symmetric Matrix – EigenValue EigenVector eigen-decomposition of A
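Assuming the standard definitions (an eigenpair satisfies $Av = \lambda v$, and a symmetric matrix can be written as $A = Q \Lambda Q^T$ with orthonormal eigenvectors in the columns of $Q$), the sketch below checks both properties on a hand-picked 2×2 symmetric matrix. The matrix and its eigenpairs are chosen for illustration and are not taken from the slides.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]

# Known eigenpairs of this A: eigenvalue 3 with (1, 1)/sqrt(2), eigenvalue 1 with (1, -1)/sqrt(2).
s = 1 / math.sqrt(2)
eigenpairs = [(3.0, [s, s]), (1.0, [s, -s])]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Check A v = lambda v for each pair.
residuals = [
    max(abs(av - lam * x) for av, x in zip(matvec(A, v), v))
    for lam, v in eigenpairs
]

# Rebuild A as the sum of lambda * v v^T, which equals Q Lambda Q^T.
reconstructed = [
    [sum(lam * v[i] * v[j] for lam, v in eigenpairs) for j in range(2)]
    for i in range(2)
]

print(residuals)
print(reconstructed)
```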

Matrix Norms and Trace Frobenius norm
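Assuming the usual definitions, the Frobenius norm is the square root of the sum of squared entries, the trace is the sum of the diagonal entries, and the two are linked by $\|A\|_F^2 = \mathrm{tr}(A^T A)$. A minimal check on an example matrix chosen for illustration:

```python
import math

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

A = [[1, 2], [3, 4]]
norm = frobenius_norm(A)
trace_of_gram = trace(matmul(transpose(A), A))
print(norm ** 2, trace_of_gram)
```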

Singular Value Decomposition
$A = U \Sigma V^T$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal

References SC_linearAlg_basics.pdf (necessary) SVD_basics.pdf loaded to HuskyCT

Summary
This is the end of the FIRST chapter of this course.
Next class: Cluster analysis (general topics, K-means)
The slides after this one are backup slides; you can also check them to learn more.