Advanced Artificial Intelligence Classification


Advanced Artificial Intelligence Classification Chung-Ang University, Hae-Cheon Kim. Reference #1: Pattern Classification (Richard O. Duda et al.) #2: Wikipedia (https://en.wikipedia.org/wiki/Statistical_classification) Hello, I am Hae-Cheon Kim from the Machine Intelligence Lab. In this presentation, I would like to talk about classification.

Contents 1. Introduction to Classification 2. List of classifiers: Rule-based, Decision Tree, K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, Neural Network 3. Applications 4. Q&A In this presentation I will introduce classification in artificial intelligence and then cover six representative classification methods. I will close with some simple applications of classification and a Q&A. Advanced Artificial Intelligence / Chung-Ang University / Hae-Cheon Kim

Introduction No Name Antigen-a Antigen-b Antigen-d Blood Type 1 Alice Ο × A 2 Bob B 3 Eve 4 Mallory 5 Trent O This is a dataset showing five people's blood types and their characteristics. As we already know, there are four blood types: A, B, O, and AB.

Introduction No Name Antigen-a Antigen-b Antigen-d Blood Type 1 Alice Ο × A 2 Bob B 3 Eve 4 Mallory 5 Trent O A B AB O Hae-Cheon × Ο … Now here is a characteristic of my blood: it has antigen-b and antigen-d, but it does not have antigen-a. What is my blood type? The process of answering that question is called classification.

Classification? There is a set of labels Y = {y_k | k = 1, …, q}. Given a test instance with feature vector x = (x_1, x_2, …, x_d), classification is the task of finding the most relevant label y_x for instance x. For example: test instance x = (×, O, O), labels Y = {A, B, AB, O}, h(×, O, O) = ? By definition, given a set of possible labels Y and a test instance with feature vector x, classification is the task of finding the most relevant label y_x for that instance.

Classification? In the example above, classification means finding Hae-Cheon's blood type when his feature vector is x = (×, O, O) and the answer lies in Y = {A, B, AB, O}. So how do we find it?

Classifier There is a set of classes Y = {y_k | k = 1, …, q}, and an instance has feature vector x = (x_1, x_2, …, x_d). A classifier is a function h(x) that predicts the class of instance x. Assumption: the training dataset is representative of the world. Sample dataset from the real world → Training → Classifier h(x). Input: Name Antigen-a Antigen-b Antigen-d — Hae-Cheon × Ο … Output. To find the answer we need a function, called a "classifier". To predict answers accurately, most classifiers are trained on a sample dataset drawn from the real world.

List of Classifiers Rule-based Decision Tree K-Nearest Neighbor Support Vector Machine Naïve Bayes Neural Network These are six representative classifiers, each of which learns from the given dataset in its own way. Let's see how each one classifies a given instance.

Rule-based Classifier Classify records using a collection of "if…then…" / "switch" rules. For a person P: If P has Antigen-a and not Antigen-b: blood type is A. If P has not Antigen-a and has Antigen-b: blood type is B. If P has neither Antigen-a nor Antigen-b: blood type is O. Otherwise (both antigens): blood type is AB. No training! First, the rule-based classifier. It uses hand-made rules such as "if…then…" or "switch". The rules are written directly by the programmer; there is no training step.

Rule-based Classifier Example: classification rules for the blood type, as above, with the decision diagram Anti-a / Anti-b → A, B, AB, O. In the blood-type problem the rule-based classifier encodes scientific facts as rules. When the classifier receives a new instance, it runs the rule code and classifies that instance.

Rule-based Classifier Name Antigen-a Antigen-b Antigen-d Blood Type — Hae-Cheon × Ο … The picture shows my blood type being classified according to the defined rules: this person does not have Antigen-a and has Antigen-b, so the blood type is B.
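The rules on this slide can be written directly as code. A minimal sketch in Python (the function name and boolean arguments are my own; the rules themselves are the ones from the slide):

```python
def blood_type(has_antigen_a: bool, has_antigen_b: bool) -> str:
    """Hand-written rules -- no training involved."""
    if has_antigen_a and not has_antigen_b:
        return "A"
    if not has_antigen_a and has_antigen_b:
        return "B"
    if not has_antigen_a and not has_antigen_b:
        return "O"
    return "AB"  # has both antigens

# Hae-Cheon: no antigen-a, has antigen-b
print(blood_type(False, True))  # -> B
```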

Decision Tree The second classifier is the decision tree. It uses a tree structure in which each internal node holds a splitting rule and each leaf node holds a class (label).

Decision Tree When a new instance comes in, classification starts at the root node, follows the branches according to each node's criterion until a leaf is reached, and returns the class stored at that leaf. It is therefore important to create efficient splitting rules.

Decision Tree Dataset: list of Titanic passengers (SibSp: number of siblings/spouses aboard). Name Age Gender SibSp … Survived — DiCaprio 24 Man 1 ×; Kate 17 Woman 4 Ο; Edward 62 … Goal: rules that are accurate and that split the data as cleanly as possible. To be efficient, the decision tree derives its splitting rules from the dataset; this process is what "training" means for a decision tree. For example, to predict which Titanic passengers survived, the classifier trains on a list of Titanic passengers.

Decision Tree CART algorithm (Classification And Regression Tree). The goal: maximize the information gain (IG) of each split, IG(D, f) = I(D) − Σ_{j=1}^{m} (N_j / N) · I(D_j). Impurity metrics: Gini impurity I_G(t) = Σ_{i=1}^{c} p_i(t)(1 − p_i(t)) = 1 − Σ_{i=1}^{c} p_i(t)², and entropy I_H(t) = −Σ_{i=1}^{c} p_i(t) log₂ p_i(t). How does the classifier find the splitting rules from the dataset? A typical algorithm for this is the Classification And Regression Tree (CART) algorithm.

Decision Tree CART algorithm (continued). The algorithm chooses the tree's splitting rules so as to maximize the information gain at each branch point. Typically, Gini impurity or entropy is used as the impurity measure I.
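The impurity metrics and the information gain formula above can be sketched in Python (pure standard library; the toy labels at the end are illustrative):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: I_G(t) = 1 - sum_i p_i(t)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: I_H(t) = -sum_i p_i(t) * log2 p_i(t)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children, impurity=gini):
    """IG(D, f) = I(D) - sum_j (N_j / N) * I(D_j)."""
    n = len(parent)
    return impurity(parent) - sum(len(ch) / n * impurity(ch) for ch in children)

labels = ["O", "O", "X", "X"]
split = [["O", "O"], ["X", "X"]]           # a perfect split: both children pure
print(information_gain(labels, split))     # parent gini 0.5, children 0 -> 0.5
```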

Decision Tree Using the CART algorithm, a fairly accurate decision tree can be learned from a given dataset. Here is an example tree that estimates the probability of kyphosis after surgery, given the age of the patient and the vertebra at which surgery was started.

Decision Tree For the same kyphosis example: the first figure shows the tree estimating whether kyphosis is present, the second is a 3D plot of the probability over the feature space, and the third is its 2D representation.

Nearest Neighbor Find the one training example most similar to x, i.e. closest to x. Call it x_similar; return the label y_k that matches x_similar. Third, the Nearest Neighbor classifier. Its principle is very simple: find the training example closest to the input, and output that example's label.

K-Nearest Neighbor Find the k training examples closest to x: x_{s_1}, x_{s_2}, …, x_{s_k}, with labels y_{s_1}, …, y_{s_k}. Return the aggregate of those labels: a majority vote for classification (or their average for regression). In the figure, if k = 3 the green object is classified as a triangle; if k = 5 it is classified as a square.
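A minimal k-NN sketch in Python, using Euclidean distance and a majority vote (the toy training set mirrors the triangle/square figure; all names here are my own):

```python
from collections import Counter

def euclidean(x, z):
    return sum((a - b) ** 2 for a, b in zip(x, z)) ** 0.5

def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x.
    `train` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0, 0), "triangle"), ((0, 1), "triangle"), ((1, 0), "triangle"),
         ((5, 5), "square"), ((5, 6), "square")]
print(knn_predict(train, (0.5, 0.5), k=3))  # -> triangle
```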

K-Nearest Neighbor Decision regions produced by K-Nearest Neighbor. These plots show the resulting regions in feature space, and they show why we use k-NN: the k-Nearest Neighbor boundary is smoother than the 1-Nearest Neighbor boundary because it averages over several neighbors.

K-Nearest Neighbor How do we calculate the distance between data points? The most common choice is the Minkowski distance, D(x, z) = (Σ_{i=1}^{d} |x_i − z_i|^p)^{1/p}, which includes the Manhattan distance (p = 1) and the Euclidean distance (p = 2). In a k-NN classifier the choice of distance measure is an important issue.

K-Nearest Neighbor The Manhattan distance is the case p = 1 and the Euclidean distance is the case p = 2. The figure on the right shows the set of points at distance 1 from the origin as p varies.
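The Minkowski distance is straightforward to implement, and the two named special cases fall out of the choice of p:

```python
def minkowski(x, z, p):
    """D(x, z) = (sum_i |x_i - z_i|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1.0 / p)

x, z = (0, 0), (3, 4)
print(minkowski(x, z, 1))  # Manhattan distance: 7.0
print(minkowski(x, z, 2))  # Euclidean distance: 5.0
```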

Support Vector Machine A set of classes Y = {y_k | k = 1, …, q}; an instance x = (x_1, x_2, …, x_d); a classifier h(x). Fourth, the Support Vector Machine. Its objective is to find a hyperplane in the d-dimensional feature space that cleanly separates the data points of the different classes.

Support Vector Machine For example, suppose two classes of data are distributed in feature space as in the first figure. You can then choose the position and orientation of a hyperplane that separates the two classes, as in the second figure.

Support Vector Machine The Support Vector Machine (SVM) finds the hyperplane that gives the maximum margin between the two classes, as in the third figure. Once that hyperplane is found, the classifier uses it to classify input instances.

Support Vector Machine Hyperplane equation: w·x + b = 0. Training finds the w and b that minimize ‖w‖ subject to y_i(w·x_i + b) ≥ 1 for every training example (x_i, y_i), with labels y_i ∈ {+1, −1}. Mathematically: among all hyperplanes that classify every training point with margin at least 1, we choose the one with the smallest ‖w‖; this is the maximum-margin hyperplane.
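Once w and b have been trained, classification is just a test of which side of the hyperplane a point falls on. A sketch (the weight values below are hypothetical, standing in for a trained model):

```python
def svm_predict(w, b, x):
    """Classify by the sign of w . x + b, i.e. which side of the
    hyperplane w . x + b = 0 the point x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score >= 0 else -1

w, b = (1.0, -1.0), 0.0   # hypothetical trained hyperplane: x1 = x2
print(svm_predict(w, b, (2.0, 0.5)))  # -> 1
print(svm_predict(w, b, (0.5, 2.0)))  # -> -1
```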

Support Vector Machine What if the dataset cannot be separated by a linear function? Then the classifier can use the kernel trick: applying a kernel mapping function lifts the dataset into a higher-dimensional space, where a better separating hyperplane may exist. The figure shows the dataset mapped into such a space (a Gaussian-shaped mountain).
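One commonly used kernel for this purpose is the Gaussian (RBF) kernel, which corresponds to the Gaussian-shaped mapping in the figure. A sketch (the gamma value is an arbitrary hyperparameter, not from the slides):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((0, 0), (0, 0)))  # identical points -> 1.0
print(rbf_kernel((0, 0), (3, 4)))  # distant points -> close to 0
```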

Naïve Bayes There is a set of classes Y = {y_k | k = 1, …, q} and an instance with feature vector x = (x_1, x_2, …, x_d). Find the label y_x such that arg max_{y_k ∈ Y} P(y_k | x) ≈ arg max_{y_k ∈ Y} P(y_k) Π_{i=1}^{d} P(x_i | y_k). Fifth, the Naïve Bayes classifier. It finds the most probable label by estimating the posterior probability from the prior probabilities obtained from the dataset.

Naïve Bayes Using Bayes' theorem, P(A | B) = P(B | A) · P(A) / P(B), find y_x such that arg max_{y_k ∈ Y} P(y_k | x) = arg max_{y_k ∈ Y} P(y_k) P(x | y_k) / P(x) = arg max_{y_k ∈ Y} P(y_k) P(x | y_k) ≈ arg max_{y_k ∈ Y} P(y_k) Π_{i=1}^{d} P(x_i | y_k). Here we apply Bayes' rule to derive the approximation. Since x is given, P(x) is the same for every class and can be ignored; and if we assume x_1, …, x_d are conditionally independent given y_k, we obtain the rightmost expression.
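The arg max above can be sketched as a function over precomputed probability tables. The numbers below are illustrative, not from the slides; `prior[y]` stands for P(y) and `likelihood[y][i][v]` for P(x_i = v | y):

```python
def naive_bayes_predict(prior, likelihood, x):
    """Return arg max_y P(y) * prod_i P(x_i | y)."""
    best_label, best_score = None, -1.0
    for y, p_y in prior.items():
        score = p_y
        for i, value in enumerate(x):
            score *= likelihood[y][i][value]  # P(x_i = value | y)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

prior = {"T": 2 / 5, "F": 3 / 5}
likelihood = {
    "T": [{"T": 1 / 2, "F": 1 / 2}] * 3,   # each P(x_i | T) = 1/2
    "F": [{"T": 2 / 3, "F": 1 / 3},
          {"T": 1 / 3, "F": 2 / 3},
          {"T": 2 / 3, "F": 1 / 3}],
}
print(naive_bayes_predict(prior, likelihood, ("F", "F", "T")))  # -> F
```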

Neural Network Perceptron (neuron): a perceptron is modeled on a human neuron. Linear function: Σ_{i=1}^{n} w_i x_i + b =: a. Non-linear function: f(a). Finally, the Neural Network, a classifier built from perceptrons connected like the neurons in a human brain. A perceptron reads the values of the previous neurons, combines them linearly, and then applies a non-linear function (called an activation function) to obtain better performance.

Neural Network Three layers: input layer (d neurons), hidden layer(s), output layer (q neurons); each neuron computes f(Σ_{i=1}^{n} w_i x_i + b) = y. A neural network has three or more layers: input, hidden, and output. The input layer must consist of d neurons and the output layer of q neurons.

Neural Network How to train: backpropagation (differentiation): w_ik ← w_ik − σ ∂Error/∂w_ik, where σ is the learning rate. A typical training method is backpropagation. It differentiates the error with respect to each weight and changes the weight in the direction that decreases the error.
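The update rule can be demonstrated on a single sigmoid neuron trained by gradient descent (a toy sketch; the squared-error loss, learning rate, data, and iteration count are my own choices):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_step(w, b, x, target, eta=0.5):
    """One gradient-descent update: w_i <- w_i - eta * dE/dw_i,
    for a sigmoid neuron with squared error E = (y - target)^2 / 2."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(a)
    delta = (y - target) * y * (1.0 - y)  # dE/da, using sigmoid'(a) = y(1-y)
    w = [wi - eta * delta * xi for wi, xi in zip(w, x)]
    b = b - eta * delta
    return w, b

w, b = [0.0, 0.0], 0.0
for _ in range(5000):
    w, b = train_step(w, b, (1.0, 1.0), 1.0)  # push output toward 1 for (1, 1)
    w, b = train_step(w, b, (0.0, 0.0), 0.0)  # push output toward 0 for (0, 0)

print(sigmoid(w[0] + w[1] + b) > 0.9)  # True: the neuron learned to fire for (1, 1)
```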

Neural Network Here is an example of classifying a new instance: feed it into the trained neural network, compute the output for each label, and select the label with the highest probability.

Applications Finally, classification is used in many problems, for example text categorization and image classification. Text categorization judges what kind of document something is, as in a spam filter. Image classification determines what object appears in an image; in the figure, the classifier decides using features extracted from the image.

Advanced Artificial Intelligence Classification Chung-Ang University, Hae-Cheon Kim Q&A

The slides after this point were not used in the presentation; they are extra material.

Naïve Bayes (worked example) x = (F, F, T); Y = {T, F}. Using arg max_{y_k ∈ Y} P(y_k) Π_{i=1}^{d} P(x_i | y_k): If y = T: P(class=T) · P(W=F | class=T) · P(X=F | class=T) · P(Y=T | class=T) = (2/5) · (1/2) · (1/2) · (1/2) = 1/20. If y = F: P(class=F) · P(W=F | class=F) · P(X=F | class=F) · P(Y=T | class=F) = (3/5) · (1/3) · (2/3) · (2/3) = 4/45. Since 1/20 < 4/45, the predicted class is F. Let's see the example: given the dataset in the table, what is the label of x = (F, F, T)? Using the approximate equation, we estimate the posterior score for each class and pick the larger one.
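The slide's arithmetic can be checked with exact fractions:

```python
from fractions import Fraction as F

# P(class=T) * P(W=F|class=T) * P(X=F|class=T) * P(Y=T|class=T)
p_true = F(2, 5) * F(1, 2) * F(1, 2) * F(1, 2)
# P(class=F) * P(W=F|class=F) * P(X=F|class=F) * P(Y=T|class=F)
p_false = F(3, 5) * F(1, 3) * F(2, 3) * F(2, 3)

print(p_true)            # 1/20
print(p_false)           # 4/45
print(p_true < p_false)  # True -> predicted class is F
```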