What is machine learning?


What is machine learning?

A very trivial machine learning tool: K-Nearest-Neighbors (KNN). The predicted class of the query sample depends on the voting among its k nearest neighbors.

[Scatter plot: training samples of two classes, O and X, with a query sample marked "?"]
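The voting scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's code; the function name and the toy data are invented for the example:

```python
from collections import Counter
import math

def knn_predict(query, samples, k):
    """Classify `query` by majority vote among its k nearest training samples.

    `samples` is a list of (point, label) pairs; points are coordinate tuples.
    """
    # Sort the training samples by Euclidean distance to the query.
    by_distance = sorted(samples, key=lambda s: math.dist(query, s[0]))
    # Count the labels of the k closest samples and return the winner.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D data: two classes, "O" and "X".
training = [((1, 1), "O"), ((1, 2), "O"), ((2, 1), "O"),
            ((5, 5), "X"), ((6, 5), "X"), ((5, 6), "X")]
print(knn_predict((1.5, 1.5), training, k=3))  # "O"
```

A query near the O cluster is outvoted by O neighbors, and likewise for X.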

When k = 3

[Scatter plot: the query is classified by voting among its 3 nearest neighbors]

When k = 5

[Scatter plot: the query is classified by voting among its 5 nearest neighbors]
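The point of the two slides above is that, on the same training data, k = 3 and k = 5 can vote differently. A small sketch showing this (the 1-D points and the function name are invented for the illustration):

```python
from collections import Counter
import math

def vote(query, samples, k):
    """Majority label among the k training samples closest to `query`."""
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Points on a line: two "X" samples close to the query, three "O" farther out.
data = [((1,), "X"), ((2,), "X"), ((3,), "O"), ((4,), "O"), ((5,), "O")]
print(vote((0,), data, k=3))  # "X": the 3 nearest vote 2-1 for X
print(vote((0,), data, k=5))  # "O": the 5 nearest vote 3-2 for O
```

So k is a genuine parameter of the method, a point the last slide returns to.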

Although KNN is very trivial, it can still learn something useful.

Example: in vitro fertilization
–Given: embryos described by 60 features
–Problem: selection of embryos that will survive
–Data: historical records of embryos and their outcomes

Given a set of known instances, predict the outcome for newly coming instances. So, KNN learnt something related to "the definition of embryo goodness".

Can machines really learn? Notice that here we call KNN a machine.

Definitions of "learning" from the dictionary:
–To get knowledge of by study, experience, or being taught
–To become aware by information or from observation
–To commit to memory
–To be informed of, ascertain; to receive instruction
These definitions are difficult to measure, and committing to memory is trivial for computers.

Operational definition:
–Things learn when they change their behavior in a way that makes them perform better in the future
Does a slipper learn?

Shortly speaking, machine learning is:

[Diagram: training data (a set of known instances) and testing data (a query instance) flow into a machine (e.g. KNN) holding knowledge/information, which delivers an outcome (the class of the query instance)]

Furthermore, learning is:

[Diagram as on the previous slide] When the training data increases, the machine delivers a better (e.g. higher accuracy) outcome.

Usually, we don't reinvent the wheel:

[Diagram as on the previous slides] Converting data (e.g. embryos) to vectors is not trivial.

Feature

Data representation

Format (for LIBSVM): each sample is one line of the form
–label index:value index:value …
–though this format is for LIBSVM, a famous implementation of the support vector machine (SVM), all other machine learning tools share the same concept

The label is the answer, or class, of a sample. The features together are also called the feature vector.
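Such a line can be parsed mechanically into a label and a sparse feature vector. A sketch, where the example line uses illustrative values rather than the slide's own numbers:

```python
def parse_libsvm_line(line):
    """Split one LIBSVM-format line into (label, {index: value})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return label, features

# Illustrative sample line: label 1, three sparse features.
label, features = parse_libsvm_line("1 1:0.5 2:0.25 3:0.1")
print(label, features)  # 1 {1: 0.5, 2: 0.25, 3: 0.1}
```

The dictionary keeps only the indices that appear, which is why the format suits sparse, high-dimensional data.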

Label and feature

The label is defined by the experts
–usually biologists in bioinformatics

Data representation is also called feature encoding or feature extraction
–you may not know which feature is important
–you may not have the key feature
–you need domain knowledge to design good features
–if you don't design new algorithms (most researchers don't), the only thing you can do is to design new features

Evaluation

Evaluation issues

Recall that in the KNN algorithm, predicting the class of a query sample requires comparing it to a collection of reference samples whose classes are known. This collection is called the training set, and these reference samples are called training samples. When evaluating, we also need to know the classes of the query samples, so that we can compare the answers with the predictions. These query samples with known classes are called the testing set, or testing samples.
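The bookkeeping this implies can be sketched with a simple random split and an accuracy score. The function names and split ratio are illustrative choices, not from the lecture:

```python
import random

def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle labelled samples and split them into training and testing sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predictions, answers):
    """Fraction of predictions that match the known answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

print(accuracy(["O", "X", "O", "X"], ["O", "X", "X", "X"]))  # 0.75
```

The testing samples' known classes are used only inside `accuracy`; the prediction step never sees them.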

Theoretically, the answer of the query is needless. Actually, it should not exist, or we would not need to predict at all. However, we always need to evaluate our methods/features, and thus we always have the answers of the testing set in this course.

Sample arrangement

How do we split n samples whose classes are known into training and testing sets? It gets worse if the algorithm has parameters:
–is KNN a method?
–are 3NN and 5NN different methods?
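One standard answer to the splitting question is k-fold cross-validation: partition the n samples into folds and let each fold serve once as the testing set, so a parameter setting (e.g. 3NN vs 5NN) is always scored on samples it was not trained on. A hypothetical sketch, not the course's prescribed procedure:

```python
def k_fold_splits(samples, folds=5):
    """Yield (training, testing) pairs; each fold serves once as the testing set."""
    for i in range(folds):
        testing = samples[i::folds]  # every folds-th sample, offset by i
        training = [s for j, s in enumerate(samples) if j % folds != i]
        yield training, testing

# Stand-ins for 10 labelled samples; each appears in exactly one testing set.
data = list(range(10))
for training, testing in k_fold_splits(data, folds=5):
    print(len(training), len(testing))  # 8 2 on every fold
```

Each candidate setting would then be ranked by its average accuracy across the folds, rather than by one lucky split.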