1
Theme Introduction: Learning from Data
Dr Gavin Brown, Machine Learning and Optimization Research Group
2
Learning from Data: Where does all this fit?
[Diagram: Learning from Data placed among related fields: Artificial Intelligence, Statistics / Mathematics, Data Mining, Computer Vision, Robotics.]
(No definition of a field is perfect; the diagram above is just one interpretation, mine ;-)
3
Learning from Data
The world is drowning in data.
• Book sales: Amazon makes 250,000 sales/deliveries per day
• Genetics: 100,000 genes sequenced while-u-wait (almost)
• Search: ~10 billion Google Images; 48 hours of video uploaded to YouTube every minute
• Health records: NHS plan to have 60m electronic records in place by 2015
This theme studies algorithms that enable us to extract meaning from data.
4
Learning from Data Data is recorded from some real-world phenomenon.
What might we want to do with that data?
• Prediction: what can we predict about this phenomenon?
• Description: how can we describe/understand this phenomenon in a new way?
5
Prediction / Description
• Period 1 (Oct/Nov): COMP61011 Foundations of Machine Learning (Prediction)
• Period 2 (Nov/Dec): COMP61021 Modeling & Visualization of High Dimensional Data (Description)
Lecturer: Dr Gavin Brown
6
Machine Learning and Data Mining
Spam emails: how can we predict if something is spam/genuine?
7
Machine Learning and Data Mining
Medical Records / Novel Drugs: What characteristics of a patient indicate they may react well/badly to a new drug? How can we predict whether it will potentially hurt rather than help them?
8
Building “Models” of the Data
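[Diagram: historical health records (features x1, x2 plus a label) are fed to a learning algorithm, which produces a model; given a new record (x1, x2) = (85.2, 160.3), the model outputs a predicted health status of 1 (healthy).]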
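The health-records pipeline on this slide (labelled historical records in, a model out, the model applied to a new record) can be sketched in a few lines. This is a minimal sketch, assuming a 1-nearest-neighbour rule (one of the methods on the COMP61011 syllabus) as the learning algorithm; the slide does not say which algorithm was used. The training records and the names `learn` and `TRAINING_RECORDS` are hypothetical, invented for illustration, and the course itself supports Matlab rather than Python.

```python
import math

# Hypothetical training records, invented for illustration only:
# each entry is ((x1, x2), label), with label 1 = healthy, 0 = not healthy.
TRAINING_RECORDS = [
    ((80.0, 165.0), 1),
    ((88.0, 158.0), 1),
    ((120.0, 140.0), 0),
    ((115.0, 150.0), 0),
]


def learn(records):
    """'Learning' for 1-nearest-neighbour is simply storing the labelled examples."""
    stored = list(records)

    def model(x):
        # Predict the label of the stored record closest to x (Euclidean distance).
        _, label = min(stored, key=lambda rec: math.dist(rec[0], x))
        return label

    return model


model = learn(TRAINING_RECORDS)
print(model((85.2, 160.3)))  # nearest stored record is (88.0, 158.0), so prints 1
```

The "model" here is just a function that closes over the stored data; a linear model or decision tree would instead compress the records into a few parameters before predicting.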
9
Building “Models” of the Data
(Week 1, 9am) (Weeks 3-4)
10
Prediction / Description
• Period 1 (Oct/Nov): COMP61011 Foundations of Machine Learning (Prediction)
• Period 2 (Nov/Dec): COMP61021 Modeling & Visualization of High Dimensional Data (Description)
Lecturer: Dr Ke Chen
11
Modeling and Visualization of High Dimensional Data
Gene Maps: The human body has about 24,000 active genes. Soon you will be able to buy your own gene map for a few hundred pounds. How can we visualize this?
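One standard answer to "how can we visualize this?" is principal component analysis (PCA), listed on the COMP61021 syllabus: project each data point onto the direction of greatest variance. The sketch below is a minimal illustration, not the course's implementation. It assumes small dense data given as lists of numbers, finds the top eigenvector of the sample covariance matrix by power iteration, and assumes that eigenvector's eigenvalue is unique and that the all-ones start vector is not orthogonal to it. All function names and the toy data are hypothetical.

```python
import math


def mean_centre(rows):
    """Subtract each column's mean so the data is centred on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centred):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply a start vector by the matrix and normalise.
    Assumes the largest eigenvalue is unique and the all-ones start vector
    is not orthogonal to its eigenvector."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def first_principal_component(rows):
    """Return the direction of greatest variance and each row's 1-D coordinate along it."""
    centred = mean_centre(rows)
    component = top_eigenvector(covariance(centred))
    scores = [sum(c * x for c, x in zip(component, row)) for row in centred]
    return component, scores


# Toy 2-D data (invented) lying exactly on the line x2 = 2 * x1.
component, scores = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(component)  # a unit vector proportional to (1, 2)
print(scores)     # one coordinate per point, summing to zero
```

For real gene-expression data the same idea applies with thousands of columns; keeping the top two or three components gives coordinates that can be plotted directly.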
12
Modeling and Visualization of High Dimensional Data
Image processing: Gesture recognition. How can we represent the motion of a human with so many complex joints and angles?
13
Prerequisite knowledge
(week 1, 9am)
• Vectors
• Matrix properties, e.g. determinant, rank, inverse
• Vector space properties, e.g. orthonormal basis
• Eigenvectors and eigenvalues
• Matrix calculus, e.g. derivatives in matrix form
• Optimisation basics, e.g. Lagrange multipliers
14
Learning from Data: Prerequisites
MATHEMATICS: This is a mathematical subject. You must be comfortable with probabilities and algebra.
PROGRAMMING: You must be able to program, and pick up a new language relatively easily. We provide support for Matlab.
15
Matlab (MATrix LABoratory): an interactive scripting language
• Interpreted (i.e. no compiling)
• Objects possible, not compulsory
• Dynamically typed
• Flexible GUI / plotting framework
• Large libraries of tools
• Highly optimized for maths
Available free from the Uni, but usable only when connected to our network (e.g. via VPN). Module-specific software is supported on School machines only.
16
Learning from Data: Why NOT to do this!
• If you don’t like maths. 61011 is not just “reasonably challenging”, it is HARD. Another valid name for machine learning is “Computational Statistics”.
• If you are not a confident programmer. This is an MSc in computer science. You HAVE to be able to code well. You are highly likely to fail this unit if you cannot. People did last year.
• If you have the “I want to use machine learning to do X” syndrome. This is a real technical subject. It’s not magic.
BTW… You will learn nothing about “Big Data”, or how to deal with it.
17
Syllabus
COMP61011 (Foundations of Machine Learning)
• Linear Models
• Support Vector Machines
• Nearest Neighbour Methods
• Decision Trees
• Combining Models: ensemble methods, mixtures of experts, boosting
• Feature Selection
• Probabilistic Classifiers and Bayes’ Theorem
• Algorithm assessment: overfitting, generalisation, comparing two algorithms
COMP61021 (Modeling and Visualizing High Dimensional Data)
• Background/introduction
• Mathematics basics
• Principal component analysis (PCA)
• Linear discriminant analysis (LDA)
• Self-organising map (SOM)
• Multi-dimensional scaling (MDS)
• Isometric feature mapping (ISOMAP)
• Locally linear embedding (LLE)
18
Textbooks: not a compulsory purchase. Notes will be provided in class.
“Introduction to Machine Learning” by Ethem Alpaydin