Published by Björn Beltz. Modified over 5 years ago.
1
Welcome everyone. Been to good sessions, exciting ones coming up. My first SQL Bits session. Introduction to me.
2
Machine Learning: The Maths Behind. My introduction to ML.
Data analytics world getting more interested in ML. Noticed a trend: ML treated as a black box. A shame. Need to understand it to use it effectively. Otherwise more frustration, and missing the fun. Things I want to put right today. To show how quick it can be: 3 algorithms. All 3 popular, used before, will use again.
3
Quick overview – classic classification problem.
Variables & class.
4
Classification. Supervised: we know if it's a dog or a cat.
[Plot axes: height, weight.]
5
Decision Tree Classification. Widely used, popular, easy to understand.
6
Would I have survived?
7
All passengers 500 : 809 (survived : died). Prediction: Died
8
All passengers 500 : 809 (survived : died)
  Men 142 : 640, Died
  Women 308 : 112, Survived
  Children 50 : 57, Died
9
All passengers 500 : 809 (survived : died)
  Men 142 : 640
    1st Class 55 : 114, Died
    2nd Class 22 : 138, Died
    3rd Class 65 : 388, Died
  Women 308 : 112
    1st Class 124 : 4, Survived
    2nd Class 85 : 12, Survived
    3rd Class 99 : 96, Survived
  Children 50 : 57
    1st Class 21 : 5, Survived
    2nd Class 12 : 8, Survived
    3rd Class 17 : 44, Died
10
14%
[Same decision tree as slide 9: all passengers split into Men, Women and Children, then each group split by class.]
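The tree on these slides can be read as a lookup: find the leaf for a passenger and predict the majority outcome there. Below is a minimal Python sketch of that idea. The counts are the survived : died figures from the slides; the function names `predict` and `survival_rate` are made up for illustration. As a side note, the survival rate for men in 3rd class comes out at about 14%, which may be what the "14%" on the slide refers to, though the slide does not say so.

```python
# (survived, died) counts per leaf, as read off the slides.
counts = {
    ("Men", "1st"): (55, 114),
    ("Men", "2nd"): (22, 138),
    ("Men", "3rd"): (65, 388),
    ("Women", "1st"): (124, 4),
    ("Women", "2nd"): (85, 12),
    ("Women", "3rd"): (99, 96),
    ("Children", "1st"): (21, 5),
    ("Children", "2nd"): (12, 8),
    ("Children", "3rd"): (17, 44),
}


def survival_rate(group, pclass):
    """Fraction of passengers in this leaf who survived."""
    survived, died = counts[(group, pclass)]
    return survived / (survived + died)


def predict(group, pclass):
    """Predict the majority outcome of the leaf."""
    survived, died = counts[(group, pclass)]
    return "Survived" if survived > died else "Died"


print(predict("Men", "3rd"), round(100 * survival_rate("Men", "3rd")), "%")
```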
11
Classification Demo
12
Which variable do we split on? The best variable.
Variables: pclass, age, name, cabin, fare, boat, body, parch, ticket, survived, embarked, home.dest, sex, sibsp
13
The best variable.
Variables: pclass, age, name, cabin, fare, boat, body, parch, ticket, survived, embarked, home.dest, sex, sibsp
14
The Gini Index
$Gini = 1 - \sum_{i=1}^{C} (p_i)^2$
Here $C$ is the number of classes and $p_i$ is the proportion of class $i$ in the group. Gini index originally used to measure income inequality. Derived in 1912 and still in use. A low Gini index is desired.
15
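$g(P_s) = 1 - p_{s,1}^2 - p_{s,2}^2 \approx 0.4721$
$g(P_m) = 1 - p_{m,1}^2 - p_{m,2}^2 \approx 0.3090$
$g(P_f) = 1 - p_{f,1}^2 - p_{f,2}^2 \approx 0.3965$
$G = g(P_s) - \frac{n_m}{n} \times g(P_m) - \frac{n_f}{n} \times g(P_f) \approx 0.132$
Here $p_{\cdot,1}$ and $p_{\cdot,2}$ are the survived and died proportions within the full set ($s$), the male group ($m$) and the female group ($f$); $n_m / n$ and $n_f / n$ are the fractions of passengers in the male and female groups.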
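The Gini calculation for the split on sex can be checked with a few lines of Python. This is a sketch under stated assumptions: the root counts (500 survived, 809 died) come from the earlier slides, while the male and female counts are not shown in this deck and are assumed from the commonly distributed Titanic dataset. With those counts the results agree with the slide's values to the precision shown.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_i^2), given a sequence of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# (survived, died) counts.
all_passengers = (500, 809)  # from the slides
male = (161, 682)            # ASSUMED: not shown in the deck
female = (339, 127)          # ASSUMED: not shown in the deck

n = sum(all_passengers)
g_s = gini(all_passengers)
g_m = gini(male)
g_f = gini(female)

# Gain = parent impurity minus the size-weighted impurity of the two children.
gain = g_s - sum(male) / n * g_m - sum(female) / n * g_f

print(round(g_s, 4), round(g_m, 4), round(g_f, 4), round(gain, 3))
```

The variable with the largest gain is the one the tree splits on first.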
16
[Tree diagram: root split on sex (Gini gain 0.132), with branches sex = male and sex = female. Remaining candidate variables in each branch: fare, pclass, sibsp, parch, age, embarked.]
17
Next split: pclass. Continue recursively until a subgroup reaches the minimum size or no split gives an improvement. The algorithm is greedy. Cross-validation is performed to trim (prune) the tree.
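The recursive, greedy procedure described here can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the demo: it handles categorical variables only, stops on a minimum group size or when no split reduces impurity, and leaves out the cross-validated pruning step. The function names and the six toy rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Greedy step: return (gain, feature) with the largest Gini gain."""
    parent = gini(labels)
    best_gain, best_feature = 0.0, None
    for f in features:
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        gain = parent - weighted
        if gain > best_gain + 1e-12:  # small tolerance against rounding noise
            best_gain, best_feature = gain, f
    return best_gain, best_feature


def build_tree(rows, labels, features, min_size=2):
    """Return a leaf (majority label) or a (feature, children) node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(labels) < min_size or not features:
        return majority
    _, f = best_split(rows, labels, features)
    if f is None:  # no split improves on the current group
        return majority
    remaining = [g for g in features if g != f]
    children = {}
    for value in {row[f] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, min_size
        )
    return (f, children)


def predict(tree, row):
    """Walk from the root to a leaf; fails on values unseen in training."""
    while isinstance(tree, tuple):
        feature, children = tree
        tree = children[row[feature]]
    return tree


# Hypothetical toy data, not taken from the Titanic dataset.
rows = [
    {"sex": "male", "pclass": 1}, {"sex": "male", "pclass": 2},
    {"sex": "male", "pclass": 3}, {"sex": "female", "pclass": 1},
    {"sex": "female", "pclass": 2}, {"sex": "female", "pclass": 3},
]
labels = ["died", "died", "died", "survived", "survived", "died"]
tree = build_tree(rows, labels, ["sex", "pclass"])
print(tree[0], predict(tree, {"sex": "female", "pclass": 1}))
```

On the toy data the first split is on sex, because it gives a larger gain than pclass; the female branch is then split again on pclass, while the male branch is already pure and becomes a leaf.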
18
Used a lot in data analytics.
Similar to classification.
19
Regression
[Plot axes: profit against time.]
20
Regression: Support Vector. SVR is similar to SVM and is getting more popular.
Works well with non-linear data.
21
How much is my car worth?
22
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
$b = r \frac{\sigma_y}{\sigma_x}$
$a = \bar{y} - b \bar{x}$
$y = a + bx$
Uses a kernel function, in this case the radial basis function (RBF) kernel.
[Plot: price (€), 0 to 15,000, against year, 1980 to 2015.]
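The straight-line formulas on this slide translate directly into code. The sketch below computes the correlation r, then the slope b = r σ_y / σ_x and the intercept a = ȳ − b x̄, using only the Python standard library. The function name `fit_line` and the four year/price pairs are hypothetical and only there to make the example runnable.

```python
from math import sqrt
from statistics import mean, stdev


def fit_line(xs, ys):
    """Return (a, b, r) for the line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)          # correlation coefficient
    b = r * stdev(ys) / stdev(xs)      # slope
    a = y_bar - b * x_bar              # intercept
    return a, b, r


# Hypothetical car data: registration year and price in euros.
years = [1985, 1995, 2005, 2015]
prices = [1500, 4000, 9500, 12500]

a, b, r = fit_line(years, prices)
print(f"price = {a:.0f} + {b:.1f} * year, r = {r:.3f}")
print(f"estimated price for 2011: {a + b * 2011:.0f}")
```

The slope obtained this way equals the usual least-squares slope, because r σ_y / σ_x simplifies to Σ(x − x̄)(y − ȳ) / Σ(x − x̄)².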
23
[Plot: price (€) against year, 1980 to 2015. Marked years: 2003 and 2011. Marked prices: 3,000, 10,000 and 15,000.]
24
[Plot: price (€), 0 to 15,000, against year, 1980 to 2015.]
25
[Plot: price (€), 0 to 15,000, against year, 1980 to 2015, with the margin lines $f(x) + \varepsilon$ and $f(x) - \varepsilon$.]
Points outside the margin are penalised. Principle of maximal margin. Important to prevent overfitting.
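The margin idea can be made concrete with two small functions. The first is the ε-insensitive loss commonly used in support vector regression: predictions within ε of the true value cost nothing, and points outside the margin are penalised in proportion to how far outside they fall. The second is a one-dimensional RBF kernel, the kernel named on the earlier slide. Both are minimal sketches; the function names, the `gamma` parameter and the example numbers are assumptions for illustration and are not taken from the demo.

```python
from math import exp


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the margin [f(x) - eps, f(x) + eps], linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def rbf_kernel(x1, x2, gamma=1.0):
    """RBF similarity of two scalar inputs: 1 when equal, falling towards 0."""
    return exp(-gamma * (x1 - x2) ** 2)


# Hypothetical prices in euros with a margin of 500.
print(eps_insensitive_loss(10_000, 10_400, eps=500))  # inside the margin
print(eps_insensitive_loss(10_000, 11_200, eps=500))  # outside the margin
print(rbf_kernel(2011, 2011), rbf_kernel(2011, 2003, gamma=0.01))
```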
27
Regression Demo
28
[Plot: price (€), 0 to 200,000, against year, 1980 to 2015.]
29
A third dimension, then a fourth, fifth, sixth, and so on.
30
Netflix clustering. Similar to HBO.
31
Clustering
32
K-Means Clustering. K-Means: most popular, released decades ago, improvements since.
33
What new music can I discover?
35
$\text{Euclidean distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
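Euclidean distance is the building block of K-Means: each point is assigned to its nearest centroid, then each centroid is moved to the mean of its points, and the two steps repeat until nothing changes. Below is a minimal standard-library sketch of that loop. Taking the first k points as starting centroids is a simplifying assumption made here so the example is deterministic (random initialisation is more common), and the six two-dimensional points are hypothetical stand-ins for song features.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, iterations=100):
    """Return (centroids, clusters) after running the K-Means loop."""
    centroids = [tuple(p) for p in points[:k]]  # ASSUMPTION: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-feature points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```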
55
Clustering Demo. Better algorithms for this problem: Gaussian mixture models, hierarchical clustering.
56
We’ve gone through 3 algorithms in 1 hour.
List of good sources…
57
github.com/matt-willis
alex.smola.org/papers/2004/SmoSch04.pdf
data-flair.training/blogs
cowlet.org
hackernoon.com
r4ds.had.co.nz
blogs.adatis.co.uk
towardsdatascience.com
archive.ics.uci.edu/ml/datasets.html
elitedatascience.com
trevorstephens.com
kernelsvm.tripod.com
wiki.icub.org/images/8/82/OnlineSVR_Thesis.pdf
58
matt.willis@adatis.bg Email me. Speak to me.
I hope you’ve learnt something. More importantly, changed your approach.