Welcome everyone. Been to good sessions, exciting ones coming up.


1 Welcome everyone. Been to good sessions, exciting ones coming up. My first SQL Bits session. Introduction to me.

2 Machine Learning: The Maths Behind. My introduction to ML.
Data analytics world getting more interested in ML. Noticed a trend: black box. A shame. Need to understand it to use it effectively. More frustration. Missing the fun. Things I want to put right today. To show how quick, 3 algorithms. 3 popular ones, used before, will use again.

3 Quick overview – classic classification problem.
Variables & class.

4 Classification. Supervised – we know if it’s a dog or a cat. [Plot axes: height, weight]

5 Decision Tree Classification. Widely used, popular, easy to understand.

6 Would I have survived?

7 All passengers 500 : 809 Died

8 All passengers 500 : 809
    Men 142 : 640 (Died)
    Women 308 : 112 (Survived)
    Children 50 : 57 (Died)

9 All passengers 500 : 809
    Men 142 : 640
        1st Class 55 : 114 (Died)
        2nd Class 22 : 138 (Died)
        3rd Class 65 : 388 (Died)
    Women 308 : 112
        1st Class 124 : 4 (Survived)
        2nd Class 85 : 12 (Survived)
        3rd Class 99 : 96 (Survived)
    Children 50 : 57
        1st Class 21 : 5 (Survived)
        2nd Class 12 : 8 (Survived)
        3rd Class 17 : 44 (Died)

10 [Same tree as slide 9, annotated with 14%]
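The tree on slide 9 can be read as a lookup: pick the branch, then the class, and take the majority outcome in that leaf. The sketch below copies the leaf counts from the slide and assumes each pair is survived : died; the function name `predict` is made up for illustration.

```python
# Leaf counts copied from the slide 9 tree, read as (survived, died).
leaves = {
    ("Men", "1st Class"): (55, 114),
    ("Men", "2nd Class"): (22, 138),
    ("Men", "3rd Class"): (65, 388),
    ("Women", "1st Class"): (124, 4),
    ("Women", "2nd Class"): (85, 12),
    ("Women", "3rd Class"): (99, 96),
    ("Children", "1st Class"): (21, 5),
    ("Children", "2nd Class"): (12, 8),
    ("Children", "3rd Class"): (17, 44),
}


def predict(group, pclass):
    """Return the majority label and the survival rate for one leaf."""
    survived, died = leaves[(group, pclass)]
    rate = survived / (survived + died)
    label = "Survived" if survived > died else "Died"
    return label, rate


for group, pclass in leaves:
    label, rate = predict(group, pclass)
    print(f"{group:8} {pclass}: {label:8} ({rate:.0%} survived)")
```

Men in 2nd class (22 of 160) and men in 3rd class (65 of 453) both come out at roughly 14%, so the 14% on slide 10 presumably refers to one of those leaves.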

11 Classification Demo

12 Which variable do we split on? The best variable.
pclass, age, name, cabin, fare, boat, body, parch, ticket, survived, embarked, home.dest, sex, sibsp

13 The best variable.
pclass, age, name, cabin, fare, boat, body, parch, ticket, survived, embarked, home.dest, sex, sibsp

14 The Gini Index
$Gini = 1 - \sum_{i=1}^{C} (p_i)^2$
Gini index originally used to calculate income inequality. Derived 1912 – present. Low Gini index is desired.

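15 $g(P_s) = 1 - (500/1309)^2 - (809/1309)^2 \approx 0.4721$
$g(P_m) = 1 - \ldots - \ldots \approx 0.3090$
$g(P_f) = 1 - \ldots - \ldots \approx 0.3965$
$G = 0.4721 - \ldots \times 0.3090 - \ldots \times 0.3965 \approx 0.132$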
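The Gini figures on this slide can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the root counts (500 : 809) are from the slides, but the male and female counts below are not; they are taken from the widely distributed 1,309-passenger Titanic data set and are used here because they give the slide's values. The function names are illustrative.

```python
def gini(counts):
    """Gini index 1 - sum(p_i^2) for a list of class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def gini_gain(parent, children):
    """Parent Gini minus the size-weighted Gini of the child groups."""
    total = sum(parent)
    weighted = sum(sum(child) / total * gini(child) for child in children)
    return gini(parent) - weighted


# Counts are (survived, died).
all_passengers = (500, 809)  # from the slides
male = (161, 682)            # ASSUMPTION: not on the slides
female = (339, 127)          # ASSUMPTION: not on the slides

print(round(gini(all_passengers), 4))                       # g(P_s)
print(round(gini(male), 4))                                 # g(P_m)
print(round(gini(female), 4))                               # g(P_f)
print(round(gini_gain(all_passengers, [male, female]), 3))  # G
```

With these counts the four values round to 0.4721, 0.309, 0.3965 and 0.132, matching the slide.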

16 sex = 0.132. Split into sex = male and sex = female. Variables left to consider in each branch: fare, pclass, sibsp, parch, age, embarked.

17 pclass. Continue recursively until a subgroup reaches the minimum size or there is no improvement. Greedy. Cross-validation performed to trim (prune) the tree.
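The greedy, recursive procedure can be sketched in plain Python. This is a minimal illustration for categorical variables only, not the implementation used in the demo: the toy rows are invented, `min_size` is a hypothetical parameter name, and the cross-validation trimming step is left out.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (gain, feature) for the feature with the largest Gini gain."""
    best = (0.0, None)
    for feature in features:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        gain = gini(labels) - weighted
        if gain > best[0]:
            best = (gain, feature)
    return best


def build_tree(rows, labels, features, min_size=2):
    """Greedy recursive splitting; a leaf is the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(labels) < min_size or not features:
        return majority
    gain, feature = best_split(rows, labels, features)
    if feature is None:  # no split improves the Gini index
        return majority
    remaining = [f for f in features if f != feature]
    branches = {}
    for value in {row[feature] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        branches[value] = build_tree(sub_rows, sub_labels, remaining, min_size)
    return {feature: branches}


# Toy data, invented for illustration only.
rows = [
    {"sex": "female", "pclass": 1}, {"sex": "female", "pclass": 3},
    {"sex": "female", "pclass": 1}, {"sex": "male", "pclass": 1},
    {"sex": "male", "pclass": 3}, {"sex": "male", "pclass": 3},
]
labels = ["survived", "survived", "survived", "died", "died", "died"]
tree = build_tree(rows, labels, ["pclass", "sex"])
print(tree)
```

On the toy data the first split is on sex, and each branch stops immediately because no further split lowers the Gini index.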

18 Used a lot in data analytics.
Similar to classification.

19 Regression [Plot axes: profit, time]

20 Support Vector Regression. SVR is similar to SVM and is getting more popular.
Works well with non-linear data.

21 How much is my car worth?

22 $y = a + bx$
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
$b = r \frac{\sigma_y}{\sigma_x}$
$a = \bar{y} - b\bar{x}$
[Plot: price (€), 0 to 15,000, against year, 1980 to 2015]
Uses a kernel function, in this case the radial basis function (RBF) kernel.
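The straight-line formulas on this slide translate directly into code. The sketch below computes r, then b from r and the two standard deviations, then a; the (year, price) pairs are invented for illustration and are not the data behind the slide's plot.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = a + b*x, computed via r, sigma_x and sigma_y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    sigma_x = math.sqrt(sxx / n)     # standard deviation of x
    sigma_y = math.sqrt(syy / n)     # standard deviation of y
    b = r * sigma_y / sigma_x        # slope
    a = y_bar - b * x_bar            # intercept
    return a, b, r


# Hypothetical (year, price in €) pairs, invented for illustration only.
years = [1980, 1990, 2000, 2010, 2015]
prices = [1000, 3000, 6000, 11000, 15000]
a, b, r = fit_line(years, prices)
print(f"price = {a:.0f} + {b:.1f} * year (r = {r:.3f})")
```

A straight line is only the baseline here; the kernel mentioned on the slide is what lets SVR follow a curved price trend.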

23 [Plot: price (€) against year, 1980 to 2015; marked prices 3,000, 10,000 and 15,000; marked years 2003 and 2011]

24 [Plot: price (€), 0 to 15,000, against year, 1980 to 2015]

25 [Plot: price (€), 0 to 15,000, against year, 1980 to 2015, with margin lines $f(x) + \varepsilon$ and $f(x) - \varepsilon$]
Points outside the margin are penalised. Principle of maximal margin. Important to prevent overfitting.
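Two pieces of SVR mentioned on these slides are small enough to write out: the penalty for points outside the margin and the radial basis function kernel. The sketch below shows the standard epsilon-insensitive loss and a scalar RBF kernel; the parameter names `epsilon` and `gamma` and the example prices are assumptions for illustration, and the optimisation that actually fits an SVR model is not shown.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the band f(x) +/- epsilon, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x1, x2, gamma=1.0):
    """Radial basis function kernel exp(-gamma * (x1 - x2)^2) for scalars."""
    return math.exp(-gamma * (x1 - x2) ** 2)


# Hypothetical prices in €, with a margin of 500 either side of f(x).
print(epsilon_insensitive_loss(10000, 10400, epsilon=500))  # inside the margin
print(epsilon_insensitive_loss(10000, 11200, epsilon=500))  # outside the margin
print(rbf_kernel(2003, 2003), rbf_kernel(2003, 2011, gamma=0.01))
```

Points inside the margin cost nothing, which is why small wiggles in the data do not pull the fitted curve around.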

26

27 Regression Demo

28 [Plot: price (€), 0 to 200,000, against year, 1980 to 2015]

29 A third dimension, a fourth, a fifth, a sixth …and so on.

30 Netflix clustering. Similar to HBO.

31 Clustering

32 K-Means Clustering K-Means – most popular, released decades ago, improvements.

33 What new music can I discover?

34

35 Euclidean distance $= \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
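K-Means uses this distance to decide which centre each point belongs to. The sketch below is a minimal version of the standard assign-then-update loop (Lloyd's algorithm) for 2-D points; the points and starting centres are invented for illustration, and the function names are not from the talk.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its points; stop early once nothing changes."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centre alone
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Toy points and starting centres, invented for illustration only.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result depends on the starting centres, which is one reason the algorithm is usually run several times with different starts.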

36 to 54 [No transcript text for these slides]

55 Clustering Demo. Better algorithms for this problem: Gaussian mixture models, hierarchical clustering.

56 We’ve gone through 3 algorithms in 1 hour.
List of good sources…

57 github.com/matt-willis
alex.smola.org/papers/2004/SmoSch04.pdf
data-flair.training/blogs
cowlet.org
hackernoon.com
r4ds.had.co.nz
blogs.adatis.co.uk
towardsdatascience.com
archive.ics.uci.edu/ml/datasets.html
elitedatascience.com
trevorstephens.com
kernelsvm.tripod.com
wiki.icub.org/images/8/82/OnlineSVR_Thesis.pdf

58 matt.willis@adatis.bg Email me. Speak to me.
I hope you’ve learnt something. More importantly, changed your approach.

