
1 Introduction to Machine Learning (Feb. 10)

2 Machine Learning: How can we design a computer system whose performance improves by learning from experience?

3 Exams
– Oral exam
– Task-solving exam
– ML software project

4 Spam filtering

5 Face/person recognition demo

6 Recommendation systems

7 Robotics

8 Natural Language Processing

9 Other application areas
– Biometrics
– Object recognition in images
– DNA sequencing
– Financial data mining/prediction
– Process mining and optimisation

10 Big Data

11 Rule-based systems vs. machine learning
A domain expert is needed for
– writing rules, OR
– providing training samples
Which one is better?
– Can the expert design a rule-based system?
– Is the problem specific or general?

12 Machine learning: present and future
Present in more and more applications:
– "We are drowning in data while starving for information"
– Technological maturity and widespread adoption
– Demand for an ever greater degree of automation and personalisation
There are solved problems, but also many open research questions!

13 http://www.ml-class.org/course

14 Definition of Machine Learning (Mitchell): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

15

16 Most of the materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher

17 Example: classifying fish
Classes: sea bass, salmon
Goal: to learn a model from training data which can categorise fish (e.g. salmon are typically shorter)

18 Classification (T)
– Supervised learning: based on training examples (E), learn a model which performs well on previously unseen examples.
– Classification: a supervised learning task of categorising entities into a predefined set of classes.

19

20 Basic definitions
ID | Length (cm) | Lightness | Type
 1 | 28          | 0.5       | salmon
 2 | 23          | 0.7       | salmon
 3 | 17          | 0.5       | sea bass
Each row is an instance (or entity, sample); Length and Lightness are features (or attributes); Type is the class label.
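
As a minimal sketch (not part of the original slides), the three instances above could be stored as a feature matrix and a label vector; NumPy and the variable names are assumed purely for illustration:

```python
# Minimal sketch: the three training instances from the table above.
# Each instance is a feature vector (length in cm, lightness) plus a class label.
import numpy as np

X = np.array([[28.0, 0.5],   # instance 1
              [23.0, 0.7],   # instance 2
              [17.0, 0.5]])  # instance 3
y = np.array(["salmon", "salmon", "sea bass"])

print(X.shape)  # (3, 2): 3 instances, 2 features
print(y[0])     # class label of the first instance
```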

21 Example: preprocessing
– Image processing steps, e.g. segmentation of the fish contour from the background
– Feature extraction: extraction of features/attributes (atomic variables) from the images, typically numerical or categorical

22 Example features
– length
– lightness
– width
– number of fins
– position of the mouth

23 Length is a weak discriminator of the fish types.

24 Lightness is a better discriminator.

25 Decision theory, performance evaluation (P)
– false positive/negative errors
– e.g. if the threshold is decreased, the number of sea bass falsely classified as salmon decreases
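
A hedged sketch of such a threshold decision; the lightness values, labels and the threshold below are invented, and salmon is treated as the positive class:

```python
# Sketch: classify by a lightness threshold and count false positives/negatives.
# All values here are made up for illustration only.
lightness = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels    = ["salmon", "salmon", "sea bass", "salmon", "sea bass", "sea bass"]
threshold = 0.65  # predict "salmon" below the threshold, "sea bass" above

predictions = ["salmon" if x < threshold else "sea bass" for x in lightness]

false_pos = sum(p == "salmon" and t == "sea bass" for p, t in zip(predictions, labels))
false_neg = sum(p == "sea bass" and t == "salmon" for p, t in zip(predictions, labels))
print(false_pos, false_neg)  # 1 false positive, 1 false negative
```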

26 Feature vector
A vector of features describing a particular instance.
Instance A: xᵀ = [x₁, x₂], where x₁ is the lightness and x₂ is the width.

27

28 Feature space
Be careful when adding too many features:
– noisy features (e.g. measurement errors)
– unnecessary features (e.g. their information content is similar to that of another feature)
We need features which have discriminative power.
Feature set engineering is highly task-specific!

29 This is not ideal. Remember the supervised learning principle!

30

31 Model selection
– Number of features?
– Complexity of the task?
– Classifier speed?
Task- and data-dependent!

32 The machine learning lifecycle
– Data preparation
– Feature engineering
– Model selection
– Model training
– Performance evaluation

33 Data preparation
Do we know whether we have collected a large enough and representative sample for training the system?

34 Model selection and training
– These topics are the focus of this course
– Investigate the data for model selection!
No free lunch!

35 Performance evaluation
There are various evaluation metrics.
Simulation of supervised learning:
1. split your data into two parts
2. train your model on the training set
3. predict and evaluate your model on the test set (unseen during training)
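
A minimal sketch of this three-step protocol; scikit-learn, the Iris dataset and a decision tree are used here only as stand-in choices, not as the course's prescribed tools:

```python
# Sketch of the train/test evaluation protocol, assuming scikit-learn is available.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# 1. split the data into two parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 2. train the model on the training set
model = DecisionTreeClassifier().fit(X_train, y_train)

# 3. predict and evaluate on the test set (unseen during training)
print(accuracy_score(y_test, model.predict(X_test)))
```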

36 Topics of the course
– Classification
– Regression
– Clustering
– Recommendation systems
– Learning to rank
– Structured prediction
– Reinforcement learning

37 Probability theory refresher

38 Probability
(Atomic) events (A) and probability space (Ω)
Axioms:
– 0 ≤ P(A) ≤ 1
– P(Ω) = 1
– If A₁, A₂, … are mutually exclusive events (Aᵢ ∩ Aⱼ = ∅ for i ≠ j), then P(∪ₖ Aₖ) = Σₖ P(Aₖ)

39
– P(∅) = 0
– P(¬A) = 1 − P(A)
– P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
– P(A) = P(A ∩ B) + P(A ∩ ¬B)
– If A ⊆ B, then P(A) ≤ P(B) and P(B − A) = P(B) − P(A)
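
As a quick sanity check (an added sketch, not from the slides), the identity P(A ∪ B) = P(A) + P(B) − P(A ∩ B) can be verified on a small finite probability space, here two fair dice:

```python
# Check P(A ∪ B) = P(A) + P(B) - P(A ∩ B) on a uniform space of 36 dice outcomes.
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # all (die1, die2) outcomes

def P(event):
    return len(event) / len(omega)             # uniform probability

A = {w for w in omega if w[0] == 6}            # the first die shows a 6
B = {w for w in omega if sum(w) >= 10}         # the two dice sum to at least 10

print(P(A | B), P(A) + P(B) - P(A & B))        # both are 0.25
```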

40 Conditional probability
Conditional probability is the probability of some event A, given the occurrence of some other event B.
P(A|B) = P(A ∩ B) / P(B)
Chain rule: P(A ∩ B) = P(A|B) · P(B)
Example: A: headache, B: influenza, P(A) = 1/10, P(B) = 1/40, P(A|B) = ?
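
To make the definition concrete, here is a small sketch that estimates P(A|B) from joint counts; the counts are invented, since the headache/influenza example above does not give P(A ∩ B):

```python
# Sketch: P(A|B) = P(A ∩ B) / P(B), estimated from invented joint counts.
n_total   = 1000
n_B       = 25    # cases with influenza (assumed count)
n_A_and_B = 20    # cases with both headache and influenza (assumed count)

p_B         = n_B / n_total
p_A_and_B   = n_A_and_B / n_total
p_A_given_B = p_A_and_B / p_B
print(p_A_given_B)  # 0.8
```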

41 Conditional probability

42 Independence of events
A and B are independent iff P(A|B) = P(A).
Corollaries: P(A ∩ B) = P(A) · P(B) and P(B|A) = P(B)

43 Product rule
For arbitrary events A₁, A₂, …, Aₙ:
P(A₁A₂…Aₙ) = P(Aₙ|A₁…Aₙ₋₁) · P(Aₙ₋₁|A₁…Aₙ₋₂) · … · P(A₂|A₁) · P(A₁)
Law of total probability: if the events A₁, A₂, …, Aₙ form a complete probability space and P(Aᵢ) > 0 for each i, then
P(B) = Σᵢ₌₁ⁿ P(B|Aᵢ) · P(Aᵢ)

44 Bayes rule
P(A|B) = P(A ∩ B) / P(B) = P(B|A) · P(A) / P(B)
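
A minimal numeric sketch of the rule, reusing P(headache) and P(influenza) from slide 40 together with an assumed likelihood P(headache | influenza) = 0.8, which is not given on the slides:

```python
# Bayes rule: P(B|A) = P(A|B) * P(B) / P(A)
p_A = 1 / 10          # P(headache), from slide 40
p_B = 1 / 40          # P(influenza), from slide 40
p_A_given_B = 0.8     # P(headache | influenza): assumed value, not from the slides

p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)    # 0.2
```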

45 Random variable
ξ: Ω → ℝ
Random variable vectors…

46 Cumulative distribution function (CDF)
F(x) = P(ξ < x)
F(x₁) ≤ F(x₂) if x₁ < x₂
lim x→−∞ F(x) = 0, lim x→∞ F(x) = 1
F(x) is non-decreasing and right-continuous

47 Discrete vs. continuous random variables
Discrete: its value set forms a finite or infinite sequence
Continuous: we assume that a density f(x) exists on the (a, b) interval

48 Probability density function (pdf)
F(b) − F(a) = P(a < ξ < b) = ∫_a^b f(x) dx
f(x) = F′(x) and F(x) = ∫_{−∞}^{x} f(t) dt
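
A numerical check of the first relation for the standard normal distribution; SciPy is assumed to be available, and the interval is chosen arbitrarily:

```python
# Check F(b) - F(a) = ∫_a^b f(x) dx numerically for the standard normal distribution.
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 2.0
integral, _ = quad(norm.pdf, a, b)           # ∫_a^b f(x) dx
print(integral, norm.cdf(b) - norm.cdf(a))   # the two numbers agree (≈ 0.8186)
```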

49 Empirical estimation of a density: histogram
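
A minimal sketch of histogram-based density estimation with NumPy; the sample is simulated here, which is an assumption, since the slide only shows a figure:

```python
# Estimate a density with a histogram: normalise bin counts so the total area is 1.
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=10_000)

density, bin_edges = np.histogram(sample, bins=30, density=True)
bin_widths = np.diff(bin_edges)
print(np.sum(density * bin_widths))  # ≈ 1.0, as expected for a density estimate
```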

50 Independence of random variables
ξ and η are independent iff for any a ≤ b and c ≤ d:
P(a ≤ ξ ≤ b, c ≤ η ≤ d) = P(a ≤ ξ ≤ b) · P(c ≤ η ≤ d)

51 Composition (sum) of random variables
Discrete case: ζ = ξ + η; if ξ and η are independent, then
rₙ = P(ζ = n) = Σ_{k=−∞}^{∞} P(ξ = n − k, η = k)
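
A small sketch of this discrete convolution formula, using two fair dice as the (assumed) independent variables ξ and η:

```python
# r_n = P(ζ = n) = Σ_k P(ξ = n - k) · P(η = k) for independent ξ and η.
# Example: ξ and η are two fair six-sided dice (an assumed example).
p_xi  = {v: 1/6 for v in range(1, 7)}
p_eta = {v: 1/6 for v in range(1, 7)}

p_zeta = {}
for n in range(2, 13):
    p_zeta[n] = sum(p_xi.get(n - k, 0.0) * p_eta.get(k, 0.0) for k in p_eta)

print(p_zeta[7])  # 6/36 ≈ 0.1667, the most likely sum of two dice
```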

52 Expected value
If ξ can take the values x₁, x₂, … with probabilities p₁, p₂, …, then M(ξ) = Σᵢ xᵢ·pᵢ
Continuous case: M(ξ) = ∫_{−∞}^{∞} x·f(x) dx

53 Properties of the expected value
M(cξ) = c·M(ξ)
M(ξ + η) = M(ξ) + M(η)
If ξ and η are independent random variables, then M(ξη) = M(ξ)·M(η)

54 Standard deviation
D(ξ) = √(M[(ξ − M(ξ))²])
D²(ξ) = M(ξ²) − M²(ξ)
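
The formulas M(ξ) = Σᵢ xᵢ·pᵢ and D²(ξ) = M(ξ²) − M²(ξ) can be checked on a small invented discrete distribution:

```python
# M(ξ) = Σ_i x_i p_i  and  D²(ξ) = M(ξ²) - M²(ξ) on an invented discrete distribution.
values = [0, 1, 2]
probs  = [0.2, 0.5, 0.3]   # assumed probabilities, they sum to 1

mean     = sum(x * p for x, p in zip(values, probs))
mean_sq  = sum(x**2 * p for x, p in zip(values, probs))
variance = mean_sq - mean**2
std_dev  = variance ** 0.5
print(mean, variance, std_dev)  # 1.1, 0.49, 0.7
```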

55 Properties of the standard deviation
– D²(aξ + b) = a²·D²(ξ)
– if ξ₁, ξ₂, …, ξₙ are independent random variables, then D²(ξ₁ + ξ₂ + … + ξₙ) = D²(ξ₁) + D²(ξ₂) + … + D²(ξₙ)

56 Correlation
Covariance: c = M[(ξ − M(ξ))(η − M(η))]
c is 0 if ξ and η are independent
Correlation coefficient: r = c / (D(ξ)·D(η)), the covariance normalised into [−1, 1]
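
A NumPy sketch of the covariance and correlation coefficient; the simulated data and its linear dependence are assumptions, not from the slides:

```python
# Covariance c = M[(ξ - M(ξ))(η - M(η))] and correlation r = c / (D(ξ) D(η)).
import numpy as np

rng = np.random.default_rng(1)
xi  = rng.normal(size=1000)
eta = 0.8 * xi + 0.6 * rng.normal(size=1000)   # assumed linear dependence

c = np.mean((xi - xi.mean()) * (eta - eta.mean()))
r = c / (xi.std() * eta.std())
print(c, r)
print(np.corrcoef(xi, eta)[0, 1])  # should match r
```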

57 Well-known distributions
Binomial: ξ ~ B(n, p), M(ξ) = np, D²(ξ) = np(1 − p)
Normal/Gaussian
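
A short simulation (an added sketch, not from the slides) checking M(ξ) = np and D²(ξ) = np(1 − p) for the binomial distribution; n, p and the sample size are chosen arbitrarily:

```python
# Binomial ξ ~ B(n, p): empirical mean ≈ np and empirical variance ≈ np(1-p).
import numpy as np

n, p = 20, 0.3
rng = np.random.default_rng(2)
sample = rng.binomial(n, p, size=100_000)

print(sample.mean(), n * p)            # both ≈ 6.0
print(sample.var(), n * p * (1 - p))   # both ≈ 4.2
```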

