
1 Machine Learning in 25 minutes or less. And why the HotOS folks should care... Terran Lane, Dept. of Computer Science, University of New Mexico. terran@cs.unm.edu

2 Machine learning is the study of algorithms or systems that improve their performance in response to experience.

6 The core ML problem The World

7 The core ML problem The World - Network - CPU - Program memory footprint - User activity - Multi-process performance

8 The core ML problem The World Sensors

9 The core ML problem The World Sensors - Latency; bandwidth - Branches taken; cache misses - Memory allocs; object age - Keystroke rates; recent commands - Process throughput; cache activity; synch delays

10 The core ML problem The World Sensors X
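
To make X concrete: the sensor readings from the previous slide become a feature vector. A minimal Python sketch, in which every field name and value is made up for illustration:

    # All sensor names and values below are illustrative, not from the talk.
    sensors = {
        "latency_ms": 12.5,           # network
        "cache_miss_rate": 0.07,      # CPU
        "heap_allocs_per_sec": 4200,  # program memory
        "keystrokes_per_min": 180,    # user activity
    }
    X = [sensors[name] for name in sorted(sensors)]   # fixed feature ordering
    print(X)   # one observation X, ready for the model f(X)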

11 The core ML problem The World Sensors Model f(X) X prediction

12 The core ML problem The World Sensors Model f(X) X - Compression/redundancy rates - Branch prediction - Object lifetime - Legitimate/hostile - Normal/abnormal

13 The core ML problem The World Sensors Model f(X) X ŷ

14 The core ML problem The World Sensors Model f(X) X ŷ Performance measure L(ŷ) assessment

15 The core ML problem The World Sensors Model f(X) X ŷ Performance measure L(ŷ, y) assessment y

16 The core ML problem The World Sensors Model f(X) X ŷ Performance measure L(ŷ, y) assessment y - accuracy (0/1 loss) - squared error - time-to-response
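
To make the performance measure L(ŷ, y) concrete, here is a minimal Python sketch of the two most common losses named on the slide, 0/1 loss and squared error; the example data are made up:

    # Minimal sketches of two loss functions L(yhat, y); the example data are made up.
    def zero_one_loss(yhat, y):
        # Fraction of predictions that disagree with the true labels (1 - accuracy).
        return sum(1 for p, t in zip(yhat, y) if p != t) / len(y)

    def squared_error(yhat, y):
        # Mean squared difference between predictions and true values.
        return sum((p - t) ** 2 for p, t in zip(yhat, y)) / len(y)

    print(zero_one_loss([1, 0, 1, 1], [1, 1, 1, 0]))   # 0.5
    print(squared_error([2.0, 3.5], [2.5, 3.0]))       # 0.25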

17 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment control

18 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment response

19 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment L(ŷ, X′)

20 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment L(ŷ, X′) - Correctness - Stability - Robustness - Total system performance (throughput, latency, etc.)

21 The core ML problem The World Sensors Model f(X) X Performance measure assessment

22 The core ML problem The World Sensors Model f(X) X Performance measure assessment - ??? - Do you like the model? - Does it make sense? - Does it make you feel warm and fuzzy?

23 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment The ML job: find this...

24 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment The ML job: find this... so that this is as good as possible.
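
In code, "find this... so that this is as good as possible" is a search over candidate models for the one with the lowest loss on the data. A toy Python sketch, with the data and the family of threshold classifiers invented purely for illustration:

    # Toy illustration: from a small family of threshold rules, pick the one
    # that minimizes 0/1 loss on the observed (x, y) pairs. All numbers are made up.
    data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]   # (sensor reading X, label y)

    def loss(f):
        # 0/1 loss of classifier f on the data set.
        return sum(1 for x, y in data if f(x) != y) / len(data)

    candidates = [lambda x, t=t: int(x > t) for t in (0.1, 0.3, 0.5, 0.7)]
    best = min(candidates, key=loss)   # "find this..."
    print(loss(best))                  # "...so that this is as good as possible" -> 0.0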

25 Types of learning: Supervised; Reinforcement learning; Unsupervised. Special cases: Semi-supervised; Anomaly detection; Behavioral cloning; etc...

26 Supervised Learning. Characteristics: measure features/sensor values ⇒ X; want to predict the system "output", y; have some source of example (X, y) pairs (the system, human labeling, etc.); have a well-defined performance criterion.

27 Example supervised learners. Discriminative (only produces a classifier): Decision tree (fast; comprehensible models); Support vector machine (high-dimensional data; accurate); Nearest-neighbor / k-NN (low-dimensional data; slow); Neural net (special case of SVM). Generative (produces a complete probability model): Naive Bayes (very simple; surprisingly accurate); Bayesian network (powerful; descriptive; accurate); Markov random field (closely related to BNs). Meta-learners / ensemble methods (sets of models): Boosting; Bagging; Winnow.
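
Several of the learners listed above share a common fit/predict interface in scikit-learn, assuming that library is available (the slide itself names no software); a small sketch with made-up data:

    # Sketch assuming scikit-learn is installed; data and values are made up.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB

    X = [[10, 0.1], [12, 0.2], [90, 0.9], [95, 0.8]]   # toy feature vectors
    y = [0, 0, 1, 1]                                    # toy labels

    for model in (DecisionTreeClassifier(),             # discriminative
                  KNeighborsClassifier(n_neighbors=1),  # discriminative
                  GaussianNB()):                        # generative
        model.fit(X, y)                                 # learn from example (X, y) pairs
        print(type(model).__name__, model.predict([[88, 0.85]]))   # each should output class 1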

28 Key assumption #1: The train/test data reflect the same data distribution that will be experienced when the learned model is embedded in the performance system. The system is not changing over time; the model doesn't affect the behavior of the system.

29 Key assumption #2: All data points are statistically independent. No linkage between "adjacent"/"successive" points; no other process affecting data generation.

30 Reinforcement learning. Characteristics: measure features of the system ⇒ X; want to control the system -- model outputs are "knobs"; can interact with the system/simulation; have a performance measure that recognizes "good" system behavior; don't need to know the "correct" control actions.

31 Key criterion: Are the sensor readings enough to completely characterize the state of the system, i.e., does knowing X tell you everything relevant? Yes: "fully observable"; learning optimal performance is fairly tractable (*). No (multiple system states produce the same X): "partially observable"; learning even barely satisfactory performance is incredibly difficult (PSPACE-complete. Or worse.)

32 RL: The good news. It does everything that traditional control doesn't! Stochasticity OK; don't need a model; don't need linearity; discrete time OK; no messy ODEs or z-transforms! Delay OK.

33 RL: The bad news. Low dimensions; discrete variables/features; need to know the state space; convergence can be slow (glacial); optimal control can be intractable.

34 Example RL algorithms. Fully observable systems: Q-learning; SARSA; Dyna; E³. Partially observable: REINFORCE; utile distinction memories; policy gradient methods.
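
As a flavor of the fully observable case, here is a minimal tabular Q-learning sketch; the two-state dynamics, rewards, and constants are all made up for illustration:

    # Minimal tabular Q-learning sketch on a made-up 2-state, 2-action chain.
    import random

    n_states, n_actions = 2, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # step size, discount, exploration rate

    def step(s, a):
        # Toy dynamics: action 1 moves to state 1, which pays reward 1; action 0 moves to state 0.
        s_next = 1 if a == 1 else 0
        reward = 1.0 if s_next == 1 else 0.0
        return s_next, reward

    s = 0
    for _ in range(5000):
        if random.random() < epsilon:
            a = random.randrange(n_actions)                       # explore
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])  # exploit
        s_next, r = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

    print(Q)   # in both states, action 1 should end up valued higher than action 0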

35 Key difference #1. Unlike supervised learning... distinct data points can be temporally correlated. Key parameter: how much history is necessary to characterize the system (the Markov order)? 1 time unit? 2? All of them?
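
One way to picture the "how much history" question in code is to turn a sensor time series into order-k history windows; the series and k below are made up:

    # Illustrative only: turn a sensor time series into order-k history windows.
    def history_windows(series, k):
        # Each X holds the previous k readings; y is the next reading to predict.
        return [(series[i - k:i], series[i]) for i in range(k, len(series))]

    readings = [0.1, 0.4, 0.3, 0.7, 0.6, 0.9]         # made-up sensor trace
    for X, y in history_windows(readings, k=2):
        print(X, "->", y)                             # e.g. [0.1, 0.4] -> 0.3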

36 Key difference #2. Unlike supervised learning... the model is expected to influence the behavior of the system, and that's a good thing...

37 References (partial) General: Mitchell, Machine Learning, McGraw-Hill, 1997. Duda, Hart, & Stork, Pattern Classification, Wiley, 2001. Hastie, Tibshirani, & Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001. Software (general; mostly supervised): Weka: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/

38 References (partial) Decision trees: Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993. Breiman, Classification & Regression Trees (CART), Wadsworth, 1983. Support vector machines: Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, 2(2), 1998. Software: SVMlight http://svmlight.joachims.org/

39 References (partial) Reinforcement learning: Sutton & Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. Kaelbling, Littman, & Moore, "Reinforcement Learning: A Survey", Journal of Artificial Intelligence Research, 4, 1996. Kaelbling, Littman, & Cassandra, "Planning and Acting in Partially Observable Stochastic Domains", Artificial Intelligence, 101, 1998.

40 Thank you! Questions?

41 ML keywords Learning Adaptive Self-tuning State estimation Parameter estimation Data mining Computational statistics Predictive modeling Pattern recognition etc...

42 The Learning Loop The World Sensors Model f(X) X ŷ Performance measure L(ŷ, y) assessment y Generate "training" data Learning module f(X) Performance measure

43 The training process. Gather a large set of "training data": D_train = [(X_1, y_1), (X_2, y_2), ..., (X_n, y_n)]. Also gather a large set of "testing" (eval; holdout) data: D_eval = [(X_1, y_1), ..., (X_m, y_m)]. Apply the learner to the training set to get a model: f() = learn(D_train, L). Evaluate the results on the test set: [ŷ_test] = f(X_test); assessment = L(ŷ_test, y_test).
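
A sketch of this train/holdout protocol, assuming scikit-learn and using its bundled iris data as a stand-in for the (X_i, y_i) pairs (the slide itself is library-agnostic):

    # Train / holdout-evaluation sketch; assumes scikit-learn is installed.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)                   # stand-in for the (X_i, y_i) pairs

    # Split into D_train and D_eval (the holdout set).
    X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.3, random_state=0)

    model = DecisionTreeClassifier().fit(X_train, y_train)   # f() = learn(D_train, L)
    yhat_eval = model.predict(X_eval)                         # [yhat_eval] = f(X_eval)
    print(accuracy_score(y_eval, yhat_eval))                  # assessment = L(yhat_eval, y_eval)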

