
1 Machine Learning in 25 minutes or less. And why the HotOS folks should care... Terran Lane, Dept. of Computer Science, University of New Mexico. terran@cs.unm.edu

2 Machine learning is the study of algorithms or systems that improve their performance in response to experience.

6 The core ML problem The World

7 The core ML problem The World - Network - CPU - Program memory footprint - User activity - Multi-process performance

8 The core ML problem The World Sensors

9 The core ML problem The World Sensors - Latency; bandwidth - Branches taken; cache misses - Memory allocs; object age - Keystroke rates; recent commands - Process throughput; cache activity; synch delays

10 The core ML problem The World Sensors X
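
To make X concrete: the sensor readings from the previous slide become a feature vector. A minimal Python sketch, in which every field name and value is made up for illustration:

    # All sensor names and values below are illustrative, not from the talk.
    sensors = {
        "latency_ms": 12.5,           # network
        "cache_miss_rate": 0.07,      # CPU
        "heap_allocs_per_sec": 4200,  # program memory
        "keystrokes_per_min": 180,    # user activity
    }
    X = [sensors[name] for name in sorted(sensors)]   # fixed feature ordering
    print(X)   # one observation X, ready for the model f(X)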

11 The core ML problem The World Sensors Model f(X) X prediction

12 The core ML problem The World Sensors Model f(X) X - Compression/redundancy rates - Branch prediction - Object lifetime - Legitimate/hostile - Normal/abnormal

13 The core ML problem The World Sensors Model f(X) X ŷ

14 The core ML problem The World Sensors Model f(X) X ŷ Performance measure L(ŷ) assessment

15 The core ML problem The World Sensors Model f(X) X ŷ Performance measure L(ŷ, y) assessment y

16 The core ML problem The World Sensors Model f(X) X ŷ Performance measure L(ŷ, y) assessment y - accuracy (0/1 loss) - squared error - time-to-response
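
To make the performance measure L(ŷ, y) concrete, here is a minimal Python sketch of the two most common losses named on the slide, 0/1 loss and squared error; the example data are made up:

    # Minimal sketches of two loss functions L(yhat, y); the example data are made up.
    def zero_one_loss(yhat, y):
        # Fraction of predictions that disagree with the true labels (1 - accuracy).
        return sum(1 for p, t in zip(yhat, y) if p != t) / len(y)

    def squared_error(yhat, y):
        # Mean squared difference between predictions and true values.
        return sum((p - t) ** 2 for p, t in zip(yhat, y)) / len(y)

    print(zero_one_loss([1, 0, 1, 1], [1, 1, 1, 0]))   # 0.5
    print(squared_error([2.0, 3.5], [2.5, 3.0]))       # 0.25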

17 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment control

18 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment response

19 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment L(ŷ, X′)

20 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment L(ŷ, X′) - Correctness - Stability - Robustness - Total system performance (throughput, latency, etc.)

21 The core ML problem The World Sensors Model f(X) X Performance measure assessment

22 The core ML problem The World Sensors Model f(X) X Performance measure assessment - ??? - Do you like the model? - Does it make sense? - Does it make you feel warm and fuzzy?

23 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment The ML job: find this...

24 The core ML problem The World Sensors Model f(X) X ŷ Performance measure assessment The ML job: find this... so that this is as good as possible.
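
In code, "find this... so that this is as good as possible" is a search over candidate models for the one with the lowest loss on the data. A toy Python sketch, with the data and the family of threshold classifiers invented purely for illustration:

    # Toy illustration: from a small family of threshold rules, pick the one
    # that minimizes 0/1 loss on the observed (x, y) pairs. All numbers are made up.
    data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]   # (sensor reading X, label y)

    def loss(f):
        # 0/1 loss of classifier f on the data set.
        return sum(1 for x, y in data if f(x) != y) / len(data)

    candidates = [lambda x, t=t: int(x > t) for t in (0.1, 0.3, 0.5, 0.7)]
    best = min(candidates, key=loss)   # "find this..."
    print(loss(best))                  # "...so that this is as good as possible" -> 0.0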

25 Types of learning: Supervised; Reinforcement learning; Unsupervised. Special cases: Semi-supervised; Anomaly detection; Behavioral cloning; etc...

26 Supervised Learning. Characteristics: measure features/sensor values ⇒ X; want to predict the system "output", y; have some source of example (X, y) pairs (the system, human labeling, etc.); have a well-defined performance criterion.

27 Example supervised learners. Discriminative (only produces a classifier): Decision tree (fast; comprehensible models); Support vector machine (high-dimensional data; accurate); Nearest-neighbor / k-NN (low-dimensional data; slow); Neural net (special case of SVM). Generative (produces a complete probability model): Naive Bayes (very simple; surprisingly accurate); Bayesian network (powerful; descriptive; accurate); Markov random field (closely related to BNs). Meta-learners / ensemble methods (sets of models): Boosting; Bagging; Winnow.
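
Several of the learners listed above share a common fit/predict interface in scikit-learn, assuming that library is available (the slide itself names no software); a small sketch with made-up data:

    # Sketch assuming scikit-learn is installed; data and values are made up.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB

    X = [[10, 0.1], [12, 0.2], [90, 0.9], [95, 0.8]]   # toy feature vectors
    y = [0, 0, 1, 1]                                    # toy labels

    for model in (DecisionTreeClassifier(),             # discriminative
                  KNeighborsClassifier(n_neighbors=1),  # discriminative
                  GaussianNB()):                        # generative
        model.fit(X, y)                                 # learn from example (X, y) pairs
        print(type(model).__name__, model.predict([[88, 0.85]]))   # each should output class 1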

28 Key assumption #1: The train/test data reflect the same data distribution that will be experienced when the learned model is embedded in the performance system. The system is not changing over time; the model doesn't affect the behavior of the system.

29 Key assumption #2: All data points are statistically independent. No linkage between "adjacent"/"successive" points; no other process affecting data generation.

30 Reinforcement learning. Characteristics: measure features of the system ⇒ X; want to control the system -- model outputs are "knobs"; can interact with the system/simulation; have a performance measure that recognizes "good" system behavior; don't need to know the "correct" control actions.

31 Key criterion: Are the sensor readings enough to completely characterize the state of the system, i.e., does knowing X tell you everything relevant? Yes: "fully observable"; learning optimal performance is fairly tractable (*). No (multiple system states produce the same X): "partially observable"; learning even barely satisfactory performance is incredibly difficult (PSPACE-complete. Or worse.)

32 RL: The good news. It does everything that traditional control doesn't! Stochasticity OK; don't need a model; don't need linearity; discrete time OK; no messy ODEs or z-transforms! Delay OK.

33 RL: The bad news. Low dimensions; discrete variables/features; need to know the state space; convergence can be slow (glacial); optimal control can be intractable.

34 Example RL algorithms. Fully observable systems: Q-learning; SARSA; Dyna; E³. Partially observable: REINFORCE; utile distinction memories; policy gradient methods.
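
As a flavor of the fully observable case, here is a minimal tabular Q-learning sketch; the two-state dynamics, rewards, and constants are all made up for illustration:

    # Minimal tabular Q-learning sketch on a made-up 2-state, 2-action chain.
    import random

    n_states, n_actions = 2, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # step size, discount, exploration rate

    def step(s, a):
        # Toy dynamics: action 1 moves to state 1, which pays reward 1; action 0 moves to state 0.
        s_next = 1 if a == 1 else 0
        reward = 1.0 if s_next == 1 else 0.0
        return s_next, reward

    s = 0
    for _ in range(5000):
        if random.random() < epsilon:
            a = random.randrange(n_actions)                       # explore
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])  # exploit
        s_next, r = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

    print(Q)   # in both states, action 1 should end up valued higher than action 0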

35 Key difference #1. Unlike supervised learning... distinct data points can be temporally correlated. Key parameter: how much history is necessary to characterize the system (the Markov order)? 1 time unit? 2? All of them?
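
One way to picture the "how much history" question in code is to turn a sensor time series into order-k history windows; the series and k below are made up:

    # Illustrative only: turn a sensor time series into order-k history windows.
    def history_windows(series, k):
        # Each X holds the previous k readings; y is the next reading to predict.
        return [(series[i - k:i], series[i]) for i in range(k, len(series))]

    readings = [0.1, 0.4, 0.3, 0.7, 0.6, 0.9]         # made-up sensor trace
    for X, y in history_windows(readings, k=2):
        print(X, "->", y)                             # e.g. [0.1, 0.4] -> 0.3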

36 Key difference #2. Unlike supervised learning... the model is expected to influence the behavior of the system, and that's a good thing...

37 References (partial) General: Mitchell, Machine Learning, McGraw-Hill, 1997. Duda, Hart, & Stork, Pattern Classification, Wiley, 2001. Hastie, Tibshirani, & Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001. Software (general; mostly supervised): Weka: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/

38 References (partial) Decision trees: Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993. Breiman, Classification & Regression Trees (CART), Wadsworth, 1983. Support vector machines: Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, 2(2), 1998. Software: SVMlight http://svmlight.joachims.org/

39 References (partial) Reinforcement learning: Sutton & Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. Kaelbling, Littman, & Moore, "Reinforcement Learning: A Survey", Journal of Artificial Intelligence Research, 4, 1996. Kaelbling, Littman, & Cassandra, "Planning and Acting in Partially Observable Stochastic Domains", Artificial Intelligence, 101, 1998.

40 Thank you! Questions?

41 ML keywords Learning Adaptive Self-tuning State estimation Parameter estimation Data mining Computational statistics Predictive modeling Pattern recognition etc...

42 The Learning Loop The World Sensors Model f(X) X ŷ Performance measure L(ŷ, y) assessment y Generate "training" data Learning module f(X) Performance measure

43 The training process. Gather a large set of "training data": D_train = [(X_1, y_1), (X_2, y_2), ..., (X_n, y_n)]. Also gather a large set of "testing" (eval; holdout) data: D_eval = [(X_1, y_1), ..., (X_m, y_m)]. Apply the learner to the training set to get a model: f() = learn(D_train, L). Evaluate the results on the test set: [ŷ_test] = f(X_test); assessment = L(ŷ_test, y_test).
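
A sketch of this train/holdout protocol, assuming scikit-learn and using its bundled iris data as a stand-in for the (X_i, y_i) pairs (the slide itself is library-agnostic):

    # Train / holdout-evaluation sketch; assumes scikit-learn is installed.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)                   # stand-in for the (X_i, y_i) pairs

    # Split into D_train and D_eval (the holdout set).
    X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.3, random_state=0)

    model = DecisionTreeClassifier().fit(X_train, y_train)   # f() = learn(D_train, L)
    yhat_eval = model.predict(X_eval)                         # [yhat_eval] = f(X_eval)
    print(accuracy_score(y_eval, yhat_eval))                  # assessment = L(yhat_eval, y_eval)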

