Presentation on theme: "CS 194-10 Fall 2011, Stuart Russell" — Presentation transcript:

1 CS 194-10, Fall 2011: Introduction to Machine Learning. Machine Learning: An Overview

2 People
Avital Steinitz, 2nd-year CS PhD student; Stuart Russell, 30th-year CS PhD student; Mert Pilanci, 2nd-year EE PhD student
Lecture 1, 8/25/11, CS 194-10 Fall 2011, Stuart Russell

3 Administrative details
Web page; newsgroup

4 Course outline
Overview of machine learning (today)
Classical supervised learning: linear regression, perceptrons, neural nets, SVMs, decision trees, nearest neighbors, and all that; a little bit of theory, a lot of applications
Learning probabilistic models: probabilistic classifiers (logistic regression, etc.); unsupervised learning, density estimation, EM; Bayes net learning; time series models; dimensionality reduction; Gaussian process models; language models
Bandits and other exciting topics

5 Lecture outline
Goal: provide a framework for understanding all the detailed content to come, and why it matters
Learning: why and how
Supervised learning. Classical: finding simple, accurate hypotheses; probabilistic: finding likely hypotheses; Bayesian: updating belief in hypotheses
Data and applications
Expressiveness and cumulative learning
CTBT

6 Learning is...
... a computational process for improving performance based on experience

9 Learning: Why?
The baby, assailed by eyes, ears, nose, skin, and entrails at once, feels it all as one great blooming, buzzing confusion ... [William James, 1890]
Learning is essential for unknown environments, i.e., when the designer lacks omniscience

10 Learning: Why?
Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child brain is something like a notebook as one buys it from the stationer's. Rather little mechanism, and lots of blank sheets. [Alan Turing, 1950]
Learning is useful as a system construction method, i.e., expose the system to reality rather than trying to write it down

11 Learning: How?

15 Structure of a learning agent

16 Design of learning element
Key questions: What agent design will implement the desired performance? Which piece of the agent system should be improved, and how is that piece represented? What data are available that are relevant to that piece? (In particular, do we know the right answers?) What knowledge is already available?

17 Examples
Agent design / Component / Representation / Feedback / Knowledge
Alpha-beta search / evaluation function / linear polynomial / win-loss / rules of game; coefficient signs
Logical planning agent / transition model (observable envt) / successor-state axioms / action outcomes / available actions; argument types
Utility-based patient monitor / physiology-sensor model / dynamic Bayesian network / observation sequences / general physiology; sensor design
Satellite image pixel classifier / classifier (policy) / Markov random field / partial labels / coastline; continuity scales
Supervised learning: correct answers for each training instance. Reinforcement learning: reward sequence, no correct answers. Unsupervised learning: "just make sense of the data"

18 Supervised learning
To learn an unknown target function f
Input: a training set of labeled examples (x_j, y_j) where y_j = f(x_j). E.g., x_j is an image and f(x_j) is the label "giraffe"; or x_j is a seismic signal and f(x_j) is the label "explosion"
Output: hypothesis h that is "close" to f, i.e., predicts well on unseen examples (the "test set")
Many possible hypothesis families for h: linear models, logistic regression, neural networks, decision trees, examples (nearest-neighbor), grammars, kernelized separators, etc.
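A minimal runnable sketch of this setup, using the nearest-neighbor family from the list above. The tiny 2-D "image feature" vectors and their labels are invented purely for illustration.

```python
# Minimal supervised-learning sketch: learn h from labeled (x_j, y_j) pairs,
# then predict on unseen x. Hypothesis family: 1-nearest-neighbor.
# The 2-D feature vectors and labels below are invented for illustration.

def nearest_neighbor_fit(examples):
    """'Training' for 1-NN is just memorizing the labeled examples."""
    def h(x):
        # Predict the label of the closest training point (squared Euclidean).
        def dist2(pair):
            return sum((a - b) ** 2 for a, b in zip(pair[0], x))
        return min(examples, key=dist2)[1]
    return h

train = [((1.0, 1.0), "giraffe"), ((1.2, 0.9), "giraffe"),
         ((5.0, 4.8), "llama"),   ((5.2, 5.1), "llama")]

h = nearest_neighbor_fit(train)
print(h((1.1, 1.0)))   # close to the giraffe cluster
print(h((5.1, 5.0)))   # close to the llama cluster
```

Note that h is only "close" to f: a test point far from both clusters would still get one of the two labels, however unreliable that guess is.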

22 Example: object recognition
x: [six training images]; f(x): giraffe, giraffe, giraffe, llama, llama, llama
New input X: [image]; f(X) = ?

23 Example: curve fitting
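As a concrete instance of curve fitting, here is a least-squares straight-line fit, the simplest hypothesis family discussed in the later slides. The data points are made up and lie near the line y = 2x + 1.

```python
# Curve-fitting sketch: fit a straight line h(x) = w0 + w1*x to (x_j, y_j)
# pairs by least squares (closed form for one input variable).
# The data points are invented; they lie near the line y = 2x + 1.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope = sample covariance / sample variance; intercept from the means.
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    w0 = my - w1 * mx
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]   # roughly 2x + 1
w0, w1 = fit_line(xs, ys)
print(round(w0, 2), round(w1, 2))
```

A higher-degree polynomial would fit these five points even more closely, which is exactly the fit-versus-complexity tension the next slides take up.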

28 Basic questions
Which hypothesis space H to choose?
How to measure degree of fit?
How to trade off degree of fit vs. complexity? ("Ockham's razor")
How do we find a good h?
How do we know whether a good h will predict well?

29 Philosophy of Science (Physics)
Which hypothesis space H to choose? Deterministic hypotheses, usually mathematical formulas and/or logical sentences; implicit relevance determination
How to measure degree of fit? Ideally, h will be consistent with the data
How to trade off degree of fit vs. complexity? Theory must be correct up to "experimental error"
How do we find a good h? Intuition, imagination, inspiration (invent new terms!!)
How do we know if a good h will predict well? Hume's Problem of Induction: most philosophers give up

30 Kolmogorov complexity (also MDL, MML)
Which hypothesis space H to choose? All Turing machines (or programs for a UTM)
How to measure degree of fit? Fit is perfect (the program has to output the data exactly)
How to trade off degree of fit vs. complexity? Minimize the size of the program
How do we find a good h? Undecidable (unless we bound the time complexity of h)
How do we know if a good h will predict well? (Recent theory borrowed from PAC learning)
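A toy illustration of the MDL recipe above. Character counts stand in for program length on a universal machine (a deliberately crude assumption), and the candidate "programs" are a literal copy and a hypothetical repeat-encoder; MDL requires the fit to be exact, so an encoder that does not reproduce the data is ruled out.

```python
# Toy MDL sketch: trade hypothesis size against fit, as on the slide.
# We "encode" a string either literally or as (pattern, repeat-count) and
# pick the shorter total description. Costs are in characters, a crude
# stand-in for program length on a universal machine.

def literal_cost(s):
    return len(s)

def repeat_cost(pattern, s):
    # Valid only if the pattern actually reproduces the data exactly
    # (MDL uses a lossless code: fit must be perfect).
    if pattern * (len(s) // len(pattern)) != s:
        return float("inf")
    return len(pattern) + len(str(len(s) // len(pattern)))

data = "AB" * 20
candidates = {"literal": literal_cost(data),
              "repeat-AB": repeat_cost("AB", data),
              "repeat-ABA": repeat_cost("ABA", data)}
best = min(candidates, key=candidates.get)
print(best, candidates[best])
```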

31 Classical stats/ML: minimize a loss function
Which hypothesis space H to choose? E.g., linear combinations of features: h_w(x) = w^T x
How to measure degree of fit? A loss function, e.g., squared error Σ_j (y_j − w^T x_j)^2
How to trade off degree of fit vs. complexity? Regularization: a complexity penalty, e.g., ||w||^2
How do we find a good h? Optimization (closed-form, numerical); discrete search
How do we know if a good h will predict well? Try it and see (cross-validation, bootstrap, etc.)
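The loss-plus-penalty recipe has a closed form in the simplest case. This is a sketch for a one-weight model h_w(x) = w*x with invented data: setting the derivative of Σ_j (y_j − w·x_j)^2 + λw^2 to zero gives w = Σ_j x_j·y_j / (Σ_j x_j^2 + λ).

```python
# Classical-ML sketch matching the slide: squared-error loss plus an L2
# complexity penalty, for a one-weight linear model h_w(x) = w*x.
# Minimizing sum_j (y_j - w*x_j)**2 + lam*w**2 has the closed form below.

def ridge_1d(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.0]          # roughly y = 2x (invented data)
w_unreg = ridge_1d(xs, ys, 0.0)
w_reg   = ridge_1d(xs, ys, 10.0)
print(round(w_unreg, 3), round(w_reg, 3))  # the penalty shrinks w toward 0
```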

32 Probabilistic: max. likelihood, max. a posteriori
Which hypothesis space H to choose? A probability model P(y | x, h), e.g., Y ~ N(w^T x, σ^2)
How to measure degree of fit? Data likelihood Π_j P(y_j | x_j, h)
How to trade off degree of fit vs. complexity? Regularization or a prior: argmax_h P(h) Π_j P(y_j | x_j, h) (MAP)
How do we find a good h? Optimization (closed-form, numerical); discrete search
How do we know if a good h will predict well? Empirical process theory (generalizes Chebyshev, CLT, PAC...); key assumption is (i)id
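A sketch of the MAP recipe for a one-weight model: Gaussian likelihood Y ~ N(w·x, σ^2) and a Gaussian prior w ~ N(0, τ^2), maximized here in log space by a crude grid search rather than in closed form. All numbers are invented.

```python
import math

# MAP sketch for the model on the slide: Y ~ N(w*x, sigma^2) with a
# Gaussian prior w ~ N(0, tau^2). We maximize log P(w) + sum_j log P(y_j|x_j,w)
# by grid search over w; all numbers are invented for illustration.

def log_gauss(v, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def map_estimate(xs, ys, sigma2, tau2):
    grid = [i / 1000.0 for i in range(-5000, 5001)]   # w in [-5, 5]
    def objective(w):
        prior = log_gauss(w, 0.0, tau2)
        lik = sum(log_gauss(y, w * x, sigma2) for x, y in zip(xs, ys))
        return prior + lik
    return max(grid, key=objective)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.0]
w_map = map_estimate(xs, ys, sigma2=1.0, tau2=0.1)
print(w_map)   # pulled toward 0 relative to the ML estimate (about 1.99)
```

With this model the MAP objective is exactly the regularized squared-error loss of the previous slide, with λ = σ^2/τ^2; the grid search lands on the same ridge solution.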

33 Bayesian: computing the posterior over H
Which hypothesis space H to choose? All hypotheses with nonzero a priori probability
How to measure degree of fit? Data probability, as for MLE/MAP
How to trade off degree of fit vs. complexity? Use the prior, as for MAP
How do we find a good h? Don't! The Bayes predictor averages over hypotheses: P(y|x,D) = Σ_h P(y|x,h) P(h|D) ∝ Σ_h P(y|x,h) P(D|h) P(h)
How do we know if a good h will predict well? Silly question! Bayesian prediction is optimal!!
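The Bayes predictor can be computed exactly when H is small. Here is a toy sketch with three coin-bias hypotheses, a uniform prior, and invented data (8 heads in 10 flips): rather than committing to one h, predictions are averaged under the posterior.

```python
# Bayes-predictor sketch for the slide's formula: instead of picking one h,
# average predictions over the posterior, P(y|D) = sum_h P(y|h) P(h|D).
# Toy setup (invented): three coin-bias hypotheses, uniform prior, and
# observed data D = 8 heads out of 10 flips.

hypotheses = [0.25, 0.5, 0.75]          # P(heads) under each h
prior = {h: 1 / 3 for h in hypotheses}

def likelihood(h, heads, tails):
    return (h ** heads) * ((1 - h) ** tails)

heads, tails = 8, 2
unnorm = {h: prior[h] * likelihood(h, heads, tails) for h in hypotheses}
z = sum(unnorm.values())
posterior = {h: p / z for h, p in unnorm.items()}

# Posterior-predictive probability that the next flip is heads.
p_next_heads = sum(h * posterior[h] for h in hypotheses)
print(round(p_next_heads, 3))
```

The answer sits between 0.5 and 0.75 because the posterior still assigns some weight to the fair coin; a MAP learner would simply predict 0.75.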

35 Neon sculpture at Autonomy Corp.


37 Lots of data
Web: estimated Google index 45 billion pages
Clickstream data: TB/day
Transaction data: 5-50 TB/day
Satellite image feeds: ~1 TB/day/satellite
Sensor networks/arrays: CERN Large Hadron Collider ~100 petabytes/day
Biological data: 1-10 TB/day/sequencer
TV: 2 TB/day/channel; YouTube: 4 TB/day uploaded
Digitized telephony: ~100 petabytes/day

38 This is what an ICU looks like: ventilator, fluids, monitors; ~200 medical procedures per day, many potentially fatal

39 Real data are messy

40 Arterial blood pressure (high/low/mean), 1 s

41 Application: satellite image analysis

42 Application: discovering DNA motifs
...TTGGAACAACCATGCACGGTTGATTCGTGCCTGTGACCGCGCGCCTCACACGGAAGACGCAGCCACCGGTTGTGATG TCATAGGGAATTCCCCATGTCGTGAATAATGCCTCGAATGATGAGTAATAGTAAAACGCAGGGGAGGTTCTTCAGTAGTA TCAATATGAGACACATACAAACGGGCGTACCTACCGCAGCTCAAAGCTGGGTGCATTTTTGCCAAGTGCCTTACTGTTAT CTTAGGACGGAAATCCACTATAAGATTATAGAAAGGAAGGCGGGCCGAGCGAATCGATTCAATTAAGTTATGTCACAAGG GTGCTATAGCCTATTCCTAAGATTTGTACGTGCGTATGACTGGAATTAATAACCCCTCCCTGCACTGACCTTGACTGAAT AACTGTGATACGACGCAAACTGAACGCTGCGGGTCCTTTATGACCACGGATCACGACCGCTTAAGACCTGAGTTGGAGTT GATACATCCGGCAGGCAGCCAAATCTTTTGTAGTTGAGACGGATTGCTAAGTGTGTTAACTAAGACTGGTATTTCCACTA GGACCACGCTTACATCAGGTCCCAAGTGGACAACGAGTCCGTAGTATTGTCCACGAGAGGTCTCCTGATTACATCTTGAA GTTTGCGACGTGTTATGCGGATGAAACAGGCGGTTCTCATACGGTGGGGCTGGTAAACGAGTTCCGGTCGCGGAGATAAC TGTTGTGATTGGCACTGAAGTGCGAGGTCTTAAACAGGCCGGGTGTACTAACCCAAAGACCGGCCCAGCGTCAGTGA...

44 Application: user website behavior from clickstream data (from P. Smyth, UCI)
, -, 3/22/00, 10:35:11, W3SVC, SRVR1, , 781, 363, 875, 200, 0, GET, /top.html, -,
, -, 3/22/00, 10:35:16, W3SVC, SRVR1, , 5288, 524, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 10:35:17, W3SVC, SRVR1, , 30, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 16:18:50, W3SVC, SRVR1, , 60, 425, 72, 304, 0, GET, /top.html, -,
, -, 3/22/00, 16:18:58, W3SVC, SRVR1, , 8322, 527, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 16:18:59, W3SVC, SRVR1, , 0, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:54:37, W3SVC, SRVR1, , 140, 199, 875, 200, 0, GET, /top.html, -,
, -, 3/22/00, 20:54:55, W3SVC, SRVR1, , 17766, 365, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 20:54:55, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:55:07, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:55:36, W3SVC, SRVR1, , 1061, 382, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 20:55:36, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:55:39, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:56:03, W3SVC, SRVR1, , 1081, 382, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 20:56:04, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:56:33, W3SVC, SRVR1, , 0, 262, 72, 304, 0, GET, /top.html, -,
, -, 3/22/00, 20:56:52, W3SVC, SRVR1, , 19598, 382, 414, 200, 0, POST, /spt/main.html, -,
User 1: 2 3 2 2 3 3 3 1 1 1 3 1 3 3 3 3
User 2: 3 3 3 1 1 1
User 3: 7 7 7 7 7 7 7 7
User 4: 1 5 1 1 1 5 1 5 1 1 1 1 1 1
User 5: 5 1 1 5

45 Application: social network analysis
HP Labs data: 500 users, 20k connections, evolving over time

46 Application: spam filtering
200 billion spam messages sent per day
Asymmetric cost of false positives vs. false negatives
Weak label: discarded without reading; strong label ("this is spam") hard to come by
Standard iid assumption violated: spammers alter spam generators to evade or subvert spam filters (an "adversarial learning" task)
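The asymmetric-cost point corresponds to a standard decision-theoretic threshold: flag a message as spam only when P(spam|message) exceeds C_fp / (C_fp + C_fn). The cost numbers below are invented for illustration.

```python
# Sketch of the asymmetric-cost point: with separate costs for the two error
# types, the expected-cost-minimizing rule flags a message as spam only when
# P(spam|message) exceeds cost_fp / (cost_fp + cost_fn).

def flag_as_spam(p_spam, cost_fp, cost_fn):
    # Expected cost of flagging = (1 - p)*cost_fp (might lose real mail);
    # expected cost of passing  = p*cost_fn (user sees spam). Flag if cheaper.
    return p_spam > cost_fp / (cost_fp + cost_fn)

# Suppose losing a legitimate mail is 50x worse than seeing one spam;
# the threshold is then 50/51, roughly 0.98:
print(flag_as_spam(0.90, cost_fp=50, cost_fn=1))
print(flag_as_spam(0.99, cost_fp=50, cost_fn=1))
```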

50 Learning
[Diagram: data and prior knowledge feed the Learning process, which produces knowledge]
Crucial open problem: weak intermediate forms of knowledge that support future generalizations

51 Example

54 Example: arriving at Sao Paulo, Brazil
Bem-vindo! ("Welcome!")

60 Weak prior knowledge
In this case: people in a given country (and city) tend to speak the same language
Where did this knowledge come from? Experience with other countries; "common sense", i.e., knowledge of how societies and languages work
And where did that knowledge come from?

61 Knowledge? What is knowledge? All I know is samples!! [V. Vapnik]
All knowledge derives, directly or indirectly, from the experience of individuals
Knowledge serves as a directly applicable shorthand for all that experience: better than requiring constant review of the entire sensory/evolutionary history of the human race

62 Expressiveness

63 The world has things in it!!
Expressive language => concise models => fast learning, sometimes fast reasoning
E.g., the rules of chess: 1 page in first-order logic (On(color,piece,x,y,t)); ~ pages in propositional logic (WhiteKingOnC4Move12); ~ pages as an atomic-state model (R.B.KB.RPPP..PPP..N..N…..PP….q.pp..Q..n..n..ppp..pppr.b.kb.r)
[Note: chess is a tiny problem compared to the real world]

69 Brief history of expressiveness
Probability: atomic, 17th C; propositional, 20th C; first-order/relational, 21st C
Logic: propositional, 5th C B.C.; first-order/relational, 19th C

70 Brief history of expressiveness
Probability: atomic: Bernoulli, Categorical, univariate Gaussian; propositional: (H)MMs, Bayes nets, MRFs, multivariate Gaussians, DBNs, Kalman filters; first-order/relational: RPMs, BLOG, MLNs, (DBLOG)
Logic: atomic: finite automata; propositional: OBDDs, k-CNF, decision trees, perceptrons, propositional STRIPS, register circuits; first-order/relational: first-order logic, database systems, programs, first-order STRIPS, temporal logic

71 CTBT: Comprehensive Nuclear-Test-Ban Treaty
Bans testing of nuclear weapons on earth
Allows for outside inspection of 1000 km²
182/195 states have signed; 153/195 have ratified
Need 9 more ratifications, including US and China; US Senate refused to ratify in 1998 ("too hard to monitor")

72 2053 nuclear explosions


74 254 monitoring stations


76 The problem
Given waveform traces from all seismic stations, figure out what events occurred, when, and where
Traces at each sensor station may be preprocessed to form "detections" (90% are not real)
[Sample detection records with fields ARID, ORID, STA, PH, BEL, DELTA, SEAZ, ESAZ, TIME, TDEF, AZRES, ADEF, SLORES, SDEF, WGT, VMODEL, LDDATE, for stations WRA, FITZ, MKAR, ASAR (phase P, model IASP); numeric values lost in transcription]

77 What do we know?
Events happen randomly; each has a time, location, depth, and magnitude; seismicity varies with location
Seismic waves of many kinds ("phases") travel through the Earth
Travel time and attenuation depend on phase and source/destination
Arriving waves may or may not be detected, depending on the sensor and the local noise environment
Local noise may also produce false detections

88 Generative model for seismic monitoring (BLOG-style):
# SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];
IsEarthQuake(e) ~ Bernoulli(.999);
EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution() Else UniformEarthDistribution();
Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;
Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s));
#Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];
#Arrivals(event=e, site) = If IsDetected(e,s) then 1 else 0;
Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION) else IASPEI(EventLocation(event(a)), SiteLocation(site(a)), Phase(a)) + TimeRes(a);
TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));
Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360) else GeoAzimuth(EventLocation(event(a)), SiteLocation(site(a))) + AzRes(a);
AzRes(a) ~ Laplace(0, AZSCALE(site(a)));
Slow(a) ~ If (event(a) = null) then Uniform(0,20) else IASPEI-SLOW(EventLocation(event(a)), SiteLocation(site(a))) + SlowRes(site(a));

89 Learning with prior knowledge
Instead of learning a mapping from detection histories to event bulletins, learn local pieces of an overall structured model: event location prior (A6); predictive travel time model (A1); phase type classifier (A2)

90 Event location prior (A6)

91 Travel time prediction (A1)
How long does it take for a seismic signal to get from A to B? This is the travel time T(A,B)
If we know T accurately, and we know the arrival times t_1, t_2, t_3, ... at several stations B_1, B_2, B_3, ..., we can find an accurate estimate of the location A and time t of the event, such that T(A,B_i) ≈ t_i - t for all i
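The estimation idea on this slide can be sketched as a brute-force search in a toy one-dimensional world with a constant wave speed; every number below is invented, and a real system would use a proper travel-time model and a smarter optimizer.

```python
# Sketch of the localization idea: given arrival times t_i at stations B_i
# and a travel-time model T(A, B), search for the event location A and
# origin time t minimizing sum_i (T(A, B_i) - (t_i - t))**2.
# Toy 1-D world with a constant wave speed; all numbers invented.

SPEED = 8.0                      # km/s, assumed uniform

def travel_time(a, b):
    return abs(a - b) / SPEED

stations = [0.0, 100.0, 400.0]   # station positions in km
true_loc, true_t0 = 250.0, 10.0
arrivals = [true_t0 + travel_time(true_loc, b) for b in stations]

def residual(loc, t0):
    return sum((travel_time(loc, b) - (t - t0)) ** 2
               for b, t in zip(stations, arrivals))

# Brute-force grid search over candidate locations and origin times.
best = min(((loc, t0) for loc in range(0, 501)
                      for t0 in [i / 10.0 for i in range(0, 201)]),
           key=lambda p: residual(p[0], p[1]))
print(best)   # recovers (250, 10.0)
```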

92 Earth 101

93 Seismic "phases" (wave types/paths)
Seismic energy is emitted in different types of waves; there are also qualitatively distinct paths (e.g., direct vs. reflected from the surface vs. refracted through the core). P and S are the direct waves; P is faster


95 IASP91 reference velocity model
Spherically symmetric, V_phase(depth); from this, obtain T_predicted(A,B)

96 IASP91 inaccuracy is too big!
The Earth is inhomogeneous: variations in crust thickness and rock properties ("fast" and "slow")

97 Travel time residuals (T_actual - T_predicted)
The residual surface (w.r.t. a particular station) is locally smooth; estimate it by local regression
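Local regression of a smooth residual surface can be sketched with kernel-weighted averaging (Nadaraya-Watson smoothing, one simple form of local regression). The 1-D source locations and residual values below are invented for illustration.

```python
import math

# Local-regression sketch for smoothing travel-time residuals: estimate the
# residual surface at a query point as a Gaussian-kernel-weighted average of
# nearby observed residuals (Nadaraya-Watson smoothing). Data are invented.

def local_estimate(query, points, bandwidth=1.0):
    """points: list of (location, residual) pairs; locations are 1-D here."""
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2))
               for x, _ in points]
    return sum(w * r for w, (_, r) in zip(weights, points)) / sum(weights)

# Observed residuals (seconds) at several source locations along a line:
obs = [(0.0, 0.5), (1.0, 0.7), (2.0, 0.6), (8.0, -1.2), (9.0, -1.0)]
print(round(local_estimate(1.0, obs), 2))   # near the +0.6 cluster
print(round(local_estimate(8.5, obs), 2))   # near the -1.1 cluster
```

The bandwidth controls the locality: a small bandwidth tracks each cluster of observations, while a very large one would flatten the estimate toward the global mean.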

