Open universes and nuclear weapons
12/1/2018 3:35:23 PM
Stuart Russell
Computer Science Division, UC Berkeley
Outline
Why we need expressive probabilistic languages
BLOG combines probability and first-order logic
Application to global seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT)
The world has things in it!!
Expressive language => concise models => fast learning, sometimes fast reasoning
E.g., the rules of chess:
  1 page in first-order logic: On(color,piece,x,y,t)
  ~100000 pages in propositional logic: WhiteKingOnC4Move12
  ~100000000000000000000000000000000000000 pages as an atomic-state model: R.B.KB.RPPP..PPP..N..N…..PP….q.pp..Q..n..n..ppp..pppr.b.kb.r
[Note: chess is a tiny problem compared to the real world]
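A rough sketch of where the propositional blow-up comes from: the single first-order atom On(color,piece,x,y,t) has to be spelled out as one propositional symbol per combination of its arguments. The symbol naming follows the slide's WhiteKingOnC4Move12, but the function `ground_on` and the constants are hypothetical illustrations, not part of any system from the talk.

```python
from itertools import product

# Hypothetical naming scheme, modeled on the slide's WhiteKingOnC4Move12.
COLORS = ["White", "Black"]
PIECES = ["King", "Queen", "Rook", "Bishop", "Knight", "Pawn"]
FILES = "ABCDEFGH"
RANKS = range(1, 9)


def ground_on(move: int) -> list[str]:
    """Expand the first-order atom schema On(color, piece, x, y, t) at t = move
    into one propositional symbol per (color, piece, file, rank) combination."""
    return [
        f"{color}{piece}On{file}{rank}Move{move}"
        for color, piece, file, rank in product(COLORS, PIECES, FILES, RANKS)
    ]


if __name__ == "__main__":
    symbols = ground_on(12)
    print(len(symbols))                      # 2 * 6 * 64 = 768 symbols for one move
    print("WhiteKingOnC4Move12" in symbols)  # True
```

Every propositional rule that mentions piece positions must then be repeated for each of these symbols, which is why one page of first-order rules turns into so many propositional pages.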
Brief history of expressiveness
                 atomic     propositional     first-order/relational
  logic                     5th C B.C.        19th C
  probability    17th C     20th C            21st C (be patient!)
First-order probabilistic languages
Gaifman [1964], Halpern [1990]: constraints on distributions over first-order possible worlds
Poole [1993], Sato [1997], Koller and Pfeffer [1998], various others: the KB defines a distribution exactly (cf. Bayes nets); assumes unique names and domain closure, like Prolog and databases (Herbrand semantics)
Herbrand vs full first-order
Given Father(Bill,William) and Father(Bill,Junior), how many children does Bill have?
Herbrand semantics: 2
First-order logical semantics: between 1 and ∞ (William and Junior may be the same person, and Bill may have other, unnamed children)
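The two answers can be checked by brute force on small domains. This is a minimal sketch under stated assumptions: the Herbrand reading is modeled as unique names plus domain closure plus "only the stated facts hold", and the first-order reading enumerates every model over an n-element domain. The function names are hypothetical; the bounded domain size n stands in for the unbounded domains of full first-order logic.

```python
from itertools import product


def children_count_herbrand() -> int:
    """Herbrand-style reading: distinct names denote distinct objects (unique names),
    no other objects exist (domain closure), and only stated facts hold."""
    facts = {("Bill", "William"), ("Bill", "Junior")}
    return len({child for father, child in facts if father == "Bill"})


def children_range_first_order(n: int) -> tuple[int, int]:
    """Min and max of |{y : Father(Bill, y)}| over all first-order models with an
    n-element domain that satisfy Father(Bill,William) and Father(Bill,Junior).
    Constants may co-refer, and unnamed objects may also be Bill's children."""
    counts = set()
    # Bill's own denotation does not constrain the count (nothing forbids
    # Father(Bill,Bill)), so only William's and Junior's denotations are enumerated.
    for william, junior in product(range(n), repeat=2):
        # Only the row Father(bill, .) affects the answer, so enumerate just that row.
        for row in product((False, True), repeat=n):
            if row[william] and row[junior]:
                counts.add(sum(row))
    return min(counts), max(counts)


if __name__ == "__main__":
    print(children_count_herbrand())  # 2
    for n in range(1, 5):
        print(n, children_range_first_order(n))
```

The minimum stays at 1 (William and Junior co-refer) while the maximum grows with the domain size, which is the finite shadow of "between 1 and ∞".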
Possible worlds
[Figure: example possible worlds over objects A, B, C, D under three semantics: propositional; first-order with unique names and domain closure; first-order open-universe]
Open-universe models
Essential for learning about what exists, e.g., vision, NLP, information integration, tracking, life
[Note the GOFAI Gap: logic-based systems going back to Shakey assumed that perceived objects would be named correctly]
Key question: how to define distributions over an infinite, heterogeneous set of worlds?
Bayes nets build propositional worlds
Network: Burglary, Earthquake, Alarm
Setting each variable in turn, parents first, builds one world: Burglary, not Earthquake, Alarm
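The world-building process can be sketched as ancestral sampling: sample each variable in topological order given its already-sampled parents. The slide gives no probabilities, so the CPT numbers below are assumptions (the familiar textbook values for this network), and `sample_world` / `world_probability` are hypothetical names.

```python
import random

# Hypothetical CPTs (classic textbook burglary-network values); the slide gives none.
P_BURGLARY = 0.001
P_EARTHQUAKE = 0.002
P_ALARM = {  # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def sample_world(rng: random.Random) -> dict[str, bool]:
    """Build one propositional world by sampling each variable in topological order."""
    b = rng.random() < P_BURGLARY
    e = rng.random() < P_EARTHQUAKE
    a = rng.random() < P_ALARM[(b, e)]
    return {"Burglary": b, "Earthquake": e, "Alarm": a}


def world_probability(world: dict[str, bool]) -> float:
    """Joint probability of a complete world: product of the CPT entries used to build it."""
    b, e, a = world["Burglary"], world["Earthquake"], world["Alarm"]
    p = P_BURGLARY if b else 1 - P_BURGLARY
    p *= P_EARTHQUAKE if e else 1 - P_EARTHQUAKE
    p_a = P_ALARM[(b, e)]
    return p * (p_a if a else 1 - p_a)


if __name__ == "__main__":
    print(sample_world(random.Random(0)))
    # The world built on the slide: Burglary, not Earthquake, Alarm
    print(world_probability({"Burglary": True, "Earthquake": False, "Alarm": True}))
```

Under these assumed numbers the slide's world has probability 0.001 × 0.998 × 0.94 ≈ 0.00094. Because every variable is set exactly once, the eight possible worlds' probabilities sum to 1.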
Open-universe models in BLOG
Construct worlds using two kinds of steps, proceeding in topological order:
  Dependency statements: set the value of a function or relation on a tuple of (quantified) arguments, conditioned on parent values
  Number statements: add some objects to the world, conditioned on what objects and relations exist so far
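A minimal Python sketch of the two kinds of steps, assuming a small hypothetical model of balls and chips that is not from the talk; the BLOG-style lines in the comments only describe the intended statements and are not exact BLOG syntax. The second number statement shows the key open-universe feature: how many chips exist depends on objects and relations created earlier.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Sample Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_world(rng: random.Random) -> dict:
    """Construct one world of a hypothetical open-universe model, step by step."""
    world = {"Ball": [], "Color": {}, "Chip": [], "OnBall": {}}
    # Number statement, e.g. #Ball ~ NumBallsPrior()  (here: Poisson with mean 4)
    world["Ball"] = [f"ball{i}" for i in range(poisson(rng, 4.0))]
    # Dependency statement, e.g. Color(b) ~ ColorPrior()  (here: Blue or Green, 50/50)
    for b in world["Ball"]:
        world["Color"][b] = "Blue" if rng.random() < 0.5 else "Green"
    # Number statement conditioned on existing objects and relations:
    #   blue balls get Poisson(2) chips, green balls get none
    for b in world["Ball"]:
        n_chips = poisson(rng, 2.0) if world["Color"][b] == "Blue" else 0
        for _ in range(n_chips):
            chip = f"chip{len(world['Chip'])}"
            world["Chip"].append(chip)
            world["OnBall"][chip] = b
    return world
```

Different random seeds produce worlds with different numbers of balls and chips, so the sampler ranges over a heterogeneous, unbounded set of worlds rather than a fixed set of variables.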
Semantics
Every well-formed* BLOG model specifies a unique proper probability distribution over open-universe possible worlds; this is equivalent to an infinite contingent Bayes net
* No infinite receding ancestor chains, no conditioned cycles, all expressions finitely evaluable
Example: Citation Matching
[Lashkari et al 94] Collaborative Interface Agents, Yezdi Lashkari, Max Metral, and Pattie Maes, Proceedings of the Twelfth National Conference on Articial Intelligence, MIT Press, Cambridge, MA, 1994.
Metral M. Lashkari, Y. and P. Maes. Collaborative interface agents. In Conference of the American Association for Artificial Intelligence, Seattle, WA, August 1994.
Are these descriptions of the same object?
Core task in CiteSeer, Google Scholar, and over 300 companies in the record linkage industry
(Simplified) BLOG model
  #Researcher ~ NumResearchersPrior();
  Name(r) ~ NamePrior();
  #Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
  Title(p) ~ TitlePrior();
  PubCited(c) ~ Uniform({Paper p});
  Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
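A forward simulation makes the generative story concrete. This is a toy sketch, not the model used in the talk: every prior is a hypothetical stand-in (including a `noisy_citation_grammar` that only picks a format and drops characters), and `NumPapersPrior` ignores `Position(r)`, which the slide does not define.

```python
import random

# Hypothetical stand-ins for the priors named in the BLOG model; none are from the talk.
SURNAMES = ["Lashkari", "Metral", "Maes", "Smith", "Jones"]
TITLE_WORDS = ["collaborative", "interface", "agents", "learning", "open", "universe"]


def noisy_citation_grammar(name: str, title: str, rng: random.Random) -> str:
    """Toy NoisyCitationGrammar: choose a citation format, then drop ~3% of characters."""
    template = rng.choice(["{name}. {title}.", "{title}, by {name}."])
    text = template.format(name=name, title=title)
    return "".join(ch for ch in text if rng.random() > 0.03)


def sample_citations(n_citations: int, rng: random.Random):
    """Forward-sample the simplified citation model; returns (PubCited, Text)."""
    researchers = range(rng.randint(1, 4))                 # #Researcher ~ NumResearchersPrior()
    name = {r: rng.choice(SURNAMES) for r in researchers}  # Name(r) ~ NamePrior()
    first_author, title = {}, {}
    for r in researchers:
        # #Paper(FirstAuthor = r): toy prior of 1-3 papers, ignoring Position(r)
        for _ in range(rng.randint(1, 3)):
            p = len(first_author)
            first_author[p] = r
            title[p] = " ".join(rng.sample(TITLE_WORDS, 3))  # Title(p) ~ TitlePrior()
    pub_cited, text = {}, {}
    for c in range(n_citations):
        pub_cited[c] = rng.choice(list(first_author))      # PubCited(c) ~ Uniform({Paper p})
        p = pub_cited[c]
        text[c] = noisy_citation_grammar(name[first_author[p]], title[p], rng)
    return pub_cited, text
```

Citation matching runs this model in reverse: given only the `Text` values, infer `PubCited`; two citations describe the same paper exactly when their `PubCited` values are equal.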
Citation Matching Results
Four data sets of ~300-500 citations, referring to ~150-300 papers
Example: Sybil attacks
Typically between 100 and 10,000 real entities
About 90% are honest and have one identity
Dishonest entities own between 10 and 1000 identities
Transactions may occur between identities:
  If two identities are owned by the same entity (Sybils), a transaction is highly likely
  Otherwise, a transaction is less likely (depending on the honesty of each identity’s owner)
An identity may recommend another after a transaction:
  Sybils with the same owner usually recommend each other
  Otherwise, the probability of recommendation depends on the honesty of the two entities
  #Entity ~ LogNormal[6.9, 2.3]();
  Honest(x) ~ Boolean[0.9]();
  #Identity(Owner = x) ~ if Honest(x) then 1 else LogNormal[4.6,2.3]();
  Transaction(x,y) ~ if Owner(x) = Owner(y) then SibylPrior()
                     else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
  Recommends(x,y) ~ if Transaction(x,y) then
                      if Owner(x) = Owner(y) then Boolean[0.99]()
                      else RecPrior(Honest(Owner(x)), Honest(Owner(y)));
Evidence: lots of transactions and recommendations, maybe some Honest(.) assertions
Query: Honest(x)
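The LogNormal parameters can be checked against the ranges stated in the text. One plausible reading, assumed here, is that LogNormal[mu, sigma] takes the mean and standard deviation of the natural log, and that each stated range spans one standard deviation either side of the median. The helper below is hypothetical.

```python
import math


def lognormal_params_for_range(low: float, high: float) -> tuple[float, float]:
    """Return (mu, sigma) on the natural-log scale such that [low, high]
    is the interval mu - sigma .. mu + sigma after exponentiation."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / 2
    return mu, sigma


if __name__ == "__main__":
    print(lognormal_params_for_range(100, 10_000))  # roughly (6.91, 2.30): #Entity
    print(lognormal_params_for_range(10, 1_000))    # roughly (4.61, 2.30): identities of a dishonest entity
```

Under this reading the median number of entities is e^6.9, about 1000, and the median Sybil count per dishonest entity is e^4.6, about 100.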
Example: classical data association
  #Aircraft(EntryTime = t) ~ NumAircraftPrior();
  Exits(a, t)
    if InFlight(a, t) then ~ Bernoulli(0.1);
  InFlight(a, t)
    if t < EntryTime(a) then = false
    elseif t = EntryTime(a) then = true
    else = (InFlight(a, t-1) & !Exits(a, t-1));
  State(a, t)
    if t = EntryTime(a) then ~ InitState()
    elseif InFlight(a, t) then ~ StateTransition(State(a, t-1));
  #Blip(Source = a, Time = t)
    if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));
  #Blip(Time = t) ~ NumFalseAlarmsPrior();
  ApparentPos(r)
    if (Source(r) = null) then ~ FalseAlarmDistrib()
    else ~ ObsCPD(State(Source(r), Time(r)));
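A one-dimensional forward simulation of this model shows what the data association problem has to undo. This is a minimal sketch under hypothetical choices: Poisson(0.5) aircraft entering per step, a uniform initial position, a Gaussian drift for `StateTransition`, at most one blip per aircraft per step with probability 0.9, Poisson(1) false alarms, and Gaussian observation noise. Only the exit probability 0.1 comes from the model above.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method for Poisson(lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(num_steps: int, rng: random.Random):
    """Forward-sample a 1-D version of the aircraft model for t = 0..num_steps-1.
    Returns (aircraft, blips); a blip is (time, apparent_pos, source index or None)."""
    aircraft, blips = [], []
    for t in range(num_steps):
        # State(a, t) ~ StateTransition(State(a, t-1)) while InFlight(a, t)
        for a in aircraft:
            if a["in_flight"]:
                a["state"][t] = a["state"][t - 1] + rng.gauss(1.0, 0.5)
        # #Aircraft(EntryTime = t) ~ NumAircraftPrior(); State at entry ~ InitState()
        for _ in range(poisson(rng, 0.5)):
            aircraft.append({"entry": t, "in_flight": True, "state": {t: rng.uniform(0.0, 100.0)}})
        # #Blip(Source = a, Time = t) ~ NumDetectionsCPD(...): at most one blip, p = 0.9
        for i, a in enumerate(aircraft):
            if a["in_flight"] and rng.random() < 0.9:
                blips.append((t, a["state"][t] + rng.gauss(0.0, 1.0), i))  # ObsCPD
        # #Blip(Time = t) ~ NumFalseAlarmsPrior(); ApparentPos ~ FalseAlarmDistrib()
        for _ in range(poisson(rng, 1.0)):
            blips.append((t, rng.uniform(0.0, 100.0), None))
        # Exits(a, t) ~ Bernoulli(0.1): an exiting aircraft is not InFlight at t + 1
        for a in aircraft:
            if a["in_flight"] and rng.random() < 0.1:
                a["in_flight"] = False
    return aircraft, blips
```

Inference sees only each blip's time and apparent position; the third field (its true `Source`, or `None` for a false alarm) and even the number of aircraft are exactly what must be inferred.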
Inference
BLOG inference algorithms (rejection sampling, importance sampling, MCMC) converge to the correct posterior for any well-formed model and any first-order query
The current generic MCMC engine is quite slow:
  Applying compiler technology
  Developing user-friendly methods for specifying piecemeal MCMC proposals
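As a small illustration of the importance-sampling family on a propositional network, here is likelihood weighting (importance sampling with the prior as proposal) estimating P(Burglary | Alarm = true). The CPT values are the same hypothetical textbook numbers used for the burglary network, not values from the talk, and this sketch says nothing about how BLOG's own importance sampler chooses its proposal or handles number statements.

```python
import random

# Hypothetical textbook CPTs for the burglary network; not from the talk.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}


def exact_p_burglary_given_alarm() -> float:
    """Exact P(Burglary | Alarm = true) by summing the joint over Earthquake."""
    num = den = 0.0
    for b in (True, False):
        for e in (True, False):
            joint = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E) * P_A[(b, e)]
            den += joint
            if b:
                num += joint
    return num / den


def likelihood_weighting(n: int, rng: random.Random) -> float:
    """Importance sampling with the prior as proposal: sample the non-evidence
    variables in topological order and weight each sample by P(Alarm = true | parents)."""
    num = den = 0.0
    for _ in range(n):
        b = rng.random() < P_B
        e = rng.random() < P_E
        weight = P_A[(b, e)]
        den += weight
        if b:
            num += weight
    return num / den
```

With these numbers the exact posterior is about 0.374. Because burglaries are rare under the prior, only about 1 sample in 1000 carries the information that matters, a small-scale version of why generic samplers can be slow.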
CTBT
Bans testing of nuclear weapons on earth
Allows for outside inspection of 1000 km²
182/195 states have signed
153/195 have ratified
Need 9 more ratifications, including US and China
US Senate refused to ratify in 1999: “too hard to monitor”
2053 nuclear explosions
254 monitoring stations
Vertically Integrated Seismic Analysis
The problem is hard:
  ~10000 “detections” per day, 90% false
  CTBT system (SEL3) finds 69% of significant events, plus about twice as many spurious (nonexistent) events
  16 human analysts find more events, correct existing ones, throw out spurious events, and generate the LEB (“ground truth”)
  Unreliable below magnitude 4 (1 kT)
Solve it by global probabilistic inference:
  NET-VISA finds around 88% of significant events
Generative model for IDC arrival data
Events occur in time and space with magnitude:
  Natural spatial distribution: a mixture of Fisher-Binghams
  Man-made spatial distribution: uniform
  Time distribution: Poisson with given spatial intensity
  Magnitude distribution: Gutenberg-Richter (exponential)
  Aftershock distribution (not yet implemented)
Travel time according to the IASPEI91 model, plus a Laplacian error distribution for each of 14 phases
Detection depends on magnitude, distance, station*
Detected azimuth and slowness, plus Laplacian error
False detections with a station-dependent distribution
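The Gutenberg-Richter prior is an exponential distribution over magnitude above a threshold. With rate ln 10 (as in the BLOG model that follows), this is the law with b-value 1: each extra unit of magnitude makes events ten times rarer. A minimal sketch, with a hypothetical `MIN_MAG` since the actual threshold is not given:

```python
import math
import random

MIN_MAG = 3.0  # hypothetical threshold; the talk does not give MIN_MAG


def sample_magnitude(rng: random.Random) -> float:
    """Magnitude(e) ~ Exponential(log(10)) + MIN_MAG, by inverting the exponential CDF."""
    u = rng.random()                                # u in [0, 1)
    return MIN_MAG - math.log(1.0 - u) / math.log(10)


def tail_probability(delta: float) -> float:
    """P(Magnitude >= MIN_MAG + delta) = exp(-ln(10) * delta) = 10 ** (-delta)."""
    return 10 ** (-delta)
```

For example, about 10% of sampled events exceed MIN_MAG + 1 and about 1% exceed MIN_MAG + 2, matching `tail_probability(1.0)` and `tail_probability(2.0)`.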
  #SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];
  IsEarthQuake(e) ~ Bernoulli(.999);
  EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()
                     Else UniformEarthDistribution();
  Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;
  Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
  IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s));
  #Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];
  #Arrivals(event = e, phase = p, site = s) = If IsDetected(e,p,s) then 1 else 0;
  Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION)
            else IASPEI(EventLocation(event(a)),SiteLocation(site(a)),Phase(a)) + TimeRes(a);
  TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));
  Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)
               else GeoAzimuth(EventLocation(event(a)),SiteLocation(site(a))) + AzRes(a);
  AzRes(a) ~ Laplace(0, AZSCALE(site(a)));
  Slow(a) ~ If (event(a) = null) then Uniform(0,20)
            else IASPEI-SLOW(EventLocation(event(a)),SiteLocation(site(a))) + SlowRes(a);
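The `Time(a)` statement sets up a soft competition between two explanations of an arrival: it came from a hypothesised event (Laplace residual around the predicted time) or it is a false arrival (uniform over the window). A minimal sketch of the resulting log-likelihood ratio, assuming the predicted arrival is the event's origin time plus a travel time that stands in for the IASPEI term; `TIME_DURATION` and all argument values are hypothetical.

```python
import math

TIME_DURATION = 3600.0  # hypothetical window length in seconds


def laplace_logpdf(x: float, loc: float, scale: float) -> float:
    """Log density of Laplace(loc, scale) at x."""
    return -math.log(2.0 * scale) - abs(x - loc) / scale


def arrival_time_log_ratio(observed: float, origin_time: float, travel_time: float,
                           timloc: float, timscale: float) -> float:
    """log p(Time(a) | a from the event) - log p(Time(a) | a is a false arrival).
    Assumes the predicted arrival is origin_time + travel_time (travel_time standing in
    for the IASPEI term) and TimeRes(a) ~ Laplace(timloc, timscale); a false arrival's
    time is Uniform(0, TIME_DURATION)."""
    residual = observed - (origin_time + travel_time)
    return laplace_logpdf(residual, timloc, timscale) - (-math.log(TIME_DURATION))
```

A positive value favours attributing the arrival to the event. With a 1-second scale, an exact hit gives log(1800) ≈ 7.5 in favour of the event, while a 30-second miss swings the ratio strongly towards the false-arrival explanation.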
Forward model structure
[Diagram nodes: Seismic event, Propagation, Detected at Station 1?, Detected at Station 2?, Station 1 noise, Station 2 noise, Station 1 picks, Station 2 picks]
Seismic event: Type, Time, Location, Depth, Magnitude, Phase
Propagation: Travel time, Amplitude decay
Station picks: Arrival time*, Amplitude*, Azimuth*, Slowness*, Phase*
Travel-time residual (station 6)
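Per-station Laplace residual parameters such as TIMLOC and TIMSCALE could be estimated from residuals like those plotted here. The talk does not say how NET-VISA fits them; the sketch below simply shows the standard maximum-likelihood fit for a Laplace distribution (location = median, scale = mean absolute deviation from the median), with hypothetical helper names and synthetic data.

```python
import random
import statistics


def fit_laplace(residuals: list[float]) -> tuple[float, float]:
    """Maximum-likelihood Laplace fit: location = sample median,
    scale = mean absolute deviation from that median."""
    loc = statistics.median(residuals)
    scale = sum(abs(r - loc) for r in residuals) / len(residuals)
    return loc, scale


def sample_laplace(rng: random.Random, loc: float, scale: float) -> float:
    """The difference of two independent Exp(1) draws is Laplace(0, 1)."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
```

Fitting 50,000 synthetic residuals drawn with `sample_laplace(rng, 0.5, 2.0)` recovers a location near 0.5 and a scale near 2.0. The median-based location is also less sensitive to occasional large mispicks than a mean would be.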
Detection probability as a function of distance (station 6, mb 3.5)
[Figure: separate curves for the P phase and the S phase]
Overall Pick Error
Overall Azimuth Error
Phase confusion matrix
Fraction of LEB events missed
Event distribution: LEB vs SEL3
Event distribution: LEB vs NET-VISA
Why does NET-VISA work?
Multiple empirically calibrated seismological models
Improving model structure and quality improves the results
Sound Bayesian combination of evidence
Measured arrival times, phase labels, azimuths, etc., are NOT taken literally
Absence of detections provides negative evidence
More detections per event than SEL3 or LEB
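Negative evidence falls out of the likelihood automatically: every station that could have detected a hypothesised event but did not contributes a factor of 1 - p(detect). A minimal sketch, assuming a logistic detection model in magnitude and distance with hypothetical coefficients; NET-VISA's actual per-station, per-phase SITE_COEFFS are not given in the talk.

```python
import math


def detection_prob(magnitude: float, distance_deg: float,
                   coeffs: tuple[float, float, float]) -> float:
    """Hypothetical logistic detection model: sigmoid(c0 + c1*magnitude + c2*distance)."""
    c0, c1, c2 = coeffs
    z = c0 + c1 * magnitude + c2 * distance_deg
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood_of_detections(magnitude: float, stations, detected) -> float:
    """Sum over stations of log P(detected or not | event). Stations that saw
    nothing still contribute log(1 - p): this term is the negative evidence."""
    total = 0.0
    for (distance, coeffs), was_detected in zip(stations, detected):
        p = detection_prob(magnitude, distance, coeffs)
        total += math.log(p) if was_detected else math.log(1.0 - p)
    return total
```

With coefficients (-10, 3, -0.05), a magnitude-4 event at 10 degrees would be detected with probability about 0.82. If three such stations all stay silent, the likelihood strongly prefers a smaller event (or no event at all) over the magnitude-4 hypothesis.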
Example of using extra detections
NEIC event (3.0) missed by LEB
python debug.py 15 visa 254 -w 4 -r .1
NEIC event (3.7) missed by LEB
python debug.py 15 visa 2069 -w 4 -r .1
NEIC event (2.6) missed by LEB
python debug.py 15 visa 2338 -w 4 -r .1
Why does NET-VISA not work (perfectly)?
Needs hydroacoustic data for mid-ocean events
Weaknesses in the model:
  Travel-time residuals for all phases along a single path are assumed to be uncorrelated
  Each phase arrival is assumed to generate at most one detection; in fact, multiple detections occur
  Arrival detectors use high SNR thresholds and look only at the local signal to make hard decisions
Detection-based and signal-based monitoring
[Diagram: SEL3 and NET-VISA infer events from detections; SIG-VISA infers events from the waveform signals]
Summary
Expressive probability models are very useful
BLOG provides a generative language for defining first-order, open-universe models
  Inference via MCMC over possible worlds
  Other methods welcome!
The CTBT application is typical of multi-sensor monitoring applications that need vertical integration and involve data association
Scaling up inference is the next step