1
Python logic (from Monty Python and the Holy Grail)
"Tell me, what do you do with witches?"
"Burn!"
"And what do you burn apart from witches?"
"More witches! ... Shh! Wood!"
"So, why do witches burn?" [pause]
"B--... 'cause they're made of... wood?"
"Good! ... So, how do we tell whether she is made of wood? Does wood sink in water?"
"No. No, it floats! It floats!"
"Throw her into the pond!"
"What also floats in water?"
"Bread!" "Apples!" "Uh, very small rocks!"
ARTHUR: A duck!
CROWD: Oooh.
BEDEVERE: Exactly. So, logically...
VILLAGER #1: If... she... weighs... the same as a duck,... she's made of wood.
BEDEVERE: And therefore?
VILLAGER #2: A witch!
VILLAGER #1: A witch!
2
Problematic scenarios for hill-climbing
- Local minima: when the state-space landscape has local minima, any search that moves only in the greedy direction cannot be (asymptotically) complete. Random walk, on the other hand, is asymptotically complete. Idea: put random walk into greedy hill-climbing.
- Ridges
Solution(s):
- Random-restart hill-climbing
- Do the non-greedy thing with some probability p > 0
- Use simulated annealing
3
The middle ground between hill-climbing and systematic search
Hill-climbing has a lot of freedom in deciding which node to expand next, but it is incomplete even for finite search spaces.
- Good for problems that do have solutions, but whose solutions are non-uniformly clustered.
Systematic search is complete (because its search tree keeps track of the parts of the space that have been visited).
- Good for problems where solutions may not exist, or where the whole point is to show that there are no solutions (e.g., the propositional entailment problem to be discussed later), or where the state space is densely connected (making repeated exploration of states a big issue).
Smart idea: try the middle ground between the two.
4
Tabu Search
A variant of hill-climbing search that attempts to reduce the chance of revisiting the same states.
- Idea: keep a "tabu" list of states that have been visited in the past. Whenever a node in the local neighborhood is found in the tabu list, remove it from consideration (even if it happens to have the best heuristic value among all neighbors).
- Properties: as the size of the tabu list grows, hill-climbing asymptotically becomes "non-redundant" (won't look at the same state twice). In practice, a reasonably sized tabu list (say 100 or so) improves the performance of hill-climbing on many problems.
Plain hill-climbing has O(1) space complexity, but no termination or completeness guarantee: because it doesn't know where it has been, it can loop even in finite search spaces.
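A minimal sketch of the idea, assuming the problem supplies a `neighbors(s)` function and a heuristic `h(s)` with lower being better (names illustrative); moving to the best non-tabu neighbor even when it is worse is one common way to realize the slide's description:

    from collections import deque

    def tabu_search(start, neighbors, h, tabu_size=100, max_steps=1000):
        """Hill-climbing with a fixed-size tabu list of recently visited
        states: move to the best non-tabu neighbor, even when it is worse,
        so the search can escape (and not immediately re-enter) local minima."""
        current = best_seen = start
        tabu = deque([start], maxlen=tabu_size)   # oldest entries fall off
        for _ in range(max_steps):
            candidates = [n for n in neighbors(current) if n not in tabu]
            if not candidates:
                break                             # every neighbor is tabu
            current = min(candidates, key=h)      # greedy among non-tabu
            tabu.append(current)
            if h(current) < h(best_seen):
                best_seen = current
        return best_seen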
5
Making Hill-Climbing Asymptotically Complete
- Random-restart hill-climbing: keep some bound B. When you have made more than B moves, reset the search with a new random initial seed and start again. Getting a random new seed in an implicit search space is non-trivial! In the 8-puzzle, if you generate a "random" state by making random moves from the current state, you are still not truly random (you will stay in one of the two connected components of the state space).
- "Biased random walk": avoid being fully greedy when choosing the seed for the next iteration. With probability p, choose the best child; with probability (1-p), choose one of the children randomly.
- Use simulated annealing: similar to the previous idea, except that the probability p itself is increased asymptotically to one (so you are more likely to tolerate a non-greedy move in the beginning than towards the end).
With the random-restart or biased-random-walk strategies, we can solve very large problems (million-queens problems in minutes!). A sketch of the biased random walk is shown below.
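A minimal sketch combining both ideas, assuming the problem supplies `random_state()`, `neighbors(s)`, and a heuristic `h(s)` with lower being better (all names illustrative):

    import random

    def biased_random_restart(random_state, neighbors, h,
                              p=0.9, bound=1000, rounds=10):
        """Random-restart hill-climbing with a biased random walk:
        with probability p take the greedy step, otherwise move to a
        random neighbor; after `bound` moves, restart from a new seed."""
        best = None
        for _ in range(rounds):
            current = random_state()
            for _ in range(bound):
                children = neighbors(current)
                if not children:
                    break
                if random.random() < p:
                    current = min(children, key=h)     # greedy move
                else:
                    current = random.choice(children)  # random move
            if best is None or h(current) < h(best):
                best = current
        return best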
6
Ideas for improving convergence (two strategies over the same neighborhood):
1. Random-restart hill-climbing: after every N iterations, start with a completely random assignment. Probabilistic greedy: with probability p do what the greedy strategy suggests; with probability (1-p) pick a random variable and change its value randomly. p can increase as the search progresses.
2. A greedier version of the above (pick both the best variable and the best value): for each variable v, let l(v) be the value it can take so that the number of conflicts is minimized, and let n(v) be the number of conflicts with this value. Pick the variable v with the lowest n(v) value, and assign it the value l(v).
Both strategies search the 1-neighborhood of the current assignment (where the k-neighborhood is the set of all assignments that differ from the current assignment in at most k variable values), but strategy 2 is greedier. A sketch of strategy 2 appears below.
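A minimal sketch of strategy 2 for a CSP, assuming a `conflicts(assignment)` function that counts constraint violations and a `domains` dict mapping each variable to its candidate values (names illustrative):

    def greedy_min_conflicts_step(assignment, domains, conflicts):
        """Scan every (variable, value) pair in the 1-neighborhood and
        commit to the single change that leaves the fewest conflicts."""
        best_var, best_val, best_n = None, None, None
        for var, values in domains.items():
            for val in values:
                trial = dict(assignment)
                trial[var] = val
                n = conflicts(trial)
                if best_n is None or n < best_n:
                    best_var, best_val, best_n = var, val, n
        assignment[best_var] = best_val   # greedy commitment
        return assignment, best_n         # repeat until best_n == 0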
7
Model-checking by Stochastic Hill-climbing (applying the min-conflicts idea to satisfiability)
Start with a model (a random t/f assignment to the propositions).
For i = 1 to max_flips do:
- If the model satisfies clauses, then return the model
- Else: clause := a randomly selected clause from clauses that is false in the model
  - With probability p, flip whichever symbol in clause maximizes the number of satisfied clauses /*greedy step*/
  - With probability (1-p), flip the value in the model of a randomly selected symbol from clause /*random step*/
Return Failure
Remarkably good in practice!! So good that people started wondering if there actually are any hard problems out there.
Example clauses: 1. (p,s,u) 2. (~p,q) 3. (~q,r) 4. (q,~s,t) 5. (r,s) 6. (~s,t) 7. (~s,u)
Consider the assignment "all false": clauses 1 (p,s,u) and 5 (r,s) are violated. Pick one, say 5 (r,s). If we flip r, only clause 1 remains violated; if we flip s, clauses 4, 6, and 7 become violated. So the greedy thing is to flip r (giving "all false, except r"); otherwise, pick either randomly.
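The procedure above is close to WalkSAT; here is a runnable sketch with clauses encoded as lists of (symbol, sign) literals (the encoding is just one convenient choice):

    import random

    def stochastic_sat(clauses, symbols, p=0.5, max_flips=100000):
        model = {s: random.choice([True, False]) for s in symbols}

        def satisfied(clause):
            return any(model[s] == sign for s, sign in clause)

        def flip_score(sym):
            model[sym] = not model[sym]          # tentative flip
            n = sum(satisfied(c) for c in clauses)
            model[sym] = not model[sym]          # undo
            return n

        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c)]
            if not unsat:
                return model
            clause = random.choice(unsat)
            if random.random() < p:              # greedy step
                sym = max((s for s, _ in clause), key=flip_score)
            else:                                # random step
                sym = random.choice([s for s, _ in clause])
            model[sym] = not model[sym]
        return None                              # failure

    # The slide's example: (p,s,u)(~p,q)(~q,r)(q,~s,t)(r,s)(~s,t)(~s,u)
    clauses = [[('p',True),('s',True),('u',True)], [('p',False),('q',True)],
               [('q',False),('r',True)], [('q',True),('s',False),('t',True)],
               [('r',True),('s',True)], [('s',False),('t',True)],
               [('s',False),('u',True)]]
    print(stochastic_sat(clauses, ['p','q','r','s','t','u']))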
8
If most SAT problems are easy, then exactly where are the hard ones?
9
Hardness of 3-SAT as a function of #clauses/#variables
[Figure: x-axis is the ratio #clauses/#variables. One curve is the probability that there is a satisfying assignment; you would expect the cost to peak where this probability crosses p = 0.5. The other curve, the cost of solving (either by finding a solution or showing there ain't one), is what actually happens: it spikes sharply near the ratio ~4.3.]
10
Phase Transition in SAT
Theoretically, we only know that the phase-transition ratio occurs between 3.26 and 4.596. Experimentally, it seems to be close to 4.3. (We also have a proof that 3-SAT has a sharp threshold.)
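You can observe the transition empirically on small instances; the following sketch generates random 3-SAT formulas at several clause/variable ratios and brute-force checks satisfiability (only feasible for small n, and it may take a minute or so to run):

    import random
    from itertools import product

    def random_3sat(n_vars, n_clauses):
        """Each clause: 3 distinct variables with random signs."""
        return [[(v, random.choice([True, False]))
                 for v in random.sample(range(n_vars), 3)]
                for _ in range(n_clauses)]

    def satisfiable(clauses, n_vars):
        return any(all(any(bits[v] == sign for v, sign in c) for c in clauses)
                   for bits in product([True, False], repeat=n_vars))

    n = 10
    for ratio in [2.0, 3.0, 4.0, 4.3, 5.0, 6.0]:
        sat = sum(satisfiable(random_3sat(n, int(ratio * n)), n)
                  for _ in range(30))
        print(f"ratio {ratio}: {sat}/30 satisfiable")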
11
http://www.ipam.ucla.edu/publications/ptac2002/ptac2002_dachlioptas_formulas.pdf Progress in nailing the bound.. (just FYI)
12
"Beam search" for hill-climbing
Hill-climbing, as described, uses one seed solution that is continually updated. Why not use multiple seeds?
- Stochastic beam search uses k seeds (k > 1). In each iteration, the neighborhoods of all k seeds are evaluated, and from them k new seeds are selected probabilistically: the probability that a seed is selected is proportional to how good it is. (Not the same as running k hill-climbing searches in parallel.)
- This is sort of "almost" close to the way evolution seems to work, with one difference: define the neighborhood in terms of the combination of pairs of current seeds (sexual reproduction; crossover). The probability that a seed from the current generation gets to "mate" to produce offspring in the next generation is proportional to the seed's goodness, and the number of matings is limited to keep the number of seeds the same. To introduce randomness, apply mutation to the offspring.
- This type of stochastic beam-search hill-climbing algorithm is called a genetic algorithm. A minimal sketch follows.
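A minimal genetic-algorithm sketch, assuming problem-specific `random_seed()`, `fitness()` (higher is better, non-negative, not all zero), `crossover(a, b)`, and `mutate(s)` functions (all names illustrative):

    import random

    def genetic_search(random_seed, fitness, crossover, mutate,
                       k=20, generations=100):
        """Fitness-proportional mating, crossover, then mutation;
        the population size is held at k across generations."""
        population = [random_seed() for _ in range(k)]
        for _ in range(generations):
            weights = [fitness(s) for s in population]  # selection pressure
            parents = random.choices(population, weights=weights, k=2 * k)
            population = [mutate(crossover(parents[2 * i], parents[2 * i + 1]))
                          for i in range(k)]
        return max(population, key=fitness)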
13
Illustration of Genetic Algorithms in Action Very careful modeling needed so the things emerging from crossover and mutation are still potential seeds (and not monkeys typing Hamlet) Is the “genetic” metaphor really buying anything?
14
Hill-climbing in "continuous" search spaces
Gradient descent (which you study in calculus of variations) is a special case of hill-climbing search applied to continuous search spaces.
- The local neighborhood is defined in terms of the "gradient" or derivative of the error function. Since the gradient of the error function will be near zero close to the minimum and larger farther from it, you tend to take smaller steps near the minimum and larger steps farther away from it, just as you would want.
- Gradient descent is guaranteed to converge to the global minimum if alpha (the step size) is small and the error function is "unimodal" (i.e., has only one minimum).
- Versions of gradient-descent algorithms are used in neural-network learning. Unfortunately, the error function is NOT unimodal for multi-layer neural networks, so you have to augment gradient descent with ideas such as simulated annealing to increase the chance of reaching the global minimum.
[Figure: example of finding the cube root of a by minimizing Err = |x^3 - a|; the minimum is at x = a^(1/3). Compare the Newton-Raphson approximation on the next slide.]
There are tons of variations based on how alpha is set.
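The update rule is x <- x - alpha * Err'(x). Here is the cube-root example as a runnable sketch, using the differentiable error (x^3 - a)^2 rather than the absolute value, and an alpha tuned to this particular instance, which is exactly the fiddliness the slide alludes to:

    def cube_root_gd(a=27.0, alpha=0.001, steps=20000):
        """Gradient descent on Err(x) = (x**3 - a)**2.
        Err'(x) = 6 * x**2 * (x**3 - a)."""
        x = 1.0
        for _ in range(steps):
            x -= alpha * 6 * x**2 * (x**3 - a)
        return x

    print(cube_root_gd())   # ~3.0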
15
Origins of gradient descent: Newton-Raphson applied to function minimization
The Newton-Raphson method is used for finding roots of a function.
- To find roots of g(x), start with some value of x and repeatedly do x <- x - g(x)/g'(x).
- To minimize a function f(x), we need to find the roots of the equation f'(x) = 0, i.e., repeatedly do x <- x - f'(x)/f''(x).
- If x is a vector, the update becomes x <- x - H_f(x)^(-1) grad f(x), where H_f is the Hessian of f. Because the Hessian is costly to compute (it has n^2 second-derivative entries for an n-dimensional vector), we try approximations.
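A runnable sketch of the root-finding form, with the cube root of 27 as the running example (g(x) = x^3 - 27):

    def newton_root(g, g1, x0, steps=25):
        """Newton-Raphson for g(x) = 0: repeat x <- x - g(x)/g'(x)."""
        x = x0
        for _ in range(steps):
            x = x - g(x) / g1(x)
        return x

    # Cube root of 27 as the root of g(x) = x**3 - 27:
    print(newton_root(lambda x: x**3 - 27, lambda x: 3 * x**2, x0=2.0))  # ~3.0

    # Minimization is the same update applied to f'(x):
    # x <- x - f'(x)/f''(x); for vector x, x <- x - H_f(x)^{-1} grad f(x).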
16
Between hill-climbing and systematic search
- You can reduce the freedom of hill-climbing search to make it more complete: Tabu search.
- You can increase the freedom of systematic search to make it more flexible in following local gradients: Random restart search.
18
Random restart search
A variant of depth-first search where:
- When a node is expanded, its children are randomly permuted before being introduced into the open list. The permutation may well be a "biased" random permutation.
- The search is restarted from scratch any time a cutoff parameter is exceeded. The cutoff may be in terms of the number of backtracks, the number of nodes expanded, or the amount of time elapsed.
Because of the random permutation, every time the search is restarted you are likely to follow different paths through the search tree. This allows you to recover from bad initial moves.
The higher the cutoff value, the lower the number of restarts (and thus the lower the "freedom" to explore different paths). When the cutoff is infinity, random restart search is just normal depth-first search: systematic and complete. For smaller cutoff values, the search has more freedom but no guarantee of completeness. A strategy to guarantee asymptotic completeness: start with a low cutoff value, but keep increasing it as time goes on.
Random restart search has been shown to be very good for problems that have a reasonable percentage of "easy to find" solutions (such problems are said to exhibit the "heavy-tail" phenomenon). Many real-world problems have this property.
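A minimal sketch, assuming problem-specific `successors(node)` and `is_goal(node)` functions (names illustrative):

    import random

    def random_restart_dfs(start, successors, is_goal,
                           cutoff=1000, max_restarts=100):
        """Depth-first search with randomly permuted children, restarted
        from scratch whenever `cutoff` node expansions are exceeded."""
        for _ in range(max_restarts):
            stack, expanded = [start], 0
            while stack and expanded < cutoff:
                node = stack.pop()
                expanded += 1
                if is_goal(node):
                    return node
                children = list(successors(node))
                random.shuffle(children)  # could be a biased permutation
                stack.extend(children)
            cutoff *= 2   # grow the cutoff for asymptotic completeness
        return None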
21
Representation Reasoning
23
Ontological and epistemological commitments of different logics:

    Language          Ontological commitment      Epistemological commitment (assertions)
    Prop logic        facts                       t/f
    Prob prop logic   facts                       degree of belief
    FOPC              facts, objects, relations   t/f/u
    Prob FOPC         facts, objects, relations   degree of belief
24
Think of a sentence as the stand-in for a set of worlds (where it is true)
31
The query sentence α is true in all worlds (rows) where KB is true... so it is entailed.
32
KB entails α iff KB & ~α is unsatisfiable (False).
So, to check whether KB entails α: negate α, add it to the KB, and try to show that the resulting (propositional) theory has no solutions (this requires systematic methods).
Proof by model checking.
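A brute-force truth-table sketch of this check, with KB and α given as Python predicates over a model (a purely illustrative encoding):

    from itertools import product

    def entails(kb, alpha, symbols):
        """KB |= alpha  iff  KB & ~alpha has no satisfying model."""
        for values in product([True, False], repeat=len(symbols)):
            model = dict(zip(symbols, values))
            if kb(model) and not alpha(model):
                return False   # a world where KB holds but alpha fails
        return True

    # Example (used again later): KB = (W => J) & (~W => J) entails J.
    kb = lambda m: ((not m['W']) or m['J']) and (m['W'] or m['J'])
    print(entails(kb, lambda m: m['J'], ['W', 'J']))   # True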
33
Connection between Entailment and Satisfiability The Boolean Satisfiability problem is closely connected to Propositional entailment –Specifically, propositional entailment is the “conjugate” problem of boolean satisfiability (since we have to show that KB & ~f has no satisfying model to show that KB |= f) Of late, our ability to solve very large scale satisfiability problems has increased quite significantly
34
Entailment & Satisfiability
The SAT (Boolean satisfiability) problem: given a set of propositions and a set of (CNF) clauses, find a model (an assignment of t/f values to the propositions) that satisfies all clauses.
- k-SAT is a SAT problem where all clauses have length at most k. SAT is NP-complete; 1-SAT and 2-SAT are polynomial; k-SAT for k > 2 is NP-complete (so 3-SAT is the smallest k-SAT that is NP-complete).
- If we have a procedure for solving SAT problems, we can use it to compute entailment: the sentence S is entailed iff the negation of S, when added to the KB, gives a theory that is unsatisfiable (NO MODEL). Entailment is thus co-NP-complete.
- SAT is useful for modeling many other "assignment" problems. We will see the use of SAT for planning; it can also be used for graph coloring, n-queens, scheduling, circuit verification, etc. (the last makes SAT VERY interesting for electrical engineering folks).
- Our ability to solve very large-scale SAT problems has increased quite phenomenally in recent years: we can solve SAT instances with millions of variables and clauses very easily. To use this technology for inference, we will have to consider systematic SAT solvers.
35
Davis-Putnam-Logemann-Loveland (DPLL) Procedure
(Being systematic, it can detect failure, i.e., report unsatisfiability.)
36
DPLL Example
Clauses: 1. (p,s,u) 2. (~p,q) 3. (~q,r) 4. (q,~s,t) 5. (r,s) 6. (~s,t) 7. (~s,u)
- Pick p; set p=True. Clause 1 (p,s,u) is satisfied (remove it).
- Unit propagation: from p and (~p,q), q is derived; set q=True. Clauses 2 (~p,q) and 4 (q,~s,t) are satisfied (removed).
- Unit propagation: from q and (~q,r), r is derived; set r=True. Clauses 3 (~q,r) and 5 (r,s) are satisfied (removed).
- Pure literal elimination: in all the remaining clauses (6 and 7), s occurs only negatively, so set s=False. (Note that s was not pure in all clauses, only in the remaining ones.)
At this point all clauses are satisfied. Return p=True, q=True, r=True, s=False.
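A compact DPLL sketch, with clauses again encoded as lists of (symbol, sign) literals; the branching and elimination order here is naive, so the model it returns may differ from the trace above:

    def dpll(clauses, model=None):
        """Returns a satisfying (possibly partial) model or None."""
        model = dict(model or {})
        simplified = []
        for clause in clauses:
            if any(model.get(s) == sign for s, sign in clause):
                continue                        # clause already satisfied
            rest = [(s, sign) for s, sign in clause if s not in model]
            if not rest:
                return None                     # empty clause: failure
            simplified.append(rest)
        if not simplified:
            return model                        # all clauses satisfied
        for clause in simplified:               # unit propagation
            if len(clause) == 1:
                s, sign = clause[0]
                return dpll(simplified, {**model, s: sign})
        literals = {lit for c in simplified for lit in c}
        for s, sign in literals:                # pure literal elimination
            if (s, not sign) not in literals:
                return dpll(simplified, {**model, s: sign})
        s, _ = simplified[0][0]                 # branch on a symbol
        return (dpll(simplified, {**model, s: True})
                or dpll(simplified, {**model, s: False}))

    clauses = [[('p',True),('s',True),('u',True)], [('p',False),('q',True)],
               [('q',False),('r',True)], [('q',True),('s',False),('t',True)],
               [('r',True),('s',True)], [('s',False),('t',True)],
               [('s',False),('u',True)]]
    print(dpll(clauses))   # some satisfying assignment of the slide's clauses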
37
Lots of work in SAT solvers
- DPLL was the first (1962)
- Circa 1994 came GSAT (hill-climbing search for SAT)
- Circa 1997 came SATZ
- Circa 1998-99 came RelSAT
- ~2000 came Chaff
Current best can be found at http://www.satlive.org/SATCompetition/2003/results.html
38
Inference rules
Sound (but incomplete):
- Modus ponens: A=>B, A |= B
- Modus tollens: A=>B, ~B |= ~A
- Chaining: A=>B, B=>C |= A=>C
Unsound (??):
- "Abduction": A=>B, ~A |= ~B
Complete (but unsound):
- "Python" logic
How about SOUND & COMPLETE? Resolution (needs normal forms).
Truth-table check for modus tollens, with KB = (A=>B) & ~B and theorem ~A:

    A  B  A=>B  KB  ~A
    T  T   T    F   F
    T  F   F    F   F
    F  T   T    F   T
    F  F   T    T   T

In the only row where KB is true, ~A is also true, so the rule is sound. For the "abduction" rule, by contrast, the row A=F, B=T has both premises true (A=>B and ~A) but the conclusion ~B false: KB true but theorem not true, so it is unsound.
39
If WMDs are found, the war is justified: W => J
If WMDs are not found, the war is still justified: ~W => J
Is the war justified anyway? |= J?
Can modus ponens derive it? No; we need something that does case analysis.
41
Two ways to use resolution:
- Forward: apply resolution steps until the fact f you want to prove appears as a resolvent.
- Backward (resolution refutation): add the negation of the fact f you want to derive to the KB, then apply resolution steps until you derive an empty clause.
Modus ponens, modus tollens, etc. are special cases of resolution!
42
Don't need to use other equivalences if we use resolution in refutation style.
Premises in clause form: ~W V J (if WMDs are found, the war is justified) and W V J (if they are not found, it is still justified). Query: |= J?
Refutation: add the negated goal ~J.
- Resolve ~J with ~W V J to get ~W
- Resolve ~J with W V J to get W
- Resolve ~W with W to get the empty clause: contradiction, so J is entailed.
Alternatively, resolving ~W V J directly with W V J gives J V J = J. Resolution does case analysis: in effect it uses W V ~W (either WMDs are found or they are not).
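A brute-force resolution-refutation sketch, with clauses as frozensets of (symbol, sign) literals; no set-of-support or other control strategy is used, so it is exponential but fine for tiny KBs:

    from itertools import combinations

    def resolve(c1, c2):
        """All resolvents of two clauses."""
        out = []
        for lit in c1:
            s, sign = lit
            if (s, not sign) in c2:
                out.append((c1 - {lit}) | (c2 - {(s, not sign)}))
        return out

    def resolution_entails(kb, negated_goal):
        """Resolve until the empty clause appears (entailed) or a
        fixpoint is reached with no refutation (not entailed)."""
        clauses = set(kb) | set(negated_goal)
        while True:
            new = set()
            for c1, c2 in combinations(clauses, 2):
                for r in resolve(c1, c2):
                    if not r:
                        return True    # empty clause derived
                    new.add(r)
            if new <= clauses:
                return False           # nothing new: no proof exists
            clauses |= new

    # WMD example: KB = {~W v J, W v J}; goal J, negated to ~J.
    KB = [frozenset({('W', False), ('J', True)}),
          frozenset({('W', True),  ('J', True)})]
    print(resolution_entails(KB, [frozenset({('J', False)})]))   # True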
44
Prolog, without variables and without the cut operator, is doing Horn-clause theorem proving. For any KB in Horn form, modus ponens is a sound and complete inference rule.
(Aside, from CSE/EEE 120: CNF is a.k.a. the product-of-sums form; DNF is a.k.a. the sum-of-products form.)
45
Conversion to CNF form
A CNF clause is a disjunction of literals, where a literal is a proposition or a negated proposition.
Conversion steps:
- Remove implications
- Pull negation in (De Morgan's laws)
- Distribute disjunction over conjunction
- Separate conjunctions into clauses
ANY propositional logic sentence can be converted into CNF form. Try: ~(P&Q) => ~(R V W) (worked below).
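A worked conversion of the exercise, step by step, in the slide's own notation:

    ~(P&Q) => ~(R V W)
    1. Remove implication:     ~~(P&Q) V ~(R V W)  =  (P&Q) V ~(R V W)
    2. Pull negation in:       (P&Q) V (~R & ~W)
    3. Distribute V over &:    (P V ~R) & (P V ~W) & (Q V ~R) & (Q V ~W)
    4. Separate into clauses:  (P,~R), (P,~W), (Q,~R), (Q,~W)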
46
Need for resolution
Yankees win, it is Destiny: ~Y V D
Dbacks win, it is Destiny: ~Db V D
Yankees or Dbacks win: Y V Db
Is it Destiny either way? |= D?
Can modus ponens derive it? Not until Sunday, when the Dbacks won.
Resolution does case analysis. Forward: resolving Y V Db with ~Db V D gives D V Y; resolving that with ~Y V D gives D V D == D.
Refutation style (don't need to use other equivalences): add ~D; resolve with ~Y V D to get ~Y; resolve ~Y with Y V Db to get Db; resolve Db with ~Db V D to get D; resolve D with ~D to get the empty clause.
47
Solving problems using propositional logic
Need to write what you know as propositional formulas. Theorem proving will then tell you whether a given new sentence holds given what you know.
Three kinds of queries:
- Is my knowledge base consistent? (Is there at least one world where everything I know is true?) Satisfiability.
- Is the sentence S entailed by my knowledge base? (Is it true in every world where my knowledge base is true?)
- Is the sentence S consistent with (possibly true given) my knowledge base? (Is S true in at least one of the worlds where my knowledge base holds?) S is consistent iff ~S is not entailed.
But propositional logic cannot differentiate between degrees of likelihood among possible sentences.
48
Steps in Resolution Refutation
Consider the following problem:
- If the grass is wet, then it is either raining or the sprinkler is on: GW => R V SP, i.e., ~GW V R V SP
- If it is raining, then Timmy is happy: R => TH, i.e., ~R V TH
- If the sprinklers are on, Timmy is happy: SP => TH, i.e., ~SP V TH
- If Timmy is happy, then he sings: TH => SG, i.e., ~TH V SG
- Timmy is not singing: ~SG
- Prove that the grass is not wet: |= ~GW?
Refutation: add the negated goal GW, then derive in turn R V SP, TH V SP, SG V SP, SP, TH, SG, and finally the empty clause (SG against ~SG).
Is there search in inference? Yes!! Many possible inferences can be done, but only a few are actually relevant.
- Idea: Set of Support. At least one of the resolved clauses is a goal clause, or a descendant of a clause derived from a goal clause. (Used in the example here!)
49
Search in Resolution
- Convert the database into clausal form Dc
- Negate the goal first, and then convert it into clausal form DG
- Let D = Dc + DG
- Loop:
  - Select a pair of clauses C1 and C2 from D. Different control strategies can be used to select C1 and C2 to reduce the number of resolutions tried:
    - Idea 1: Set of Support. At least one of C1 or C2 must be either the goal clause or a clause derived by doing resolutions on the goal clause (*COMPLETE*)
    - Idea 2: Linear input form. At least one of C1 or C2 must be one of the clauses in the input KB (*INCOMPLETE*)
  - Resolve C1 and C2 to get C12
  - If C12 is the empty clause, QED!! Return success (we proved the theorem)
  - D = D + C12
- End loop: if we come here, we couldn't get the empty clause. Return failure.
Finiteness is guaranteed if we make sure that we never resolve the same pair of clauses more than once, AND we use factoring, which removes multiple copies of literals from a clause (e.g., Q V P V P => Q V P).
50
Mad chase for the empty clause...
- You must have everything in CNF clauses before you can resolve. The goal must be negated first, before it is converted into CNF form. The goal (the fact to be proved) may convert to multiple clauses: e.g., if we want to prove P V Q, its negation gives the two clauses ~P and ~Q to add to the database.
- Resolution works by resolving away a single literal and its negation. P V Q resolved with ~P V ~Q is not empty! In fact, these clauses are not inconsistent (P true and Q false will make sure that both clauses are satisfied). P V Q is the negation of ~P & ~Q; the latter becomes two separate clauses, ~P and ~Q, so by doing two separate resolutions with these two clauses we can derive the empty clause.
51
Complexity of Propositional Inference
Any sound and complete inference procedure has to be co-NP-complete, since model-theoretic entailment computation is co-NP-complete (which in turn follows because model-theoretic satisfiability is NP-complete).
Given a propositional database of size d:
- Any sentence S that follows from the database by modus ponens can be derived in linear time.
- If the database has only HORN sentences (sentences whose CNF form has clauses with at most one positive literal; e.g., A & B => C), then modus ponens is complete for that database. PROLOG uses (first-order) Horn sentences.
- Deriving all sentences that follow by resolution is co-NP-complete (exponential).
- Anything that follows by unit resolution can be derived in linear time. (Unit resolution: at least one of the clauses must be a clause of length 1.)
52
Example
Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm, so that he can come home if there is a burglary. Los Angeles is also earthquake-prone, and the alarm goes off when there is an earthquake.
- Burglary => Alarm
- Earth-Quake => Alarm
- Alarm => John-calls
- Alarm => Mary-calls
If there is a burglary, will Mary call? Check: KB & B |= M?
If Mary didn't call, is it possible that a burglary occurred? Check that KB & ~M doesn't entail ~B.
53
Example (Real)
Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm, so that he can come home if there is a burglary. Los Angeles is also earthquake-prone, and the alarm goes off when there is an earthquake.
Pearl lives in the real world, where (1) burglars can sometimes disable alarms, (2) some earthquakes may be too slight to cause the alarm, (3) even in Los Angeles, burglaries are more likely than earthquakes, (4) John and Mary both have their own lives and may not always call when the alarm goes off, (5) between John and Mary, John is more of a slacker than Mary, and (6) John and Mary may call even without the alarm going off.
- Burglary => Alarm
- Earth-Quake => Alarm
- Alarm => John-calls
- Alarm => Mary-calls
If there is a burglary, will Mary call? Check: KB & B |= M?
If Mary didn't call, is it possible that a burglary occurred? Check that KB & ~M doesn't entail ~B.
John already called. If Mary also calls, is it more likely that a burglary occurred? You now also hear on the TV that there was an earthquake. Is burglary more or less likely now?
55
How do we handle Real Pearl?
- Eager way: model everything! E.g., model exactly the conditions under which John will call: he shouldn't be listening to loud music, he hasn't gone on an errand, he didn't recently have a tiff with Pearl, etc., etc. A & c1 & c2 & c3 & ... & cn => J (and the exceptions may have interactions: c1 & c5 => ~c9). The qualification and ramification problems make this an infeasible enterprise.
- Ignorant (non-omniscient) and lazy (non-omnipotent) way: model the likelihood. In 85% of the worlds where there was an alarm, John will actually call. How do we do this? Non-monotonic logics, "certainty factors", "probability" theory?
56
Probabilistic Calculus to the Rescue
Suppose we know the likelihood of each of the (propositional) worlds (a.k.a. the joint probability distribution). Then we can use the standard rules of probability to compute the likelihood of all queries (as I will remind you). So the joint probability distribution is all you ever need!
In the Pearl example, we just need the joint probability distribution over B, E, A, J, M (32 numbers); in general, 2^n separate numbers (which should add up to 1).
If the joint distribution is sufficient for reasoning, what is domain knowledge supposed to help us with? Answer: indirectly, by helping us specify the joint probability distribution with fewer than 2^n numbers. The local relations between propositions (Burglary => Alarm; Earth-Quake => Alarm; Alarm => John-calls; Alarm => Mary-calls) can be seen as "constraining" the form the joint probability distribution can take: only 10 (instead of 32) numbers to specify!
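The 10 numbers come from the local structure: P(B), P(E), P(A|B,E) (4 numbers), P(J|A) (2), and P(M|A) (2), for 1+1+4+2+2 = 10. A sketch of how any query then follows by summing joint entries; all probability values below are made up purely for illustration:

    from itertools import product

    P_B, P_E = 0.01, 0.02                               # priors (made up)
    P_A = {(True, True): 0.95, (True, False): 0.94,     # P(A | B, E)
           (False, True): 0.29, (False, False): 0.001}
    P_J = {True: 0.90, False: 0.05}                     # P(J | A)
    P_M = {True: 0.70, False: 0.01}                     # P(M | A)

    def joint(b, e, a, j, m):
        """One entry of the 32-entry joint, built from the 10 numbers."""
        p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
        p *= P_A[b, e] if a else 1 - P_A[b, e]
        p *= (P_J[a] if j else 1 - P_J[a]) * (P_M[a] if m else 1 - P_M[a])
        return p

    # e.g. P(Burglary | Mary-calls), by summing over worlds:
    worlds = list(product([True, False], repeat=5))     # (b, e, a, j, m)
    num = sum(joint(*w) for w in worlds if w[0] and w[4])
    den = sum(joint(*w) for w in worlds if w[4])
    print(num / den)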
57
If B=>A then P(A|B) = ? P(B|~A) = ? P(B|A) = ?