
1 Retrieval by Authority Artificial Intelligence CMSC 25000 February 1, 2007

2 Roadmap Problem: –Matching Topics and Documents Challenge I: Beyond literal matching –Expansion Strategies Challenge II: Authoritative source –Hubs & Authorities –Page Rank

3 Key Issue All approaches operate on term matching –If a synonym, rather than original term, is used, approach fails Develop more robust techniques –Match “concept” rather than term Expansion approaches –Add in related terms to enhance matching Mapping techniques –Associate terms to concepts »Aspect models, stemming

4 Expansion Techniques Can apply to query or document Thesaurus expansion –Use linguistic resource – thesaurus, WordNet – to add synonyms/related terms Feedback expansion –Add terms that “should have appeared” User interaction –Direct or relevance feedback Automatic pseudo relevance feedback

5 Query Refinement Typical queries very short, ambiguous –Cat: animal/Unix command –Add more terms to disambiguate, improve Relevance feedback –Retrieve with original queries –Present results Ask user to tag relevant/non-relevant –"push" toward relevant vectors, away from non-relevant –β+γ=1 (0.75, 0.25); r: rel docs, s: non-rel docs –"Rocchio" expansion formula
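A common statement of the Rocchio formula (a sketch; the slide's exact notation may differ), with q the original query vector, R the set of relevant documents and S the non-relevant ones:

q' = α·q + β·(1/|R|)·Σ_{d ∈ R} d − γ·(1/|S|)·Σ_{d ∈ S} d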

6 Compression Techniques Reduce surface term variation to concepts Stemming –Map inflectional variants to root E.g. see, sees, seen, saw -> see Crucial for highly inflected languages – Czech, Arabic Aspect models –Matrix representations typically very sparse –Reduce dimensionality to small # key aspects Mapping contextually similar terms together Latent semantic analysis
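A minimal latent-semantic-analysis sketch in Python (the toy term-document matrix and the choice of k = 2 aspects are illustrative assumptions, not the course's example):

import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

# Truncated SVD: keep only the k strongest aspects.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # low-rank reconstruction

# Documents can now be compared in the k-dimensional aspect space.
doc_vectors = np.diag(s[:k]) @ Vt[:k, :]      # one column per document
print(doc_vectors)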

7 Authoritative Sources Based on vector space alone, what would you expect to get searching for “search engine”? –Would you expect to get Google?

8 Issue Text isn’t always best indicator of content Example: “search engine” –Text search -> review of search engines Term doesn’t appear on search engine pages Term probably appears on many pages that point to many search engines

9 Hubs & Authorities Not all sites are created equal –Finding “better” sites Question: What defines a good site? –Authoritative –Not just content, but connections! One that many other sites think is good Site that is pointed to by many other sites –Authority

10 Conferring Authority Authorities rarely link to each other –Competition Hubs: –Relevant sites point to prominent sites on topic Often not prominent themselves Professional or amateur (diagram: good hubs point to good authorities)

11 Computing HITS Finding Hubs and Authorities Two steps: –Sampling: Find potential authorities –Weight-propagation: Iteratively estimate best hubs and authorities

12 Sampling Identify potential hubs and authorities –Connected subsections of web Select root set with standard text query Construct base set: –All nodes pointed to by root set –All nodes that point to root set Drop within-domain links –1000-5000 pages

13 Weight-propagation Weights: –Authority weight x_p for each page p –Hub weight y_p for each page p All weights are relative Updating: x_p ← Σ y_q over pages q that link to p; y_p ← Σ x_q over pages q that p links to (renormalize each pass) Converges Pages with high x: good authorities; high y: good hubs

14 Weight Propagation Create adjacency matrix A –A_{i,j} = 1 if i links to j, otherwise 0 Create vectors x and y of corresponding authority and hub values In matrix form the updates are x ← Aᵀy and y ← Ax, so x converges to the principal eigenvector of AᵀA (and y to that of AAᵀ)
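A minimal HITS weight-propagation sketch in Python/numpy (the hand-built adjacency matrix and the fixed iteration count are illustrative assumptions):

import numpy as np

# A[i, j] = 1 if page i links to page j, otherwise 0.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

n = A.shape[0]
x = np.ones(n)   # authority weights
y = np.ones(n)   # hub weights

for _ in range(50):
    x = A.T @ y                # authority: sum of hub weights of pages linking in
    y = A @ x                  # hub: sum of authority weights of pages linked to
    x /= np.linalg.norm(x)     # weights are only meaningful relative to each other
    y /= np.linalg.norm(y)

print("authorities:", x)
print("hubs:", y)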

15 Google’s PageRank Identifies authorities –Important pages are those pointed to by many other pages Better pointers, higher rank –Ranks search results –PR(A) = (1 − d) + d·Σ_t PR(t)/C(t), where t ranges over the pages pointing to A; C(t): number of outbound links of t; d: damping measure –Actual ranking on logarithmic scale –Iterate until ranks converge
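A minimal PageRank iteration sketch using that formula (the toy link graph, damping value, and iteration count are illustrative assumptions):

# links[p] = list of pages that p points to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

d = 0.85                          # damping measure
pr = {p: 1.0 for p in links}      # initial ranks

for _ in range(50):
    new_pr = {}
    for page in links:
        # Sum PR(t)/C(t) over the pages t that point to this page.
        incoming = sum(pr[t] / len(links[t]) for t in links if page in links[t])
        new_pr[page] = (1 - d) + d * incoming
    pr = new_pr

print(pr)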

16 Contrasts Internal links –Large sites carry more weight If well-designed –H&A ignores site-internals Outbound links explicitly penalized Lots of tweaks….

17 Web Search Search by content –Vector space model Word-based representation “Aboutness” and “Surprise” Enhancing matches Simple learning model Search by structure –Authorities identified by link structure of web Hubs confer authority

18 Learning: Perceptrons Artificial Intelligence CMSC 25000 February 1, 2007

19 Agenda Neural Networks: –Biological analogy Perceptrons: Single layer networks Perceptron training Perceptron convergence theorem Perceptron limitations Conclusions

20 Neurons: The Concept (diagram of a neuron: dendrites, cell body, nucleus, axon) Neurons: Receive inputs from other neurons (via synapses) When input exceeds threshold, “fires” Sends output along axon to other neurons Brain: 10^11 neurons, 10^16 synapses

21 Artificial Neural Nets Simulated Neuron: –Node connected to other nodes via links Links = axon + synapse + dendrite Links associated with weight (like synapse) –Multiplied by output of node –Node combines input via activation function E.g. sum of weighted inputs passed through threshold Simpler than real neuronal processes
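A minimal sketch of such a simulated node in Python (the weights, inputs, and zero threshold are illustrative):

def neuron(inputs, weights, threshold=0.0):
    # Sum of weighted inputs passed through a hard threshold.
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

print(neuron([1, 0, 1], [0.5, -0.2, 0.7]))   # 1.2 > 0, so the node "fires": 1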

22 Artificial Neural Net (diagram: inputs x, each multiplied by a weight w, summed, then passed through a threshold)

23 Perceptrons Single neuron-like element –Binary inputs –Binary outputs Weighted sum of inputs > threshold

24 Perceptron Structure (diagram: inputs x_0 = 1, x_1, x_2, x_3, ..., x_n with weights w_0, w_1, ..., w_n feeding a single output y) x_0·w_0 compensates for threshold

25 Perceptron Convergence Procedure Straight-forward training procedure –Learns linearly separable functions Until perceptron yields correct output for all examples: –If the perceptron is correct, do nothing –If the perceptron is wrong, If it incorrectly says “yes”, –Subtract input vector from weight vector Otherwise, add input vector to weight vector

26 Perceptron Convergence Example LOGICAL-OR:
Sample  x1  x2  x3  Desired Output
  1      0   0   1       0
  2      0   1   1       1
  3      1   0   1       1
  4      1   1   1       1
Initial: w = (0 0 0); after S2, w = w + s2 = (0 1 1)
Pass 2: S1: w = w − s1 = (0 1 0); S3: w = w + s3 = (1 1 1)
Pass 3: S1: w = w − s1 = (1 1 0)
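A minimal sketch of the training procedure on this example (output 1 when w·x > 0; variable names are illustrative):

samples = [
    ((0, 0, 1), 0),   # S1
    ((0, 1, 1), 1),   # S2
    ((1, 0, 1), 1),   # S3
    ((1, 1, 1), 1),   # S4
]

def output(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

w = [0, 0, 0]
changed = True
while changed:                      # until correct on all samples
    changed = False
    for x, desired in samples:
        o = output(w, x)
        if o == desired:
            continue                # correct: do nothing
        if o == 1:                  # incorrectly says "yes": subtract input
            w = [wi - xi for wi, xi in zip(w, x)]
        else:                       # incorrectly says "no": add input
            w = [wi + xi for wi, xi in zip(w, x)]
        changed = True

print(w)                            # reproduces the trace above, ending at (1, 1, 0)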

27 Perceptron Convergence Theorem If there exists a weight vector w* (with ||w*|| = 1) s.t. w*·x > δ for every positive example x, perceptron training will find a separating vector. Sketch: an update is only made on a positive example x when w·x ≤ 0, so ||w + x||^2 ≤ ||w||^2 + ||x||^2, i.e. ||w||^2 increases by at most ||x||^2 per update; after k updates ||w||^2 ≤ k·max||x||^2. Meanwhile w·w* grows by at least δ per update, and w·w*/||w|| ≤ 1. Combining, k·δ ≤ ||w|| ≤ √k·max||x||, so training converges in k ≤ (max||x||/δ)^2 steps.

28 Perceptron Learning Perceptrons learn linear decision boundaries E.g. a line in the (x1, x2) plane separating the +'s from the 0's But not XOR (inputs ±1, threshold 0):
x1  x2   required constraint
-1  -1   w1·x1 + w2·x2 < 0
 1  -1   w1·x1 + w2·x2 > 0  => implies w1 > 0
 1   1   w1·x1 + w2·x2 > 0  => but should be false
-1   1   w1·x1 + w2·x2 > 0  => implies w2 > 0
No weights satisfy all four constraints, so XOR is not linearly separable.

29 Perceptron Example Digit recognition –Assume display= 8 lightable bars –Inputs – on/off + threshold –65 steps to recognize “8”

30 Perceptron Summary Motivated by neuron activation Simple training procedure Guaranteed to converge –IF linearly separable

31 Neural Nets Multi-layer perceptrons –Inputs: real-valued –Intermediate “hidden” nodes –Output(s): one (or more) discrete-valued (diagram: inputs X1–X4 feeding a hidden layer, which feeds outputs Y1, Y2)

32 Neural Nets Pro: More general than perceptrons –Not restricted to linear discriminants –Multiple outputs: one classification each Con: No simple, guaranteed training procedure –Use greedy, hill-climbing procedure to train –“Gradient descent”, “Backpropagation”

33 Solving the XOR Problem Network Topology: 2 hidden nodes (o1, o2), 1 output (y); weights w11, w21 and w12, w22 from the inputs x1, x2 into o1 and o2, thresholds w01, w02, w03, and hidden-to-output weights w13 (o1 to y), w23 (o2 to y)
Desired behavior:
x1 x2 o1 o2 y
 0  0  0  0  0
 1  0  0  1  1
 0  1  0  1  1
 1  1  1  1  0
Weights: w11 = w12 = 1; w21 = w22 = 1; w01 = 3/2; w02 = 1/2; w03 = 1/2; w13 = −1; w23 = 1
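A minimal check of this network in Python (interpreting w01, w02, w03 as the thresholds of o1, o2, and y, each node firing when its weighted input exceeds its threshold):

def step(total, threshold):
    return 1 if total > threshold else 0

w11 = w12 = w21 = w22 = 1
w01, w02, w03 = 3/2, 1/2, 1/2
w13, w23 = -1, 1

for x1 in (0, 1):
    for x2 in (0, 1):
        o1 = step(w11 * x1 + w21 * x2, w01)   # fires only for x1 AND x2
        o2 = step(w12 * x1 + w22 * x2, w02)   # fires for x1 OR x2
        y = step(w13 * o1 + w23 * o2, w03)    # OR but not AND -> XOR
        print(x1, x2, o1, o2, y)              # reproduces the table above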

34 Neural Net Applications Speech recognition Handwriting recognition NETtalk: Letter-to-sound rules ALVINN: Autonomous driving

35 ALVINN Driving as a neural network Inputs: –Image pixel intensities I.e. lane lines 5 Hidden nodes Outputs: –Steering actions E.g. turn left/right; how far Training: –Observe human behavior: sample images, steering

36 Backpropagation Greedy, Hill-climbing procedure –Weights are parameters to change –Original hill-climb changes one parameter/step Slow –If smooth function, change all parameters/step Gradient descent –Backpropagation: Computes current output, works backward to correct error

37 Producing a Smooth Function Key problem: –Pure step threshold is discontinuous Not differentiable Solution: –Sigmoid (squashed ‘s’ function): Logistic fn
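The logistic function typically meant here, with its derivative (used again in the gradient computation below):

s(z) = 1 / (1 + e^(−z)),    ds/dz = s(z)·(1 − s(z))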

38 Neural Net Training Goal: –Determine how to change weights to get correct output Large change in weight to produce large reduction in error Approach: Compute actual output: o Compare to desired output: d Determine effect of each weight w on error = d-o Adjust weights

39 Neural Net Example (network diagram: inputs x1, x2; two hidden nodes with net inputs z1, z2 and outputs y1, y2; an output node with net input z3 and output y3; weights w11, w21, w01 and w12, w22, w02 into the hidden nodes, and w13, w23, w03 into the output node) xi: ith sample input vector w: weight vector yi*: desired output for ith sample Error: sum of squares error over training samples; the output y3 can be written as a full expression in terms of the inputs and weights (from MIT 6.034 notes, Lozano-Pérez)

40 Gradient Descent Error: Sum of squares error of inputs with current weights Compute rate of change of error wrt each weight –Which weights have greatest effect on error? –Effectively, partial derivatives of error wrt weights In turn, depend on other weights => chain rule

41 Gradient Descent E = G(w) –Error as function of weights Find rate of change of error –Follow steepest rate of change –Change weights s.t. error is minimized (plot: error G(w) against the weights w, with slope dG/dw marked; gradient descent can get stuck in local minima)
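The corresponding update for each weight, with a small rate (step-size) parameter r (a standard statement; the slide may write it slightly differently):

w_i ← w_i − r·∂E/∂w_i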

42 Gradient of Error (MIT AI lecture notes, Lozano-Pérez 2000) (same network diagram as slide 39) Note: derivative of the sigmoid: ds(z1)/dz1 = s(z1)·(1 − s(z1)) (from 6.034 notes, Lozano-Pérez)

43 From Effect to Update Gradient computation: –How each weight contributes to performance To train: –Need to determine how to CHANGE weight based on contribution to performance –Need to determine how MUCH change to make per iteration Rate parameter ‘r’ –Large enough to learn quickly –Small enough to reach but not overshoot target values

44 Backpropagation Procedure Pick rate parameter ‘r’ Until performance is good enough, –Do forward computation to calculate output –Compute β in the output node z: β_z = d_z − o_z –Compute β in all other nodes j: β_j = Σ_k w_{j→k}·o_k(1 − o_k)·β_k, summing over the nodes k that j feeds into –Compute change for all weights i→j: Δw_{i→j} = r·o_i·o_j(1 − o_j)·β_j
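A minimal sketch of this procedure for a tiny 2-2-1 sigmoid network learning XOR (initial weights, rate, and epoch count are illustrative assumptions; with an unlucky initialization it can settle in a local minimum, one of the usual gradient-descent problems noted later):

import math, random

def s(z):                           # logistic activation
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
# w_hidden[j][i]: weight from input i (index 0 is the always-1 bias) to hidden node j
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
# w_out[j]: weight from hidden node j (index 0 is the always-1 bias) to the output node
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
r = 0.5                             # rate parameter

for _ in range(20000):
    for (x1, x2), d in data:
        # Forward computation.
        inputs = [1.0, x1, x2]
        hidden = [1.0] + [s(sum(wi * xi for wi, xi in zip(wj, inputs)))
                          for wj in w_hidden]
        o = s(sum(wi * hi for wi, hi in zip(w_out, hidden)))

        # Beta in the output node, then in the hidden nodes.
        beta_out = d - o
        betas = [w_out[j + 1] * o * (1 - o) * beta_out for j in range(2)]

        # Weight changes: delta_w(i->j) = r * o_i * o_j * (1 - o_j) * beta_j.
        for j in range(3):
            w_out[j] += r * hidden[j] * o * (1 - o) * beta_out
        for j in range(2):
            oj = hidden[j + 1]
            for i in range(3):
                w_hidden[j][i] += r * inputs[i] * oj * (1 - oj) * betas[j]

for (x1, x2), d in data:
    inputs = [1.0, x1, x2]
    hidden = [1.0] + [s(sum(wi * xi for wi, xi in zip(wj, inputs)))
                      for wj in w_hidden]
    o = s(sum(wi * hi for wi, hi in zip(w_out, hidden)))
    print(x1, x2, round(o, 2))      # outputs should approach the XOR targets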

45 Backprop Example (same network diagram: inputs x1, x2; hidden nodes z1, z2; output node z3 with outputs y1, y2, y3) Forward prop: Compute z_i and y_i given x_k, w_l

46 Backpropagation Observations Procedure is (relatively) efficient –All computations are local Use inputs and outputs of current node What is “good enough”? –Rarely reach target (0 or 1) outputs Typically, train until within 0.1 of target

47 Neural Net Summary Training: –Backpropagation procedure Gradient descent strategy (usual problems) Prediction: –Compute outputs based on input vector & weights Pros: Very general, Fast prediction Cons: Training can be VERY slow (1000’s of epochs), Overfitting

48 Training Strategies Online training: –Update weights after each sample Offline (batch training): –Compute error over all samples Then update weights Online training “noisy” –Sensitive to individual instances –However, may escape local minima
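A minimal sketch contrasting the two regimes for one epoch (gradient_of_error is a placeholder for the backprop gradient over a set of samples; everything here is illustrative):

def online_epoch(w, samples, r, gradient_of_error):
    # Online: update the weights after every individual sample.
    for sample in samples:
        g = gradient_of_error(w, [sample])
        w = [wi - r * gi for wi, gi in zip(w, g)]
    return w

def batch_epoch(w, samples, r, gradient_of_error):
    # Offline/batch: accumulate the error over all samples, then update once.
    g = gradient_of_error(w, samples)
    return [wi - r * gi for wi, gi in zip(w, g)]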

49 Training Strategy To avoid overfitting: –Split data into: training, validation, & test Also, avoid excess weights (fewer weights than samples) Initialize with small random weights –Small changes have noticeable effect Use offline training –Until validation-set error reaches its minimum Evaluate on test set –No more weight changes

50 Classification Neural networks best for classification task –Single output -> Binary classifier –Multiple outputs -> Multiway classification Applied successfully to learning pronunciation –Sigmoid pushes to binary classification Not good for regression

51 Neural Net Example NETtalk: Letter-to-sound by net Inputs: –Need context to pronounce 7-letter window: predict sound of middle letter 29 possible characters – alphabet+space+,+. –7*29=203 inputs 80 Hidden nodes Output: Generate 60 phones –Nodes map to 26 units: 21 articulatory, 5 stress/sil Vector quantization of acoustic space

52 Neural Net Example: NETtalk Learning to talk: –5 iterations/1024 training words: bound/stress –10 iterations: intelligible –400 new test words: 80% correct Not as good as DecTalk, but automatic

53 Neural Net Conclusions Simulation based on neurons in brain Perceptrons (single neuron) –Guaranteed to find linear discriminant IF one exists -> problem XOR Neural nets (Multi-layer perceptrons) –Very general –Backpropagation training procedure Gradient descent - local min, overfitting issues

