Disambiguation March 7, 2003. Problem Many people have the same name. Example: Michael Jordan, basketball star or professor? Prior knowledge is not feasible.


1 Disambiguation March 7, 2003

2 Problem: Many people share the same name. Example: Michael Jordan, basketball star or professor? Relying on prior knowledge about each person is not feasible, so disambiguation is based on context. Example (basketball star): Scottie Pippen, Dennis Rodman, Phil Jackson. Example (professor): U.C. Berkeley, David Cohn.

3 Graph [Figure: graph with nodes Michael Jordan, U.C. Berkeley, Dennis Rodman, Scottie Pippen, David Cohn, Phil Jackson]

4 Graph [Figure: graph with nodes Michael Jordan, U.C. Berkeley, Dennis Rodman, Scottie Pippen, David Cohn, Phil Jackson]

5 Algorithm: Choose the people most relevant to Michael Jordan (MJ). Relevance is measured by P(MJ | p) for each person p.
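
The slides do not say how P(MJ | p) is estimated. A minimal sketch, assuming it is the fraction of pages mentioning p that also mention MJ; the function name `relevance` and the toy `pages` data are hypothetical:

```python
from collections import Counter

def relevance(pages, target):
    """Estimate P(target | p) for every other name p.

    Assumption (not from the slides): P(target | p) is the share of
    pages mentioning p that also mention target.
    """
    mentions = Counter()  # pages that mention p
    joint = Counter()     # pages that mention both p and target
    for names in pages:
        names = set(names)
        for p in names - {target}:
            mentions[p] += 1
            if target in names:
                joint[p] += 1
    return {p: joint[p] / mentions[p] for p in mentions}

# Toy data: each page is the set of names it mentions.
pages = [
    {"Michael Jordan", "Scottie Pippen"},
    {"Michael Jordan", "Scottie Pippen", "Phil Jackson"},
    {"Scottie Pippen"},
    {"Michael Jordan", "David Cohn"},
]
scores = relevance(pages, "Michael Jordan")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked` lists names from most to least relevant under the assumed estimate.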

6 Choosing Seed Values: We need a starting point: people who correspond to the senses of MJ. How well do the seeds separate people into camps? We find out by exhaustive search through all pairs of people.

7 Good Seeds [Figure: graph with nodes U.C. Berkeley, Dennis Rodman, Scottie Pippen, David Cohn, Phil Jackson]

8 Bad Seeds [Figure: graph with nodes U.C. Berkeley, Dennis Rodman, Scottie Pippen, David Cohn, Phil Jackson]

9 Choosing Seeds I: Let Sj be the jth sense. Denote S0 as basketball star and S1 as professor (the labels are interchangeable because there is no prior knowledge). In the exhaustive search, we arbitrarily pick one person to be seed0 and another to be seed1, where seed0 corresponds to S0 and seed1 to S1. Let P(MJ = S1 | MJ, seed1) = 1 and P(MJ = S0 | MJ, seed1) = 0, and vice versa for seed0. This assignment could be wrong; it is only an arbitrary starting point.

10 Choosing Seeds II: For each person p and sense Sj: P(MJ = Sj | MJ, p) = n(seedj, p) P(MJ | seedj). A person belongs to camp Sj only if P(MJ = Sj | MJ, p) > 0.95. Use the harmonic mean to score how well seed0 and seed1 assign people to camps.
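
The exhaustive seed search can be sketched as follows. The slides do not state what the harmonic mean is taken over; this sketch assumes it is the two camp sizes, which rewards seed pairs that split people evenly. `sense_prob(seed, p)` is a hypothetical stand-in for P(MJ = Sj | MJ, p) computed with seedj, and the toy `group` table is invented for illustration:

```python
from itertools import combinations

def harmonic_mean(a, b):
    # Zero when either camp is empty, so one-sided splits score badly.
    return 2 * a * b / (a + b) if a + b else 0.0

def split_into_camps(people, seeds, sense_prob, threshold=0.95):
    """Place p in camp j only if sense_prob(seed_j, p) exceeds the threshold."""
    camps = ([], [])
    for p in people:
        if p in seeds:
            continue
        for j, seed in enumerate(seeds):
            if sense_prob(seed, p) > threshold:
                camps[j].append(p)
    return camps

def best_seed_pair(people, sense_prob):
    """Try every pair of people as (seed0, seed1) and keep the best-scoring pair."""
    best_score, best_pair = -1.0, None
    for pair in combinations(people, 2):
        camp0, camp1 = split_into_camps(people, pair, sense_prob)
        # Assumption: the score is the harmonic mean of the camp sizes.
        score = harmonic_mean(len(camp0), len(camp1))
        if score > best_score:
            best_score, best_pair = score, pair
    return best_pair, best_score

# Toy stand-in: probability 1 inside a group, 0 across groups.
group = {
    "Scottie Pippen": "a", "Dennis Rodman": "a", "Phil Jackson": "a",
    "U.C. Berkeley": "b", "David Cohn": "b",
}

def toy_sense_prob(seed, p):
    return 1.0 if group[seed] == group[p] else 0.0

pair, score = best_seed_pair(list(group), toy_sense_prob)
```

With this toy data the winning pair has one seed in each group, while a pair drawn from the same group scores lower because its seeds cannot pull the other group into any camp.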

11 Iteration I: Now that we have the best seeds, we assign P(MJ = Sj | p) for each person p. Step 1: Begin with every person except the seeds in the unknown set. Step 2: For each person in the unknown set and each sense, calculate P(MJ = Sj | p) = P(MJ | p) P(MJ = Sj | MJ, p).

12 Iteration II: Step 3: For each sense, take the person p with the highest P(MJ = Sj | p) and remove p from the unknown set. Step 4: Repeat steps 2 and 3 until the unknown set is empty.
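
Steps 1 through 4 can be sketched as a greedy loop. The slides do not specify how P(MJ = Sj | MJ, p) changes as people leave the unknown set, so it is passed in here as a hypothetical callable `sense_given_link(p, j, assigned)`; the toy version below (share of p's neighbours already assigned to sense j) is an assumption made only for illustration:

```python
def assign_senses(people, seeds, relevance, sense_given_link):
    """Greedily move people out of the unknown set, one per sense per round."""
    assigned = {seeds[0]: 0, seeds[1]: 1}
    unknown = set(people) - set(seeds)          # Step 1
    while unknown:                              # Step 4
        for j in (0, 1):
            if not unknown:
                break
            # Step 2: P(MJ = Sj | p) = P(MJ | p) * P(MJ = Sj | MJ, p)
            scores = {
                p: relevance[p] * sense_given_link(p, j, assigned)
                for p in unknown
            }
            # Step 3: the highest-scoring person leaves the unknown set.
            # Sorting first makes tie-breaking deterministic.
            best = max(sorted(scores), key=scores.get)
            assigned[best] = j
            unknown.remove(best)
    return assigned

# Toy co-occurrence graph (hypothetical).
links = {
    "Scottie Pippen": {"Dennis Rodman", "Phil Jackson"},
    "Dennis Rodman": {"Scottie Pippen", "Phil Jackson"},
    "Phil Jackson": {"Scottie Pippen", "Dennis Rodman"},
    "U.C. Berkeley": {"David Cohn"},
    "David Cohn": {"U.C. Berkeley"},
}

def toy_sense_given_link(p, j, assigned):
    # Assumed stand-in: share of p's neighbours already assigned to sense j.
    return sum(assigned.get(q) == j for q in links[p]) / len(links[p])

relevance = {p: 1.0 for p in links}
result = assign_senses(
    list(links), ("Scottie Pippen", "U.C. Berkeley"), relevance, toy_sense_given_link
)
```

Note that, as on the slides, each round removes one person per sense even when the best available score is low, so early mistakes can propagate.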

13 Prediction: Given a link, add up the probabilities of all the names in it for each sense, and pick the sense with the larger total. We only know that MJ in the link is S0 or S1; we don’t know anything about basketball stars or professors.
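
A minimal sketch of the prediction step, assuming each name already carries a pair of values (P(MJ = S0 | p), P(MJ = S1 | p)) from the iteration; the `sense_probs` numbers are made up for illustration:

```python
def predict_sense(names_in_link, sense_probs):
    """Sum per-sense probabilities over the names in a link and pick the larger total."""
    totals = [0.0, 0.0]
    for name in names_in_link:
        # Names without an assigned probability contribute nothing.
        for j, prob in enumerate(sense_probs.get(name, (0.0, 0.0))):
            totals[j] += prob
    sense = max(range(len(totals)), key=totals.__getitem__)
    return sense, totals

# Hypothetical per-person values for (S0, S1).
sense_probs = {
    "Scottie Pippen": (0.90, 0.05),
    "Phil Jackson": (0.80, 0.10),
    "David Cohn": (0.02, 0.70),
}

sense, totals = predict_sense(["Scottie Pippen", "David Cohn"], sense_probs)
```

The result is only an index (0 or 1); mapping it to "basketball star" or "professor" requires outside knowledge.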

14 Dataset: Movie database from IMDB: 230,000 actors, 40,000 movies. Randomly pick actors who appeared in 15 or more movies (4,000 actors). Treat them as the same person. Run the algorithm. See which sense each movie belongs to. Repeat 100 times. Average accuracy: 75%.

15 Good Example: Blandick__Clara(38) vs Gibson__Henry(19): final score = 0.982456. 38 out of 38 correct; Blandick__Clara has seed Phelps__Lee. 18 out of 19 correct; Gibson__Henry has seed Davies__John__IV_. Clara Blandick was active from the 1910s to the 1950s. Lee Phelps, also from that era, appeared in 6 movies with Clara. Henry Gibson was active from the 1960s to the 2000s. John Davies IV, also from that era, appeared in 2 movies with Henry.

16 Bad Example: Marsh__Mae(25) vs Moorehead__Agnes(19): final score = 0.500000. 16 out of 25 correct; Marsh__Mae has seed Morin__Alberto__I_. 6 out of 19 correct; Moorehead__Agnes has seed Wolfe__Ian. Mae Marsh, Agnes Moorehead, Alberto Morin, and Ian Wolfe all appeared in movies from the 1940s to the 1970s.
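
The slides do not define "final score", but both reported values match the pooled fraction of correctly assigned movies across the two actors. A quick check, with `pooled_accuracy` as a hypothetical helper name:

```python
def pooled_accuracy(correct_a, total_a, correct_b, total_b):
    """Correctly assigned movies over all movies, pooled across both actors."""
    return (correct_a + correct_b) / (total_a + total_b)

good = pooled_accuracy(38, 38, 18, 19)  # Blandick vs Gibson: 56 of 57
bad = pooled_accuracy(16, 25, 6, 19)    # Marsh vs Moorehead: 22 of 44

print(f"{good:.6f}")  # 0.982456
print(f"{bad:.6f}")   # 0.500000
```

A score of 0.5 on two senses is what random assignment would give, which fits the observation that all four people in the bad example worked in the same era.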

