1
Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings
Baoning Wu and Brian D. Davison
Lehigh University
Symposium on Applied Computing 2006
2
Motivation
- Link-based ranking algorithms are important to current popular search engines (e.g., HITS in Teoma).
- Link farms degrade the performance of link-based ranking algorithms.
3
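HITS algorithm
- Each page has two scores: the authority score $a$ measures how good the page is for a query; the hub score $h$ measures how well the page points to good authority pages.
- $E$ is the adjacency matrix ($E_{ij} = 1$ if page $i$ links to page $j$).
- $a = E^{T} h$, $h = E a$ (iterated with normalization until the scores converge)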
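The two update rules can be sketched as a power iteration. This is a minimal illustration, not the authors' code: it stores $E$ as a list of `(src, dst)` page-index pairs instead of a dense matrix (equivalent for a 0/1 adjacency matrix), and the names `hits`, `_normalize`, and the fixed `iterations` count are illustrative assumptions.

```python
import math


def _normalize(v):
    """Scale a score vector to unit L2 norm (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def hits(links, n, iterations=50):
    """Power-iteration sketch of HITS.

    links: list of (src, dst) pairs, i.e. the nonzero entries E[src][dst] = 1.
    n: number of pages. Returns (authority, hub) score lists.
    """
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        # a = E^T h: a page's authority is the sum of hub scores of pages linking to it
        new_a = [0.0] * n
        for src, dst in links:
            new_a[dst] += h[src]
        # h = E a: a page's hub score is the sum of authority scores of pages it links to
        new_h = [0.0] * n
        for src, dst in links:
            new_h[src] += new_a[dst]
        a, h = _normalize(new_a), _normalize(new_h)
    return a, h


# Two pages (0 and 1) both point to page 2.
authority, hub = hits([(0, 2), (1, 2)], 3)
print(authority, hub)
```

In this toy graph page 2 ends up with all the authority and pages 0 and 1 share the hub score equally, which is exactly the mutual reinforcement that a link farm can exploit.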
4
Example: for query “weather”
- http://www.tripadvisor.com/
- http://www.virtualtourist.com/
- http://www.abed.com/memoryfoam.html
- http://www.abed.com/furniture.html
- http://www.rental-car.us/
- http://www.accommodation-specials.com/
- http://www.lasikeyesurgery.com/
- http://www.lasikeyesurgery.com/lasik-surgery.asp
- http://mortgage-rate-refinancing.com/
- http://mortgage-rate-refinancing.com/mortgage-calculator.html
5
Factors that degrade HITS
- Mutually reinforcing relationships
- Duplicate pages
- Link farms
6
Complete hyperlink
- Definition: a link together with its anchor text, treated as a single unit.
- Duplication of a complete link is a much stronger sign of copying behavior on the Web than duplication of a link target alone.
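To make the definition concrete, here is a sketch that extracts complete links, i.e. `(href, anchor text)` pairs, from one HTML page using Python's standard `html.parser`. The class and function names are hypothetical, and collapsing runs of whitespace in the anchor text is an assumption; the slides do not say how anchor text is normalized.

```python
from html.parser import HTMLParser


class CompleteLinkParser(HTMLParser):
    """Collects complete links: (target URL, anchor text) pairs from <a href> tags."""

    def __init__(self):
        super().__init__()
        self.complete_links = []
        self._current_href = None   # href of the <a> tag we are inside, if any
        self._anchor_parts = []     # text chunks seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._anchor_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._anchor_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            # Assumption: anchor text is compared after collapsing whitespace.
            anchor = " ".join("".join(self._anchor_parts).split())
            self.complete_links.append((self._current_href, anchor))
            self._current_href = None


def complete_links(html):
    """Return the set of complete links found in an HTML string."""
    parser = CompleteLinkParser()
    parser.feed(html)
    parser.close()
    return set(parser.complete_links)


page = '<p><a href="http://a.com/">Cheap  Cars</a> and <a href="http://a.com/">Maps</a></p>'
print(complete_links(page))
```

Both links point to the same target, yet they yield two distinct complete links because their anchor texts differ; only a page that repeats target and anchor text together would share a complete link with this one.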
7
Document - Complete link Matrix
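The matrix itself is shown only as a figure. A plausible construction, sketched here under the assumption that each document is represented by its set of complete links, puts documents on the rows and distinct complete links on the columns, with a 1 where a document contains that complete link. The function name `document_complete_link_matrix` is hypothetical.

```python
def document_complete_link_matrix(doc_links):
    """Build a 0/1 document x complete-link matrix.

    doc_links: dict mapping a document URL to its set of (target URL, anchor text) pairs.
    Returns (docs, columns, M) where M[i][j] == 1 iff docs[i] contains columns[j].
    """
    docs = sorted(doc_links)
    # Every distinct complete link becomes one column.
    columns = sorted({cl for links in doc_links.values() for cl in links})
    index = {cl: j for j, cl in enumerate(columns)}
    M = [[0] * len(columns) for _ in docs]
    for i, doc in enumerate(docs):
        for cl in doc_links[doc]:
            M[i][index[cl]] = 1
    return docs, columns, M


docs, columns, M = document_complete_link_matrix(
    {"p1": {("t1", "a")}, "p2": {("t1", "a"), ("t1", "b")}}
)
print(docs, columns, M)
```

Note that `("t1", "a")` and `("t1", "b")` become separate columns even though they share the target `t1`, which is the point of working with complete links rather than plain link targets.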
8
Bipartite Graph
- Two disjoint vertex sets X and Y; every edge starts at an element of X and ends at an element of Y.
9
Link farms
- Link farms are usually densely connected through multiple overlapping small bipartite cores.
- Task: detect densely connected bipartite components in the “document - complete link” matrix.
10
Algorithm for finding bipartite components
11
Result: k=2 and l=2
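The slides do not spell out the component-finding algorithm, so the following is only a heuristic sketch of the idea, not the authors' method. It assumes `k` is the minimum number of documents and `l` the minimum number of complete links in a core (the slides give k=2 and l=2 without saying which is which). For each pair of documents it takes their shared complete links; if at least `l` are shared, it collects every document containing all of them, and flags the group if it has at least `k` members. Each flagged group is a complete bipartite subgraph, but the sketch may miss some cores that a full search would find. The name `find_bipartite_cores` is hypothetical.

```python
from itertools import combinations


def find_bipartite_cores(doc_links, k=2, l=2):
    """Heuristic search for complete bipartite cores in a document x complete-link matrix.

    doc_links: dict mapping a document to a set of complete links.
    Returns (flagged_docs, flagged_links).
    """
    flagged_docs, flagged_links = set(), set()
    seen = set()  # shared link sets already examined
    for d1, d2 in combinations(sorted(doc_links), 2):
        shared = frozenset(doc_links[d1] & doc_links[d2])
        if len(shared) < l or shared in seen:
            continue
        seen.add(shared)
        # Every document that contains all of the shared complete links.
        members = [d for d in doc_links if shared <= doc_links[d]]
        if len(members) >= k:
            flagged_docs.update(members)
            flagged_links.update(shared)
    return flagged_docs, flagged_links


A, B, C, D, E = [("http://t%d.com/" % i, "anchor %d" % i) for i in range(5)]
pages = {"f1": {A, B, C}, "f2": {A, B}, "f3": {A, B, D}, "good": {A, E}}
print(find_bipartite_cores(pages, k=2, l=2))
```

With k=2 and l=2, pages f1, f2 and f3 all carry the same two complete links A and B and are flagged, while "good" shares only one of them and is left alone.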
12
Adjustment: document-document matrix
13
Final matrix
14
Weighted adjacency matrix
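The weighting rule is shown only as a figure, so it is not recoverable from this transcript. As an assumption-laden sketch of how a weighted adjacency matrix $W$ plugs into HITS, the code below gives every link flagged as part of a detected core a reduced weight `farm_weight` (a hypothetical parameter, 0 by default) and all other links weight 1, then iterates $a = W^{T} h$, $h = W a$. The function names are illustrative.

```python
import math


def _normalize(v):
    """Scale a score vector to unit L2 norm (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def down_weight(links, flagged, farm_weight=0.0):
    """Hypothetical rule: flagged links get farm_weight, all other links get 1.0."""
    return {edge: (farm_weight if edge in flagged else 1.0) for edge in links}


def weighted_hits(weights, n, iterations=50):
    """HITS on a weighted adjacency matrix given as {(src, dst): weight}."""
    a, h = [1.0] * n, [1.0] * n
    for _ in range(iterations):
        new_a = [0.0] * n
        for (src, dst), w in weights.items():
            new_a[dst] += w * h[src]      # a = W^T h
        new_h = [0.0] * n
        for (src, dst), w in weights.items():
            new_h[src] += w * new_a[dst]  # h = W a
        a, h = _normalize(new_a), _normalize(new_h)
    return a, h


links = [(0, 2), (1, 2), (0, 3)]
plain, _ = weighted_hits(down_weight(links, set()), 4)
adjusted, _ = weighted_hits(down_weight(links, {(1, 2)}), 4)
print(plain, adjusted)
```

Without adjustment, page 2 outranks page 3 because it receives an extra in-link from page 1; once that link is flagged and given weight 0, the two pages receive equal authority.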
15
Experiment: HITS result of “rental car”
- http://www.discountcars.net/
- http://www.motel-discounts.com/
- http://www.stlouishoteldeals.com/
- http://www.richmondhoteldeals.com/
- http://www.jacksonvillehoteldeals.com/
- http://www.jacksonhoteldeals.com/
- http://www.keywesthoteldeals.com/
- http://www.austinhoteldeals.com/
- http://www.gatlinburghoteldeals.com/
- http://www.ashevillehoteldeals.com/
16
Experiment: B&H HITS result of “rental car”
- http://www.rentadeal.com/
- http://www.allaboutstlouis.com/
- http://www.allaboutboston.com/
- https://travel2.securesites.com/about_travelguides/addlisting.html
- http://www.allaboutsanfranciscoca.com/
- http://www.allaboutwashingtondc.com/
- http://www.allaboutalbuquerque.com/
- http://www.allabout-losangeles.com/
- http://www.allabout-denver.com/
- http://www.allabout-chicago.com/
17
Experiment: CL-HITS result of “rental car”
- http://www.hertz.com/
- http://www.avis.com/
- http://www.nationalcar.com/
- http://www.thrifty.com/
- http://www.dollar.com/
- http://www.alamo.com/
- http://www.budget.com/
- http://www.enterprise.com/
- http://www.budgetrentacar.com/
- http://www.europcar.com/
18
Experiment: B&H HITS result of “translation online”
- http://www.no-gambling.com/
- http://www.teleorg.org/
- http://ong.altervista.org/
- http://bx.b0x.com/
- http://video-poker.batcave.net/
- http://www.websamba.com/marketing-campaigns
- http://online-casino.o-f.com/
- http://caribbean-poker.webxis.com/
- http://roulette.zomi.net/
- http://teleservices.netfirms.com/
19
Experiment: CL-HITS result of “translation online”
- http://www.freetranslation.com/
- http://www.systransoft.com/
- http://babelfish.altavista.com/
- http://www.yourdictionary.com/
- http://dictionaries.travlang.com/
- http://www.google.com/
- http://www.foreignword.com/
- http://www.babylon.com/
- http://www.worldlingo.com/products_services/worldlingo_translator.html
- http://www.allwords.com/
20
Duplicate example: BH-HITS result of “maps”
- http://www.maps.com/
- http://www.mapsworldwide.com/
- http://www.cartographic.com/
- http://www.amaps.com/
- http://www.cdmaps.com/
- http://www.ewpnet.com/maps.htm
- http://mapsguidesandmore.com/
- http://www.njdiningguide.com/maps.html
- http://www.stanfords.co.uk/
- http://www.delorme.com/
21
Duplicate example: CL-HITS result of “maps”
- http://www.maps.com/
- http://maps.yahoo.com/
- http://www.delorme.com/
- http://tiger.census.gov/
- http://www.davidrumsey.com/
- http://memory.loc.gov/ammem/gmdhtml/gmdhome.html
- http://www.esri.com/
- http://www.maptech.com/
- http://www.streetmap.co.uk/
- http://www.libs.uga.edu/darchive/hargrett/maps/maps.html
22
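User evaluation

| Category | HITS | BHITS | CL-HITS | CL-POP |
|---|---|---|---|---|
| Quite relevant | 12.9% | 24.5% | 48.4% | 46.3% |
| Relevant | 10.7% | 18.3% | 28.8% | 26.2% |
| Not sure | 6.6% | 10.5% | 6.7% | 6.4% |
| Irrelevant | 26.8% | 14.8% | 11.3% | 12.7% |
| Totally irrelevant | 42.8% | 31.9% | 4.6% | 8.1% |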
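The slides do not state how precision at 10 is derived from these judgments. Assuming a result counts as relevant when it is judged "quite relevant" or "relevant", the precision of each method is just the sum of those two rows of the user-evaluation table, as this small sketch computes. The names `EVALUATION` and `precision_at_10` are illustrative.

```python
# "Quite relevant" and "Relevant" percentages copied from the user-evaluation table.
EVALUATION = {
    "HITS":    {"quite relevant": 12.9, "relevant": 10.7},
    "BHITS":   {"quite relevant": 24.5, "relevant": 18.3},
    "CL-HITS": {"quite relevant": 48.4, "relevant": 28.8},
    "CL-POP":  {"quite relevant": 46.3, "relevant": 26.2},
}


def precision_at_10(row):
    """Assumed rule: a result is a hit if judged 'quite relevant' or 'relevant'."""
    return round(row["quite relevant"] + row["relevant"], 1)


for name, row in EVALUATION.items():
    print(f"{name}: {precision_at_10(row)}%")
```

Under this assumption the methods score 23.6% (HITS), 42.8% (BHITS), 77.2% (CL-HITS) and 72.5% (CL-POP), so 77.2% would be the CL-HITS figure to set against the 66.4% reported for links alone.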
23
Discussion
- Using links alone (without anchor text), the precision at 10 is 66.4%, much lower than when using complete links.
- Random anchor texts.
24
Questions?
baw4@cse.lehigh.edu
davison@cse.lehigh.edu