
1 Mining Test Oracles for Search Engines Wujie Zheng


1 Mining Test Oracles for Search Engines
Wujie Zheng
wjzheng@cse.cuhk.edu.hk

2 Outline
- Search Engine Evaluation/Testing
- Our Approach
- Data Collection
- Examples

3 Search Engine Evaluation/Testing

4 Search Engine Evaluation
Prepare a set of queries and their ground truth, then evaluate the results of different search engines using well-defined measures.
- How should the queries (i.e., the test inputs) be prepared?
- How can the ground truth (i.e., the test oracles) be obtained?
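The slide does not name the measures used. As one assumed example, precision@k (the share of an engine's top k results that belong to the ground-truth set) can be computed per engine. The function name, engine names, and result lists below are hypothetical.

```python
def precision_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Share of the top-k positions filled by ground-truth items.

    Missing positions (fewer than k results) count as non-relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in results[:k] if item in relevant)
    return hits / k


# Hypothetical ground truth and results for a single query.
ground_truth = {"nba.com", "basketball-reference.com", "espn.com"}
runs = {
    "engine_a": ["nba.com", "espn.com", "wikipedia.org", "basketball-reference.com"],
    "engine_b": ["wikipedia.org", "youtube.com", "nba.com"],
}
scores = {name: precision_at_k(res, ground_truth, k=4) for name, res in runs.items()}
```

Dividing by k rather than by the number of returned results is a design choice: an engine that returns fewer results is not rewarded for it.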

5 Test Oracles: Previous Approaches
- Manual labeling: too costly and hardly reusable
- Clickthrough data: cannot identify relevant pages that do not appear in the search results
- Automatic labeling based on the results of multiple search engines at the same time: biased toward systems with similar characteristics
- Previous search results as test oracles: the desired search results may change over time
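To make the bias of automatic labeling concrete, here is a hypothetical pooled-voting labeler (not taken from the slides): a domain is labeled relevant if at least `min_votes` engines return it. When two engines behave alike, their shared results dominate the labels and a third, different engine is penalized.

```python
from collections import Counter


def vote_labels(results_by_engine: dict[str, list[str]], min_votes: int = 2) -> set[str]:
    """Label a domain relevant if at least min_votes engines returned it."""
    votes: Counter = Counter()
    for results in results_by_engine.values():
        votes.update(set(results))  # each engine votes at most once per domain
    return {domain for domain, n in votes.items() if n >= min_votes}


results = {
    "engine_a": ["a.com", "b.com", "c.com"],
    "engine_b": ["a.com", "b.com", "d.com"],  # behaves like engine_a
    "engine_c": ["e.com", "f.com", "a.com"],  # different, so mostly outvoted
}
labels = vote_labels(results)
```

Here engines a and b each match two of the two labels, while engine c matches only one, even though nothing says its other results are worse.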

6 Mining Test Oracles from Search Results

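7 Basic Idea
Mine implicit rules between inputs and outputs, e.g.:
- tvguide.com, => imdb.com
- basketball-reference.com, => nba.com
- ericsson,sony, => sonyericsson.com
Items on either side can be query words or result domains; for example, the last rule says that when a query contains both "ericsson" and "sony", sonyericsson.com appears among the results.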
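The slides do not show how a mined rule is applied as an oracle. A minimal sketch, assuming a record is the set of input and output terms for one query: a rule applies when all antecedent items are present, and the record is flagged when the consequent is then missing. The `Rule` class and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A mined implication between items (query words, types, or result domains)."""
    antecedent: frozenset[str]
    consequent: str

    def applies_to(self, record: set[str]) -> bool:
        # The rule is relevant only when every antecedent item is present.
        return self.antecedent <= record

    def violated_by(self, record: set[str]) -> bool:
        # A violation: the antecedent holds but the predicted item is missing.
        return self.applies_to(record) and self.consequent not in record


rule = Rule(frozenset({"ericsson", "sony"}), "sonyericsson.com")
passing = {"ericsson", "sony", "sonyericsson.com", "wikipedia.org"}
failing = {"ericsson", "sony", "wikipedia.org"}
unrelated = {"nikon", "nikon.com"}
```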

8 Build the Dataset
- Terms (features) of inputs:
  - Query words
  - Query types
- Terms (features) of outputs:
  - Domains of the top 10 search results
- Terms (features) for multiple search engines:
  - Search engine + domain of each top 10 search result (e.g., google:bankrate.com)
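A minimal sketch of turning one query into a record, assuming results arrive as full URLs grouped by engine. The function names, the `top_k` parameter, and the naive last-two-labels domain rule are illustrative assumptions, not details from the slides.

```python
from urllib.parse import urlparse


def registered_domain(url: str) -> str:
    """Naive domain extraction: keep the last two host labels.

    Simplifying assumption: this mishandles suffixes such as .co.uk."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def build_record(query: str, query_types: list[str],
                 results_by_engine: dict[str, list[str]],
                 top_k: int = 10, prefix_engine: bool = False) -> set[str]:
    """Turn one query and its search results into a set of items (features)."""
    items = set(query.lower().split())   # input terms: query words
    items.update(query_types)            # input terms: query types
    for engine, urls in results_by_engine.items():
        for url in urls[:top_k]:         # output terms: top-k result domains
            domain = registered_domain(url)
            items.add(f"{engine}:{domain}" if prefix_engine else domain)
    return items


single = build_record("nikon", ["Hobbies.csv"],
                      {"google": ["https://www.nikonusa.com/",
                                  "https://www.dpreview.com/reviews"]})
multi = build_record("interest rates today", ["Finance.csv"],
                     {"google": ["https://www.bankrate.com/rates/",
                                 "https://en.wikipedia.org/wiki/Interest"],
                      "bing": ["https://www.bankrate.com/mortgages/"]},
                     prefix_engine=True)
```

Because records are sets, a domain that appears twice in one engine's top 10 is counted once, which matches how itemset mining treats items.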

9 Example Dataset
pine,furniture,Home.csv,barnfurnituremart.com,americancountryhomestore.com,overstock.com,prairiecountryfurniture.com,etsy.com,unfinishedfurnituregiant.com,cozylogfurniture.com,directfrommexico.com,oakplus.com,sawdustcityllc.com,
buy,wine,online,Food.csv,wine.com,foodandwine.com,marketviewliquor.com,winechateau.com,wines.com,thewinebuyer.com,wineweb.com,alloutwine.com,cellaraiders.com,french-wine-online.com,
piercing,labret,Beauty.csv,wikipedia.org,youtube.com,youtube.com,about.com,ygoy.com,ehow.com,bodyjewelleryshop.com,google.com,bmezine.com,piercingdot.com
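Each row above is a comma-separated list of query words, one or more query types, and result domains. A hypothetical parser, assuming (from these examples only) that type fields end in ".csv" and sit between the words and the results:

```python
def parse_row(line: str) -> dict[str, list[str]]:
    """Split one dataset row into query words, query types, and result items."""
    fields = [f for f in line.strip().split(",") if f]  # drop the trailing empty field
    type_positions = [i for i, f in enumerate(fields) if f.endswith(".csv")]
    if not type_positions:
        raise ValueError("no query-type field found")
    first, last = type_positions[0], type_positions[-1]
    return {
        "words": fields[:first],
        "types": fields[first:last + 1],
        "results": fields[last + 1:],
    }


row = "piercing,labret,Beauty.csv,wikipedia.org,youtube.com,youtube.com,about.com,"
parsed = parse_row(row)
transaction = set(parsed["words"] + parsed["types"] + parsed["results"])
```

Converting to a set collapses the repeated youtube.com, so the transaction has one item per distinct term.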

10 Example Dataset
interest,rates,today,Finance.csv,real.csv,google:wellsfargo.com,google:bankrate.com,google:marketwatch.com,google:interest.com,google:interest.com,google:mortgagenewsdaily.com,google:usbank.com,google:mortgage101.com,google:yahoo.com,google:mortgageloan.com,bing:wellsfargo.com,bing:bankrate.com,bing:marketwatch.com,bing:wsj.com,bing:interest.com,bing:interest.com,bing:bankrate.com,bing:usbank.com,bing:yahoo.com,bing:usatoday.com,yahoo:bankrate.com,yahoo:wellsfargo.com,yahoo:bankrate.com,yahoo:interest.com,yahoo:msn.com,yahoo:money-rates.com,yahoo:cnn.com,yahoo:yahoo.com,yahoo:fxstreet.com,yahoo:marketwatch.com,
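With engine-prefixed items, each engine's result set can be recovered from a single row, which makes it easy to see how much the engines agree on a query. This is an illustrative sketch, not a step described in the slides; the helper names and the use of Jaccard similarity are assumptions.

```python
from itertools import combinations


def domains_by_engine(items: list[str]) -> dict[str, set[str]]:
    """Group engine-prefixed items such as "google:bankrate.com" by engine."""
    groups: dict[str, set[str]] = {}
    for item in items:
        engine, sep, domain = item.partition(":")
        if sep:  # skip query words and type fields, which carry no prefix
            groups.setdefault(engine, set()).add(domain)
    return groups


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# A shortened version of the row above (illustrative subset only).
items = ["interest", "rates", "today", "Finance.csv",
         "google:bankrate.com", "google:wellsfargo.com", "google:mortgageloan.com",
         "bing:bankrate.com", "bing:wellsfargo.com", "bing:wsj.com",
         "yahoo:bankrate.com", "yahoo:msn.com"]
groups = domains_by_engine(items)
overlap = {(a, b): jaccard(groups[a], groups[b])
           for a, b in combinations(sorted(groups), 2)}
```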

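11 Association Rule Mining
A rule has the form A,B,C => D, and its confidence is
confidence(A => D) = support(A, D) / support(A)
where support counts the records that contain all of the given items.
Example:
- bing:mlb.com, => google:mlb.com,
- support(bing:mlb.com, google:mlb.com) = 26
- support(bing:mlb.com) = 27
- confidence(bing:mlb.com, => google:mlb.com,) = 26/27 ≈ 0.963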
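The support and confidence definitions translate directly into code. The records below are synthetic, built only so that the two support values match the slide (27 and 26); they are not the actual dataset.

```python
def support(records: list[set[str]], items: set[str]) -> int:
    """Number of records that contain every item in `items`."""
    return sum(1 for record in records if items <= record)


def confidence(records: list[set[str]], antecedent: set[str], consequent: str) -> float:
    """confidence(A => D) = support(A, D) / support(A)."""
    base = support(records, antecedent)
    return support(records, antecedent | {consequent}) / base if base else 0.0


# Synthetic records chosen only to reproduce the slide's counts.
records = (
    [{"bing:mlb.com", "google:mlb.com"} for _ in range(26)]
    + [{"bing:mlb.com"}]
    + [{"google:mlb.com"} for _ in range(5)]
)
conf = confidence(records, {"bing:mlb.com"}, "google:mlb.com")
```

With these synthetic records the reverse rule google:mlb.com, => bing:mlb.com has confidence 26/31, which illustrates that confidence is not symmetric.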

12 Association Rule Mining
- Mine all frequent itemsets.
- We are most interested in single-postfix (single-consequent) rules, i.e., A => B where B contains exactly one item.
- Algorithm:
  - For each frequent itemset S
    - For each item u in S
      - Check the rule S - {u} => u
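The algorithm above can be sketched in Python. The slides do not say which frequent-itemset miner was used; this sketch assumes a plain Apriori-style level-wise search, and the thresholds `min_support` and `min_confidence` are hypothetical parameters.

```python
from itertools import combinations


def frequent_itemsets(records, min_support):
    """Apriori-style level-wise search; returns {itemset: support} for all frequent itemsets."""
    records = [frozenset(r) for r in records]

    def support(itemset):
        return sum(1 for r in records if itemset <= r)

    frequent = {}
    level = {frozenset([item]) for r in records for item in r}  # size-1 candidates
    k = 1
    while level:
        counts = {c: support(c) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
        prev = list(current)
        # Join: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def single_consequent_rules(frequent, min_confidence):
    """For each frequent itemset S and each u in S, check the rule S - {u} => u."""
    rules = []
    for s, n in frequent.items():
        if len(s) < 2:
            continue
        for u in s:
            antecedent = s - {u}
            conf = n / frequent[antecedent]  # antecedent is frequent by downward closure
            if conf >= min_confidence:
                rules.append((antecedent, u, n, frequent[antecedent], conf))
    return rules


records = [
    {"ericsson", "sony", "sonyericsson.com"},
    {"ericsson", "sony", "sonyericsson.com"},
    {"ericsson", "sonyericsson.com"},
    {"sony", "sony.com"},
]
frequent = frequent_itemsets(records, min_support=2)
rules = single_consequent_rules(frequent, min_confidence=1.0)
```

Each rule tuple keeps both counts alongside the confidence, mirroring the "n/m=confidence" form used in the example listings.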

13 Data Collection

14 Search Engines
- Google
- Bing
- Yahoo
- Baidu
- Sogou
- Soso

15 Queries
- Google Trends (hot queries): 1,000 queries
- KDD Cup 2005 queries: 800 queries
- Google AdWords: 15,000 queries in 22 types
- Baidu Tops

16 Examples

17
dpreview.com,kenrockwell.com, => amazon.com, : 29/29=1.0, violations: test: 37/40, violations: 3881,4691,4783,
amazon.com,kenrockwell.com, => dpreview.com, : 29/29=1.0, violations: test: 37/39, violations: 2089,8921,
canon,amazon.com, => canon.com, : 22/22=1.0, violations: test: 34/38, violations: 4090,4870,5384,7400,
canon.com,amazon.com, => canon, : 22/22=1.0, violations: test: 34/38, violations: 3560,5409,8983,8988,
canon.com,Hobbies.csv, => canon, : 31/31=1.0, violations: test: 31/34, violations: 3560,5409,8988,
canon.com,dpreview.com, => canon, : 22/22=1.0, violations: test: 24/26, violations: 5409,8983,
gsmarena.com,samsung.com, => samsung, : 26/26=1.0, violations: test: 32/35, violations: 852,1195,1714,
phonenumber.com, => whitepages.com, : 25/25=1.0, violations: test: 11/12, violations: 1077,
Hobbies.csv,nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
canon.com, => canon, : 37/37=1.0, violations: test: 37/41, violations: 3560,5409,8983,8988,
amazon.com,nikon, => nikon.com, : 25/25=1.0, violations: test: 25/27, violations: 896,8319,
reversephonedirectory.com,Computer.csv, => whitepages.com, : 22/22=1.0, violations: test: 26/30, violations: 1804,4424,5453,8720,
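Each entry appears to read as: antecedent items, the consequent, the training count and confidence, and then, after "test:", how many held-out records satisfying the antecedent also contain the consequent, followed by the IDs of those that do not. That reading is inferred from the numbers (e.g., 37/40 is followed by exactly three IDs), not stated on the slides. Under that assumption, the test columns could be produced by a sketch like this; the function name, record IDs, and items are made up.

```python
def evaluate_on_test(antecedent: set[str], consequent: str,
                     test_records: dict[int, set[str]]) -> tuple[int, int, list[int]]:
    """Return (hits, applicable, violating_ids) for one rule on held-out records.

    `test_records` maps a hypothetical record ID to that record's items."""
    applicable = [rid for rid, items in test_records.items() if antecedent <= items]
    violations = sorted(rid for rid in applicable if consequent not in test_records[rid])
    return len(applicable) - len(violations), len(applicable), violations


test_records = {
    101: {"oanda.com", "xe.com"},
    102: {"oanda.com", "xe.com", "Finance.csv"},
    103: {"oanda.com", "Finance.csv"},  # antecedent holds, consequent missing
    104: {"xe.com"},                    # antecedent absent, so not counted
}
hits, total, violations = evaluate_on_test({"oanda.com"}, "xe.com", test_records)
line = f"test: {hits}/{total}, violations: {','.join(map(str, violations))},"
```

Records that violate a high-confidence rule are the candidates for inspection: either the engine's output for that query is faulty, or the rule does not hold there.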

18
Internet.csv,ericsson, => sonyericsson.com, : 24/24=1.0, violations: test: 23/24, violations: 8776,
reversephonedirectory.com, => whitepages.com, : 22/22=1.0, violations: test: 28/32, violations: 1804,4424,5453,8720,
simplyrecipes.com,about.com, => allrecipes.com, : 25/25=1.0, violations: test: 38/39, violations: 5596,
Finance.csv,oanda.com, => xe.com, : 20/20=1.0, violations: test: 27/30, violations: 3410,5566,5781,
oanda.com, => xe.com, : 20/20=1.0, violations: test: 28/31, violations: 3410,5566,5781,
food.com,foodnetwork.com, => allrecipes.com, : 30/30=1.0, violations: test: 32/34, violations: 7642,8519,
foodnetwork.com,simplyrecipes.com, => allrecipes.com, : 39/39=1.0, violations: test: 40/43, violations: 566,5596,7642,
ericsson,sony, => sonyericsson.com, : 24/24=1.0, violations: test: 23/24, violations: 8776,
myrecipes.com,foodnetwork.com, => allrecipes.com, : 24/24=1.0, violations: test: 28/30, violations: 2748,5252,
myrecipes.com,allrecipes.com, => foodnetwork.com, : 24/24=1.0, violations: test: 28/35, violations: 377,1236,1335,1645,3752,6655,6920,
phonenumber.com,phone, => whitepages.com, : 20/20=1.0, violations: test: 8/9, violations: 1077,
Food.csv,joyofbaking.com, => allrecipes.com, : 27/27=1.0, violations: test: 35/36, violations: 566,
nikonusa.com,nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
joyofbaking.com, => allrecipes.com, : 27/27=1.0, violations: test: 35/36, violations: 566,
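Several listed rules differ only by an extra item in the antecedent, e.g., oanda.com, => xe.com and Finance.csv,oanda.com, => xe.com, both at 20/20. A common post-processing step, not described in the slides, is to drop a rule when a more general rule (smaller antecedent, same consequent) is at least as confident. A hypothetical sketch:

```python
Rule = tuple[frozenset[str], str, float]  # (antecedent, consequent, confidence)


def prune_redundant(rules: list[Rule]) -> list[Rule]:
    """Keep a rule unless a strictly more general rule with the same consequent
    has confidence at least as high."""
    kept = []
    for a, c, conf in rules:
        dominated = any(c2 == c and a2 < a and conf2 >= conf for a2, c2, conf2 in rules)
        if not dominated:
            kept.append((a, c, conf))
    return kept


rules: list[Rule] = [
    (frozenset({"oanda.com"}), "xe.com", 1.0),
    (frozenset({"Finance.csv", "oanda.com"}), "xe.com", 1.0),
    (frozenset({"joyofbaking.com"}), "allrecipes.com", 1.0),
    (frozenset({"Food.csv", "joyofbaking.com"}), "allrecipes.com", 1.0),
    (frozenset({"ericsson", "sony"}), "sonyericsson.com", 1.0),
]
kept = prune_redundant(rules)
```

The `a2 < a` comparison is Python's proper-subset test on frozensets.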

19
mortgageloan.com, => bankrate.com, : 20/21=0.9523809523809523, violations: 7719, test: 24/28, violations: 545,1603,5073,7711,
Finance.csv,mortgageloan.com, => bankrate.com, : 20/21=0.9523809523809523, violations: 7719, test: 24/28, violations: 545,1603,5073,7711,
recipes,myrecipes.com, => foodnetwork.com, : 20/21=0.9523809523809523, violations: 7778, test: 20/25, violations: 1236,1335,6655,6920,7770,
recipes,myrecipes.com, => allrecipes.com, : 20/21=0.9523809523809523, violations: 7778, test: 24/25, violations: 7770,
phonearena.com,samsung, => gsmarena.com, : 21/22=0.9545454545454546, violations: 3806, test: 33/34, violations: 3802,
samsung.com,samsungmobile.com, => samsung, : 21/22=0.9545454545454546, violations: 8585, test: 8/10, violations: 1195,4778,
food.com,about.com, => allrecipes.com, : 21/22=0.9545454545454546, violations: 2406, test: 43/46, violations: 5740,7359,8893,
Dining.csv,mcdonalds, => mcdonalds.com, : 21/22=0.9545454545454546, violations: 5326, test: 20/22, violations: 3470,3569,
amazon.com,nikon.com, => nikon, : 25/26=0.9615384615384616, violations: 7295, test: 25/30, violations: 1256,5102,6165,6744,7287,
nikon.com,Hobbies.csv, => nikon, : 28/29=0.9655172413793104, violations: 7295, test: 26/31, violations: 1256,5102,6165,6744,7287,
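The listings on slides 17 to 19 follow a regular text format, so they can be parsed back into structured records. The field meanings are inferred from the examples (the training violation list is empty when confidence is 1.0, and each list's length matches the shortfall in its ratio); the parser and its field names are hypothetical.

```python
from typing import NamedTuple, Optional


class ParsedRule(NamedTuple):
    antecedent: list[str]
    consequent: str
    train_hits: int
    train_total: int
    confidence: float
    train_violations: list[int]
    test_hits: Optional[int]   # None when the line has no "test:" part
    test_total: Optional[int]
    test_violations: list[int]


def _ids(text: str) -> list[int]:
    """Parse a comma-separated ID list such as " 545,1603," (may be empty)."""
    return [int(x) for x in text.replace(" ", "").split(",") if x]


def parse_rule_line(line: str) -> ParsedRule:
    head, _, tail = line.strip().partition(" => ")
    consequent, _, stats = tail.partition(", : ")
    train_part, _, test_part = stats.partition("test:")
    ratio, _, train_viol = train_part.partition(", violations:")
    counts, _, conf = ratio.partition("=")
    train_hits, train_total = map(int, counts.split("/"))
    test_hits = test_total = None
    test_viol = ""
    if test_part:
        test_counts, _, test_viol = test_part.partition(", violations:")
        test_hits, test_total = map(int, test_counts.strip().split("/"))
    return ParsedRule([x for x in head.split(",") if x], consequent.strip(),
                      train_hits, train_total, float(conf), _ids(train_viol),
                      test_hits, test_total, _ids(test_viol))


r = parse_rule_line("mortgageloan.com, => bankrate.com, : 20/21=0.9523809523809523, "
                    "violations: 7719, test: 24/28, violations: 545,1603,5073,7711,")
```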

20 Examples of Multiple Search Engines

21
bing:medicinenet.com,google:emedicinehealth.com, => google:medicinenet.com, : 107/107=1.0, violations:
symptoms,bing:medicinenet.com, => google:webmd.com, : 55/55=1.0, violations:
Hobbies.csv,yahoo:allrecipes.com, => google:allrecipes.com, : 53/53=1.0, violations:
bing:medicinenet.com,yahoo:nih.gov, => google:medicinenet.com, : 100/100=1.0, violations:
google:amazon.com,bing:gsmarena.com, => google:gsmarena.com, : 52/52=1.0, violations:
bing:gsmarena.com,google:youtube.com, => google:gsmarena.com, : 73/73=1.0, violations:
google,google:google.com, => bing:google.com, : 56/56=1.0, violations:
google:allrecipes.com,recipe, => bing:allrecipes.com, : 55/55=1.0, violations:
bing:medicinenet.com,yahoo:mayoclinic.com, => google:medicinenet.com, : 90/90=1.0, violations:
bing:dpreview.com,bing:amazon.com, => google:dpreview.com, : 56/56=1.0, violations:
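Many of these rules say that one engine's result predicts the same domain in another engine's results (for example, bing:gsmarena.com predicting google:gsmarena.com). A hypothetical filter for such cross-engine agreement rules, assuming the engine:domain item naming shown on slide 10, might look like this.

```python
from typing import Optional


def split_engine(item: str) -> tuple[Optional[str], str]:
    """Split "bing:gsmarena.com" into ("bing", "gsmarena.com"); unprefixed items get engine None."""
    engine, sep, rest = item.partition(":")
    return (engine, rest) if sep else (None, item)


def is_cross_engine_agreement(antecedent: list[str], consequent: str) -> bool:
    """True if the consequent's domain also appears in the antecedent under a different engine."""
    c_engine, c_domain = split_engine(consequent)
    if c_engine is None:
        return False
    return any(engine is not None and engine != c_engine and domain == c_domain
               for engine, domain in map(split_engine, antecedent))
```

Rules that fail this test (e.g., symptoms,bing:medicinenet.com, => google:webmd.com,) relate different domains and may carry different information than simple agreement between engines.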

22
bing:medicinenet.com,yahoo:mayoclinic.com, => google:mayoclinic.com, : 89/90=0.9888888888888889, violations: 7124,
Home.csv,bing:amazon.com, => google:amazon.com, : 90/91=0.989010989010989, violations: 2124,
bing:medicinenet.com,yahoo:wrongdiagnosis.com, => google:medicinenet.com, : 90/91=0.989010989010989, violations: 8556,
bing:webmd.com,yahoo:wrongdiagnosis.com, => google:webmd.com, : 95/96=0.9895833333333334, violations: 6305,
recipes,yahoo:allrecipes.com, => google:allrecipes.com, : 95/96=0.9895833333333334, violations: 6041,
bing:mayoclinic.com,bing:nih.gov, => google:mayoclinic.com, : 102/103=0.9902912621359223, violations: 583,
bing:mayoclinic.com,bing:medicinenet.com, => google:medicinenet.com, : 124/125=0.992, violations: 645,
bing:medicinenet.com,bing:webmd.com, => google:medicinenet.com, : 136/137=0.9927007299270073, violations: 8556,
yahoo:nextag.com,bing:amazon.com, => google:amazon.com, : 172/173=0.9942196531791907, violations: 4773,
bing:medicinenet.com,google:mayoclinic.com, => google:medicinenet.com, : 174/175=0.9942857142857143, violations: 645,
google:walmart.com,bing:amazon.com, => google:amazon.com, : 177/178=0.9943820224719101, violations: 4773,

23
bing:mayoclinic.com,google:nih.gov, => google:mayoclinic.com, : 143/145=0.9862068965517241, violations: 1255,583,
bing:amazon.com,yahoo:thefind.com, => google:amazon.com, : 72/73=0.9863013698630136, violations: 4773,
symptoms,bing:webmd.com, => google:webmd.com, : 77/78=0.9871794871794872, violations: 6451,
yahoo:medicinenet.com,yahoo:wrongdiagnosis.com, => google:medicinenet.com, : 77/78=0.9871794871794872, violations: 8556,
yahoo:medicinenet.com,yahoo:mayoclinic.com, => google:mayoclinic.com, : 78/79=0.9873417721518988, violations: 7124,
bing:allrecipes.com,yahoo:allrecipes.com, => google:allrecipes.com, : 160/162=0.9876543209876543, violations: 566,5601,
yahoo:bankrate.com,bing:bankrate.com, => google:bankrate.com, : 82/83=0.9879518072289156, violations: 6266,
Internet.csv,bing:gsmarena.com, => google:gsmarena.com, : 83/84=0.9880952380952381, violations: 7617,
bing:gsmarena.com, => google:gsmarena.com, : 86/87=0.9885057471264368, violations: 7617,
bing:nextag.com,bing:amazon.com, => google:amazon.com, : 176/178=0.9887640449438202, violations: 4773,7343,
bing:mayoclinic.com,bing:answers.com, => google:mayoclinic.com, : 89/90=0.9888888888888889, violations: 6328,
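Some record IDs recur across several rules on slides 22 and 23; for example, record 4773 is listed as a violation of four different rules that predict google:amazon.com. A simple, hypothetical way to prioritize records for manual inspection is to count how many rules each one violates. The rule list below is a subset copied from those two slides.

```python
from collections import Counter


def rank_suspicious(rules: list[tuple[str, list[int]]]) -> list[tuple[int, int]]:
    """Count how many rules each record violates; rules are (label, violation_ids) pairs."""
    counts: Counter = Counter()
    for _, ids in rules:
        counts.update(set(ids))  # a record counts once per rule
    return counts.most_common()


rules = [
    ("yahoo:nextag.com,bing:amazon.com => google:amazon.com", [4773]),
    ("google:walmart.com,bing:amazon.com => google:amazon.com", [4773]),
    ("bing:amazon.com,yahoo:thefind.com => google:amazon.com", [4773]),
    ("bing:nextag.com,bing:amazon.com => google:amazon.com", [4773, 7343]),
    ("bing:medicinenet.com,bing:webmd.com => google:medicinenet.com", [8556]),
    ("bing:medicinenet.com,yahoo:wrongdiagnosis.com => google:medicinenet.com", [8556]),
]
ranking = rank_suspicious(rules)
```

Rules that share a consequent are not independent evidence, so a high count mostly says that one expected result is missing; it is a triage heuristic rather than a verdict.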

24 Thank you!

