A novel Web usage mining approach for search engines
Authors: Dell Zhang, Yisheng Dong
Source: Computer Networks, Vol. 39, Issue 3, June 21 2002, pp. 303-310
Speaker: Pei-Yu Lin
Date: 8-May-03
Users look at only an extremely small part of the search results
Search engines locate information based on the textual similarity of a query
The search engine returns thousands of Web resource pointers to the user after a general query
Welcome to Chin-Chen Chang's (張真誠) web page http://www.cs.ccu.edu.tw/~ccc/
Information security lecture slides http://filter.cs.ccu.edu.tw/courses/image_processing/slides/report.php
List of research projects over the years http://www.automation.ccu.edu.tw/9.htm
Patents http://ics.stic.gov.tw/Patent/index.php?action=show&year=2001
"National Museum of Natural Science National Digital Archives Program" ... http://www.ndap.org.tw/active/report/910625.shtml
Flag school book series catalog http://www.flag.com.tw/school/book.htm
Chin-Chen Chang (張真誠), outstanding talent in systems and software engineering http://www.tcssh.tc.edu.tw/news/administration/20010815.htm
…
…could never forget the natural atmosphere of Finland, and those sincere young ... [一張張真誠: the characters 張真誠 match only textually]
…after more than a year of settling, Tanya once again hands in a sincere and moving report card. [一張真誠]
…Xiaoman seemed to see Zhimo's face, so sincere that it could almost move everyone in the world ... [那張真誠]
Exploit the relationships among users, queries and resources
The output of the query is a list consisting of the resources with the highest quality weights (authority & freshness)
MASEL (matrix analysis on search engine log)
The set of all users who have issued the query q* is constructed
All resources relevant to the queries of these users are collected through traditional keyword-based IR
Numerical quality estimates of the found resources are computed
The Web resources with the highest quality weights are returned in order for the search topic
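The first MASEL steps can be sketched as set construction over a log. This is a minimal sketch, not the authors' implementation: it assumes that "these queries" means every query issued by the users who issued q*, and it replaces keyword-based IR with a naive word match. The names `LOG`, `LABELS`, `build_sets`, the user "Ann" and all labels are hypothetical.

```python
# Hypothetical log records: (user, query) pairs taken from a search-engine log.
LOG = [
    ("Tom", "Car"), ("Tom", "Auto"), ("Tom", "Bus"),
    ("Jack", "Car"), ("Jack", "Bus"),
    ("Rose", "Car"), ("Rose", "Auto"),
    ("Ann", "Boat"),  # never issued 'Car', so she is left out below
]

# Hypothetical keyword index: resource -> text label used for matching.
LABELS = {
    "img1": "red car",
    "img2": "car auto show",
    "img3": "city bus",
    "img4": "sail boat",
}


def build_sets(log, labels, q_star):
    """Collect the users, queries and resources related to the query q_star."""
    # Step 1: every user who has issued q_star.
    users = sorted({u for u, q in log if q == q_star})
    # Step 2 (assumed): every query issued by those users.
    queries = sorted({q for u, q in log if u in users})
    # Step 3: resources whose label contains one of those queries as a word
    # (a naive stand-in for traditional keyword-based IR).
    resources = sorted(
        r for r, text in labels.items()
        if any(q.lower() in text.lower().split() for q in queries)
    )
    return users, queries, resources


users, queries, resources = build_sets(LOG, LABELS, "Car")
print(users, queries, resources)
```

The quality estimates of the found resources would then be computed over these three sets.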
Ex: 210.74.165.87 970813 ‘Car’ http://www.hello.com.tw/~w372/img1.jpg http://www.hello.com.tw/~w372/img2.jpg
time-window’s width = week

User  Timestamp  Query   Results
Tom   970813     ‘Car’   img1, img2, img7, img4, img5, …
Tom   970817     ‘Auto’  img9, img3, img10, …
Tom   970818     ‘Bus’   img3, img6, img17, img13, …
Jack  970814     ‘Car’   img7, img1, img2, img4, img9, img6, …
Jack  970814     ‘Bus’   img1, img5, img4, img9, img2, …
Rose  970813     ‘Car’   img3, img1, img10, img9, img1, img6, …
Rose  970814     ‘Car’   img10, img1, img12, img14, img9, img6, …
Rose  970815     ‘Auto’  img14, img5, img3, img4, img9, img6, …
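A log line of this shape can be split into its fields with a short parser. This is a sketch under stated assumptions: the field order (host, timestamp, quoted query, result URLs) is read off the example line, the timestamp is taken to be YYMMDD in the 1900s, and the week-wide time window is approximated by the ISO calendar week. The slide does not say how windows are aligned, and `parse_line` and `week_window` are hypothetical helper names.

```python
import re
from datetime import date

LINE = ("210.74.165.87 970813 ‘Car’ "
        "http://www.hello.com.tw/~w372/img1.jpg "
        "http://www.hello.com.tw/~w372/img2.jpg")

# host, 6-digit timestamp, query in straight or curly single quotes, then URLs.
PATTERN = re.compile(r"^(\S+)\s+(\d{6})\s+[‘']([^’']*)[’']\s*(.*)$")


def parse_line(line):
    """Split one log line into user (host), timestamp, query and result URLs."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError("unrecognised log line: " + line)
    host, stamp, query, rest = m.groups()
    return {"user": host, "timestamp": stamp, "query": query,
            "results": rest.split()}


def week_window(stamp):
    """Map a YYMMDD stamp (assumed 19YY) to an (ISO year, ISO week) bucket."""
    d = date(1900 + int(stamp[:2]), int(stamp[2:4]), int(stamp[4:6]))
    iso = d.isocalendar()
    return (iso[0], iso[1])


record = parse_line(LINE)
print(record["user"], record["query"], week_window(record["timestamp"]))
```

With calendar-week buckets, 970813 and 970817 fall in the same window while 970818 starts the next one; a sliding seven-day window would group entries differently.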
User  Timestamp  Query  Accessed images
Tom   970813     Car    img1, img1, img2
      970817     Auto
      970818     Bus
Jack  970814            img1, img2
                        img4
Rose                    img1
      970815            img3
DB
A = [num(u_i, q_j)]    (users × queries)
B = [sim(q_j, r_k)]    (queries × resources)
C = [hitq(r_k, u_i)]   (resources × users)
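One way to read the three matrices, offered as an assumption rather than something the slides state, is that each one passes weights along one kind of link: queries to users through A, resources to queries through B, and users to resources through C. Chaining the three steps gives the update rules (1) to (3) on the following slides. Here u, q and r denote the weight vectors of users, queries and resources, and |U|, |Q|, |R| are the sizes of the three sets.

```latex
\begin{aligned}
A &\in \mathbb{R}^{|U|\times|Q|}, &
B &\in \mathbb{R}^{|Q|\times|R|}, &
C &\in \mathbb{R}^{|R|\times|U|} \\
u &\leftarrow A\,q, &
q &\leftarrow B\,r, &
r &\leftarrow C\,u \\
u &\leftarrow (ABC)\,u, &
q &\leftarrow (BCA)\,q, &
r &\leftarrow (CAB)\,r
\end{aligned}
```

Under this reading, and if each vector is renormalised after every update, repeating an update is a power iteration, which under the usual conditions converges to the dominant eigenvector of the corresponding product matrix.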
(1) u ← ABC u
Users: Tom, Jack, Rose
u = (0.41, 0.4, 0.69)^T
Return the well-ordered list of users: Rose, Tom, Jack
(2) q ← BCA q
Queries: Car, Auto, Bus
q = (0.87, 0.14, 0.09)^T
Return the well-ordered list of queries: Car, Auto, Bus
(3) r ← CAB r
Images: img1, img2, img3, img4
r = (0.96, 0.28, 0.03, 0.01)^T
Return the well-ordered list of images: img1, img2, img3, img4
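The three updates can be sketched as power iterations on the products ABC, BCA and CAB. This is a minimal sketch, not the authors' code. The matrix entries below are hypothetical, since the slides do not give A, B or C. Rescaling to unit Euclidean length after each step is also an assumption, as are the helper names `matmul`, `power_iteration` and `rank`.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def power_iteration(M, steps=100):
    """Repeat v <- M v, rescaling v to unit Euclidean length each step."""
    n = len(M)
    v = [1.0 / n ** 0.5] * n
    for _ in range(steps):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def rank(names, weights):
    """Order names by descending weight."""
    return [n for n, _ in sorted(zip(names, weights), key=lambda p: -p[1])]


# Hypothetical matrices; the slides do not list the real values.
A = [[2, 1, 1], [1, 0, 1], [2, 1, 0]]            # users x queries
B = [[0.9, 0.8, 0.1, 0.2],
     [0.7, 0.1, 0.6, 0.1],
     [0.1, 0.2, 0.3, 0.8]]                       # queries x resources
C = [[2, 2, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]  # resources x users

u = power_iteration(matmul(matmul(A, B), C))  # (1) u <- ABC u
q = power_iteration(matmul(matmul(B, C), A))  # (2) q <- BCA q
r = power_iteration(matmul(matmul(C, A), B))  # (3) r <- CAB r

# Ordering the user weights reported on the slide reproduces its list.
print(rank(["Tom", "Jack", "Rose"], [0.41, 0.40, 0.69]))  # ['Rose', 'Tom', 'Jack']
```

The same `rank` call applied to q and r gives the ordered queries and images.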
Application of MASEL (eeFind)
Side effect in MASEL
For the query ‘Car’, MASEL also returns some images labeled ‘BMW’, ‘Porsche’ or ‘Rolls Royce’ …, because users with similar interests have often queried them recently
Conclusions
The algorithm, MASEL, can exploit the relationships among users, queries and resources
The proposed approach shows that it can achieve better ranking and query expansion effects