Predicting Download Directories for Web Resources. George Valkanas, Dimitrios Gunopulos. 4th International Conference on Web Intelligence, Mining and Semantics.


1 Predicting Download Directories for Web Resources
George Valkanas, Dimitrios Gunopulos
4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014
Dept. of Informatics & Telecommunications, University of Athens, Greece

2 Online User Activities

Activity              ABS Survey   StatCan Survey   Infoplease Survey
Emailing              91%          93%              92%
General Web Browsing  87%          > 70%            83%
Online Purchases      45%          > 50%            62%
Download Content      37%          ~30%             42%

3 Facilitating Downloads
Save Link In Folder

4 Facilitating Downloads
Save Link In Folder
Problems:
- Predefined directories
- Blunt approach / no learning
- UI clutter
- Tedious user management

5 A principled solution

6 Associate the navigation through the hierarchy with a cost function
One possible cost function (c.f.): Hierarchical Navigation Cost (HNC), i.e., the number of clicks
HNC(imgs/, docs/) = 2
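A minimal sketch of how a click-count cost such as HNC could be computed, assuming each move to a parent or child directory costs one click and that imgs/ and docs/ are siblings under a common parent. The function name `hnc` and the path handling are illustrative assumptions, not taken from the slides.

```python
from pathlib import PurePosixPath


def hnc(source: str, target: str) -> int:
    """Count clicks from source to target (assumed: one click per step up or down)."""
    s = PurePosixPath(source).parts
    t = PurePosixPath(target).parts
    # Length of the shared prefix, i.e. the depth of the deepest common ancestor.
    common = 0
    for a, b in zip(s, t):
        if a != b:
            break
        common += 1
    # Climb from source to the common ancestor, then descend to target.
    return (len(s) - common) + (len(t) - common)


# Sibling directories: one click up, one click down.
print(hnc("imgs/", "docs/"))  # prints 2
```

Under these assumptions, suggesting the target itself costs 0, and suggestions further away in the tree cost proportionally more clicks.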

7 Problem Definition
Given
- The hierarchical structure
- A target directory T, where the resource will be saved
Goal
- Suggest a directory S that minimizes the cost function cf(S, T)

8 Problem Definition
Given
- The hierarchical structure
- A target directory T, where the resource will be saved
Goal
- Suggest a directory S that minimizes the cost function cf(S, T)
But if I know T, why not suggest T directly? (0 cost)

9 Problem Definition
Given
- The hierarchical structure
- A target directory T, where the resource will be saved
Goal
- Suggest a directory S that minimizes the cost function cf(S, T)
But if I know T, why not suggest T directly? (0 cost)
In this setting, we don't know T until it's too late!
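The goal stated on these slides can be written compactly as a minimization over candidate directories. The symbol \mathcal{D} for the set of directories in the hierarchy is an assumption introduced here for illustration.

```latex
S^{*} = \arg\min_{S \in \mathcal{D}} \; cf(S, T)
```

If T were known in advance, S^{*} = T would give the 0 cost noted on the slide. Since T is only revealed once the user saves the file, S has to be predicted from what is observable beforehand.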

10 Casting to a classification framework
Directories are potential class values
T is the true target class
S is the output of a classification process
Web resource properties → classification features
Recommend S that best matches T
- Use directories from past saves as candidate classes

11 Features & Distances

Feature                                    Distance
Timestamp                                  Exponential decay
Domain (current / referrer)                Equality
Path, filename (current / referrer page)   Tokenize & Jaccard
Title                                      Tokenize & Jaccard
Filename                                   Tokenize & Jaccard
Extension                                  Covariance Matrix
Keywords                                   Jaccard
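Three of the distance types in the table above can be sketched as follows. The tokenizer, the half-life parameter, and all function names are assumptions made for illustration; the slides do not specify them. The covariance-matrix distance for extensions is left out because the slides give no detail on it.

```python
import re


def tokens(text: str) -> set:
    """Split on non-alphanumeric characters (an assumed tokenization rule)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def jaccard_distance(a: str, b: str) -> float:
    """1 minus the Jaccard similarity of the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def equality_distance(a: str, b: str) -> float:
    """0 when the values match exactly (e.g. same domain), 1 otherwise."""
    return 0.0 if a == b else 1.0


def time_decay_distance(t1: float, t2: float, half_life: float = 86400.0) -> float:
    """Similarity halves every half_life seconds (assumed); distance is 1 minus that."""
    similarity = 0.5 ** (abs(t1 - t2) / half_life)
    return 1.0 - similarity
```

For example, `jaccard_distance("report_2014.pdf", "report_final.pdf")` shares two of four distinct tokens and so evaluates to 0.5.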

12 Experimental Setup
Implement classifier as a Firefox (FF) plugin
- DiDoCtor approach
- JavaScript
- 1-NN classifier
6 participants
- 4-month minimum use period
Baseline
- Last-by-domain (LBD), current browser approach
- Simulated, based on submitted result
Metrics
- Click Distance: HNC, Breadcrumbs
- Classification Accuracy
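The contrast between a 1-NN classifier and the last-by-domain baseline can be sketched as below. This is written in Python rather than the plugin's JavaScript, and the record fields, the combined distance, and the sample history are hypothetical; the actual DiDoCtor features and weights are not given in the slides.

```python
import re


def _tokens(text):
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def distance(a, b):
    """Hypothetical combined distance: domain equality plus filename Jaccard."""
    domain_d = 0.0 if a["domain"] == b["domain"] else 1.0
    ta, tb = _tokens(a["filename"]), _tokens(b["filename"])
    union = ta | tb
    jaccard_d = 1.0 - len(ta & tb) / len(union) if union else 0.0
    return domain_d + jaccard_d


def predict_1nn(history, query):
    """Return the directory of the single closest past save (1-NN)."""
    if not history:
        return None
    nearest = min(history, key=lambda past: distance(past, query))
    return nearest["directory"]


def predict_lbd(history, query):
    """Last-by-domain: directory of the most recent save from the same domain."""
    for past in reversed(history):  # history is ordered oldest first
        if past["domain"] == query["domain"]:
            return past["directory"]
    return None


# Hypothetical save history, oldest first.
history = [
    {"domain": "arxiv.org", "filename": "web_mining_survey.pdf", "directory": "papers/"},
    {"domain": "example.com", "filename": "holiday_photo.jpg", "directory": "imgs/"},
    {"domain": "arxiv.org", "filename": "cat_photo.jpg", "directory": "imgs/"},
]
query = {"domain": "arxiv.org", "filename": "graph_mining_survey.pdf"}

print(predict_1nn(history, query))  # prints papers/
print(predict_lbd(history, query))  # prints imgs/
```

In this made-up history the baseline follows the most recent save from the same domain, while the nearest-neighbor prediction also uses the filename and picks the directory of the most similar earlier download.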

13 Preliminary Result Analysis

14 Preliminary Result Analysis
Take Home Messages
1. Users have different saving pattern behavior(s)

15 Preliminary Result Analysis
Take Home Messages
1. Users have different saving pattern behavior(s)
2. Users have high variability in their accesses to each directory

16 Click Distance - HNC
Take Home Message
Significant reduction in number of clicks to reach target directory!

17 Click Distance - HNC
Take Home Message
Significant reduction in number of clicks to reach target directory!
Click distance gain is even higher when considering a breadcrumbs UI!

18 Running Accuracy
Take Home Message
DiDoCtor is much more accurate in predicting the download directory

19 Basic Model Extensions
Feature reweighting
- RELIEF_F

20 Basic Model Extensions
Feature reweighting
- RELIEF_F
Suggesting k directories
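One way the "suggesting k directories" extension could work is to rank past saves by distance and return the first k distinct directories. The slides do not describe the actual selection rule, so the function `suggest_k`, the record fields, and the toy distance below are all assumptions.

```python
def suggest_k(history, query, distance, k=3):
    """Return up to k distinct directories, ordered by nearest past save."""
    ranked = sorted(history, key=lambda past: distance(past, query))
    suggestions = []
    for past in ranked:
        if past["directory"] not in suggestions:
            suggestions.append(past["directory"])
        if len(suggestions) == k:
            break
    return suggestions


def domain_distance(past, query):
    """Toy distance for the example: 0 if the domains match, 1 otherwise."""
    return 0.0 if past["domain"] == query["domain"] else 1.0


# Hypothetical save history and query.
history = [
    {"domain": "a.com", "directory": "docs/"},
    {"domain": "b.com", "directory": "imgs/"},
    {"domain": "a.com", "directory": "docs/"},
    {"domain": "a.com", "directory": "papers/"},
]
query = {"domain": "a.com"}

print(suggest_k(history, query, domain_distance, k=2))  # prints ['docs/', 'papers/']
```

With k = 1 this reduces to the 1-NN prediction. Larger k trades extra entries in the UI for a higher chance that the target directory is among the suggestions.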

21 Alternative classifiers
Take Home Messages
Classifiers can help!
DiDoCtor generally performs the best
Accuracy is affected by user behavior!

22 Conclusions & Future work
Approach for facilitating downloads
- Optimization problem & classification framework
Experimentation with real users
- Basic model extensions
Further exploit the temporal dimension
More informative features (e.g., entities)
Automatic generation of directories

23 Thank you! Questions?
Acknowledgements
- To the evaluators of our plugin
- Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT

