Published by Marian Reynolds. Modified over 9 years ago.
1 Predicting Download Directories for Web Resources
George Valkanas, Dimitrios Gunopulos
4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014
Dept. of Informatics & Telecommunications, University of Athens, Greece
2 Online User Activities
Activity               ABS Survey   StatCan Survey   Infoplease Survey
Emailing               91%          93%              92%
General Web Browsing   87%          > 70%            83%
Online Purchases       45%          > 50%            62%
Download Content       37%          ~30%             42%
3 Facilitating Downloads
Save Link In Folder
4 Facilitating Downloads
Save Link In Folder
Problems:
- Predefined directories
- Blunt approach / no learning
- UI clutter
- Tedious user management
5 A principled solution
6 Associate the navigation through the hierarchy with a cost function
One possible cost function: Hierarchical Navigation Cost (HNC), i.e., the number of clicks
HNC(imgs/, docs/) = 2
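The slide gives only one value, HNC(imgs/, docs/) = 2, and no formal definition. A minimal sketch, assuming that HNC counts one click per step up to the closest common ancestor and one click per step down to the target, and that imgs/ and docs/ are sibling directories (the function name `hnc` and the path representation are illustrative, not taken from the presentation):

```python
from pathlib import PurePosixPath


def hnc(source: str, target: str) -> int:
    """Clicks needed to navigate from `source` to `target` in a directory tree.

    Assumption: one click per level moved up or down in the hierarchy.
    """
    s = PurePosixPath(source).parts
    t = PurePosixPath(target).parts
    # Length of the shared prefix, i.e. depth of the closest common ancestor.
    common = 0
    for a, b in zip(s, t):
        if a != b:
            break
        common += 1
    # Steps up from the source plus steps down to the target.
    return (len(s) - common) + (len(t) - common)


print(hnc("imgs/", "docs/"))
```

Under these assumptions the sibling case reproduces the value on the slide, and suggesting the target itself costs 0.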
7 Problem Definition
Given
- The hierarchical structure
- A target directory T, where the resource will be saved
Goal
- Suggest a directory S that minimizes the cost function cf(S, T)
8 Problem Definition (continued)
But if I know T, why not suggest T directly? (0 cost)
9 Problem Definition (continued)
In this setting, we don’t know T until it’s too late!
10 Casting to a classification framework
- Directories are potential class values
- T is the true target class
- S is the output of a classification process
- Web resource properties → classification features
- Recommend S that best matches T
- Use directories from past saves as candidate classes
11 Features & Distances
Feature                                    Distance
Timestamp                                  Exponential decay
Domain (current / referrer)                Equality
Path, filename (current / referrer page)   Tokenize & Jaccard
Title                                      Tokenize & Jaccard
Filename                                   Tokenize & Jaccard
Extension                                  Covariance Matrix
Keywords                                   Jaccard
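Three of the distances in the table can be sketched as follows. This is an illustration only: the tokenization rule, the one-day half-life, and all function names are assumptions, since the slide does not specify them. The covariance-based distance for extensions is left out because the slide gives no detail on it.

```python
import re


def tokenize(text: str) -> set[str]:
    # Assumed rule: lowercase, split on anything that is not a letter or digit.
    return {tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok}


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the token sets of the two strings."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def equality_distance(a: str, b: str) -> float:
    """0 if the two values (e.g. domains) match exactly, 1 otherwise."""
    return 0.0 if a == b else 1.0


def time_decay_distance(t1: float, t2: float, half_life: float = 86400.0) -> float:
    """Distance grows as similarity decays exponentially with the time gap.

    Assumption: similarity halves every `half_life` seconds (one day here).
    """
    similarity = 0.5 ** (abs(t1 - t2) / half_life)
    return 1.0 - similarity


print(jaccard_distance("annual-report_2014.pdf", "report 2014"))
```

In the example, the token sets share 2 of 4 distinct tokens, so the Jaccard distance is 0.5.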
12 Experimental Setup
Implement classifier as a Firefox (FF) plugin
- DiDoCtor approach
- JavaScript
- 1-NN classifier
- 6 participants
- 4-month minimum use period
Baseline
- Last-by-domain (LBD), current browser approach
- Simulated, based on submitted result
Metrics
- Click Distance: HNC, Breadcrumbs
- Classification Accuracy
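To make the comparison concrete, here is a rough sketch of a 1-NN predictor next to a last-by-domain baseline. It is written in Python for brevity rather than the plugin's JavaScript, and it is not the DiDoCtor implementation: the `Save` record, the two-feature distance, and the example history are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Save:
    """One past download (hypothetical record layout)."""
    domain: str
    filename: str
    directory: str


def tokens(name: str) -> set:
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def distance(a_domain: str, a_file: str, b: Save) -> float:
    # Assumed combination: unweighted sum of a domain equality distance
    # and a Jaccard distance over filename tokens.
    domain_d = 0.0 if a_domain == b.domain else 1.0
    ta, tb = tokens(a_file), tokens(b.filename)
    union = ta | tb
    file_d = 1.0 - len(ta & tb) / len(union) if union else 0.0
    return domain_d + file_d


def predict_1nn(history: List[Save], domain: str, filename: str) -> Optional[str]:
    """Directory of the single closest past save, or None with no history."""
    if not history:
        return None
    return min(history, key=lambda s: distance(domain, filename, s)).directory


def predict_lbd(history: List[Save], domain: str) -> Optional[str]:
    """Last-by-domain: directory of the most recent save from the same domain."""
    for save in reversed(history):
        if save.domain == domain:
            return save.directory
    return None


history = [
    Save("arxiv.org", "paper_ml.pdf", "papers/"),
    Save("flickr.com", "holiday_photo.jpg", "imgs/"),
    Save("arxiv.org", "conference_photo.jpg", "imgs/"),
]
print(predict_1nn(history, "arxiv.org", "survey_paper.pdf"))
print(predict_lbd(history, "arxiv.org"))
```

In this toy history the 1-NN predictor suggests papers/ because the filename resembles an earlier PDF, while the baseline suggests imgs/ because that was the last directory used for the domain.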
13 Preliminary Result Analysis
14 Preliminary Result Analysis
Take Home Messages
1. Users have different saving pattern behavior(s)
15 Preliminary Result Analysis (continued)
2. Users have high variability in their accesses to each directory
16 Click Distance - HNC
Take Home Message
Significant reduction in number of clicks to reach target directory!
17 Click Distance - HNC (continued)
Click distance gain is even higher when considering a breadcrumbs UI!
18 Running Accuracy
Take Home Message
DiDoCtor is much more accurate in predicting the download directory
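The slide does not define running accuracy. One plausible reading, sketched below, is the cumulative fraction of correct predictions after each download; the function name and the sample data are illustrative assumptions.

```python
from itertools import accumulate
from typing import List, Sequence


def running_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> List[float]:
    """Accuracy over the first n downloads, for n = 1 .. len(actual).

    Assumption: a prediction counts as correct only on an exact directory match.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Cumulative number of correct predictions after each download.
    hits = accumulate(int(p == a) for p, a in zip(predicted, actual))
    return [h / n for n, h in enumerate(hits, start=1)]


print(running_accuracy(["a/", "b/", "a/", "c/"], ["a/", "a/", "a/", "c/"]))
```

Plotting this series per user for both the classifier and the baseline would give curves of the kind the slide title suggests.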
19 Basic Model Extensions
Feature reweighting
- RELIEF_F
20 Basic Model Extensions (continued)
Suggesting k directories
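How the k suggestions are chosen is not stated on the slide. A simple possibility, assuming each past save already has a distance to the new resource, is to rank distinct directories by their closest past save and return the first k (`suggest_k` and the input format are hypothetical):

```python
from typing import Dict, List, Tuple


def suggest_k(scored: List[Tuple[float, str]], k: int) -> List[str]:
    """Return up to k distinct directories, nearest first.

    `scored` holds (distance, directory) pairs, one per past save.
    Assumption: a directory is ranked by its single closest past save.
    """
    best: Dict[str, float] = {}
    for dist, directory in scored:
        if directory not in best or dist < best[directory]:
            best[directory] = dist
    # Ties on distance are broken alphabetically to keep the order deterministic.
    ranked = sorted(best, key=lambda d: (best[d], d))
    return ranked[:k]


scored = [(0.5, "papers/"), (2.0, "imgs/"), (1.0, "imgs/"), (0.7, "docs/")]
print(suggest_k(scored, 2))
```

With k = 1 this reduces to the 1-NN prediction; larger k trades a longer suggestion list for a higher chance that the target directory is among the options.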
21 Alternative classifiers
Take Home Messages
- Classifiers can help!
- DiDoCtor generally performs the best
- Accuracy is affected by user behavior!
22 Conclusions & Future work
Conclusions
- Approach for facilitating downloads
- Optimization problem & classification framework
- Experimentation with real users
- Basic model extensions
Future work
- Further exploit the temporal dimension
- More informative features (e.g., entities)
- Automatic generation of directories
23 Thank you! Questions?
Acknowledgements
- To the evaluators of our plugin
- Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT