1 Automatic Classification of Bookmarked Web Pages Chris Staff Third Talk February 2007.


2 Tasks 1. Representation of bookmark categories 2. Two clustering/similarity algorithms 3. Extra utility 4. User interface 5. Evaluation 6. Write up report

3 Overview User Interface –To replace the built-in ‘Bookmark this Page’ menu item and keyboard command –To display a new dialog box that offers the user a choice of the recommended category or the last category used, and allows the user to select some other category or create a new category

4 Overview Extra Utility: How can the classification of web pages to be bookmarked be improved? –What particular interests do you have, and how can they be used to improve classification? E.g., synonym detection, automatic reorganisation of bookmarks, improved interface, …

5 Overview Evaluation –Will be standard and automated –For testing purposes, download test_eval.zip from the home page. Contains 2x8 bookmark files (.html) and one URL file (.txt). Bookmark files are ‘real’ files collected one year ago. URL file contains a number of lines with the following format: –Bk file ID, URL of bookmarked page, home category, exact entry from bookmark file (with date created, etc.)
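
As a rough illustration of reading one line of the URL file, here is a minimal Python sketch. It assumes the four fields are separated by commas, which the slide implies but does not state outright; the names `UrlEntry` and `parse_url_line` are hypothetical and do not come from the course material.

```python
from typing import NamedTuple


class UrlEntry(NamedTuple):
    """One line of the URL file (field names are illustrative)."""
    bookmark_file_id: str
    url: str
    home_category: str
    raw_entry: str  # exact entry copied from the bookmark file


def parse_url_line(line: str, sep: str = ",") -> UrlEntry:
    # Assumption: fields are separated by `sep`; the slide does not name
    # the delimiter explicitly.
    # Split at most three times so that separators inside the final
    # field (the raw bookmark entry) are left untouched.
    parts = [p.strip() for p in line.rstrip("\n").split(sep, 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    return UrlEntry(*parts)
```

A URL or category name that itself contains the separator would break this simple split, so the real file should be inspected before relying on it.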

6 Overview Evaluation (contd.) –Challenge to also ‘re-create’ bookmark file in the order that it was created by users –Eventually, close to the end of the APT, the evaluation test data set will be made available: about 20 unseen bookmark files and one URL file –Same format as before –You’ll get bookmark files early to prepare representations, but classification run (URL file) will be part of a demo session

7 User Interface Graphical, as part of the Web Browser; command-line based, or equivalent, for evaluation purposes

8 User Interface Graphical –Can be built using Bugeja’s HyperBK as framework –User needs to be able to select clustering algorithm. When system is idle, recalculate different centroids for categories –User needs to be able to switch on/off extra utility –Whenever user bookmarks a page, system kicks in, performs functions, and presents dialog box to user

9 User Interface Graphical –From dialog box, user should be able to: use the recommended category; use a different category; create a new category; store bookmark in last category *used*; store bookmark in last category used to store a bookmark –It needs to be user friendly!

10 User Interface Command line: –To enable the evaluation to take place without user intervention –Essentially, call program with location of bookmark files directory (which contains bookmark files and the URL file), clustering algorithm to use, extra utility on/off, where to store results, switch logging on/off. If not practical, then embed call inside web browser
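
One possible shape for such a batch entry point, sketched in Python with the standard `argparse` module. The option names (`--algorithm`, `--extra-utility`, `--results`, `--log`) and the choice labels `hdift` and `own` are assumptions made for illustration; only the list of things to pass in comes from the slide.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All option names below are hypothetical; the slide only lists what
    # information the program must be given.
    parser = argparse.ArgumentParser(
        description="Batch classification run over a bookmark directory")
    parser.add_argument(
        "bookmark_dir",
        help="directory holding the bookmark files and the URL file")
    parser.add_argument(
        "--algorithm", choices=["hdift", "own"], default="hdift",
        help="clustering algorithm to use")
    parser.add_argument(
        "--extra-utility", action="store_true",
        help="switch the extra utility on (off by default)")
    parser.add_argument(
        "--results", default="results.txt",
        help="where to store the results")
    parser.add_argument(
        "--log", action="store_true",
        help="switch logging on (off by default)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Keeping the parser in its own function means the same options can be reused if the call has to be embedded inside the web browser instead.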

11 Extra Utility What inefficiencies or problems are there with the current methods? –How can they be improved? E.g., better term selection (e.g., synonym detection); anything you like, but I need to approve it. Or how can overall system be improved? –E.g., automatic re-organisation of bookmark file to classify unclassified bookmarks, improved interface. Need to work independently, but no need for utility to be unique

12 Evaluation Two types of evaluation –One to determine if randomly selected bookmarks can be placed into the “correct” category –Another to attempt to re-build the bookmark file in the order the user created it (only for bookmarks in categories). Both types must run in “batch” mode (via a command-line interface or equivalent)

13 Evaluation You each have test_eval.zip (from APT’s home page). This is the test set. Format of files was explained earlier. Later, you’ll get an evaluation set to use to prepare category representations –You will run the random evaluation under lab conditions, as part of the demo –You will *not* run the re-build evaluation on these, but you must provide a mechanism for me to run it

14 Evaluation Categorising random bookmarks –Bookmarks will be selected from user created categories –Classifying bookmark into correct category is a ‘hit’; otherwise, it counts as a ‘miss’ –On average, you’re aiming for a minimum of 80% accuracy (using either classification algorithm) during the evaluation run. –Report results as percentage accuracy, and report the statistical significance of your results –Also report average time overhead per URL, measured from *once* the page has been downloaded until it is classified.
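
The slide does not say which significance test to use. As one possibility, the sketch below reports percentage accuracy and a one-sided exact binomial p-value against a chance baseline. The baseline `p_chance` (for example 1 divided by the number of categories) and both function names are assumptions, not part of the assignment.

```python
from math import comb


def accuracy(hits: int, total: int) -> float:
    """Percentage of bookmarks placed in the correct category."""
    return 100.0 * hits / total


def binomial_p_value(hits: int, total: int, p_chance: float) -> float:
    """One-sided exact binomial tail: the probability of seeing at least
    `hits` hits in `total` trials if every trial hit with probability
    `p_chance`. The choice of baseline is an assumption."""
    return sum(
        comb(total, k) * p_chance ** k * (1 - p_chance) ** (total - k)
        for k in range(hits, total + 1)
    )
```

A small p-value then suggests that the observed hit rate is unlikely to be produced by guessing at the assumed chance rate.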

15 Evaluation Re-building the bookmark file (1) –Each entry in the bookmark file is time stamped –You can determine the order in which URLs were bookmarked overall –You may ignore uncategorised bookmarks (or you can suggest a category for them, but don’t count them for the evaluation) –If a URL is the first bookmark in a category, then assume user created category and placed URL in it manually –For all others, correct classification is a hit, otherwise it’s a miss
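
The ordering rules above can be sketched as follows. The sketch assumes the bookmark file has already been parsed into `(timestamp, url, category)` tuples, with `None` as the category of an uncategorised bookmark; that tuple layout and the name `rebuild_order` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# (timestamp, url, category); category is None for uncategorised bookmarks.
Bookmark = Tuple[int, str, Optional[str]]


def rebuild_order(bookmarks: Iterable[Bookmark]) -> List[Tuple[str, str, bool]]:
    """Return (url, category, is_seed) triples in creation order.

    `is_seed` is True for the first bookmark of a category, which is
    assumed to have been placed manually and is not scored.
    Uncategorised bookmarks are skipped.
    """
    seen = set()
    ordered = []
    for _timestamp, url, category in sorted(bookmarks, key=lambda b: b[0]):
        if category is None:
            continue  # ignored for the evaluation
        is_seed = category not in seen
        seen.add(category)
        ordered.append((url, category, is_seed))
    return ordered
```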

16 Evaluation Re-building the bookmark file (2) –After each URL classification, bookmark the URL into the correct (user-determined) category, and re-calculate the category centroid –Continue for all bookmarks –What is the overall success (hit) rate? What is the hit rate with just 1 (user located) bookmark in a category, with 3, 5, 10, 15+? (Does the hit rate improve as the number of bookmarks increases?) What’s the statistical significance? –Remember to use both algorithms (HDIFT and your own), with and without the extra utility (if possible) –Report average time overhead
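
A minimal sketch of this loop is given below. It uses a nearest-centroid classifier over raw term counts with cosine similarity purely as a stand-in; it is not HDIFT and not any particular student algorithm. Pages are assumed to arrive as lists of terms in creation order, and the name `rebuild_run` is hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rebuild_run(stream: Iterable[Tuple[List[str], str]]):
    """stream yields (terms, true_category) in creation order.

    Returns (hits, attempts, per_size), where per_size maps the number
    of bookmarks already in the true category to [hits, attempts].
    """
    # Summed term counts per category. Cosine similarity ignores scale,
    # so the sum points in the same direction as the mean (the centroid).
    sums: Dict[str, Counter] = {}
    sizes: Dict[str, int] = defaultdict(int)
    per_size = defaultdict(lambda: [0, 0])
    hits = attempts = 0
    for terms, truth in stream:
        vec = Counter(terms)
        if truth in sums:
            # Not the first bookmark of its category: classify and score.
            guess = max(sums, key=lambda c: cosine(vec, sums[c]))
            hit = int(guess == truth)
            hits += hit
            attempts += 1
            per_size[sizes[truth]][0] += hit
            per_size[sizes[truth]][1] += 1
        else:
            # First bookmark: the user is assumed to have created the category.
            sums[truth] = Counter()
        # File the page under the user-determined category either way and
        # update that category's centroid.
        sums[truth].update(vec)
        sizes[truth] += 1
    return hits, attempts, dict(per_size)
```

The `per_size` counts can then be grouped into the 1, 3, 5, 10 and 15+ bands to see whether the hit rate improves as categories grow.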

17 Evaluation Write up the results based on test data: each algorithm, with and without the extra utility (if the utility is designed to improve results), for both random and re-build. You won’t be able to write up results for the evaluation run (because it is likely to take place after you submit the report!), but your program will report on results

18 Evaluating the Extra Utility If your extra utility does not directly improve classification, but performs some other function (e.g., categorises unclassified bookmarks when file is imported into Firefox), then explain how you would evaluate it.

19 More Pitfalls Frames; pages that are no longer on-line