1 Automatic Classification of Bookmarked Web Pages Chris Staff First Talk February 2007.

Presentation transcript:

1 Automatic Classification of Bookmarked Web Pages Chris Staff First Talk February 2007

2 Overview General Principles Reading List Tasks involved Schedule

3 General Principles Web site: Plagiarism Referencing ACM Digital Library: Membership for students from Malta

4 Reading List
–Abrams, D., Baecker, R.: How people use WWW bookmarks. In: CHI ’97: CHI ’97 extended abstracts on Human factors in computing systems, New York, NY, USA, ACM Press (1997)
–Bugeja, I.: Managing WWW browser’s bookmarks and history (a Firefox extension). Final year project report, Department of Computer Science & AI, University of Malta,
–Cockburn, A., McKenzie, B.: What do web users do? An empirical analysis of web use. In: Int. J. Hum.-Comput. Stud. 54(6) (2001)
–Staff, C.: Automatic Classification of Web Pages into Bookmark Categories. Submitted to UM’07,
–Staff, C.: CSA3200 User Adaptive Systems Lecture Notes, Follow link from
–Mozilla Development Center: 2006, “Building an Extension”.,

5 Classifying Bookmarks When a user bookmarks a page (or adds a page to Favorites) we want to recommend the best existing category –Improvement over simply recommending last category saved to –Improvement over simply offering ‘category root’

6 Tasks
1. Representation of bookmark categories
2. Two clustering/similarity algorithms
3. Extra utility
4. User interface
5. Evaluation
6. Write up report

7 Tasks Overview We are going to implement a number of algorithms to help with the overall task. –Some of these will be used while the user is browsing –Others will be used to classify pages ‘off-line’ (especially for the existing bookmark files) We’re going to have a ‘standard test bed’ for conducting the evaluation

8 Tasks Overview Represent bookmark categories –We’re starting with populated bookmark files, so use ‘How Did I Find That?’ approach –Plus another, individual approach When a page is to be bookmarked –If referrer page is available, identify topic of page –Otherwise, identify page topic using ‘How Did I Find That?’ approach Compare current topic to bookmark category representations

9 Tasks Overview User Interface –To replace the built in ‘Bookmark this Page’ menu item and keyboard command –To display a new dialog box to users to offer choice of recommended category, last category used, and to allow user to select some other category or create a new category

10 Tasks Overview Evaluation –Will be standard and automated –For testing purposes, download test_eval.zip from home page Contains 2x8 bookmark files (.html) and one URL file (.txt) Bookmark files are ‘real’ files collected one year ago URL file contains a number of lines with following format: –Bk file ID, URL of bookmarked page, home category, exact entry from bookmark file (with date created, etc.)
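As a rough illustration of reading the bookmark files, here is a minimal Python sketch. It assumes the `.html` files follow the Netscape bookmark layout that Firefox exports (folders as `<H3>` followed by a nested `<DL>`, bookmarks as `<A HREF=... ADD_DATE=...>`); the slides do not state the exact layout, and the names `BookmarkParser`, `parse_bookmarks` and `SAMPLE` are hypothetical. The slides point towards JavaScript for the extension itself; Python is used here only to keep the sketch short.

```python
from html.parser import HTMLParser

# Hypothetical sample in the assumed Netscape bookmark layout.
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><A HREF="http://example.org/" ADD_DATE="1138000000">Top-level page</A>
    <DT><H3 ADD_DATE="1138000001">Research</H3>
    <DL><p>
        <DT><A HREF="http://example.org/ir" ADD_DATE="1138000002">IR notes</A>
    </DL><p>
</DL><p>
"""


class BookmarkParser(HTMLParser):
    """Collects bookmarks together with the folder path they were saved in."""

    def __init__(self):
        super().__init__()
        self.path = []        # stack of currently open folder names
        self.pending = None   # folder name read from <H3>, waiting for its <DL>
        self.capture = None   # "h3" or "a" while text is being collected
        self.text = []
        self.attrs = {}
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "a"):
            self.capture, self.text, self.attrs = tag, [], dict(attrs)
        elif tag == "dl":
            # The outermost <DL> has no folder name; store "" for the root.
            self.path.append(self.pending if self.pending is not None else "")
            self.pending = None

    def handle_data(self, data):
        if self.capture:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == self.capture:
            label = "".join(self.text).strip()
            if tag == "h3":
                self.pending = label
            else:
                self.bookmarks.append({
                    "category": "/".join(p for p in self.path if p),
                    "url": self.attrs.get("href"),
                    "title": label,
                    "add_date": self.attrs.get("add_date"),
                })
            self.capture = None
        elif tag == "dl" and self.path:
            self.path.pop()


def parse_bookmarks(html_text):
    """Return a list of dicts, one per bookmark, in file order."""
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks
```

An empty `category` string stands for the category root. The `add_date` value is kept because the reconstruction challenge depends on the order in which bookmarks were created.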

11 Tasks Overview Evaluation (continued) –Challenge to also ‘re-create’ bookmark file in the order that it was created by users –Eventually, close to the end of the APT, the evaluation test data sets will be made available About 20 unseen bookmark files and one URL file –Same format as before –You’ll get bookmark files early to prepare representations, but classification run will be part of a demo session

12 Tasks Overview Write up report –We’ll spend some time looking at the structure of a scientific report, how to write a literature review, present evaluation results, etc.

13 Task: Representing Bookmark Categories We need to identify what a category or collection of bookmarks is about so that we can check whether a new page could belong to that category. Ideally, we find out what the different documents in the category have in common (especially if we know which link a user followed to reach the child!). In the absence of this information, use two algorithms: –One based on ‘How Did I Find That?’ –A second that is up to you
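One possible way to say what a category "is about" is to average the term-frequency vectors of its documents into a centroid. The Python sketch below is a generic stand-in under that assumption; it is not the ‘How Did I Find That?’ approach, which the slides do not describe, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text):
    """Relative term frequencies of one document."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def category_centroid(documents):
    """Average the term vectors of all documents filed under one category."""
    summed = Counter()
    for doc in documents:
        for term, weight in term_vector(doc).items():
            summed[term] += weight
    n = len(documents)
    return {term: w / n for term, w in summed.items()} if n else {}
```

Terms shared by many documents in the category end up with the highest centroid weights, which is one crude reading of "what is similar between the different documents". Stop-word removal and weighting by position in the page are deliberately left out.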

14 Task: Two clustering/similarity algorithms Once we have represented the categories, we can ‘send’ page to be bookmarked to best category –Similar to ‘information filtering’ or ‘clustering’ –What similarity measure or clustering algorithm to use? One way of representing page to be classified will be based on ‘How Did I Find That?’ Other way researched/developed by you
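The question of which similarity measure to use is left open in the slides. As one common candidate, the Python sketch below scores a page against each category with cosine similarity over raw word counts and recommends the highest-scoring category. The function names and the choice of cosine similarity are assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Raw word counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(term, 0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_category(page_text, categories):
    """Return (name, scores) for the category most similar to the page.

    `categories` maps a category name to text representing that category.
    Returns (None, {}) when there are no categories to compare against.
    """
    page = bag_of_words(page_text)
    scores = {name: cosine(page, bag_of_words(text))
              for name, text in categories.items()}
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores
```

Returning the full score table as well as the winner makes it easy to fall back to the last category used, or to the category root, when every score is close to zero.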

15 Task: Extra Utility How can the classification of web pages to be bookmarked be improved? –What particular interests do you have, and how can they be used to improve classification? E.g., synonym detection, automatic reorganisation of bookmarks, …

16 Task: User Interface Can use XUL to ‘extend’ Mozilla Firefox – Use Ian Bugeja’s HyperBK as a framework (with due referencing and acknowledgement, of course): Programs are likely to be JavaScript Your extension will then be portable

17 Task: User Interface You can use Ian’s interface, but it may need some work to tweak it: –To support some of the new functionality that you’re adding (e.g. choice of algorithms) –And to fix some of the usability problems with the dialog box

18 Task: Evaluation ACofBWP will be evaluated! But you must build a version of the program that can be called in batch mode; that will accept a directory containing bookmark files and a URL file; that will run in two modes (classify and reconstruct); and that will report faithfully on its performance.
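A batch entry point along these lines could look like the following Python sketch. The mode names come from the slide; the argument names, the `.html`/`.txt` file discovery (taken from the description of the test archive) and the return codes are assumptions, and the actual classification step is left as a placeholder.

```python
import argparse
import sys
from pathlib import Path


def build_parser():
    """Command line: <mode> <data_dir>."""
    parser = argparse.ArgumentParser(
        description="Batch evaluation of bookmark classification (sketch).")
    parser.add_argument("mode", choices=["classify", "reconstruct"],
                        help="which of the two evaluation modes to run")
    parser.add_argument("data_dir", type=Path,
                        help="directory holding bookmark files and a URL file")
    return parser


def find_inputs(data_dir):
    """Assumes bookmark files end in .html and the URL file in .txt."""
    bookmark_files = sorted(data_dir.glob("*.html"))
    url_files = sorted(data_dir.glob("*.txt"))
    return bookmark_files, url_files


def main(argv=None):
    args = build_parser().parse_args(argv)
    if not args.data_dir.is_dir():
        print(f"not a directory: {args.data_dir}", file=sys.stderr)
        return 2
    bookmark_files, url_files = find_inputs(args.data_dir)
    print(f"mode={args.mode} bookmark_files={len(bookmark_files)} "
          f"url_files={len(url_files)}")
    # Placeholder: run the chosen mode here and report its performance.
    return 0
```

A script wrapping this would typically end with `raise SystemExit(main())`, so that a run such as `python batch.py classify test_eval/` (hypothetical file and directory names) reports success or failure through its exit status.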

19 Task: Write Up Report At least one tutorial will be dedicated to good report writing practice; how to write a literature review; how to build and write references; how to present evaluation results.

20 Grading Structure
–10% for obtaining an average of at least 0.8 precision on evaluation (for random bookmark classification, using either implemented approach)
–10% for incurring a maximum 2-second overhead on average to classify a page (must faithfully report time overhead)
–Max. 10% for extra utility
–40% Report
–15% Presentation
–15% Artifact Design/Implementation
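The two measurable targets above can be checked with a small harness such as the Python sketch below. It assumes that precision means the fraction of test pages whose recommended category equals the home category listed in the URL file; the slides do not define the measure, so treat that as a working assumption. `evaluate` and `meets_targets` are hypothetical names.

```python
import time


def evaluate(classify, test_cases):
    """Run `classify(page)` on each (page, home_category) pair.

    Returns the assumed precision (correct / total) and the mean
    wall-clock time spent inside `classify`, in seconds.
    """
    correct = 0
    elapsed = 0.0
    for page, home_category in test_cases:
        start = time.perf_counter()
        predicted = classify(page)
        elapsed += time.perf_counter() - start
        if predicted == home_category:
            correct += 1
    n = len(test_cases)
    return {
        "precision": correct / n if n else 0.0,
        "mean_seconds": elapsed / n if n else 0.0,
    }


def meets_targets(report, min_precision=0.8, max_seconds=2.0):
    """Compare a report against the thresholds from the grading slide."""
    return (report["precision"] >= min_precision
            and report["mean_seconds"] <= max_seconds)
```

Timing only the call to `classify` keeps file loading and reporting out of the measured overhead; whether that matches the intended definition of "overhead" is a question to confirm before the demo session.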

21 Future Opportunities FYP supervision Opportunity to co-author research paper that will be submitted to leading IR/AH/UM conference (irrespective of FYP)

22 Pitfalls Utilities must be lightweight –Mostly those that are interactive, or that are invoked while user is browsing Should all of a document be used to contribute to a category representation/be used in a similarity measure?

23 Schedule
–Until w.c. 6th March inc.: Discussion, talks once/week
–w.c. 19th March: Submit TOC/chapter overview for feedback (optional)
–w.c. 23rd Apr: Demo 1 (optional)
–23rd Apr-7th May: Submit one chapter of your choice for feedback (optional)
–w.c. 7th May: Demo 2 (optional)
–14th May: Evaluation collection will be made available
–25th May: Submit APT report
–June: Demo and evaluation under exam conditions