
WIRED Week 7
- Quick review of Information Seeking
- Readings Review
  - Questions & Comments
  - How does this affect IR system use?
  - How would this change evaluating IR systems?
- Topic Discussions
- Web search lab game!

What Is Information Seeking? (Marchionini)
- “a process in which humans purposefully engage in order to change their state of knowledge.” (p. 5)
- “a process driven by human’s need for information so that they can interact with the environment.” (p. 28)
- “begins with recognition and acceptance of the problem and continues until the problem is resolved or abandoned” (p. 49)
- More than just representation, storage and systematic retrieval

Information Seeking in Context
- Learning
- Information Seeking
- Information Retrieval
- Analytical Strategy
- Browsing Strategy

How do we search?
- Analytical
  - careful planning
  - recall of query terms
  - iterative query reformulations
  - examination of results
  - batched
- Browsing
  - heuristic
  - opportunistic
  - recognizing relevant information
  - interactive (as can be)

Iseek - WebTracker study
- Corporate IT and knowledge workers
  - In work environment
  - Own browser and network connection
- Long-term study (weeks)
- Overall Web use analyzed
  - Bookmarks, printed pages
  - How sites/pages found
  - Frequency of page visits

Web Study Methodology
- Surveys
- Interviews
- Web Use Data*
  - History Files
  - WebTracker
  - Server Logs
- Bookmarks*
- Printouts

Study Elements
- Research Design
- Field Work
- Field Workers
- Data Collection
  1. Questionnaire survey
  2. WebTracker application (and Proxy Server)
  3. Personal interviews

Collecting Web Client Data
- Modified client (Catledge & Pitkow 1995)
- Bookmarks
  - Chosen Web sites are a personal information space
  - Most valuable data file on the user’s system
  - Automatically organizing bookmarks
- History logs
  - The history mechanism
  - Most promising source for usage data

WebTracker Expanded Window

WebTracker Log

Data Analysis
- Log files tabulated into spreadsheets
- Examined for clusters or patterns of behavior
- Selection of episodes of Information Seeking behavior:
  - a highlighting of the episode by the participant during the personal interview;
  - evidence of the episode having consumed a relatively substantial amount of time and effort;
  - evidence that the episode was a recurrent activity.
- Determined the modes of scanning & moves exercised by the participants
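The three episode-selection criteria above can be expressed as a simple filter over tabulated log data. This is a minimal sketch, not the study's actual procedure: the `Episode` record, its fields, and the `min_minutes` and `min_recurrences` thresholds are hypothetical stand-ins for judgments the researchers made by hand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    user: str
    topic: str          # hypothetical label an analyst gives a cluster of log lines
    minutes: float      # time spent on this occurrence of the episode
    highlighted: bool   # participant pointed to it in the personal interview


def select_episodes(episodes, min_minutes=30.0, min_recurrences=3):
    """Return the (user, topic) pairs that meet at least one selection criterion.

    The thresholds are illustrative assumptions, not values from the study.
    """
    total_minutes = defaultdict(float)
    occurrences = defaultdict(int)
    highlighted = defaultdict(bool)
    for ep in episodes:
        key = (ep.user, ep.topic)
        total_minutes[key] += ep.minutes
        occurrences[key] += 1
        highlighted[key] = highlighted[key] or ep.highlighted
    return {
        key
        for key in occurrences
        if highlighted[key]                        # criterion 1: highlighted in interview
        or total_minutes[key] >= min_minutes       # criterion 2: substantial time and effort
        or occurrences[key] >= min_recurrences     # criterion 3: recurrent activity
    }
```

Combining the criteria with "or" mirrors the slide's list of alternative kinds of evidence; requiring all three would be a stricter, equally defensible choice.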

Behavioral Model
- Recurring Web behavioral patterns that relate people’s browser actions (Web moves) to their browsing/searching context (Web modes)
- Modes of scanning: Aguilar (1967) & Weick & Daft (1983, 1984)
- Moves in information seeking behavior: Ellis (1989) & Ellis et al. (1993, 1997)

Modes of Scanning

Modes of Scanning for Information

ISeek Behaviors & Web Moves

Modes & Moves Model

Behavioral Model Verification
- 61 identifiable episodes

Behavioral Model Results
- People who use the Web engage in 4 complementary modes of information seeking
- Certain browser-based actions & events indicate a particular mode of information seeking
- Surprises
  - No Explicit Instances of Monitoring to Support Formal Searching
  - Very Few Instances of “Push” Monitoring
  - Extracting Involved Basic Search Strategies Only

Interview Highlights
- Most useful work-related sites:
  1. Resource sites by associations & user groups
  2. News sites
  3. Company sites
  4. Search engines
- Most people do not avidly search for new Web sites
- The criterion for bookmarking is largely whether a site provides relevant & up-to-date information
- Learning about new Web sites:
  1. Search engines
  2. Magazines & newsletters
  3. Other people/colleagues

Survey Highlights
- The Web was the 3rd most frequently used source
- Participants spent about 20% of their work hours using the Web
- The majority looked for technical information on the Web
- Quality of Web information was perceived to be “very high” (reliable)
- The Web was perceived as being as accessible as other “internal” sources, but less accessible than mass media sources
- Few participants deliberately set out to search for new sites

Study 1 Summary
- Behavioral model of information seeking on the Web
  - People who use the Web engage in complementary modes of information seeking
  - Certain browser-based actions & events indicate particular moves in information seeking
- The study suggests:
  - that a behavioral framework relating user motivations and Web moves may be helpful in analyzing Web-based information seeking
  - that multiple, complementary methods of collecting qualitative and quantitative data may help compose a richer portrayal of how individuals use Web-based information in their natural work settings

Study Recommendations

Iseek Expanded Study (2)
- Larger Dataset
- One Organization
- Longer Duration
- Open-ended Interviews
- IT Survey
- More Quantitative Modeling
  - Glassman (1994)
  - Catledge & Pitkow (1995)
  - Tauscher & Greenberg (1997a, 1997b)
  - Huberman, Pirolli, Pitkow, & Lukose (1998)

New Types of Data Collection
- Sources
  - Modified Logs
  - Interviews (More Focused)
  - Survey (Broader Focus)
  - Field Observation (Cube Work)
- Volume
  - Over 1400 Consistent Users
  - Over a Month of Web Use
  - 8+ GB of data

Collecting Web Server Data
- Web Server Log Accuracy
  - Hit: a single file is requested from the Web server
  - View: all of the information contained on a single Web page
  - Visit: one series of views at a particular Web site
- Proxy Server Logs
  - Day sampling: stop caching and analyze data
  - IP sampling: cancel caching for particular Web users and measure only these results
  - Continuous sampling: use cookie files to track particular users
- KDD
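The hit/view/visit distinction can be made concrete with a small log summarizer. This sketch assumes each server-log record has already been parsed into a `(client, unix_time, path)` tuple, treats a request as a page view when the path ends in `.html`, `.htm` or `/`, and splits a client's views into visits with a 30-minute inactivity cutoff; all three are common heuristics assumed here, not rules taken from the study.

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds of inactivity that end a visit (assumed heuristic)


def is_page(path):
    # Assumption: pages are HTML documents or directory indexes;
    # images, stylesheets, etc. are embedded files that only count as hits.
    return path.endswith((".html", ".htm", "/"))


def summarize(records):
    """records: iterable of (client, unix_time, path) tuples from a server log."""
    hits = 0
    views = 0
    page_times = defaultdict(list)  # client -> timestamps of its page views
    for client, t, path in records:
        hits += 1                   # every requested file is a hit
        if is_page(path):
            views += 1
            page_times[client].append(t)
    visits = 0
    for times in page_times.values():
        times.sort()
        visits += 1                 # a client's first view opens a visit
        for prev, cur in zip(times, times[1:]):
            if cur - prev > SESSION_GAP:
                visits += 1         # a long gap starts a new visit
    return {"hits": hits, "views": views, "visits": visits}
```

Behind a proxy, many users share one client address, which is one reason server-log visit counts are only approximate.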

Survey Highlights
- Users not motivated to change/update browser versions or startup page
- IT made no modifications to the browser until recently, primarily for system access testing
- Most of the most frequent users were from technical departments
- All IT system work is now Web-specific

Interview Highlights
- Corporate adoption of Internet access driven by Intranet development
- Local portrayals of successful Web work drove rapid adoption
- Use of the Intranet viewed as both resource conservation and expanded work
- Logging of Web use data not a high concern
- Open to recommendations to improve Web use
- “Webifying” everything seen as good

KDD Highlights
- Extremely High Data Collection Reliability
- Tightly-focused Web Use (business sites)
- Very Small (Determinable) Inappropriate Use (>.001%)
- Lower than Expected Search Engine Use
  - Influenced by Startup Page
  - Internal Search Results Pages Used
- Higher than Expected (Average) Use of Intranet

KDD Use Highlights
- 40,000+ episodes
- 11:15 average episode length
- Search term mode of 1
  - Not dominantly work-related terms
  - Use of intranet search results influential
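Both statistics above are easy to recompute from cleaned logs. A minimal sketch, assuming episode durations are stored in seconds, that "11:15" is read as minutes:seconds, and that a query's term count is its whitespace-separated word count; the inputs in the usage example are toy values chosen for illustration, not data from the study.

```python
from collections import Counter
from statistics import mean


def format_mm_ss(seconds):
    """Render a duration in seconds as minutes:seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


def episode_stats(episode_seconds, queries):
    """Return (average episode length as m:ss, most common query length in terms)."""
    avg = format_mm_ss(mean(episode_seconds))
    # Count how many queries have 1 term, 2 terms, ...; ignore empty queries.
    term_counts = Counter(len(q.split()) for q in queries if q.strip())
    mode_terms = term_counts.most_common(1)[0][0]
    return avg, mode_terms
```

For example, `episode_stats([600, 750, 675], ["budget", "travel policy", "intranet", "vpn setup guide", "payroll"])` returns `("11:15", 1)`: the mean is 675 seconds and three of the five queries have a single term.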

Updated Behavioral Model
- 32,512 identifiable episodes

Behaviors Breakdown

Other Studies
- Tend to focus on server logs, a broad range of Web users, general Web seeking activity, quantitative methods
  - Glassman (1994): Proxy Study
  - Catledge & Pitkow (1995): Surveys and Client tool
  - Tauscher & Greenberg (1997a, 1997b): The Back button
  - Ingwersen (1995 & 1997): Informetrics
  - Huberman, Pirolli, Pitkow, & Lukose (1998): Information Foraging, “Law of Surfing”
  - Huberman, “Laws of the Web” (2001)
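The “Law of Surfing” cited above, as we understand Huberman et al. (1998), models how many pages a user visits within a single site (the depth L) with an inverse Gaussian distribution with mean mu and shape lambda. A minimal sketch of that density and its standard maximum-likelihood fit, treating depth as continuous; the function names are hypothetical and this is not code from the cited work.

```python
import math


def inverse_gaussian_pdf(depth, mu, lam):
    """Inverse Gaussian density at depth > 0 with mean mu and shape lam:
    sqrt(lam / (2*pi*L^3)) * exp(-lam * (L - mu)^2 / (2 * mu^2 * L))."""
    if depth <= 0:
        return 0.0
    coeff = math.sqrt(lam / (2 * math.pi * depth ** 3))
    return coeff * math.exp(-lam * (depth - mu) ** 2 / (2 * mu ** 2 * depth))


def fit_inverse_gaussian(depths):
    """Maximum-likelihood estimates: mu = sample mean and
    lam = n / sum(1/L - 1/mu). Needs at least two distinct depths,
    otherwise the denominator is zero."""
    n = len(depths)
    mu = sum(depths) / n
    lam = n / sum(1 / d - 1 / mu for d in depths)
    return mu, lam
```

Fitting per-user click depths from a site's logs and comparing the fitted curve with the observed histogram is one way to check whether the regularity holds for a particular organization.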

Study 2 Summary
- Behavioral Model Scales Up
- Server Logs Provide Significant Gains in Quantity
- Server Logs Provide Challenges in Deriving Quality
- Organizations Provide Focused View of Overall Web Use
- Knowledge Workers Collaborate (But Not Enough)

Summary: (New) Methodology
- Provide new ideas for data collection & cleaning tools
- Verify models of Information Seeking and Web Use
- Discover models of Web usage
- Find different types of Web users
- Gain rich descriptions of perception of Web & Web use
- Evoke new system & interface designs

Other Tools for Web Studies
- Pete Pirolli, Rob Reeder, Ed Chi, et al. (UIR Group, Xerox PARC): Web Logger
- Eytan Adar, Bernardo Huberman (Web Ecology, PARC, now HP)
- Andy Edmonds: Uzilla.net
- Vividence Web Evaluation Tool (WET)
- Eye Tracking (*)

Improving Web Use
- Expert Systems
  - SNLP
- Multimedia Databases & Metadata
- Display Technology
- Better GUIs
- Better, More Available Search Engines/Query Syntax
  - Desktop Search
  - Ranking
  - Relevance
- Help expert users get more expert

Web Activities Taxonomies
- What types of activities on the Web have impact?
- What we do vs. what seems significant
- Purpose of people’s search
  - Find
    - Get a fact or document
    - Download information
    - Find out about a product
  - Compare/Choose: 51%
- Methods used to find information
  - Explore, Monitor, Find, Collect: 71%
- Content for which they are searching
  - Medical: 18%, People: 13%, …

Berrypicking & IR Flexibility
- IR systems are rational, users aren’t (always)
- We don’t search in a linear model
  - Single query, one good result
- We gradually build on what we know and how we find it
  - Footnote chasing (backward chaining)
  - Citation searching (forward chaining)
  - Journal run (favorite sites)
  - Area scanning (browsing)
  - Subject searches in bibliographies, abstracts & indices
  - Author searching
- We combine all of these when searching
- Interface support for each & combinations
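Footnote chasing and citation searching are two directions of travel over the same citation graph. The sketch below uses a small hypothetical graph (`CITES`, mapping each paper to the papers it cites); backward chaining follows a paper's references, while forward chaining inverts the graph to find the papers that cite it.

```python
from collections import defaultdict

# Hypothetical citation graph: each paper maps to the papers it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "E": ["A"],
}


def backward_chain(paper, cites):
    """Footnote chasing: the papers this paper cites."""
    return sorted(cites.get(paper, []))


def forward_chain(paper, cites):
    """Citation searching: the papers that cite this paper."""
    cited_by = defaultdict(list)
    for citing, refs in cites.items():
        for ref in refs:
            cited_by[ref].append(citing)
    return sorted(cited_by[paper])
```

A berrypicking session alternates between these moves (and others, such as author searching), collecting a few useful items at each step rather than expecting one query to return everything.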

Berrypicking Paths

Web Search Studies Framework
- Web IR is still relatively new
  - Differences in users & information
  - Changes in IR systems are rapid
- Who doesn’t search now?
- “A Web searching study focuses on isolating searching characteristics of searchers using a Web IR system via analysis of data, typically gathered from transaction logs.” (p. 3)
- Studying Search Engine use
  - AltaVista, Excite
- Web Searching Studies
  - Single & Multiple Web sites

Characterizing Browsing
- Modified XMosaic to learn Web browser behavior
- Path lengths key (but changed)
- Types of users:
  - Serendipitous browsers: little repetition, short sequences
  - General purpose browsers: average, repeated actions
  - Searchers: long navigational sequences
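A rough way to operationalize the three user types from a modified client's logs is to look at how long each user's navigation sequences are and how often a sequence repeats. This is an assumption-laden illustration, not Catledge & Pitkow's method: the sequence representation and the `long_len` and `repeat_low` thresholds are hypothetical.

```python
from statistics import mean


def classify_user(sequences, long_len=8, repeat_low=0.2):
    """sequences: list of navigation paths (sequences of URLs) for one user.

    Thresholds are illustrative assumptions, not values from the study.
    """
    avg_len = mean(len(s) for s in sequences)
    seen = set()
    repeats = 0
    for s in sequences:
        key = tuple(s)
        if key in seen:
            repeats += 1          # this exact path was followed before
        seen.add(key)
    repeat_ratio = repeats / len(sequences)
    if avg_len >= long_len:
        return "searcher"         # long navigational sequences
    if repeat_ratio < repeat_low:
        return "serendipitous"    # short sequences, little repetition
    return "general purpose"      # short-to-average sequences, repeated actions
```

Real classifications would need thresholds calibrated against the data; the point of the sketch is only that the slide's categories map onto two measurable properties of logged paths.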

Cognitive Strategies in Web Search
- Systems help with:
  - re-representation: different external representations that have the same abstract structure make problem-solving easier or more difficult. It also refers to how different strategies and representations vary in their efficiency for solving a problem.
  - graphical constraining: constrains the kinds of inferences that can be made about the underlying represented concept.
  - temporal and spatial constraining: different representations make relevant aspects of processes and events more salient when distributed over time and space.

Cognitive Strategies
- Searching Conditions: Dispersed or Category Structures
- Fact finding
- Exploratory searching
- Novice & Experienced users
- Top-down, bottom-up & mixed

Reading Time, Scrolling & Interaction
- Can implicit feedback improve relevancy?
- documents, 6 subjects
  - Read documents & score them
- Better than reading, saving & printing?
  - Measure use now vs. later
  - Focused on document, not activity
- How do you know the user is reading?
- Is saving a relevance measure?
- No differences noted in scrolling (4.28)
- What about following links? Finding, highlighting, copying?
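One way to probe whether reading time works as implicit feedback is to correlate it with the explicit scores subjects give. This minimal sketch assumes reading time is simply the total open-to-close time per document (which sidesteps the slide's question of whether the user is actually reading) and uses a hand-written Pearson correlation; the `(doc_id, open, close)` event format is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def reading_times(events):
    """Sum (close - open) seconds per document from (doc_id, open, close) events."""
    totals = defaultdict(float)
    for doc_id, opened, closed in events:
        totals[doc_id] += closed - opened
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def time_vs_score(events, scores):
    """Correlate per-document reading time with explicit relevance scores."""
    times = reading_times(events)
    docs = sorted(set(times) & set(scores))   # documents with both measures
    return pearson([times[d] for d in docs], [scores[d] for d in docs])
```

Swapping the time measure for counts of scrolling, saving or printing events in `reading_times` would let the same comparison test the other implicit signals the slide asks about.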

How do we really use the Web?
- People don’t read, they scan Web pages
- We move quickly; we know we can go back
- Quick experimentation & short memory
- Behaviors that work are reinforced & continued
- Satisficing makes measures of quality difficult
- Web pages as Billboards?
- What’s billboard information for IR systems?

Revisitation Patterns on WWW
- Mostly Re-Visits (58%)
- Continually Visit New Pages
- Access Only A Few Pages Frequently
- Clusters (Sets) & Short Paths of URLs
  - Frequency
  - Recency
  - “Distance”
- Types of Navigation
  - Hub and Spoke
  - Depth Searching (lots of links before returning, if at all)
  - Guided Tour (Tasks)
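The 58% figure is a recurrence rate: the share of all page visits that go to a URL the user has already seen, computed (as we understand Tauscher & Greenberg's definition) as (total visits - distinct URLs) / total visits. The sketch below computes it and orders a history list by frequency or recency, two of the orderings listed above; the function names are hypothetical.

```python
from collections import Counter


def recurrence_rate(urls):
    """Fraction of visits that return to an already-visited URL."""
    if not urls:
        return 0.0
    return (len(urls) - len(set(urls))) / len(urls)


def rank_urls(urls, by="recency"):
    """Order distinct URLs for a history list, by 'frequency' or 'recency'."""
    if by == "frequency":
        counts = Counter(urls)
        last = {u: i for i, u in enumerate(urls)}  # index of each URL's latest visit
        # Most visited first; ties go to the more recently visited URL.
        return sorted(counts, key=lambda u: (-counts[u], -last[u]))
    # Recency: most recently visited first, each URL listed once.
    return list(dict.fromkeys(reversed(urls)))
```

For the visit sequence home, news, home, mail, news, home, mail, wiki, the recurrence rate is 0.5 (8 visits, 4 distinct URLs); a frequency ordering puts "home" first, while a recency ordering puts "wiki" first.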

Revisitation Patterns 2
- Back Button Use Affects Everything (Even More Since Study)
- Navigation Methods Differ
- Reasons for Revisiting
  - Explore Further
  - Use Feature (Search or Home Page)
  - “On the Way” to another Page (IA Problem)
- Users Don’t Understand Browser History Very Well, or Do They Misunderstand Page/Site Navigation?
- Provide Navigation Support
- Work with the Back Button: Don’t Break its Functionality
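Part of the confusion about browser history comes from the stack behavior behind Back and Forward: following a new link after going back discards the pages that were "ahead" of the current one. A minimal sketch of that typical stack model (the `History` class is hypothetical, and individual browsers differ in the details):

```python
class History:
    """Stack-based Back/Forward model for one browser tab."""

    def __init__(self, start):
        self.stack = [start]
        self.pos = 0                      # index of the page currently shown

    @property
    def current(self):
        return self.stack[self.pos]

    def visit(self, url):
        # Visiting a new page from the middle of the stack prunes
        # everything above the current position.
        del self.stack[self.pos + 1:]
        self.stack.append(url)
        self.pos += 1

    def back(self):
        if self.pos > 0:
            self.pos -= 1
        return self.current

    def forward(self):
        if self.pos < len(self.stack) - 1:
            self.pos += 1
        return self.current
```

After visiting A, B, C, pressing Back twice and then following a link to D, the stack is just A, D: pages B and C can no longer be reached with Back or Forward, even though the user saw them moments earlier.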

Web search lab game
- Break into groups
- Answer a set of questions
- Different rules for each search:
  1. Search as you would
  2. Talk & decide before each move
  3. No typing this time!
  4. Search as you would again
  5. Fast as possible