2002.11.19 - IS 202 – FALL 2002, Lecture 23: Interfaces for Information Retrieval. Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS.

Presentation transcript:

SLIDE 1 IS 202 – FALL 2002
Lecture 23: Interfaces for Information Retrieval
Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm, Fall
SIMS 202: Information Organization and Retrieval

SLIDE 2 Lecture Overview
Review and Continuation
  – Web Search Engines and Algorithms
Interfaces for Information Retrieval
  – Introduction to HCI
  – Why Interfaces Don’t Work
  – Early Visions: Memex
Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 4 Search Engines
Crawling
Indexing
Querying

SLIDE 5 Web Search Engine Layers
(Figure: from a description of the FAST search engine, by Knut Risvik)

SLIDE 6 Standard Web Search Engine Architecture
(Diagram labels, in flow order)
Crawl the web
Check for duplicates, store the documents
Create an inverted index
Inverted index
Search engine servers
User query
DocIds
Show results to user
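
The index-and-query part of the architecture above can be sketched in a few lines. This is a minimal illustration, not the design of any particular engine: the tokenizer, the function names, and the three sample documents are assumptions made for the example, and crawling and duplicate detection are left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of DocIds whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted DocIds that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical stored documents, keyed by DocId.
docs = {1: "Toyota official site", 2: "Toyota fan club", 3: "Linux cluster"}
index = build_inverted_index(docs)
print(search(index, "toyota club"))
```

The query servers only ever touch the index and return DocIds; fetching the stored documents to show results is a separate step.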

SLIDE 7 Google Web Search Architecture
More detailed architecture (Brin & Page 98)
Only covers the preprocessing in detail, not the query serving

SLIDE 8 Indexes for Web Search Engines
Inverted indexes are still used, even though the web is so huge
Some systems partition the indexes across different machines
  – Each machine handles different parts of the data
Other systems duplicate the data across many machines
  – Queries are distributed among the machines
Most do a combination of these
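
A rough sketch of the combined approach, partitioning plus duplication, is given below. The class name `ShardedIndex`, the modulo partitioning rule, and the round-robin choice of replica are all assumptions for illustration; in-memory dictionaries stand in for machines.

```python
import itertools


class ShardedIndex:
    """Toy index split into partitions, each copied onto several replicas."""

    def __init__(self, num_partitions, num_replicas):
        self.num_partitions = num_partitions
        # replicas[p][r] is one copy of partition p: term -> set of DocIds.
        self.replicas = [
            [{} for _ in range(num_replicas)] for _ in range(num_partitions)
        ]
        # Round-robin counters spread successive queries over the replicas.
        self._next_replica = [
            itertools.cycle(range(num_replicas)) for _ in range(num_partitions)
        ]

    def add(self, doc_id, terms):
        """Store a document in exactly one partition, on all of its replicas."""
        partition = doc_id % self.num_partitions
        for replica in self.replicas[partition]:
            for term in terms:
                replica.setdefault(term, set()).add(doc_id)

    def query(self, term):
        """Ask one replica of every partition and merge the DocIds."""
        hits = set()
        for partition in range(self.num_partitions):
            replica = next(self._next_replica[partition])
            hits |= self.replicas[partition][replica].get(term, set())
        return sorted(hits)


idx = ShardedIndex(num_partitions=3, num_replicas=2)
for doc_id in range(1, 7):
    idx.add(doc_id, ["web"] if doc_id % 2 == 0 else ["memex"])
print(idx.query("web"))
```

Partitioning bounds how much data one machine holds; duplication bounds how many queries one machine answers.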

SLIDE 9 Search Engine Querying
In this example, the data for the pages is partitioned across machines. Additionally, each partition is allocated multiple machines to handle the queries.
Each row can handle 120 queries per second
Each column can handle 7M pages
To handle more queries, add another row.
From description of the FAST search engine, by Knut Risvik
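
The sizing arithmetic implied by the row and column figures can be worked through as follows. The two capacity constants come from the slide; the function name and the workload in the example (70M pages, 480 queries per second) are hypothetical.

```python
import math

PAGES_PER_COLUMN = 7_000_000   # from the slide: each column handles 7M pages
QUERIES_PER_ROW = 120          # from the slide: each row handles 120 queries/s


def grid_size(total_pages, peak_qps):
    """Return (rows, columns, machines) needed for the given workload."""
    columns = math.ceil(total_pages / PAGES_PER_COLUMN)  # partitions of the data
    rows = math.ceil(peak_qps / QUERIES_PER_ROW)          # copies of each partition
    return rows, columns, rows * columns


# Hypothetical workload: 70M pages at 480 queries per second.
print(grid_size(70_000_000, 480))  # (4, 10, 40)
```

Growing the collection adds columns; growing the query load adds rows.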

SLIDE 10 Querying: Cascading Allocation of CPUs
A variation on this that produces a cost savings:
  – Put high-quality/common pages on many machines
  – Put lower-quality/less common pages on fewer machines
  – Query goes to high-quality machines first
  – If no hits found there, go to other machines
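
The cascade described above amounts to trying tiers in order and stopping at the first one that returns hits. A minimal sketch, with each tier modelled as a plain dictionary; the function name and the sample tiers are assumptions for illustration.

```python
def cascaded_search(tiers, term):
    """Query tiers in order and stop at the first one that has hits.

    `tiers` is a list of term -> DocIds mappings, ordered from the
    high-quality tier (many machines) down to the low-quality tier.
    Returns (tier_number, sorted DocIds), or (None, []) if nothing matches.
    """
    for tier_number, tier in enumerate(tiers, start=1):
        hits = tier.get(term)
        if hits:
            return tier_number, sorted(hits)
    return None, []


# Hypothetical two-tier index.
tiers = [
    {"toyota": {1, 2}},               # tier 1: high-quality/common pages
    {"toyota": {9}, "memex": {7}},    # tier 2: lower-quality/less common pages
]
print(cascaded_search(tiers, "toyota"))
print(cascaded_search(tiers, "memex"))
```

Note the trade-off: a query answered by tier 1 never sees matching pages that live only in tier 2.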

SLIDE 11 Google
Google maintains the world’s largest Linux cluster (10,000 servers)
These are partitioned between index servers and page servers
  – Index servers resolve the queries (massively parallel processing)
  – Page servers deliver the results of the queries
Over 3 billion web pages are indexed and served by Google

SLIDE 12 Search Engine Indexes
Starting points for users include:
Manually compiled lists
  – Directories
Page “popularity”
  – Frequently visited pages (in general)
  – Frequently visited pages as a result of a query
Link “co-citation”
  – Which sites are linked to by other sites?

SLIDE 13 Starting Points: What is Really Being Used?
Today’s search engines combine these methods in various ways
  – Integration of directories
      Today most web search engines integrate categories into the results listings
      Lycos, MSN, Google
  – Link analysis
      Google uses it; others are also using it
      Words on the links seem to be especially useful
  – Page popularity
      Many use DirectHit’s popularity rankings

SLIDE 14 Web Page Ranking
Varies by search engine
  – Pretty messy in many cases
  – Details usually proprietary and fluctuating
Combining subsets of:
  – Term frequencies
  – Term proximities
  – Term position (title, top of page, etc.)
  – Term characteristics (boldface, capitalized, etc.)
  – Link analysis information
  – Category information
  – Popularity information
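
One simple way to combine a subset of these signals is a weighted sum. The sketch below is only an illustration of the idea: real engines keep their formulas proprietary, and the feature names, the weights, and the sample pages here are all invented for the example.

```python
# Hypothetical weights; each feature is assumed to be pre-scaled to [0, 1].
WEIGHTS = {
    "term_frequency": 0.3,
    "proximity": 0.2,
    "title_match": 0.2,
    "link_score": 0.2,
    "popularity": 0.1,
}


def combined_score(features, weights=WEIGHTS):
    """Weighted sum of whichever features are present (missing ones count as 0)."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank(pages):
    """Return page ids ordered from highest to lowest combined score."""
    return sorted(pages, key=lambda page: combined_score(pages[page]), reverse=True)


# Hypothetical feature values for two pages.
pages = {
    "a": {"term_frequency": 0.9, "link_score": 0.1},
    "b": {"term_frequency": 0.4, "link_score": 0.9, "title_match": 1.0},
}
print(rank(pages))
```

Page "b" outranks page "a" here even though "a" has the higher term frequency, which is the point of combining signals.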

SLIDE 15 Ranking: Hearst ‘96
Proximity search can help get high-precision results if >1 term
  – Combine Boolean and passage-level proximity
  – Shows significant improvements when retrieving top 5, 10, 20, 30 documents
  – Results reproduced by Mitra et al. 98
  – Google uses something similar
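
To make "Boolean plus proximity" concrete, the sketch below requires every query term to occur (the Boolean AND part) and then keeps only documents where the terms fall within a short span of tokens. It is a simplified stand-in, not the method of the Hearst '96 paper; the function names, the whitespace tokenizer, and the `max_span` parameter are assumptions.

```python
def min_window(tokens, terms):
    """Length of the shortest token span containing every term, or None."""
    terms = set(terms)
    last_seen = {}   # term -> most recent position at which it occurred
    best = None
    for pos, token in enumerate(tokens):
        if token in terms:
            last_seen[token] = pos
            if len(last_seen) == len(terms):
                # Shortest span ending here starts at the oldest last-seen term.
                span = pos - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def proximity_search(docs, terms, max_span):
    """DocIds containing all terms within max_span tokens, tightest span first."""
    scored = []
    for doc_id, text in docs.items():
        span = min_window(text.lower().split(), terms)
        if span is not None and span <= max_span:
            scored.append((span, doc_id))
    return [doc_id for span, doc_id in sorted(scored)]


# Hypothetical documents.
docs = {
    1: "information retrieval interfaces",
    2: "information about dogs and cat retrieval",
}
print(proximity_search(docs, ["information", "retrieval"], max_span=3))
```

Both documents satisfy the Boolean query, but only document 1 has the terms close together.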

SLIDE 16 Ranking: Link Analysis
Assumptions:
  – If the pages pointing to this page are good, then this is also a good page
  – The words on the links pointing to this page are useful indicators of what this page is about
  – References: Page et al. 98, Kleinberg 98

SLIDE 17 Ranking: Link Analysis
Why does this work?
  – The official Toyota site will be linked to by lots of other official (or high-quality) sites
  – The best Toyota fan-club site probably also has many links pointing to it
  – Lower-quality sites do not have as many high-quality sites linking to them

SLIDE 18 Ranking: PageRank
Google uses PageRank
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1; d is usually set to 0.85. C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one
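
The formula can be evaluated by repeated substitution until the values settle. The sketch below is a direct, unoptimized reading of the formula; the function name, the three-page link graph, the iteration count, and the default d = 0.85 (the commonly cited value from Brin & Page 98) are assumptions for illustration. It also assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # Each page T that links to A passes on PR(T) split over its C(T) links.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if a in links.get(t, ())
            )
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical link graph: A -> B, C;  B -> C;  C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

With the formula exactly as written, the values in this example sum to the number of pages (3) rather than to one; replacing (1-d) with (1-d)/N, where N is the number of pages, gives values that sum to one.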

SLIDE 19 PageRank
Similar to calculations used in scientific citation analysis (e.g., Garfield et al.) and social network analysis (e.g., Wasserman et al.)
Similar to other work on ranking (e.g., the hubs and authorities of Kleinberg et al.)
Computation is an iterative algorithm and converges to the principal eigenvector of the link matrix
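
The eigenvector statement can be made explicit in matrix form. This is a sketch under two assumptions that are not on the slide: every page has at least one outgoing link, and the normalized variant of the formula is used, with (1-d)/N in place of (1-d) so that the scores sum to one. The symbols M, p, G, N and C(j) (the number of links going out of page j) are introduced here for the illustration.

```latex
\[
M_{ij} =
\begin{cases}
  1/C(j) & \text{if page } j \text{ links to page } i,\\
  0      & \text{otherwise,}
\end{cases}
\qquad
\mathbf{p}^{(k+1)} = \frac{1-d}{N}\,\mathbf{1} + d\,M\,\mathbf{p}^{(k)} .
\]
\[
\text{If } \mathbf{1}^{\top}\mathbf{p}^{(k)} = 1, \text{ then }
\mathbf{p}^{(k+1)} = G\,\mathbf{p}^{(k)}
\quad\text{with}\quad
G = d\,M + \frac{1-d}{N}\,\mathbf{1}\mathbf{1}^{\top},
\]
\[
\text{so the limit } \mathbf{p} \text{ satisfies } G\,\mathbf{p} = \mathbf{p}:
\text{ it is the eigenvector of } G \text{ for its largest eigenvalue, } 1 .
\]
```

Each iteration is one step of the power method applied to G.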

SLIDES 21–29 “Drawing the Circles”

SLIDE 30 Human-Computer Interaction (HCI)
Human
  – The end-users of a program
  – The others in the organization
Computer
  – The machines the programs run on
Interaction
  – The users tell the computers what they want
  – The computers communicate results

SLIDE 31 What is HCI?
(Diagram labels) Humans, Technology, Task, Design, Organizational & Social Issues

SLIDE 32 Shneiderman on HCI
Well-designed interactive computer systems
  – Promote positive feelings of success, competence, and mastery
  – Allow users to concentrate on their work, exploration, or pleasure, rather than on the system or the interface

SLIDE 33 Design Guidelines
Set of design rules to follow
Apply at multiple levels of design
Are neither complete nor orthogonal
Have psychological underpinnings (ideally)

SLIDE 34 Shneiderman’s Design Principles
Provide informative feedback
Permit easy reversal of actions
Support an internal locus of control
Reduce working memory load
Provide alternative interfaces for expert and novice users

SLIDE 35 Provide Informative Feedback
About:
  – The relationship between query specification and documents retrieved
  – Relationships among retrieved documents
  – Relationships between retrieved documents and metadata describing collections

SLIDE 36 Reduce Working Memory Load
Provide mechanisms for keeping track of choices made during the search process
Allow users to:
  – Return to temporarily abandoned strategies
  – Jump from one strategy to the next
  – Retain information and context across search sessions
Provide browsable information that is relevant to the current stage of the search process
  – Related terms or metadata
  – Search starting points (e.g., lists of sources, topic lists)

SLIDE 37 Interfaces for Expert and Novice Users
Simplicity vs. power tradeoffs
“Scaffolded” user interface
How much information to show the user?
  – Number and complexity of user operations
  – Variants of operations
  – Inner workings of system itself
  – System history
Example:
  – Television remote control

SLIDE 38 User Differences
Abilities, preferences, predilections
  – Spatial ability
  – Memory
  – Reasoning abilities
  – Verbal aptitudes
  – Personality differences
  – Age, gender, ethnicity, class, sexuality, culture, education
  – Modality preferences/restrictions
      Vision, audition, speech, gesture, haptics, locomotion

SLIDE 39 Nielsen’s Usability Slogans
Your best guess is not good enough
The user is always right
The user is not always right
Users are not designers
Designers are not users
Less is more
Details matter
(from Nielsen’s “Usability Engineering”)

SLIDE 40 Who Builds UIs?
A team of specialists (ideally)
  – Graphic designers
  – Interaction / interface designers
  – Technical writers
  – Marketers
  – Test engineers
  – Software engineers

SLIDE 41 How to Design and Build UIs
Task analysis
Rapid prototyping
Evaluation
Implementation
(Cycle diagram: Design, Prototype, Evaluate)
Iterate at every stage!

SLIDE 42 Task Analysis
Observe existing work practices
Create examples and scenarios of actual use
Try out new ideas before building software

SLIDE 43 Rapid Prototyping
Build a mock-up of design
Low-fidelity techniques
  – Paper sketches
  – Cut, copy, paste
  – Video segments
Interactive prototyping tools
  – Visual Basic, HyperCard, Director, etc.
UI builders
  – NeXT, etc.

SLIDE 44 Evaluation Techniques
Qualitative vs. quantitative methods
Qualitative (non-numeric, discursive, ethnographic)
  – Focus groups
  – Interviews
  – Surveys
  – User observation
  – Participatory design sessions
Quantitative (numeric, statistical, empirical)
  – User testing
  – System testing

SLIDE 45 Qualitative Questions
User experience
User preferences
User recommendations
“Design dialogue”

SLIDE 46 Quantitative Questions
Precision
Recall
Time required to learn the system
Time required to achieve goals on benchmark tasks
Error rates
Retention of the use of the interface over time
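
Of these measures, precision and recall have standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch follows; the function name and the sample DocIds are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    An empty denominator is reported as 0.0.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result: the system returned docs 1-4; docs 2, 4, 5 are relevant.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(precision, round(recall, 3))
```

In this example half of what was returned is relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.667).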

SLIDE 47 Information Visualization
Utility
  – Inherently visual data
  – Making the abstract concrete
  – Making the invisible visible
Techniques
  – Icons
  – Color highlighting
  – Brushing and linking
  – Panning and zooming
  – Focus-plus-context
  – Magic lenses
  – Animation

SLIDE 49 Why Interfaces Don’t Work
Because…
  – We still think of using the interface
  – We still talk of designing the interface
  – We still talk of improving the interface
“We need to aid the task, not the interface to the task.”
“The computer of the future should be invisible.”

SLIDE 50 Norman on Design Priorities
1. The user: what does the person really need to have accomplished?
2. The task: analyze the task. How best can the job be done, taking into account the whole setting in which it is embedded, including the other tasks to be accomplished, the social setting, the people, and the organization?
3. As much as possible, make the task dominate; make the tools invisible.
4. Then, get the interaction right, making the right things visible, exploiting affordances and constraints, providing the proper mental models, and so on: the rules of good design for the user, written about many, many times in many, many places.

SLIDE 52 “What Dr. Bush Foresees”
Cyclops Camera: Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography.
Microfilm: It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk.
Vocoder: A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary.
Thinking machine: A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic.
Memex: An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a “trail” of facts.

SLIDE 53 Memex

SLIDE 54 Memex Detail

SLIDE 55 Cyclops Camera

SLIDE 56 Vocoder: “Supersecretary”

SLIDE 57 Investigator at Work
“One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may be both in miniature, so that he projects them for examination.”

SLIDE 58 Memex
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

SLIDE 59 Associative Indexing
“[…] associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of memex. The process of tying two items together is the important thing.”

SLIDE 60 The WWW circa 1945
“It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. But it is more than this; for any item can be joined into numerous trails, the trails can bifurcate, and they can give birth to side trails.”
“Wholly new forms of encyclopaedias will appear, ready-made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.”

SLIDE 61 Selection
“The heart of the problem, and of the personal machine we have here considered, is the task of selection. And here, in spite of great progress, we are still lame. Selection, in the broad sense, is still a stone adze in the hands of a cabinetmaker.”
(“Memex Revisited”, Bush 1965)

SLIDE 62 Interaction Paradigms for IR
Direct manipulation
  – Query specification
  – Query refinement
  – Result selection
Delegation
  – Agents
  – Recommender systems
  – Filtering

SLIDE 63 The “Adaptive” Memex
“In an adaptive Memex, the owner has delegated to the machine the ability to propose or effect changes in the stored information. By analogy to business practice, the Memex is said to be functioning as an agent (Kay, 1984). The machine is playing an autonomous role within a restricted charter: to attempt a more effective organization of the information based on observations of actual use and topical similarities.”

SLIDE 64 Next Time: HCI for IR
Browsing
  – Visualizing collections and documents
  – Navigating collections and documents
Searching
  – Formulating queries
  – Visualizing results
  – Navigating results
  – Refining queries
  – Selecting results

SLIDE 65 Next Time: HCI for IR
Interfaces for Information Retrieval
Readings
  – MIR 10.4 – 10.10

SLIDE 66 Cuts

SLIDE 67 Memex

SLIDE 68 Cyclops Camera

SLIDE 69 Supersecretary