The Structure of Information Retrieval Systems LBSC 708A/CMSC 838L Douglas W. Oard and Philip Resnik Session 1: September 4, 2001.

1 The Structure of Information Retrieval Systems LBSC 708A/CMSC 838L Douglas W. Oard and Philip Resnik Session 1: September 4, 2001

2 Agenda
5:30-6:00 Introductions
6:00-6:15 What is “Information Retrieval?”
6:15-6:40 Applications
6:40-6:55 (Break)
6:55-7:20 Applications
7:20-7:50 System design
7:50-7:55 (Stretch break)
7:55-8:15 Course outline

3 Introductions
Pair up with a partner from another department
Get to know each other in 5 minutes (total)
Tell us about your partner in 30 seconds
  –Name
  –Degree program (department, Master’s/Ph.D./Visitor)
  –Information retrieval experience
  –One thing they would like to learn

4 What Do We Mean by “Information”?
How is it different from “data”?
  –Information is data in context
    Databases contain data and produce information
    IR systems contain and provide information
How is it different from “knowledge”?
  –Knowledge is a basis for making decisions
    Many “knowledge bases” contain decision rules

5 What Do We Mean by “Retrieval”?
Find something that you want
  –The information need may or may not be explicit
Known item search
  –Find the class home page
Answer seeking
  –Is Lexington or Louisville the capital of Kentucky?
Directed exploration
  –Who makes videoconferencing systems?

6 Global Internet User Population
[Chart: years 2000 and 2005; language labels English and Chinese. Source: Global Reach]

7 IR is More Than Web Searching!
Form into four groups to discuss:
  1: A system to search a collection of oral history interviews
  2: Construction of a personalized newspaper
  3: Software to find music recordings in an online CD store
  4: Searching all the Xerox copies ever made at an office

8 What To Do
Form 2-3 person subgroups to discuss: (10 min)
  –How would you describe what to search for?
    What makes one object “better” than another?
  –How would you recognize when you have found it?
    How would you explain the way you made a choice?
  –What kind of technology might be helpful?
    Speech recognition, optical character recognition, …
Get together to discuss what you learned (10 min)
  –Build a 5 minute PowerPoint presentation

9 Supporting the Search Process
[Diagram labels: Source Selection, Query Formulation, Query, Search, IR System, Ranked List, Selection, Document, Examination, Delivery; feedback paths: Query Reformulation and Relevance Feedback, Source Reselection; annotations: Nominate, Choose, Predict]

10 Supporting the Search Process
[Diagram labels: Source Selection, Query Formulation, Query, Search, IR System, Ranked List, Selection, Document, Examination, Delivery; system side: Acquisition, Collection, Indexing, Index]
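
The system-side stages named on this slide (Collection, Indexing, Index, Search, Ranked List) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `build_index` and `search`, the toy collection, whitespace tokenization, and scoring by raw term counts are not taken from the course.

```python
from collections import Counter, defaultdict

def build_index(collection):
    """Indexing: map each term to {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in collection.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def search(index, query):
    """Search: score each document by summed counts of matching query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Ranked list: highest score first, ties broken by document id
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical three-document collection (illustrative only)
collection = {
    "d1": "oral history interviews about the war",
    "d2": "music recordings in an online store",
    "d3": "history of music recordings",
}
index = build_index(collection)
ranked = search(index, "music history")
print(ranked)
```

Only the postings for the query terms are touched at search time, which is one reason an index lets a machine avoid scanning the whole collection for every query.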

11 Design Strategies
Foster human-machine synergy
  –Exploit complementary strengths
  –Accommodate shared weaknesses
Divide-and-conquer
  –Divide task into stages with well-defined interfaces
  –Continue dividing until problems are easily solved
Co-design related components
  –Iterative process of joint optimization

12 Human-Machine Synergy
Machines are good at:
  –Doing simple things accurately and quickly
  –Scaling to larger collections in sublinear time
People are better at:
  –Accurately recognizing what they are looking for
  –Evaluating intangibles such as “quality”
Both are pretty bad at:
  –Mapping consistently between words and concepts

13 Divide and Conquer
Strategy: use encapsulation to limit complexity
Approach:
  –Define interfaces (input and output) for each component
    Query interface: input terms, output representation
  –Define the functions performed by each component
    Remove common words, weight rare terms higher, …
  –Repeat the process within components as needed
Result: a hierarchical decomposition
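
As a rough sketch of the query component described on this slide (input terms, output representation), the two example functions can be written as small stages with explicit inputs and outputs. The stopword list, the function names, the document-frequency table, and the choice of log(N/df) as the "rare terms weigh more" rule are all assumptions for illustration, not the course's prescribed method.

```python
import math

# Illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"the", "a", "of", "is", "in", "and", "or"}

def tokenize(text):
    """Interface: raw query string in, list of lowercase terms out."""
    return text.lower().split()

def remove_common_words(terms):
    """Interface: list of terms in, list of terms without stopwords out."""
    return [t for t in terms if t not in STOPWORDS]

def weight_terms(terms, doc_freq, num_docs):
    """Weight rare terms higher using log(N / df); unseen terms are dropped."""
    weights = {}
    for t in terms:
        df = doc_freq.get(t, 0)
        if df > 0:
            weights[t] = math.log(num_docs / df)
    return weights

def query_representation(text, doc_freq, num_docs):
    """The query component: composed from the smaller stages above."""
    return weight_terms(remove_common_words(tokenize(text)), doc_freq, num_docs)

# Hypothetical document frequencies for a 1000-document collection
doc_freq = {"capital": 10, "kentucky": 2, "the": 1000}
rep = query_representation("the capital of Kentucky", doc_freq, 1000)
print(rep)
```

Each stage can be replaced or subdivided without touching the others as long as its input and output stay the same, which is the encapsulation the slide refers to.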

14 Co-design
Design of one component may affect another
  –Effect may be direct or indirect
Approach:
  –Develop alternatives for each interacting component
  –Assess the effects of each practical combination on efficiency, effectiveness, and usability
  –Repeat the process until a suitable combination is found
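
The approach on this slide can be sketched as a search over combinations of component alternatives. This is only a toy under stated assumptions: `best_combination`, `toy_assess`, the two components, and the scores are invented placeholders, and a single number stands in for the slide's three criteria (efficiency, effectiveness, usability).

```python
from itertools import product

def best_combination(alternatives, assess, acceptable):
    """Assess every combination of alternatives; return the best acceptable one."""
    names = list(alternatives)
    best, best_score = None, None
    for choice in product(*(alternatives[n] for n in names)):
        combo = dict(zip(names, choice))
        score = assess(combo)
        if acceptable(score) and (best_score is None or score > best_score):
            best, best_score = combo, score
    return best, best_score

# Hypothetical alternatives for two interacting components
alternatives = {
    "indexing": ["raw", "stemmed"],
    "query": ["raw", "stemmed"],
}

def toy_assess(combo):
    """Made-up scores: mismatched term handling hurts, matched stemming helps."""
    if combo["indexing"] != combo["query"]:
        return 0
    return 2 if combo["indexing"] == "stemmed" else 1

best, score = best_combination(alternatives, toy_assess, lambda s: s > 0)
print(best, score)
```

The toy scores depend on the pair of choices rather than on either component alone, which is the situation where components have to be designed together.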

15 Some Examples of Codesign in IR
[Diagram labels: Source Selection, Query Formulation, Query, Search, IR System, Ranked List, Selection, Document, Examination, Delivery; system side: Acquisition, Collection, Indexing, Index]

16 Course Goals
Appreciate IR system capabilities and limitations
Understand IR system design & implementation
  –For a broad range of applications and media
Evaluate IR system performance
Identify current IR research problems

17 Course Design
Text/readings provide background and detail
  –At least one recommended reading is required
Class provides organization and direction
  –We will not cover every important detail
Assignments and project provide experience
  –The TA can help CLIS students with the project
Final exam helps focus your effort

18 Grading
Assignments (30% total)
  –Mastery of concepts and experience using tools
  –708A: “homework,” 838L: “programming”
Term project (30%)
  –3 options, described on course Web page
Final exam (40%)
  –Two different in-class exams

19 Handy Things to Know
Classes will be videotaped
  –Available in the CLIS library if you miss class
Office hours are by appointment
  –Send an email or ask after class
Everything is on the web
  –At http://www.clis.umd.edu/courses/708a/
We are most easily reached by email
  –oard@glue.umd.edu, resnik@umiacs.umd.edu

20 Do This Week
Do the reading before class
  –Don’t fall behind!
Start on assignment 1
  –Due in 2 weeks!
Explore the Web site
  –Start thinking about the term project

