2002.08.27 - IS 202, Fall 2002: Course Introduction. Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS. Tuesday and Thursday, 10:30 am - 12:00 pm.

SLIDE 1: Course Introduction
IS 202, Fall 2002 (SIMS 202: Information Organization and Retrieval)
Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS
Tuesday and Thursday, 10:30 am - 12:00 pm
Credits to Marti Hearst for some of the slides in this lecture

SLIDE 2: Today
Introductions
Course overview
Administrivia

SLIDE 3: Goals of the Course
Learn about
  – Design, development and use of information storage and retrieval systems
  – Practical and theoretical foundations of information organization and analysis
  – Evaluation of information access systems
  – Cognitive and user-centric considerations
  – Hands-on experience with information systems

SLIDE 4: Two Main Themes
Information Organization and Design
Information Retrieval and the Search Process

SLIDE 5: Information Organization and Retrieval
To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for.
Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known.
To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right.
Information is (1) informing, telling; thing told, knowledge, items of knowledge, news.
Source: The Oxford English Dictionary, cf. Rowley

SLIDE 6: (Approximate) Course Schedule
Organization
  – Overview
  – Categorization
  – Metadata and markup
  – Metadata for multimedia
      Photo project
  – Controlled vocabularies, classification, thesauri
  – Information design
      Thesaurus design
      Database design

SLIDE 7: Information Properties
Information can be communicated electronically
  – Broadcasting
  – Networking
Information can be easily duplicated and shared
  – Problems of ownership
  – Problems of control
Adapted from ‘Silicon Dreams’ by Robert W. Lucky

SLIDE 8: Information Hierarchy
Wisdom
Knowledge
Information
Data

SLIDE 9: Information Hierarchy
Data
  – The raw material of information
Information
  – Data organized and presented by someone
Knowledge
  – Information read, heard or seen and understood
Wisdom
  – Distilled and integrated knowledge and understanding

SLIDE 10: Information
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
-- T.S. Eliot, “The Rock”
Where is the information we have lost in data?

SLIDE 11: Information Life Cycle
[Figure: information life cycle diagram. Phases: Creation, Searching, Utilization. Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating. Other labels: Active, Semi-Active, Inactive, Retention/Mining, Disposition, Discard.]

SLIDE 12: Authoring/Modifying
Converting data+information+knowledge to new information
Creating information from observation, thought
Editing and publication
Gatekeeping

SLIDE 13: Organizing/Indexing
Collecting and integrating information
Affects data, information and metadata
“Metadata” describes data and information
  – More on this later
Organizing information
  – Types of organization?
Indexing

SLIDE 14: Storing/Retrieving
Information storage
  – How and where is information stored?
Retrieving information
  – How is information recovered from storage?
  – How to find needed information
  – Linked with accessing/filtering stage

SLIDE 15: Distribution/Networking
Transmission of information
  – How is information transmitted?
Networks vs. broadcast

SLIDE 16: Accessing/Filtering
Using the organization created in the Organizing/Indexing (O/I) stage to:
  – Select desired (or relevant) information
  – Locate that information
  – Retrieve the information from its storage location (often via a network)

SLIDE 17: Using/Creating
Using information
Transformation of information to knowledge
Knowledge to new data and new information

SLIDE 18: Key Issues in This Course
How to describe information resources or information-bearing objects so that they can be used effectively by those who need them
  – Organizing
How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs
  – Retrieving

SLIDE 19: Key Issues
[Figure: information life cycle diagram. Phases: Creation, Searching, Utilization. Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating. Other labels: Active, Semi-Active, Inactive, Retention/Mining, Disposition, Discard.]

SLIDE 20: (Approximate) Course Schedule
Organization
  – Overview
  – Categorization
  – Metadata and markup
  – Metadata for multimedia
      Photo Project
  – Controlled vocabularies, classification, thesauri
  – Information design
      Thesaurus design
      Database design
Retrieval
  – The search process
  – Content analysis
      Tokenization, Zipf’s law, lexical associations
  – IR implementation
  – Term weighting and document ranking
      Vector space model
  – User interfaces
      Overviews, query specification, providing context

SLIDE 21: Web Search Questions
What do people search for?
How do people use search engines?
  – How often do people find what they are looking for?
  – How difficult is it for people to find what they are looking for?
How can search engines be improved?

SLIDE 22: What Do People Search for on the Web?
Study by Spink et al., Oct 98
  – Survey on Excite, 13 questions
  – Data for 316 surveys

SLIDE 23: What Do People Search for on the Web?
Topics
  Genealogy/Public Figure          12%
  Computer related                 12%
  Business                         12%
  Entertainment                     8%
  Medical                           8%
  Politics & Government             7%
  News                              7%
  Hobbies                           6%
  General info/surfing              6%
  Science                           6%
  Travel                            5%
  Arts/education/shopping/images   14%
Something is missing…

SLIDE 24: What Do People Search for on the Web?
50,000 queries from Excite, 1997. Most frequent terms:
  4660  sex
  3129  yahoo
  2191  internal site admin check from kho
  1520  chat
  1498  porn
  1315  horoscopes
  1284  pokemon
  1283  SiteScope test
  1223  hotmail
  1163  games
  1151  mp weather maps
  1036  yahoo.com
   983  ebay
   980  recipes
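
A frequency table like the one on this slide can be produced by counting identical queries in a log. The following is a minimal sketch, assuming the log is simply a list of query strings. The function name `top_queries` and the sample log are hypothetical, and normalizing by lowercasing and trimming whitespace is an assumption, not something the slide says was done for the Excite data.

```python
from collections import Counter

def top_queries(log, n=3):
    """Return the n most frequent queries with their counts."""
    # Normalize case and surrounding whitespace so "Yahoo " and "yahoo" match.
    normalized = [q.strip().lower() for q in log if q.strip()]
    return Counter(normalized).most_common(n)

# Hypothetical sample log, not data from the Excite study.
sample_log = ["yahoo", "Chat", "yahoo ", "recipes", "chat", "yahoo"]
print(top_queries(sample_log, n=2))
```

Counting whole queries rather than individual words keeps multi-word entries such as "SiteScope test" together, which matches how the slide lists them.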

SLIDE 25: Why Do These Differ?
Self-reporting survey
The nature of language
  – Only a few ways to say certain things
  – Many different ways to express most concepts
      UFO, Flying Saucer, Space Ship, Satellite
      How many ways are there to talk about history?

SLIDE 26
the a to of and in s for on this is by with or at all are from e you be that not an as home it i have if new t your page about com information
Source: What is on the Web?

SLIDE 27: Intranet Queries (Aug 2000)
  3351  bearfacts
  3349  telebears
  1909  extension
  1874  schedule+of+classes
  1780  bearlink
  1737  bear+facts
  1468  decal
  1443  infobears
  1227  calendar
   989  career+center
   974  campus+map
   920  academic+calendar
   840  map
   773  bookstore
   741  class+pass
   738  housing
   721  tele-bears
   716  directory
   667  schedule
   627  recipes
   602  transcripts
   582  tuition
   577  seti
   563  registrar
   550  info+bears
   543  class+schedule
   470  financial+aid

SLIDE 28: Intranet Queries
Summary of sample data from 3 weeks of UCB queries
  – 13.2% Telebears/BearFacts/InfoBears/BearLink (12297)
  – 6.7% Schedule of classes or final exams (6222)
  – 5.4% Summer Session (5041)
  – 3.2% Extension (2932)
  – 3.1% Academic Calendar (2846)
  – 2.4% Directories (2202)
  – 1.7% Career Center (1588)
  – 1.7% Housing (1583)
  – 1.5% Map (1393)
Average query length over last 4 months: 1.8 words
This suggests what is difficult to find from the home page
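
The two statistics on this slide, average query length and each category's share of all queries, are simple to compute from a log. The sketch below is illustrative only: the function names and the five-query sample are hypothetical, and treating "+" as an encoded space is an assumption based on how the Aug 2000 queries are written. The slide does not state the total number of queries, so `implied_total` only estimates it from one category's count and percentage.

```python
def average_query_length(queries):
    """Mean number of words per query, treating "+" as an encoded space.

    Assumes a non-empty list of query strings.
    """
    lengths = [len(q.replace("+", " ").split()) for q in queries]
    return sum(lengths) / len(lengths)

def implied_total(count, percent):
    """Total number of queries implied by one category's count and share."""
    return count / (percent / 100.0)

# Hypothetical sample, written in the same style as the intranet log.
sample = ["bearfacts", "schedule+of+classes", "campus+map", "housing", "career+center"]
print(average_query_length(sample))

# 12297 queries at 13.2% implies a log of roughly 93,000 queries.
print(round(implied_total(12297, 13.2)))
```

The other rows give slightly different estimates of the total because the published percentages are rounded.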

SLIDE 29: An Example Search System: Cha-Cha
A system for searching complex intranets
Places retrieval results in context
Important design goals:
  – Users at any level of computer expertise
  – Browsers at any version level
  – Computers of any speed

SLIDE 30: An Example Search System: Cha-Cha

SLIDE 31: Search: Where to Start?
Guess words?
  – Search engine plunges you into the middle of a site/collection
  – Too many or too few results
  – No context
Use a directory?
  – If large, may be difficult/frustrating to navigate
  – Several ways to organize the information
  – May not reflect users’ needs
Solution: Integrate browsing and search
  – How do you organize the information to optimize searching?

SLIDE 32: An Example Search System: Cha-Cha

SLIDE 33: An Example Search System: Cha-Cha

SLIDE 34: How Cha-Cha Works
Crawl entire intranet
Compute the shortest hyperlink path from a certain root page to every web page
Index and compute metadata for the pages
  – Using Cheshire II
  – Run a user query
  – Gather all the hits
  – Create a “directory” based on combining the shortest paths
  – Special graph algorithm removes redundant links and internal nodes
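
The shortest-path and "directory" steps on this slide can be sketched with a breadth-first search over the link graph. This is an illustration under stated assumptions, not Cha-Cha's actual code: the function names, the dictionary-based link graph, and the page names are all hypothetical, every hit is assumed to be reachable from the root, and the "special graph algorithm" that prunes redundant links and internal nodes is not reproduced.

```python
from collections import deque

def shortest_paths(links, root):
    """Breadth-first search: map each reachable page to its parent on a shortest path."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # first visit is along a shortest path
                parent[target] = page
                queue.append(target)
    return parent

def path_to(page, parent):
    """Follow parent pointers back to the root, then reverse the result."""
    path = []
    while page is not None:
        path.append(page)
        page = parent[page]
    return path[::-1]

def directory_for_hits(hits, parent):
    """Merge the shortest paths of all hits into one nested dict (a tree)."""
    tree = {}
    for hit in hits:
        node = tree
        for page in path_to(hit, parent):
            node = node.setdefault(page, {})
    return tree

# Hypothetical intranet: each page maps to the pages it links to.
links = {
    "home": ["admissions", "library"],
    "admissions": ["tuition", "library"],
    "library": ["hours"],
}
parent = shortest_paths(links, "home")
print(directory_for_hits(["tuition", "hours"], parent))
```

Because hyperlinks are unweighted, breadth-first search is enough to find a shortest path. Merging the paths of all hits places each result under the pages that lead to it, which is one way to show retrieval results in context.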

SLIDE 35: Cha-Cha System Architecture
[Diagram labels: crawl the web; store the documents]

SLIDE 36: Cha-Cha System Architecture
[Diagram labels: crawl the web; store the documents; create files of metadata; Cheshire II]

SLIDE 37: Cha-Cha System Architecture
[Diagram labels: crawl the web; create a keyword index; store the documents; create files of metadata; Cheshire II]

SLIDE 38: Cha-Cha System Architecture
[Diagram labels: Cheshire II; user query; Searching]

SLIDE 39: Cha-Cha System Architecture
[Diagram labels: Cheshire II; server accesses the databases; Searching]

SLIDE 40: Cha-Cha System Architecture
[Diagram labels: Cheshire II; results shown to user; Searching]

SLIDE 41: Cha-Cha System Architecture
[Diagram labels: Cheshire II; results shown to user; server accesses the databases; user query; Searching]
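
The "create a keyword index" and "user query" steps in the architecture slides can be illustrated with a tiny inverted index. This is a generic sketch, not a description of how Cheshire II builds or queries its indexes: the function names and documents are hypothetical, tokenization is plain whitespace splitting, and queries are treated as a Boolean AND over their words with no ranking.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical stored documents, keyed by an id the crawler might assign.
documents = {
    "p1": "Schedule of classes for fall",
    "p2": "Academic calendar and final exam schedule",
    "p3": "Campus map and directory",
}
index = build_index(documents)
print(search(index, "schedule"))
print(search(index, "campus map"))
```

Building the index once at crawl time lets each query be answered by set lookups instead of rescanning every stored document.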

SLIDE 42: What Hasn’t Been Explained Here?
What metadata is collected
How the indexes are created
How queries are formed
How documents are ranked
How shortest paths are computed
How the system is built
  – … among other things!
  – This is just an introduction!
Much more on these issues in the second half of the course

SLIDE 43: (Approximate) Course Schedule
Organization
  – Overview
  – Categorization
  – Metadata and markup
  – Metadata for multimedia
      Photo Project
  – Controlled vocabularies, classification, thesauri
  – Information design
      Thesaurus design
      Database design
Retrieval
  – The search process
  – Content analysis
      Tokenization, Zipf’s law, lexical associations
  – IR implementation
  – Term weighting and document ranking
      Vector space model
  – User interfaces
      Overviews, query specification, providing context

SLIDE 44: Assignments and Exams
Approximately 10 assignments (due within one week to ten days)
  – Sometimes “checked”, sometimes graded
Final exam (during Finals week)
Grading:
  – Assignments: 60%
      Not evenly weighted
  – Final: 25%
  – Class Participation: 15%

SLIDE 45: Readings
Course reader
  – Will be available in about a week (will announce)
Textbooks
  – Modern Information Retrieval, Baeza-Yates and Ribeiro-Neto (Eds.), Addison Wesley, 1999
  – The Organization of Information, Arlene G. Taylor, Libraries Unlimited, 1999

SLIDE 46: Homework (!)
Read the handouts
  – Borges, Dennett, and Reddy
Write one or two paragraphs on
  – What is information, according to your background or area of expertise?
Due in class this Thursday, Aug 29.

SLIDE 47: What is Information?
There is no “correct” definition
Can involve philosophy, psychology, signal processing, physics
Cookie Monster’s definition:
  – “news or facts about something”
Oxford English Dictionary
  – information: informing, telling; thing told, knowledge, items of knowledge, news
  – knowledge: knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known

SLIDE 48: Next Time
Introduction to the Photo Project
More on what is information?
And how much of it is out there?