2007.09.07 - SLIDE 1CARL Presentation The Future of Search Ray R. Larson University of California, Berkeley School of Information.

Presentation transcript:

SLIDE 1CARL Presentation The Future of Search Ray R. Larson University of California, Berkeley School of Information

SLIDE 2CARL Presentation Overview Predicting the future… Quotes from Leon Kappelman “The future is ours” CACM, March 2001 Where are we coming from? Generations of Search Engines The Role of Metadata Pervasive Search

SLIDE 3CARL Presentation Overview Predicting the future… Quotes from Leon Kappelman “The future is ours” CACM, March 2001 Where are we coming from? Generations of Search Engines The Role of Metadata Pervasive Search

SLIDE 4CARL Presentation Radio has no future. Heavier-than-air flying machines are impossible. X-rays will prove to be a hoax. –William Thomson (Lord Kelvin), 1899

SLIDE 5CARL Presentation This “Telephone” has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us. –Western Union, Internal Memo, 1876

SLIDE 6CARL Presentation I think there is a world market for maybe five computers –Thomas Watson, Chair of IBM, 1943

SLIDE 7CARL Presentation The problem with television is that the people must sit and keep their eyes glued on the screen; the average American family hasn’t time for it. –New York Times, 1949

SLIDE 8CARL Presentation Where … the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1000 vacuum tubes and weigh only 1.5 tons –Popular Mechanics, 1949

SLIDE 9CARL Presentation There is no reason anyone would want a computer in their home. –Ken Olsen, president and chair of Digital Equipment Corp., 1977.

SLIDE 10CARL Presentation 640K ought to be enough for anybody. –Attributed to Bill Gates, 1981

SLIDE 11CARL Presentation By the turn of this century, we will live in a paperless society. –Roger Smith, Chair of GM, 1986

SLIDE 12CARL Presentation I predict the internet… will go spectacularly supernova and in 1996 catastrophically collapse. –Bob Metcalfe (3Com founder and inventor of Ethernet), 1995

SLIDE 13CARL Presentation Overview Predicting the future… Quotes from Leon Kappelman “The future is ours” CACM, March 2001 Where are we coming from? Generations of Search Engines The Role of Metadata Pervasive Search

SLIDE 14CARL Presentation Origins Very early history of content representation –Sumerian tokens and “envelopes” –Alexandria - pinakes –Indices

SLIDE 15CARL Presentation Origins Biblical Indexes and Concordances –1247 – Hugo de St. Caro – employed 500 Monks to create keyword concordance to the Bible Journal Indexes (Royal Society, 1600’s) “Information Explosion” following WWII –Cranfield Studies of indexing languages and information retrieval

SLIDE 16CARL Presentation Visions of IR Systems Rev. John Wilkins, 1600’s: The Philosophical Language and tables Wilhelm Ostwald and Paul Otlet, 1910’s: The “monographic principle” and Universal Classification Emanuel Goldberg, 1920’s ’s H.G. Wells, “World Brain: The idea of a permanent World Encyclopedia.” (Introduction to the Encyclopédie Française, 1937) Vannevar Bush, “As we may think.” Atlantic Monthly, 1945 Term “Information Retrieval” coined by Calvin Mooers, 1952

SLIDE 17CARL Presentation “What Dr. Bush Foresees” Cyclops Camera Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography. Microfilm It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk. Vocoder A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary. Thinking machine A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic. Memex An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a “trail” of facts.

SLIDE 18CARL Presentation Memex

SLIDE 19CARL Presentation Memex Detail

SLIDE 20CARL Presentation Cyclops Camera

SLIDE 21CARL Presentation Vocoder: “Supersecretary”

SLIDE 22CARL Presentation Investigator at Work “One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may be both in miniature, so that he projects them for examination.”

SLIDE 23CARL Presentation Memex “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

SLIDE 24CARL Presentation Associative Indexing “[…] associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of memex. The process of tying two items together is the important thing.”

SLIDE 25CARL Presentation The WWW circa 1945 “It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. But it is more than this; for any item can be joined into numerous trails, the trails can bifurcate, and they can give birth to side trails.” “Wholly new forms of encyclopaedias will appear, ready-made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.”

SLIDE 26CARL Presentation But the reality of IR was… Card-based systems Library catalogs (of course) Some microfilm-based systems (Goldberg, Shaw’s Rapid Selector) But also keyword or topic-based IR systems using cards…

SLIDE 27CARL Presentation Card-Based IR Systems Uniterm (Casey, Perry, Berry, Kent: 1958) –Developed and used from the mid 1940’s [Figure: term cards labeled EXCURSION and LUNAR]

SLIDE 28CARL Presentation Card Systems Batten Optical Coincidence Cards (“Peek-a-Boo Cards”), 1948 [Figure: cards labeled Lunar and Excursion]
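Optical coincidence cards can be pictured as one card per index term, with a hole punched at the position of every document that carries that term. Stacking two term cards against a light shows only the positions punched in both. The sketch below models that idea with integers used as bit sets; the document numbers and the helper names (`punch_card`, `coincidence`, `read_holes`) are illustrative assumptions, not taken from the slides.

```python
# Hypothetical postings: which document numbers carry each index term.
postings = {
    "LUNAR": {34, 200, 512},
    "EXCURSION": {200, 512, 777},
}


def punch_card(doc_numbers):
    """Return an int whose set bits mark the punched positions on a term card."""
    card = 0
    for n in doc_numbers:
        card |= 1 << n
    return card


def coincidence(*cards):
    """Stack cards: a position stays open only if every card is punched there."""
    stacked = cards[0]
    for card in cards[1:]:
        stacked &= card  # bitwise AND plays the role of the light shining through
    return stacked


def read_holes(card):
    """List the document numbers whose positions are still open."""
    return [n for n in range(card.bit_length()) if (card >> n) & 1]


lunar = punch_card(postings["LUNAR"])
excursion = punch_card(postings["EXCURSION"])
hits = read_holes(coincidence(lunar, excursion))
print(hits)
```

With these made-up postings, only the documents indexed under both terms survive the stacking, which is a Boolean AND query carried out physically.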

SLIDE 29CARL Presentation Card Systems Zatocode (edge-notched cards) Mooers, 1951 Document 1 Title: lksd ksdj sjd sjsjfkl Author: Smith, J. Abstract: lksf uejm jshy ksd jh uyw hhy jha jsyhe Document 200 Title: Xksd Lunar sjd sjsjfkl Author: Jones, R. Abstract: Lunar uejm jshy ksd jh uyw hhy jha jsyhe Document 34 Title: lksd ksdj sjd Lunar Author: Smith, J. Abstract: lksf uejm jshy ksd jh uyw hhy jha jsyhe

SLIDE 30CARL Presentation Computer-Based Systems Bagley’s 1951 MS thesis from MIT suggested that searching 50 million item records, each containing 30 index terms would take approximately 41,700 hours –Due to the need to move and shift the text in core memory while carrying out the comparisons 1957 – Desk Set with Katharine Hepburn and Spencer Tracy – EMERAC
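To put Bagley's estimate in perspective, the figures on the slide can be turned into a per-record and per-term cost. This is only back-of-the-envelope arithmetic on the three numbers quoted above; the per-term figure assumes, for illustration, that each of the 30 index terms is compared exactly once.

```python
# Figures quoted on the slide.
records = 50_000_000
terms_per_record = 30
total_hours = 41_700

total_seconds = total_hours * 3600

# Average time spent on each record and (by assumption) on each term comparison.
seconds_per_record = total_seconds / records
seconds_per_term = total_seconds / (records * terms_per_record)

# The same estimate expressed as years of continuous running.
years = total_hours / (24 * 365)

print(f"{seconds_per_record:.2f} s per record")
print(f"{seconds_per_term:.2f} s per term comparison")
print(f"{years:.1f} years of continuous running")
```

Under these assumptions the estimate works out to about 3 seconds per record, roughly a tenth of a second per term, and close to five years of uninterrupted machine time for a single search.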

SLIDE 31CARL Presentation Selection “The heart of the problem, and of the personal machine we have here considered, is the task of selection. And here, in spite of great progress, we are still lame. Selection, in the broad sense, is still a stone adze in the hands of a cabinetmaker.” —“Memex Revisited” (Bush 1965)

SLIDE 32CARL Presentation Historical Milestones in IR Research 1958 Statistical Language Properties (Luhn) 1960 Probabilistic Indexing (Maron & Kuhns) 1961 Term association and clustering (Doyle) 1965 Vector Space Model (Salton) 1968 Query expansion (Rocchio, Salton) 1972 Statistical Weighting (Sparck Jones) Poisson Model (Harter, Bookstein, Swanson) 1976 Relevance Weighting (Robertson, Sparck Jones) 1980 Fuzzy sets (Bookstein) 1981 Probability without training (Croft)

SLIDE 33CARL Presentation Historical Milestones in IR Research (cont.) 1983 Linear Regression (Fox) 1983 Probabilistic Dependence (Salton, Yu) 1985 Generalized Vector Space Model (Wong, Raghavan) 1987 Fuzzy logic and RUBRIC/TOPIC (Tong, et al.) 1990 Latent Semantic Indexing (Dumais, Deerwester) 1991 Polynomial & Logistic Regression (Cooper, Gey, Fuhr) 1992 TREC (Harman) 1992 Inference networks (Turtle, Croft) 1994 Neural networks (Kwok) 1998 Language Models (Ponte, Croft)

SLIDE 34CARL Presentation The Internet and the WWW Gopher, Archie, Veronica, WAIS Tim Berners-Lee, 1991 creates WWW at CERN – originally hypertext only Web-crawler Lycos Alta Vista Inktomi Google (and many others)

SLIDE 35CARL Presentation Overview Predicting the future… Quotes from Leon Kappelman “The future is ours” CACM, March 2001 Where are we coming from? Generations of Search Engines –Credit goes to Andrei Broder of Yahoo! for most of the following slides The Role of Metadata Pervasive Search

SLIDE 36CARL Presentation Evolution of search engines First generation -- use only “on page”, text data –Word frequency, language Second generation -- use off-page, web-specific data –Link (or connectivity) analysis –Click-through data (What results people click on) –Anchor-text (How people refer to this page) Third generation -- answer “the need behind the query” –Semantic analysis -- what is this about? –Focus on user need, rather than on query –Context determination –Helping the user –Integration of search and text analysis AV, Excite, Lycos, etc. From Made popular by Google but everyone now. The current technology. Courtesy Andrei Broder
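The difference between the first two generations can be illustrated with a toy ranking. The sketch below scores pages first by on-page word frequency alone, then re-ranks matching pages by a count of inbound links. The pages, the link graph, and the function names are hypothetical, and a plain inbound-link count is used as a simple stand-in for link analysis rather than any particular engine's method.

```python
import re
from collections import Counter

# Hypothetical three-page collection and link graph (page -> pages it links to).
pages = {
    "a": "lunar excursion module lunar landing",
    "b": "lunar calendar",
    "c": "apollo lunar module history",
}
links = {"a": ["c"], "b": ["c"], "c": ["a"]}


def term_frequency_score(text, query):
    """First generation: count how often the query words occur on the page."""
    words = Counter(re.findall(r"\w+", text.lower()))
    return sum(words[term] for term in query.lower().split())


def inbound_link_counts(link_graph):
    """Second generation signal: how many pages point at each page."""
    counts = Counter()
    for targets in link_graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def rank_first_generation(query):
    """Order pages by on-page evidence only (ties broken by page name)."""
    return sorted(pages, key=lambda p: (-term_frequency_score(pages[p], query), p))


def rank_second_generation(query):
    """Keep pages that match the query, then order them by off-page evidence."""
    inbound = inbound_link_counts(links)
    matching = [p for p in pages if term_frequency_score(pages[p], query) > 0]
    return sorted(
        matching,
        key=lambda p: (-inbound[p], -term_frequency_score(pages[p], query), p),
    )


print(rank_first_generation("lunar"))
print(rank_second_generation("lunar"))
```

In this toy collection the page that repeats the query word most often wins on on-page evidence, while the page that other pages link to moves to the top once off-page evidence is considered.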

SLIDE 37CARL Presentation From Information Retrieval to Information Supply Explicit demand for information driven by a user query Increase use of context Active information supply driven by user activity and context Courtesy Andrei Broder

SLIDE 38CARL Presentation From Information Retrieval to Information Supply: Car Navigation: Maps → GPS Now… Then … Courtesy Andrei Broder

SLIDE 39CARL Presentation Social search The next chapter of the Web story seems to be collective but specialized sharing on a mass scale –Lots of communities driven by common interests and the availability of enabling tools Challenge: leverage “human computing” –metadata/user-generated content in the form of tagging, sharing, connections, reviews, etc Flickr, Delicious, Blogs, Groups, Explicit Social Networks, etc –Social engineering/tools to stimulate HC: ESP game, MySpace, Human Experts, etc –Technical & UI aspects: how do we incorporate people’s opinion in search results? Recommender systems, reputation, networks, etc. Courtesy Andrei Broder

SLIDE 40CARL Presentation The user: information needs Informational – want to learn about something (~40%) Navigational – want to go to that page (~25%) Transactional – want to do something (web-mediated) (~35%) –Access a service –Downloads –Shop Gray areas –Find a good hub –Exploratory search “see what’s there” Example queries: Low hemoglobin; United Airlines; Mendocino weather; Mars surface images; Nikon CoolPix; Car rental Finland Courtesy Andrei Broder

SLIDE 41CARL Presentation But… Academia and Research are not the Web Other things are important as user needs: –Validity –Current theory and Information –Good research and papers to read and cite Suppose you are an undergrad and have a 20-page paper to write for your Astronomy class…

SLIDE 42CARL Presentation Where do you go from here?

SLIDE 43CARL Presentation Google Scholar?

SLIDE 44CARL Presentation And it’s not all Science… Humanities and Social Sciences are less well served by Google Scholar

SLIDE 45CARL Presentation A History paper on George Washington?

SLIDE 46CARL Presentation Might do better on the web…

SLIDE 47CARL Presentation What is missing? At Berkeley we have been looking at how the functionality of library reference collections has been largely overlooked, or poorly implemented (as sets of unconnected digital books), in the digital world This has constantly reinforced the central role of metadata in search, and led to the realization that metadata resources can constitute an infrastructure for improving search in scholarly environments

SLIDE 48CARL Presentation Overview Predicting the future… Quotes from Leon Kappelman “The future is ours” CACM, March 2001 Where are we coming from? Generations of Search Engines –Credit goes to Andrei Broder of Yahoo! for most of the following slides The Role of Metadata Pervasive Search

SLIDE 49CARL Presentation Our work is based on four ideas 1. Understanding means knowing the context 2. Could the use of digital resources be made more like using a well-organized library reference collection? 3. How could it be easier to find context of any museum object, document, or performance: What is related to it in what it is, where it came from, when it originated, and who was associated with it? 4. Distinguishing WHAT, WHERE, WHEN, and WHO provides a useful infrastructure The “Metadata” projects

SLIDE 50CARL Presentation Linking vocabularies WHAT, WHERE, WHEN Library subject headings Topic – Geographic subdivision – Chronological subdivision Place name gazetteer: Place name – Type – Spatial markers (Lat & long) – When Time Period Directory Period name – Type – Time markers (Calendar) – Where Vocabularies are the key
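The record layouts named on this slide can be sketched as two small data structures that point at each other: a gazetteer entry carries a "when", and a time period entry carries a "where". The field names follow the slide; the sample records, the year-range representation, and the `periods_for_place` helper are illustrative assumptions rather than the projects' actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    place_name: str
    place_type: str
    lat: float          # spatial markers
    lon: float
    when: tuple         # (start_year, end_year) during which the name applies


@dataclass(frozen=True)
class TimePeriodEntry:
    period_name: str
    period_type: str
    start_year: int     # time markers (calendar)
    end_year: int
    where: str          # place name, used as the link into the gazetteer


def periods_for_place(place, periods):
    """Return periods located at this place whose years overlap the place's span."""
    start, end = place.when
    return [
        p for p in periods
        if p.where == place.place_name and p.start_year <= end and start <= p.end_year
    ]


# Made-up records for illustration only.
place = GazetteerEntry("Place A", "region", 10.0, 20.0, (1500, 1800))
periods = [
    TimePeriodEntry("Period X", "dynasty", 1550, 1700, "Place A"),
    TimePeriodEntry("Period Y", "war", 1900, 1910, "Place A"),
    TimePeriodEntry("Period Z", "dynasty", 1600, 1650, "Place B"),
]

related = [p.period_name for p in periods_for_place(place, periods)]
print(related)
```

Because each vocabulary stores a pointer into the other, a search that starts from a place can surface related periods, and the same join can be run in the opposite direction.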

SLIDE 51CARL Presentation Metadata Infrastructure – 4 W’s [Diagram labels: Texts; Numeric datasets; Thesaurus/Ontology; Gazetteers; captions; Maps/Geo Data; EVI; Time Period Directory; Time lines, Chronologies; Biographical Dictionary; Who? What? When? Where?]

SLIDE 52CARL Presentation A Metadata Infrastructure CATALOGS Archives Historical Societies Libraries Museums Public Television Publishers Booksellers Audio Images Numeric Data Objects Texts Virtual Reality Webpages RESOURCES INTERMEDIA INFRASTRUCTURE
Facet | Authority Control | Special Display Tools
WHAT | Thesaurus | Syndetic Structure
WHERE | Gazetteer | Maps
WHEN | Time Period Directory | Timelines
WHO | Biographical Dictionary | Text and Images
Learners Dossiers

SLIDE 53CARL Presentation WHEN: Time Period Directory Timeline Link to Catalog Link to Wikipedia

SLIDE 54CARL Presentation WHO: Biographical Dictionary with complex relationships Life events metadata WHAT: Actions prisoner WHERE: Places Holstein WHEN: Times WHO: People Margaret Sambiria

SLIDE 55CARL Presentation Early Prototype interface

SLIDE 56CARL Presentation Entry Vocabulary Index suggests correct LCSH with different spelling

SLIDE 57CARL Presentation Related places

SLIDE 58CARL Presentation Potentially related people

SLIDE 59CARL Presentation Potentially related periods

SLIDE 60CARL Presentation Mostly in India 16th - 18th century

SLIDE 61CARL Presentation Find out more about this area.

SLIDE 62CARL Presentation Different Browsing Options!

SLIDE 63CARL Presentation Zooming in to South Asia Restricting time frame Select

SLIDE 64CARL Presentation More information about the country of India…

SLIDE 65CARL Presentation More information about the country of India… Wikipedia CIA Factbook BBC Ethnologue Berkeley Natural History Museums

SLIDE 66CARL Presentation Historical events – linked to Library catalog & Wikipedia : none avail. for this time period

SLIDE 67CARL Presentation ECAI Cultural Atlases: presenting history in its geographical & chronological contexts

SLIDE 68CARL Presentation Better interfaces are now available…

SLIDE 69CARL Presentation Some issues with clutter…

SLIDE 70CARL Presentation Can be handled by the time bar

SLIDE 71CARL Presentation Conclusions: Pervasive Search Search is going to be everywhere, often working in the background attempting to anticipate what you want or need We are seeing some of this already on the Web search engines –for example…

SLIDE 72CARL Presentation

SLIDE 73CARL Presentation Where is search going? Broder has suggested that search will become more integrated into work processes –This is even more important for scholarly and research work –Search should move beyond the “empty box” and develop dynamic context-related searches that allow the user to focus in on particular topics, branch into related topics, and reexamine a topic from different perspectives or disciplines

SLIDE 74CARL Presentation But in the Academic world… This kind of context-based linking is happening already on the web and in commerce-driven applications Either we can give up and assume Google will do it all for us… Or we can continue to find new uses for the rich collections of metadata that we have been building over the decades, and demand that our search engines use that information to provide relevant context-driven information links

SLIDE 75CARL Presentation Thank You Questions?