Challenges for Information Fusion in Retrieval Welcome to RIAO Conference, Pittsburgh PA Jaime Carbonell Language Technologies Institute.

Presentation transcript:

Challenges for Information Fusion in Retrieval
Welcome to RIAO Conference, Pittsburgh PA
Jaime Carbonell
Language Technologies Institute
Carnegie Mellon University
May 30, 2007

30-May-2007 RIAO Conference 2
CMU IR: Cast of Dozens
School of Computer Science [6 departments/institutes]
–Language Technologies Institute (IR, MT, speech, …)
–Machine Learning Department (data & text mining, …)
–Computer Science Department (multi-media, algorithms, …)
Cross-Cutting Projects [Universal Library, Informedia, …]
Diverse Expertise & Collaboration [cross-dept, cross-disc…]
Jamie Callan, Jaime Carbonell, Yiming Yang

LTI’s Bill of Rights
Get the right information: Search Engines
To the right people: Personalization
At the right time: Anticipatory Analysis
On the right medium: Speech Recognition
In the right language: Machine Translation
With the right level of detail: Summarization

NEXT-GENERATION SEARCH ENGINES
Search Criteria Beyond Query-Relevance
–Popularity of web-page (link density, clicks, …)
–Information novelty (content differential, recency)
–Trustworthiness of source
–Appropriateness to user (difficulty level, …)
“Find What I Mean” Principle
–Search on semantically related terms
–Induce user profile from past history, etc.
–Disambiguate terms (e.g. “Jordan”, or “club”)
–From generic search to helpful E-Librarians

MMR Ranking vs Standard IR
[Figure: a query and its documents, showing the MMR and standard IR ranking orders; λ controls spiral curl]
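
The slide only names MMR and its λ parameter. Assuming MMR here means Maximal Marginal Relevance, the contrast with standard IR ranking can be sketched as a greedy re-ranker: each step picks the document that maximizes λ · (similarity to the query) minus (1 − λ) · (maximum similarity to documents already picked). The function names, the cosine similarity, and the toy vectors below are illustrative assumptions, not taken from the talk.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_rank(query, docs, lam=0.7, k=None):
    """Return document indices in greedy MMR order (hypothetical helper).

    lam = 1.0 reduces to plain relevance ranking (standard IR);
    smaller lam penalizes redundancy with already selected documents.
    """
    k = len(docs) if k is None else min(k, len(docs))
    selected = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(docs[i], query)
            # Redundancy is the closest match among documents picked so far.
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy term vectors: documents 0 and 1 are near-duplicates.
    query = [1.0, 0.0]
    docs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]
    print(mmr_rank(query, docs, lam=1.0))  # relevance only
    print(mmr_rank(query, docs, lam=0.3))  # favors novelty
```

With λ = 1 the near-duplicate pair stays adjacent at the top; lowering λ promotes the less relevant but more novel document ahead of the duplicate.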

KNOWLEDGE MAPS: First Steps Towards Useful eLibrarians
Query: “Tom Sawyer”
RESULTS:
–Tom Sawyer home page
–The Adventures of Tom Sawyer
–Tom Sawyer software (graph search)
–Disneyland – Tom Sawyer Island
WHERE TO GET IT:
–Universal Library: free online text & images
–Bibliomania – free online literature
–Amazon.com: The Adventures of Tom…
DERIVATIVE & SECONDARY WORKS:
–CliffsNotes: The Adventures of Tom…
–Tom Sawyer & Huck Finn comic book
–“Tom Sawyer” filmed in 1980
–A literary analysis of Tom Sawyer
RELATED INFORMATION:
–Mark Twain: life and works
–Wikipedia: “Tom Sawyer”
–Literature chat room: Tom Sawyer
–On merchandising Huck Finn and Tom Sawyer

Project for the Ages (Y3K compatible)
The Universal Library

Million Book Project
Scan, OCR, index, 10^6 books
Completed in 2006
US, China, India, Egypt
~20TB (tif, XML, …)
The Usual Suspects
Universal Library
New Challenges
1M → 10M → 100M
Copyright wars (Google)
Search, summarize, translate
Beyond books & journals
–Images, videos, music
–Science (next slides)

SEARCHING MATHEMATICS
Has this integral ever been evaluated?

SEARCHING MATHEMATICS
MATHEMATICA C.F.: Integrate[Times[Power[E, Times[-1, Power[V1, 2]]], Sin[Power[V1, 2]]], {V1, 0, Infinity}]
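
Read as ordinary notation, the Mathematica expression above appears to denote the definite integral below. The variable name x is an illustrative stand-in for the placeholder V1; the original slide image is not available in this transcript.

```latex
\int_{0}^{\infty} e^{-x^{2}} \sin\!\left(x^{2}\right) \, dx
```

Writing the query with a generic placeholder such as V1 lets the same integral match regardless of what the integration variable is called in a source document.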

Indexing Images (vs just the labels)
Who is this guy? Easy for humans, hard to automate
What is George W doing? Hard even for humans to answer…

Primary Sequence
MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA
3D Structure
Folding
Complex function within network of proteins
Normal
PROTEINS: Sequence → Structure → Function
(Borrowed from: Judith Klein-Seetharaman)

Primary Sequence
MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA
3D Structure
Folding
Complex function within network of proteins
Disease
PROTEINS: Sequence → Structure → Function

Searching for Protein Structures at Different Levels of Granularity
Protein structure is a key determinant of protein function
The gap between the known protein sequences and structures:
–3,023,461 sequences vs. 36,247 resolved structures (1.2%)
How do we query with a structure, or with a function, to see which proteins match?

Last Words
“IR will herald the next revolution in information utility” – Herbert A. Simon, circa 1985
“The web without search engines is like the night without Edison” – Anonymous
“A picture may be worth a thousand words, but a book is worth a thousand pictures” – Yours truly
“Billions and billions” – Carl Sagan
Have a Great Conference!