Real World IR Challenges (CS598-CXZ Advanced Topics in IR Presentation) Jan. 20, 2005 ChengXiang Zhai Department of Computer Science University of Illinois,

Presentation transcript:

Real World IR Challenges (CS598-CXZ Advanced Topics in IR Presentation) Jan. 20, 2005 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

There are many research problems to work on. It’s more beneficial to society if we work on problems that reflect real world challenges…

What is a Good Research Problem?
A good research problem is a solvable challenge that is well connected to a real world need/problem
Real world challenges vs. imaginary challenges
–Not all challenges are interesting (to society)
–Real world challenges are always interesting to work on
–Imaginary challenges may (happen to) be interesting
–Spend your effort on solving interesting challenges so that you’ll make more contributions to society
However, not all real world problems are challenges; some are straightforward to solve
Not all challenges/problems are solvable (with limited resources, time, money, tools, etc.)

Real World vs. Imaginary Challenges
[Figure: diagram relating three sets labeled “Real World Needs/Problems”, “Challenges”, and “Imaginary Needs/Problems”, with one region labeled “Real world challenges”]

Identify a Good Research Problem
[Figure: chart with axes “Level of Challenges” (Known to Unknown) and “Impact/Usefulness”; regions as labeled on the slide:]
–Known: Good applications; not interesting for research
–High impact, low risk (easy): Good short-term research problems
–High impact, high risk (hard): Good long-term research problems
–Low impact, difficult: Maybe publishable, but not good research problems
–Low impact, low risk: Bad research problems (may or may not be publishable)

Three Basic Questions to Ask for an IR Problem
Who are the users?
–Everyone vs. small group of people
What data do we have?
–Web (whole web vs. sub-web)
–Email (public vs. personal email)
–Literature (general vs. special discipline)
What functions do we want to support?
–Information access vs. knowledge acquisition
–Decision and task support
Example answers: users: everyone (who has an Internet connection); data: the whole web (indexed by Google); function: search (by keywords)

Map of IR Applications
[Figure: map of IR applications, with labels along three dimensions]
Data: Web pages; News articles; Email messages; Literature; Organization docs; Legal docs/Patents; Medical records; Customer complaint letters/transcripts; …
Users: Kids; UIUC community; Lawyers; Scientists; Customer service people
Functions: Search; Browsing; Alert; Mining; Task/Decision support
Applications: Email management + automatic reply; “Google Kids”; Legal Info Systems; Literature Assistant; Intranet Search; Local Web Service

High-Level Challenges in IR
How to make use of imperfect IR techniques to do something useful?
–Save human labor (e.g., partially automate a task)
–Create “add on” value (e.g., literature alert)
–A lot of HCI issues (e.g., allowing users to control)
How to develop robust, effective, and efficient methods for a particular application?
–Methods need to “work all the time” without failure
–Methods need to be accurate enough to be useful
–Methods need to be efficient enough to be useful

Challenge 1: From Search to Information Access
Search is only one way to access information
Browsing and recommendation are two other ways
How can we effectively combine these three ways to provide integrated information access?
E.g., artificially linking search results with additional hyperlinks, “literature pop-ups”…

Challenge 2: From Information Access to Task Support
The purpose of accessing information is often to perform some tasks
How can we go beyond information access to support a user at the task level?
E.g., automatic/semi-automatic reply for customer service, literature information service for paper writing (suggest relevant citations, term definitions, etc.)

Challenge 3: Support Whole Life Cycle of Information
A life cycle of information consists of “creation”, “storage”, “transformation”, “consumption”, “recycling”, etc.
Most existing applications support one stage (e.g., search supports “consumption”)
How can we support the whole life cycle in an integrated way?
E.g., community publication/subscription service (no need for crawling, user profiling)

Challenge 4: Collaborative Information Management
Users (especially similar users) often have similar information needs
Users who have explored the information space can share their experiences with other users
How can we exploit the collective expertise of users and allow users to help each other?
E.g., allowing “information annotation” on the Web (“footprints”), collaborative filtering/retrieval
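The collaborative filtering idea mentioned above can be sketched in a few lines. This is a minimal illustration of one common variant (user-based filtering with cosine similarity), not a method the slides prescribe; the function names, the toy users, and the document ids are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(target, ratings, top_n=3):
    """Rank items the target user has not rated by the
    similarity-weighted ratings of the other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        if sim <= 0.0:
            continue  # users with nothing in common contribute nothing
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical toy data: users rate documents on a 1-5 scale.
ratings = {
    "alice": {"doc_a": 5, "doc_b": 4},
    "bob": {"doc_a": 5, "doc_b": 5, "doc_c": 4},
    "carol": {"doc_d": 5},
}

print(recommend("alice", ratings))
```

Here "carol" shares no rated documents with "alice", so only "bob" influences the ranking. A real system would also need to handle rating-scale differences between users and very sparse data, which this sketch ignores.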

IR Problems Around Us (Web)
Finding information about our alumni (motivated by Siebel); more generally, targeted crawling
Paper filter (Can we filter out non-research pages in Google’s results?); more generally, a user-end filter
How to better design our department website? (Currently, it’s running Google; can we do better for searching our department website?)
Course information integration (Can we automatically generate a virtual Machine Learning course website that serves as a portal to all course information related to machine learning?)
UIUC Yellow Pages & White Pages; more generally, can we automatically generate such directories for any website? (Web site summarization?)
…

IR Problems Around Us (Email)
How to recognize and block spam?
How to better manage my personal email? (thread-based organization, appointment extraction, reply-assistant)
How to better manage our newsgroups?
How to help the TSG group to increase their productivity? (e.g., automatic generation of FAQs from an archive, suggest related answers to a question)
…
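As a concrete starting point for the spam question above, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is a common baseline chosen here for illustration; the slides do not name a technique. The labels, function names, and training messages are hypothetical, and the sketch assumes both labels occur in the training data.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")


def tokenize(text):
    """Lowercase whitespace tokenization; real mail needs far more care."""
    return text.lower().split()


def train(examples):
    """examples: list of (text, label) pairs.
    Returns per-label word counts and per-label message counts."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log posterior (add-one smoothing)."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in LABELS:
        # Log prior; assumes each label has at least one training message.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
examples = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
]
word_counts, doc_counts = train(examples)
print(classify("win a prize", word_counts, doc_counts))
print(classify("monday meeting agenda", word_counts, doc_counts))
```

Even a baseline like this makes the robustness and accuracy requirements from the high-level challenges tangible: a filter that occasionally blocks legitimate mail may be worse than no filter at all.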

IR Problems Around Us (Literature)
How can we build a literature recommender/alert system?
Can we mine the CS literature to discover “what’s hot in CS?”
Can we discover emerging interdisciplinary topics between DAIS area and network area from literature?
Can we automatically recognize survey/review papers and collect all surveys about a topic?
…

Plan for the Next 3 Classes
Goals:
–Move from real world problems to research topics and further to specific research questions
–Identify interesting research topics/questions for the 3 domains
Class format:
–Brainstorming: Everyone will bring in at least one research topic
–Discussions/debates on topics
–Select topics to cover in the course

Assignment
For each of the 3 domains (Web, Email, Literature), everyone identifies at least one interesting real world challenge about text information management; the more the better
If you can’t think of one
–Surf the web and see what problems are being addressed
–Ask yourself: what kind of information management tool do I wish to have that doesn’t already exist?
–Ask yourself: what features/capabilities do I wish Google had?
–…
–Randomly combine some IR function with a group of users and some data
For each challenge, identify
–Who are the users? (Who will benefit from solving this challenge?)
–What are the data involved in the challenge?
–What kind of function(s) will be developed? (What exactly is the challenge?)
Write one small paragraph for each problem to state clearly what the challenge is and argue why it is an interesting problem to solve.
Email all your paragraphs and your domain preferences to me by next Monday night (11:59pm, Jan 24)
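The "randomly combine" suggestion in the assignment can be tried mechanically. The sketch below draws random (function, users, data) triples; the three lists reuse a few labels from the Map of IR Applications slide, while the function name `brainstorm` and the `seed` parameter are assumptions made for illustration.

```python
import random

# Labels borrowed from the "Map of IR Applications" slide.
FUNCTIONS = ["Search", "Browsing", "Alert", "Mining", "Task/Decision support"]
USERS = ["Kids", "UIUC community", "Lawyers", "Scientists"]
DATA = ["Web pages", "News articles", "Literature",
        "Legal docs/Patents", "Medical records"]


def brainstorm(n, seed=None):
    """Return n random (function, users, data) triples.
    Passing a seed makes the draw reproducible."""
    rng = random.Random(seed)
    return [
        (rng.choice(FUNCTIONS), rng.choice(USERS), rng.choice(DATA))
        for _ in range(n)
    ]


for function, users, data in brainstorm(3, seed=0):
    print(f"{function} for {users} over {data}")
```

Most combinations will not be good research problems; each one still has to pass the three questions (who are the users, what data, what function) and the impact and solvability checks from the earlier slides.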