2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30
Frame an IR Research Problem and Form Hypotheses
ChengXiang Zhai
Department of Computer Science
Graduate School of Library & Information Science
Institute for Genomic Biology, Statistics
University of Illinois, Urbana-Champaign

General Steps to Define a Research Problem
Generate and Test
Raise a question
Novelty test: Figure out to what extent we know how to answer the question
– There’s already an answer to it: Is the answer good enough?
    Yes: not interesting, but can you make the question more challenging?
    No: your research problem is how to get a better answer to the raised question
– No obvious answer: you’ve got an interesting problem to work on
Tractability test: Figure out whether the raised question can be answered
– I can see a way to answer it or potentially answer it: you’ve got a solvable problem
– I can’t easily see a way to answer it: Is it because the question is too hard or because you haven’t worked hard enough? Try to reframe the problem to make it easier
Evaluation test: Can you obtain a data set and define measures to test solutions/answers?
– Yes: you’ve got a clearly defined problem to work on
– No: can you think of any way to indirectly test the solutions/answers? Can you reframe the problem to fit the data?
Every time you reframe a problem, run all three tests again.

Rigorously Define Your Research Problem
Exploratory: what is the scope of exploration? What is the goal of exploration? Can you rigorously answer these questions?
Descriptive: what does it look like? How does it work? Can you formally define a principle?
Evaluative: can you clearly state the assumptions about data collection? Can you rigorously define measures?
Explanatory: how can you rigorously verify a cause?
Predictive: can you rigorously define what prediction is to be made?

Frame a New Computation Task
Define basic concepts
Specify the input
Specify the output
Specify any preferences or constraints
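The four steps above can be sketched in code. This is only an illustration, assuming ad hoc retrieval as the task: the type names, the permutation constraint, and the toy `overlap_ranker` are all hypothetical choices made for this example, not part of the lecture.

```python
from typing import List

# Step 1: define basic concepts (hypothetical definitions for illustration).
Document = str
Query = str
Ranking = List[int]  # document indices, best first


def satisfies_constraints(ranking: Ranking, num_docs: int) -> bool:
    """Step 4 (assumed constraint): the output must rank every document exactly once."""
    return sorted(ranking) == list(range(num_docs))


def overlap_ranker(query: Query, docs: List[Document]) -> Ranking:
    """Steps 2 and 3: input is a query plus a document list, output is a ranking.

    Toy solution: score each document by how many distinct query terms it
    contains; ties keep the original document order because sorted() is stable.
    """
    query_terms = set(query.lower().split())
    scores = [len(query_terms & set(d.lower().split())) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


docs = [
    "zipf law in text",
    "language model smoothing",
    "smoothing methods for language models",
]
ranking = overlap_ranker("smoothing language model", docs)
print(ranking, satisfies_constraints(ranking, len(docs)))
```

Writing the task down this way makes it easy to see whether the input, the output, and the constraints have actually been pinned down before any method is proposed.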

Map of IR Applications
[Figure: map of IR applications; labels recovered from the diagram, grouping inferred]
Documents: Web pages, News articles, messages, Literature, Organization docs, Legal docs/Patents, Medical records, Customer complaint letter/transcripts, Blog articles, …
Users: Kids, Peking Univ. community, Lawyers, Scientists, Customer Service People, Online Shoppers
Functions: Search, Browsing, Alert, Mining, Task/Decision support
Example applications: “Google Kids”, Local Web Service, Intranet Search, Legal Info Systems, Literature Assistant, management + automatic reply, ?

From a new application to a clearly defined research problem
Try to picture a new system, and thus clarify what new functionality is to be provided and what benefit you’ll bring to a user
Among all the system modules, which are easy to build and which are challenging?
Pick a challenge and try to formalize the challenge
– What exactly would be the input?
– What exactly would be the output?
Is this challenge really a new challenge (not immediately clear how to solve it)?
– Yes, your research problem is how to solve this new problem
– No, it can be reduced to some known challenge: are existing methods sufficient?
    Yes, not a good problem to work on
    No, your research problem is how to extend/adapt existing methods to solve your new challenge
Tuning the problem

Tuning the Problem
[Figure: problem space with axes “Level of Challenges” and “Impact/Usefulness”, divided into Known and Unknown regions]
Make a hard problem easier
Make an easy problem harder
Increase impact (more general)

Examples of Problem Formulation
Risk minimization framework
Study of smoothing
Axiomatic retrieval framework
Comparative Text Mining
Contextual PLSA
Opinion Integration

Form Research Hypotheses
Typical hypotheses in IR:
– Hypothesis about user characteristics (tested with user studies or user-log analysis, e.g., clickthrough bias)
– Hypothesis about data characteristics (tested by fitting actual data, e.g., Zipf’s law)
– Hypothesis about methods (tested with experiments):
    Method A works (or doesn’t work) for task B under condition C by measure D (feasibility)
    Method A performs better than method A’ for task B under condition C by measure D (comparative)
Introducing baselines naturally leads to hypotheses
Carefully study existing literature to figure out where exactly you can make a new contribution (what do you want others to cite your work as?)
The more specialized a hypothesis is, the more likely it’s new, but a narrow hypothesis has lower impact than a general one, so try to generalize as much as you can to increase impact
But avoid over-generalizing (every claim must be supported by your experiments)
Tuning hypotheses (next lecture)
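A comparative hypothesis of the form “method A performs better than method A’ by measure D” can be checked with a paired test over per-query scores. The sketch below uses an exact two-sided sign test, which is one common choice and not necessarily the test the lecture has in mind; the per-query average precision values are made-up numbers for illustration only.

```python
from math import comb
from statistics import mean
from typing import Sequence


def sign_test_p(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Exact two-sided sign test on paired per-query scores (ties are dropped)."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0  # no query distinguishes the two methods
    k = min(wins, losses)
    # Probability of k or fewer successes under a fair coin, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-query average precision (measure D) on the same 8 queries.
ap_a = [0.42, 0.31, 0.55, 0.20, 0.63, 0.47, 0.38, 0.29]        # method A
ap_a_prime = [0.40, 0.28, 0.50, 0.25, 0.60, 0.41, 0.30, 0.27]  # baseline A'

mean_diff = mean(ap_a) - mean(ap_a_prime)
p_value = sign_test_p(ap_a, ap_a_prime)
print(f"mean AP difference = {mean_diff:.4f}, sign-test p = {p_value:.4f}")
```

Stating task B and condition C explicitly (which collection, which query set, which parameter settings) alongside such a test keeps the hypothesis from being generalized beyond what the experiment supports.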

Next Lecture (June 26): Test/Refine Hypotheses