Accelerating Research Discovery: Towards an Intelligent Workbench for Researchers

Accelerating Research Discovery: Towards an Intelligent Workbench for Researchers
ChengXiang (“Cheng”) Zhai
Department of Computer Science (affiliated with Graduate School of Library & Information Science, Department of Statistics, Carl R. Woese Institute for Genomic Biology)
University of Illinois at Urbana-Champaign
Microsoft Workshop on Big Scholarly Data, July 10, 2015

Motivation
Acceleration of scientific research and discovery → huge societal benefits
– Faster discovery of new knowledge
– Faster invention of new technology
– Less spending on research
Today’s workbench for researchers lacks task support
Question: how can we build a general intelligent researcher’s workbench to improve the productivity of every researcher?

Research Workflow
[Figure: workflow stages Research Question Formulation, Research Plan Design, Research Result Generation, and Research Result Dissemination, shown with Literature, Search Engines, and Collaboration]

An Intelligent Researcher’s Workbench
[Figure: workflow stages Research Question Formulation, Research Plan Design, Research Result Generation, and Research Result Dissemination, supported by Literature, Research Social Network, Literature Access Support, Knowledge Assistant, and Research Task Support]

Time to Integrate Multiple Systems!
[Figure: workflow stages Research Question Formulation, Research Plan Design, Research Result Generation, and Research Result Dissemination, supported by Literature, Research Social Network, Literature Access Support, Knowledge Assistant, and Research Task Support]

Social Scholar (“学术圈”, “Academic Circle”)
Developed at Institute of Computing Technology, Chinese Academy of Sciences
Project Leaders: Xueqi Cheng, Jiafeng Guo

Social Scholar: A Vertical Social Platform
– Paper Centric
– User Centric
– Collaboration, Work Flow

Social Scholar Architecture
[Figure: Academic Social Platform; labels: ① ② ③ ④, search, explore, recommend, analyze, social collaboration]

How to Support Research Tasks?
[Figure: workflow stages Research Question Formulation, Research Plan Design, Research Result Generation, and Research Result Dissemination, supported by Literature, Research Task Support, Research Social Network, Literature Access Support, and Knowledge Assistant]

Potential Research Task Support
Workflow stages: Research Question Formulation, Research Plan Design, Research Result Generation, Research Result Dissemination; Literature
– Research Topic Service: Research Question Recommender, Novelty Checker, Topic Explorer
– Community Service: Discussion Center, Collaborator Finder, Community Newsletter
– Paper Writing Assistant: Survey Generator, Definition Finder, Citation Generator, Literature Radar, Auto Proofreading

Research Question Recommender
Function: recommend research questions based on a keyword query
Basic solution:
– Mine future work sections of all papers to discover sentences about future work directions
– Cluster them to identify major research directions
– Recommend large clusters that match a user’s query to the user, or
– Recommend major clusters or most recent clusters without requiring any query
Potential extension:
– Mine CFPs to discover “hot topics”; then use the hot topics to retrieve specific directions matching the hot topics
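The basic solution above can be sketched in a few lines. This is only an illustrative sketch, assuming the future-work sentences have already been extracted; the stopword list, the greedy single-pass clustering, the similarity threshold, and the function names `cluster_sentences` and `recommend` are all assumptions made for the example, not something the slides specify.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); "future" and "work" are
# dropped because every mined sentence is expected to contain them.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "we", "and", "for", "future", "work"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_sentences(sentences, threshold=0.3):
    """Greedy single-pass clustering: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for sentence in sentences:
        vec = Counter(tokenize(sentence))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [sentence]})
        else:
            best["centroid"].update(vec)
            best["members"].append(sentence)
    return clusters

def recommend(clusters, query=None, top_k=3):
    """Rank clusters by size; with a query, keep only clusters sharing a term with it."""
    if query:
        q = Counter(tokenize(query))
        clusters = [c for c in clusters if cosine(q, c["centroid"]) > 0]
    return sorted(clusters, key=lambda c: len(c["members"]), reverse=True)[:top_k]
```

Calling `recommend(clusters)` without a query corresponds to the "major clusters" option; ranking by recency would additionally need publication dates, which this sketch does not model.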

Novelty Checker
Function: Check whether an idea is new
– Like a search engine, but would need to perform “idea matching”
Basic solution:
– Allow a user to provide a detailed description of the idea
– Treat the description as a long query and search in papers
– Return the best matching paragraphs in a paper
Further extension:
– Paraphrasing; favor “impact” sentences
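As a rough illustration of treating the idea description as a long query, the sketch below ranks paragraphs by TF-IDF cosine similarity. The weighting scheme and the name `rank_paragraphs` are assumptions chosen for simplicity; the slides do not name a retrieval model, and real "idea matching" would need more than term overlap.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def rank_paragraphs(idea, paragraphs, top_k=3):
    """Return (score, paragraph) pairs, best match first, using TF-IDF cosine similarity."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(t for d in docs for t in d)
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in df}

    def weight(counts):
        # Terms never seen in the collection are ignored.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    query = weight(Counter(tokenize(idea)))
    scored = [(cosine(query, weight(d)), p) for d, p in zip(docs, paragraphs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

A high top score suggests the idea may already be covered by the matching paragraph; a low score is weak evidence of novelty at best, since paraphrases share few terms.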

Generating an Impact Summary [Mei & Zhai 08]
Target: extractive summary of the impact of a paper
– Original content (Abstract, Introduction, Content, References): author-picked sentences are good for a summary, but don’t reflect the impact
– Citation context: reader-composed sentences are a good signal of impact, but too noisy to be used as a summary, e.g.:
  “… Ponte and Croft [20] adopt a language modeling approach to information retrieval. …”
  “… probabilistic models, as well as to the use of other recent models [19, 21], the statistical properties …”
Solution: citation context → infer impact; original content → summary
Extraction of variable-length citation context [Sondhi & Zhai 14]
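The idea of using citation context to infer impact and the original content to supply summary sentences can be illustrated with a deliberately simplified scorer. This is not the model of [Mei & Zhai 08]; it is a sketch that assumes a unigram "impact" distribution interpolated from citation contexts and the paper itself, and ranks the paper's own sentences by average log-probability. The function name `impact_summary` and the parameter `lam` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def impact_summary(sentences, citation_contexts, lam=0.5, top_k=2):
    """Pick the paper's own sentences that best match what citing papers say.

    lam must be > 0 so that every word of the paper has nonzero probability.
    """
    doc_counts = Counter(t for s in sentences for t in tokenize(s))
    cite_counts = Counter(t for c in citation_contexts for t in tokenize(c))
    doc_total = sum(doc_counts.values())
    cite_total = sum(cite_counts.values()) or 1  # avoid division by zero

    def p_impact(word):
        # Interpolate the citation-context model with the paper's own model.
        return (1 - lam) * cite_counts[word] / cite_total + lam * doc_counts[word] / doc_total

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return float("-inf")
        return sum(math.log(p_impact(w)) for w in words) / len(words)

    return sorted(sentences, key=score, reverse=True)[:top_k]
```

Because only sentences from the paper are returned, the output stays readable as a summary, while the citation contexts only influence which sentences are chosen.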

Original Abstract of “A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval”

Impact Summary of “A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval”
1. Figure 5: Interpolation versus backoff for Jelinek-Mercer (top), Dirichlet smoothing (middle), and absolute discounting (bottom).
2. Second, one can de-couple the two different roles of smoothing by adopting a two stage smoothing strategy in which Dirichlet smoothing is first applied to implement the estimation role and Jelinek-Mercer smoothing is then applied to implement the role of query modeling
3. We find that the backoff performance is more sensitive to the smoothing parameter than that of interpolation, especially in Jelinek-Mercer and Dirichlet prior.
Specific to smoothing LM in IR; especially for the concrete smoothing techniques (Dirichlet and JM)

Topic Explorer
Function: Support flexible navigation in the research topic space
Basic solution: Construct a multi-resolution topic map; seamless integration of search & browsing
– Search log-based map
– Document-based map
– Ontology-based map
– Flexible switching between different maps
Further extension:
– Entity-Relation graph browsing

Information Seeking as Sightseeing
Know the address of an attraction site?
– Yes: take a taxi and go directly to the site
– No: walk around or take a taxi to a nearby place then walk around
Know what exactly you want to find?
– Yes: use the right keywords as a query and find the information directly
– No: browse the information space or start with a rough query and then browse
When a query fails, browsing comes to the rescue…

Current Support for Browsing is Limited
Hyperlinks
– Only page-to-page
– Mostly manually constructed
– Browsing step is very small
Web directories (e.g., ODP)
– Manually constructed
– Fixed categories
– Only support vertical navigation
Beyond hyperlinks? Beyond fixed categories? How to promote browsing as a “first-class citizen”?

Sightseeing Analogy Continues…

Topic Map for Touring Information Space
[Figure: topic map with topic regions at multiple resolutions; navigation operations: zoom in, zoom out, horizontal navigation]
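One way to picture the three navigation operations is as moves on a tree of topic regions with sideways links. The sketch below is an assumed data structure for illustration only; the class `TopicNode` and the helpers `link`, `zoom_in`, `zoom_out`, and `horizontal` are hypothetical names, and the slides do not describe how the actual map is stored.

```python
class TopicNode:
    """A topic region; children are finer-resolution regions, neighbors are same-level regions."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.neighbors = []

    def add_child(self, name):
        child = TopicNode(name, parent=self)
        self.children.append(child)
        return child

def link(a, b):
    """Connect two regions at the same resolution for horizontal navigation."""
    a.neighbors.append(b)
    b.neighbors.append(a)

def zoom_in(node):
    return list(node.children)   # finer-grained subtopics

def zoom_out(node):
    return node.parent           # coarser topic, or None at the top level

def horizontal(node):
    return list(node.neighbors)  # related topics at the same resolution
```

In this picture, a search result would drop the user at some node, and browsing continues from there with the three moves.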

Collaborative Surfing [Wang et al. 08]
– New queries become new footprints
– Clickthroughs become new footprints
– Navigation trace enriches map structures
– Browse logs offer more opportunities to understand user interests and intents

Constructing Topic Evolution Map with Probabilistic Citation Analysis [Wang et al. 13]
Given research articles and citations in a research community:
– Identify major research topics (themes) and their spans
– Construct a topic evolution map
– For each topic, identify milestone papers

Sample Results: Major Topics in NLP Community
Data: ACL Anthology Network (AAN)
– Papers from major NLP conferences from …
– …,041 papers
– 82,944 citations

NLP-Community Topic Evolution
Topic Evolution (green: newer, red: older)
3: Unification-based grammar (1988)
6: Interactive machine translation (1989)
13: Tree-adjoining grammar (1992)
Fading-out
72: Coreference resolution (2002)
89: Sentiment analysis (2004)
25: Spelling correction (1997)
10: Discourse centering method (1991)
Shifting
8: Word sense disambiguation (1991)
18: Prepositional phrase attachment (1994)
34: Statistical parsing (1998)
73: Discriminative-learning parsing (2002)
95: Dependency parsing (2005)
Branching
20: Early SMT (1994)
29: Decoding, alignment, reordering (1998)
50: Min-error-rate approaches (2000)
96: Phrase-based SMT (2000)

Detailed View of Topic “Statistical Machine Translation”

Discussion Center
Function: Support research discussion with a Research Forum or Community Question Answering platform
Basic solution:
– Community QA organized by a topic map or papers
– Push questions to the most relevant experts (authors)
– Research forums organized by topics
Further extension:
– Automatic question answering
– One forum per paper/Collaborative paper annotation

Collaborator Finder
Function: Support searching for an expert on a topic
Basic solution:
– Information Extraction + Query creation
– Queries can contain both structured and non-structured data
– Build a profile for each individual person and support expert finding
Further extension:
– Automatic team formation: take BAA/RFP as input, suggest people to form a team
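Profile-based expert finding can be sketched as follows, assuming each paper is available as a list of author names plus some text. The term-frequency profile, the scoring rule, and the names `build_profiles` and `find_experts` are assumptions for illustration; structured query fields and information extraction are left out.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_profiles(papers):
    """papers: iterable of (authors, text) pairs. Returns {author: Counter of terms}."""
    profiles = defaultdict(Counter)
    for authors, text in papers:
        terms = Counter(tokenize(text))
        for author in authors:
            profiles[author].update(terms)
    return profiles

def find_experts(profiles, query, top_k=3):
    """Rank authors by the share of their profile made up of query terms."""
    q_terms = tokenize(query)
    scores = {}
    for author, profile in profiles.items():
        total = sum(profile.values())
        score = sum(profile[t] for t in q_terms) / total if total else 0.0
        if score > 0:
            scores[author] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The same ranking could, in principle, be reused to route a forum question to likely experts, with the question text playing the role of the query.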

Community Newsletter
Function: Automatically generate a newsletter for any research community, possibly personalized
Basic solution:
– Report new papers, upcoming conferences, emerging topics
– Report other news (e.g., new grants)
Further extension:
– Personalization; relevance feedback

Definition Finder
Function: Enable a researcher to search for the definition of any concept
Basic solution:
– Extract definition sentences from research papers
– Build a search engine for searching definitions
Further extension:
– Summarization of definitions
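A minimal sketch of the two steps, extraction and lookup, is shown below. It assumes definition sentences follow a handful of surface patterns ("X is defined as …", "X refers to …", "X is a …"); the pattern list and the names `extract_definitions` and `lookup` are illustrative assumptions, and such patterns will produce false positives on real papers.

```python
import re
from collections import defaultdict

# Optional leading article, a short term, then a definitional cue phrase.
DEFINITION_PATTERN = re.compile(
    r"^(?:(?:A|An|The) )?(?P<term>[A-Za-z][\w\- ]{1,60}?) "
    r"(?:is defined as|refers to|is an?|is the) (?P<definition>.+)$"
)

def extract_definitions(text):
    """Return {term (lowercased): [definition sentences]} found in the text."""
    index = defaultdict(list)
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        match = DEFINITION_PATTERN.match(sentence)
        if match:
            index[match.group("term").lower()].append(sentence)
    return index

def lookup(index, concept):
    """Exact-match lookup; a real system would use a search engine here."""
    return index.get(concept.lower(), [])
```

Collecting several sentences per term is what would make the "summarization of definitions" extension possible later.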

Survey Generator
Function: Given a topic map, automatically generate a survey on the topic
Basic solution: Define the survey generation task as
– Find all the relevant papers
– Cluster them
– Create a hypertext document with links to specific papers
Extensions:
– Learn to automatically “write” an introduction by learning from many example introductions
– Automatically extract the findings
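The three steps of the basic solution can be strung together as in the sketch below. It assumes papers are given as dictionaries with hypothetical `title` and `url` fields, uses keyword match on titles for relevance and Jaccard overlap against each group's first paper for clustering; all of these choices, and the name `build_survey`, are simplifications for illustration.

```python
import html
import re

def terms(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def build_survey(topic, papers, threshold=0.2):
    """papers: list of dicts with 'title' and 'url'. Returns an HTML string."""
    topic_terms = terms(topic)
    # Step 1: find relevant papers (title shares at least one term with the topic).
    relevant = [p for p in papers if topic_terms & terms(p["title"])]
    # Step 2: greedy clustering on the non-topic title terms.
    clusters = []
    for paper in relevant:
        p_terms = terms(paper["title"]) - topic_terms
        for cluster in clusters:
            c_terms = terms(cluster[0]["title"]) - topic_terms
            union = p_terms | c_terms
            if union and len(p_terms & c_terms) / len(union) >= threshold:
                cluster.append(paper)
                break
        else:
            clusters.append([paper])
    # Step 3: emit a hypertext document with one section per cluster.
    parts = [f"<h1>Survey: {html.escape(topic)}</h1>"]
    for i, cluster in enumerate(clusters, 1):
        parts.append(f"<h2>Group {i}</h2>")
        parts.append("<ul>")
        for paper in cluster:
            link = f'<a href="{html.escape(paper["url"])}">{html.escape(paper["title"])}</a>'
            parts.append(f"<li>{link}</li>")
        parts.append("</ul>")
    return "\n".join(parts)
```

The group headings are placeholders; labeling each group and writing connecting text are the parts the listed extensions would address.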

Citation Generator
Function: While a researcher is editing a paper, the system automatically suggests the papers to be cited and where to cite them
Basic solution:
– Use the current paragraph that a user is writing as a query, and search for relevant references
– Automatically or semi-automatically add references
Extensions:
– Learn how to generate sentences describing a cited work based on what other papers have said about the work
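To suggest both what to cite and where, one simple option is to match each sentence of the paragraph against a reference library. The sketch below assumes the library is a mapping from hypothetical citation keys to abstracts and uses raw term overlap with a minimum threshold; `suggest_citations`, `min_overlap`, and the stopword list are assumptions, and a real system would use a proper retrieval model.

```python
import re

# Small illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "is", "are"}

def terms(text):
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def suggest_citations(paragraph, library, min_overlap=2):
    """library: {cite_key: abstract}. Returns [(sentence, cite_key)] for matched sentences."""
    suggestions = []
    for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
        s_terms = terms(sentence)
        best_key, best_overlap = None, min_overlap - 1
        for key, abstract in library.items():
            overlap = len(s_terms & terms(abstract))
            if overlap > best_overlap:
                best_key, best_overlap = key, overlap
        if best_key is not None:
            suggestions.append((sentence, best_key))
    return suggestions
```

Returning the sentence together with the key keeps the human in the loop: an editor plugin could show the suggestion next to the sentence and let the author accept or reject it.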

Auto Proofreading
Function: automatically check grammar and improve rhetorical structure, etc.
Basic solution:
– Use existing techniques for spelling and grammar correction
Extensions:
– Learn how to polish the English usage of a paper by using many high-quality full-text articles as training data

Literature Radar
Function: Monitor and track the literature for potentially interesting new research results
Basic solution:
– Literature recommendation
– Personal library
– Learn a researcher’s interest over time
Further extensions:
– Inference of relevance; explanation of recommendation
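Learning a researcher's interest over time could look roughly like the sketch below: a term-weight profile that decays with each new feedback event and is used to rank incoming papers. The decay factor, the additive update, and the class name `InterestProfile` are assumptions for illustration and are not taken from the slides.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class InterestProfile:
    """Term-weight profile with exponential decay so recent feedback counts more."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.weights = Counter()

    def update(self, text, liked=True):
        # Age existing interests, then add (or subtract) the new evidence.
        for term in list(self.weights):
            self.weights[term] *= self.decay
        sign = 1.0 if liked else -1.0
        for term in tokenize(text):
            self.weights[term] += sign

    def score(self, text):
        words = tokenize(text)
        return sum(self.weights[w] for w in words) / len(words) if words else 0.0

    def recommend(self, papers, top_k=2):
        ranked = sorted(papers, key=self.score, reverse=True)
        return [p for p in ranked if self.score(p) > 0][:top_k]
```

Because the weights are per term, the highest-weighted terms shared with a recommended paper could double as a simple explanation of the recommendation.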

Summary
Intelligent Research Workbench for Every Researcher → Accelerate Research Discovery
– Support the entire workflow of research
– Multiple interactive task assistants
– Unified portal to all resources
– Personalization
– Scholar social network (collaborative research)
Optimize the combined intelligence of humans and machines
– Let the machine do only what it’s good at
– Minimize the human’s overall effort, but have the human help the machine if needed
Action item: Let’s work together!
– Integration of multiple systems and parties (federation?)
– From Search to Access to Task Support: Learning engine

Thank You! Questions/Comments?
Looking forward to opportunities for collaboration!

References
Qiaozhu Mei, ChengXiang Zhai. Generating Impact-Based Summaries for Scientific Literature. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), 2008.
Parikshit Sondhi, ChengXiang Zhai. A Constrained Hidden Markov Model Approach for Non-Explicit Citation Context Extraction. SDM 2014.
Xuanhui Wang, ChengXiang Zhai. Mining Term Association Patterns from Search Logs for Effective Query Reformulation. Proceedings of the 17th ACM International Conference on Information and Knowledge Management (CIKM'08), 2008.
Xiaolong Wang, ChengXiang Zhai, Dan Roth. Understanding Evolution of Research Themes: A Probabilistic Generative Model for Citations. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'13), 2013.