Search and Data Management Rakesh Agrawal MSR Search Lab.


Current Focus & Direction
Understand the virtuous cycle between search and data, and ways to accelerate it
New search-centric applications
–Personal data mining (Health)
–Distributed knowledge creation (Education)

Search & Data: Virtuous Cycle
[Diagram labels: Search, Data, Insights; Queries, Clicks; Web Pages, Feeds; Mining; Relevance; Intents, Behaviors, Connections, Popularity, Trends]
Better Search Results ► More Data ► Greater Insights ► Better Search Results

Related Searches (aka Query Suggestions)
Most popular queries containing the current query
Analysis of how users reformulated their queries
Query click graph to find related queries
Examples: Football → Soccer (whole query); Wildflower cafe → Wildflower bakery (piecewise)
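The first and third ideas on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy query log, and the toy click data are all hypothetical, and "related" is approximated here by the number of clicked URLs two queries share, which is not necessarily the measure used in the talk.

```python
from collections import Counter, defaultdict

def popular_containing(query, query_log, k=3):
    """Most frequent logged queries that contain `query` (excluding itself)."""
    counts = Counter(q for q in query_log if query in q and q != query)
    return [q for q, _ in counts.most_common(k)]

def related_by_clicks(query, clicks, k=3):
    """Rank other queries by how many clicked URLs they share with `query`."""
    urls_by_query = defaultdict(set)
    for q, url in clicks:
        urls_by_query[q].add(url)
    target = urls_by_query.get(query, set())
    scores = {q: len(urls & target)
              for q, urls in urls_by_query.items() if q != query}
    ranked = sorted((q for q in scores if scores[q] > 0),
                    key=lambda q: (-scores[q], q))
    return ranked[:k]

# Hypothetical toy data, for illustration only.
QUERY_LOG = ["football", "football scores", "football scores",
             "football rules", "soccer"]
CLICKS = [("football", "url_a"), ("football", "url_b"),
          ("soccer", "url_a"), ("soccer", "url_c"),
          ("wildflower cafe", "url_d")]

print(popular_containing("football", QUERY_LOG))
print(related_by_clicks("football", CLICKS))
```

In this toy click graph, "soccer" is related to "football" only because both queries led to a click on the same URL; no string similarity is involved.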

Result Diversification
Ideas from portfolio theory to allocate space to different result types
Marginal utility of adding a document decreases if the result set already contains high-quality documents of the same type
Query and document classification using merged click logs
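The diminishing-marginal-utility idea can be illustrated with a greedy selection loop. This is a minimal sketch, not the method from the talk: it assumes each candidate carries a single type label and a quality score in [0, 1], and it assumes that picking a document of quality q shrinks the remaining need for its type by a factor of (1 - q). The function name and the toy candidates are hypothetical.

```python
def diversify(candidates, k):
    """Greedily pick k documents from (doc_id, doc_type, quality) tuples.

    The gain of a candidate is its quality times the residual need for its
    type; the residual shrinks each time a document of that type is picked.
    """
    residual = {}          # per-type unmet need, implicitly 1.0 at the start
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: c[2] * residual.get(c[1], 1.0))
        doc_id, doc_type, quality = best
        selected.append(doc_id)
        residual[doc_type] = residual.get(doc_type, 1.0) * (1.0 - quality)
        remaining.remove(best)
    return selected

# Hypothetical candidates: two strong web pages, one image, one news item.
CANDIDATES = [("web1", "web", 0.9), ("web2", "web", 0.8),
              ("img1", "image", 0.5), ("news1", "news", 0.4)]

print(diversify(CANDIDATES, 3))
```

Ranking by quality alone would return both web pages first. Here, once "web1" is chosen, the second web page is worth little, so the image and news results take the remaining slots.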

Classification Using Click Graph
[Figure: click graph connecting seed documents labeled ANIMALS with ANIMALS queries and ANIMALS documents]
Algorithm: Random walk with absorbing states
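A random walk with absorbing states on a query-document click graph can be sketched as follows. The slide does not give the details, so several things here are assumptions: seed documents are the absorbing states for the class, every step has a probability `alpha` of being absorbed into a null (no class) state, edge weights are click counts, and the absorption probabilities are found by simple fixed-point iteration. All names and the toy data are hypothetical.

```python
from collections import defaultdict

def absorption_probabilities(clicks, seeds, alpha=0.1, iterations=100):
    """Probability that a walk from each node is absorbed at a seed document.

    clicks: iterable of (query, doc, click_count)
    seeds:  set of documents already labeled with the class (absorbing)
    alpha:  assumed per-step probability of absorption into a null state
    """
    neighbors = defaultdict(dict)
    for query, doc, count in clicks:
        neighbors[("q", query)][("d", doc)] = count
        neighbors[("d", doc)][("q", query)] = count
    seed_nodes = {("d", d) for d in seeds}
    prob = {n: (1.0 if n in seed_nodes else 0.0) for n in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_nodes:
                updated[node] = 1.0   # absorbing state for the class
                continue
            total = sum(nbrs.values())
            step = sum(w * prob[n] for n, w in nbrs.items()) / total
            updated[node] = (1 - alpha) * step
        prob = updated
    return prob

# Hypothetical click log: (query, document, clicks).
CLICK_LOG = [("zebra", "doc_animals_seed", 8),
             ("zebra", "doc_zebra_page", 2),
             ("tax forms", "doc_tax", 5)]

PROB = absorption_probabilities(CLICK_LOG, seeds={"doc_animals_seed"})
for node, p in sorted(PROB.items()):
    print(node, round(p, 3))
```

Nodes whose probability exceeds some threshold would be assigned the class. In the toy graph the query "zebra" and the unlabeled page it leads to both score high, while the disconnected "tax forms" component stays at zero.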

Changing Nature of Disease
New challenge: chronic conditions, that is, illnesses and impairments that are expected to last a year or more, limit what one can do, and may require ongoing care
In 2005, 133 million Americans lived with a chronic condition (up from 118 million in 1995)
[Figure label: Infectious Diseases]

Technology Trends
Tremendous simplification in the technologies for capturing useful personal information
Dramatic reduction in the cost and form factor for personal storage
Cloud Computing

Personal Health Analytics

Personal Data Mining
Charts for appropriate demographics?
Optimum level for Asian Indians: 150 mg/dL (much lower than 200 mg/dL for Westerners)
Due to elevated levels of lipoprotein(a)*
Computation and selection across millions of data sources
Privacy and security
*Enas et al. Coronary Artery Disease In Asian Indians. Internet J. Cardiology

Collaborative Knowledge Creation (Educational Material)
More than 3.5 million articles in 75 languages
Fashioned by more than 25,000 writers
1 million articles in English (80,000 in Encyclopedia Britannica)
Inspired by Wikipedia
But multiple viewpoints rather than one consensus version!
How to personalize search to find the material suitable for one’s own style of teaching?
Management of trust and authoritativeness?

Summary
Web search is a “data management and creating value from data” problem
New search-centric applications can provide rich fodder for future database research.