Tagging with Queries: How and Why?

Presentation transcript:

Tagging with Queries: How and Why?
Ioannis Antonellis (antonell@cs.stanford.edu), Hector Garcia-Molina (hector@cs.stanford.edu), Jawed Karim (jawed@cs.stanford.edu)

Content on the Web
Figure labels: back link text, search queries, page text, forward link text.
Example terms: Cnn, Obama, Critics, news.
Stanford Infolab

How?
Basic observation: the HTTP referrer field (the Referer request header) contains the search query.
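The observation can be illustrated with a minimal Python sketch. It assumes the search engine passes the query in a `q` URL parameter, as the Google referrer URLs in the access log slide do; the function name `query_from_referrer` is hypothetical and not taken from the presentation.

```python
from urllib.parse import urlparse, parse_qs


def query_from_referrer(referrer):
    """Return the search query carried in a referrer URL, or None if absent."""
    parsed = urlparse(referrer)
    # Assumption: the query travels in the `q` parameter (true for the
    # Google URLs shown in the slides; other engines may use other names).
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


# Referrer URL copied from the access log slide.
print(query_from_referrer(
    "http://www.google.com/search?hl=en&q=summer+institute+of+entrepreneurship"
))
```

`parse_qs` decodes `+` to spaces, so the query comes back as readable text rather than in its URL-encoded form.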

How?
Basic observation: the HTTP referrer field contains the search query.
1) Extract queries from the web access log.

Web Access Log
a997c1950718d75c03f22ca8715e50b3 [28/Feb/2007:23:45:47 -0800] /group/svsa/cgi-bin/www/officers.php http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=HPIB,HPIB:2006-47,HPIB:en&q=sexy+random+facts
a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ "http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="
413fa663474b2288c1661882e7e62aea [28/Feb/2007:23:46:02 -0800] /group/pandegroup/folding/results.html "http://www.google.com/search?sourceid=navclient-menuext&ie=UTF-8&q=RESULTS"
3d2edd4dfa7778da92875ee67a319433 [28/Feb/2007:23:46:03 -0800] /group/vpge/sgsi/entrepreneurship/ "http://www.google.com/search?hl=en&q=summer+institute+of+entrepreneurship"
ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / "http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8&rlz=1T4SUNA_ko___KR209&q=stanford"
1c9893680
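A rough Python sketch of how query tags might be pulled from log entries like these. It assumes each entry has the layout visitor hash, bracketed timestamp, requested path, referrer URL (optionally quoted), and it treats the whole query string as one tag; splitting queries into individual words would be an equally plausible reading of the slides. The names `LOG_LINE` and `query_tags` are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Assumed entry layout: <visitor hash> [<timestamp>] <path> <referrer, optionally quoted>
LOG_LINE = re.compile(
    r'^(?P<visitor>\S+) \[(?P<time>[^\]]+)\] (?P<path>\S+) "?(?P<referrer>[^"\s]+)"?$'
)


def query_tags(lines):
    """Map each requested path to the set of search queries that led to it."""
    tags = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip truncated or malformed entries
        # Assumption: the query is carried in the `q` parameter of the referrer.
        query = parse_qs(urlparse(match["referrer"]).query).get("q")
        if query:
            tags[match["path"]].add(query[0])
    return dict(tags)


# Entries copied from the slide, including the truncated last one.
sample = [
    'a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ '
    '"http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="',
    'ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / '
    '"http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8&rlz=1T4SUNA_ko___KR209&q=stanford"',
    '1c9893680',
]
print(query_tags(sample))
```

Entries that do not match the assumed layout, such as the truncated final one, are skipped rather than guessed at.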

How?
Basic observation: the HTTP referrer field contains the search query.
1) Extract queries from the web access log.
2) Embed JavaScript code in web pages that captures search queries.

Embeddable code

How?
Basic observation: the HTTP referrer field contains the search query.
1) Extract queries from the web access log.
2) Embed JavaScript code in web pages and capture search queries.
This requires convincing the server administrator or page owner.

Query tags

Information value of Query Tags
Datasets:
Stanford Query Logs: 360,000 URLs, 900,000 query tags
Delicious@Stanford: 3,000 URLs, 5,500 tags
WebBase

Experiments - Summary
URLs coverage
Query vs Delicious Tags
Query/Delicious Tags vs Pagetext

URLs coverage
Query logs provide tags for ~110 times more URLs than Delicious.
13% of Delicious URLs (380 URLs) are tagged only by Delicious.
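Coverage figures of this kind can be computed with plain set operations. The following Python sketch assumes each tag source is available as a mapping from URL to tags; the function name `url_coverage` and the toy data are hypothetical and are not the Stanford datasets.

```python
def url_coverage(query_tags, delicious_tags):
    """Compare which URLs each tag source covers.

    Both arguments map a URL to a collection of tags for that URL.
    """
    query_urls = set(query_tags)
    delicious_urls = set(delicious_tags)
    only_delicious = delicious_urls - query_urls
    n_delicious = len(delicious_urls)
    return {
        # How many times more URLs the query logs cover than Delicious.
        "url_ratio": len(query_urls) / n_delicious if n_delicious else None,
        # Share of Delicious URLs that have no query tags at all.
        "only_delicious_share": len(only_delicious) / n_delicious if n_delicious else None,
        # URLs tagged by both sources (the "common URLs").
        "common_urls": query_urls & delicious_urls,
    }


# Hypothetical toy data for illustration only.
toy_query_tags = {"/": {"stanford"}, "/a": {"x"}, "/b": {"y"}, "/c": {"z"}}
toy_delicious_tags = {"/": {"university"}, "/d": {"blog"}}
stats = url_coverage(toy_query_tags, toy_delicious_tags)
print(stats)
```

On the toy data, the query source covers twice as many URLs as the other source, and half of the second source's URLs are covered by it alone.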

Query Tags
Query logs provide 42 query tags per URL on average.

Delicious Tags
Delicious provides 3 tags per URL on average.

Tags for common URLs
For common URLs, query logs provide 250 query tags per URL on average.
For common URLs, Delicious provides 5 tags per URL on average.

Query Tags vs Page Text
For every URL, 1 out of 3 query tags is not present in the page text.
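One way such a fraction could be measured is sketched below in Python. The slides do not say how presence in the page text was tested, so this sketch assumes a tag counts as present only when every one of its words occurs in the page text, compared case-insensitively; `missing_fraction` is a hypothetical name.

```python
import re


def words(text):
    """Lowercased word set of a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def missing_fraction(tags, page_text):
    """Fraction of tags that do not appear in the page text."""
    tags = list(tags)
    if not tags:
        return 0.0  # no tags, so nothing can be missing
    page_words = words(page_text)
    # Assumption: a tag is present only if all of its words occur in the page.
    missing = [tag for tag in tags if not words(tag) <= page_words]
    return len(missing) / len(tags)


# Hypothetical example: "tagging" does not occur in the page text.
fraction = missing_fraction(
    ["stanford", "infolab", "tagging"],
    "Welcome to the Stanford InfoLab",
)
print(fraction)
```

A stricter or looser matching rule (exact phrase match, stemming) would change the numbers, so the choice of rule should be stated alongside any reported fraction.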

Delicious Tags vs Page Text
For every URL, 1 out of 2 Delicious tags is not present in the page text.

Tags for common URLs
For common URLs, 1 out of 2 query/Delicious tags is not present in the page text.

Conclusions
Query tags:
can be extracted in a distributed fashion;
are a promising new source of information;
can provide many new tags for a large fraction of the Web.

Thank You! (DEMO) http://tags.stanford.edu