Research © 2008 Yahoo! Generating Succinct Titles for Web URLs Kunal Punera joint work with Deepayan Chakrabarti and Ravi Kumar Yahoo! Research.


2 Agenda
Motivation
Our Approach
Comparison with Previous Work
Experimental Results

3 Titles on the Search Results Page
Problems with HTML titles:
– Too long
– Can be missing
– Non-HTML results (pictures, video and audio clips) have none
Other applications:
– Site-map generation

4 Titles for “Quicklinks”
Strict length restrictions
Links are displayed in the context of the home page
[Figure: Quicklink titles and the homepage context in which they appear]

5 Agenda
Motivation
Our Approach
Comparison with Previous Work
Experimental Results

6 “Sources” of Information about URLs
(URL: http://www.barackobama.com/issues/)
Sources and their instances:
– URL-Tokens: “barack obama issues”
– Web page content (HTMLTitle, KeyPhrases): “Barack Obama | Change We Can Believe In | Issues”; “Issues”, “Civil Rights”, “Defense”, “Economy”
– Anchor text on incoming links (IntrasiteAT, IntersiteAT, HomepageAT): “Issues”, “Economic Issues”, “Barack Obama’s Plan for America”
– Search engine queries (QueryView, QueryClick, QueryClickPos1): “obama issues”, “obama platform”, “obama campaign issues”, “barack obama platform”
– User-generated tags (DeliciousTags): “obama campaign platform”, “cool”, “nice webpage”

7 Central Idea
Sources preferentially use words from the title and from the context (if applicable) when constructing instances.
The degree of this preference is source-dependent.

8 Generation of Instances
(URL: http://www.barackobama.com/issues/)
[Figure: each source generates its instances by drawing words from three distributions: the Quicklink title, the homepage abstract (context), and a general vocabulary. Source-specific mixing proportions (title/context/vocabulary):
– QueryClick source, 0.5/0.4/0.1: “obama issues”, “obama campaign issues”, “barack obama platform”, “platform for obama campaign”, …
– IntrasiteAT source, 0.8/0.1/0.1: “Issues”, “Foreign Policy”, “Economic Issues”, “Yes We Can”, …
– HTMLTitle source, 0.2/0.6/0.2: “Barack Obama | Change We Can Believe In | Issues”]
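A minimal sketch of the generative step, under assumptions the slides do not spell out: instances are bags of lowercase words, each word is drawn independently, the title and context distributions are maximum-likelihood unigrams over their own tokens, and the general vocabulary is a hypothetical `vocab` dictionary with a small `floor` for unseen words. The `weights` tuple plays the role of the title/context/vocabulary proportions in the figure.

```python
import math
from collections import Counter


def unigram(text):
    """Maximum-likelihood unigram distribution over the lowercase tokens of text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def instance_log_prob(instance, title, context, vocab, weights, floor=1e-6):
    """Log-probability that a source emits `instance`, one word at a time.

    Assumption: each word is drawn independently from a mixture of the title,
    the context, and a general vocabulary, with the source's proportions
    weights = (w_title, w_context, w_vocab). vocab and floor are hypothetical.
    """
    w_t, w_c, w_v = weights
    p_title, p_ctx = unigram(title), unigram(context)
    log_p = 0.0
    for word in instance.lower().split():
        # floor keeps unseen words possible as long as w_vocab > 0
        p = (w_t * p_title.get(word, 0.0)
             + w_c * p_ctx.get(word, 0.0)
             + w_v * vocab.get(word, floor))
        log_p += math.log(p)
    return log_p
```

For example, with the QueryClick proportions 0.5/0.4/0.1, the instance “obama issues” is far more probable under the title “Issues” than under “Foreign Policy”, because “issues” can then be drawn from the title instead of only from the background vocabulary.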

9 Learning Source Generation Parameters
(URL: http://www.barackobama.com/issues/)
[Figure: same setup as slide 8, but the Quicklink title, the homepage context, and each source's instances are GIVEN, while the per-source mixing proportions (--/--/--) are UNKNOWN]
Learn the parameter values that maximize the probability of generating the observed instances.
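The slide does not say how the maximization is carried out. One standard option, assumed here, is expectation-maximization (EM) over the mixing proportions only: with the title, context, and vocabulary distributions held fixed, each iteration assigns every instance word fractionally to the three components and re-estimates the proportions from those fractions. The `examples` format, `vocab`, `floor`, and the small lower bound on the proportions are hypothetical choices.

```python
from collections import Counter


def unigram(text):
    """Maximum-likelihood unigram distribution over the lowercase tokens of text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def learn_source_weights(examples, vocab, iters=50, floor=1e-6):
    """Estimate one source's (title, context, vocabulary) proportions by EM.

    examples: list of (title, context, instances) for training URLs whose
    titles are known. Component distributions stay fixed; only the three
    mixing proportions are learned, so each M-step has a closed form.
    """
    weights = [1 / 3] * 3
    for _ in range(iters):
        expected = [0.0, 0.0, 0.0]
        for title, context, instances in examples:
            p_title, p_ctx = unigram(title), unigram(context)
            for instance in instances:
                for word in instance.lower().split():
                    probs = [p_title.get(word, 0.0), p_ctx.get(word, 0.0),
                             vocab.get(word, floor)]
                    # E-step: responsibility of each component for this word
                    joint = [w * p for w, p in zip(weights, probs)]
                    z = sum(joint)
                    for k in range(3):
                        expected[k] += joint[k] / z
        # M-step: proportions are the normalized expected counts; the lower
        # bound keeps every proportion positive so z never becomes zero
        total = sum(expected)
        weights = [max(e / total, 1e-12) for e in expected]
        s = sum(weights)
        weights = [w / s for w in weights]
    return tuple(weights)
```

On a source whose instance words come only from the titles, the learned title proportion approaches 1; on one that mixes title and context words equally, it settles near 0.5/0.5.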

10 Finding the Best Quicklink Title
(URL: http://www.barackobama.com/issues/)
[Figure: the homepage context and each source's instances are GIVEN, the source parameters are LEARNT (0.5/0.4/0.1, 0.8/0.1/0.1, 0.2/0.6/0.2), and the Quicklink title is UNKNOWN]
Select the title for which the probability of generating the instances is maximal.
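With the proportions learned, title selection reduces to scoring each candidate by the total log-probability of all sources' instances and keeping the best one. The sketch assumes a simplified word-level mixture (each instance word drawn from the candidate title, the context, or a hypothetical background `vocab`, in source-specific proportions); the `sources` layout and `floor` are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter


def unigram(text):
    """Maximum-likelihood unigram distribution over the lowercase tokens of text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def log_likelihood(title, context, sources, vocab, floor=1e-6):
    """Total log-probability of every source's instances given a candidate title.

    sources: source name -> ((w_title, w_context, w_vocab), list of instances),
    with the proportions already learned. vocab and floor are hypothetical.
    """
    p_title, p_ctx = unigram(title), unigram(context)
    total = 0.0
    for (w_t, w_c, w_v), instances in sources.values():
        for instance in instances:
            for word in instance.lower().split():
                total += math.log(w_t * p_title.get(word, 0.0)
                                  + w_c * p_ctx.get(word, 0.0)
                                  + w_v * vocab.get(word, floor))
    return total


def best_title(candidates, context, sources, vocab):
    """Return the candidate under which the observed instances are most probable."""
    return max(candidates, key=lambda t: log_likelihood(t, context, sources, vocab))
```

Note that this plain sum lets a source with many instances dominate the score, which is exactly the issue the objective function on slide 11 addresses.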

11 Objective Function
Sources have different numbers of instances:
– e.g., QueryClick vs. HTMLTitle
– Handled by source-specific normalization
Sources are associated with the target web object to different degrees:
– e.g., QueryClick vs. QueryView; comments on YouTube, etc.
– Handled by source-specific weights
Can account for dependent sources
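One simple way to realize these two adjustments, assumed here rather than taken from the slides: normalize each source by averaging its instance log-probabilities, so the raw instance count no longer matters, then combine the per-source averages with nonnegative source weights. The per-source averages double as a feature vector, which makes the score linear in the weights. Source names and weight values below are hypothetical.

```python
def source_features(per_source_logliks, source_order):
    """Per-source average log-likelihood, in a fixed source order.

    per_source_logliks: source -> list of log P(instance | title, context).
    Averaging is one possible source-specific normalization: a source with
    many instances (e.g. QueryClick) no longer outweighs one with a single
    instance (e.g. HTMLTitle) merely because of its size.
    """
    features = []
    for source in source_order:
        logliks = per_source_logliks.get(source, [])
        features.append(sum(logliks) / len(logliks) if logliks else 0.0)
    return features


def objective(per_source_logliks, source_weights):
    """Weighted sum of normalized per-source scores; linear in the weights."""
    order = sorted(source_weights)
    features = source_features(per_source_logliks, order)
    return sum(source_weights[s] * f for s, f in zip(order, features))
```

For instance, QueryClick log-likelihoods [-1.0, -3.0] with weight 1.0 and a single HTMLTitle log-likelihood of -4.0 with weight 0.5 give 1.0 × (-2.0) + 0.5 × (-4.0) = -4.0. A source that merely duplicates another can be down-weighted, which is one way to account for dependent sources.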

12 Learning Source Weights
With known source generation parameters, the objective is a linear function of the source weights
We learn weights that rank the various candidate titles correctly:
– We use the linear ranking SVM described in Joachims, “Optimizing search engines using clickthrough data”, KDD 2002
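The slides use Joachims' ranking SVM; the sketch below is a simplified stand-in, not that implementation. It minimizes a regularized pairwise hinge loss of the same form with plain subgradient descent instead of Joachims' solver. Each training pair holds the feature vectors (for example, per-source normalized scores) of a better and a worse candidate title for the same URL; the learning rate, regularization strength, and epoch count are arbitrary choices.

```python
def train_rank_svm(pairs, dim, epochs=200, lam=0.01, lr=0.1):
    """Linear ranking SVM trained by subgradient descent on the pairwise hinge loss.

    pairs: list of (x_better, x_worse) feature vectors for two candidate
    titles of the same URL. Minimizes
        lam/2 * ||w||^2 + mean(max(0, 1 - w . (x_better - x_worse))).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * wi for wi in w]  # gradient of the L2 regularizer
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # the hinge term is active when the margin is below 1
            if sum(wi * di for wi, di in zip(w, diff)) < 1.0:
                for k in range(dim):
                    grad[k] -= diff[k] / len(pairs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

Once trained, the weights score any candidate with a dot product against its features, so ranking new candidates is cheap.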

13 Where Do Title Candidates Come From?
Instances of some of the sources of information
Not all sources are used:
– URL-Tokens: ungrammatical
– QueryView: misspellings
– DeliciousTags: sometimes irrelevant
We clean some instances to obtain more candidates:
– Removing the website name
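The slide mentions removing the website name but not how it is detected. A hypothetical heuristic, assumed here: split titles on common separators and drop any segment that also appears in the homepage's own HTML title, on the theory that shared segments name the site. The separator set and function names are illustrative.

```python
import re

# Hypothetical separator set: "|", "-", "–" or "»" surrounded by whitespace.
SEPARATORS = re.compile(r"\s+[|»\-–]\s+")


def segments(title):
    """Split an HTML title into its non-empty, whitespace-trimmed segments."""
    return [s.strip() for s in SEPARATORS.split(title) if s.strip()]


def strip_site_name(page_title, homepage_title):
    """Drop segments the page title shares with the homepage title.

    Shared segments are assumed to be the site name or slogan; whatever
    remains is returned as extra title candidates.
    """
    site = {s.lower() for s in segments(homepage_title)}
    return [s for s in segments(page_title) if s.lower() not in site]
```

For the running example, “Barack Obama | Change We Can Believe In | Issues” against a homepage title of “Barack Obama | Change We Can Believe In” leaves the candidate “Issues”.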

14 Agenda
Motivation
Our Approach
Comparison with Previous Work
Experimental Results

15 Comparisons with Previous Work
Our title generation is an “extractive” approach:
– Avoids modeling the grammatical correctness of titles
We only learn parameters at the source level:
– Less training data is needed
We combine information from external sources:
– Can obtain titles for objects with no text content
We respect the constraints placed by the context in which the title is used

16 BMW: Banko et al., “Headline Generation Based on Statistical Translation”, ACL 2000
Ranks headline candidates using 3 factors:
– Likelihood of seeing the candidate's words in a title
– Likelihood of the most likely sequence of the candidate's words
– Likelihood of the candidate's length
Lots of parameters:
– to model a word being in the title
– to model bigrams
– to combine the above 3 factors
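To make the parameter count concrete, the sketch below scores a candidate in the spirit of the three factors listed; it is a simplified illustration, not Banko et al.'s system. It scores the candidate's words in the given order rather than searching for the most likely ordering, and the probability tables and the combination weights `alphas` are hypothetical inputs that such a system would have to estimate.

```python
import math


def bmw_style_score(candidate, p_in_title, p_bigram, p_length,
                    alphas=(1.0, 1.0, 1.0), floor=1e-9):
    """Log-linear combination of content, fluency and length factors.

    p_in_title: word -> P(word appears in a title | it appears in the document)
    p_bigram:   (prev, word) -> P(word | prev), with "<s>" as the start symbol
    p_length:   length -> P(title has that many words)
    alphas:     weights combining the three factors (more parameters to tune)
    floor:      hypothetical probability for unseen events
    """
    words = candidate.lower().split()
    content = sum(math.log(p_in_title.get(w, floor)) for w in words)
    prev_words = ["<s>"] + words[:-1]
    fluency = sum(math.log(p_bigram.get((p, w), floor))
                  for p, w in zip(prev_words, words))
    length = math.log(p_length.get(len(words), floor))
    a1, a2, a3 = alphas
    return a1 * content + a2 * fluency + a3 * length
```

Each dictionary is a separate family of parameters to estimate from training headlines, which is the contrast slide 15 draws with learning only source-level parameters.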

17 Agenda
Motivation
Our Approach
Comparison with Previous Work
Experimental Results

18 Empirical Evaluation
Two tasks:
– Generating Quicklink titles (manually judged data)
– Generating web page titles
Metrics:
– F-measure, Jaccard, Exact Match, Longest Common Subsequence (LCS)
Baselines:
– The sources of information our system uses
– BMW: Banko et al., ACL 2000
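The slides name the metrics without defining them. A minimal sketch, assuming comparisons over lowercase word tokens: set overlap for F-measure and Jaccard, token-sequence equality for exact match, and a word-level longest common subsequence. The tokenization and the unit of LCS are assumptions.

```python
def tokens(s):
    """Lowercase whitespace tokenization (an assumed choice)."""
    return s.lower().split()


def f_measure(pred, gold):
    """Harmonic mean of token-set precision and recall."""
    p, g = set(tokens(pred)), set(tokens(gold))
    overlap = len(p & g)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def jaccard(pred, gold):
    """Size of the token-set intersection over the size of the union."""
    p, g = set(tokens(pred)), set(tokens(gold))
    return len(p & g) / len(p | g) if p | g else 1.0


def exact_match(pred, gold):
    """1.0 if the token sequences are identical, else 0.0."""
    return float(tokens(pred) == tokens(gold))


def lcs_length(pred, gold):
    """Length, in words, of the longest common subsequence (dynamic programming)."""
    a, b = tokens(pred), tokens(gold)
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]
```

For example, predicting “Economic Issues” when the judged title is “Issues” gives precision 0.5 and recall 1.0, hence an F-measure of 2/3 and a Jaccard score of 0.5.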

19 Quicklinks Title Task
Approach       F-measure   Jaccard   Exact Match
Our Approach   0.81        0.75      0.63
HomepageAT     0.70        0.66      0.58
IntrasiteAT    0.43        0.41      0.35
IntersiteAT    0.36        0.32      0.25
HTMLTitle      0.37        0.27      0.05
KeyPhrases     0.25        0.19      0.07
HomepageAT is a very competitive baseline
IntrasiteAT is better than IntersiteAT
Our system’s performance approaches inter-judge agreement values

20 Quicklinks Title Task: Learning Rates
Very few data points are needed:
– Learning parameters at the source level helps

21 Quicklinks Title Task: Source Weights
Having source weights and normalization helps

22 Web Page Title Task
Approach       F-measure   Jaccard   LCS
Our Approach   0.53        0.41      3.44
HomepageAT     0.45        0.34      2.7
KeyPhrases     0.41        0.31      2.54
QueryClick     0.31        0.23      2.1
IntersiteAT    0.29        0.21      1.8
BMW            0.12        0.10      --
Our approach beats the competition:
– BMW is not suited to this task
– Often the page text doesn’t describe the page well
HomepageAT is surprisingly effective

23 Conclusions
Our approach combines various sources of information to select titles
It selects titles that respect constraints of length and context
We empirically showed the effectiveness of our approach
Future work:
– Deeper language features in selecting titles
– Uniform Quicklink titles across websites
– Contexts of different types

24 Questions
Thank you.

25 Copyright Yahoo! 2008
No publication or distribution allowed without written permission

