Japanese MLR
International/JP MLR Issues
- Have to do more with less data
- Blending different languages?
- Can’t necessarily filter adult content
- May need new/different features
- Different types of queries: English/Bracket/Phrase/etc.
- Metrics designed for English
- China has much more spam
- Japan has much less spam
- Germany looks 10-20% ahead of Google by DCG
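The slide compares systems by DCG but does not state which gain or discount it uses. The Python sketch below assumes the common exponential-gain, log2-discount form and hypothetical 0-3 graded judgments per result. It shows how a relative figure such as "10-20% ahead" could be derived from mean DCG over a query set; the function names and data are illustrative only.

```python
import math
from typing import List, Optional


def dcg(relevances: List[int], k: Optional[int] = None) -> float:
    """DCG with exponential gain (2^rel - 1) and a log2(position + 1) discount, 1-based positions."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # enumerate is 0-based, so rank 1 -> log2(2) = 1
        for pos, rel in enumerate(relevances[:k])  # k=None keeps the whole list
    )


def mean_dcg(runs: List[List[int]], k: Optional[int] = None) -> float:
    """Average per-query DCG over an evaluation query set."""
    return sum(dcg(r, k) for r in runs) / len(runs)


def relative_lift(system: List[List[int]], baseline: List[List[int]],
                  k: Optional[int] = None) -> float:
    """Relative mean-DCG difference; 0.15 means the system is 15% ahead of the baseline."""
    return mean_dcg(system, k) / mean_dcg(baseline, k) - 1.0


# Hypothetical judgments for two queries, top results in rank order.
system = [[3, 2, 0], [2, 2, 1]]
baseline = [[2, 3, 0], [2, 1, 1]]
print(f"{relative_lift(system, baseline, k=3):.1%}")
```

Comparing means over the same query set keeps the lift from being dominated by a single query with unusually many relevant results.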
JP MLR vs. English MLR
Columns: Kanji/Hiragana | Katakana | Latin (Romaji)
- Baseline: 7.2 | 7.6 | 9.2
- JP MLR: +4% | +2% | +1%
- EN MLR: 0%, +3%
- Google: +6%
- Examples: 277 | 231 | 96
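The comparison groups queries by script. The slide does not say how mixed-script queries were assigned, so the Python sketch below is only one possible rule. It classifies characters by Unicode block ranges and applies a hypothetical precedence: any Kanji or Hiragana first, then Katakana, then Latin. The function names are illustrative.

```python
def char_script(ch: str) -> str:
    """Classify one character by Unicode block (a coarse approximation)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF or 0xFF66 <= cp <= 0xFF9D:  # Katakana, incl. halfwidth
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:  # CJK Unified Ideographs (+ Ext. A)
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:  # fullwidth Latin letters
        return "latin"
    return "other"


def query_bucket(query: str) -> str:
    """Assign a query to one column, using a hypothetical precedence for mixed scripts."""
    scripts = {char_script(ch) for ch in query}
    if scripts & {"kanji", "hiragana"}:
        return "Kanji/Hiragana"
    if "katakana" in scripts:
        return "Katakana"
    if "latin" in scripts:
        return "Latin (Romaji)"
    return "Other"


for q in ["東京", "ソニー", "sony", "iPhone ケース"]:
    print(q, "->", query_bucket(q))
```

Putting Kanji/Hiragana first mirrors the slide's grouping of those two scripts into one column, but a different precedence would shift some queries between columns.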
Different features are important for JP
http://internal.inktomi.com/~lukeb/FeatureImportance.html
- “Linkflux”
- How soon the query word appears in the document
- Whether the first query word appears in the title
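Two of the listed features can be sketched directly. The sketch assumes the query and document have already been split into tokens. Japanese text has no spaces, so a segmenter is required, and none is shown here. The 1 / (1 + position) normalization and the use of the earliest query term are hypothetical choices, not the deck's definitions. "Linkflux" is not defined on the slide, so it is not sketched.

```python
from typing import List, Optional


def first_occurrence(term: str, doc_tokens: List[str]) -> Optional[int]:
    """0-based position of the first occurrence of term in the document, or None if absent."""
    for i, tok in enumerate(doc_tokens):
        if tok == term:
            return i
    return None


def first_occurrence_feature(query_terms: List[str], doc_tokens: List[str]) -> float:
    """Hypothetical 'how soon' feature: 1 / (1 + earliest position of any query term), 0.0 if none."""
    positions = [p for p in (first_occurrence(t, doc_tokens) for t in query_terms) if p is not None]
    if not positions:
        return 0.0
    return 1.0 / (1 + min(positions))


def first_query_word_in_title(query_terms: List[str], title_tokens: List[str]) -> bool:
    """True if the first query token appears among the title tokens."""
    return bool(query_terms) and query_terms[0] in title_tokens


# Pre-segmented example tokens (hypothetical).
print(first_occurrence_feature(["東京", "ホテル"], ["格安", "ホテル", "東京"]))
print(first_query_word_in_title(["東京", "ホテル"], ["東京", "観光"]))
```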
New features for JP
- Query word length (very important)
- Query type (important)
- Phonetic URL match
- Future:
  - vcano match
  - Matching segmented chunks
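A phonetic URL match can be illustrated by romanizing a Katakana query and checking whether the result appears in the URL (for example トヨタ becomes toyota). The kana table below is deliberately tiny and hypothetical. A real table would need the full syllabary plus small kana, ッ and the long-vowel mark ー. The query-length function counts non-space characters, which is only one possible reading of "query word length".

```python
from typing import Optional

# Deliberately tiny, hypothetical Katakana-to-romaji table (Hepburn-style), for illustration only.
KANA_TO_ROMAJI = {
    "ト": "to", "ヨ": "yo", "タ": "ta", "ホ": "ho", "ン": "n", "ダ": "da",
    "カ": "ka", "メ": "me", "ラ": "ra", "ソ": "so", "ニ": "ni",
}


def romanize(query: str) -> Optional[str]:
    """Romanize a Katakana query; None if any character is outside the table."""
    out = []
    for ch in query:
        if ch not in KANA_TO_ROMAJI:
            return None
        out.append(KANA_TO_ROMAJI[ch])
    return "".join(out)


def phonetic_url_match(query: str, url: str) -> bool:
    """True if the romanized query occurs as a substring of the lower-cased URL."""
    romaji = romanize(query)
    return romaji is not None and romaji in url.lower()


def query_word_length(query: str) -> int:
    """Query length in characters, ignoring ASCII and ideographic (U+3000) spaces."""
    return sum(1 for ch in query if not ch.isspace())


print(phonetic_url_match("トヨタ", "http://www.toyota.co.jp/"))
print(query_word_length("東京 ホテル"))
```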