Huizhong Duan, Yunbo Cao, Chin-Yew Lin and Yong Yu
Shanghai Jiao Tong University & MSRA
ACL 2008
2008/7/9, Rick Liu
Question Search
Help users search previous answers
▪ Any nice hotels in Berlin or Hamburg?
▪ How long does it take to Hamburg from Berlin?
▪ Cheap hotels in Berlin?
Identifying question topic & focus
▪ Question tree
▪ Determining the tree cut
Modeling question topic & focus for search
▪ Language model
▪ Topic terms: BaseNP, WH-ngram
▪ Topic profile: probability distribution of categories
▪ Specificity: inverse of the entropy of the topic profile
▪ Topic chain: topic terms ordered by specificity value (descending)
▪ Topic tree
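The specificity and topic chain definitions above can be illustrated with a minimal Python sketch. The topic terms, category names, and probabilities are made-up toy values, and the small constant `eps` is an assumption added here only to avoid division by zero when a term falls in a single category.

```python
import math

def entropy(profile):
    """Shannon entropy (in nats) of a topic profile {category: probability}."""
    return -sum(p * math.log(p) for p in profile.values() if p > 0)

def specificity(profile, eps=1e-3):
    """Inverse of the profile entropy; eps is an assumed smoothing constant."""
    return 1.0 / (entropy(profile) + eps)

def topic_chain(profiles):
    """Order topic terms by specificity, most specific first."""
    return sorted(profiles, key=lambda t: specificity(profiles[t]), reverse=True)

# Hypothetical topic profiles for two topic terms.
profiles = {
    "hotel": {"travel/germany": 0.3, "travel/europe": 0.3, "travel/asia": 0.4},
    "Hamburg": {"travel/germany": 0.9, "travel/europe": 0.1},
}
chain = topic_chain(profiles)
print(chain)
```

A term concentrated in few categories has low entropy and therefore high specificity, so it moves toward the front of the chain.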
M = (Γ, θ)
Γ = [C1, C2, ..., Ck], the tree cut
θ = [P(C1), P(C2), ..., P(Ck)], the probability parameter vector
A cut is any set of nodes
Σ i=1..k P(Ci) = 1
[n0, n11], [n12, n21, n22, n23], [n13, n24]
[n11, n21, n22, n23, n24]
Minimum Description Length (MDL)
Ref: Li and Abe, 1998
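As a rough illustration of how MDL can compare candidate tree cuts, the sketch below scores a cut by a parameter cost plus a data cost. It assumes the commonly cited form of the criterion from Li and Abe (1998): k/2 · log2 |S| bits for the parameters, with k the number of classes in the cut minus one, and each leaf in a class sharing that class's probability uniformly. The leaf names and counts are hypothetical, and the slides do not confirm that the paper uses exactly this form.

```python
import math

def description_length(cut, counts):
    """Description length (in bits) of a tree cut.

    cut:    list of classes, each a list of leaf names covered by one cut node
    counts: dict mapping leaf name -> observed frequency
    """
    total = sum(counts.values())
    k = len(cut) - 1  # free parameters, since class probabilities sum to 1
    param_len = 0.5 * k * math.log2(total)
    data_len = 0.0
    for cls in cut:
        freq = sum(counts.get(leaf, 0) for leaf in cls)
        if freq == 0:
            continue  # unseen classes contribute nothing to the data cost
        p_leaf = (freq / total) / len(cls)  # class mass spread over its leaves
        data_len -= freq * math.log2(p_leaf)
    return param_len + data_len

# Hypothetical leaf counts: uniform data, so the coarse cut should win.
counts = {"a": 4, "b": 4, "c": 4, "d": 4}
coarse = description_length([["a", "b", "c", "d"]], counts)
fine = description_length([["a"], ["b"], ["c"], ["d"]], counts)
print(coarse, fine)
```

With uniform counts both cuts fit the data equally well, so the finer cut only pays extra for its parameters and the coarser cut has the shorter description.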
P(q | q̃)
q: queried question
q̃: targeted question
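One plausible way to turn the topic and focus split into a ranking score is to mix two unigram language models, one over topic terms and one over focus terms. The sketch below is only an assumption about the general shape of such a model: the mixing weight `lam`, the smoothing weight `beta`, the linear interpolation with collection statistics, and all token lists are illustrative choices, not values or formulas taken from the slides.

```python
from collections import Counter

def unigram_prob(terms, doc_terms, collection, beta=0.5):
    """P(terms | doc) under a unigram model smoothed with collection statistics."""
    doc, coll = Counter(doc_terms), Counter(collection)
    dlen, clen = len(doc_terms), len(collection)
    prob = 1.0
    for t in terms:
        p_doc = doc[t] / dlen if dlen else 0.0
        p_coll = coll[t] / clen if clen else 0.0
        prob *= (1 - beta) * p_doc + beta * p_coll
    return prob

def score(q_topic, q_focus, cand_topic, cand_focus, collection, lam=0.5):
    """Mix the topic-side and focus-side generation probabilities."""
    return (lam * unigram_prob(q_topic, cand_topic, collection)
            + (1 - lam) * unigram_prob(q_focus, cand_focus, collection))

# Hypothetical toy collection and questions.
collection = ["hamburg", "hotel", "berlin", "hotel", "train", "hamburg"]
same_focus = score(["hamburg"], ["hotel"], ["hamburg"], ["hotel"], collection)
other_focus = score(["hamburg"], ["hotel"], ["hamburg"], ["train"], collection)
print(same_focus, other_focus)
```

A candidate that shares both topic and focus with the queried question scores higher than one that shares only the topic, which is the behavior the topic and focus split is meant to capture.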
Yahoo! Answers
Resolved questions
▪ travel: 314,616 items
▪ computers & internet: 210,785 items
Three fields
▪ title (the only field used)
▪ description
▪ answers
▪ Employed Vector Space Model
▪ Manual judgments: relevant / irrelevant
▪ Baselines: VSM, LMIR
▪ Evaluation: MAP, R-precision, MRR
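For reference, the three evaluation measures can be computed per query as in the short Python sketch below; MAP and MRR are the means of average precision and reciprocal rank over all queries. The question IDs and relevance judgments are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant items appear."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return sum(1 for d in ranked[:r] if d in relevant) / r if r else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical ranking and judgments for a single query.
ranked = ["q3", "q1", "q7", "q2"]
relevant = {"q1", "q2"}
ap = average_precision(ranked, relevant)
rp = r_precision(ranked, relevant)
rr = reciprocal_rank(ranked, relevant)
map_score = mean([ap])  # with more queries, pass one AP value per query
print(ap, rp, rr, map_score)
```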
Examine the correctness of question topics and question foci
200 queried questions => 69 questions incorrect
(a) Only have the head part (59)
(b) Incorrect order (10)
(a) explains why λ is …
FAQ data
Community based
Jeon et al., 2005
Compared four different retrieval methods
▪ Vector space model
▪ Okapi
▪ Language model
▪ Translation-based model
Translation-based model performed the best
Lexical chasm
▪ Where to stay in Hamburg?
▪ The best hotel in Hamburg?
IBM model 1
Use question titles and question descriptions as the parallel corpus
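To show how IBM Model 1 can learn word-to-word translation probabilities that bridge the lexical chasm, here is a minimal EM sketch in Python. The three sentence pairs are hypothetical stand-ins for a parallel corpus, the NULL word is omitted for brevity, and this is not the implementation used in the cited work.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] with EM (IBM Model 1, no NULL word).

    pairs: list of (source_tokens, target_tokens)
    """
    src_vocab = {s for src, _ in pairs for s in src}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize counts per source word.
        for (w, s), c in count.items():
            t[(w, s)] = c / total[s]
    return t

# Hypothetical toy parallel corpus.
pairs = [
    (["stay"], ["hotel"]),
    (["stay", "hamburg"], ["hotel", "hamburg"]),
    (["hamburg"], ["hamburg"]),
]
t = train_ibm1(pairs)
print(t[("hotel", "stay")], t[("hamburg", "stay")])
```

After a few iterations the model assigns most of the probability mass for "stay" to "hotel", which is the kind of association that lets differently worded questions match.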
1) Data structure
2) Use MDL-based tree cut model to identify question topic and focus
3) A new form of language modeling for question search
4) Extensive experiments
Now only community-based
From forum sites / FAQ sites