Published by Reynold Walsh. Modified over 9 years ago.
1
Structured Queries for Legal Search
TREC 2007 Legal Track
Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
11/06/2007
2
Agenda
- Introduction
- Main task – ad hoc search
- Routing task – relevance feedback
3
What is legal search?
- Goal: retrieve all documents responsive to a production request.
- Production request: describes a set of documents that the plaintiff asks the defendant to produce.
- Recall-oriented: missing an important document carries high risk, and finding one carries high value.
Sample request text: "All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated."
Final query: [figure: a Boolean query tree built from AND, OR and W/5 operators over the terms guide, strategy, approval, family, “G rated”, movie, film]
4
Data set
- 7 million business records from tobacco companies and research institutes
- Metadata: title, author, organizations, etc.
- OCR text, which contains errors
- 50 topics generated from four hypothetical complaints created by lawyers
5
Main task – ad hoc search
Indri query formulation
- Without Boolean constraint: #combine( ranking function )
- With Boolean constraint: #filreq( #band( boolean constraint ) #combine( ranking function ) )
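A minimal sketch of the two formulations as string builders. The helper name `indri_query` is hypothetical; it only assembles query text and assumes the ranking function and constraint are already valid Indri expressions. `#filreq` scores only the documents that match its first argument, which is how the Boolean constraint acts as a hard filter while `#combine` still ranks.

```python
def indri_query(ranking_terms, boolean_constraint=None):
    """Build an Indri query string (hypothetical helper, illustration only).

    ranking_terms: list of terms/operators forming the ranking function.
    boolean_constraint: optional Indri expression every returned document must match.
    """
    # Ranking function: the query parts combined with equal weight.
    ranking = "#combine(" + " ".join(ranking_terms) + ")"
    if boolean_constraint is None:
        return ranking
    # #filreq(filter query): only documents matching the filter get scored.
    return "#filreq(#band(" + boolean_constraint + ") " + ranking + ")"


print(indri_query(["movie", "film"]))
# #combine(movie film)
print(indri_query(["movie", "film"], "#uw(#syn(movie film) family)"))
# #filreq(#band(#uw(#syn(movie film) family)) #combine(movie film))
```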
6
Boolean constraint
Translate the Final Query into Indri operators:

Original expression      Indri operator
x AND y                  #uw(x y)
x OR y                   #syn(x y)
x BUT NOT y              #filrej(y x)
Phrase: “x y”            #1(x y)
Proximity: (x W/k y)     #uw(k+2)(x y)

[figure: the Final Query tree, as on slide 3]
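The translation table can be sketched as a recursive function over a small expression tree. The class names (`And`, `Or`, `ButNot`, `Phrase`, `Within`) are hypothetical and the parser that would build the tree from the request text is not shown; the function only applies the mappings listed on this slide, writing `#uw(k+2)` as `#uwN` with N = k + 2.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical AST for a Boolean query (names are illustrative only).
@dataclass(frozen=True)
class And:
    operands: Tuple["Expr", ...]

@dataclass(frozen=True)
class Or:
    operands: Tuple["Expr", ...]

@dataclass(frozen=True)
class ButNot:
    keep: "Expr"
    exclude: "Expr"

@dataclass(frozen=True)
class Phrase:
    words: Tuple[str, ...]

@dataclass(frozen=True)
class Within:
    k: int
    left: "Expr"
    right: "Expr"

Expr = Union[str, And, Or, ButNot, Phrase, Within]


def to_indri(e: Expr) -> str:
    """Translate a Boolean expression with the slide's operator mapping."""
    if isinstance(e, str):
        return e
    if isinstance(e, And):      # x AND y -> #uw(x y)
        return "#uw(" + " ".join(to_indri(x) for x in e.operands) + ")"
    if isinstance(e, Or):       # x OR y -> #syn(x y)
        return "#syn(" + " ".join(to_indri(x) for x in e.operands) + ")"
    if isinstance(e, ButNot):   # x BUT NOT y -> #filrej(y x)
        return f"#filrej({to_indri(e.exclude)} {to_indri(e.keep)})"
    if isinstance(e, Phrase):   # "x y" -> #1(x y)
        return "#1(" + " ".join(e.words) + ")"
    if isinstance(e, Within):   # x W/k y -> window of size k + 2
        return f"#uw{e.k + 2}({to_indri(e.left)} {to_indri(e.right)})"
    raise TypeError(f"unsupported expression: {e!r}")


fragment = Within(5, Or(("family", Phrase(("G", "rated")))), Or(("movie", "film")))
print(to_indri(fragment))
# #uw7(#syn(family #1(G rated)) #syn(movie film))
```

The fragment above is only an example of the W/5 operator, not a claim about the exact shape of the Final Query tree.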
7
Ranking functions
- Bag of words: guide strategy approval family G rated movie film
- Respect phrase operators: guide strategy approval family #1(G rated) movie film
- Group synonyms together: #syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film)

[figure: the Final Query tree, as on slide 3]
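The three ranking functions differ only in how much of the Final Query's structure they keep. A minimal sketch, assuming each concept is given as a list of alternatives (one OR group) and each alternative as a list of words, where more than one word marks a phrase; the function names are hypothetical.

```python
def _alternative(words):
    # A multi-word alternative becomes an exact phrase, #1(...).
    return words[0] if len(words) == 1 else "#1(" + " ".join(words) + ")"


def bag_of_words(concepts):
    # Flatten everything, ignoring phrases and OR groups.
    return "#combine(" + " ".join(w for c in concepts for alt in c for w in alt) + ")"


def with_phrases(concepts):
    # Keep phrases, but still treat every alternative as a separate term.
    return "#combine(" + " ".join(_alternative(alt) for c in concepts for alt in c) + ")"


def with_synonyms(concepts):
    # Collapse each OR group into a single #syn so it counts as one concept.
    parts = []
    for c in concepts:
        alts = [_alternative(alt) for alt in c]
        parts.append(alts[0] if len(alts) == 1 else "#syn(" + " ".join(alts) + ")")
    return "#combine(" + " ".join(parts) + ")"


concepts = [
    [["guide"], ["strategy"], ["approval"]],
    [["family"], ["G", "rated"]],
    [["movie"], ["film"]],
]
print(with_synonyms(concepts))
# #combine(#syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film))
```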
8
Experiments and findings
- Boolean constraints improve both recall and precision.
- Structured queries outperform bag-of-words queries.
* B is the number of documents matching the Final Query; its average value across topics is 5000.
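The evaluation cuts each ranking at depth B. As we understand it, the track's est_RB and est_PB are estimated from sampled relevance judgments; the sketch below is a simplification that computes plain recall and precision at B assuming complete judgments. The function name is hypothetical.

```python
def precision_recall_at_b(ranking, relevant, b):
    """ranking: ranked document ids; relevant: set of relevant ids; b: cutoff B."""
    hits = sum(1 for doc in ranking[:b] if doc in relevant)
    precision = hits / b if b else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# One of the top 2 documents is relevant, out of 3 relevant documents overall.
print(precision_recall_at_b(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"}, 2))
```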
9
Per-topic performance (difference from the median of 29 manual runs)
[figure: per-topic est_RB and est_PB]
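The per-topic comparison subtracts, for each topic, the median score of the other runs from our score. A minimal sketch; the input layout (one dict of topic to score per run) is an assumption.

```python
from statistics import median


def diff_to_median(our_scores, other_runs):
    """our_scores: topic -> score; other_runs: list of dicts topic -> score."""
    return {
        topic: score - median([run[topic] for run in other_runs])
        for topic, score in our_scores.items()
    }


print(diff_to_median({"t1": 0.5}, [{"t1": 0.2}, {"t1": 0.4}, {"t1": 0.6}]))
```

A positive value means the run beats the median run on that topic.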
10
Routing task of the Legal Track 2007
Structured queries are known to be hard to construct, but not when supervision is available.
Questions
- Do weighted queries help?
- Do metadata and annotations help?
A definitive answer comes from supervised structured query construction.
11
Structured query: #weight( w1 t1 w2 t2 … wn tn )
0.00851 trademark.sentence
0.00846 trademark
0.00665 gmp.product
0.00653 basement.product
0.00625 steenland
0.00606 steenland.sentence
0.00602 gouda.sentence
0.00600 gouda
0.00587 steenland.organization
0.00561 toi
0.00550 toi.sentence
0.00544 lett.product
0.00486 chocol.ti
0.00479 legal.sentence
0.00474 children.per_desc
0.00467 legal.s
0.00459 legal
0.00453 legal.organization
0.00435 kid.sentence
0.00433 kid
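Turning learned weights into the #weight query is a formatting step. A minimal sketch with a hypothetical `weight_query` helper; keeping only positive weights and cutting at the top 20 are assumptions made here (the slide lists only positive top weights), not documented choices of the system.

```python
def weight_query(weighted_terms, top_k=20):
    """weighted_terms: iterable of (weight, term) pairs; term may be 'term.field'."""
    # Assumption: drop non-positive weights and keep the top_k highest.
    ranked = sorted((p for p in weighted_terms if p[0] > 0), key=lambda p: p[0], reverse=True)
    body = " ".join(f"{w:.5f} {t}" for w, t in ranked[:top_k])
    return "#weight(" + body + ")"


print(weight_query([(0.00846, "trademark"), (0.00851, "trademark.sentence")]))
# #weight(0.00851 trademark.sentence 0.00846 trademark)
```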
12
Supervised Structured Query Construction
Relevance feedback => supervised learning
- Train a linear SVM with keyword and keyword.field features.
- SVM classifier: $s(d) = \sum_i w_i f_i(d)$. Training learns the term weights $w_i$; the feature values $f_i$ are chosen to be tf-idf/LM scores.
- Retrieval: #weight( w1 t1 w2 t2 … ), where the $f_i$ are the tf-idf/LM scores of the terms.
Advantages
- Given enough training data, we know for sure whether a given type of feature helps.
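A minimal linear SVM, to show how the learned $w_i$ come out of relevance feedback. This is a stand-in written for illustration: full-batch subgradient descent on the L2-regularized hinge loss, with no bias term (an assumption, since #weight has no place for one) and made-up hyperparameters. The actual system's SVM package and settings are not stated on the slide.

```python
from collections import defaultdict


def train_linear_svm(examples, lam=0.01, lr=0.1, epochs=200):
    """examples: list of (features, label); features: dict name -> value,
    label: +1 (relevant) or -1 (non-relevant). Returns dict name -> weight."""
    w = defaultdict(float)
    n = len(examples)
    for _ in range(epochs):
        grad = defaultdict(float)
        # Gradient of the regularizer (lam / 2) * ||w||^2.
        for name, value in w.items():
            grad[name] += lam * value
        # Subgradient of the average hinge loss: only margin violators contribute.
        for feats, y in examples:
            score = sum(w[name] * v for name, v in feats.items())
            if y * score < 1:
                for name, v in feats.items():
                    grad[name] -= y * v / n
        for name, g in grad.items():
            w[name] -= lr * g
    return dict(w)


# Toy feedback: relevant documents lean on "gouda", non-relevant ones on "legal".
examples = [
    ({"gouda": 1.0}, 1),
    ({"gouda": 0.9, "legal": 0.1}, 1),
    ({"legal": 1.0}, -1),
    ({"legal": 0.9, "gouda": 0.1}, -1),
]
weights = train_linear_svm(examples)
print(sorted(weights.items(), key=lambda kv: kv[1], reverse=True))
```

The highest-weighted features then become the terms of the #weight query.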
13
Example: Query 13
Request: "All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes."
Boolean query: (cand! OR chocolate) w/10 cigarette!
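To make the Boolean syntax concrete, here is a sketch that checks the Query 13 expression against a piece of text. Two conventions are assumptions: a trailing `!` is treated as a root expander (prefix match), and `w/10` is read as "the two matches are at most 10 token positions apart". Tokenization is a simple lowercase alphanumeric split, and the function names are hypothetical.

```python
import re


def term_matches(pattern, token):
    # Assumption: a trailing "!" expands the root, so "cand!" matches "candy", "candies".
    if pattern.endswith("!"):
        return token.startswith(pattern[:-1])
    return token == pattern


def positions(alternatives, tokens):
    return [i for i, tok in enumerate(tokens) if any(term_matches(p, tok) for p in alternatives)]


def within(k, left, right, text):
    """True if some left alternative and some right alternative are <= k tokens apart."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(abs(i - j) <= k for i in positions(left, tokens) for j in positions(right, tokens))


# (cand! OR chocolate) w/10 cigarette!
print(within(10, ["cand!", "chocolate"], ["cigarette!"], "Candy cigarettes made of chocolate"))
```

Note that a root expander also over-matches (e.g. "cand!" would match "candle"), which is one reason the Boolean query is used as a filter and not as the final ranking.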
14
Annotations
Annotation fields in the feedback query:
- Named entities (NE), e.g. bush.person
- Sentences, e.g. violate.sent
- Metadata, e.g. television.title
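Each document contributes a plain keyword feature and a keyword.field feature for every field (metadata or annotation) a term occurs in. A minimal sketch; raw term counts stand in for the tf-idf/LM scores, no stemming is applied (so it yields "chocolate", not "chocol"), and the field names in the example are illustrative.

```python
import re
from collections import Counter


def extract_features(fields):
    """fields: dict field name -> text (metadata or annotated spans).
    Emits one 'keyword' and one 'keyword.field' feature per token occurrence."""
    feats = Counter()
    for field, text in fields.items():
        for tok in re.findall(r"[a-z0-9]+", text.lower()):
            feats[tok] += 1
            feats[f"{tok}.{field}"] += 1
    return dict(feats)


print(extract_features({"title": "Television ads", "person": "Bush"}))
```

These feature dicts are the kind of input the linear SVM consumes, and the winning names map directly onto Indri's term.field syntax.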
15
Performance
- On 39 topics of Legal 2006 (2/3 of the judged documents used for training, the rest for testing)
- On 10 topics of the Legal 2007 routing task
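The Legal 2006 setup holds out one third of each topic's judged documents for testing. A minimal sketch of such a split; the slide does not say how documents were assigned, so the seeded random shuffle here is an assumption.

```python
import random


def split_judged(judged_docs, seed=0):
    """Split one topic's judged documents into 2/3 training and 1/3 testing."""
    docs = list(judged_docs)
    random.Random(seed).shuffle(docs)  # assumption: random assignment
    cut = (len(docs) * 2) // 3         # integer arithmetic avoids float rounding
    return docs[:cut], docs[cut:]


train, test = split_judged([f"d{i}" for i in range(9)])
print(len(train), len(test))
# 6 3
```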
16
Routing conclusions
- A principled way of constructing structured queries, covering annotations and query term weights
- Answers from a supervised learning algorithm: weights help; annotations help less.
17
Thank you! Questions?