1
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo
2
Contents Introduction Practice in legal retrieval Generation of Background concepts Combining concepts and contexts Conclusion
3
Introduction Why do we need advanced legal retrieval and e-discovery? Document Collections Legal Requirements Efficiency
4
Introduction What challenges? Explosive growth of document size Extensive document source Expanding document format collection Informal language
5
Introduction Opportunities: Background contexts utilization Search documents deeply for every possible evidence Examples – TREC: complaint as background information More context information: Web and the links
6
Practice in Retrieval Process TREC legal track practice: Defendants devise queries Plaintiffs’ turns Final queries for production request Document Retrieved
7
Practice in Retrieval Process What can be added to the process? Exploit the background information – complaints Merge with the larger background – Web and links Proposal in this work – Use Wikipedia as an example
8
Modeling
9
Generation of Background concepts Representation of Background concepts: Entities & Relations Ease the conversion from texts to concepts Facilitate unsupervised operations
10
Generation of Background concepts Concepts sources – Wikipedia Page: a document Title: central concept described by a document Links: A set of concepts / terms to other pages Word: Set of words
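As a minimal sketch of the page model on this slide (page, title, links, words), a Wikipedia page could be held in a small record. The class name `WikiPage`, the helper `parse_page`, and the simplified `[[Target|label]]` link handling are assumptions for illustration, not the author's implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    """One Wikipedia page: the title is the central concept it describes."""
    title: str
    text: str
    links: set = field(default_factory=set)  # concepts/terms pointing to other pages

    @property
    def words(self):
        # Set of lowercased word tokens appearing in the page text.
        return set(re.findall(r"[a-z0-9]+", self.text.lower()))


def parse_page(title, wikitext):
    """Hypothetical helper: extract link targets from simplified wiki markup."""
    # Capture the target of [[Target]] or [[Target|label]].
    targets = re.findall(r"\[\[([^\]|#]+)", wikitext)
    links = {t.strip() for t in targets}
    # Replace each link with its visible label to get plain text.
    plain = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", wikitext)
    return WikiPage(title=title, text=plain, links=links)


page = parse_page(
    "Tobacco",
    "Tobacco is regulated by the [[Food and Drug Administration|FDA]] "
    "and taxed under [[Excise]] law.",
)
```

Here `page.links` holds the two link targets and `page.words` holds the tokens of the visible text.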
11
Generation of Background concepts Facilitate lexical realization from texts to concepts: Surface concepts: Mentioned by a page Hidden concepts: Indexed by no pages but exist in pages
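One possible reading of the surface/hidden distinction: a concept mentioned in pages is a surface concept if some page is indexed under it, and a hidden concept if it appears in pages but has no page of its own. The sketch below assumes that reading; the function name `split_concepts` and the toy data are hypothetical.

```python
def split_concepts(pages):
    """pages maps a page title to the set of concepts mentioned on that page.

    Returns (surface, hidden) under the assumed reading:
    surface = mentioned concepts that have their own page,
    hidden  = mentioned concepts that no page is indexed under.
    """
    titles = set(pages)
    mentioned = set().union(*pages.values())
    surface = mentioned & titles
    hidden = mentioned - titles
    return surface, hidden


toy_pages = {
    "Tobacco": {"Nicotine", "Excise"},
    "Nicotine": {"Tobacco", "Alkaloid"},
}
surface, hidden = split_concepts(toy_pages)
```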
12
Generation of Background concepts Entities: Basic objects – named entities, locations, organizations, ... Definitions: e ⊂ c, e ≠ r, e ∈ role of relations
13
Generation of Background concepts Relations: Relationships between concept r⊂c, r≠e, r=, role i = e
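A hedged sketch of how entities and relations might be represented. The slide's definition of a relation is incomplete in this transcript, so the structure below (a named relation whose roles are each filled by an entity) is an assumption based only on "e ∈ role of relations" and "role i = e"; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Entity:
    """A basic object: a named entity, location, organization, and so on."""
    name: str
    kind: str = "concept"


@dataclass(frozen=True)
class Relation:
    """A relationship between concepts; every role is filled by an Entity."""
    name: str
    roles: Tuple[Entity, ...]

    def __post_init__(self):
        # Enforce role_i = e: a role may only hold an entity, never a relation.
        for role in self.roles:
            if not isinstance(role, Entity):
                raise TypeError("each role must be filled by an Entity")


company = Entity("Philip Morris", kind="organization")
product = Entity("cigarettes")
sells = Relation("sells", (company, product))
```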
14
Semantical Domain Semantical Domain: Group of inter-related concepts, as defined by Wikipedians Groups can be configured, reconfigured, depending on the size, nature of domains Represent background information of different size, nature, structures
15
Semantical Domain Operations: D = {page_i} where page_i ∈ E Overlap Subsumed Join
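If a domain is treated as a plain set of pages, the three operations named on this slide map naturally onto set operations. The slide does not define them, so the mapping below (overlap as intersection, subsumed as subset, join as union) is an assumption.

```python
def overlap(d1, d2):
    """Pages shared by both domains (empty set if the domains are disjoint)."""
    return d1 & d2


def subsumed(d1, d2):
    """True if every page of d1 also belongs to d2."""
    return d1 <= d2


def join(d1, d2):
    """A new domain containing the pages of both domains."""
    return d1 | d2


# Toy domains, each a set of page titles.
tobacco = frozenset({"Tobacco", "Nicotine", "Cigarette"})
health = frozenset({"Nicotine", "Lung cancer"})
smoking = frozenset({"Tobacco", "Cigarette"})
```

With these toy domains, `smoking` is subsumed by `tobacco`, while `tobacco` and `health` overlap only on the `Nicotine` page.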
16
Knowledge Extraction, Parsing Parsing: Conversion of syntactic parses into concept representations Dependency parsing Fill the entities and relations automatically
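To illustrate the conversion from a dependency parse to entities and relations, the sketch below takes an already-parsed sentence and emits (subject, relation, object) triples. The input format, the use of the Universal Dependencies labels `nsubj` and `obj`, and the function name are assumptions; the slides do not say which parser or label set was used.

```python
from collections import defaultdict


def relations_from_parse(tokens):
    """tokens: list of (text, dependency_label, head_index), head_index -1 for the root.

    Each head word with both a subject and an object dependent becomes a
    relation whose two roles are filled by those dependents.
    """
    args_by_head = defaultdict(dict)
    for text, dep, head in tokens:
        if dep in ("nsubj", "obj"):
            args_by_head[head][dep] = text

    triples = []
    for head, args in sorted(args_by_head.items()):
        if "nsubj" in args and "obj" in args:
            triples.append((args["nsubj"], tokens[head][0], args["obj"]))
    return triples


# Hand-written parse of "Defendant sold cigarettes".
parse = [
    ("Defendant", "nsubj", 1),
    ("sold", "root", -1),
    ("cigarettes", "obj", 1),
]
triples = relations_from_parse(parse)
```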
17
Entities & Relations Highlights of the process: Syntactic parsing of sentences Conversion from linguistic representation to concept representation Constrain the concept spaces by different sizes and scopes
18
Combining the concepts and background contexts Algorithm: (1) Filter the background text and request text; (2) Match the term set against Wikipedia; (3) Build the network of concepts and relations; (4) Combine into a single network and filter out unnecessary concepts; (5) Extract terms and concepts and expand the query string; (6) Submit the query to the retrieval engine
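The steps on this slide can be sketched end to end on toy data. Everything specific here is an assumption for illustration: the stopword list, the single-word lowercase concept keys, the `link_graph` dictionary standing in for Wikipedia, and the `min_support` filter used to drop weakly connected concepts.

```python
import re

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "all", "by", "on", "or"}


def filter_terms(text):
    """Filter a text down to its lowercased content words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expand_query(request, complaint, link_graph, min_support=2):
    """Expand the request terms with concepts linked from the matched terms."""
    request_terms = filter_terms(request)
    background_terms = filter_terms(complaint)

    # Match the combined term set against the concept inventory.
    seeds = (request_terms | background_terms) & set(link_graph)

    # Build one network: count how many seed concepts link to each neighbour.
    support = {}
    for seed in seeds:
        for neighbour in link_graph[seed]:
            support[neighbour] = support.get(neighbour, 0) + 1

    # Filter out weakly supported concepts, then expand the query.
    expansion = sorted(
        n for n, count in support.items()
        if count >= min_support and n not in request_terms
    )
    return sorted(request_terms) + expansion


toy_graph = {
    "tobacco": {"nicotine", "cigarette"},
    "advertising": {"marketing", "cigarette"},
    "minors": {"youth"},
}
query = expand_query(
    "documents on tobacco advertising",
    "The defendant targeted minors in advertising",
    toy_graph,
)
```

The returned term list would then be submitted to the retrieval engine. In this toy run only `cigarette` is added, because it is the only neighbour linked from two matched concepts.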
19
Conclusion
20
Challenges in legal retrieval Background contexts Generation of background concepts Project the context to concepts Expand the queries for retrieval
21
Conclusion Current work: Integration of language learning (not only parsing) and the concept generation process Large-scale construction of networks with the full document set in 3 languages on Grid: English: 1.7 million, Spanish: 300 thousand, Chinese: 200 thousand
22
Conclusion Current work: Experiments running on a corpus of 20M web pages for expanded links Generated language and concept spaces used in other Natural Language Technologies (NLT) TREC-Legal: Testing the integration of the knowledge base with the complaint text for queries TREC-Legal: Building a new matching mechanism (from KB induction) on a small, concise set of documents
23
Thank you QA