Enhance legal retrieval applications with an automatically induced knowledge base
Ka Kan Lo


Contents
- Introduction
- Practice in legal retrieval
- Generation of background concepts
- Combining concepts and contexts
- Conclusion

Introduction
Why is advanced legal retrieval (e-discovery) needed?
- Document collections
- Legal requirements
- Efficiency

Introduction
What are the challenges?
- Explosive growth in document size
- Extensive document sources
- Expanding collection of document formats
- Informal language

Introduction
Opportunities:
- Utilization of background contexts
- Search documents deeply for every possible piece of evidence
- Example: in TREC, the complaint serves as background information
- More context information: the Web and its links

Practice in Retrieval Process
TREC legal track practice:
- Defendants devise queries
- Plaintiffs' turn
- Final queries for the production request
- Documents retrieved

Practice in Retrieval Process
What can be added to the process?
- Exploit the background information: the complaints
- Merge with the larger background: the Web and links
- Proposal in this work: use Wikipedia as an example

Modeling

Generation of Background concepts
Representation of background concepts: entities and relations
- Ease the conversion from texts to concepts
- Facilitate unsupervised operations

Generation of Background concepts
Concept source: Wikipedia
- Page: a document
  - Title: the central concept described by a document
  - Links: a set of concepts / terms pointing to other pages
- Word: set of words
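
The page structure above can be sketched as a small data type. This is only an illustration under stated assumptions: the names `WikiPage` and `make_page` and the simple whitespace tokenization are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class WikiPage:
    """Hypothetical container for one Wikipedia page (not the author's code)."""
    title: str                                     # central concept described by the page
    links: Set[str] = field(default_factory=set)   # concepts / terms pointing to other pages
    words: Set[str] = field(default_factory=set)   # set of words found in the page text


def make_page(title: str, text: str, links: Iterable[str]) -> WikiPage:
    """Build a page from raw text; tokenization here is a naive assumption."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    words.discard("")
    return WikiPage(title=title, links=set(links), words=words)


page = make_page(
    "Patent",
    "A patent is a form of intellectual property.",
    ["Intellectual property"],
)
```

Here the title stands for the concept, the link set gives its neighbours, and the word set is what a lexical matcher would compare against.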

Generation of Background concepts
Facilitate lexical realization from texts to concepts:
- Surface concepts: mentioned by a page
- Hidden concepts: indexed by no pages but exist in pages
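
One possible reading of the surface / hidden distinction, sketched in code. It assumes that a surface concept is a mentioned term that has its own page title, and that a hidden concept is a mentioned term with no page of its own. The function name `classify_concepts` and the toy data are hypothetical.

```python
def classify_concepts(page_titles, mentions_by_page):
    """Split mentioned terms into surface and hidden concepts.

    Assumption: 'indexed' means 'has its own page title'.
    """
    titles = set(page_titles)
    mentioned = set().union(*mentions_by_page.values())
    surface = mentioned & titles   # mentioned and backed by a page
    hidden = mentioned - titles    # appear in page text only
    return surface, hidden


titles = ["Patent", "Lawsuit", "Intellectual property", "Plaintiff"]
mentions = {
    "Patent": {"Intellectual property", "prior art"},
    "Lawsuit": {"Plaintiff", "prior art"},
}
surface, hidden = classify_concepts(titles, mentions)
```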

Generation of Background concepts
Entities: basic objects such as named entities, locations, organizations, ...
Definitions: e ⊂ c, e ≠ r, e ∈ role of relations

Generation of Background concepts
Relations: relationships between concepts
r ⊂ c, r ≠ e, r=, role_i = e
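
The constraints on entities and relations can be sketched as types: both are kinds of concept, an entity is never a relation, and every role of a relation is filled by an entity. Representing a relation as a name plus a tuple of roles is an assumption made for illustration; the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    name: str


@dataclass(frozen=True)
class Entity(Concept):
    """Basic object: named entity, location, organization, ..."""


@dataclass(frozen=True)
class Relation(Concept):
    """Relationship between concepts; the roles tuple is an assumed shape."""
    roles: Tuple[Entity, ...] = ()

    def __post_init__(self):
        # role_i = e: every role must be filled by an entity
        if not all(isinstance(r, Entity) for r in self.roles):
            raise TypeError("every role of a relation must be an Entity")


sues = Relation("sues", (Entity("plaintiff"), Entity("defendant")))
```

Because `Entity` and `Relation` are distinct subclasses of `Concept`, an entity and a relation with the same name still compare as different objects.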

Semantical Domain
- A group of inter-related concepts, as defined by Wikipedians
- Groups can be configured and reconfigured, depending on the size and nature of the domains
- Represents background information of different sizes, natures, and structures

Semantical Domain
Operations: D = {page_i} where page_i ∈ E
- Overlap
- Subsumed
- Join
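
If a domain is treated as a plain set of pages, the three operations map naturally onto set operations. This mapping (overlap as intersection, subsumed as subset, join as union) is an assumption, since the slide does not define them; the function names and page titles are hypothetical.

```python
def overlap(d1: frozenset, d2: frozenset) -> frozenset:
    """Pages shared by both domains."""
    return d1 & d2


def subsumed(d1: frozenset, d2: frozenset) -> bool:
    """True when every page of d1 also belongs to d2."""
    return d1 <= d2


def join(d1: frozenset, d2: frozenset) -> frozenset:
    """Merge two domains into a larger one."""
    return d1 | d2


ip_law = frozenset({"Patent", "Trademark", "Copyright"})
patent_law = frozenset({"Patent", "Prior art"})

shared = overlap(ip_law, patent_law)
merged = join(ip_law, patent_law)
```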

Knowledge Extraction, Parsing
Parsing: conversion of syntactic parses into concept representations
- Dependency parsing
- Fill the entities and relations automatically

Entities & Relations
Highlights of the process:
- Syntactic parsing of sentences
- Conversion from linguistic representation to concept representation
- Constrain the concept spaces by different sizes and scopes
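
A minimal sketch of the conversion step, assuming the dependency parse is already available as (head, label, dependent) arcs with Universal Dependencies style labels `nsubj` and `obj`. The parser itself is out of scope here, and the subject-verb-object rule is a simplification chosen for illustration, not the method used in the presentation.

```python
from collections import defaultdict


def arcs_to_relations(arcs):
    """Turn dependency arcs into (entity, relation, entity) triples."""
    args = defaultdict(dict)
    for head, label, dep in arcs:
        if label in ("nsubj", "obj"):
            args[head][label] = dep
    # A head with both a subject and an object becomes a relation
    relations = [
        (roles["nsubj"], head, roles["obj"])
        for head, roles in args.items()
        if "nsubj" in roles and "obj" in roles
    ]
    # Entities are whatever fills the roles of the extracted relations
    entities = {x for subj, _, obj in relations for x in (subj, obj)}
    return entities, relations


# Hand-written arcs for "The plaintiff sued the company"
arcs = [
    ("sued", "nsubj", "plaintiff"),
    ("sued", "obj", "company"),
    ("plaintiff", "det", "The"),
    ("company", "det", "the"),
]
entities, relations = arcs_to_relations(arcs)
```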

Combining the concepts and background contexts
Algorithm:
1. Filter the background text and request text
2. Match the term set to Wikipedia
3. Build the network of concepts and relations
4. Combine into a single network and filter out unnecessary concepts
5. Extract terms and concepts and expand the query string
6. Submit the query for retrieval
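
The steps above can be sketched end to end on toy data. Everything concrete here is an assumption: the stopword list, the in-memory `wiki` dictionary standing in for Wikipedia, the `min_support` filter used to drop unnecessary concepts, and the OR-joined query syntax. The final retrieval call is left out because it depends on the search engine.

```python
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "all", "on"}


def filter_terms(text):
    """Step 1: lowercase, strip punctuation, drop stopwords."""
    tokens = (t.strip(".,;:()\"'").lower() for t in text.split())
    return {t for t in tokens if t and t not in STOPWORDS}


def expand_query(request, background, wiki_links, min_support=2):
    terms = filter_terms(request) | filter_terms(background)
    # Step 2: match terms against page titles
    matched = {title for title in wiki_links if title.lower() in terms}
    # Steps 3-4: one combined network of matched concepts and their links
    network = {c: set(wiki_links[c]) for c in matched}
    support = {}
    for neighbours in network.values():
        for n in neighbours:
            support[n] = support.get(n, 0) + 1
    # Keep only linked concepts shared by enough matched pages
    kept = sorted(n for n, k in support.items()
                  if k >= min_support and n not in network)
    # Step 5: expand the original request terms with the kept concepts
    base = sorted(filter_terms(request))
    return " OR ".join(base + ['"%s"' % n for n in kept])


wiki = {
    "Patent": {"Intellectual property", "Invention"},
    "Trademark": {"Intellectual property", "Brand"},
    "Tobacco": {"Nicotine"},
}
query = expand_query(
    "documents on patent licensing",
    "The complaint alleges trademark and patent infringement.",
    wiki,
)
```

In this toy run only "Intellectual property" is linked from both matched pages, so it is the single concept added to the query.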

Conclusion

- Challenges in legal retrieval
- Background contexts
- Generation of background concepts
- Project the context to concepts
- Expand the queries for retrieval

Conclusion
Current work:
- Integration of language learning (not only parsing) and the concept generation process
- Large-scale construction of networks with the full document set in 3 languages on Grid:
  - English: 1.7 million
  - Spanish: 300 thousand
  - Chinese: 200 thousand

Conclusion
Current work:
- Experiments running on a corpus of 20M web pages for expanded links
- Generated language and concept spaces used in other Natural Language Technologies (NLT)
- TREC-Legal: testing the integration of the knowledge base with the complaint text for queries
- TREC-Legal: building a new matching mechanism (from KB induction) on a small, concise set of documents

Thank you
Q&A