©2007 H5
STIR: Simultaneous Achievement of High Precision and High Recall through Socio-Technical Information Retrieval
Robert S. Bauer, Teresa Jade & Mitchell P. Marcus
www.H5technologies.com
June 7, 2007

The e-Discovery IDEAL: High P with High R
Find every relevant document & only those docs that are relevant.
Desired: P = 0.8 (or better) and R = 0.8 (or better)
Acceptable: P = 2/3 (or better) and R = 2/3 (or better)
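To make these thresholds concrete, here is a minimal sketch (not part of the original deck) of how precision and recall would be computed and checked against the slide's targets; the document identifiers are hypothetical:

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """P = |retrieved ∩ relevant| / |retrieved|; R = |retrieved ∩ relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical review: 4 documents retrieved, 3 actually relevant overall.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc1", "doc2", "doc5"}
p, r = precision_recall(retrieved, relevant)
print(f"P = {p:.2f}, R = {r:.2f}")            # P = 0.50, R = 0.67
print("Desired met:", p >= 0.8 and r >= 0.8)   # the slide's P, R >= 0.8 target
print("Acceptable met:", p >= 2/3 and r >= 2/3)
```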

The e-Discovery REALITY
High P & Low R = RISK (important docs not retrieved)
Low P & High R = COST (many more documents must be reviewed)
(Text REtrieval Conference: TREC)

Agenda
Results
– TREC ad hoc (= typical)
– Queries typifying Communities of Practice (CoPs)
e-Discovery Approaches
– 5 Dimensions
– Linguistics of CoPs
Research Issues
– TREC
– AI
– Linguists
– Lawyers

Typical Results – ad hoc queries
(from Chapter 3, "Retrieval System Evaluation" by Chris Buckley and Ellen M. Voorhees, in TREC: Experiment and Evaluation in Information Retrieval, Voorhees & Harman, eds., MIT Press, 2005, p. 62, Fig. 3.1)
[Figure: Precision vs. Recall for 22 TREC topics and their average. Annotations: "Desired is Rare", "Acceptable < 10%".]

©2007 H5 Slide of 9 compared with STIR topical avg in 4 cases (I-IV) encompassing 42 topics Accuracy Metrics Most accurate TREC results for 20 of 22 topics in one test case Ideal TREC avg Acceptable F 1 = 2. (P. R)/(P+R) I II III IV 4

STIR compared with TREC IR
[Figure: Topical P & R results for one TREC case and 4 STIR cases, plotted as Precision vs. Recall, with the average P & R marked for each case; series: STIR, TREC.]

Recall Improvement
[Figure: Sampled-corpus tests for 12 topics in case I during STIR training, plotted as Precision vs. Recall. Annotation: "Retrieval Acceptable to lowest limit of statistical uncertainty".]
● STIR training provides substantial Recall improvement with acceptable Precision reduction.

Agenda
Results
– TREC ad hoc (= typical)
– Queries typifying Communities of Practice (CoPs)
e-Discovery Approaches
– 5 Dimensions
– Linguistics of CoPs
Research Issues
– TREC
– AI
– Linguists
– Lawyers

Dimensions of e-Discovery
– Subject Matter
– Legal Case
– Linguistics
– Documents
– Community

Dimensions of e-Discovery: Document Review
(dimensions engaged: Legal Case, Documents)
Example Systems:
– Manual (human) review conducted by attorneys
– Basic keyword searches targeted to legal issues
– Supervised learning with relevance feedback (see the sketch below)
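The slide names "supervised learning with relevance feedback" only in general terms; one classic instantiation is Rocchio feedback, sketched below under that assumption. The vectors and coefficients are hypothetical, and this is not a description of H5's actual system:

```python
import numpy as np

def rocchio(query_vec, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: move the query vector toward documents reviewers
    marked relevant and away from those marked non-relevant."""
    q = alpha * query_vec
    if len(rel_docs):
        q = q + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs):
        q = q - gamma * np.mean(nonrel_docs, axis=0)
    return np.clip(q, 0.0, None)  # keep term weights non-negative

# Hypothetical term-weight vectors over a 4-term vocabulary.
query  = np.array([1.0, 0.0, 0.0, 0.0])
rel    = np.array([[0.9, 0.8, 0.0, 0.0], [0.7, 0.6, 0.1, 0.0]])
nonrel = np.array([[0.0, 0.0, 0.9, 0.8]])
print(rocchio(query, rel, nonrel))  # weight shifts toward terms in relevant docs
```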

Dimensions of e-Discovery: Expert Search
(dimensions engaged: Subject Matter, Legal Case, Documents)
Example Systems:
– Subject matter experts review results under legal team direction
– Domain-specific lexicons used (see the sketch below)
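As an illustration of the domain-specific lexicons the slide mentions, here is a hedged sketch of lexicon-based query expansion; the lexicon entries and function are invented for the example:

```python
# Hypothetical domain lexicon a subject matter expert might supply.
DOMAIN_LEXICON = {
    "swap": ["credit default swap", "CDS", "derivative"],
    "write-down": ["impairment", "charge-off"],
}

def expand_query(terms: list[str]) -> list[str]:
    """Append domain synonyms for any term found in the lexicon."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(DOMAIN_LEXICON.get(term.lower(), []))
    return expanded

print(expand_query(["swap", "counterparty"]))
# ['swap', 'counterparty', 'credit default swap', 'CDS', 'derivative']
```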

Dimensions of e-Discovery: Model Meaning
(dimensions engaged: Subject Matter, Legal Case, Linguistics, Documents)
Example Systems:
– Supervised learning with relevance feedback and semantic analysis
– Semantic search (see the sketch below)
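The slide names semantic search without detail; a common scoring basis is cosine similarity between semantic vectors, sketched below. The vectors are hypothetical stand-ins for the output of a semantic analysis, not the deck's method:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical semantic vectors for a query and two documents.
query = np.array([0.2, 0.9, 0.1])
doc_a = np.array([0.25, 0.85, 0.05])  # paraphrases the query's meaning
doc_b = np.array([0.90, 0.10, 0.40])  # shares keywords, different meaning
print(cosine(query, doc_a))  # high (~0.997): semantically close
print(cosine(query, doc_b))  # low (~0.34): keyword overlap is not enough
```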

Dimensions of e-Discovery: Model Communities
(dimensions engaged: Subject Matter, Legal Case, Linguistics, Documents, Community)
Example System:
– Socio-Technical-IR (STIR)

Dimensions of e-Discovery: Socio-Technical-IR
(dimensions highlighted: Linguistics, Community)
Non-computational Linguistic Disciplines:
– Pragmatics
– Socio-Linguistics
– Ethno-Methodology
– Discourse Analysis
A community of practice is:
– a diverse group of people
– engaged in real work
– over a significant period of time
– developing their own tools, language, and processes
– during which they build things, solve problems, learn and invent
– evolving a practice that is highly skilled and highly creative

Agenda
Results
– TREC ad hoc (= typical)
– Queries typifying Communities of Practice (CoPs)
e-Discovery Approaches
– 5 Dimensions
– Linguistics of CoPs
Research Issues
– TREC
– AI
– Linguists
– Lawyers

Research Issues
TREC
– Nature of the relatively rare high-P with high-R queries
– Measuring both recall and precision effectively
AI
– Knowledge-Based (Expert) Systems that codify linguistic expertise
– Characterize practice communities of subject matter experts
– Investigate combination systems applied to different types of topics
Linguists
– Identify and characterize different types of topics and map them to system types
– Language patterns in communities as well as subject matter fields
– Defining categories in concrete terms
Lawyers
– Defining categories in concrete terms
– Integration of technology and processes

Back-Up

STIR Analysis: CoPs' Enunciatory Language
[Diagram: Relevant Document Text linked to State of Affairs, Object, Process, Action, Fact, Event.]