Russian Information Retrieval Evaluation Seminar (ROMIP) Igor Nekrestyanov, Pavel Braslavski CLEF 2010.


ROMIP at a glance
TREC-like Russian initiative, started in 2002
Several text and image collections
Participants per year (50+ in total), from academia and industry, with student support
~3,000 man-hours of evaluation (2009)
Remote participation + live meeting
Collections are freely available
Popular testbed for IR research in Russia
Related activities: summer school in IR

Why?
Russian specifics:
  Strong IR industry
  Limited research in academia
  Participation in global events is considered complicated for Russian groups (language barrier, costs, etc.)
  The Russian language was not covered in international campaigns
Objectives:
  Consolidate the IR community
  Stimulate research in the area
  Independent evaluation

Evaluation methodology
Similar to TREC approaches. What's special?
  Russian-language collections
  Some tasks are unique, e.g. news clustering, snippet generation
  Mix of widely used and custom metrics, e.g. snippet informativeness/readability
  Typically 2+ assessors (agreement 80-85%)
  Domain experts for legal-related tracks
  Rules and methodology are adjusted yearly
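The assessor-agreement figure above can be illustrated with a minimal sketch. This is not ROMIP's actual tooling; the document IDs and relevance labels below are hypothetical, and simple pairwise percent agreement is used as the measure.

```python
# Illustrative sketch (not ROMIP's tooling): percent agreement between
# two assessors over the documents both of them judged.
def percent_agreement(judgments_a, judgments_b):
    """Fraction of shared documents on which two assessors gave the same label."""
    shared = set(judgments_a) & set(judgments_b)
    if not shared:
        return 0.0
    matches = sum(1 for doc in shared if judgments_a[doc] == judgments_b[doc])
    return matches / len(shared)

# Hypothetical judgments for four documents.
a = {"d1": "relevant", "d2": "not_relevant", "d3": "relevant", "d4": "relevant"}
b = {"d1": "relevant", "d2": "not_relevant", "d3": "not_relevant", "d4": "relevant"}
print(percent_agreement(a, b))  # agree on 3 of 4 shared docs -> 0.75
```

In practice a chance-corrected measure such as Cohen's kappa is often reported alongside raw agreement, since percent agreement is inflated when one label dominates.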

Largest text collections (all evaluated within the ad-hoc search track)

Collection | Documents | Size (compressed) | Topics
Legal      |           | ~ Gb              |
By.Web     |           | ~ Gb              |
KM.RU      |           | ~ Gb              |

Text document tracks
Classic tracks, run for years:
  Ad-hoc text retrieval
  Text categorization (Web pages & sites, legal)
Experimental tracks every year:
  Snippet generation
  QA and fact extraction
  News clustering
  Search by sample document
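Ad-hoc retrieval tracks like the one above are typically scored with standard TREC-style metrics. As a hedged illustration, here is average precision over a single query for binary relevance judgments; the ranking and qrels are made up, not ROMIP data.

```python
# Sketch of average precision (the per-query component of MAP),
# assuming binary relevance judgments. Example data is hypothetical.
def average_precision(ranked_docs, relevant):
    """Mean of precision values at each rank where a relevant doc appears."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

ranking = ["d3", "d1", "d7", "d2"]   # system output, best first
qrels = {"d1", "d2"}                 # judged-relevant documents
# d1 found at rank 2 (precision 1/2), d2 at rank 4 (precision 2/4)
print(average_precision(ranking, qrels))  # (0.5 + 0.5) / 2 = 0.5
```

Averaging this value over all topics in a track gives MAP, one of the widely used metrics the methodology slide refers to.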

Snippets evaluation

Image collections
  Photo collection: images from Flickr
  Dups collection: frames extracted from 15 hrs of video

Image tracks
  Content-based image retrieval (started 2008): 750 tasks labeled
  Near-duplicate detection (started 2008): ~1,500 clusters
  Image annotation (started 2010): ~1,000 labeled images

ROMIP timeline
[Timeline figure: tracks introduced over the years (search, classification, legal, news clustering, snippets, QA, image tracks, image tagging); collections By.Web and KM.RU; 3,000 man-hours of evaluation]

Thank you! Questions?
Pavel Braslavski
Igor Nekrestyanov

RuSSIR (summer school in IR)
Annual event, 100+ participants
4th RuSSIR: Voronezh, September