Current and Future Research Directions University of Tehran Database Research Group 1 October 2009 Abolfazl AleAhmad, Ehsan Darrudi, Hadi Amiri, Azadeh Shakery, Farhad Oroumchian

Outline
Why Persian IR
Language Resources for Persian
Hamshahri at CLEF 2009
  participants
  results
  pool analysis
Future works

Persian in the Middle East
Source: Internet World Stats, User Population Growth on the Web

Why Persian IR
Updated in June 2009 from Internet World Stats

The Persian Language
A branch of the Indo-European languages
Official language of Iran, Afghanistan and Tajikistan
Its morphological analysis is comparatively difficult
  The word "خبر" has two plural forms:
    Persian rules: "خبرها"
    Arabic rules: "اخبار"
Writing style issues:
  e.g. "می شود" and "میشود" are the same
  e.g. "کتابها" and "کتاب ها" are the same
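The two issues on this slide can be illustrated with a minimal Python sketch. It is not the group's actual preprocessing: the function names, the regular expressions and the one-entry lookup table are assumptions made for illustration, and the only word pairs used are the ones quoted on the slide. The spacing rules are deliberately naive (they would also join an unrelated standalone word "می" to its neighbour).

```python
import re

# Hypothetical lookup for Arabic-rule ("broken") plurals; it contains only
# the example pair from the slide and is not a real lexicon.
BROKEN_PLURALS = {"اخبار": "خبر"}

def normalize_spacing(text):
    """Join the prefix 'می' and the plural suffix 'ها' to their word.

    A space or a zero-width non-joiner (U+200C) between the affix and the
    word is dropped, so both spellings map to one form.
    """
    # Prefix "می" at the start of a token, followed by space/ZWNJ and a word
    text = re.sub(r"(?<!\S)می[ \u200c]+(?=\S)", "می", text)
    # Suffix "ها" written as a separate token after a word
    text = re.sub(r"(?<=\S)[ \u200c]+ها(?!\S)", "ها", text)
    return text

def singular(word):
    """Map a plural to its singular: lookup first, then suffix stripping."""
    if word in BROKEN_PLURALS:
        return BROKEN_PLURALS[word]
    if word.endswith("ها") and len(word) > 3:
        return word[:-2].rstrip(" \u200c")
    return word

print(normalize_spacing("می شود") == "میشود")    # both spellings agree
print(normalize_spacing("کتاب ها") == "کتابها")
print(singular("خبرها"), singular("اخبار"))       # both reduce to "خبر"
```

The Persian-rule plural is handled by stripping a suffix, while the Arabic-rule plural needs a dictionary entry, which is one reason morphological analysis is harder than for languages with purely suffixing plurals.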

Persian Test Collections
Text IR domain
  Ghavanin (domain specific)
  Hamshahri (news)
  Hamshahri 2 (recently developed, 50 topics)
Web IR domain
  FWT1m (.ir Web), nearly 1 million docs
NLP domain
  Bijankhan (2.7 million words)

Hamshahri at CLEF 2008 &
News articles of Hamshahri newspaper from year 1996 to
bilingual topics
166,000+ documents

Hamshahri 2
News articles of Hamshahri newspaper from year 1996 to
bilingual topics
320,000 documents (2 times larger, ~1.5 GB)
Richer document tags

Participants
1. JHU-APL
  N-gram tokenization (skip n-grams for n=5)
2. UniNE
  Developed "light" and "plural" stemmers and blind query expansion
3. Open Text
  Savoy's stemmer and 4-grams
  Pool analysis (with top 10,000 retrieved docs)
4. Qazvin IAU
  Perstem for monolingual runs (Prec +91%, Rec +43%)
  "Query Wikification" algorithm for bilingual runs
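Several participants indexed character n-grams instead of (or alongside) stemmed words. The sketch below shows overlapping character n-grams and one possible form of skip n-grams, in which a single interior character is replaced by a placeholder. The slides do not specify the participants' exact variants, so the function names, the "*" placeholder and the skip scheme are assumptions for illustration only.

```python
def char_ngrams(word, n):
    """Return the overlapping character n-grams of a word.

    Words of length n or shorter are returned whole, so short words
    still produce one indexing term.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def skip_ngrams(word, n, placeholder="*"):
    """Return n-grams with one interior character masked (assumed scheme).

    The first and last characters of each n-gram are kept; each interior
    position is masked in turn.
    """
    grams = []
    for gram in char_ngrams(word, n):
        if len(gram) < n:
            continue  # word too short to mask an interior character
        for j in range(1, n - 1):
            grams.append(gram[:j] + placeholder + gram[j + 1:])
    return grams

print(char_ngrams("retrieval", 4))
print(char_ngrams("کتابها", 4))
print(skip_ngrams("abcde", 5))
```

Because n-grams need no language-specific rules, they give a baseline for a language such as Persian where a reliable stemmer was not yet available.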

Final Results

Final Results

Pool of CLEF 2008

Pool of CLEF 2009
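For readers unfamiliar with pooling: in TREC/CLEF-style evaluations the documents judged for each topic are usually the union of the top-ranked documents from every submitted run. A minimal sketch of that construction follows. The data layout, the function name `build_pool` and the depth value are hypothetical; the slides do not state the pool depth used for the Persian track.

```python
def build_pool(runs, depth):
    """Build a judgment pool from several runs.

    runs:  {run_name: {topic_id: [doc_id, ...] ranked best first}}
    depth: number of top-ranked documents taken from each run per topic
    Returns {topic_id: set of doc_ids to be judged}.
    """
    pool = {}
    for ranking_by_topic in runs.values():
        for topic, ranked_docs in ranking_by_topic.items():
            pool.setdefault(topic, set()).update(ranked_docs[:depth])
    return pool

# Toy example with made-up run names and document ids
runs = {
    "runA": {"t1": ["d1", "d2", "d3"]},
    "runB": {"t1": ["d2", "d4", "d5"]},
}
pool = build_pool(runs, depth=2)
print(sorted(pool["t1"]))
```

Documents outside the pool are typically treated as non-relevant, which is why the size and diversity of the pool matter when a collection is reused by systems that did not contribute to it.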

Pool Comparison
Quoted from: Stephen Tomlinson. German, French, English and Persian Retrieval Experiments at CLEF 2008 & Working Notes for the CLEF 2008 & 2009 Workshops.

Pool Comparison
Quoted from: Stephen Tomlinson. German, French, English and Persian Retrieval Experiments at CLEF 2008 & Working Notes for the CLEF 2008 & 2009 Workshops.

Future Works
Using Hamshahri 2 for CLEF 2010 (50 training topics)
A campaign on the Persian Web IR collection
Creation of an English-Persian parallel corpus
Creation of a comparable corpus
A stemmer for the Persian language

Thanks ?