Building a Domain-Specific Document Collection for Evaluating Metadata Effects on Information Retrieval Walid Magdy, Jinming Min, Johannes Leveling, Gareth.

Presentation transcript:

Building a Domain-Specific Document Collection for Evaluating Metadata Effects on Information Retrieval
Walid Magdy, Jinming Min, Johannes Leveling, Gareth Jones
School of Computing, Dublin City University, Ireland
20 May 2010, LREC 2010

Outline
- CNGL
- Objective
- Data collection preparation and overview
- IR test collection design
- Baseline experiments
- Summary

CNGL
- Centre for Next Generation Localisation (CNGL)
- 4 universities: DCU, TCD, UCD, and UL
- Team: 120 PhD students, postdocs, and PIs
- Supported by Science Foundation Ireland (SFI)
- 9 industrial partners: IBM, Microsoft, Symantec, …
- Objective: automation of the localisation process
- Technologies: MT, AH, IR, NLP, Speech, and Dev.

Objective
Create a collection of data that is:
1. Suitable for IR tasks
2. Suitable for other research fields (AH, NLP)
3. Large enough to produce conclusive results
4. Associated with defined evaluation strategies
Prepare the collection from freely available data:
- YouTube
- Domain specific (basketball)
Build a standard IR test collection (document set + topic set + relevance assessments)

YouTube Video Features
Document:
- Tags
- Video URL
- Video title
- Posting user
- Posting date
- Description
- Category
- Number of views
- Length
- Responded videos
- Related videos
- Comments
- Number of ratings
- Number of favorited

Methodology for Crawling Data
- 50 NBA-related queries used to search YouTube
- First 700 results per query crawled, along with their related videos
- Crawled pages parsed and metadata extracted
- Extracted data represented in XML format
- Non-sport category results filtered out
Queries used:
- NBA, NBA Highlights, NBA All Stars, NBA fights
- 15 top-ranked NBA players + Jordan + Shaq
- 29 NBA teams
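The filtering and XML steps above can be sketched as follows. This is a minimal illustration only: the record layout, the element names, and the category label "Sports" are assumptions, not the collection's actual schema, and the crawling itself is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the real collection's XML schema is not given in the slides.
TEXT_FIELDS = ("url", "title", "category", "description")


def is_sports(video: dict) -> bool:
    """Keep only videos whose category is sports (label assumed to be 'Sports')."""
    return video.get("category", "").strip().lower() == "sports"


def to_xml(video: dict) -> str:
    """Serialise one parsed video page as a small XML document."""
    doc = ET.Element("video")
    for field in TEXT_FIELDS:
        ET.SubElement(doc, field).text = video.get(field, "")
    tags = ET.SubElement(doc, "tags")
    for tag in video.get("tags", []):
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(doc, encoding="unicode")


# Made-up records standing in for parsed crawl results.
crawled = [
    {"url": "https://example.org/v/1", "title": "NBA Highlights",
     "category": "Sports", "description": "Top plays", "tags": ["NBA", "dunk"]},
    {"url": "https://example.org/v/2", "title": "Cooking show",
     "category": "Howto", "description": "Dinner", "tags": []},
]

kept = [to_xml(v) for v in crawled if is_sports(v)]
```

Filtering before serialisation keeps non-sport pages out of the XML collection entirely, which matches the order of the steps on the slide.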

Data Collection Overview
- Crawled video pages: 61,340
- Max crawled related/responded video pages: 20
- Max crawled comments for a given video page: 500
- Comments associated with the contributing user's ID
- Crawled user profiles: ≈ 250k

XML sample

Topics Creation
- 40 topics (queries) created
- Specific topics related to the NBA
- TREC topic = query (title) + description + narrative
Example topic:
Title: Michael Jordan best dunks
Description: Find the best dunks of Michael Jordan's NBA career. These can be collections of dunks from matches or from dunk contests he participated in.
Narrative: A relevant video should contain at least one dunk by Jordan. Videos of dunks by other players are not relevant, and neither are plays by Jordan other than dunks.
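The three-part topic structure can be represented roughly as below. The sketch renders a topic in the classic TREC ad hoc layout (`<top>`, `<num>`, `<title>`, `<desc>`, `<narr>`); the topic number and the exact split between description and narrative are assumptions made for illustration, since the slides do not show the collection's topic file.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    num: int          # hypothetical topic number
    title: str        # the short query
    description: str  # one or two sentences stating the information need
    narrative: str    # what counts as relevant and what does not

    def to_trec(self) -> str:
        """Render the topic in the classic TREC ad hoc topic layout."""
        return (
            "<top>\n"
            f"<num> Number: {self.num}\n"
            f"<title> {self.title}\n"
            f"<desc> Description:\n{self.description}\n"
            f"<narr> Narrative:\n{self.narrative}\n"
            "</top>"
        )


topic = Topic(
    num=1,
    title="Michael Jordan best dunks",
    description="Find the best dunks of Michael Jordan's NBA career.",
    narrative="A relevant video should contain at least one dunk by Jordan.",
)
rendered = topic.to_trec()
```

Only the title would normally be issued as the query; the description and narrative guide the assessors.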

Relevance Assessment
4 indexes created:
- Title
- Title + Tags
- Title + Tags + Description
- Title + Tags + Description + Related videos titles
5 different retrieval models used
20 different result lists, each containing 60 documents
Result lists merged with random ranking
122 to 466 documents assessed per topic
1 to 125 relevant documents per topic (avg. = 23)
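The merging step can be sketched as a standard pooling procedure. This is an assumed reading of the slide: take the top 60 documents from each of the 20 result lists (4 indexes × 5 retrieval models), drop duplicates, and shuffle so assessors cannot tell which system or rank a document came from. The function name and the toy runs are hypothetical.

```python
import random


def build_pool(result_lists, depth=60, seed=0):
    """Union of the top-`depth` documents of every run, in random order."""
    pooled = set()
    for ranked in result_lists:
        pooled.update(ranked[:depth])
    pool = sorted(pooled)               # fixed starting order so the shuffle is reproducible
    random.Random(seed).shuffle(pool)   # hide source system and original rank
    return pool


# Toy example with three short runs and a pool depth of 2.
runs = [["v1", "v2", "v3"], ["v2", "v4"], ["v5", "v1"]]
pool = build_pool(runs, depth=2)
```

With 20 lists of 60 documents the pool can hold at most 1,200 documents per topic; overlap between the lists would explain why only 122 to 466 were assessed.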

Baseline Experiments
Search 4 different indexes:
- Title
- Title + Tags
- Title + Tags + Description
- Title + Tags + Description + Related videos titles
Indri retrieval model used to rank results
1000 results retrieved for each search
Mean average precision (MAP) used to compare the results
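For reference, MAP can be computed as in the sketch below. It uses the standard definition (average precision per topic, averaged over topics); the function names and the toy data are illustrative and are not taken from the authors' evaluation scripts.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by all relevant documents penalises relevant items never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Average the per-topic AP over every topic in the relevance assessments."""
    return sum(average_precision(run.get(topic, []), rel)
               for topic, rel in qrels.items()) / len(qrels)


# Toy data: two topics with made-up document IDs.
run = {"t1": ["a", "b", "c", "d"], "t2": ["x", "y"]}
qrels = {"t1": {"a", "c"}, "t2": {"y"}}
map_score = mean_average_precision(run, qrels)
```

In this toy case topic t1 scores (1/1 + 2/3) / 2 and topic t2 scores 1/2, so the MAP is their mean.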

Results

Summary (new language resource)
- 61,340 XML docs
- 40 topics + rel. assess.
- 250,000 user profiles
Comments, Ratings, # Views, Metadata; IR test set; AH/Personalisation; Sentiment Analysis; Videos; Multimedia processing; Reranking using ML; Tags; NER
Top bigrams in “Tags” field:
Kobe Bryant, NBA Basketball, Lebron James, Michael Jordan, Los Angeles, All Star, Chicago Bulls, Boston Celtics, Allen Iverson, Angeles Lakers, Slam Dunk, Basketball NBA, Dwight Howard, Vince Carter, Dwyane Wade, Kevin Garnett, Toronto Raptors, Houston Rockets, Miami Heat, O’Neal, Phoenix Suns, Detroit Pistons, Tracy Mcgrady, Yao Ming, Chris Paul, Amazing Highlights, New York, Pau Gasol, Cleveland Cavaliers, NBA Amazing

Questions & Answers
Q: Is this collection available for free?
A: No
Q: Nothing could be provided?
A: Scripts + Topics + Rel. assess. (needs updating)
Q: Any other questions?
A: …

Thank you

YouTube Statistics (1/8)
Min: 13/09/2005 | Max: 03/03/2009

YouTube Statistics (2/8)
Min | Max | Mean | Median | Std Dev

YouTube Statistics (3/8)
Min: 0 | Max: 21,710,757 | Mean: 35,707 | Median: 3,329 | Std Dev: 221,091

YouTube Statistics (4/8) MinMaxMeanMedianStd Dev 023,

YouTube Statistics (5/8)
Min: 00:00:00 | Max: 02:38:20 | Mean: 00:02:53 | Median: 00:02:10 | Std Dev: 00:02:54

YouTube Statistics (6/8) MinMaxMeanMedianStd Dev 027,

YouTube Statistics (7/8) MinMaxMeanMedianStd Dev 05451

YouTube Statistics (8/8) MinMaxMeanMedianStd Dev 072,

YouTube Statistics (9/9)
Min | Max | Mean | Median | Std Dev