Information Retrieval and Extraction 2010
Term Project – Modern Web Search
Advisor: 陳信希
TA: 許名宏 & 王界人
Overview (in English)
Goal
–Using advanced approaches to enhance Okapi-BM25
Group
–1~3 person(s) per group; send the member name list to the TA
Approach
–No limitations; any resource on the Web may be used
Date of system demo and report submission
–6/24 Thursday (provisional)
Grading criteria
–Originality and reasonableness of your approach
–Effort for implementation, per person
–Retrieval performance (training & testing)
–Completeness of the report (division of work, analysis of results)
Overview (in Chinese; translated)
Project goal
–Use advanced IR techniques to improve the effectiveness of Okapi-BM25
Grouping
–1~3 people per group; the group leader should give the member list (student IDs and names) to the TA
Approach
–No restrictions; any toolkit or resource on the Web may be used
Demo and report submission
–6/25 Friday
Grading criteria
–Originality and reasonableness of the approach used
–Effort of implementation, per person
–Retrieval effectiveness (training, testing)
–Completeness of the report, division of work, and analysis of retrieval results
Content of Report
Detailed description of your approach
Parameter settings (if parametric)
System performance on the training topics
–The baseline (Okapi-BM25) performance
–The performance of your approach
Division of the work (who did what)
What you have learned (reflections)
Others (optional)
Baseline Implementation: Okapi-BM25
Parametric probabilistic model
Parameter setting
–k1 = 1.2, k2 = 0, k3 = 0, b = 0.75, R = r = 0 (initial guess)
Stemming: Porter's stemmer
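To make these settings concrete, here is a minimal sketch of the resulting scoring function (names and data structures are illustrative, not a required API). With R = r = 0 the term weight reduces to log((N - n + 0.5) / (n + 0.5)), and with k2 = k3 = 0 the query-side factors drop out entirely.

import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, N,
               k1=1.2, b=0.75):
    # Okapi BM25 with the baseline settings above.
    # doc_tf: term -> frequency in this document; df: term -> document
    # frequency in the collection; N: number of documents in the collection.
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)   # length normalization
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        n = df.get(term, 0)
        w = math.log((N - n + 0.5) / (n + 0.5))      # RSJ weight with R = r = 0
        score += w * (k1 + 1) * tf / (K + tf)
    return score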
Possible Approaches
Pseudo relevance feedback (PRF)
–Supported by the Lemur API
–Simple and effective, but no originality (a minimal sketch follows this slide)
Query expansion
–Using external resources, e.g., WordNet, Wikipedia, the AOL query log, etc.
Word sense disambiguation in documents/queries
Combining results from two or more IR systems
Latent semantic analysis/indexing (LSA/LSI)
Others
–Learning to rank, clustering/classification, …
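For the PRF item above, a toolkit-free sketch of the basic idea (the Lemur API provides its own PRF support; the function name and cutoffs here are illustrative):

from collections import Counter

def prf_expand(query_terms, ranked_doc_tokens, fb_docs=10, fb_terms=5):
    # Pseudo relevance feedback: assume the top-ranked documents are
    # relevant and append their most frequent unseen terms to the query.
    pool = Counter()
    for tokens in ranked_doc_tokens[:fb_docs]:
        pool.update(tokens)
    new_terms = [t for t, _ in pool.most_common()
                 if t not in query_terms][:fb_terms]
    return list(query_terms) + new_terms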
Experimental Dataset
A partial collection of TREC WT10g
–~10k documents
–Link information is provided
30 topics for system development (training)
Another 20 topics at the demo (testing)
Topic Example
<top>
<num> Number: 476
<title> Jennifer Aniston
<desc> Description:
Find documents that identify movies and/or television programs that Jennifer Aniston has appeared in.
<narr> Narrative:
Relevant documents include movies and/or television programs that Jennifer Aniston has appeared in.
</top>
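A small parsing sketch for topics in this format (field tags as shown above; if the distributed topic files mark fields differently, the patterns will need adjusting):

import re

def parse_topics(text):
    # Split a topic file into dicts with num/title/desc/narr fields.
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, re.S):
        get = lambda pat: re.search(pat, block, re.S).group(1).strip()
        topics.append({
            "num":   get(r"Number:\s*(\d+)"),
            "title": get(r"<title>(.*?)<desc>"),
            "desc":  get(r"Description:\s*(.*?)<narr>"),
            "narr":  get(r"Narrative:\s*(.*)"),
        })
    return topics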
Document Example
<DOC>
<DOCNO>WTX010-B01-2</DOCNO>
<DOCOLDNO>IA B</DOCOLDNO>
<DOCHDR>
text/html 264
HTTP/1.0 200 OK
Date: Sunday, 16-Feb-97 18:19:32 GMT
Server: NCSA/SMI-1.0
MIME-version: 1.0
Content-type: text/html
Last-modified: Friday, 02-Feb-96 19:51:15 GMT
Content-length: 82
</DOCHDR>
1 Mr. Delleney did not participate in deliberation of this candidate.
</DOC>
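A matching sketch for the raw document files, dropping the DOCHDR block so HTTP headers are not indexed (a sensible-seeming preprocessing choice, not a project requirement):

import re

def parse_documents(text):
    # Yield (docno, body text) pairs from one raw document file.
    for block in re.findall(r"<DOC>(.*?)</DOC>", text, re.S):
        docno = re.search(r"<DOCNO>\s*(\S+)\s*</DOCNO>", block).group(1)
        body = re.sub(r"<DOCHDR>.*?</DOCHDR>", " ", block, flags=re.S)
        body = re.sub(r"<DOC(?:NO|OLDNO)>.*?</DOC(?:NO|OLDNO)>", " ", body,
                      flags=re.S)
        body = re.sub(r"<[^>]+>", " ", body)   # strip any remaining markup
        yield docno, body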
Link Information
For approaches with PageRank/HITS (a PageRank sketch follows this slide)
In-links
–A line "A B C" means B and C contain links to A
–ex: WTX010-B WTX010-B WTX010-B
Out-links
–A line "A B C" means A contains links pointing to B and C
–ex: WTX010-B WTX010-B01-89 WTX010-B01-119
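For the PageRank option, a plain power-iteration sketch over the out-link format just described (the damping factor and iteration count are conventional defaults, not requirements):

def read_out_links(path):
    # Each line "A B C ..." means document A links to B, C, ...
    graph = {}
    with open(path) as f:
        for line in f:
            ids = line.split()
            if ids:
                graph[ids[0]] = ids[1:]
    return graph

def pagerank(graph, d=0.85, iterations=50):
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    pr = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - d) / len(nodes) for n in nodes}
        for src in nodes:
            targets = graph.get(src, [])
            if targets:                        # share rank along out-links
                share = d * pr[src] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:                              # dangling node: spread uniformly
                for n in nodes:
                    nxt[n] += d * pr[src] / len(nodes)
        pr = nxt
    return pr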
Evaluation
Evaluate the top 100 retrieved documents
Evaluation metric
–Mean average precision (MAP)
Use the program "trec_eval" to evaluate system performance
–Usage of trec_eval: see the note below
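trec_eval takes the relevance-judgment file and the result file as arguments, e.g. (run file name hypothetical): trec_eval qrels_training_topics.txt bm25_run.txt. For quick sanity checks during development, MAP over the top 100 can also be computed directly; a sketch, with rankings and relevant-document sets keyed by topic number:

def average_precision(ranking, relevant, cutoff=100):
    # Precision accumulated at each rank holding a relevant document,
    # divided by the topic's total number of relevant documents.
    hits, ap = 0, 0.0
    for rank, docno in enumerate(ranking[:cutoff], start=1):
        if docno in relevant:
            hits += 1
            ap += hits / rank
    return ap / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels, cutoff=100):
    aps = [average_precision(runs[t], qrels.get(t, set()), cutoff)
           for t in runs]
    return sum(aps) / len(aps)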
Example Result for Evaluation
One line per retrieved document, in rank order for each topic:
(topic-num) (dummy) (docno) (rank) (score) (run-tag)
465 Q0 WTX017-B… 1 … test
465 Q0 WTX017-B… 2 … test
…
474 Q0 WTX012-B… 1 … test
…
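A well-formed line would look like 465 Q0 WTX017-B01-5 1 13.72 test (the docno suffix, rank, and score are invented for illustration). A writer sketch for this format:

def write_run(path, runs, scores, tag="test"):
    # runs: topic -> docnos in rank order; scores: (topic, docno) -> score.
    with open(path, "w") as f:
        for topic, ranking in sorted(runs.items()):
            for rank, docno in enumerate(ranking, start=1):
                f.write(f"{topic} Q0 {docno} {rank} "
                        f"{scores[(topic, docno)]:.4f} {tag}\n")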
Example of Relevance Judgments
One line per judged document for each topic:
(topic-num) (dummy) (docno) (relevance)
465 0 WTX017-B… …
465 0 WTX018-B… …
…
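A reader sketch for this format, collecting each topic's relevant docnos for use with the MAP sketch earlier (assumes whitespace-separated fields and treats any positive judgment as relevant):

def read_qrels(path):
    qrels = {}
    with open(path) as f:
        for line in f:
            topic, _dummy, docno, rel = line.split()
            if int(rel) > 0:
                qrels.setdefault(topic, set()).add(docno)
    return qrels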
Summary of What to Do
1. Okapi-BM25 implementation (baseline)
–With the fixed settings
2. Evaluate the baseline approach with the training topics
–Using the terms in <title> as the query
3. Survey or design your enhanced approach
4. Evaluate and optimize your approach with the training topics
5. Submit the report and demo with the testing topics
6. Evaluate Okapi-BM25 and your approach with the testing topics
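Strung together, the sketches above give a rough outline of steps 1 and 2 (file names are from the dataset listing on the next slides; doc_files is a hypothetical list of extracted raw-document files, and real preprocessing and indexing will differ):

import re
from collections import Counter
from nltk.stem import PorterStemmer    # Porter's stemmer, per the baseline slide

stemmer = PorterStemmer()

def tokenize(text):
    # Crude lowercasing tokenizer plus Porter stemming.
    return [stemmer.stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]

docs = {}                              # docno -> term-frequency Counter
for path in doc_files:                 # hypothetical list of document files
    for docno, body in parse_documents(open(path).read()):
        docs[docno] = Counter(tokenize(body))
N = len(docs)
avg_len = sum(sum(tf.values()) for tf in docs.values()) / N
df = Counter(t for tf in docs.values() for t in tf)

runs = {}
for topic in parse_topics(open("training_topics.txt").read()):
    query = tokenize(topic["title"])   # step 2: <title> terms as the query
    ranked = sorted(docs, reverse=True,
                    key=lambda d: bm25_score(query, docs[d],
                                             sum(docs[d].values()),
                                             avg_len, df, N))
    runs[topic["num"]] = ranked[:100]

print("Baseline MAP:",
      mean_average_precision(runs, read_qrels("qrels_training_topics.txt")))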
Dataset Description (1/2)
"training_topics.txt" (file)
–30 topics for system development
"qrels_training_topics.txt" (file)
–Relevance judgments for the training topics
"documents" (directory)
–Contains 10 .rar files of raw documents
"in_links.txt" (file)
–In-link information
"out_links.txt" (file)
–Out-link information
Dataset Description (2/2)
"trec_eval.exe" (file)
–Binary evaluation program
"trec_eval.8.1.rar" (file)
–trec_eval source code, for building under UNIX