Date: 2013/10/23 Author: Salvatore Orlando, Francesco Pizzolon, Gabriele Tolomei Source: WWW’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang SEED:A Framework.

Similar presentations
Automatic Timeline Generation from News Articles Josh Taylor and Jessica Jenkins.

Date: 2013/1/17 Author: Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie and Ji-Rong Wen Source: SIGIR12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Adaptive.
Learning to Suggest: A Machine Learning Framework for Ranking Query Suggestions Date: 2013/02/18 Author: Umut Ozertem, Olivier Chapelle, Pinar Donmez,
Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.
Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Sunita Sarawagi. Enables richer forms of queries. Facilitates source integration and queries spanning sources. “Information Extraction refers to the.
WebMiningResearch ASurvey Web Mining Research: A Survey By Raymond Kosala & Hendrik Blockeel, Katholieke Universitat Leuven, July 2000 Presented 4/18/2002.
J. Chen, O. R. Zaiane and R. Goebel An Unsupervised Approach to Cluster Web Search Results based on Word Sense Communities.
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
Overview of Search Engines
Retrieving Location-based Data on the Web Andrei Tabarcea,
Search Engine Optimization
Databases & Data Warehouses Chapter 3 Database Processing.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
SEEKING STATEMENT-SUPPORTING TOP-K WITNESSES Date: 2012/03/12 Source: Steffen Metzger (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Date: 2012/10/18 Author: Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka Source: World Wide Web conference (WWW "12) Advisor: Jia-ling, Koh Speaker: Jiun.
Linking Wikipedia to the Web Antonio Flores Bernal Department of Computer Sciencies San Pablo Catholic University 2010.
Electronic Commerce. Nonhlanhla Shongwe. Introduction, Mission statement, Product, Business model, SWOT Analysis, Conclusion.
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
INTRODUCTION TO RESEARCH. Learning to become a researcher By the time you get to college, you will be expected to advance from: Information retrieval–
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
A search-based Chinese Word Segmentation Method ——WWW 2007 Xin-Jing Wang: IBM China Wen Liu: Huazhong Univ. China Yong Qin: IBM China.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
Theory and Application of Database Systems A Hybrid Approach for Extending Ontology from Text He Wei.
11 A Hybrid Phish Detection Approach by Identity Discovery and Keywords Retrieval Reporter: 林佳宜 /10/17.
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
Date: 2014/02/25 Author: Aliaksei Severyn, Massimo Nicosia, Alessandro Moschitti Source: CIKM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Building.
INTERESTING NUGGETS AND THEIR IMPACT ON DEFINITIONAL QUESTION ANSWERING Kian-Wei Kor, Tat-Seng Chua Department of Computer Science School of Computing.
Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.
Search Engine Architecture
BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
A Novel Pattern Learning Method for Open Domain Question Answering IJCNLP 2004 Yongping Du, Xuanjing Huang, Xin Li, Lide Wu.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
1 A Web Search Engine-Based Approach to Measure Semantic Similarity between Words Presenter: Guan-Yu Chen IEEE Trans. on Knowledge & Data Engineering,
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
2015/12/121 Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Proceeding of the 18th International.
© 2010 Deep Web Technologies, Inc. Taking the Library Back from Google Abe Lederman, President and CTO Deep Web Technologies May 12, 2010.
Named Entity Disambiguation on an Ontology Enriched by Wikipedia Hien Thanh Nguyen 1, Tru Hoang Cao 2 1 Ton Duc Thang University, Vietnam 2 Ho Chi Minh.
AN EFFECTIVE STATISTICAL APPROACH TO BLOG POST OPINION RETRIEVAL Ben He Craig Macdonald Iadh Ounis University of Glasgow Jiyin He University of Amsterdam.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness.
Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.
TWC Illuminate Knowledge Elements in Geoscience Literature Xiaogang (Marshall) Ma, Jin Guang Zheng, Han Wang, Peter Fox Tetherless World Constellation.
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Exploiting Named Entity Taggers in a Second Language Thamar Solorio Computer Science Department National Institute of Astrophysics, Optics and Electronics.
SEMANTIC VERIFICATION IN AN ONLINE FACT SEEKING ENVIRONMENT DMITRI ROUSSINOV, OZGUR TURETKEN Speaker: Li, HueiJyun Advisor: Koh, JiaLing Date: 2008/5/1.
Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.
GENERATING RELEVANT AND DIVERSE QUERY PHRASE SUGGESTIONS USING TOPICAL N-GRAMS ELENA HIRST.
ENHANCING CLUSTER LABELING USING WIKIPEDIA David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab SIGIR’09.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
Search Engine Architecture
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Lecture 24: NER & Entity Linking
Introduction to Information Retrieval
Intent-Aware Semantic Query Annotation
Search Engine Architecture
Introduction to Search Engines
Presentation transcript:

SEED: A Framework for Extracting Social Events from Press News. Date: 2013/10/23 Author: Salvatore Orlando, Francesco Pizzolon, Gabriele Tolomei Source: WWW’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang

Outline: Introduction; SEED: social event discovery; Named Entity Recognizer; Relation Extractor; Experiment; Conclusion

Introduction: The paper proposes a novel solution to a real problem raised by a Web company, namely detecting structured information about social events in unstructured press news. Example of structured information: person: Larry Page, Sergey Brin; company: Google Inc; place: Palo Alto, California.

Introduction: Two tasks are considered. Named-entity recognition (NER): extract and classify entities from unstructured text. Relation extraction (RE): identify relations between entities. SEED (social entertainment event detection) turns this into detecting the following classes of entities in press news: DATE, LOCATION, PLACE, ARTIST.

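Definition: A class is an abstract category; an entity is a concrete member of a class; a mention is a textual occurrence that refers to an entity. Example: the mentions (m) "Google Inc" and "Big G" refer to the entity (e) Google, which belongs to the class (c) COMPANY.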
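The three-level relation between mentions, entities, and classes can be illustrated with a small lookup table. This is a minimal Python sketch; the `Entity` type, the `MENTIONS` table, and `resolve` are hypothetical names used only for illustration, not part of SEED.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entity:
    name: str  # concrete entity, e.g. "Google"
    cls: str   # abstract class the entity belongs to, e.g. "COMPANY"


# Hypothetical mention table: several surface forms (mentions) map to one entity.
GOOGLE = Entity("Google", "COMPANY")
MENTIONS: Dict[str, Entity] = {
    "Google Inc": GOOGLE,
    "Big G": GOOGLE,
}


def resolve(mention: str) -> Optional[Entity]:
    """Return the entity a textual mention refers to, or None if unknown."""
    return MENTIONS.get(mention)


print(resolve("Big G"))  # Entity(name='Google', cls='COMPANY')
```

Both "Google Inc" and "Big G" resolve to the same entity, whose class is COMPANY.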

Definition (continued): function; text document; named entity.

SEED: social event discovery. SEED stands for social entertainment event detection.

Named entity recognizer: C = {DATE, LOCATION, PLACE, ARTIST}. Rule-based approach: regular expressions, used by the date tagger (e.g., DD-MM-YYYY). Knowledge-based approach: dictionaries, matched through an n-gram tokenizer that produces sequences of n consecutive words (n = 1…8). The location tagger uses the cities, districts, and municipalities of Italy taken from Wikipedia; the place tagger uses the company’s database of places; the artist tagger uses well-known artists taken from Wikipedia.
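The two tagger families above can be sketched in a few lines of Python. Only the DD-MM-YYYY date format and the 1…8 n-gram window come from the slide; the toy `GAZETTEERS` dictionaries and the `tag` function are hypothetical stand-ins for the Wikipedia- and database-derived lists, and overlapping matches are not resolved.

```python
import re
from typing import Dict, Iterator, List, Set, Tuple

# Rule-based tagger: DATE in DD-MM-YYYY form (the only pattern given on the slide).
DATE_RE = re.compile(r"\b(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[0-2])-(\d{4})\b")

# Hypothetical toy dictionaries standing in for the real gazetteers.
GAZETTEERS: Dict[str, Set[str]] = {
    "LOCATION": {"venezia", "milano"},
    "PLACE": {"teatro la fenice"},
    "ARTIST": {"vasco rossi"},
}


def ngrams(tokens: List[str], n_max: int = 8) -> Iterator[str]:
    """Yield every sequence of n consecutive tokens, for n = 1..n_max."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tag(text: str) -> List[Tuple[str, str]]:
    """Return (class, surface form) pairs found by the regex and dictionary taggers."""
    found = [("DATE", m.group(0)) for m in DATE_RE.finditer(text)]
    tokens = re.findall(r"\w+", text.lower())
    for gram in ngrams(tokens):
        for cls, names in GAZETTEERS.items():
            if gram in names:
                found.append((cls, gram))
    return found


print(tag("Vasco Rossi plays at Teatro La Fenice in Venezia on 21-06-2013"))
```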

Relation extractor: Relations may span multiple sentences and even several press news articles. SEED recovers them by exploiting the so-called “wisdom of the crowd”, drawn from Social Web sources: social networks, blogs, wikis, and search engines.

Relation extractor: The Candidate Extractor generates a set of candidate 3-ary tuples. The Candidate Ranking step then selects the final set of 3-ary tuples that refer to actual events, using an external Fresh Social Knowledge (FSK) module to rank the candidate events.
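The slide does not say how candidates are generated. As a minimal sketch, assume the Candidate Extractor simply combines every artist, every location or place, and every date recognized in the same news item; `extract_candidates` and the tuple layout are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Assumed layout of a candidate event: (artist, location or place, date).
Candidate = Tuple[str, str, str]


def extract_candidates(artists: List[str], venues: List[str],
                       dates: List[str]) -> List[Candidate]:
    """Hypothetical extractor: all combinations of one artist, one venue and one date."""
    return list(product(artists, venues, dates))


cands = extract_candidates(["Artist A", "Artist B"], ["Venezia"],
                           ["21-06-2013", "22-06-2013"])
print(len(cands))  # 4
```

The Cartesian product grows quickly with the number of recognized entities, which is why a ranking step is needed to keep only the tuples that refer to actual events.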

Candidate Ranking: three external sources were considered. Encyclopedic knowledge (Wikipedia); drawback: a fact or event is recorded only after it has already happened. Social networks (Facebook); drawback: the only indexed field of the event table is a free-text field. Web search engines (Google): given a user query, the search engine returns a ranked list of results.

Candidate Ranking with a Web search engine (Google): for a candidate event t = (a, l_p, d), the query q_t is the concatenation of the mentions corresponding to the entities of the tuple, and the search engine returns a ranked list of results for q_t.
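Building q_t amounts to joining the mentions of each entity in tuple order. A minimal Python sketch follows; `build_query`, the mention dictionary, and the fallback to the entity name when no mention is listed are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def build_query(candidate: Tuple[str, ...], mentions: Dict[str, List[str]]) -> str:
    """Concatenate the mentions of every entity in the candidate tuple."""
    parts: List[str] = []
    for entity in candidate:
        # Assumption: an entity without listed mentions is queried by its own name.
        parts.extend(mentions.get(entity, [entity]))
    return " ".join(parts)


q = build_query(("Larry Page", "Google", "Palo Alto"),
                {"Google": ["Google Inc", "Big G"]})
print(q)  # Larry Page Google Inc Big G Palo Alto
```

The output matches the query used in the worked example of the relevance score.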

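Candidate Ranking: relevance score and rank. Each candidate t is scored over the results r in the ranked list returned for q_t:

$$S(t) = \sum_{r} \gamma(r)\left[\alpha \prod_{e \in t} f_{tit}(e, r) + \beta \prod_{e \in t} f_{sni}(e, r)\right]$$

where f_tit(e, r) is the frequency count of the mentions of entity e in the title of result r; f_sni(e, r) is the frequency count of the mentions of entity e in the snippet of r; α and β are weights; and γ(r) is a score assigned to each Web result. Linear SEED uses γ(r) = 1; nonlinear SEED uses γ(r) = 1/rank(r). Candidates are then ranked by decreasing score: for example, S(t1) = 0.9, S(t2) = 0.1, S(t3) = 0.6 gives the ranked list T'_D = (t1, t3, t2).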
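The two choices of γ(r) and the final ordering of candidates by score can be sketched as follows. The function names are hypothetical, and ranks are assumed to be 1-based.

```python
from typing import Dict, List


def gamma_linear(rank: int) -> float:
    """Linear SEED: every Web result contributes with the same weight."""
    return 1.0


def gamma_nonlinear(rank: int) -> float:
    """Nonlinear SEED: a result's weight decays as 1 / rank (1-based)."""
    return 1.0 / rank


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Order candidate tuples by decreasing relevance score S(t)."""
    return sorted(scores, key=scores.get, reverse=True)


print(rank_candidates({"t1": 0.9, "t2": 0.1, "t3": 0.6}))  # ['t1', 't3', 't2']
```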

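Candidate Ranking example: query q_t = "Larry Page Google Inc Big G Palo Alto".
R1: title counts Larry Page (1), Google Inc/Big G (3), Palo Alto (1); snippet counts Larry Page (3), Google Inc/Big G (10), Palo Alto (3).
R2: title counts Larry Page (1), Google Inc/Big G (1), Palo Alto (1); snippet counts Larry Page (3), Google Inc/Big G (5), Palo Alto (1).
With nonlinear SEED (γ(R1) = 1/1, γ(R2) = 1/2):

$$S(t) = \tfrac{1}{1}\left[\alpha(1 \cdot 3 \cdot 1) + \beta(3 \cdot 10 \cdot 3)\right] + \tfrac{1}{2}\left[\alpha(1 \cdot 1 \cdot 1) + \beta(3 \cdot 5 \cdot 1)\right] = 3.5\,\alpha + 97.5\,\beta$$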
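The arithmetic of the example generalizes to a small function: for each result, multiply the per-entity title counts and the per-entity snippet counts, weight them by α and β, scale by γ for that result, and sum. This is a sketch reproducing the example's computation; `relevance`, the `Result` layout, and the choice α = β = 1 in the usage lines are assumptions.

```python
from math import prod
from typing import Callable, List, Tuple

# One search result: per-entity mention counts in the title and in the snippet,
# listed in the same entity order as the candidate tuple.
Result = Tuple[List[int], List[int]]


def relevance(results: List[Result], alpha: float, beta: float,
              gamma: Callable[[int], float]) -> float:
    """Sum over results of gamma(rank) * (alpha * prod(title) + beta * prod(snippet))."""
    total = 0.0
    for rank, (f_tit, f_sni) in enumerate(results, start=1):
        total += gamma(rank) * (alpha * prod(f_tit) + beta * prod(f_sni))
    return total


# Counts from the example: R1 and R2 for Larry Page, Google, Palo Alto.
example = [([1, 3, 1], [3, 10, 3]), ([1, 1, 1], [3, 5, 1])]
print(relevance(example, 1.0, 1.0, lambda r: 1.0 / r))  # 101.0 (nonlinear SEED)
print(relevance(example, 1.0, 1.0, lambda r: 1.0))      # 109.0 (linear SEED)
```

With α = β = 1 the nonlinear score is 3.5 + 97.5 = 101, consistent with 3.5α + 97.5β.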

Experiments: manually labeled data set of 100 real Italian press news articles, containing 1222 entities and 198 events (relations).

Experiments: results compared against two baselines. Baseline 1 selects the tuples whose entities appear in the same sentence of a press news article; Baseline 2 selects the tuple whose frequency is higher than that of the others.
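One possible reading of Baseline 1 is sketched below in Python, under the assumption that each sentence of a news article has already been reduced to the set of entities recognized in it. The function name and data layout are hypothetical.

```python
from typing import List, Set, Tuple

Candidate = Tuple[str, str, str]


def baseline_same_sentence(candidates: List[Candidate],
                           sentences: List[Set[str]]) -> List[Candidate]:
    """Keep a candidate only if all of its entities co-occur in at least one sentence."""
    return [t for t in candidates if any(set(t) <= s for s in sentences)]


cands = [("A", "Venezia", "21-06-2013"), ("B", "Venezia", "21-06-2013")]
sents = [{"A", "Venezia", "21-06-2013"}, {"B"}]
print(baseline_same_sentence(cands, sents))  # [('A', 'Venezia', '21-06-2013')]
```

Such a filter cannot recover relations that span several sentences or articles, which is the case SEED addresses through external knowledge.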

Experiments: effect of the weights α and β.

Conclusion: We proposed SEED, a framework to automatically discover social events from a collection of unstructured press news. SEED infers and disambiguates relations between previously discovered entities by exploiting the so-called “wisdom of the crowd”. Possible future work includes designing a more effective NER solution and exploiting other social media to improve the performance of the RE task.