Automatic Acquisition of Fuzzy Footprints Steven Schockaert, Martine De Cock, Etienne E. Kerre.

Workshop on SEmantic Based Geographic Information Systems

1. Introduction
2. Constructing fuzzy footprints
3. Experimental results

Geographical Question Answering
Question: Give a list of Italian restaurants in the neighborhood of Agia Napa.
Source: WWW
Answer: La Strada Italian Restaurant, Bosko’s ristorante, …

Geographic Question Answering
Resources
– Linguistic resources for question analysis, answer extraction, …
– A traditional search engine to locate relevant documents
– Geographic background knowledge
Footprints provided by gazetteers are often inadequate
– We need a more fine-grained representation than a bounding box
– Questions may involve vague regions such as the Alps, the Highlands, …
Our solution: construct footprints automatically
– Use the web to collect relevant information
– Use a digital gazetteer to map location names to co-ordinates
– Use fuzzy sets to represent footprints

Fuzzy Sets
A fuzzy set A in a universe U is a mapping from U to [0,1] (Zadeh, 1965)
– u belongs to A → A(u) = 1
– u doesn’t belong to A → A(u) = 0
– u more or less belongs to A → 0 < A(u) < 1
[Figure: example fuzzy set “Old”]
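
The definition above can be made concrete with a small sketch. The slide only names the example fuzzy set "Old"; the piecewise-linear shape and the breakpoints 50 and 70 used here are assumptions chosen for illustration, not values from the presentation.

```python
from typing import Callable

# A fuzzy set in a universe of numbers: a mapping from the universe to [0, 1].
FuzzySet = Callable[[float], float]


def ramp_up(a: float, b: float) -> FuzzySet:
    """Membership 0 up to a, 1 from b onwards, linear in between (requires a < b)."""
    if not a < b:
        raise ValueError("a must be smaller than b")

    def membership(u: float) -> float:
        if u <= a:
            return 0.0  # u does not belong to the set
        if u >= b:
            return 1.0  # u fully belongs to the set
        return (u - a) / (b - a)  # u more or less belongs to the set

    return membership


# Hypothetical breakpoints for "Old": not old at all below 50, fully old from 70.
old = ramp_up(50.0, 70.0)
```

With these assumed breakpoints, an age of 60 belongs to "Old" to degree 0.5.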

Fuzzy Footprints
We represent footprints as fuzzy sets in the universe of co-ordinates
[Figure: fuzzy footprint of “South of France”]

2. Constructing fuzzy footprints

Obtaining relevant locations
Region: the Ardeche region
Patterns:
- Located in the north of the Ardeche region,
- (,)* and other cities in the Ardeche region
- is situated in the heart of the Ardeche region
- …
Locations found: St-Félicien, Lamastre, St-Agrève, …
ADL gazetteer

Obtaining relevant locations
Disambiguation of location names based on
– the country the region is located in
– the distance to the other locations
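
One way to read the two disambiguation criteria is sketched here. The slide does not say how they are combined, so the procedure (filter by country, then pick the candidate nearest to the co-ordinate-wise median of the other locations), the `Candidate` type, and the use of plain Euclidean distance on longitude/latitude are all assumptions. `math.dist` requires Python 3.8 or later.

```python
from math import dist
from statistics import median
from typing import NamedTuple, Optional, Sequence, Tuple


class Candidate(NamedTuple):
    """One gazetteer entry for an ambiguous location name (hypothetical type)."""
    name: str
    country: str
    lon: float
    lat: float


def disambiguate(candidates: Sequence[Candidate],
                 country: str,
                 others: Sequence[Tuple[float, float]]) -> Optional[Candidate]:
    """Pick one gazetteer entry for a name, or None if no decision is possible."""
    # Criterion 1: keep only entries in the country the region is located in.
    in_country = [c for c in candidates if c.country == country]
    if len(in_country) == 1:
        return in_country[0]
    if not in_country or not others:
        return None
    # Criterion 2: prefer the entry closest to the other locations, summarised
    # here by their co-ordinate-wise median (an assumption of this sketch).
    centre = (median(p[0] for p in others), median(p[1] for p in others))
    return min(in_country, key=lambda c: dist((c.lon, c.lat), centre))
```

For a serious implementation, a great-circle distance would be a better choice than Euclidean distance on raw co-ordinates.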

Constructing a footprint
Existing approaches
– Use the convex hull of the locations
   → web data is too noisy
   → not suitable for vague regions
– Use the density of the locations (Purves et al., 2005)
   → reflects popularity rather than the extent of a region
Our solution: search for additional constraints to filter out noise
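
The sensitivity of the convex hull to noise can be illustrated with a short sketch. It uses Andrew's monotone chain algorithm, a standard convex hull method that the slides do not name, and made-up points; a single stray location triples the area of the hull.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(polygon: Sequence[Point]) -> float:
    """Area of a simple polygon by the shoelace formula."""
    n = len(polygon)
    twice_area = sum(polygon[i][0] * polygon[(i + 1) % n][1]
                     - polygon[(i + 1) % n][0] * polygon[i][1]
                     for i in range(n))
    return abs(twice_area) / 2


# Made-up locations: a 2 x 2 square with one interior point, then one outlier.
clean = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
noisy = clean + [(10, 1)]
area_clean = polygon_area(convex_hull(clean))
area_noisy = polygon_area(convex_hull(noisy))
```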

Constructing a footprint
x is in the north of the Ardeche region
[Figure: candidate locations marked as consistent, inconsistent or ???]

Modelling constraints
x is located in the north of the Ardeche
[Figure: zones labelled Consistent, Gradual transition and Inconsistent]
Based on the average difference in y co-ordinates
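
A possible shape for such a constraint is sketched here. The slides give only the three zones and the fact that the transition is based on the average difference in y co-ordinates. The exact rule is an assumption of this sketch: points no further north than x are fully consistent, points more than one average difference further north are inconsistent, and the degree falls linearly in between.

```python
from typing import Callable, Sequence


def north_of_constraint(x_y: float,
                        candidate_ys: Sequence[float]) -> Callable[[float], float]:
    """Constraint induced by 'x is located in the north of the region'.

    x_y is the y co-ordinate of x; candidate_ys are the y co-ordinates of the
    candidate locations. Returns a function giving the degree (in [0, 1]) to
    which a point with a given y co-ordinate is consistent with the statement.
    """
    if not candidate_ys:
        raise ValueError("at least one candidate location is needed")
    # Assumed transition width: mean absolute difference in y between x and
    # the candidate locations.
    width = sum(abs(y - x_y) for y in candidate_ys) / len(candidate_ys)

    def degree(p_y: float) -> float:
        if p_y <= x_y:
            return 1.0  # consistent: not further north than x
        if width == 0.0 or p_y >= x_y + width:
            return 0.0  # inconsistent: far to the north of x
        return 1.0 - (p_y - x_y) / width  # gradual transition

    return degree


# Made-up y co-ordinates; the mean absolute difference to x_y = 10 is 3.
consistent_with_north = north_of_constraint(10.0, [8.0, 6.0, 12.0, 14.0])
```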

Modelling constraints
In a similar way:
– x is located in the south of the Ardeche
– x is located in the west of the Ardeche
– x is located in the east of the Ardeche
– x is located in the north-west of the Ardeche
   → x is located in the north of the Ardeche
   → x is located in the west of the Ardeche
– x is located in the heart of the Ardeche

Modelling constraints
the Ardeche is located in the south of France
[Figure: zones labelled Consistent, Gradual transition and Inconsistent]
Based on the minimal bounding box for France (ADL gazetteer)
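
A constraint of this kind could be derived from the bounding box roughly as follows. The slides do not say where the zones begin and end, so the choices here (fully consistent in the southern half of the box, a linear transition spanning a quarter of the box height) are assumptions, as are the `BoundingBox` type and the example latitudes.

```python
from typing import Callable, NamedTuple


class BoundingBox(NamedTuple):
    """Latitude extent of a minimal bounding box (hypothetical type)."""
    south: float
    north: float


def south_of_country(box: BoundingBox,
                     transition_fraction: float = 0.25) -> Callable[[float], float]:
    """Degree to which a latitude is consistent with 'R is in the south of the country'."""
    height = box.north - box.south
    middle = box.south + height / 2
    width = transition_fraction * height  # assumed size of the gradual transition

    def degree(lat: float) -> float:
        if lat <= middle:
            return 1.0  # consistent: southern half of the box
        if width <= 0 or lat >= middle + width:
            return 0.0  # inconsistent: clearly in the north
        return 1.0 - (lat - middle) / width  # gradual transition

    return degree


# Round, made-up latitudes rather than the real bounding box of France.
in_south = south_of_country(BoundingBox(south=40.0, north=50.0))
```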

Modelling constraints
In a similar way:
– R is located in the north of France
– R is located in the east of France
– R is located in the west of France
– R is located in the north-west of France
   → R is located in the north of France
   → R is located in the west of France
– R is located in the heart of France

Modelling constraints
Heuristic: points that are too far from the median are likely to be noise
[Figure: zones labelled Consistent, Gradual transition and Inconsistent]
Based on the average distance to the median
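
The median heuristic might look like the following sketch. The slides state only that the zones are based on the average distance to the median. The co-ordinate-wise median, the thresholds (fully consistent within one average distance, inconsistent beyond two) and the example points are assumptions. `math.dist` requires Python 3.8 or later.

```python
from math import dist
from statistics import median
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def median_constraint(points: Sequence[Point]) -> Callable[[Point], float]:
    """Degree to which a point is close enough to the median of the candidates."""
    # Co-ordinate-wise median (assumed; raises StatisticsError if points is empty).
    centre = (median(x for x, _ in points), median(y for _, y in points))
    avg = sum(dist(p, centre) for p in points) / len(points)

    def degree(p: Point) -> float:
        d = dist(p, centre)
        if d <= avg:
            return 1.0  # consistent
        if d >= 2 * avg:
            return 0.0  # inconsistent: probably noise
        return 2.0 - d / avg  # gradual transition between avg and 2 * avg

    return degree


# Made-up candidates clustered around (1, 1), plus one outlier at (9, 1).
candidates = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (0.0, 2.0),
              (2.0, 2.0), (1.0, 1.0), (9.0, 1.0)]
near_median = median_constraint(candidates)
```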

Example
[Figure: regions in which the constraints are satisfied to degree 1, 0.6, 0.4 and 0]

Some remarks
If the set of constraints is inconsistent (i.e. no point satisfies all constraints), we remove a minimal set of constraints such that:
– As many constraints as possible are preserved
– The area of the fuzzy footprint is as large as possible
Imposing constraints is used to improve precision, not recall
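
The repair step can be illustrated with a brute-force sketch. Several details are assumptions rather than the authors' procedure: constraints are combined with the minimum, "consistent" is taken to mean that some sample point satisfies every constraint to degree 1, and the area of the footprint is approximated by summing degrees over a finite set of sample points. The exhaustive search is exponential in the number of constraints and is meant for illustration only.

```python
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")
Constraint = Callable[[P], float]


def footprint_degree(point: P, constraints: Sequence[Constraint]) -> float:
    """Degree to which a point satisfies all constraints (minimum, assumed)."""
    return min((c(point) for c in constraints), default=1.0)


def repair(constraints: Sequence[Constraint], points: Sequence[P]) -> List[Constraint]:
    """Largest consistent subset of constraints, ties broken by footprint area."""
    if not points:
        raise ValueError("at least one sample point is needed")
    # Try to keep as many constraints as possible, so start with all of them.
    for size in range(len(constraints), -1, -1):
        best, best_area = None, -1.0
        for subset in combinations(constraints, size):
            degrees = [footprint_degree(p, subset) for p in points]
            if max(degrees) < 1.0:
                continue  # inconsistent: no point satisfies every constraint
            area = sum(degrees)  # discrete stand-in for the footprint area
            if area > best_area:
                best, best_area = subset, area
        if best is not None:
            return list(best)
    return []


# Made-up crisp constraints on a one-dimensional universe 0..10.
def at_most_3(x: int) -> float:
    return 1.0 if x <= 3 else 0.0

def at_least_7(x: int) -> float:
    return 1.0 if x >= 7 else 0.0

def at_least_2(x: int) -> float:
    return 1.0 if x >= 2 else 0.0

kept = repair([at_most_3, at_least_7, at_least_2], list(range(11)))
```

In this toy example, one constraint has to go. Dropping `at_most_3` leaves the larger footprint (four sample points instead of two), so `kept` contains `at_least_7` and `at_least_2`.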

Bordering regions
Footprint can be constructed using the ADL gazetteer

3. Experimental results

Evaluation metric
Precision: degree to which the fuzzy footprint F is included in the correct footprint G
Recall: degree to which the correct footprint G is included in the fuzzy footprint F
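
The slide does not give a formula for the degree of inclusion. A common choice, assumed in this sketch, is the ratio of summed memberships: incl(A, B) = Σ min(A(u), B(u)) / Σ A(u), evaluated over a finite grid of cells. The cell names and membership degrees are made up.

```python
from typing import Dict, Hashable

# A fuzzy footprint sampled on a grid: cell -> membership degree in [0, 1].
Footprint = Dict[Hashable, float]


def inclusion(a: Footprint, b: Footprint) -> float:
    """Degree to which fuzzy set a is included in fuzzy set b (assumed measure)."""
    total = sum(a.values())
    if total == 0:
        return 1.0  # the empty set is included in every set
    overlap = sum(min(deg, b.get(cell, 0.0)) for cell, deg in a.items())
    return overlap / total


def precision(f: Footprint, g: Footprint) -> float:
    """Degree to which the constructed footprint f is included in the correct footprint g."""
    return inclusion(f, g)


def recall(f: Footprint, g: Footprint) -> float:
    """Degree to which the correct footprint g is included in the constructed footprint f."""
    return inclusion(g, f)


# Made-up footprints over three grid cells.
F = {"a": 1.0, "b": 0.5, "c": 0.0}
G = {"a": 1.0, "c": 1.0}
```

Here F overlaps G only in cell "a", so precision is 1 / 1.5 and recall is 1 / 2.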

Test data
81 political subregions of France, Italy, Canada, Australia and China
Divided into three groups:
– Regions for which we found more than 30 candidate cities
– Regions for which we found fewer than 10 candidate cities
– Regions for which we found between 10 and 30 candidate cities
Gold standard: convex hull of the locations that are known to lie in the region according to the ADL gazetteer

Precision
[Figure: precision without and with bordering regions]

Recall
[Figure: recall without and with bordering regions]

Conclusions
New approach to approximate the footprint of an unknown region
Also suitable for vague regions
Search for constraints on the web to improve precision
Search for bordering regions on the web to improve recall
Experimental results confirm this hypothesis

Thank you for your attention!