Mapping of Geographical Entity with Meeting Location from Text for Mobile 2011. 9. 30 Kyoungryol Kim.

Presentation transcript:

Mapping of Geographical Entity with Meeting Location from Text for Mobile Kyoungryol Kim

Table of Contents
1. Introduction
2. Background and Related Work
3. The Proposed System
4. Experimentation
5. Conclusion

1. Introduction 1) Motivation 2) Problem Definition 3) Contribution

Motivation: IE Techniques on Smartphones
- Platforms: Apple iPhone, Google Android, RIM BlackBerry, MS Windows Phone
- Recognition features shown: time (text) recognition, phone number recognition, location (text) recognition, address recognition
- Example (captured from Apple iPhone): adding an event from a recognized time, "May 21, 2011"
- People are starting to pay attention to the 'Location Extraction' technique.

Motivation: Characteristics of Mobile Devices
- Memory issue
  - Android: 16 MB heap size limit for each app.
  - iPhone: no per-app memory limit, but only 512 MB of RAM in total (iPhone 4).
- Speed issue
  - People who use mobile devices are usually uncomfortable with delays.
- IE system
  - A general Information Extraction system usually consists of many NLP modules, which together consume more than 1 GB of memory.
- Client-server model
  - The client and server communicate, and all processing is done on the server side.
  - Requires an internet connection (3G or Wi-Fi).
  - If many clients send requests at once, the server is overloaded and responses are delayed, or the server goes down.
- An IE method specialized for mobile devices is needed.

Goal of this Research
- Map the meeting location text to a geographical location and update the online calendar on the mobile device.
- Meeting announcement (input): "The team meeting for the evaluation of the first half of Univcast will be held. Date: July 19 (Sat), 2 PM. Location: Myeong-dong Dandelion Territory. Directions to Dandelion Territory: from gate 8 of Myeong-dong station, walk toward downtown; it is on the first floor of the YMCA building."
- Extract meeting location
  - Name: Myeong-dong Dandelion Territory
  - Address: 1-1, Myeong-dong 1-ga, Jung-gu, Seoul, Korea
  - Geocode: ( , )
- Extract time and update calendar
  - startTime T14:00

Problem Definition
1. Extract the meeting location from a meeting announcement.
2. Disambiguate the extracted meeting location.
Example: 회의는 오후 5시 학생회관 101호에서 열립니다. (The meeting will be held at 5 PM in Room 101, Student Union.)

Contribution

2. Background and Related Works 1) Information Extraction 2) Geocoding 3) Linked Open Data 4) Local-Grammar Graph

Information Extraction
- Information Extraction
  - The objective is to construct a structured database from free text or semi-structured text (J. H. Kim 2004).
- Related work
  - CMU Seminar Announcement Corpus
    - 485 semi-structured seminar announcements
    - Types: stime, etime, location, speaker
  - Focuses only on extracting these 4 types of information, not on geocoding.
- [Figure: examples of seminar announcements]

Geocoding
- Geocoding
  - The process of finding associated geographic coordinates, often expressed as latitude and longitude, from other geographic data such as street addresses or zip codes (Geocoding, Wikipedia).
- Related work
  - Geocoding from an address (Manov 2003; Jones 2003; Peng 2006; Pouliquen 2006; Volz 2007; Overell 2007; Goldberg 2007; Kauppinen 2008)
  - The main research issue is the disambiguation of addresses (Pouliquen et al. 2006):
    1. Multi-referent ambiguity: two different geographic locations share the same name, e.g. is "Cambridge" Cambridge, UK or Cambridge, Massachusetts?
    2. Name variant ambiguity: the same location has different names.
    3. Geoname/non-geoname ambiguity: a location name could also stand for some other word, such as a person name or a common noun, e.g. Metro as the city in Indonesia vs. metro as the subway system.
  - Focuses only on geocoding addresses, not all location entities, e.g. "Room 101, Student Union, Hanyang University".

Linked Open Data
- Linked Open Data
  - URL :
  - The project aims to identify data sets that are available under open licenses, re-publish them in RDF on the Web, and interlink them with each other.
- Geographic datasets are growing rapidly.
- Because only a few Korean geographical datasets are included in LOD, this research treats a set of open geographical data as Linked Data.
- [Figure: snapshots labeled March 2009, September 2010, and September 2011]

Local-Grammar Graph
- Local-Grammar Graph
  - A language description model for automatic analysis and generation of natural language text and for information extraction, using local language information in the form of finite-state automata (J. Nam 2006).
- Helps to increase
  - efficiency and accuracy, by lexicalizing the knowledge that forms the grammar
  - readability, by representing the grammar as directed acyclic graphs
- Can describe various omissions and permutations that cannot be captured by rules or specific features.
- Example of an LGG for 176 kinds of French wine: un vin rouge de Bordeaux, un vin de Bordeaux rouge, un rouge de Bordeaux, un Bordeaux rouge, un Bordeaux, un rouge, ...., du vin d'Alsace blanc, du vin blanc d'Alsace, du blanc d'Alsace, de l'Alsace, de l'Alsace blanc, du blanc
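The omission and permutation behaviour described for Local-Grammar Graphs can be illustrated with a toy graph. The sketch below is not the LGG from the slide: the node names and edges are hypothetical, and it covers only the six Bordeaux variants listed in the example. An empty edge label stands for an optional (omitted) word.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy graph: node -> list of (emitted word, next node).
# An empty word "" is an epsilon edge, i.e. the word is omitted.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "start": [("un", "det")],
    "det": [("vin", "vin"), ("rouge", "colour_head"), ("Bordeaux", "region_head")],
    "vin": [("rouge", "vin_rouge"), ("de", "vin_de")],
    "vin_rouge": [("de", "before_region")],
    "before_region": [("Bordeaux", "end")],
    "vin_de": [("Bordeaux", "vin_de_region")],
    "vin_de_region": [("rouge", "end")],
    "colour_head": [("", "end"), ("de", "before_region")],
    "region_head": [("", "end"), ("rouge", "end")],
    "end": [],
}


def enumerate_phrases(node: str = "start", prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield every phrase spelled by a path from `node` to the "end" node."""
    if node == "end":
        yield " ".join(prefix)
        return
    for word, nxt in GRAPH[node]:
        # The graph is acyclic, so this recursion always terminates.
        yield from enumerate_phrases(nxt, prefix + ((word,) if word else ()))


def accepts(phrase: str) -> bool:
    """Return True if the phrase is one of the paths through the graph."""
    return phrase in set(enumerate_phrases())
```

Because all variants share one graph, adding a new region or colour means adding a few edges rather than listing every surface form again.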

Finite State Transducer
- Finite State Transducer for IE

3. The Proposed System 1) Preliminaries 2) Overall Architecture 3) Extraction Module 4) Disambiguation Module

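Preliminaries
- Meeting Location
  - Definition of "Meeting Location": a location where the meeting will be held.
::= | | | |
::= |
::= | | | | |
::= | | | |
::=
::= 수도권 (Seoul metropolitan area) | 부산 (Busan) | 대구 (Daegu) | 광주 (Gwangju) | 대전 (Daejeon)
::= 인천선 (Incheon Line) | 분당선 (Bundang Line) | 중앙선 (Jungang Line) | 공항철도 (Airport Railroad) | 경의선 (Gyeongui Line) | 경춘선 (Gyeongchun Line) | 1호선 (Line 1) | 2호선 (Line 2) | 3호선 (Line 3) | 4호선 (Line 4) | 5호선 (Line 5) | 6호선 (Line 6) | 7호선 (Line 7) | 8호선 (Line 8) | 9호선 (Line 9)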

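Overall Architecture
- [Figure: architecture diagram. Labels: Mobile Device, Server, Extraction Module, Finite-State Transducers, Disambiguation Module, Linked Data, Personal GeoData, Template Generator, Query, Disambiguated Result]
- INPUT: "Title: Team Leader Meeting Notice. The last team leader meeting of 2008 will be held on Saturday, November 22, at 2 PM at Jongno Toz. Contract renewals and the distribution of business cards are planned, so all team leaders and incoming team leaders are asked to attend. Directions: get off at exit 4 of Jonggak station in Jongno and walk about 100 m; it is on the right."
- OUTPUT: Team Leader Meeting Notice
  - Location name: Jongno Toz
  - Address: 84-8, Jongno ga-dong, Jongno-gu, Seoul, Republic of Korea
  - GPS coordinates: ( , )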

Extraction Module (1/2)
1. Construct the Local-Grammar Graph (LGG)
  - Find local patterns around the meeting location, inductively.
  - Scope of local patterns: the sentence containing the meeting location, plus the previous and next sentences.
  - Describe local patterns with 110 information types under 7 categories: Location, Time, Title, Actor, Label, Connecting words, Etc.
  - e.g. '장소 :' ("Location:") is the 'locLbl' information type under the 'Label' category.
2. Convert the LGG to a Finite-State Transducer (FST)
3. Extract the meeting location with the FST
Example: "Conference schedule: Saturday, May 17, 2003, 10:30 ~ 16:30. 3. Conference venue (학술대회 장소): Sungkonghoe University, Pittsburgh Hall. 4. Conference program"
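The label-driven extraction step can be sketched as a very small hand-written transducer. This is only an illustration under strong assumptions, not the FST compiled from the LGG: it knows a single label (장소, "location"), expects whitespace-separated tokens with a free-standing colon, and emits everything after the label on the same line. The state names are hypothetical.

```python
from typing import Optional

LOCATION_LABEL = "장소"  # the 'locLbl' label ("location"); the only label this sketch knows


def extract_location(text: str) -> Optional[str]:
    """Return the tokens that follow '장소 :' on the first line that has them."""
    for line in text.splitlines():
        state = "START"  # hypothetical states: START -> LABEL -> VALUE
        output = []
        for token in line.split():
            if state == "VALUE":
                output.append(token)          # emit every token after the colon
            elif token == LOCATION_LABEL:
                state = "LABEL"               # saw the label, now expect ':'
            elif state == "LABEL" and token == ":":
                state = "VALUE"
            else:
                state = "START"               # anything else resets the match
        if output:
            return " ".join(output)
    return None


announcement = (
    "학술대회 일정 : 2003 년 5 월 17 일 ( 토요일 ) 10:30 ~ 16:30\n"
    "3. 학술대회 장소 : 성공회대학교 피츠버그관\n"
    "4. 학술대회 순서"
)
location = extract_location(announcement)
```

A transducer of this shape needs no tagger or parser at run time, which is the property that matters under the memory limits discussed earlier; handling combined labels such as "date and place" would require the additional information types the slide mentions.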

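Extraction Module (2/2)
- Categories of LGG for meeting location
1. One venue
  1.1. Place
    Place
    Place 1_1 | Place 1_
    Place 1_1 | Place 1_2 | Place 1_
  Place + Landmark
    Place | Landmark
    Place 1_1 | Place 1_2 | Landmark
    Place | Landmark 1 | Landmark
  Place + Address
    Place | Address
    Place 1_1 | Place 1_2 | Address
    Place 1_1 | Place 1_2 | Place 1_3 | Address
    Place | Landmark | Address
2. N venues (N>1)
  2.1. Two venues
  2.2. Three venues
  2.3. Four venues
Examples:
- "1. Date, time and place: (Wed) 14:00~16:00, Trade Association medium conference room (51st floor, Trade Tower, Samseong-dong)"
- "3. Place: Meokgo Swieotdaga, 27 Deungeok-ri, Sangbuk-myeon, Ulju-gun, Ulsan ( )"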

Disambiguation Module (1/2)
- Problem
  - Multi-referent ambiguity (Pouliquen et al. 2006): two different geographic locations share the same name, e.g. is "Cambridge" Cambridge, UK or Cambridge, Massachusetts?
- Disambiguation by Linked Data
  - Personal Geo Data
    - Personalized OpenStreetMap
    - The user can map a geographical location to the 'meeting location' and save it.
    - (to be applied, in consultation with Claus at Leipzig Univ.)
  - Open Geo Data
    - Naver Local Search API
    - Yahoo! POI Search API
    - Seoul Bus-stop DB
- Disambiguation by applying a ranking algorithm
  - (the idea will be borrowed from meta-search research)
  - Disambiguate with the 1st-ranked geographical location.
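The slides leave the ranking algorithm open, saying only that the idea will come from meta-search. One common meta-search technique is Borda-count rank aggregation, sketched below as an assumption rather than as the system's method. The source names and candidate identifiers are hypothetical, no real API is called, and giving a Personal Geo Data match priority over the aggregated ranking is likewise an assumed policy.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def borda_rank(rankings: Dict[str, List[str]]) -> List[str]:
    """Merge per-source ranked candidate lists with a Borda count.

    In a list of n candidates, position 0 earns n points, position 1 earns
    n - 1, and so on. Ties are broken alphabetically for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for candidates in rankings.values():
        n = len(candidates)
        for position, candidate in enumerate(candidates):
            scores[candidate] += n - position
    return sorted(scores, key=lambda c: (-scores[c], c))


def disambiguate(personal_match: Optional[str],
                 rankings: Dict[str, List[str]]) -> Optional[str]:
    """Prefer the user's own mapping; otherwise take the 1st-ranked candidate."""
    if personal_match is not None:
        return personal_match
    ranked = borda_rank(rankings)
    return ranked[0] if ranked else None


# Hypothetical candidate ids returned by three open geo data sources.
example = {
    "naver_local": ["place_A", "place_B", "place_C"],
    "yahoo_poi": ["place_B", "place_A"],
    "seoul_bus_stop": ["place_B"],
}
best = disambiguate(None, example)  # place_B scores 5, place_A 4, place_C 1
```

A Borda count needs only rank positions, so it works even when the sources return scores on incomparable scales or no scores at all.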

Disambiguation Module (2/2)
- [Figure: the query 동측식당 (East Cafeteria) is looked up in Linked Data, which consists of Personal Geo Data and Open Geo Data (Naver Local API, Yahoo! POI API, Seoul Bus-stop), and the results are passed to Disambiguation.]

4. Experimentation 1) Experiment Data 2) Extraction Module 3) Disambiguation Module

Experiment Data
- Meeting announcement corpus
  - 1101 meeting announcements
  - Collected from the web with the keyword 'notice'
- Annotation
  - 10 types of terms, 13 types of relations
  - 3 human annotators using the COAT annotation toolkit

Extraction Module
- Exp1. Extraction speed/memory comparison
  - Baseline system: ML-based system
  - Dataset: already gathered corpus (training/test set)
- Exp2. Extraction performance comparison
  - Baseline system: ML-based system
  - Evaluation: Precision/Recall/F-measure
  - Dataset: already gathered corpus (training/test set) and a newly gathered corpus
- (Experiments are yet to be carried out.)
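For reference, the Precision/Recall/F-measure evaluation named in Exp2 can be computed as in the sketch below. The matching criterion is an assumption, since the slides do not state one: a prediction counts as correct only if the pair (announcement id, location string) matches the gold annotation exactly.

```python
from typing import Set, Tuple

Extraction = Tuple[int, str]  # (announcement id, extracted location); hypothetical format


def precision_recall_f1(predicted: Set[Extraction],
                        gold: Set[Extraction]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure (F1)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: 3 predictions, 4 gold locations, 2 of them matching exactly.
predicted = {(1, "A"), (2, "B"), (3, "C")}
gold = {(1, "A"), (2, "X"), (3, "C"), (4, "D")}
p, r, f = precision_recall_f1(predicted, gold)  # p = 2/3, r = 1/2, f = 4/7
```

Exact matching is strict for location strings; a partial-overlap criterion would give higher scores, so the criterion should be reported with the numbers.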

Disambiguation Module
- Exp1. Accuracy in distance
  - 6 distance ranges: 0 ≤ x < 100 m, 100 m ≤ x < 1 km, 1 km ≤ x < 2 km, 2 km ≤ x < 3 km, 3 km ≤ x < 5 km, and 5 km ≤ x
- Exp2. Accuracy improvement with Personal Geo Data
  - Evaluation: the performance is hard to quantify, so show scenarios of how Personal Geo Data can be applied to improve accuracy.
- Exp3. Comparison of ranking algorithm performance
- Exp4. Disambiguation speed/memory comparison
  - Comparison of processing and communication speed/memory on the server vs. on the mobile device
- (Experiments are yet to be carried out.)
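Exp1 needs the distance between the disambiguated coordinates and the true coordinates, sorted into the six ranges. A minimal sketch follows, assuming great-circle (haversine) distance on a spherical Earth and treating every range as closed on the left and open on the right; the function names and range labels are hypothetical.

```python
import bisect
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

# Upper edges of the first five ranges, in metres; the sixth range is open-ended.
BIN_EDGES_M = [100, 1000, 2000, 3000, 5000]
BIN_LABELS = ["<100m", "100m-1km", "1-2km", "2-3km", "3-5km", ">=5km"]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_bin(distance_m: float) -> str:
    """Map a distance to its range label; a value equal to an edge goes to the higher range."""
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES_M, distance_m)]
```

Counting how many test locations fall into each range then gives the accuracy-in-distance table directly.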

5. Conclusion 1) Assessment of the Approach 2) Limitation and Future Work

References

Abstract
- Background: research background, problems, and necessity (1/2)
- Solution proposed in the paper
- Advantages of the solution