Kyoungryol Kim Meeting Information Extraction from Meeting Announcement in Korean.

Presentation transcript:

Kyoungryol Kim Meeting Information Extraction from Meeting Announcement in Korean

Table of Contents
1. Introduction
   - Motivation
   - Goal
   - Problem Definition
2. The Proposed Method
   - Problem Modeling / Checklist
   - Overall Architecture
   - Normalization Process

Introduction

Motivation
- Every day we receive many meeting announcements: conferences, seminars, workshops, meetings, appointments.
- Meeting announcements account for 17% (30,201 out of 183,022) of the emails in the Enron Dataset*.
- Smartphone era: many people manage their schedules with an online calendar on a smartphone, e.g. Google Calendar.
- However, typing on a touch-screen keyboard is difficult and error-prone.
* Enron Dataset, August 21, 2009 version

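Goal
- Automatically extract schedule information from a meeting announcement and add it to the calendar.

Example (Meeting Announcement -> Extract -> Update)
Meeting announcement (Korean original):
무더운 날씨가 본격적으로 시작되는 즈음하여 유니브캐스트의 상반기 평가와 하반기 운영을 위한 정기팀장회의를 개최합니다. 날짜 : 7월 19일 (토) 오후 2시 장소 : 민들레영토. 민들레영토 오는길: 지도와 같이 명동역 8번 출구로 나오셔서 쭉 상가 끼고 걸어가시면 저기 YMCA 빌딩 1층에 있습니다.
English translation:
As the hot weather sets in, we are holding the regular team leaders' meeting to review 유니브캐스트's first half of the year and to plan operations for the second half. Date: Saturday, July 19, 2 PM. Place: 민들레영토. Directions to 민들레영토: as shown on the map, leave 명동역 (Myeongdong Station) through Exit 8 and walk straight along the shops; it is on the first floor of the YMCA building.
Extracted information:
- startTime: T14:00
- isHeldAt:
  - Administrative Address: 대한민국 서울특별시 중구 명동 1가 1-1 민들레영토 명동점
  - Geocode: ( , )
  - Semantic Type: Café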

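Problem Definition
Finding the meeting location is divided into two parts:
1. Finding Target Locations: finding locations in the text for each predefined type of complexity.
2. Disambiguation: named entity disambiguation on the locations found.

Example announcement (the slide shows the same text for both steps), Korean original:
무더운 날씨가 본격적으로 시작되는 즈음하여 유니브캐스트의 상반기 평가와 하반기 운영을 위한 정기팀장회의를 개최합니다. 날짜 : 7월 19일 (토) 오후 2시 장소 : 민들레영토. 기본 안건 - 제작지원비 지급 지연에 대한 설명 - 기금 조정 운영안 - 가을 워크샵 준비위 구성 - 기타 (기타 안건으로 상정할 것이 있으면 각 팀장들은 제안해 주시기 바랍니다). 민들레영토 오는길: 지도와 같이 명동역 8번 출구로 나오셔서 쭉 상가 끼고 걸어가시면 저기 YMCA 빌딩 1층에 있습니다. 참고하세요.
English translation:
As the hot weather sets in, we are holding the regular team leaders' meeting to review 유니브캐스트's first half of the year and to plan operations for the second half. Date: Saturday, July 19, 2 PM. Place: 민들레영토. Agenda: explanation of the delay in production support payments; fund adjustment plan; forming the preparation committee for the fall workshop; other items (team leaders, please propose anything else you want on the agenda). Directions to 민들레영토: as shown on the map, leave 명동역 (Myeongdong Station) through Exit 8 and walk straight along the shops; it is on the first floor of the YMCA building. For your reference.

The Proposed Method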

Problem Modeling
From Meeting Announcement Text to Location on the Map, with the open question attached to each step:
- Extract location strings. (1. How to extract location string?)
- Extract address information and limit the boundary. (2. How to extract address information?)
- Search the location from the DB. (3. What kind of DB can we use? 4. How to manipulate the query?)
- Search the location from external resources. (5. What kind of external resources can we use?)
- Disambiguation among found locations. (6. What are the measures to find desired location?)
- Location on the Map. (7. How to represent Location?)

Problem List (1/2)
1. How to extract location strings from the given text?
2. How to extract address information from location strings?
3. To search the location, what kind of database can we use?
4. To search the location, how to manipulate the query?
5. To search the location, what kind of external resources can we use?
6. What are the measures to find desired locations among candidates?
7. How to represent the location?

Problem List (2/2) - Reorganized
1. How to extract location strings from the given text?
2. How to extract address information from location strings?
   1) How to check whether address information is included or not?
   2) How to construct a database that can limit the boundary of an address?
   3) What if the boundary refers to several regions?
3. To search the location, what resources can we use?
   1) Internal database: how to construct the internal database?
   2) External resources: what external resources are available?
4. To search the location, how to manipulate the query?
5. What are the measures to find desired locations among candidates?
6. How to represent the location?
   1) To store the location in the DB
   2) To represent the location on the map

Problem Checklist (6/6): How to represent the location?
1) To store the location in the DB
   - Uses the OpenStreetMap representation: Node / Way / Relation
2) To represent the location on the map
   - WGS84 (standard): ( latitude, longitude [, altitude] )

Representation of Meeting Location
- Follows the basic representation of the Node in OpenStreetMap to represent a location.
- Regards the meeting location as a Point-of-Interest.
- Variable attributes (key-value pairs):
  - used_as_meeting_location=true
  - search_query=user's query (comma separated)
- A meeting location can be imported to an OSM server (interoperability).
<node id=" " lat=" " lon=" " user="cyana" uid="74529" visible="true" version="3" changeset=" " timestamp=" T02:26:19Z">
<node id=" " lat=" " lon=" " user="cyana" uid="74529" visible="true" version="2" changeset=" " timestamp=" T08:09:50Z">
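As a rough illustration of the node representation above, the sketch below builds an OSM-style node carrying the two key-value attributes named on the slide. The function name, the coordinates, and the example queries are hypothetical; only the attribute keys and the `<tag k=... v=...>` child convention of OSM XML are taken as given.

```python
import xml.etree.ElementTree as ET


def meeting_location_node(node_id, lat, lon, queries):
    """Build an OSM-style <node> element tagged as a meeting location.

    node_id, lat, lon and queries are placeholder inputs for illustration.
    """
    node = ET.Element(
        "node", {"id": str(node_id), "lat": str(lat), "lon": str(lon)}
    )
    # Key-value attributes from the slide, stored as OSM <tag> children.
    ET.SubElement(node, "tag", {"k": "used_as_meeting_location", "v": "true"})
    ET.SubElement(node, "tag", {"k": "search_query", "v": ",".join(queries)})
    return node


# Placeholder values, not real data from the presentation.
node = meeting_location_node(1, 37.5, 127.0, ["query a", "query b"])
xml_text = ET.tostring(node, encoding="unicode")
```

Keeping the extra attributes as ordinary tags is what would let such a node be handled by OSM tooling without schema changes.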

Database tables (column, type):
node: id int | lat double | lng double | user varchar(100) | ... | version int | changeset int | timestamp varchar(20)
changeset: node_id int | id int | created_at varchar(20) | num_changes int | closed_at varchar(20) | open boolean | user varchar(100) | ...
changeset_tag: node_id int | changeset_id int | id int | key varchar(100) | value varchar(100)
node_tag: node_id int | id int | key varchar(100) | value varchar(100)
bounds: id int | country_code char(2) (ISO-3166) | admin_div1 varchar(100) | admin_div2 varchar(100) | admin_div3 varchar(100) | admin_div4 varchar(100) | southwest_lat double | southwest_lng double | northeast_lat double | northeast_lng double

Example: bounds
- Bounds information is constructed using the Google Maps API.
- The closed world is the South Korea area (it can possibly be expanded).
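A minimal sketch of how a bounds table like the one described could be stored and queried, here with an in-memory SQLite database. The column names follow the schema slide; the helper names and the inserted coordinates are made-up placeholders, not values from the presentation.

```python
import sqlite3

ADMIN_COLUMNS = ("admin_div1", "admin_div2", "admin_div3", "admin_div4")


def make_bounds_db():
    """Create an in-memory bounds table with one placeholder row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE bounds (
               id INTEGER PRIMARY KEY,
               country_code TEXT,  -- ISO-3166
               admin_div1 TEXT, admin_div2 TEXT, admin_div3 TEXT, admin_div4 TEXT,
               southwest_lat REAL, southwest_lng REAL,
               northeast_lat REAL, northeast_lng REAL)"""
    )
    # Coordinates are illustrative placeholders only.
    conn.execute(
        "INSERT INTO bounds VALUES "
        "(1, 'KR', '서울특별시', '중구', '소공동', NULL, 37.56, 126.97, 37.57, 126.98)"
    )
    return conn


def get_bounds(conn, **divisions):
    """Return the SW/NE corners for the given administrative divisions."""
    if not divisions or set(divisions) - set(ADMIN_COLUMNS):
        raise ValueError("expected one or more of: " + ", ".join(ADMIN_COLUMNS))
    where = " AND ".join(f"{col} = ?" for col in divisions)
    row = conn.execute(
        "SELECT southwest_lat, southwest_lng, northeast_lat, northeast_lng "
        f"FROM bounds WHERE {where}",
        tuple(divisions.values()),
    ).fetchone()
    if row is None:
        return None
    return {
        "southwest": {"lat": row[0], "lng": row[1]},
        "northeast": {"lat": row[2], "lng": row[3]},
    }


def in_bounds(bounds, lat, lng):
    """Check whether a point lies inside the SW/NE rectangle."""
    sw, ne = bounds["southwest"], bounds["northeast"]
    return sw["lat"] <= lat <= ne["lat"] and sw["lng"] <= lng <= ne["lng"]


conn = make_bounds_db()
bounds = get_bounds(conn, admin_div3="소공동")
```

The rectangle test assumes bounds that do not cross the antimeridian, which holds for the South Korea closed world mentioned on the slide.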

Overall Architecture
Testing System
- Input Document
- Finding Target Locations: Location NER, Relation-type Classification
- Normalization
- Disambiguation
- OUTPUT
- Resources: Trained Models (CRFs, SVMs), Gazetteer, OpenAPI Map Services, Personal Information
Training System
- Training Corpus
- Train Models (produces the Trained Models: CRFs, SVMs)
Corpus Expansion
- Document Annotation
- Adding Document to Corpus
- Expand Gazetteer

Normalization Process: Pre-Processing
Input query: 프란치스코교육회관 2층 (Franciscan Education Center, 2nd floor)
- Remove HTML tags, URLs, and "㈜".
- Replace (), [], {} with a space.
- Split the query into two parts, Main and Extra:
  - Main: chunks that include the main location information.
  - Extra: chunks that include floor/room information.
Before: { "query" : { "full" : "프란치스코교육회관 2층" } }
After: { "query" : { "full" : "프란치스코교육회관 2층", "main" : "프란치스코교육회관", "extra" : "2층" } }
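The pre-processing step can be sketched roughly as follows. The regular expressions are assumptions: the slide does not say how floor/room chunks are recognized, so this sketch treats a chunk made of digits followed by 층 (floor) or 호 (room) as Extra, and it stores the cleaned text as "full".

```python
import re

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags
URL_RE = re.compile(r"https?://\S+")     # URLs
BRACKET_RE = re.compile(r"[()\[\]{}]")   # (), [], {}
# Assumption: floor/room chunks look like "2층" or "2001호".
EXTRA_RE = re.compile(r"^\d+(?:층|호)$")


def preprocess(raw):
    """Clean a location query and split it into main and extra parts."""
    text = TAG_RE.sub(" ", raw)
    text = URL_RE.sub(" ", text)
    text = text.replace("㈜", " ")
    text = BRACKET_RE.sub(" ", text)
    full = " ".join(text.split())  # collapse whitespace

    main, extra = [], []
    for chunk in full.split():
        (extra if EXTRA_RE.match(chunk) else main).append(chunk)
    return {
        "query": {
            "full": full,
            "main": " ".join(main),
            "extra": " ".join(extra),
        }
    }


result = preprocess("<b>프란치스코교육회관</b> (2층)")
```

A real system would likely need more patterns for Extra (basement floors, named rooms), which the slides do not enumerate.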

Normalization Process: Extract Address Information (1/3)
Flow:
- Has address info? No: search without a boundary.
- Has address info? Yes: does it include a house number?
  - Yes: Geocoding by Query.
  - No: get the bounds info for the address (SW, NE) from the Bounds DB.

1. If the query doesn't have address information: search the databases and APIs without any boundary limitation.

How address information is detected:
1) Chunk the main query by spaces.
2) Iterate over the chunks and check whether each chunk
   - ends with "-시", "-시/-구/-군", "-동/-가/-면/-읍", or "-리", and
   - starts with a value from the DB's 시/구/동/리 (city/district/neighborhood/village) columns;
   store the matching column and value.
3) If address information is included, check whether a house number follows it: [0-9]+, [0-9]+\-[0-9]+, [0-9]+번지, [0-9]+\-[0-9]+번지
   - If a house number is included, geocode directly.
   - If there is no house number, fetch the bounds for that region from the DB.

Example outputs:
No address information:
{ "query" : { "full" : "프란치스코교육회관 2층", "main" : "프란치스코교육회관", "extra" : "2층" } }
Geocoded (found_locations):
{ "query" : { "full" : "서울시 강남구 삼성동 무역회관 2001호", "main" : "서울시 강남구 삼성동 무역회관", "extra" : "2001호" }, "found_locations" : [ { "title" : "대한민국 서울특별시 강남구 삼성동 159-1", "administrative_address" : "대한민국 서울특별시 강남구 삼성동 159-1", "geometry_location" : { "lat" : , "lng" : } } ] }
Limited by bounds (limited_bound):
{ "query" : { "full" : "소공동 코리아나 호텔", "main" : "소공동 코리아나 호텔", "extra" : "" }, "limited_bound" : { "name" : "대한민국 서울특별시 중구 소공동", "southwest" : { "lat" : , "lng" : }, "northeast" : { "lat" : , "lng" : } } }
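The detection steps could look roughly like the sketch below. Several points are assumptions, not taken from the slides: the four suffix groups are mapped onto admin_div1 to admin_div4, a chunk must satisfy both the suffix check and the DB-prefix check, the DB lookup is replaced by a tiny in-memory table (KNOWN_VALUES), and only house numbers written as a single chunk (for example "159-1" or "159-1번지") are recognized.

```python
import re

# Suffix groups from the slide, in administrative order (mapping is assumed).
LEVELS = [
    ("admin_div1", ("시",)),
    ("admin_div2", ("시", "구", "군")),
    ("admin_div3", ("동", "가", "면", "읍")),
    ("admin_div4", ("리",)),
]

# Stand-in for the DB columns; a real system would query the bounds DB.
KNOWN_VALUES = {
    "admin_div1": {"서울"},
    "admin_div2": {"강남", "중"},
    "admin_div3": {"삼성", "소공"},
    "admin_div4": set(),
}

# House-number patterns from the slide, with an optional "번지" suffix.
HOUSE_NO_RE = re.compile(r"^[0-9]+(?:-[0-9]+)?(?:번지)?$")


def extract_address(main_query):
    """Find administrative chunks and a trailing house number, if any."""
    chunks = main_query.split()
    found, level, last_idx = {}, 0, -1
    for i, chunk in enumerate(chunks):
        for j in range(level, len(LEVELS)):
            column, suffixes = LEVELS[j]
            if chunk.endswith(suffixes) and any(
                chunk.startswith(v) for v in KNOWN_VALUES[column]
            ):
                found[column] = chunk
                level, last_idx = j + 1, i
                break

    house_no = None
    if found:
        # Only look at chunks that follow the last address chunk.
        for chunk in chunks[last_idx + 1:]:
            if HOUSE_NO_RE.match(chunk):
                house_no = chunk
                break
    return {"address": found, "house_no": house_no}


info = extract_address("서울시 강남구 삼성동 무역회관")
```

Requiring the DB-prefix check in addition to the suffix check is one way to avoid treating ordinary words that happen to end in 동 or 시 as address parts.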

Normalization Process: Extract Address Information (2/3)
2. If the query has address information with a house number: geocode the address information and return. Disambiguation is finished at this point.

Normalization Process: Extract Address Information (3/3)
3. If the query has address information without a house number: get the bound information and search for the location within that bound.
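The three cases can be summarized as a small routing function. This is only a sketch of the control flow: geocode, get_bounds and search are hypothetical callables supplied by the caller, and the "branch" key is added here purely to make the chosen path visible.

```python
def route_query(main_query, address, house_no, geocode, get_bounds, search):
    """Choose one of the three branches of the address flowchart.

    geocode, get_bounds and search are caller-supplied callables
    (hypothetical interfaces, not APIs named in the slides).
    """
    if not address:
        # Case 1: no address information, search without a boundary.
        return {
            "branch": "no_address",
            "found_locations": search(main_query, None),
        }
    if house_no:
        # Case 2: address with house number, geocode and finish.
        return {"branch": "geocoded", "found_locations": [geocode(main_query)]}
    # Case 3: address without house number, search inside its bounds.
    bound = get_bounds(address)
    return {
        "branch": "bounded",
        "limited_bound": bound,
        "found_locations": search(main_query, bound),
    }


# Toy stand-ins to exercise the three branches.
demo = route_query(
    "소공동 코리아나 호텔",
    {"admin_div3": "소공동"},
    None,
    geocode=lambda q: {"title": q},
    get_bounds=lambda addr: {"name": "소공동"},
    search=lambda q, bound: [{"title": q, "bound": bound}],
)
```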

Normalization Process: Find Candidate Locations
Sources, in priority order:
1. User Meeting Location DB (Priority 1)
2. SWRC Meeting Location DB (Priority 2)
3. Open API: OpenStreetMap, Naver (Priority 3)
Open API results go through Geocoding, Local Search, WMS, and coordinate conversion (KTM -> WGS84). Duplicated addresses are then removed.

Input:
{ "query" : { "full" : "소공동 코리아나 호텔", "main" : "소공동 코리아나 호텔", "extra" : "" }, "limited_bound" : { "name" : "대한민국 서울특별시 중구 소공동", "southwest" : { "lat" : , "lng" : }, "northeast" : { "lat" : , "lng" : } } }
Output:
{ "query" : { "full" : "소공동 코리아나 호텔", "main" : "소공동 코리아나 호텔", "extra" : "" }, "limited_bound" : { "name" : "대한민국 서울특별시 중구 소공동", "southwest" : { "lat" : , "lng" : }, "northeast" : { "lat" : , "lng" : } }, "found_locations" : [ { "query" : "밀레니엄 힐튼 서울", "title" : "밀레니엄 힐튼 서울", "administrative_address" : "대한민국 서울특별시 중구 태평로 1가 61-1", "geometry_location" : { "lat" : , "lng" : } }, { ..... } ] }
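One way to combine the prioritized sources and remove duplicated addresses is sketched below. It assumes that every source is queried and that, when two sources return the same administrative address, the entry from the higher-priority source is kept; the slides do not state this policy explicitly. The source functions are toy stand-ins.

```python
def find_candidates(query, sources):
    """Merge results from prioritized sources, dropping duplicate addresses.

    sources: list of (priority, search_fn) pairs, where a lower number
    means higher priority and search_fn(query) returns a list of dicts
    with at least "title" and "administrative_address".
    """
    seen = set()
    results = []
    for priority, search in sorted(sources, key=lambda s: s[0]):
        for loc in search(query):
            address = loc["administrative_address"]
            if address in seen:
                continue  # already provided by a higher-priority source
            seen.add(address)
            results.append({**loc, "priority": priority})
    return results


# Toy sources standing in for the user DB, the SWRC DB and an open API.
def user_db(query):
    return [{"title": "A (user)", "administrative_address": "addr-1"}]


def swrc_db(query):
    return []


def open_api(query):
    return [
        {"title": "A (api)", "administrative_address": "addr-1"},
        {"title": "B (api)", "administrative_address": "addr-2"},
    ]


candidates = find_candidates("some query", [(3, open_api), (1, user_db), (2, swrc_db)])
```

Deduplicating on the address string assumes the sources return addresses in the same normalized form, which in practice would need its own normalization step.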

Disambiguation
Original query: 밀레니엄 힐튼 서울

Candidates:
Title | Query | Address
동강밀레니엄래프팅 | 밀레니엄 | 대한민국 강원도 영월군 영월읍 거운리
밀레니엄피시방 서현점 | 밀레니엄 | 대한민국 경기도 성남시 분당구 서현동 307
밀레니엄모텔 | 밀레니엄 | 대한민국 광주광역시 북구 오룡동
서울힐튼호텔 | 밀레니엄 힐튼 서울 | 대한민국 서울특별시 중구 남대문로 5가 395

Measures:
- Number of matched characters between query and title, query and original query, query and address.
- (Can be used) Semantic type / personal annotation DB / distance between the location and a landmark.
- Personal address book / search history / GPS log.

Result: 서울힐튼호텔 : 대한민국 서울특별시 중구 남대문로 5가 395 ( , ) (Hotel)
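The first measure can be sketched as follows. The slides do not define "matched characters" precisely, so this sketch assumes it means the number of characters two strings share (counted with multiplicity, ignoring spaces) and simply sums the three pairwise counts; the function names are hypothetical. The candidate data is the table from the slide.

```python
from collections import Counter


def matched_chars(a, b):
    """Count characters shared by two strings, ignoring spaces."""
    ca = Counter(a.replace(" ", ""))
    cb = Counter(b.replace(" ", ""))
    return sum((ca & cb).values())


def score(candidate, original_query):
    """Sum of query-title, query-original-query and query-address matches."""
    q = candidate["query"]
    return (
        matched_chars(q, candidate["title"])
        + matched_chars(q, original_query)
        + matched_chars(q, candidate["address"])
    )


ORIGINAL_QUERY = "밀레니엄 힐튼 서울"
CANDIDATES = [
    {"title": "동강밀레니엄래프팅", "query": "밀레니엄",
     "address": "대한민국 강원도 영월군 영월읍 거운리"},
    {"title": "밀레니엄피시방 서현점", "query": "밀레니엄",
     "address": "대한민국 경기도 성남시 분당구 서현동 307"},
    {"title": "밀레니엄모텔", "query": "밀레니엄",
     "address": "대한민국 광주광역시 북구 오룡동"},
    {"title": "서울힐튼호텔", "query": "밀레니엄 힐튼 서울",
     "address": "대한민국 서울특별시 중구 남대문로 5가 395"},
]

best = max(CANDIDATES, key=lambda c: score(c, ORIGINAL_QUERY))
```

Under this particular definition the candidate 서울힐튼호텔 scores highest, in line with the result on the slide; the other measures listed (semantic type, personal data, distance) would be additional terms in the score.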