Some studies on Vietnamese multi-document summarization and semantic relation extraction

Presentation transcript:

Some studies on Vietnamese multi-document summarization and semantic relation extraction
Laboratory of Data Mining & Knowledge Science
9/4/2015

Content
I. Vietnamese multi-document summarization
    1. Vietnamese VNSEN search engine
    2. Clustering
    3. Semantic similarity
    4. Multi-document summarization
II. Semantic relation extraction
    1. Vietnamese medical ontology
    2. Object relation extraction
    3. Cause-and-effect relations
    4. Vietnamese entity search engine

Vietnamese multi-document summarization
Vietnamese VNSEN search engine
– Based on Nutch
– Integrates the Vietnamese word segmentation tool JvnSegmenter
– Indexes pages from vi.wikipedia.org

Vietnamese multi-document summarization
Clustering
– Clustering integrated into the VNSEN search engine
    – Uses snippet results from the VNSEN search engine
    – Hierarchical Agglomerative Clustering (HAC) algorithm
– Evaluation against clustering on the Vivisimo search engine
    – Cluster labeling
    – Compactness of clusters
    – Isolation of clusters
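
The HAC step on search snippets can be sketched as follows. This is a minimal illustration, not the VNSEN implementation: the whitespace tokenizer, the average-linkage criterion, the cosine measure over term counts, and the stopping threshold are all assumptions chosen for brevity.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hac(snippets, threshold=0.2):
    """Average-linkage agglomerative clustering over snippet texts.

    Returns clusters as lists of snippet indices. Merging stops when the
    best pair of clusters is less similar than `threshold` (hypothetical value).
    """
    vectors = [Counter(s.lower().split()) for s in snippets]
    clusters = [[i] for i in range(len(snippets))]

    def average_similarity(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_similarity(clusters[x], clusters[y])
                if sim > best:
                    best, pair = sim, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters


snippets = [
    "flu symptoms fever cough",
    "fever cough flu treatment",
    "stock market prices",
    "market prices fall",
]
print(hac(snippets))  # [[0, 1], [2, 3]]
```

For Vietnamese text, the whitespace split would be replaced by a word segmenter such as JvnSegmenter, since Vietnamese words can span several syllables.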

Vietnamese multi-document summarization
Implementation of semantic similarity measures
– Semantic similarity between words, based on a semantic network
    – Path length (PL)
    – Information content (IC)
– Semantic similarity between sentences, based on topic analysis
– Word order similarity between sentences
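
Two of the measures listed above can be sketched in a few lines. The slides do not give the exact formulas used, so both are assumptions: path-length similarity is taken as 1 / (1 + shortest path length) in the semantic network, and word order similarity follows the common form 1 - ||r1 - r2|| / ||r1 + r2||, where r1 and r2 hold the position of each joint word in the two sentences. The toy network is hypothetical.

```python
from collections import deque
from math import sqrt


def path_length(graph, source, target):
    """Shortest path length between two nodes (BFS); None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def pl_similarity(graph, word1, word2):
    """Assumed path-length similarity: 1 / (1 + shortest path length)."""
    dist = path_length(graph, word1, word2)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def word_order_similarity(sentence1, sentence2):
    """Assumed word order similarity: 1 - ||r1 - r2|| / ||r1 + r2||."""
    words1 = sentence1.lower().split()
    words2 = sentence2.lower().split()
    joint = list(dict.fromkeys(words1 + words2))  # joint word list, order kept

    def order_vector(words):
        # 1-based position of each joint word; 0 when the word is absent
        return [words.index(w) + 1 if w in words else 0 for w in joint]

    r1, r2 = order_vector(words1), order_vector(words2)
    diff = sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0


# Hypothetical undirected semantic network given as adjacency lists.
network = {
    "disease": ["flu", "diabetes"],
    "flu": ["disease"],
    "diabetes": ["disease"],
}
print(pl_similarity(network, "flu", "diabetes"))
print(word_order_similarity("dog bites man", "man bites dog"))
```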

Vietnamese multi-document summarization
Building a Vietnamese semantic corpus
– Hidden topic corpus
    – Uses the Latent Dirichlet Allocation (LDA) model
    – Uses the JGibbsLDA tool for topic analysis
– Vietnamese Wikipedia corpus
    – Uses a category graph model
Result
– Hidden topic corpora with 120/150/200 topics, based on the Vnexpress/Wikipedia data sets
– Category graph with category nodes and articles
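
To make the hidden topic analysis concrete, here is a toy collapsed Gibbs sampler for LDA. It is not the JGibbsLDA API and not the authors' configuration; the function name, the hyperparameters `alpha` and `beta`, the iteration count, and the two-topic example are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    `docs` is a list of token lists. Returns the per-document topic
    distributions (theta), one list of `num_topics` probabilities per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)] # n(k, w)
    topic_total = [0] * num_topics                             # n(k)
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assignments[d][i], word_id[w]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Resample the topic from its full conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


docs = [["flu", "fever", "cough"], ["market", "stock", "price"], ["fever", "flu"]]
theta = lda_gibbs(docs, num_topics=2)
print(theta)
```

Each row of `theta` is a topic distribution for one document; vectors of this kind are what a cosine measure over hidden topics would compare.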

Vietnamese multi-document summarization
Multi-document summarization
– Maximal Marginal Relevance (MMR) method
    – Improved with semantic similarity measures based on hidden topic analysis
[Diagram labels: Cluster, Label, Pre-processing, List of documents, List of sentences, Hidden topic, Cosine measure, Document weights (D1 … Dk), Sentence weights (S1 … Sk), Summary document]
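
The MMR selection step can be sketched as below. This is the standard MMR criterion, not necessarily the exact variant used in the slides: each sentence is assumed to be represented by a dense vector (for example a hidden topic distribution), similarity is assumed to be cosine, and the trade-off parameter `lam` is a hypothetical default.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(sentence_vectors, query_vector, k, lam=0.7):
    """Pick k sentence indices by Maximal Marginal Relevance.

    Score = lam * sim(sentence, query)
            - (1 - lam) * max sim(sentence, already selected sentence).
    """
    selected = []
    remaining = list(range(len(sentence_vectors)))

    def score(i):
        relevance = cosine(sentence_vectors[i], query_vector)
        redundancy = max(
            (cosine(sentence_vectors[i], sentence_vectors[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Sentences 0 and 1 are near-duplicates; sentence 2 covers a different aspect.
vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
query = [1.0, 1.0]
print(mmr_select(vectors, query, k=2, lam=0.5))  # [1, 2]
```

In the example, sentence 1 is chosen first as the most relevant, and sentence 2 is preferred over sentence 0 in the second round because sentence 0 is almost identical to the sentence already selected.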

Vietnamese multi-document summarization
Multi-document summarization for a simple Vietnamese medical Q&A system
– Semantic similarity measures based on the Vietnamese Wikipedia corpus
– Medical ontology
– Hidden topic analysis
– Clustering

Vietnamese multi-document summarization
Table-of-Contents generation
– Uses text segmentation and title generation techniques to generate a Table-of-Contents automatically

Vietnamese multi-document summarization
Some of our Vietnamese language processing utilities
– Nguyen Cam Tu, Phan Xuan Hieu. JvnSegmenter: A Java-based Vietnamese Word Segmentation
– Nguyen Cam Tu. JVnTextpro: A Java-based Vietnamese Text Processing Toolkit
– Nguyen Cam Tu. JGibbsLDA: A Java and Gibbs Sampling based Implementation of Latent Dirichlet Allocation (LDA)
– VNSEN Search Engine (Implementers: Nguyen Thu Trang, Nguyen Cam Tu, Nguyen Viet Cuong, Tran Mai Vu, Nguyen Minh Tuan, etc.)

Semantic Relation Extraction
Vietnamese Medical Ontology
– 23 entity classes
– 14 relations
– 200 entities
Techniques to improve the ontology
– Named Entity Recognition
– Relation extraction
– …

Semantic Relation Extraction
Object relation extraction
– Product domain
– Medical domain
Technique
– Wrapper technique for structured data (HTML/XML/Table)
– NLP for unstructured data (Text)
    – HMM model
    – CRF model
    – …
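
For the structured-data case, a wrapper can be as simple as a parser that reads attribute/value pairs out of an HTML table. The sketch below uses Python's standard `html.parser`; the class name `TableWrapper`, the helper `extract_attributes`, the two-cell row layout, and the sample page are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Collects the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_attributes(html):
    """Return {attribute: value} for every two-cell row in the page."""
    parser = TableWrapper()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


page = (
    "<table>"
    "<tr><th>Name</th><td>Paracetamol</td></tr>"
    "<tr><th>Form</th><td>Tablet</td></tr>"
    "</table>"
)
print(extract_attributes(page))  # {'Name': 'Paracetamol', 'Form': 'Tablet'}
```

Unstructured text has no such regular layout, which is why the slide turns to sequence models such as HMM and CRF for that case.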

Semantic Relation Extraction
Cause-and-effect relations
Using the research results of Corina Roxana Girju [Rox08] to investigate cause-and-effect relations such as:
– Adverbial causal link
– Preposition causal link
– Subordination causal link
– Clause integrated link
[Rox08] Corina Roxana Girju (2008). Semantic Relation Extraction and its Applications, invited tutorial at the European Summer School in Logic, Language and Information (ESSLLI 2008), Hamburg, Germany, August 2008.
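
A first approximation of detecting such links is cue-phrase matching. The sketch below covers three of the link types with a handful of English cue phrases; the cue lists are illustrative assumptions, not taken from [Rox08] or from the slides, and a Vietnamese system would need its own cue inventory. Longer cues are matched first so that "because of" is not also counted as "because".

```python
import re

# Hypothetical English cue phrases for three of the link types listed above.
CAUSAL_CUES = {
    "adverbial": ["therefore", "consequently"],
    "preposition": ["because of", "due to"],
    "subordination": ["because", "since"],
}


def find_causal_links(sentence):
    """Return (link_type, cue) pairs whose cue phrase occurs in the sentence."""
    remaining = sentence.lower()
    # Try longer cues first and blank out each match, so a cue that is a
    # prefix of a longer cue is not reported twice for the same span.
    pairs = sorted(
        ((cue, link_type) for link_type, cues in CAUSAL_CUES.items() for cue in cues),
        key=lambda pair: -len(pair[0]),
    )
    found = []
    for cue, link_type in pairs:
        pattern = r"\b" + re.escape(cue) + r"\b"
        if re.search(pattern, remaining):
            found.append((link_type, cue))
            remaining = re.sub(pattern, " ", remaining)
    return found


print(find_causal_links("The flight was cancelled because of fog."))
print(find_causal_links("He stayed home because he was ill."))
```

Cue matching only flags candidate sentences. Words such as "since" are ambiguous between causal and temporal readings, so a real extractor would need a disambiguation step on top of it.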

Semantic Relation Extraction
Vietnamese entity search engine in the field of medical health care
– Uses the medical ontology, object relation extraction, cause-and-effect relation extraction…
– In association with the UIUC DB&IS Lab (University of Illinois at Urbana-Champaign)
    – Object Search
    – Query Log Mining
    – Object Extraction
[Cha08] Kevin C. Chang (2008). Data-Aware Search on the Web, Act. 2: Entity Search, Technical Report, University of Illinois at Urbana-Champaign (a talk at the College of Technology, Vietnam National University, Hanoi, July 08, 2008).

Some articles in 2008
[LNH08] Dieu-Thu Le, Cam-Tu Nguyen, Quang-Thuy Ha, Xuan-Hieu Phan, and Susumu Horiguchi (2008). Matching and Ranking with Hidden Topics towards Online Contextual Advertising, The 2008 IEEE/WIC/ACM International Conference on Web Intelligence (WI-08), University of Technology, Sydney, Australia, December 2008 (accepted).
[PNL08] Xuan-Hieu Phan, Cam-Tu Nguyen, Dieu-Thu Le, Le-Minh Nguyen, Susumu Horiguchi, and Quang-Thuy Ha (2008). Classification and Contextual Match on the Web with Hidden Topics from Large Data Collections, IEEE Transactions on Knowledge and Data Engineering (submitted).
[VUH08] Tran Mai Vu, Pham Thi Thu Uyen, Hoang Minh Hien, Ha Quang Thuy (2008). Semantic Similarity of sentences and application for multi-document summarization to evaluate on clustering component of Vietnamese search engine, Workshop on Information Communication Technology (ICTFIT08), College of Science, Vietnam National University, Ho Chi Minh City, November 14, 2008 (in Vietnamese, accepted).

THANK YOU