Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Shing Chen Author : Tao-Hsing Chang Chia-Hoang Lee 國立雲林科技大學 National Yunlin University.


Intelligent Database Systems Lab. Advisor: Dr. Hsu. Graduate: Chien-Shing Chen. Authors: Tao-Hsing Chang, Chia-Hoang Lee. 國立雲林科技大學 National Yunlin University of Science and Technology. Paper: Automatic Chinese unknown word extraction using small-corpus-based method. Natural Language Processing and Knowledge Engineering, Proceedings International Conference on, IEEE.

Outline: Motivation; Objective; Introduction; Extracting possible unknown words; SPLR; Modification (prefixed/suffixed word revising, compound word selection); Experiment; Conclusion; Opinion. N.Y.U.S.T. I.M.

Motivation: Any Chinese character can either represent a word by itself or be part of other words. There are no blanks between Chinese words to mark the boundaries. Both statistics-based and rule-based methods have drawbacks. Examples: “拍打皮卡丘” (hit Pikachu); “觀光協會” (tourism association), “神奇寶貝” (Pokémon).

Objective: Extract Chinese unknown words with both efficiency and accuracy, including words that occur rarely, when only a small set of documents is available for training.

1-1. Introduction: Unknown words are words that do not exist in the dictionary or vocabulary. Identifying the boundaries: “拍打皮卡丘” (hit Pikachu), “資料探勘非常有意思” (data mining is very interesting). Semantic ambiguity: “觀光協會” (tourism association), “神奇寶貝” (Pokémon).

1-2. Introduction: Some approaches restrict their scope to particular types of unknown words: “prefixes/suffixes” to identify proper names, or a hybrid method to estimate the probability. Identifying general unknown words is difficult: “熱鬧非凡” (extraordinarily lively), “回味無窮” (endlessly memorable), “神奇寶貝” (Pokémon); “發生什麼” (what happened), “老師問問題” (the teacher asks a question).

1-3. Introduction: With statistics-based methods, small documents cause low accuracy. The goal is to develop a method that keeps the efficiency of statistics-based methods while identifying unknown words accurately when the set of documents is small.

2. Previous Works: A proper name that is a compound word cannot be identified: “中國國際商業銀行” (International Commercial Bank of China) is split into “中國” (China), “國際” (international), “商業” (commercial), “銀行” (bank). Statistics-based methods rely on occurrence frequency. The PLU-based likelihood ratio (PLR) is not only efficient but also fast, but words that occur rarely cannot be extracted.

3-1. Extracting Possible Unknown Words: Preprocessing retrieves the possible character sequences. The maximum length of a character sequence is limited, and stop words are eliminated from the character sequences. The frequently occurring character sequences are then regarded as possible unknown words.
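
The preprocessing step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `candidate_sequences`, the default `max_len` and `min_freq` values, the stop-character set, and the choice to split the text at stop characters are all assumptions made for the example.

```python
from collections import Counter


def candidate_sequences(text, max_len=4, min_freq=2,
                        stop_chars=frozenset("的了是在，。！？、")):
    """Count character sequences of length 2..max_len and keep the frequent ones.

    Assumption: "eliminating stop words" is approximated by cutting the text
    at every stop character, so no candidate spans one.
    """
    # Split the text into segments at stop characters and whitespace.
    segments, current = [], []
    for ch in text:
        if ch in stop_chars or not ch.strip():
            if current:
                segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))

    # Count every substring of each segment up to the length limit.
    counts = Counter()
    for seg in segments:
        for n in range(2, max_len + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1

    # Frequent sequences become the possible unknown words.
    return {seq: c for seq, c in counts.items() if c >= min_freq}


print(candidate_sequences("福利社的福利社"))
```

On real essays the thresholds would need tuning; the values here only keep the toy input small.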

3-2. Extracting Possible Unknown Words: If the occurrences of a sequence follow those of its subsequence, the sequence should not be an unknown word. “去福利社” (go to the school co-op store) occurs following “福利社” (school co-op store), so “去福利社” is not a possible unknown word.

3-3. Extracting Possible Unknown Words: Defined: (formula not preserved in this transcript)

3-4. Extracting Possible Unknown Words: “去福利社” occurs 200 times; “福利社” occurs 1000 times. SPLR(tp) = (value not preserved in this transcript). Error-tolerance coefficients.
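
The SPLR formula itself is not preserved in this transcript, so the following is only a sketch of the filtering idea under an assumed score: the plain ratio of a candidate's count to the count of a shorter candidate it contains. The function name `drop_extended_candidates`, the `min_ratio` cutoff of 0.5, and the ratio itself are illustrative assumptions, not the authors' definition of SPLR.

```python
def drop_extended_candidates(counts, min_ratio=0.5):
    """Remove candidates that look like a shorter candidate plus extra characters.

    Assumed rule (not the SPLR formula): a candidate tp is rejected when it
    accounts for only a small share of the occurrences of some candidate ts
    that it contains.
    """
    kept = {}
    for tp, tp_count in counts.items():
        # Other candidates of length >= 2 that appear inside tp.
        subs = [ts for ts in counts
                if ts != tp and len(ts) >= 2 and ts in tp]
        if any(tp_count / counts[ts] < min_ratio for ts in subs):
            continue
        kept[tp] = tp_count
    return kept


# Counts taken from the slide: 200 / 1000 = 0.2, below the assumed cutoff.
counts = {"去福利社": 200, "福利社": 1000}
print(drop_extended_candidates(counts))
```

With the slide's counts, “去福利社” is dropped and “福利社” is kept, which matches the conclusion stated on the preceding slide.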

4. Modification: (1) One-character prefix (前綴) or suffix (字尾): for “導師室” (homeroom teachers' office), the word “導師” (homeroom teacher) results in a low SPLR for “導師室”. (2) Familiar sequences: “從教室裡衝出來” (rush out of the classroom) is not an unknown word but would be identified as one by the simple SPLR method.

Prefixed/Suffixed Word Revising: Some words that contain prefixes or suffixes have already been collected in available dictionaries. For example, the unknown word “總領隊” (chief team leader) includes a prefix, pattern “ocw + mcw”; “導師室” (homeroom teachers' office) includes a suffix, pattern “mcw + ocw”.

Prefixed/Suffixed Word Revising: The one-character prefixes/suffixes can be extracted in advance from available dictionaries.
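
One way to extract one-character prefixes and suffixes from a dictionary in advance is sketched below. It assumes that “ocw” stands for a one-character word and “mcw” for a multi-character word, which the transcript does not spell out. The function names, the `min_count` parameter, and the toy lexicon are hypothetical.

```python
from collections import Counter


def collect_affixes(lexicon, min_count=2):
    """Collect one-character prefixes and suffixes seen in dictionary words."""
    prefixes, suffixes = Counter(), Counter()
    for word in lexicon:
        if len(word) < 3:
            continue  # the remainder must be a multi-character word
        if word[1:] in lexicon:    # ocw + mcw
            prefixes[word[0]] += 1
        if word[:-1] in lexicon:   # mcw + ocw
            suffixes[word[-1]] += 1

    def frequent(counter):
        return {ch for ch, n in counter.items() if n >= min_count}

    return frequent(prefixes), frequent(suffixes)


def affix_pattern(candidate, lexicon, prefixes, suffixes):
    """Return the pattern a candidate matches, or None."""
    if len(candidate) < 3:
        return None
    if candidate[0] in prefixes and candidate[1:] in lexicon:
        return "ocw + mcw"
    if candidate[-1] in suffixes and candidate[:-1] in lexicon:
        return "mcw + ocw"
    return None


# Toy lexicon (an assumption for illustration, not the CKIP lexicon).
lexicon = {"經理", "總經理", "幹事", "總幹事", "辦公", "辦公室",
           "會議", "會議室", "領隊", "導師"}
prefixes, suffixes = collect_affixes(lexicon)
print(affix_pattern("總領隊", lexicon, prefixes, suffixes))
print(affix_pattern("導師室", lexicon, prefixes, suffixes))
```

A candidate that matches one of the two patterns could then be exempted from the low score caused by its frequent multi-character part.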

Compound Word Selection: A familiar sequence in the document includes one or more common words, while a compound word consists of particular words. “從教室裡衝出來” (rush out of the classroom) consists of the common words “教室” (classroom) and “出來” (come out). Example counts: “文具用品” (stationery supplies) 100 times, “文具” (stationery) 100 times, “用品” (supplies) 100 times.

Compound Word Selection: ts is a word that is contained in tp and is not a one-character word, and a threshold is applied. A sequence that consists of common words should not be a possible unknown word.

Compound Word Selection: Familiar sequences and compound words can be differentiated efficiently. “神奇寶貝” occurs 200 times, “神奇” 230 times, “寶貝” 250 times. “發生什麼” occurs 200 times, “發生” 2000 times, “什麼” 4000 times. Ratios shown on the slide: 200/… and …/2000.
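
The counts above suggest how a threshold can separate compound words from familiar sequences. The sketch below assumes the test compares the count of tp with the count of each multi-character word ts inside it; the exact formula and threshold are not preserved in the transcript, so the function name `is_compound_word`, the cutoff of 0.5, and the use of the smallest ratio are assumptions.

```python
def is_compound_word(tp, parts, freq, threshold=0.5):
    """Assumed test: tp is a compound word if it covers a large share of the
    occurrences of every multi-character word ts it contains."""
    ratios = [freq[tp] / freq[ts] for ts in parts if len(ts) > 1]
    return bool(ratios) and min(ratios) >= threshold


# Counts taken from the slides.
freq = {
    "神奇寶貝": 200, "神奇": 230, "寶貝": 250,
    "發生什麼": 200, "發生": 2000, "什麼": 4000,
    "文具用品": 100, "文具": 100, "用品": 100,
}
print(is_compound_word("神奇寶貝", ["神奇", "寶貝"], freq))
print(is_compound_word("發生什麼", ["發生", "什麼"], freq))
```

Under these assumptions “神奇寶貝” and “文具用品” pass, because their parts rarely occur outside them, while “發生什麼” fails because “發生” and “什麼” are common words that mostly occur elsewhere.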

5. Experiments: Data set: 1,285 student essays. Theme: “Recess at School”. Characters: 470,665.

5-1. Experiments: SPLR

5-2. Experiments: Familiar

5-3. Experiments: Prefixed/suffixed. Prefixed or suffixed patterns in the CKIP lexicon (中央研究院資訊科學研究所 中文知識庫小組: Chinese Knowledge and Information Processing group, Institute of Information Science, Academia Sinica).

6. Conclusion: Efficiency; accuracy; words that occur rarely; a small training corpus.

Opinion: Information retrieval; unknown words; compound words; Semantic Web.